Intra-observer and inter-observer errors in CT measurement of torsional profiles of lower limbs: a retrospective comparative study

Background The purpose of this study was to determine errors in the measurement of torsional profiles (TP) (torsional femoral angle, torsional tibial angle, and femoral ankle angle) among four orthopedic surgeons, both experts and non-experts in measurement, and to assess the learning curve. Methods Twenty-six lower extremities of 13 patients with spastic diplegia who were candidates for femoral/tibial derotational osteotomy had preoperative bilateral computed tomography (CT) scanograms to establish the TP. Each measurement was done by four orthopedic surgeons: two experienced clinicians and interpreters of CT imaging, and two with limited clinical and imaging assessment experience. Images were blinded, and the surgeons made three determinations at least 5 days apart; the three angles were measured each time for each limb. Intra-observer and inter-observer variability were determined using bias, standard deviation, and the intraclass correlation coefficient (ICC). Results Significant inter-observer variability and bias were noted between experts and non-experts (average intra-observer ICC: experts 0.88 ± 0.15; non-experts 0.91 ± 0.09). For non-experts, excessive bias (25° and 14°) was observed. An associated improvement in bias with additional measurement experience suggested a potentially significant learning curve for interpreting these studies. Less inter-observer variability was observed between experts. Conclusions TP measurement is a reliable tool when performed by experienced personnel, and its use as a preoperative tool should be reserved for those experienced in such image assessments. Non-experts' measurements showed weak agreement with those of experts.


Background
Lower extremity deformities in cerebral palsy (CP) include increased femoral anteversion and medial or lateral tibial torsion. The delay in normal physiologic resolution of torsion is primarily related to increased motor tone. Persistence of abnormal lower extremity bony torsion in combination with abnormal motor responses leads to dysfunction and clinical impairment. A femoral anteversion angle of >50° correlates well with the Gross Motor Function Classification System (GMFCS) score [1][2][3][4][5].
Standard clinical examination, routine radiographs, ultrasound, and fluoroscopy may provide inadequate data for preoperative measurement of lower extremity torsional abnormalities when corrective osteotomy is planned [6][7][8]. Computed tomography (CT) assessment of rotational deformities of the lower extremities has been recommended before surgery, but the accuracy of such assessments has not been consistently documented [9].
To our knowledge, there are no previous reports comparing intra- and inter-observer agreement in the measurement of torsional profiles (TP) on axial CT images among observers with different levels of orthopedic training. The purpose of this study was to assess intra- and inter-observer agreement and accuracy, to determine errors in measurements of the torsional femoral angle (TFA), torsional tibial angle (TTA), and femoral ankle angle (FAA) of the lower limbs made by expert and non-expert orthopedic surgeons, and to determine whether a learning curve is present.

Methods
Our study was performed according to the ethical standards of the Declaration of Helsinki (1964) and its later amendments. Approval from the hospital's Institutional Review Board was granted in February 2014, although using TP for preoperative decision-making in CP patients is routine practice.
Inclusion criteria were diplegic patients who were candidates for derotational osteotomy of the lower limbs. All patients had CT scanograms using a multi-detector row CT scanner (Somatom Sensation, Siemens, Germany) with 5.5 mm slice thickness. Patients were positioned supine with the patella facing directly anterior and the hips and knees extended as far as possible. No sedation was required in any case. Lower extremity restraints were used on the CT bed to maintain position throughout the study. A radiologist collected all the CT scanograms, concealed patient information, and saved the images electronically. Each reviewer performed the measurements using a DICOM viewer. Each CT scan was numbered, and each observer measured the blinded images on three occasions with at least 5 days between readings. Lines for calculation were drawn on transparent paper during each measurement, and the paper was discarded after every measurement. Line placement was left to the individual assessor.
Each measurement was performed by four orthopedic surgeons: two experienced clinicians and interpreters of CT imaging (experts 1 and 2) and two orthopedic surgeons in training, one a first-year resident and the other in his final year of the program (non-experts 1 and 2).
The following angles were measured. TFA is the angle formed between a line passing through the center of the femoral neck and a line tangent to the posterior surfaces of the distal femoral condyles; it represents femoral version. TTA is the angle formed between a line tangent to the posterior surface of the proximal tibial plateau and a line passing through the midpoints of the tibial and fibular malleoli; it represents tibial version. FAA is the angle formed between a line passing through the center of the femoral neck and a line between the centers of the tibial and fibular malleoli; it represents the foot progression angle.
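Each of these angles is the angle between two lines drawn on superimposed axial slices. As a minimal sketch of that geometry, assuming each line is given by two hypothetical landmark points in image (x, y) coordinates, the angle can be computed from the difference in line orientations. The functions `line_orientation_deg` and `torsion_angle_deg` and the point values are illustrative only; the study drew its lines by hand on transparent paper, and the sign convention used here (which direction counts as positive) is an assumption.

```python
import math


def line_orientation_deg(p1, p2):
    """Orientation of the line through p1 and p2 relative to the image x-axis (degrees)."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))


def torsion_angle_deg(line_a, line_b):
    """Signed angle (degrees) from line_b to line_a, normalised to (-90, 90].

    Each line is a pair of (x, y) landmark points, e.g. line_a could be the
    femoral neck axis and line_b the posterior condylar tangent. These inputs
    and the sign convention are assumptions, not the study's protocol.
    """
    diff = line_orientation_deg(*line_a) - line_orientation_deg(*line_b)
    # Lines have no direction, so orientations 180 degrees apart describe the same line.
    diff = (diff + 90.0) % 180.0 - 90.0
    return 90.0 if diff == -90.0 else diff


# Hypothetical landmark points (pixels) for one limb.
femoral_neck_axis = ((120.0, 200.0), (180.0, 165.0))
posterior_condylar_tangent = ((100.0, 300.0), (220.0, 300.0))
print(round(torsion_angle_deg(femoral_neck_axis, posterior_condylar_tangent), 1))
```

Because image y-coordinates usually increase downward, the sign of the result depends on the viewer's coordinate system; a real implementation would have to fix that convention explicitly.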
Intra-observer variability was assessed using the Bland-Altman method [10]. For each angle and each observer, the difference between one measurement and a previous measurement was calculated (measurement 1 vs 2, 3 vs 2, and 3 vs 1). Bias was determined for each measurement by averaging the measured differences for each patient, and the standard deviation (SD) was also calculated. A bias of >10° was considered significant for each observer, and for each measurement the number of significant biases was counted. The intraclass correlation coefficient (ICC) was used to assess the correlation between consecutive measurements by the same observer [11]. ICC values closer to 1 indicate stronger agreement, and values near 0 indicate weak agreement. To identify a learning curve, the presence of significant bias and the SD of the differences between measurements 3 vs 2 and 3 vs 1 were compared for each angle and each observer.
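A minimal sketch of this intra-observer comparison follows, assuming each reading is a list of angles in degrees for the same limbs in the same order. The function name `bland_altman`, the readings, and the choice to count individual limbs whose difference exceeds 10° as the "number of significant biases" are illustrative assumptions. The 95% limits of agreement (bias ± 1.96 SD) are the standard Bland-Altman summary, not a value reported in this study.

```python
import statistics

SIGNIFICANT_BIAS_DEG = 10.0  # threshold for a significant difference used in the study


def bland_altman(first, second, threshold=SIGNIFICANT_BIAS_DEG):
    """Bland-Altman summary of two readings of the same angle on the same limbs.

    Returns the bias (mean of second - first), the SD of the differences,
    the 95% limits of agreement (bias +/- 1.96 SD), and how many individual
    limbs differ by more than `threshold` degrees (an assumed interpretation).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long readings with at least 2 limbs")
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return {
        "bias": bias,
        "sd": sd,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "n_significant": sum(1 for d in diffs if abs(d) > threshold),
    }


# Hypothetical TFA readings (degrees) by one observer on the same four limbs.
readings = {
    1: [30.0, 42.0, 18.0, 55.0],
    2: [28.0, 40.0, 30.0, 50.0],
    3: [29.0, 41.0, 20.0, 54.0],
}
# Reading pairs compared in the study: 1 vs 2, 3 vs 2 and 3 vs 1.
for earlier, later in [(1, 2), (2, 3), (1, 3)]:
    summary = bland_altman(readings[earlier], readings[later])
    print(f"{later} vs {earlier}: bias={summary['bias']:.2f} "
          f"SD={summary['sd']:.2f} >10 deg: {summary['n_significant']}")
```

Comparing the later pairs (3 vs 2, 3 vs 1) with the first pair is one way to look for the learning-curve effect the text describes: fewer limbs over the threshold and a smaller SD in later comparisons would suggest improving consistency.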
Inter-observer variability was assessed using the data generated by each observer for each angle at the first reading. Pairing the first readings of all four observers yielded six observer combinations (1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, and 3 vs 4) for each of the six angles considered, and an ICC was determined for each combination.
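The pairwise inter-observer comparison can be sketched as follows. The text does not state which ICC form was used, so this sketch assumes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss; `icc_2_1`, `pairwise_icc`, and the readings are hypothetical. With four observers, `itertools.combinations` yields exactly the six observer pairs listed above.

```python
from itertools import combinations
from statistics import fmean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` holds one row per limb and one column per observer.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 limbs and 2 observers in a rectangular table")
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: limbs (rows), observers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when the readings have no variance")
    return (ms_rows - ms_error) / denom


def pairwise_icc(first_readings):
    """ICC for every pair of observers, using each observer's first reading."""
    results = {}
    for a, b in combinations(sorted(first_readings), 2):
        if len(first_readings[a]) != len(first_readings[b]):
            raise ValueError("observers must have read the same limbs")
        rows = list(zip(first_readings[a], first_readings[b]))
        results[(a, b)] = icc_2_1(rows)
    return results


# Hypothetical first readings (degrees) of one angle on five limbs, observers 1-4.
first_readings = {
    1: [30.0, 42.0, 18.0, 55.0, 25.0],
    2: [31.0, 40.0, 20.0, 54.0, 27.0],
    3: [35.0, 38.0, 28.0, 47.0, 30.0],
    4: [29.0, 44.0, 17.0, 56.0, 24.0],
}
for pair, icc in pairwise_icc(first_readings).items():
    print(f"observer {pair[0]} vs {pair[1]}: ICC = {icc:.2f}")
```

Because ICC(2,1) measures absolute agreement, a constant offset between two observers (one always reading 5° higher) lowers the ICC even though their readings are perfectly correlated; a consistency-type ICC would not penalize such an offset.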

Results
From January 2013 to December 2013, 26 lower limbs of 13 patients with spastic diplegia scheduled for derotational osteotomy of the lower limbs were studied retrospectively. There were nine males and four females, whose ages at the time of the study averaged 14.6 ± 6.6 years (range 8 to 35 years). Three patients, one male and two females, were >16 years old at the time of the study (Table 1). Intra-observer variability is shown in Table 2, and ICC is listed in the >10° column. For each observer and each angle, two readings (1 and 2) were considered; ICC was calculated, and the results are reported in Table 3.
The average ICC values for each observer were:
Intra-observer variability, expert 1: ICC = 0.79 ± 0.16
Intra-observer variability, expert 2: ICC = 0.97 ± 0.02
Average variability of experts: ICC = 0.88 ± 0.15
Intra-observer variability, non-expert 1: ICC = 0.97 ± 0.03
Intra-observer variability, non-expert 2: ICC = 0.87 ± 0.08
Average variability of non-experts: ICC = 0.91 ± 0.09
The learning curve data showed statistically significant bias. Bias, SD, and average SD values are reported in Table 4. The data demonstrate excessive bias for the non-experts (25° and 14°), and these observers also showed the greatest improvement in bias with additional measurement experience. Experts showed less inter-observer variability.

Discussion
Increased femoral anteversion and increased medial or lateral tibial torsion are frequent abnormalities of the lower limbs in patients with CP. Persistence of lower extremity malalignment, including torsional deformities, leads to dysfunction and impaired locomotion. Indications for surgical treatment are controversial.
Table 2 Intra-observer variability: for each expert/non-expert, two readings (1 and 2) were taken into consideration, and SD and ICC were calculated
CT is an accurate modality for rotational measurements and is frequently used to measure TP when these measurements are difficult to assess clinically [6]. Accurate measurement of TP is fundamental to planning surgical correction.
Few studies in the literature address the accuracy and reproducibility of CT for measuring femoral and tibial torsion. Questions may arise as to whether these CT assessments are reliable in the clinical setting and how much training is needed to ensure accuracy in these critical measurements.
In our study, the first readings were used as reference measurements instead of an average of three assessments. This was done to mimic clinical practice, in which a single individual makes a single measurement to determine torsional deformities.
A relatively high value (>10°) was chosen as the threshold for significant bias. For example, if this value were used as a cutoff point, a difference of >10° from normal values would lead to an indication for surgery. Regarding the implications for patient care, significant bias between two measurements could mean that one measurement indicates surgery while the other does not. Intra-observer variability showed significant bias when experts and non-experts were compared. Expert number 1 had 25° of bias and non-expert number 2 had 14° of bias. The SDs of the experts (3.14 and 1.77) were lower than those of the non-experts (5.76 and 4.59).
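The point about a cutoff can be made concrete with a few lines of arithmetic. This is a sketch under stated assumptions: `NORMAL_DEG` is a hypothetical placeholder, not a clinical reference value, and the 10° cutoff is the example cutoff named in the text. Two readings of the same limb that differ by 12° fall on opposite sides of the cutoff.

```python
CUTOFF_DEG = 10.0  # example cutoff from the text: >10 deg from normal


def indicates_surgery(measured_deg, normal_deg, cutoff_deg=CUTOFF_DEG):
    """True if the measured angle deviates from the normal value by more than the cutoff."""
    return abs(measured_deg - normal_deg) > cutoff_deg


def readings_disagree(reading_1, reading_2, normal_deg, cutoff_deg=CUTOFF_DEG):
    """True if two readings of the same limb lead to different surgical indications."""
    return indicates_surgery(reading_1, normal_deg, cutoff_deg) != indicates_surgery(
        reading_2, normal_deg, cutoff_deg
    )


# Hypothetical normal value and two readings of the same limb, 12 deg apart.
NORMAL_DEG = 20.0
print(indicates_surgery(28.0, NORMAL_DEG), indicates_surgery(40.0, NORMAL_DEG))  # False True
print(readings_disagree(28.0, 40.0, NORMAL_DEG))  # True
```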
In assessing the learning curve, we saw an improvement in measurement accuracy, especially when the non-experts were compared with each other. Non-expert number 1, the resident with no prior experience with the method, had the largest number of significant biases but also showed the greatest improvement in accuracy with progressive measurement experience. Non-expert number 2 showed no reduction in bias; his values increased and decreased across readings. The improvement seen in the more senior resident's values may be due to his having observed others carrying out such measurements, although he had never performed them directly. The younger resident had neither seen nor performed these measurements.
Regarding inter-observer variability, agreement between the two non-experts was greater than in the expert versus non-expert comparisons. This was due to the non-experts' inaccurate measurements, in contrast with the fewer measurement errors made by the experts.
Our preliminary data suggest that TP measurement by CT is a reliable tool when used by experienced personnel. Our study did not define the number of measurements or the amount of experience with CT TP determinations required. Further study is needed to determine when the transition from non-expert to expert occurs. In the interim, CT-derived TP values should be used as preoperative measures only by those experienced in such image assessments. Further studies are also needed to develop the most accurate and reproducible method of determining femoral and tibial version, to compare measurements by radiologists and orthopedic surgeons, experts and non-experts, and to highlight their differences.
Table 4 Values of bias, SD, and average of the SD for each angle and for each observer (experts and non-experts)