
Intra-observer and inter-observer errors in CT measurement of torsional profiles of lower limbs: a retrospective comparative study



The purpose of this study was to determine errors in the measurement of torsional profiles (TP) (torsional femoral angle, torsional tibial angle, and femoral ankle angle) among four orthopedic surgeons, two experts and two non-experts in such measurement, and to assess the learning curve.


Twenty-six lower extremities of 13 patients with spastic diplegia who were candidates for femoral/tibial derotational osteotomy had preoperative bilateral computed tomography (CT) scanograms to establish the TP. Each measurement was done by four orthopedic surgeons: two experienced clinicians and interpreters of CT imaging, and two with limited clinical and imaging assessment experience. Images were blinded and the surgeons made three determinations at least 5 days apart; the three angles were measured each time for each limb. Intra-observer and inter-observer variability were determined using bias, standard deviation, and intraclass correlation coefficient.


Significant inter-observer variability and bias were noted between experts and non-experts (average variability: ICC experts: 0.88 ± 0.15; ICC non-experts: 0.91 ± 0.09). For non-experts, excessive bias (25° and 14°) was observed. An associated improvement in bias with additional measurement experience indicated a potential significant learning curve for interpreting these studies. Less inter-observer variability was observed between experts.


Measurement of TP is a reliable tool when used by experienced personnel, and its use as a preoperative tool should be reserved for those with experience in such image assessments. Non-experts’ measurements showed weak agreement with those of experts.


Lower extremity deformities in cerebral palsy (CP) include increased femoral anteversion and tibial medial or lateral torsion. The delay in normal physiologic resolution of torsion is primarily related to increased motor tone. Persistence of abnormal lower extremity bony torsion in combination with abnormal motor responses leads to dysfunction and clinical impairment. A femoral anteversion angle of >50° correlates well with the gross motor function classification system (GMFCS) score [1-5].

Standard clinical and routine radiographic assessments of lower extremity torsional abnormalities, ultrasound, or fluoroscopy may provide inadequate data to determine preoperative measurements in planned osteotomy correction cases [6-8]. Evaluation of rotational deformities of the lower extremities by computed tomography (CT) has been recommended before surgery, but the accuracy of such assessments has not been consistently documented [9].

To our knowledge, there are no previous reports comparing intra- and inter-observer agreement in measurement of torsional profiles (TP) with CT axial images with different levels of training in orthopedics. The purpose of this study was to assess the intra- and inter-observer agreement and accuracy, determine errors between measurements of torsional femoral angle (TFA), torsional tibial angle (TTA), and femoral ankle angle (FAA) of lower limbs provided by expert and non-expert orthopedic surgeons, and also determine the presence of a learning curve.


Our study was performed according to the ethical standards of the Declaration of Helsinki (1964) and its later amendments. Acknowledgement by the hospital’s Institutional Review Board was granted in February 2014, although TP measurement is routinely used for pre-surgical decisions in CP patients.

Inclusion criteria were diplegic patients who were candidates for derotational osteotomy of the lower limbs. All patients had CT scanograms obtained with a multi-detector row CT scanner (Somatom Sensation, Siemens, Germany) with 5.5 mm slice thickness. Patients were supine with the patella directly anterior and the hip and knee in as much extension as possible. No sedation was required in any case. Lower extremity restraints were used on the CT bed to maintain position throughout the study. A radiologist collected all the CT scanograms, concealed the patients’ information, and saved the images electronically. Each reviewer performed the measurements using a DICOM viewer. Each CT scan was numbered, and each observer measured the blinded images on three occasions with at least 5 days between readings. Lines for calculation were drawn on transparent paper during each measurement, and the paper was discarded after every measurement. Line placement was the choice of the individual assessor.

Each measurement was performed by four orthopedic surgeons: two experienced clinicians and interpreters of CT imaging (experts 1 and 2) and two orthopedic surgeons in training, one a first-year resident and the other in his final year of the program (non-experts 1 and 2).

The following angles were measured:

  • TFA is the angle formed between a line passing through the center of the femoral neck and a tangent line passing from the distal posterior femoral condyles and represents the angle of femoral version.

  • TTA is the angle formed between a line tangent to the posterior surface of the proximal tibial plateau and a line passing through the mid-point of the tibial and fibular malleoli and represents the angle of tibial version.

  • FAA is formed between a line passing through the center of the femoral neck and a line between the center of the tibial and fibular malleoli and represents the foot progression angle.
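Each of the three angles reduces to the angle between two lines drawn on superimposed axial slices. The following is a minimal sketch of that geometry, not the method used by the observers in this study (who drew lines by hand). The function name `line_angle_deg`, the landmark coordinates, and the sign convention are all illustrative assumptions.

```python
import math

def line_angle_deg(p1, p2, q1, q2):
    """Signed angle in degrees from the reference line p1->p2 to the line q1->q2.

    Points are (x, y) pixel or millimetre coordinates on superimposed axial
    slices. The result is wrapped into [-180, 180). The sign convention
    (counter-clockwise positive) is an assumption of this sketch.
    """
    a_ref = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    a_line = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    deg = math.degrees(a_line - a_ref)
    return (deg + 180.0) % 360.0 - 180.0

# Hypothetical landmarks: tangent to the posterior femoral condyles (reference
# line) and the femoral neck axis, constructed here to lie 30 degrees apart.
condyle_medial, condyle_lateral = (0.0, 0.0), (60.0, 0.0)
neck_a = (10.0, 20.0)
neck_b = (10.0 + 40.0 * math.cos(math.radians(30)),
          20.0 + 40.0 * math.sin(math.radians(30)))

tfa = line_angle_deg(condyle_medial, condyle_lateral, neck_a, neck_b)
print(round(tfa, 1))
```

The same function would apply to TTA (posterior tibial plateau tangent vs. the bimalleolar line) and FAA (femoral neck axis vs. the bimalleolar line) by passing the corresponding landmark pairs.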

Intra-observer variability was assessed using the Bland-Altman method [10]. For each angle and for each observer, differences were calculated between one measurement and a previous measurement, e.g., measurement 1 vs 2, 3 vs 2, and 3 vs 1. Bias was determined for each comparison by taking the average of the measured differences across patients, together with the standard deviation (SD). A bias of >10° was considered significant for each observer. For each measurement, the number of significant biases was counted. For correlation of consecutive measurements carried out by the same observer, the intraclass correlation coefficient (ICC) was used [11]. ICC is interpreted over the range between −1 and 1; agreement is stronger when ICC is equal to 1 or −1 and weak when equal to 0. To identify a learning curve, the presence of significant bias and the SD of the differences between measurements 3 vs 2 and 3 vs 1 were compared for each angle and for each observer.
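The bias and SD calculation described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' analysis code: the function name `bland_altman` and the readings are hypothetical, and the 95% limits of agreement (bias ± 1.96 SD) are the standard Bland-Altman quantity rather than something reported in this study.

```python
from statistics import mean, stdev

def bland_altman(reading_a, reading_b, threshold_deg=10.0):
    """Bias and SD of paired differences between two readings of one angle.

    reading_a, reading_b: angles in degrees, one value per limb, same order.
    threshold_deg: bias magnitude above which the bias is flagged significant
    (10 degrees is the cutoff stated in the text).
    """
    diffs = [a - b for a, b in zip(reading_a, reading_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return {
        "bias": bias,
        "sd": sd,
        # Standard Bland-Altman 95% limits of agreement (not reported in the study).
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
        "significant": abs(bias) > threshold_deg,
    }

# Hypothetical readings (degrees) by one observer on two occasions.
reading_1 = [30.0, 42.0, 25.0, 38.0]
reading_2 = [28.0, 45.0, 24.0, 35.0]

result = bland_altman(reading_1, reading_2)
print(result)
```

Running the same function on readings 3 vs 2 and 3 vs 1 for each observer gives the quantities compared when looking for a learning curve.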

Inter-observer variability was assessed using the data generated by each observer and each angle for the first reading. All data combinations of the first reading for all observers yielded a total of six combinations for each of the six angles considered, e.g., 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, and 3 vs 4, which were compared and an ICC determined for each combination.
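The pairing scheme can be enumerated directly: four observers give six pairs, and six angles (three per side) give 36 ICC values in total. The sketch below also includes one possible ICC estimator. The source does not state which ICC form was used, so the one-way random-effects ICC(1,1) shown here is an assumption, and `icc_oneway` is a hypothetical helper name.

```python
from itertools import combinations

OBSERVERS = ["expert 1", "expert 2", "non-expert 1", "non-expert 2"]
ANGLES = ["R-TFA", "L-TFA", "R-TTA", "L-TTA", "R-FAA", "L-FAA"]

# All unordered observer pairs: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4.
pairs = list(combinations(OBSERVERS, 2))

def icc_oneway(x, y):
    """One-way random-effects ICC(1,1) for two sets of paired readings.

    ASSUMPTION: the study does not specify its ICC variant; this form is
    chosen only for illustration.
    """
    n, k = len(x), 2
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

print(len(pairs), len(pairs) * len(ANGLES))
```

With real data, `icc_oneway` would be called once per (pair, angle) combination on the first readings of the two observers in that pair.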


From January 2013 to December 2013, 26 lower limbs of 13 patients with spastic diplegia scheduled for derotational osteotomy of the lower limbs were studied retrospectively. There were nine males and four females whose ages at the time of the study averaged 14.6 ± 6.6 years (range 8 to 35 years). Three patients, one male and two females, were >16 years old at the time of the study (Table 1).

Table 1 Population of the study

Intra-observer variability is shown in Table 2, and ICC is listed in the >10° column. For each observer and for each angle, two readings were taken into consideration: 1 and 2. ICC was calculated and results are reported in Table 3.

Table 2 Intra-observer variability: for each expert/non-expert, two readings were taken into consideration (one and two) and SD and ICC were calculated
Table 3 ICC values to determine the inter-observer variability

The average values of ICC were:

  • Intra-observer variability expert 1: ICC = 0.79 ± 0.16

  • Intra-observer variability expert 2: ICC = 0.97 ± 0.02

  • Average variability of experts: ICC = 0.88 ± 0.15

  • Intra-observer variability non-expert 1: ICC = 0.97 ± 0.03

  • Intra-observer variability non-expert 2: ICC = 0.87 ± 0.08

  • Average variability of non-experts: ICC = 0.91 ± 0.09

The learning curve data showed significant bias. Bias, SD, and SD averages are reported in Table 4. Starting from these data we calculated:

Table 4 Values of bias, SD, and average of the SD for each angle and for each observer (experts and non-experts)

The average ICC for each observer:

  • Expert vs expert: ICC = 0.97 ± 0.02

  • Expert vs non-expert: ICC = 0.65 ± 0.24

  • Non-expert vs non-expert: ICC = 0.85 ± 0.13

The average ICC for each angle:

  • R-TFA: 0.70 ± 0.39

  • L-TFA: 0.72 ± 0.31

  • R-TTA: 0.80 ± 0.13

  • L-TTA: 0.80 ± 0.16

  • R-FAA: 0.75 ± 0.28

  • L-FAA: 0.85 ± 0.19

The data demonstrate excessive bias among the non-experts (25° and 14°). These observers also showed the greatest improvement in bias with additional measurement experience. Experts showed less inter-observer variability.


Increased femoral anteversion and medial or lateral tibial torsion are frequent abnormalities of the lower limbs in patients with CP. Persistence of lower extremity malalignment, including torsional deformities, leads to dysfunction and impairment of locomotion. Indications for surgical treatment are controversial.

CT is an accurate modality for rotational measurements and is frequently used to measure TP when such measurements are difficult to assess clinically [6]. Accurate measurements of the TP are fundamental to surgical correction planning.

Few studies in the literature address the accuracy and reproducibility of CT for the measurement of femoral and tibial torsion. Questions may arise as to whether these CT assessments are reliable in the clinical setting and how much training is needed to make these critical measurements accurately.

In our study, the first readings were used as reference measurements instead of an average of three assessments. This was done to mimic the single measurement that one individual makes in clinical practice to determine torsional deformities.

A relatively high value (>10°) was chosen as the threshold at which bias was considered significant. For example, a difference of >10° from normal values would lead to an indication for surgery if this value is used as a cutoff point. Regarding the implications for patient care, a significant bias between two measurements could mean that one measurement indicates surgery while the other does not. Intra-observer variability had significant bias when experts and non-experts were compared. Expert number 1 had 25° of bias and non-expert number 2 had 14° of bias. The SD for the experts (3.14 and 1.77) is lower than that for the non-experts (5.76 and 4.59).

In assessing the learning curve, we saw an improvement in measurement accuracy, especially when non-experts were compared with each other. Non-expert number 1, the resident with no prior experience with the method, had the maximum number of significant bias results but also had the greatest improvement in data accuracy with progressive measurement experience. Non-expert number 2 showed no diminution of bias, with values increasing and decreasing on various readings. The improvement seen in the more senior resident’s values may be due to his having observed others carrying out such measurements, although he had never performed them directly. The younger resident had never seen nor performed these measurements.

With regard to inter-observer variability, there was greater agreement between the two non-experts than in expert vs non-expert comparisons. This was due to the non-experts providing inaccurate measurements, in contrast to fewer measurement errors by the experts.

Our preliminary data suggest that TP measurements by CT provide a reliable tool when used by experienced personnel. The number of measurements and the amount of experience with CT TP determinations needed were not defined by our study. Additional study is required to determine when the transition from non-expert to expert occurs. In the interim, the use of TP values from CT images as preoperative measures should be reserved for those with experience in such image assessments. Additional study must be done to develop the most accurate and reproducible method of determining femoral and tibial version, to compare measurements of radiologists and orthopedic surgeons, experts and non-experts, and to highlight their differences.


Measurement of TP is a reliable tool when used by experienced personnel, and its use as a preoperative tool should be reserved for those with experience in such image assessments. Non-experts’ measurements showed weak agreement with those of experts.



CP: cerebral palsy
FAA: femoral ankle angle
GMFCS: gross motor function classification system
ICC: intraclass correlation coefficient
SD: standard deviation
TFA: torsional femoral angle
TP: torsional profiles
TTA: torsional tibial angle


  1. Wren TA, Rethlefsen S, Kay RM. Prevalence of specific gait abnormalities in children with cerebral palsy: influence of cerebral palsy subtype, age, and previous surgery. J Pediatr Orthop. 2005;25:79–83.

  2. O’Sullivan R, Walsh M, Hewart P, Jenkinson A, Ross LA, O’Brien T. Factors associated with internal hip rotation gait in patients with cerebral palsy. J Pediatr Orthop. 2006;26:537–41.

  3. Fabry G, Cheng LX, Molenaers G. Normal and abnormal torsional development in children. Clin Orthop Relat Res. 1994;00:22–6.

  4. Beals RK. Developmental changes in the femur and acetabulum in spastic paraplegia and diplegia. Dev Med Child Neurol. 1969;11:303–13.

  5. Gage J, Schwartz MH, Koop S, Novacheck T. The identification and treatment of gait problems in cerebral palsy, Clinics in Developmental Medicine. 2nd ed. Wiley-Blackwell: Mac Keith Press; 2009. p. 180–1.

  6. Terjesen T, Anda S, Svenningsen S. Femoral anteversion in adolescents and adults measured by ultrasound. Clin Orthop Relat Res. 1990;256:274–9.

  7. Jeanmart L, Baert AL, Wackenheim A. Atlas of pathologic computer tomography, Computed tomography of neck, chest, spine and limbs. 3rd ed. Berlin: Springer; 1983.

  8. Stuberg W, Temme J, Kaplan P, Clarke A, Fuchs R. Measurement of tibial torsion and thigh foot angle using goniometry and computed tomography. Clin Orthop Relat Res. 1991;272:208–12.

  9. Jaarsma RL, Bruggeman AW, Pakvis DF, Verdonschot N, Lemmens JA, Van Kampen A. Computed tomography determined femoral torsion is not accurate. Arch Orthop Trauma Surg. 2004;124(8):552–4.

  10. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–10.

  11. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas. 1973;33(3):613–9.


Author information



Corresponding author

Correspondence to Artemisia Panou.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors made substantial contribution to the conception and design of the study, analysis and interpretation of data, drafting and revising the article, and finally, all gave their approval of the version to be published.

Rights and permissions

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Panou, A., Stanitski, D.F., Stanitski, C. et al. Intra-observer and inter-observer errors in CT measurement of torsional profiles of lower limbs: a retrospective comparative study. J Orthop Surg Res 10, 67 (2015).



  • Torsional profiles
  • Intra-observer errors
  • CT measurements
  • Neck-shaft angle
  • Tibial torsion