- Research article
- Open Access
Low inter-observer agreement among experienced shoulder surgeons assessing overstuffing of glenohumeral resurfacing hemiarthroplasty based on plain radiographs
Journal of Orthopaedic Surgery and Research volume 13, Article number: 299 (2018)
In a clinical setting, a visual evaluation of post-implant radiographs is often used to assess the restoration of glenohumeral joint anatomy after resurfacing hemiarthroplasty and is a part of the decision-making process, in combination with other parameters, when evaluating patients with inferior clinical results. However, the reliability of this method of visual evaluation has not been reported. The aim of this study was to investigate the inter- and intra-observer agreement among experienced shoulder surgeons assessing overstuffing, implant positioning, and size following resurfacing hemiarthroplasty using plain standardized radiographs.
Six experienced shoulder surgeons independently classified implant inclination, implant size, and whether the joint seemed overstuffed on post-implant radiographs from 219 cases. All cases were classified twice, 3 weeks apart. Only radiographs with an anterior-posterior projection with a freely visible joint space were used. Non-weighted Cohen’s kappa values were calculated for each coder pair, and the mean was used as an estimate of the overall inter-observer agreement.
The overall inter-observer agreement was moderate in both rounds for implant size (kappa, 0.48 and 0.41) and inclination angle (kappa, 0.46 and 0.44), but only fair for stuffing of the joint (kappa, 0.24 and 0.28). Intra-observer agreement for implant size and stuffing ranged from fair to substantial, while the agreement for inclination was moderate to substantial.
Our results indicate that a visual evaluation of plain standardized radiographs may be inadequate to evaluate overstuffing, implant positioning, and size following resurfacing hemiarthroplasty. Future studies may help elucidate whether reliability increases if consensus on clear definitions and standardized methods of evaluation is reached.
Resurfacing hemiarthroplasty (RHA) was developed to restore normal anatomy, and with a bone-preserving design and short operation time, it has often been preferred for the treatment of glenohumeral osteoarthritis [4, 10, 26, 27]. Some studies have reported good functional outcome and a low rate of revision [14,15,16, 22, 23, 28, 31], while others report a poor functional outcome and a high risk of revision [7, 8, 10, 11, 17, 24]. This has led to concerns that RHA may not adequately restore humeral anatomy.
Studies evaluating the restoration of glenohumeral joint anatomy following RHA have been conflicting. Some report that RHA restores humeral head anatomy [9, 18, 30] while others report increased lateral glenohumeral offset (LGHO) [14, 17,18,19,20, 28], displacement of the center of rotation, increased humeral head size, and a tendency to place the implant in varus [14, 18]. Despite a lack of a clear definition, the term overstuffing has also been widely used in the literature as a possible cause of persistent pain or a poor functional outcome following RHA [1,2,3, 6, 18, 19, 21, 25,26,27, 29, 30, 32]. Plain radiographs are the most common imaging modality when evaluating patients with a poor functional outcome or persistent pain following shoulder arthroplasty. In a clinical setting, a visual evaluation of post-implant radiographs is often used to assess the restoration of glenohumeral joint anatomy after RHA and is, in combination with other parameters, a part of the decision-making process when evaluating patients with inferior clinical results. However, the reliability of this method of visual evaluation has not been elucidated.
Therefore, the aim of this study was to investigate the inter- and intra-observer agreement among experienced shoulder surgeons assessing overstuffing, implant inclination, and size following RHA using plain standardized radiographs.
Materials and methods
Three hundred eighty-two patients treated with primary RHA at one of four Danish university hospitals between January 2006 and December 2013 were retrospectively identified using the Danish Shoulder Arthroplasty Registry. Post-implant radiographs were digitally collected for each patient and evaluated for eligibility by the last author, who was not an observer. Only cases with radiographs in an anterior-posterior projection with a freely visible joint space were included. One hundred sixty-three cases were excluded due to poor quality, leaving 219 cases to be included in the study.
Six experienced shoulder surgeons, with a mean work experience of more than 10 years and a surgical volume of more than 50 shoulder arthroplasty procedures per year, were chosen as observers. The observers were all employed at one of the four hospitals providing radiographs. All radiographs were anonymized by digitally cropping out any patient data and hospital affiliation printed on the radiographs. The file names of the digital radiographs were then randomized using Excel (Microsoft, Redmond, Washington) before being sent to the observers for classification.
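The file-name randomization described above was done in Excel. As an illustration only, the same idea can be sketched in Python: shuffle the file names and assign each an uninformative case ID, keeping the mapping as a key held by someone who is not an observer. The function name, the `case_` prefix, and the example file names are hypothetical and do not come from the study.

```python
import os
import random


def randomize_file_names(file_names, seed=None):
    """Map each original file name to an anonymous, randomly ordered case ID.

    Hypothetical helper for illustration; the study itself used Excel.
    """
    rng = random.Random(seed)
    shuffled = list(file_names)
    rng.shuffle(shuffled)
    width = len(str(len(shuffled)))  # zero-pad IDs to a common width
    mapping = {}
    for position, name in enumerate(shuffled, start=1):
        extension = os.path.splitext(name)[1]  # keep the original file extension
        mapping[name] = f"case_{position:0{width}d}{extension}"
    return mapping


# Made-up file names; the key would be stored away from the observers.
key = randomize_file_names(
    ["hospA_001.png", "hospA_002.png", "hospB_001.png"], seed=1
)
```

Calling the function again with a different seed gives a new order, which is one way to re-randomize between classification rounds.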
To ensure that all observers used similar visual characteristics for the categories, 22 radiographs (10%) were retrieved by the same surgeon who evaluated the radiographs for eligibility and digitally presented to the group of observers. The group of observers then collectively chose one radiograph per classification that they regarded as exemplifying each of the following: too small implant size, too large implant size, overstuffed, understuffed, valgus positioning, and varus positioning. This was done 3 weeks before the first classification round. The final results were calculated and analyzed both with and without the 22 cases used for the consensus classifications.
The observers independently classified all radiographs on two occasions 3 weeks apart. The order of the radiographs was randomized between the classification rounds. In both classification rounds, the observers were asked to evaluate the radiographs in terms of (1) inclination angle (varus, valgus, or anatomical), (2) size of RHA in relation to the patient’s anatomy (too large, too small, or anatomical), and (3) stuffing of the joint (overstuffed, understuffed, or anatomical). The observers registered their evaluations for each radiograph in a separate spreadsheet for each classification round. The observers were not given any additional information about the patients, and they were not allowed to use any measurement tools in their evaluation. It was stressed that all radiographs should be evaluated, even if the observer found a radiograph to be of insufficient quality.
The percentage of total observed agreement was calculated as the proportion of cases where all observers agreed upon the same classification. Non-weighted Cohen’s kappa values were calculated for each coder pair, and the mean was used as an estimate of the overall inter-observer agreement. The intra-observer agreement was calculated as the non-weighted Cohen’s kappa value for each individual observer between the two classification rounds. The agreement was calculated for inclination, implant size, and stuffing of the joint. Inter- and intra-observer agreement was calculated both for the 197 cases excluding the consensus cases and for the pooled answers of all 219 cases. Including the consensus cases did not change the results or alter the conclusions of the present study. Therefore, the results are presented for all 219 cases. Kappa values were qualitatively interpreted using the ranges proposed by Landis and Koch, with values less than 0 indicating poor agreement, 0.00–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 excellent agreement.
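The agreement measures described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the authors' code (the study used the “irr” package in R). The toy ratings are invented, and treating a value that falls between two stated ranges (for example 0.205) as belonging to the lower category is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Non-weighted Cohen's kappa for two raters' labels on the same cases."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the raters' marginal proportions.
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # undefined when both raters use one identical label
    return (p_obs - p_exp) / (1 - p_exp)


def mean_pairwise_kappa(ratings):
    """Mean kappa over all observer pairs; ratings maps observer -> labels."""
    pairs = combinations(sorted(ratings), 2)
    kappas = [cohen_kappa(ratings[i], ratings[j]) for i, j in pairs]
    return sum(kappas) / len(kappas)


def total_agreement(ratings):
    """Proportion of cases on which every observer chose the same label."""
    cases = list(zip(*ratings.values()))
    return sum(len(set(case)) == 1 for case in cases) / len(cases)


def landis_koch(kappa):
    """Qualitative label using the ranges quoted in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "excellent"


# Invented stuffing classifications from three observers on six cases.
ratings = {
    "A": ["over", "anat", "anat", "under", "over", "anat"],
    "B": ["over", "anat", "over", "under", "anat", "anat"],
    "C": ["anat", "anat", "anat", "under", "over", "anat"],
}
overall = mean_pairwise_kappa(ratings)
print(round(overall, 2), landis_koch(overall), total_agreement(ratings))
```

Intra-observer agreement follows from the same `cohen_kappa` function by passing one observer's labels from the first and second round.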
Percentage of agreement and kappa values were calculated using the “irr” package, and confidence intervals were bootstrapped using the “boot” package, in R statistical software version 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria).
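The text does not state which type of bootstrap interval was used. The sketch below assumes a simple percentile bootstrap that resamples cases with replacement, written in Python rather than R for illustration. The two rounds of labels are invented, and the number of resamples and the seed are arbitrary choices.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Non-weighted Cohen's kappa for two label sequences of equal length."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1.0:
        return float("nan")  # undefined for a single shared label
    return (p_obs - p_exp) / (1 - p_exp)


def bootstrap_kappa_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        # Resample case indices so each pair of ratings stays together.
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohen_kappa([a[i] for i in idx], [b[i] for i in idx])
        if k == k:  # drop resamples where kappa is undefined (NaN)
            stats.append(k)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Invented intra-observer data: one observer, two classification rounds.
round1 = ["anat"] * 12 + ["over"] * 8 + ["under"] * 4
round2 = ["anat"] * 10 + ["over"] * 9 + ["under"] * 5
lower, upper = bootstrap_kappa_ci(round1, round2)
print(round(cohen_kappa(round1, round2), 2), round(lower, 2), round(upper, 2))
```

Resampling whole cases rather than individual ratings keeps each radiograph's paired classifications intact, which is what the agreement statistic depends on.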
Compliance with ethical standards
The present study has been approved by the Danish Health and Medicines Authority (case no. 3-3013-1103/1/, 28/7-2015) and the Danish data protection agency (03663. ID no. HEH-2015-037, 20/04-2015).
The percentage of inter-observer agreement ranged from 17.8 to 43.4% (Table 1).
The overall inter-observer agreement for implant size and inclination was moderate, and for the stuffing category, the agreement was fair (Table 2).
The percentage of intra-observer agreement ranged from 68.0 to 90.4% (Table 3).
Intra-observer agreement for implant size and stuffing ranged from fair to substantial while the agreement for inclination was moderate to substantial (Table 4).
We observed moderate overall inter-observer agreement for the evaluation of implant size and inclination, but only fair agreement for overstuffing of the joint among experienced shoulder surgeons. This indicates that a visual evaluation of plain standardized radiographs may be inadequate to evaluate overstuffing, implant positioning, and size following RHA.
With this study, we wanted to investigate if there was a uniform visual recognition pattern among shoulder surgeons for evaluating overstuffing, implant inclination, and size of the implant following RHA using plain standardized radiographs. Only a moderate inter-observer agreement was observed for the classification of implant size and inclination. Humeral head size and implant inclination based on plain radiographs following RHA are often reported in the literature [1, 14, 18, 27, 30]. However, data is lacking on the reliability of such evaluations following RHA. Our study indicates that a visual evaluation of both implant inclination and implant size might not be a reliable method of assessment.
Furthermore, we only demonstrated a fair agreement between the observers when evaluating overstuffing. We hypothesize that the lower agreement for the stuffing category is primarily caused by the lack of a clear, shared definition, which leaves each observer to rely on their own subjective pattern recognition. The higher intra-observer agreement indicates that the observers individually applied their own subjective method consistently across the two classification rounds. However, the low inter-observer agreement shows that these subjective patterns differ between observers. This tendency is also observed when examining the intra- and inter-observer agreements for inclination and implant size, although the effect is less marked. This could indicate that visual evaluation as a method is not sufficiently standardized and is heavily influenced by the surgeons’ own subjective pattern recognition, which is especially apparent regarding overstuffing. Therefore, a more standardized method of defining and measuring overstuffing might help increase the intra- and inter-observer agreement.
Multiple definitions of overstuffing following RHA have been used in the literature, including a medial deviation of the center of rotation, increased LGHO [19, 20], and improper implant size [1, 2, 5, 6, 21, 28], but little data exist on the reliability of these measurements. A recent study by Kadum et al. investigated intra- and inter-observer agreement between four observers measuring LGHO on both computed tomography (CT) images and radiographs. The authors reported excellent inter- and intra-observer agreement when measuring LGHO on CT images. When measuring LGHO on plain radiographs, only moderate inter-observer agreement was reported. When comparing measurements from CT images and radiographs, the authors found a tendency to underestimate the LGHO on radiographs with a mean difference of 5 mm. Another study by Thomas et al. also found low inter-observer agreement when measuring LGHO on plain radiographs. This was related to a systematic error with one observer locating the base of the coracoid more medially than the other. The authors concluded that their measurements of LGHO were unreliable. In an attempt to minimize such systematic errors, Stilling et al. created a modified method of measuring LGHO. The measurements were done by a radiologist, and intra-observer agreement was reported as being high, but no data on the inter-observer agreement were reported. Based on these previous results, LGHO measured on plain radiographs does not seem like a viable method to evaluate the anatomical reconstruction or as a framework to define overstuffing.
Alolabi et al. used a spherical model mapped to preserved non-articular bone landmarks to assess the anatomical reconstruction of the center of rotation following RHA on pre- and post-implant radiographs. This was based on assessments done by four observers with cases evenly distributed between them, and therefore, no information on the inter-observer agreement was reported.
Based on the above, it currently seems that there are no reported methods of reliably assessing overstuffing following RHA, and thereby, no method of defining the term. Our study indicates that a uniform visual recognition pattern regarding overstuffing, implant inclination, and size of the implant following RHA does not exist and that each observer has their own subjective method of pattern recognition, especially regarding overstuffing. Therefore, a visual evaluation does not seem like a reliable method to define and assess overstuffing following RHA.
There are limitations to this study. Firstly, we had to exclude a substantial number of radiographs due to poor quality, thereby introducing the possibility of selection bias. However, we hypothesize that had all the collected radiographs been included, we would probably have seen an even lower agreement. In a clinical setting, one would have the possibility of ordering supplemental radiographs if the quality was insufficient for clinical decision-making. Therefore, we believe the exclusion of radiographs without a visible joint space makes our results applicable to a clinical setting. Secondly, there is a possibility of recall bias when using the same radiographs in the two classification rounds. We tried to minimize this by including a high number of radiographs, randomizing the radiographs between classification rounds, and placing the two classification rounds 3 weeks apart. Despite this, it is possible that some of the observers could remember their answers for specific radiographs, which would primarily affect the intra-observer agreement. Thirdly, the external validity of the study may be questioned in terms of a lack of generalizability to less experienced observers, as all the observers in the current study had more than 10 years of experience in shoulder surgery. Despite this, we decided to only include experienced shoulder surgeons as they are most likely to be involved in the decision-making process regarding patients with a poor functional outcome or persistent pain.
The present study only found a fair inter-observer agreement between experienced shoulder surgeons assessing stuffing of the shoulder joint and moderate inter-observer agreement when assessing the inclination and implant size based on plain radiographs. This indicates that a visual evaluation of plain standardized radiographs may be inadequate to evaluate overstuffing, implant positioning, and size following RHA. Future studies may help elucidate whether reliability increases if consensus on clear definitions and standardized methods of evaluation can be reached.
Al-Hadithy N, Domos P, Sewell MD, Naleem A, Papanna MC, Pandit R. Cementless surface replacement arthroplasty of the shoulder for osteoarthritis: results of fifty Mark III Copeland prosthesis from an independent center with four-year mean follow-up. J Shoulder Elb Surg. 2012;21:1776–81. https://doi.org/10.1016/j.jse.2012.01.024.
Alizadehkhaiyat O, Kyriakos A, Singer MS, Frostick SP. Outcome of Copeland shoulder resurfacing arthroplasty with a 4-year mean follow-up. J Shoulder Elb Surg. 2013;22:1352–8. https://doi.org/10.1016/j.jse.2013.01.027.
Alolabi B, Youderian AR, Napolitano L, Szerlip BW, Evans PJ, Nowinski RJ, Ricchetti ET, Iannotti JP. Radiographic assessment of prosthetic humeral head size after anatomic shoulder arthroplasty. J Shoulder Elb Surg. 2014;23:1740–6. https://doi.org/10.1016/j.jse.2014.02.013.
Australian Orthopaedic Association National Joint Replacement Registry. Annual Report of the Australian Orthopaedic Association. Demographics and outcome of shoulder arthroplasty. 2015.
Wiater BP, Moravek JE Jr, Wiater JM. The evaluation of the failed shoulder arthroplasty. J Shoulder Elb Surg. 2014;23:745–58. https://doi.org/10.1016/j.jse.2013.12.003.
Bailie DS, Llinas PJ, Ellenbecker TS. Cementless humeral resurfacing arthroplasty in active patients less than fifty-five years of age. J Bone Joint Surg Am. 2008;90:110–7. https://doi.org/10.2106/JBJS.F.01552.
Bryant D, Litchfield R, Sandow M, Gartsman GM, Guyatt G, Kirkley A. A comparison of pain, strength, range of motion, and functional outcomes after hemiarthroplasty and total shoulder arthroplasty in patients with osteoarthritis of the shoulder. J Bone Joint Surg Am. 2005;87:1947–56. https://doi.org/10.2106/JBJS.D.02854.
Buchner M, Eschbach N, Loew M. Comparison of the short-term functional results after surface replacement and total shoulder arthroplasty for osteoarthritis of the shoulder: a matched-pair analysis. Arch Orthop Trauma Surg. 2008;128:347–54. https://doi.org/10.1007/s00402-007-0404-x.
Deladerrière JY, Szymanski C, Vervoort T, Budzik JF, Maynou C. Geometrical analysis results of 42 resurfacing shoulder prostheses: a CT scan study. Orthop Traumatol Surg Res. 2012;98:520–7. https://doi.org/10.1016/j.otsr.2012.03.010.
Fevang BTS, Lygre SHL, Bertelsen G, Skredderstuen A, Havelin LI, Furnes O. Pain and function in eight hundred and fifty nine patients comparing shoulder hemiprostheses, resurfacing prostheses, reversed total and conventional total prostheses. Int Orthop. 2013;37:59–66. https://doi.org/10.1007/s00264-012-1722-3.
Geervliet PC, van den Bekerom MPJ, Spruyt P, Curvers M, van Noort A, Visser CPJ. Outcome and revision rate of uncemented glenohumeral resurfacing (C.A.P.) after 5–8 years. Arch Orthop Trauma Surg. 2017;137:771–8. https://doi.org/10.1007/s00402-017-2688-9.
Kadum B, Sayed-Noor AS, Perisynakis N, Baea S, Sjödén GO. Radiologic assessment of glenohumeral relationship: reliability and reproducibility of lateral humeral offset. Surg Radiol Anat. 2015;37:363–8. https://doi.org/10.1007/s00276-015-1424-9.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74. https://doi.org/10.2307/2529310.
Lebon J, Delclaux S, Bonnevialle N, Rongières M, Bonnevialle P, Mansat P. Stemmed hemiarthroplasty versus resurfacing in primary shoulder osteoarthritis: a single-center retrospective series of 78 patients. Orthop Traumatol Surg Res. 2014;100:S327–32. https://doi.org/10.1016/j.otsr.2014.05.012.
Levy O, Copeland SA. Cementless surface replacement arthroplasty of the shoulder. 5- to 10-year results with the Copeland mark-2 prosthesis. J Bone Joint Surg Br. 2001;83:213–21. https://doi.org/10.1016/S0021-9290(10)70074-9.
Levy O, Tsvieli O, Merchant J, Young L, Trimarchi A, Dattani R, Abraham R, Copeland SA, Narvani A, Atoun E. Surface replacement arthroplasty for glenohumeral arthropathy in patients aged younger than fifty years: results after a minimum ten-year follow-up. J Shoulder Elb Surg. 2015:1–12. https://doi.org/10.1016/j.jse.2014.11.035.
Maier MW, Hetto P, Raiss P, Klotz M, Bülhoff M, Spranz D, Zeifang F. Cementless humeral head resurfacing for degenerative glenohumeral osteoarthritis fails at a high rate. J Orthop. 2018;15:349–53. https://doi.org/10.1016/j.jor.2018.02.013.
Mansat P, Coutié AS, Bonnevialle N, Rongières M, Mansat M, Bonnevialle P. Resurfacing humeral prosthesis: do we really reconstruct the anatomy? J Shoulder Elb Surg. 2013;22:612–9. https://doi.org/10.1016/j.jse.2012.07.014.
Mechlenburg I, Amstrup A, Klebe T, Jacobsen SS, Teichert G, Stilling M. The Copeland resurfacing humeral head implant does not restore humeral head anatomy. A retrospective study. Arch Orthop Trauma Surg. 2013;133:615–9. https://doi.org/10.1007/s00402-013-1715-8.
Mechlenburg I, Klebe TM, Døssing KV, Amstrup A, Søballe K, Stilling M. Evaluation of periprosthetic bone mineral density and postoperative migration of humeral head resurfacing implants: two-year results of a randomized controlled clinical trial. J Shoulder Elb Surg. 2014;23:1427–36. https://doi.org/10.1016/j.jse.2014.05.012.
Pearl ML, Kurutz S. Geometric analysis of commonly used prosthetic systems for proximal humeral replacement. J Bone Joint Surg Am. 1999;81:660–71. https://doi.org/10.2106/00004623-199905000-00007.
Pritchett JW. Long-term results and patient satisfaction after shoulder resurfacing. J Shoulder Elb Surg. 2011;20:771–7. https://doi.org/10.1016/j.jse.2010.08.014.
Rai P, Davies O, Wand J, Bigsby E. Long-term follow-up of the Copeland mark III shoulder resurfacing hemi-arthroplasty. J Orthop. 2016;13:52–6. https://doi.org/10.1016/j.jor.2015.09.003.
Rasmussen JV, Olsen BS, Sorensen AK, Hróbjartsson A, Brorson S. Resurfacing hemiarthroplasty compared to stemmed hemiarthroplasty for glenohumeral osteoarthritis: a randomised clinical trial. Int Orthop. 2014;39:263–9. https://doi.org/10.1007/s00264-014-2505-9.
Rasmussen JV, Polk A, Sorensen AK, Olsen BS, Brorson S. Outcome, revision rate and indication for revision following resurfacing hemiarthroplasty for osteoarthritis of the shoulder: 837 operations reported to the Danish shoulder arthroplasty registry. Bone Jt J. 2014;96(B):519–25. https://doi.org/10.1302/0301-620X.96B4.31850.
Rasmussen JV, Polk A, Brorson S, Sørensen AK, Olsen BS. Patient-reported outcome and risk of revision after shoulder replacement for osteoarthritis. Acta Orthop. 2014;85:117–22. https://doi.org/10.3109/17453674.2014.893497.
Smith T, Gettmann A, Wellmann M, Pastor F, Struck M. Humeral surface replacement for osteoarthritis. Acta Orthop. 2013;84:468–72. https://doi.org/10.3109/17453674.2013.838658.
Soudy K, Szymanski C, Lalanne C, Bourgault C, Thiounn A, Cotten A, Maynou C. Results and limitations of humeral head resurfacing: 105 cases at a mean follow-up of 5 years. Orthop Traumatol Surg Res. 2017;103:415–20. https://doi.org/10.1016/j.otsr.2016.12.015.
Stilling M, Mechlenburg I, Amstrup A, Soballe K, Klebe T. Precision of novel radiological methods in relation to resurfacing humeral head implants: assessment by radiostereometric analysis, DXA, and geometrical analysis. Arch Orthop Trauma Surg. 2012;132:1521–30. https://doi.org/10.1007/s00402-012-1580-x.
Thomas SR, Sforza G, Levy O, Copeland SA. Geometrical analysis of Copeland surface replacement shoulder arthroplasty in relation to normal anatomy. J Shoulder Elb Surg. 2005;14:186–92. https://doi.org/10.1016/j.jse.2004.06.013.
Thomas SR, Wilson AJ, Chambler A, Harding I, Thomas M. Outcome of Copeland surface replacement shoulder arthroplasty. J Shoulder Elb Surg. 2005;14:485–91. https://doi.org/10.1016/j.jse.2005.02.011.
Youderian AR, Ricchetti ET, Drews M, Iannotti JP. Determination of humeral head size in anatomic shoulder replacement for glenohumeral osteoarthritis. J Shoulder Elb Surg. 2014;23:955–63. https://doi.org/10.1016/j.jse.2013.09.005.
No funding was received.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
The present study has been approved by the Danish Health and Medicines Authority (case no. 3-3013-1103/1/) and the Danish data protection agency (03663. ID no. HEH-2015-037).
Consent for publication
The Danish Health and Medicines Authority has approved publication of anonymized patient data including the images used in the present study (case no. 3-3013-1103/1/).
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Sandau, N., Brorson, S., Olsen, B.S. et al. Low inter-observer agreement among experienced shoulder surgeons assessing overstuffing of glenohumeral resurfacing hemiarthroplasty based on plain radiographs. J Orthop Surg Res 13, 299 (2018). https://doi.org/10.1186/s13018-018-1008-6
- Experienced Shoulder Surgeons
- Plain Standardized Radiographs
- Visible Joint Space
- Code Pairs