Assessing responsiveness of the EQ-5D-3L, the Oxford Hip Score, and the Oxford Knee Score in the NHS patient-reported outcome measures

A Correction to this article was published on 05 February 2021

This article has been updated

Abstract

Background

The degree to which a validated instrument is able to detect clinically significant change over time is an important issue for the better management of hip or knee replacement surgery. This study examines the internal responsiveness of the EQ-5D-3L, the Oxford Hip Score (OHS), and the Oxford Knee Score (OKS) by various methods. Data from NHS patient-reported outcome measures (PROMs) linked to the Hospital Episodes Statistics (HES) dataset (2009–2015) was analysed for patients who underwent primary hip surgery (N = 181,424) and primary knee surgery (N = 191,379).

Methods

Paired data-specific univariate responsiveness was investigated using the standardized response mean (SRM), the standardized effect size (SES), and the responsiveness index (RI). Multivariate responsiveness was further examined using the defined capacity of benefit score (i.e. paired data-specific MCID), adjusting for baseline covariates such as age, gender, and comorbidities in the Box-Cox regression models. The observed and predicted percentages of patient improvement were examined both as a whole and by the patients' self-assessed transition level.

Results

The results showed that both the OHS and the OKS demonstrated great univariate and multivariate responsiveness. The percentages of the observed (predicted) total improvement were high: 51 (54)% in the OHS and 73 (58)% in OKS. The OHS and the OKS showed distinctive differences in improvement by the 3-level transition, i.e. a little better vs. about the same vs. a little worse. The univariate responsiveness of the EQ-5D-3L showed moderate effects in total by Cohen’s thresholds. The percentages of improvement in the EQ-5D-3L were moderate: 44 (48)% in the hip and 42 (44)% for the knee replacement population.

Conclusions

Distinctive percentage differences in patients’ perception of improvement were observed when the paired data-specific capacity of benefit score was applied to examine responsiveness. This is useful in clinical practice as rationale for access to surgery at the individual-patient level. This study shows the importance of analytic methods and instruments for investigation of the health status in hip and/or knee replacement surgery. The study finding also supports the idea of using a generic measure along with the disease-specific instruments in terms of cross-validation.

Background

The responsiveness of a health-related functional state is an important issue in arthroplasty surgery. Responsiveness is the ability of an instrument to detect clinically significant change in health status and as such reflects its impact on clinical practice over time [1,2,3]. It is well recognized that measurement properties can vary according to the study population of interest. This is particularly true of the generic measures, especially those measuring responsiveness. The decision to use a generic or disease-specific instrument to detect responsiveness will also depend on the study design, objectives, and evaluation of cost-effectiveness [4]. Generic health status measures seek a broad perspective that is not specifically related to the restricted scope of the health-related functional status of a particular disease. Generic measures allow the comparison of health status across different diseases and interventions [4, 5].

Assessing outcomes of hip and knee replacement surgery with both generic and specific measures is enabled by the EQ-5D-3L, the Oxford Hip Score (OHS), and the Oxford Knee Score (OKS). The EQ-5D is a well-known and widely used generic patient-reported outcome questionnaire [6]. The current UK version of the EQ-5D-3L was introduced in 1997 as a generic measure of health for clinical and economic assessment [7]. It was designed to describe and value health by providing a single summary index-based value (utility; −0.59 to 1) representing the overall health-related quality of life by quantifying a preference for the individual’s health state [8]. The questionnaire consists of a self-reported descriptive system that describes three-level health problems (no/some/extreme problems) on each of five dimensions: mobility (i.e. problems in walking), self-care (i.e. problems with self-care), usual activities (e.g. work, study, housework, family, or leisure activities), pain/discomfort, and anxiety/depression.

The OHS and the OKS focus on the disease being studied, allowing greater sensitivity to intervention-related change compared to generic measures [4, 9]. The OHS and the OKS each consist of 12 Likert-type response items, which relate to pain and disability experienced over the past 4 weeks [10, 11]. Scores from each item are summed (responses coded from ‘None’ = 4 to ‘Severe’ = 0), providing a range of 0 to 48, with a higher score indicating better health status [12, 13].
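The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming each of the 12 items has already been coded as an integer from 0 (most severe) to 4 (none); the function name `oxford_score` is hypothetical and not part of any official scoring tool.

```python
def oxford_score(item_responses):
    """Sum 12 item codes (0 = most severe ... 4 = none) into a 0-48 total."""
    if len(item_responses) != 12:
        raise ValueError("expected 12 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in item_responses):
        raise ValueError("each item must be coded 0, 1, 2, 3 or 4")
    return sum(item_responses)


# Hypothetical respondents: best possible and worst possible states.
best = oxford_score([4] * 12)
worst = oxford_score([0] * 12)
print(best, worst)
```

How incomplete questionnaires are handled is not covered here; the sketch simply rejects anything other than 12 valid codes.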

Husted et al. [14] defined internal responsiveness as the ability of a measure to change over a pre-specified time frame. External responsiveness was defined as the extent to which changes in a measure relate to changes in other measures of health status; that is, it reflects the relationship between change in the measure and change in an external standard [14]. External responsiveness between independent groups and cross-validation between measuring systems were explored in previous studies. However, those studies did not specify internal responsiveness for a single group, so the ability of these instruments to detect change needs to be examined using paired group-specific statistics. The aim of this study is to evaluate the paired data-specific responsiveness of the EQ-5D-3L, the OHS, and the OKS using various analytic methods, and to discuss which analytic methods and instruments should be used for the reporting system in arthroplasty surgery.

Methods

Data sources

Responsiveness was assessed for the population from the NHS patient-reported outcome measures (PROMs) data who had undergone hip or knee surgery in the UK (ref: NIC-392690-F7H2Q). Follow-up was measured 6 months after the hip or knee surgery. The NHS PROMs linked to Hospital Episodes Statistics (HES) (2009–2015) data recorded the pre-operative and the 6-month post-operative PROMs outcomes. The outcomes include the EQ-5D-3L and the respective hip and knee Oxford scores for all individuals who underwent hip and knee surgery in England [15].

The inclusion criteria were patients who had not received revision (primary surgery only; Footnote 1) and who had not had previous surgery, using the ‘Q1_PREVIOUS_SURGERY’ question (N = 575,980). In addition, patients who completed both pre- and post-questionnaires were included, using the ‘Q1 and Q2 Complete’ questions (N = 443,262) [16]. For hip surgery, patients were included if they submitted sufficient data on both the pre- and post-operative Oxford questionnaires for scores to be derived, using the ‘HR Q1 and HR Q2 Score Complete’ questions (N = 209,761) and the ‘Q1 and Q2 EQ-5D Health Scale Complete Indicators’ (N = 181,424). The same approach was applied for those undergoing knee surgery (N = 191,379).

Outcome and predictor variables

The change scores (the difference between the post- and pre-operative scores) of the EQ-5D-3L, the OHS and the OKS were used as the main outcomes, respectively. The pre-operative EQ-5D-3L, OHS, and OKS scores were used as the main predictor variables for the change scores. Patients' age, gender and important clinical exposures, namely, 12 individual comorbidities (heart disease, high blood pressure, stroke, circulation, lung disease, diabetes, kidney, nervous system, liver disease, cancer, depression, arthritis) were used as other prognostic variables.

Transition question

The MCID (minimal clinically important difference), which can be linked to the concept of improvement, was calculated using the patients’ self-assessment of the 6-month post-operative outcome relative to the pre-operative state. The MCID allows an estimation of the probability of a relevant improvement in the instrument following an intervention [17]. The assumption of the MCID is that the mean change score needed to obtain a medium or large effect size is clinically meaningful [18]. Clinically meaningful refers to a change that indicates the efficacy of the intervention in domains of a health-related functional status instrument [4]. The MCID can be calculated at the group level (using the anchor-based transition in which the concept of ‘minimal importance’ is explicitly incorporated) and also at the distribution-based individual level (using the paired data-specific MCID to which the standardized response mean (SRM) is applied) [17, 19, 20]. In this paper, a combined approach was used: first, the SRM-applied paired data-specific MCID was used to estimate the threshold for improvement; second, patients’ perception of improvement was estimated by the level of the transition in the multivariate regression models (Table 1).

Table 1 The patient-reported success of the surgery in the Oxford hip (or knee) score questionnaire

The NHS PROMs contain post-operative satisfaction and success questions. The success question was applied in this study since it is considered more objective than the satisfaction question, which asks ‘How would you describe the results of your operation? Excellent/Very good/Good/Fair/Poor’.

For the paired data-specific univariate responsiveness, the SRM, the standardized effect size (SES), and the responsiveness index (RI) were calculated.

Univariate responsiveness measures

In the present study, internal responsiveness was investigated by focusing on the internal standard of an individual, using the pre- and post-operative (paired) data, and compared as a psychometric property of the EQ-5D-3L, the OHS, and the OKS. For a critical assessment, internal responsiveness was calculated with different responsiveness formulae: the SRM, the SES, and the RI for the univariate statistics.

SRM for the paired data [4, 20,21,22]

$$ \mathrm{The}\ \mathrm{paired}\ \mathrm{data}-\mathrm{specific}\ \mathrm{SRM}:\frac{\left({\mathrm{Mean}}_{\mathrm{change}\ \mathrm{score}}/{\mathrm{SD}}_{\mathrm{change}\ \mathrm{score}}\right)}{\surd 2\times \surd \left(1-r\right)} $$
(1)

where r is a correlation coefficient between the pre- and post-operative scores [4].

For the pre- and post-operative (paired) data, the SRM is the ratio between the mean change score and the variability (SD) of that change score within the same group (Mean_change score/SD_change score), further standardized (i.e. divided) by the value √2 × √(1 − r), so that it is as large as would be the case were the two sets of scores independent [4, 21]. (The SRM for independent data is simply Mean_change score/SD_change score between the two groups [20].)
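Equation 1 can be illustrated with a short sketch. The helper names `pearson_r` and `paired_srm` are hypothetical. The correlation r can be passed in directly (for example a rank correlation computed elsewhere); falling back to a Pearson coefficient when none is supplied is an assumption of this sketch, not a statement of how the study computed r.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_srm(pre, post, r=None):
    """Paired data-specific SRM (Eq. 1).

    (mean change / SD of change) divided by sqrt(2) * sqrt(1 - r).
    If r is None, a Pearson correlation is used (assumption of this sketch).
    """
    change = [b - a for a, b in zip(pre, post)]
    if r is None:
        r = pearson_r(pre, post)
    unadjusted = statistics.mean(change) / statistics.stdev(change)
    return unadjusted / (math.sqrt(2) * math.sqrt(1 - r))


# Hypothetical toy scores for five patients.
pre = [10, 12, 14, 16, 18]
post = [30, 29, 35, 33, 40]
print(paired_srm(pre, post, r=0.5))
```

With r = 0.5 the divisor √2 × √(1 − r) equals 1, so the paired SRM coincides with the unadjusted ratio; larger correlations shrink the divisor and inflate the paired SRM.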

SES for the paired data

The SES was calculated using the patients’ self-assessed transition level, i.e. much better, a little better, about the same, a little worse, and much worse [4].

$$ \mathrm{Standardized}\ \mathrm{Effect}\ \mathrm{Size}\ \left(\mathrm{SES}\right)=\frac{{\mathrm{Mean}}_{\mathrm{pre}-\mathrm{op}.\mathrm{score}}-{\mathrm{Mean}}_{\mathrm{post}-\mathrm{op}.\mathrm{score}\ \left(\mathrm{of}\ \mathrm{the}\ \mathrm{success}\ \mathrm{level}\right)}}{{\mathrm{SD}}_{\mathrm{pre}-\mathrm{op}.\mathrm{score}\ \left(\mathrm{of}\ \mathrm{the}\ \mathrm{success}\ \mathrm{level}\right)}} $$
(2)
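A minimal sketch of Eq. 2 follows, applied to the pre- and post-operative scores of patients at one transition level. The function name is hypothetical, and the sign follows Eq. 2 exactly as written (pre minus post), so on a higher-is-better scale an improvement yields a negative value; readers comparing against positive tabulated values can take the absolute value.

```python
import statistics


def standardized_effect_size(pre, post):
    """Eq. 2 as written: (mean pre - mean post) / SD of the pre scores.

    pre and post hold the scores of patients at a single transition level.
    """
    return (statistics.mean(pre) - statistics.mean(post)) / statistics.stdev(pre)


# Hypothetical toy scores for three patients at one transition level.
pre = [10, 20, 30]
post = [30, 40, 50]
ses = standardized_effect_size(pre, post)
print(ses, abs(ses))
```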

RI for the paired data

The RI was proposed as the ratio of average change produced by a treatment to the between subject variability of difference scores in stable subjects [23]. The RI was calculated using the patients’ self-assessed transition-based (i.e. a little better vs. about the same) MCID, assuming the patients’ perception of change over time is meaningful [4, 24].

$$ \mathrm{Responsiveness}\ \mathrm{Index}\ \left(\mathrm{RI}\right)=\frac{{\mathrm{MCID}}_{\mathrm{anchor}-\mathrm{based}}}{{\mathrm{SD}}_{\mathrm{change}\ \mathrm{score}\ \left(\mathrm{of}\ \mathrm{the}\ \mathrm{stable}\ \mathrm{level}\right)}} $$
(3)

where the MCID here is according to a criterion (i.e. the difference in change score between those who perceived a little better vs. about the same)
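Equation 3 can be sketched as below, assuming the change scores have already been split by transition level. The function name and the toy data are hypothetical; the anchor-based MCID is taken as the difference in mean change score between the 'a little better' and 'about the same' groups, and the 'about the same' group is treated as the stable level.

```python
import statistics


def responsiveness_index(change_little_better, change_about_same):
    """Eq. 3: anchor-based MCID / SD of change scores in stable patients."""
    mcid_anchor = (statistics.mean(change_little_better)
                   - statistics.mean(change_about_same))
    return mcid_anchor / statistics.stdev(change_about_same)


# Hypothetical change scores for the two transition levels.
little_better = [8, 10, 12]
about_same = [1, 3, 5]
print(responsiveness_index(little_better, about_same))
```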

In addition to the univariate responsiveness measures, the patients’ perception of improvement was estimated with a modelling approach using Box-Cox regressions based on log-likelihood, adjusting responsiveness for patient characteristics, including age, gender, and 12 individual comorbidities. For a robust analytic approach, the paired data-specific MCID was defined as the threshold for improvement in the models.

Multivariate responsiveness measures

The threshold for improvement with the MCID for the paired data

Cohen introduced the matched pairs effect size [21], which was later renamed the standardized response mean (SRM) by Liang et al. [4, 20].

The paired data-specific MCID (i.e. Mean_change score) applied the SRM [Eq. 1] as a desired effect size [25]:

$$ \mathrm{The}\ \mathrm{paired}\ \mathrm{data}-\mathrm{specific}\ \mathrm{SRM}\ \left[\mathrm{Eq}\ .1\right]\times \surd 2\times \surd \left(1-r\right)\times {\mathrm{SD}}_{\mathrm{change}\ \mathrm{score}} $$
(4)

The independent data MCID, using Cohen’s medium (0.5) or large (0.8) effect size for the independent samples, is Cohen’s d (i.e. 0.5 or 0.8) × √2 × √(1 − r) × SD_change score.
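Because Eq. 4 multiplies the paired SRM by the same factor that Eq. 1 divides by, the paired data-specific MCID reduces to the mean change score. The sketch below checks this numerically and evaluates the independent-data variant with Cohen's medium effect size. The function name `mcid` and all input values are hypothetical.

```python
import math


def mcid(effect_size, r, sd_change):
    """Eq. 4 form: effect size * sqrt(2) * sqrt(1 - r) * SD of change scores."""
    return effect_size * math.sqrt(2) * math.sqrt(1 - r) * sd_change


# Hypothetical inputs: mean change 20, SD of change 10, correlation 0.3.
mean_change, sd_change, r = 20.0, 10.0, 0.3

# Paired SRM from Eq. 1, then plugged back into Eq. 4.
srm = (mean_change / sd_change) / (math.sqrt(2) * math.sqrt(1 - r))
paired = mcid(srm, r, sd_change)          # recovers the mean change score
independent = mcid(0.5, r, sd_change)     # Cohen's medium effect size
print(paired, independent)
```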

Multivariate responsiveness using the regression models

The percentage improvement based on the paired data-specific MCID [Eq. 4] was examined as the multivariate responsiveness of the EQ-5D-3L, the OHS and the OKS, to determine which instrument is sensitive in detecting improvement for the paired data. The result was additionally examined by the patients' self-assessed transition level, i.e. much better, a little better, about the same, a little worse, and much worse. The observed and estimated percentage improvements were examined separately, with regression approaches adjusting for patient baseline covariates such as age, gender, and comorbidities. Adjusting for the covariates is one of the strengths in comparison to the responsiveness statistics described in the previous sections. The 3rd and the 2nd degree Box-Cox regressions based on log-likelihood were fitted to estimate the patients’ perception of improvement. The impact of baseline covariates, i.e. age (as a continuous variable), gender, and individual comorbidities, was examined in total and by the transition level population (Fig. 1).
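The observed percentage improvement can be sketched as the share of patients whose change score reaches the threshold, overall and by transition level. Treating 'improved' as change score ≥ threshold (inclusive) is an assumption of this sketch, and the function names and toy records are hypothetical; only the OHS threshold of 22 is taken from the text.

```python
from collections import defaultdict


def percent_improved(changes, threshold):
    """Percentage of change scores at or above the improvement threshold."""
    return 100.0 * sum(c >= threshold for c in changes) / len(changes)


def percent_improved_by_level(records, threshold):
    """records: iterable of (transition_level, change_score) pairs."""
    grouped = defaultdict(list)
    for level, change in records:
        grouped[level].append(change)
    return {level: percent_improved(ch, threshold)
            for level, ch in grouped.items()}


# Hypothetical toy records; 22 is the OHS threshold quoted in the text.
records = [("much better", 30), ("much better", 25),
           ("a little better", 23), ("a little better", 10),
           ("about the same", 2)]
overall = percent_improved([c for _, c in records], 22)
by_level = percent_improved_by_level(records, 22)
print(overall, by_level)
```

The predicted percentages would use model-based change scores in place of the observed ones; that step is not shown here.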

Fig. 1

The OHS and EQ-5D-3L – total population (1, 3) and the transition level (2, 4). Fitted 3rd degree Box-Cox regression lines 1 for the OHS total population and 2 by the patients’ self-assessed transition level. The 2nd degree Box-Cox regression estimates 3 for the EQ-5D-3L total hip surgery population and 4 by the patients’ self-assessed transition level. All the graphs are presented by age group additionally. Colourful dots indicate 50th percentile for each category, and grey dots indicate actual observations. Grey horizontal lines indicate each defined score improvement (e.g. 22 for the OHS and 0.428 for the hip EQ-5D-3L). Percentiles of the EQ-5D-3L show all over disperse patterns by the transition level whereas percentiles of the OHS show disperse patterns in ‘A little worse’ and ‘Much worse’ transition level. Model performance of the OKS and the knee EQ-5D-3L is provided in Supplementary Figure 1

The Box-Cox regression models were selected from among other average-based statistical models (e.g. polynomial regressions) and median-based models (e.g. quantile regressions), after model diagnostic assessments. The model is robust for a non-normal dependent variable, transforming it into a normal shape. The observed and estimated percentage improvements for the average population were calculated to examine whether the instrument has a good discriminative ability. The individual-level post-operative scores were modelled as a function of the transformed pre-operative linear, quadratic, and cubic terms and of the untransformed age, gender, and individual comorbidities. In comparison to the models with only pre-operative score terms, circulation and depression (whose chi-squared statistics are greater than 2000 in the models and whose coefficients are large, i.e. greater than 200 in absolute value) were selected for adjustment in the hip outcomes. Circulation, diabetes, and depression were selected for the knee outcomes based on the same criteria.

The 3rd degree left-hand-side-only model obtaining the maximum likelihood estimates is as below for the OHS:

$$ {y}_i^{\theta }={\beta}_0+{\beta}_1{x}_i+{\beta}_2{x}_i^2+{\beta}_3{x}_i^3+{\gamma}_1{z}_{1i}+{\gamma}_2{z}_{2i}+{\gamma}_3{z}_{3i}+{\gamma}_4{z}_{4i}+{\varepsilon}_i $$
(5)

where ε ~ N(0, σ²). y indicates the changed-operative score, and x indicates the pre-operative score. y is subject to a Box-Cox transform with parameter θ. z1, z2, z3, and z4 are the untransformed age, gender, circulation, and depression covariates [26].
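The idea behind a left-hand-side-only Box-Cox regression can be sketched with a grid search over θ that maximises the profile log-likelihood. This is an illustration only, not the software or settings used in the study: the function names, the simulated data, and the grid are all hypothetical. The outcome must be strictly positive for the transform, and how zero or negative scores were handled is not stated in the text. The sketch uses the usual (y^θ − 1)/θ form of the transform, which is equivalent to y^θ in Eq. 5 up to rescaling when an intercept is present.

```python
import numpy as np


def boxcox(y, theta):
    """Box-Cox transform of a strictly positive array."""
    return np.log(y) if abs(theta) < 1e-12 else (y ** theta - 1.0) / theta


def fit_lhs_boxcox(y, X, thetas):
    """Grid-search theta by profile log-likelihood; return (theta, beta)."""
    n = len(y)
    log_y_sum = np.log(y).sum()
    best = None
    for theta in thetas:
        yt = boxcox(y, theta)
        beta, *_ = np.linalg.lstsq(X, yt, rcond=None)  # OLS on transformed y
        rss = float(((yt - X @ beta) ** 2).sum())
        # Profile log-likelihood, including the Jacobian of the transform.
        loglik = -0.5 * n * np.log(rss / n) + (theta - 1.0) * log_y_sum
        if best is None or loglik > best[0]:
            best = (loglik, float(theta), beta)
    return best[1], best[2]


# Simulated data (hypothetical): sqrt(y) is linear in x, so theta is near 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=500)
y = (2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=500)) ** 2
X = np.column_stack([np.ones_like(x), x])
theta_hat, beta_hat = fit_lhs_boxcox(y, X, np.linspace(-1.0, 2.0, 61))
print(theta_hat)
```

For a design matching Eq. 5, the columns of `X` would be 1, x, x², x³ and the four untransformed covariates; the helper accepts any design matrix.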

Results

Demographics

In total, 181,423 patients had hip replacement surgery; over half (N = 106,493; 59%) were female, with ages ranging from 13 to 100 years (SD 10.5; male, 15–99, SD 10.4) and a mean age of 68.6 years (male, 67.2 years). At baseline, of the total, 14% (N = 24,945) of patients reported no comorbidity, 38.2% (N = 69,249) reported one comorbidity, and 17.8% (N = 3234) reported more than three comorbidities. Circulation was reported by 5.4% (N = 9866), diabetes by 8.7% (N = 15,816), and depression by 7.3% (N = 13,252).

For the knee replacement population, over half (N = 107,127; 56%) were female, with ages ranging from 18 to 99 years (SD 9.1; male, 16–102, SD 8.6) and a mean age of 69.3 years (male, 69.3 years). At baseline, of the total, 9.3% (N = 17,712) of patients reported no comorbidity, 33.3% (N = 63,804) reported one comorbidity, and 23.6% (N = 45,200) reported more than three comorbidities. Circulation was reported by 7% (N = 13,438), diabetes by 12.4% (N = 23,696), and depression by 8.3% (N = 15,823) (Table 2).

Table 2 Baseline covariates

Transition level

For the hip replacement surgery population, the great majority of patients, 155,899 (85.9%), answered much better, and 15,565 (8.6%) answered a little better. Relatively small numbers of patients answered about the same (3891; 2.1%), a little worse (2382; 1.3%), and much worse (1633; 0.9%). For the knee replacement surgery population, 138,407 (72.3%) and 31,650 (16.5%) patients answered much better and a little better, respectively. A further 8985 (4.7%) patients answered about the same, 7029 (3.7%) answered a little worse, and 4610 (2.4%) answered much worse (Table 3; Supplementary Table 1).

Table 3 The transition question (change score)

The Spearman’s rank correlation coefficients for the pre- and post-operative scores, r, are provided by the transition level in Table 4. Large correlations between the pre- and post-operative scores are observed in patients with the transition levels of about the same, a little worse, and much worse.

Table 4 Spearman’s rank correlation coefficients (95% CIs) for the change (pre- and post-operative) scores

Univariate responsiveness measures for the paired data

The OHS and the OKS showed great univariate responsiveness in total; the SRM [Eq. 1], SES [Eq. 2], and RI [Eq. 3] were 1.8, 2.8, and 0.6 (~ 0.7) in the OHS and 1.4, 2.5, and 0.7 in the OKS. In addition, the OHS and the OKS showed distinctive differences in the SRM [Eq. 1] by the 3-level transition, in particular, a little better vs. about the same vs. much worse: 1.5 (~ 1.6) vs. 0.8 (~ 0.9) vs. 0.3 (~ 0.4) in the OHS and 1.5 vs. 0.8 (~ 0.9) vs. 0.3 (~ 0.4) in the OKS. There was little difference among the 3-level transition for the SES: 1.7 vs. 1.3 (~ 1.4) vs. 1 (~ 1.1) in the OHS and 1.7 vs. 1.2 vs. 1 in the OKS (Tables 5 and 6).

Table 5 Hip – the SRM, SES, and RI (with 95% CIs) for the OHS and the EQ-5D-3L (by the transition)
Table 6 Knee – the SRM, SES, and RI (with 95% CIs) for the OKS and the EQ-5D-3L (by the transition)

The univariate responsiveness values in total for the generic instrument EQ-5D-3L were 1.1, 1.6, and 0.3 (~ 0.4) for the hip and 0.8 (~ 0.9), 1.3, and 0.3 for the knee replacement. The SRMs [Eq. 1] by the 3-level transition were 0.8 vs. 0.5 vs. 0.1 (~ 0.2) for the hip and 0.7 (~ 0.8) vs. 0.4 (~ 0.5) vs. 0.1 (~ 0.2) for the knee replacement. The SES values were similar to each other among the 3-level transition: 1.4 vs. 1.3 vs. 1.1 for the hip and 1.2 vs. 1.3 vs. 1.2 for the knee replacement.

The RI [Eq. 3] was calculated in total only, as its calculation incorporates the 2-level transition (i.e. a little better vs. about the same). The RIs [Eq. 3] in total were 0.6 (~ 0.7) in the OHS and 0.7 in the OKS, which are moderate practical effects by Cohen’s thresholds (i.e. > 0.8 large, 0.5 to 0.8 moderate, and < 0.5 small) [21, 27]. The RIs [Eq. 3] in total for the EQ-5D-3L showed negligible practical effects, 0.3 (~ 0.4) for the hip and 0.3 for the knee replacement. The SRM [Eq. 1] and SES [Eq. 2] can be interpreted similarly. The SRM [Eq. 1] and SES [Eq. 2] of ‘A little better’ in the OHS were 1.6 and 1.7, respectively. Both can be interpreted as a crucial difference in the ‘successful’ percentage in each of the two groups (r) of 0.62 [28]. The SRM [Eq. 1] and SES [Eq. 2] of ‘A little better’ in the EQ-5D-3L were 0.8 and 1.4, respectively, which can be interpreted as moderate and crucial differences in the ‘successful’ percentage in each of the two groups (r) of 0.37 and 0.57 [28]. This implies that the SRM [Eq. 1] shows a good discriminative ability for the different severities in comparison to the SES [Eq. 2], and that the EQ-5D-3L is less responsive than the OHS.
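The interpretation above can be made concrete with a short sketch. The labelling function applies the thresholds quoted in the text. The conversion r = d/√(d² + 4) is the common effect-size-to-correlation formula for two equal-sized groups; it reproduces the quoted values 0.62, 0.37, and 0.57, but whether reference [28] uses exactly this conversion is an assumption here. Both function names are hypothetical.

```python
import math


def cohen_label(effect_size):
    """Label an effect size using the thresholds quoted in the text."""
    es = abs(effect_size)
    if es > 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    return "small"


def d_to_r(d):
    """Convert a standardized difference d to a correlation-type r.

    Assumes two groups of equal size (assumption of this sketch).
    """
    return d / math.sqrt(d * d + 4.0)


for d in (1.6, 0.8, 1.4):
    print(d, cohen_label(d), round(d_to_r(d), 2))
```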

The paired data-specific MCID as the threshold for improvement

The paired data-specific MCID [Eq. 4] was calculated, applying the SRM [Eq. 1] as a desired ES. Multivariate responsiveness was examined using the defined capacity of benefit score as improvement (i.e. 22 for the OHS, and 0.428 for the hip EQ-5D-3L; 16 for the OKS and 0.309 for the knee EQ-5D-3L; Footnote 2), adjusting for covariates. Various ways to assess the improvement for the independent data are presented in Supplementary Table 2. Those scores are smaller than the capacity of benefit scores for the paired data. The SRM-applied MCIDs for the independent data are 6 for the OHS, and 0.196 for the hip EQ-5D-3L, using Cohen’s medium (0.5) effect size. The MDCs (minimal detectable changes, defined as the minimal change that falls beyond the measurement error in the measurement score [29]) are 6 for the OHS and 0.234 for the hip EQ-5D-3L, with ICC 0.9. The anchor-based MCIDs are 9 for the OHS, and 0.101 for the hip EQ-5D-3L, using the short distance. The mean change scores using the anchor are 6 for the OHS, and 0.106 for the hip EQ-5D-3L. A greater capacity of benefit score is required for the paired data in comparison to the independent data, to detect how likely the surgery is to distinguish an actual effect from one of chance in the pre- and post-operative outcomes.
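For orientation, the widely used 95% form of the minimal detectable change is MDC = 1.96 × √2 × SEM, with SEM = SD × √(1 − ICC). The text does not state which formula or which SD was used, so the sketch below is an assumption throughout, and the input values are hypothetical rather than taken from the study data.

```python
import math


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC = z * sqrt(2) * SEM, where SEM = SD * sqrt(1 - ICC).

    sd is the score SD (which SD to use is an assumption); icc is the
    reliability coefficient; z = 1.96 gives the 95% version.
    """
    sem = sd * math.sqrt(1.0 - icc)
    return z * math.sqrt(2.0) * sem


# Hypothetical SD of 10 score points with ICC 0.9.
print(round(minimal_detectable_change(sd=10.0, icc=0.9), 3))
```

A higher ICC shrinks the measurement error term and therefore the MDC, which is why the reliability assumed (0.9 in the text) matters for the resulting threshold.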

Multivariate responsiveness measures – observed and predicted improvement

The percentage improvements based on patients’ perceptions were high in the OHS and the OKS (Tables 7 and 8). The percentages of the observed (predicted) total improvement were 51 (54)% in the OHS and 73 (58)% in the OKS. In addition, the OHS and the OKS showed distinctive percentage differences by the 3-level transition, i.e. a little better vs. about the same vs. a little worse. As an example, the observed percentages of the 3-level transition were 10% vs. 4% vs. 1% in the OHS and 21% vs. 6% vs. 3% in the OKS. The percentages of the observed (predicted) total improvement in the generic instrument EQ-5D-3L were 44 (48)% for the hip and 42 (44)% for the knee replacement population. The observed (predicted) percentages of the 3-level transition in the EQ-5D-3L were 39 (41)% vs. 29 (11)% vs. 21 (4)% for the hip and 39 (45)% vs. 32 (36)% vs. 26 (14)% for the knee replacement population.

Table 7 Hip – patients’ perception of improvement (%) (using the paired data-specific MCID [Eq. 4])
Table 8 Knee – patients’ perception of improvement (%) (using the paired data-specific MCID [Eq. 4])

The observed (predicted) percentage improvements applying Cohen’s ES (0.5 and 0.8) are additionally provided in Supplementary Tables 3 and 4 for the independent data. The observed (predicted) percentages for the medium improvement were 93 (99)% in the OHS, and 85 (98)% in the OKS. The observed (predicted) percentage improvements in the EQ-5D-3L were 75 (74)% for the hip and 60 (58)% for the knee replacement population. The observed (predicted) percentages of the 3-level transition were 78 (90)% vs. 52 (57)% vs. 34 (19)% in the OHS, and 73 (85)% vs. 46 (42)% vs. 29 (8)% in the OKS. The observed (predicted) percentages of the 3-level transition in the EQ-5D-3L were 50 (52)% vs. 38 (50)% vs. 29 (41)% for the hip and 45 (48)% vs. 36 (47)% vs. 29 (42)% for the knee replacement population.

A great number of patients (86% for hip and 72% for knee) answered much better for success of the surgery (Table 3). In addition, the greater capacity of benefit score was applied for the calculation of the paired data-specific percentage improvement. Therefore, the overall percentages (%) of patients’ perception of improvement are lower in comparison to the improvement for the independent data. The percentage differences by the transition level were much more distinctive when the paired data-specific capacity of benefit score was applied for the calculation.

Model performance

The area under the ROC curve (AUC) with 95% binomial exact confidence intervals was calculated for the observational data to examine the discriminative ability of the patient rating instruments, i.e. the OHS, OKS, and EQ-5D-3L, taking each MCID as defining the true improvement status (Tables 7 and 8).
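An AUC can be computed directly from its rank interpretation: the probability that a randomly chosen improved patient has a higher score than a randomly chosen non-improved patient, with ties counted as one half. The sketch below is a generic illustration under that definition; which score and which improvement label were paired in the study is not specified in the text, so the inputs here are hypothetical, and confidence intervals are not shown.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative).

    Ties count as 0.5. Quadratic in sample size, so only suitable for
    small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores with an 'improved' label (1) or not (0).
print(auc([1, 2, 3, 4], [0, 1, 0, 1]))
```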

Internal validation

Internal validation was performed by examining the sensitivity of the results to the period within the dataset: NHS PROMs linked to HES 2009–2011 vs. 2012–2015. There was no significant sensitivity to the two periods (Supplementary Figure 2).

Discussion

The paired data-specific sensitivity of the EQ-5D-3L, the OHS and the OKS was investigated to detect changes in the health state over time for the population who underwent hip or knee surgery in the UK. To ensure accuracy of the health status and instrument evaluation in hip and/or knee replacement surgery, the paired data-specific SRM was examined for the univariate responsiveness. In addition, the SES and the RI were calculated using the patients' self-assessed transition. Multiple responsiveness metrics were applied, including a robust modelling approach that adjusted for significant baseline covariates to estimate percentage improvements. From the modelling approach, the paired data-specific observed (and predicted) percentages of improvement were distinctive by the transition level (Tables 7 and 8). The multivariate modelling method provided robust responsiveness statistics in terms of adjusting for the patient demographic information and comorbidities. Responsiveness from the models was interpretable on a percentage scale of improvement.

A greater capacity of benefit score is applied in the calculation of improvement for paired data. Therefore, the overall percentages (%) of patients’ perception of improvement are relatively low. Missing cases of predicted improvement at certain transition levels are inevitable for the Oxford questionnaires, which have ceiling effects because most of the study population answered much better after the surgery.

Disease-specific and generic instruments are both available in the PROMs data in the UK, and they showed reasonable responsiveness as health-related instruments that measure functional state. A previous study using the NHS patient-reported outcome measures (PROMs) supports moderate correlations (0.3 to 0.6) between the EQ-5D-3L and other measures of patient-reported health changes, including the OHS and the OKS [30]. Nonetheless, there has been a lack of evidence to support the ability to discriminate. In terms of detecting clinically significant changes in arthroplasty surgery, although it has not been firmly established yet, a number of studies have indicated that disease-specific instruments are more responsive than generic instruments [4, 31,32,33,34,35]. The present study showed that, although the responsiveness was greater and more distinctive in the disease-specific instruments, the responsiveness of the EQ-5D-3L for hip and knee surgery is reasonably good. The EQ-5D would be useful in terms of short completion time and good validity [3]. Nevertheless, it may not be sufficiently sensitive to be used solely in hip and/or knee replacement surgery, either to discriminate between cases of differing severities by a transition question or to detect the changes in severity or functional status over time [21].

The accurate identification and early-stage stratification of patients undergoing hip and/or knee replacement is one of the greatest unmet needs. A robust and precise measurement instrument will be effective in the management of arthroplasty surgery for particular groups of patients. Evidence has been provided that the OHS and the OKS are able to contribute to the better management of arthroplasty surgery. In general, arthroplasty surgery is based on an individual level in terms of a patient’s expectations, symptoms, diagnoses, and degree of pain. Although the excellence of the Oxford questionnaires over other patient-reported questionnaires has been examined, the Oxford questionnaires have a ceiling effect, and the threshold levels are always a trade-off between sensitivity and specificity. Moreover, the current version of the OHS or the OKS does not contain a psychological measurement such as depression or anxiety, which is also important in health outcomes. Further investigation is required into their potential roles in clinical or trial use, cost-effectiveness, and their effects on referral patterns.

Strengths and limitations

The strengths of this study include the use of large cohort data linked to HES on both hip and knee replacement surgeries, which provided enough power to support the research outcomes. Although the sample size is large enough to validate the improvement values using complete-case analysis, validation by an external data set was not conducted. The study design may be suboptimal compared to a well-blinded randomized clinical trial. Additional care may be required in the interpretation of patients’ socio-demographics, clinical/treatment and other unobserved covariates that may not have been adjusted for.

A secondary transition was not used in the study. The NHS PROMs data contain only a one-point transition measurement (6 months post-operation), and a more objective point assessment may need to be considered [36]. The mean change score using a patient-reported transition (i.e. an anchor approach) has a limitation, in that the one-point transition measurement relies on a patient’s memory of global health status, and it could be a more subjective change measurement in contrast with each of the pre- and post-operative point assessments [36]. In addition, measurement errors should be accounted for in repeatedly measured patient-reported outcomes. There are several ways to control the errors, such as use of the MDC approach (i.e. the threshold for improvement adjusted for measurement error) or applying advanced statistical inference approaches such as Bayesian models with computational methods. A further potential limitation is that it is not easy to precisely estimate a percentage improvement using model fitting with the EQ-5D-3L, due to the nature of the real number scale (−0.59 to 1) and because the scale is very dispersed (Supplementary Figure 3).

Conclusions

Paired data-specific responsiveness was investigated in the NHS PROMs population who underwent hip or knee surgery in the UK. The OHS and the OKS showed good ability to discriminate clinically significant changes, and the EQ-5D-3L showed comparatively moderate responsiveness. Using the paired data-specific capacity of benefit scores, the OHS and the OKS showed distinctive differences in clinically significant changes by transition level, in particular for the 3-level transition, i.e. a little better, about the same, and a little worse. This is useful in clinical practice as a rationale for access to surgery at the individual-patient level. The study findings support the use of a precise estimation of improvement and of appropriate instruments in arthroplasty surgery. A generic measure appears beneficial alongside the disease-specific instruments for cross-validation, unless an enhanced instrument is developed or the reporting system has a specific requirement.

Availability of data and materials

The datasets generated and/or analysed during the current study are not publicly available due to licence restrictions. The online supplementary document containing the research findings is available.

Notes

  1. A small number of patients (N = 230) had the same procedure (as identified by “PROMS_PROC_CODE”) carried out on the same side of the body more than once, e.g. two primary hip replacements of the left hip. It was decided to treat these as revisions. The duplicates were sorted by the variables “EPISODE_MATCH_RANK” and “EPIKEY”: the former ensures that the highest-quality match between HES and PROMs is listed first, and the latter ensures that the sorting is unique. Based on this sorting, the second observation in each set of duplicates was dropped (N = 602,287).

  2. Rounded up to the nearest whole number: 21.1 for the OHS and 15.8 for the OKS
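The de-duplication described in note 1 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's actual data-management code. The field names `patient_id` and `side`, the procedure-code values, and the assumption that a lower “EPISODE_MATCH_RANK” value denotes a better match are all hypothetical. Only “PROMS_PROC_CODE”, “EPISODE_MATCH_RANK”, and “EPIKEY” come from the note.

```python
from itertools import groupby
from operator import itemgetter

# Fields that define a duplicate: same patient, same procedure, same side.
# "patient_id" and "side" are hypothetical names used for illustration.
GROUP_FIELDS = ("patient_id", "PROMS_PROC_CODE", "side")
# Within a duplicate set, order by match rank, then EPIKEY to break ties.
# Assumption: a lower EPISODE_MATCH_RANK means a higher-quality match.
SORT_FIELDS = GROUP_FIELDS + ("EPISODE_MATCH_RANK", "EPIKEY")


def drop_repeat_procedures(records):
    """Keep only the first record of each duplicate set after sorting."""
    ordered = sorted(records, key=itemgetter(*SORT_FIELDS))
    grouped = groupby(ordered, key=itemgetter(*GROUP_FIELDS))
    return [next(group) for _, group in grouped]


# Made-up example records (procedure codes are placeholders).
records = [
    {"patient_id": "A", "PROMS_PROC_CODE": "HR", "side": "L",
     "EPISODE_MATCH_RANK": 2, "EPIKEY": 1002},
    {"patient_id": "A", "PROMS_PROC_CODE": "HR", "side": "L",
     "EPISODE_MATCH_RANK": 1, "EPIKEY": 1005},
    {"patient_id": "B", "PROMS_PROC_CODE": "KR", "side": "R",
     "EPISODE_MATCH_RANK": 1, "EPIKEY": 2001},
]
kept = drop_repeat_procedures(records)
```

Sorting on “EPIKEY” after the match rank makes the ordering deterministic, so the record retained for each patient does not depend on the input order.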

Abbreviations

OHS: Oxford Hip Score

OKS: Oxford Knee Score

PROMs: Patient-reported outcome measures

HES: Hospital Episodes Statistics

SRM: Standardized response mean

SES: Standardized effect size

RI: Responsiveness index

MCID: Minimally clinically important difference

MDC: Minimal detectable change

ROC: Receiver operating characteristic

AUC: Area under the ROC curve

References

  1. Duncan PW. Chapter 9 - Outcome measures in stroke rehabilitation. In: Barnes MP, Good DC, editors. Handbook of clinical neurology. Elsevier; 2013. p. 105–11.
  2. Ranganathan P, Pramesh CS, Buyse M. Common pitfalls in statistical analysis: clinical versus statistical significance. Perspectives in clinical research. 2015;6(3):169–70.
  3. Noyes J, Edwards RT. EQ-5D for the assessment of health-related quality of life and resource allocation in children: a systematic methodological review. Value Health. 2011;14(8):1117–29.
  4. Middel B, van Sonderen E. Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integrated Care. 2002;2:e15.
  5. Stewart AL, et al. Functional status and well-being of patients with chronic conditions. Results from the Medical Outcomes Study. Jama. 1989;262(7):907–13.
  6. Devlin NJ, Brooks R. EQ-5D and the EuroQol Group: past, present and future. Appl Health Econ Health Policy. 2017;15(2):127–37.
  7. Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997;35(11):1095–108.
  8. Horowitz E, et al. EQ-5D as a generic measure of health-related quality of life in Israel: reliability, validity and responsiveness. Israel Med Assoc J. 2010;12(12):715.
  9. van Reenen M, Oppe M. EQ-5D-3L User Guide: basic information on how to use the EQ-5D-3L instrument. 2015.
  10. Oxford University Hospitals NHS Trust. Hip Surgery Questionnaire; 2008.
  11. Smith S, et al. Patient-reported outcome measures (PROMs) for routine use in treatment centres: recommendations based on a review of the scientific evidence. 2005.
  12. Dawson J, et al. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg. 1996;78(2):185.
  13. Murray DW, et al. The use of the Oxford hip and knee scores. J Bone Joint Surg. 2007;89(8):1010.
  14. Husted JA, et al. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol. 2000;53(5):459–68.
  15. NHS England. National PROMs Programme Guidance. NHS England: Insight & Feedback Team; 2017.
  16. Price A, et al. The Arthroplasty Candidacy Help Engine tool to select candidates for hip and knee replacement surgery: development and economic modelling. Health Technol Assess. 2019;23(32):1–216.
  17. Keurentjes JC, et al. Minimal clinically important differences in health-related quality of life after total hip or knee replacement: a systematic review. Bone Joint Res. 2012;1(5):71–7.
  18. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. Upper Saddle River, N.J.: Prentice Hall Health; 2000.
  19. de Vet HC, et al. Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res. 2007;16(1):131–42.
  20. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Medical Care. 1990;28(7):632.
  21. Cohen J. Statistical power analysis for the behavioural sciences. 2nd ed. New Jersey: Lawrence Erlbaum Associates; 1977.
  22. Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. Cmaj. 1986;134(8):889–95.
  23. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40(2):171–8.
  24. Tuley MR, Mulrow CD, McMahan CA. Estimating and testing an index of responsiveness and the relationship of the index to power. J Clin Epidemiol. 1991;44(4-5):417–21.
  25. Oeffinger D, et al. Outcome tools used for ambulatory children with cerebral palsy: responsiveness and minimum clinically important differences. Developmental Med Child Neurol. 2008;50(12):918–25.
  26. StataCorp LP. Stata base reference manual release 14. Stata Press, 4905 Lakeway Drive, College Station, Texas 77845; 2013.
  27. Sivan M. Interpreting effect size to estimate responsiveness of outcome measures. Stroke. 2009;40(12):e709.
  28. Coe R. It's the effect size, stupid: what effect size is and why it is important; 2002.
  29. Kovacs FM, et al. Minimum detectable and minimal clinically important changes for pain in patients with nonspecific neck pain. BMC Musculoskelet Disord. 2008;9:43.
  30. Feng Y, Parkin D, Devlin N. Assessing the performance of the EQ-VAS in the NHS PROMs programme. Int J Quality Life Aspects Treatment Care Rehabilitation - Official J Int Society Quality Life Res. 2014;23(3):977–89.
  31. Hawley DJ, Wolfe F. Sensitivity to change of the health assessment questionnaire (HAQ) and other clinical and health status measures in rheumatoid arthritis: results of short-term clinical trials and observational studies versus long-term observational studies. Arthritis Care Res. 1992;5(3):130–6.
  32. Bessette L, et al. Comparative responsiveness of generic versus disease-specific and weighted versus unweighted health status measures in carpal tunnel syndrome. Med Care. 1998;36(4):491–502.
  33. Vaile JH, et al. Generic health instruments do not comprehensively capture patient perceived improvement in patients with carpal tunnel syndrome. J Rheumatol. 1999;26(5):1163–6.
  34. Gliklich RE, Hilinski JM. Longitudinal sensitivity of generic and specific health measures in chronic sinusitis. Quality Life Research. 1995;4(1):27–32.
  35. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50(3):239–46.
  36. Angst F, et al. Minimal clinically important rehabilitation effects in patients with osteoarthritis of the lower extremities. J Rheumatol. 2002;29(1):131–8.

Acknowledgements

The author is grateful to NHS Digital for providing the rich source of data to analyse. Professor Andrew Price, University of Oxford, contributed to the study design (i.e. use of the NHS PROMs linked to HES) and suggested the research theme (i.e. responsiveness of the EQ-5D-3L, the OHS, and the OKS). The author is also grateful to Professor Jonathan Cook, University of Oxford, who contributed to the initial stage review of the manuscript.

Funding

No funding was received.

Author information

Contributions

The author designed the study, managed the data, and conducted the analyses. The author developed the methods and wrote the article in the Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford. The author read and approved the final manuscript.

Corresponding author

Correspondence to Sujin Kang.

Ethics declarations

Ethics approval and consent to participate

Access to the NHS PROMs/HES-linked data from the Health and Social Care Information Centre was approved (ref: NIC-392690-F7H2Q). Formal ethics committee approval was not required, as patient-identifiable data were not requested for the analyses.

Consent for publication

Not applicable.

Competing interests

There are no conflicts of interest to disclose regarding any financial or personal relationship.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: the author notified us that the Supplementary document should be updated.

The research was conducted in The Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Windmill Road, Headington, Oxford, OX3 7LD, UK.

Supplementary Information

Additional file 1:

  • Supplementary Table 1. Descriptive statistics.
  • Supplementary Table 2. The estimated improvement by other definitions for the independent data.
  • Supplementary Table 3. Hip – patients’ perception of improvement (%) (using the Cohen’s ES (0.5 and 0.8) applied MCID).
  • Supplementary Table 4. Knee – patients’ perception of improvement (%) (using the Cohen’s ES (0.5 and 0.8) applied MCID).
  • Supplementary Figure 1. The OKS and EQ-5D-3L – total population (1, 3) and the transition level (2, 4).
  • Supplementary Figure 2. The hip (1, 2) and the knee (3, 4) – total population in the NHS PROMs years.
  • Supplementary Figure 3. Histograms of the OHS and the OKS changes (1, 3); Histograms of the EQ-5D-3L changes showed multimodal distributions (2, 4).
  • Supplementary Figure 4. The OHS (1, 3, 5, 7, 9) and the OKS (2, 4, 6, 8, 10) proportion and probabilities of improvement by the transition level in the 4th degree fractional polynomial logistic regressions (using the Cohen’s medium ES (0.5) applied MCID).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kang, S. Assessing responsiveness of the EQ-5D-3L, the Oxford Hip Score, and the Oxford Knee Score in the NHS patient-reported outcome measures. J Orthop Surg Res 16, 18 (2021). https://doi.org/10.1186/s13018-020-02126-2

Keywords

  • Hip and knee replacement
  • Patient-reported outcome
  • EQ-5D-3L Index
  • OHS
  • OKS
  • Internal responsiveness