Cross-cultural adaptation, reliability, validity and responsiveness of the Michigan Hand Outcomes Questionnaire (MHQ-Sp) in Spain

Martínez-Fernández, María Visitación; Sarabia-Cobo, Carmen María; Sánchez-Labraca, Nuria

doi:10.1186/s13018-024-04723-x

Research article
Open access
Published: 22 April 2024


Journal of Orthopaedic Surgery and Research volume 19, Article number: 256 (2024)


Abstract

Background

The Michigan Hand Outcomes Questionnaire (MHQ) is a widely recognized self-report tool for measuring, from a multidimensional perspective, the health status of patients with hand and wrist problems. The aim of this study is to translate and culturally adapt the MHQ into Spanish and to evaluate its validity, reliability and responsiveness across different hand conditions in Spain.

Methods

The MHQ was translated and culturally adapted following the recommendations of the American Academy of Orthopaedic Surgeons. Validation followed the current recommendations of the Consensus-Based Standards for the Selection of Health Status Measurement Instruments (COSMIN) group and was conducted in 262 patients with hand conditions. Reliability was assessed through internal consistency using Cronbach's alpha and through test–retest reliability using the intraclass correlation coefficient (ICC). Measurement error was quantified with the standard error of measurement (SEM) and the smallest detectable change (SDC). Structural validity was assessed with confirmatory factor analysis (CFA), and construct validity with Pearson's correlation coefficient. Finally, responsiveness was assessed using the effect size (ES), the standardized response mean (SRM) and the minimum clinically important difference (MCID).

Results

Reliability was confirmed by internal consistency analysis, with good Cronbach's alpha values (0.82–0.85), and by test–retest analysis, with good ICC values (0.74–0.91). Measurement error was low, with SEM values of 1.70–4.67 and SDC values of 4.71–12.94. The CFA confirmed the unidimensionality of each scale with adequate goodness-of-fit indices. The MHQ showed a high negative correlation with DASH (r = −0.75, P < 0.001), a moderate negative correlation with DASH-work (r = −0.63, P < 0.001), and negligible correlations with EQ-5D (r = −0.01, P > 0.005) and grip strength (r = 0.05, P > 0.005). At week 5, all 222 patients across the three diagnosed hand subgroups showed high ES and SRM values (above 0.92), with an MCID above 6.85.

Conclusions

The MHQ-Sp was culturally adapted, and the results of this version showed good reliability and validity as well as high responsiveness for a wide range of hand conditions after surgical or conservative treatment in Spain.

Background

The wrist and hand are common sites of upper limb injury [1, 2]. In Spain in 2022, hand fractures accounted for 29.7% of upper limb fractures, 7.6% of fractures seen in trauma emergencies [3] and 21.85% of work injuries [4]. Given these high figures, healthcare professionals face the challenge of measuring the extent and impact of these injuries [5, 6]. Patient-reported outcome measures (PROMs) have therefore been developed to capture the patient's perspective without the intervention of a healthcare professional [7,8,9], adding another dimension to clinical evaluation and to the assessment of treatment effectiveness [7].

With this in mind, the Michigan Hand Outcomes Questionnaire (MHQ) was created at the University of Michigan in 1998 by Chung et al. [10]. It was developed according to rigorous psychometric principles as a multidimensional measure of health status for patients with all types of hand and wrist impairment. The questionnaire assesses the right and left hands separately to avoid the effect of hand dominance, distinguishes between functional status and symptoms, and includes two distinctive scales, aesthetics and satisfaction [11]. The MHQ and the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire [12] are the most widely used hand PROMs [6]. Their validity, reliability and responsiveness have been demonstrated in a wide range of conditions, including carpal tunnel syndrome (CTS) [13, 14], distal radius fractures [14, 15], osteoarthritis [16] and Dupuytren disease [17].

Furthermore, the use of PROMs is influenced by the role of the hand in different cultures, which affects how tasks are performed and therefore item responses, scores and psychometric properties [18]. A validated, culturally adapted instrument is therefore more accurate in clinical and research practice [18]. The MHQ has been officially translated and validated in 14 countries [11, 19,20,21,22,23,24,25,26,27,28,29,30,31,32].

There is currently a need in Spain for a specific outcome measure for hand and wrist injuries that is valid, reliable and able to detect clinical change. Therefore, the general objective of this study is to create a Spanish version of the MHQ through cultural adaptation followed by validation of its psychometric properties.

Methods

This descriptive, cross-sectional psychometric validation study was conducted in two stages: translation and cultural adaptation, followed by analysis of validity, reliability and responsiveness. Permission was first requested from and granted by the authors (MHQ IR code #3372).

All participants were randomly selected and had previously been diagnosed by a hand surgeon at the Mutua Montañesa Hospital in Santander between January 2021 and September 2022. Inclusion criteria were age between 18 and 65 years, either gender, acute trauma or neuromusculoskeletal involvement of the hand or wrist, and sufficient command of Spanish to understand and complete the questionnaires. Exclusion criteria were central nervous system problems, mental illness, behavioural disorders or involvement proximal to the wrist. The recommended minimum sample size was 259 patients (7 patients per item × 37 items), following the principle of 4 to 10 patients per item for samples larger than 100 patients [33, 34].

Physiotherapy treatments included hydrotherapy, electrotherapy and manual therapy following surgical or conservative treatment.

Outcome measures. Sociodemographic data (age, gender, dominant hand and affected hand) were collected together with the baseline clinical measures. To maximise response and participation, patients were followed up continuously and individually through self-administered online questionnaires, and non-responders were contacted directly. The following measures were used in this study:

Grip strength was measured with the Baseline® Hydraulic Hand Dynamometer in a standardised seated position, recording the average of three measurements for each hand.

The MHQ [10] consists of 37 items covering six domains: overall hand function, activities of daily living (ADL), work performance, pain, aesthetics and satisfaction with hand function. Items are answered on a 1–5 Likert scale; raw domain scores are converted to a 0–100 range, with the pain domain scored in the opposite direction. The total score is the sum of the six domain scores divided by six. The scoring algorithm is provided by the authors together with the questionnaire [10, 35]. Higher scores indicate better performance, except in the pain domain, where higher scores indicate more severe pain. In this analysis, scores were recorded for the affected hand [10].
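The scoring just described can be sketched in Python. This is a minimal illustration, not the official MHQ algorithm distributed by the authors [10, 35]: it assumes every item in a domain is answered and already oriented in the same direction, uses a generic linear rescaling of the raw sum to 0–100, and assumes the pain score is reversed as 100 − pain before the six domains are averaged. The function names are hypothetical.

```python
def rescale_domain(raw_sum: float, n_items: int,
                   min_option: int = 1, max_option: int = 5) -> float:
    """Linearly map a raw domain sum onto 0-100.

    Assumes all n_items were answered and already oriented in one direction.
    This is a generic rescaling, not the official MHQ item-level algorithm.
    """
    lowest = n_items * min_option
    highest = n_items * max_option
    return (raw_sum - lowest) / (highest - lowest) * 100


def mhq_total(function: float, adl: float, work: float, pain: float,
              aesthetics: float, satisfaction: float) -> float:
    """Average the six 0-100 domain scores.

    Assumption: pain (higher = more pain) is reversed as 100 - pain so that
    higher values mean better status in every domain before averaging.
    """
    domains = [function, adl, work, 100 - pain, aesthetics, satisfaction]
    return sum(domains) / len(domains)


if __name__ == "__main__":
    print(rescale_domain(15, 5))              # midpoint of a 5-item domain
    print(mhq_total(80, 70, 60, 30, 90, 70))  # hypothetical domain scores
```

Reversing pain before averaging keeps the total on a single "higher is better" axis, which is why the pain domain is handled separately.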

The DASH [12, 36] consists of a 30-item core module measuring function and symptoms, plus two optional 4-item modules on music/sports and work. Each item has five response options scored from 1 to 5. The raw core-module score ranges from 30 to 150 points and is transformed to a 0–100 scale, where 0 indicates no disability and 100 the greatest disability [12, 36]. The DASH-work module contains 4 items on a 1–5 Likert scale, all of which must be answered for a score to be calculated; it is scored from 0 to 100, with lower scores indicating better work ability.
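The DASH transformation can be written as [(sum of responses / n) − 1] × 25, where n is the number of items answered, so a raw core score of 30 maps to 0 and 150 maps to 100. The sketch below assumes that formula and, for the core module, the developers' allowance of up to three missing items (our understanding of the DASH scoring guidance, not something stated in this study); for the work module, `max_missing=0` enforces the rule that all items must be answered. The function name is hypothetical.

```python
from typing import Optional, Sequence


def dash_score(responses: Sequence[Optional[int]],
               max_missing: int = 3) -> Optional[float]:
    """Score a DASH module as ((sum / n) - 1) * 25, n = items answered.

    `responses` holds values 1-5, with None for unanswered items.
    Returns None when more than `max_missing` items are missing.
    Use max_missing=3 for the 30-item core module (assumed rule) and
    max_missing=0 for the 4-item work module.
    """
    answered = [r for r in responses if r is not None]
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("DASH responses must be between 1 and 5")
    if len(responses) - len(answered) > max_missing or not answered:
        return None
    n = len(answered)
    return (sum(answered) / n - 1) * 25


if __name__ == "__main__":
    print(dash_score([1] * 30))                        # raw 30 -> 0
    print(dash_score([5] * 30))                        # raw 150 -> 100
    print(dash_score([2, 3, None, 2], max_missing=0))  # incomplete work module
```

Dividing by the number of answered items, rather than by 30, keeps the 0–100 range intact when a few core items are missing.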

The EuroQol 5 Dimensions, 3-level version (EQ-5D-3L) [37] is a generic instrument whose first part covers five dimensions of health-related quality of life, each with three response levels. The resulting 5-digit health state is converted into a single index value; values recorded in Spain range from −0.224 to 1 [38, 39], with lower values indicating worse health. Permission to use the instrument was requested, and registration completed, through the developers' website [40]. The second part of the EQ-5D is a visual analogue scale (EQ-VAS) scored from 0 to 100, ranging from worst to best health [41, 42].

Pain was measured using a visual analogue scale (VAS) on a scale of 0 to 100, where 0 is no pain and 100 is the most unbearable pain [43].

Translation and cross-cultural adaptation

This first stage followed the guidelines of the American Academy of Orthopaedic Surgeons (AAOS) [18] (see Additional file 1).

Stage 1: Forward translation. Two native Spanish translators (a hand surgeon and a philologist) independently translated the MHQ into Spanish, each providing a summary, to produce the first two Spanish versions (T1 and T2).

Stage 2: Synthesis of translations. The two translations were compared and synthesised by consensus with a hand surgeon, producing version T-12.

Stage 3: Back-translation. Two native English translators (English teachers) translated version T-12 back into English, producing version BT-12.

Stage 4: Expert committee. The committee, comprising a methodologist, a philologist, two hand surgeons and two translators, evaluated idiomatic, semantic, experiential and conceptual equivalence and produced the pre-final version together with a report. The semi-structured interviews were conducted by the principal investigator of the study.

Stage 5: Pre-test. A pilot study was conducted on a sample of 30–40 patients not included in the general sample [18, 44,45,46]. Content validity was assessed by expert judgement using Kendall's coefficient of concordance (W), where 1 indicates perfect agreement and 0 no agreement [47]. Semi-structured interviews were conducted with patients to assess item difficulty; items reported as difficult by more than 15% of patients were considered for modification. The time taken to complete the questionnaire was recorded.
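For readers unfamiliar with Kendall's W, the sketch below computes it for m experts rating n items: each expert's ratings are converted to ranks (tied values receive average ranks), the rank sum of each item is compared with its expected value, and W = 12S / (m²(n³ − n)). This version omits the tie correction, and the rating matrix is hypothetical; it is not the authors' SPSS procedure.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's W for m raters (rows) x n items (columns), no tie correction."""
    m = len(ratings)
    n = len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    expected = m * (n + 1) / 2  # mean rank sum per item
    s = sum((r - expected) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Hypothetical relevance ratings from three experts for five items.
    experts = [
        [4, 5, 3, 5, 2],
        [4, 4, 3, 5, 2],
        [5, 5, 3, 4, 2],
    ]
    print(round(kendalls_w(experts), 3))
```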

Stage 6: The final version of the MHQ-Sp was produced and submitted, with all reports, to the authors for final approval (see Additional file 2).

Psychometric testing of the MHQ-Sp

In this second part, the recommendations of the current Consensus-Based Standards for the Selection of Health Status Measurement Instruments (COSMIN) Group [48,49,50] were followed.

Internal consistency

Internal consistency is the degree of interrelatedness among items measuring the same construct [51]. It was calculated with Cronbach's α on the baseline scores; α ranges from 0 to 1, with values ≥ 0.70 considered adequate, values up to 0.90 good, and values above 0.90 suggestive of redundancy [52].
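Cronbach's α for k items is α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜₒₜₐₗ), where σ²ᵢ is the variance of item i and σ²ₜₒₜₐₗ the variance of the summed score. A minimal sketch, assuming a complete respondents × items matrix and sample variances; the example data are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix with no missing data."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses of five patients to a four-item domain (1-5 scale).
    domain = [
        [2, 3, 2, 3],
        [4, 4, 5, 4],
        [3, 3, 3, 2],
        [5, 4, 5, 5],
        [1, 2, 1, 2],
    ]
    print(round(cronbach_alpha(domain), 3))
```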

Test–retest reliability

Test–retest reliability is the degree to which repeated measurements yield similar results in patients whose status on the construct is stable [51]. It was estimated with the intraclass correlation coefficient (ICC), which ranges from 0 to 1, with values > 0.70 indicating good reliability [52]. The MHQ was first administered once the patient had been diagnosed and before the first physiotherapy session, and a second time 10–15 days later [53], without intervening treatment, under the same administration conditions and without access to the previous responses.
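The text does not state which ICC model was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement, in Shrout and Fleiss notation), a common choice for test–retest designs, from a subjects × occasions matrix using two-way ANOVA mean squares. Treat the model choice and the example scores as assumptions.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `data` is a subjects x occasions matrix (e.g. test and retest columns).
    """
    n = len(data)     # subjects
    k = len(data[0])  # occasions (2 for test-retest)
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical MHQ totals for five patients at test and retest.
    pairs = [[62, 65], [48, 50], [75, 73], [55, 58], [81, 80]]
    print(round(icc_2_1(pairs), 3))
```

Because this model penalises systematic shifts between occasions (the `ms_cols` term), a uniform retest offset lowers the ICC even when patients keep their rank order.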

Measurement error

Measurement error is the systematic and random error in a patient's score that is not attributable to true change in the construct [51]. It was quantified with the standard error of measurement (SEM), calculated as the standard deviation of the test–retest differences divided by √2, and the smallest detectable change (SDC), calculated as 1.96 × √2 × SEM [50, 54].
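The two formulas translate directly into code. In this sketch the test and retest inputs are hypothetical paired scores; the `sdc` helper reproduces the relation between the SEM and SDC values reported for the MHQ in this study (SEM = 1.8 gives SDC ≈ 4.99).

```python
import math
from statistics import stdev
from typing import Sequence


def sem_from_test_retest(test: Sequence[float], retest: Sequence[float]) -> float:
    """SEM = SD of the test-retest differences / sqrt(2)."""
    differences = [b - a for a, b in zip(test, retest)]
    return stdev(differences) / math.sqrt(2)


def sdc(sem: float) -> float:
    """Smallest detectable change at the individual level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


if __name__ == "__main__":
    print(round(sdc(1.8), 2))  # 4.99, the SDC reported for the total MHQ score
```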

Structural validity

Structural validity is the degree to which the scores of an instrument adequately reflect the dimensionality of the construct [51]. First, an exploratory factor analysis (EFA) was performed using principal component analysis with Varimax orthogonal rotation. A confirmatory factor analysis (CFA) was then performed to check whether the factor structure showed adequate goodness of fit, using the following indices: Tucker-Lewis Index (TLI) (0.95–1); Comparative Fit Index (CFI) (0.95–1); Standardized Root Mean Square Residual (SRMR) < 0.08; Root Mean Square Error of Approximation (RMSEA) < 0.06; Akaike Information Criterion (AIC); Expected Cross-Validation Index (ECVI); chi-square (χ²); and chi-square divided by degrees of freedom (χ²/df, 1.5–3) [55].
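Several of the listed fit indices are simple functions of the model chi-square, its degrees of freedom, the sample size and, for CFI and TLI, the chi-square of a baseline (independence) model. The sketch below uses common textbook definitions; software such as AMOS may differ in details (for example, N versus N − 1 in RMSEA), and all inputs in the usage are hypothetical, not values from this study.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1 - d_model / d_base


def tli(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis Index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2_model / df_model) / (ratio_base - 1)


if __name__ == "__main__":
    # Hypothetical one-factor CFA output for a single domain (n = 262).
    chi2_m, df_m, chi2_b, df_b = 12.4, 5, 640.0, 10
    print("chi2/df:", round(chi2_m / df_m, 2))
    print("RMSEA:", round(rmsea(chi2_m, df_m, 262), 3))
    print("CFI:", round(cfi(chi2_m, df_m, chi2_b, df_b), 3))
    print("TLI:", round(tli(chi2_m, df_m, chi2_b, df_b), 3))
```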

Construct validity

Construct validity is the degree to which an instrument's scores relate to other measures in line with theoretical hypotheses about the constructs being measured [52, 56]. For hypothesis testing, DASH and DASH-work [12, 36] were chosen to assess convergent validity (measures of similar constructs [50]), and EQ-5D [37] and grip strength to assess discriminant validity (measures of different constructs [50]), using Pearson's correlation coefficient. In accordance with the COSMIN recommendations [48,49,50], three hypotheses were formulated for convergent validity: (1) MHQ and DASH correlate highly and negatively; (2) MHQ-work correlates at least moderately and negatively with DASH-work; and (3) MHQ function correlates at least moderately with MHQ-ADL. For discriminant validity, two hypotheses were formulated: (1) the MHQ correlates weakly with grip strength, and (2) the MHQ correlates weakly and negatively with the EQ-5D. The size of the correlation was interpreted with the following rule of thumb: low 0.30 < |r| < 0.50; medium 0.50 < |r| < 0.70; high 0.70 < |r| < 0.90 [57].
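A minimal sketch of the correlation step: Pearson's r computed from its definition, plus a helper that applies the rule of thumb above to |r|. The text gives open intervals, so assigning exact boundary values (e.g. 0.50) to the upper band is our assumption, as are the labels returned for |r| outside 0.30–0.90.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_size(r: float) -> str:
    """Classify |r| with the rule of thumb in the text (boundaries assumed)."""
    size = abs(r)
    if size >= 0.90:
        return "above high range (>= 0.90)"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "medium"
    if size >= 0.30:
        return "low"
    return "below low range (< 0.30)"


if __name__ == "__main__":
    for label, r in [("MHQ vs DASH", -0.75), ("MHQ vs DASH-work", -0.63),
                     ("MHQ vs grip strength", 0.05)]:
        print(label, r, correlation_size(r))
```

Applied to the coefficients reported in the abstract, −0.75 falls in the "high" band and −0.63 in the "medium" band.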

Responsiveness

Responsiveness is the ability of a PROM to detect clinically important change over time in the construct measured [52, 56]. It was analysed 5 weeks after treatment by quantifying the change between baseline and post-treatment scores with: a descriptive approach using box plots for subgroups; Student's t-test for mean differences; the effect size (ES), calculated as the mean change divided by the SD of the baseline scores; and the standardised response mean (SRM), calculated as the mean change divided by the SD of the change scores. ES and SRM values of 0.20, 0.50 and 0.80 indicate low, moderate and high responsiveness, respectively [58, 59]. The minimum clinically important difference (MCID) was then calculated to indicate the effectiveness of physiotherapy in the three subgroups, using an anchor-based method in which patients were observed at one time point and grouped into categories according to an external satisfaction criterion [60]. In accordance with the COSMIN recommendations [51], the hypotheses were: (1) improvement in patients with nerve injury would be smaller than in patients with radius fracture; and (2) the ES of MHQ-work and DASH-work would be equivalent.
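The two responsiveness statistics and a simple anchor-based MCID can be sketched as follows. The ES and SRM follow the definitions above; for the MCID, the sketch takes the mean change among patients in one chosen anchor category, which is one common anchor-based variant. The text does not specify the category or statistic used, so the anchor labels and data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def change_scores(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Per-patient change: follow-up minus baseline."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """ES = mean change / SD of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float],
                               follow_up: Sequence[float]) -> float:
    """SRM = mean change / SD of change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def anchor_mcid(baseline: Sequence[float], follow_up: Sequence[float],
                anchor: Sequence[str], category: str) -> float:
    """Mean change among patients whose anchor response equals `category`.

    Hypothetical anchor-based variant; the chosen category is an assumption.
    """
    selected = [c for c, a in zip(change_scores(baseline, follow_up), anchor)
                if a == category]
    return mean(selected)


if __name__ == "__main__":
    base = [40, 50, 60]     # hypothetical baseline MHQ totals
    week5 = [60, 65, 85]    # hypothetical week-5 totals
    satisfaction = ["satisfied", "not satisfied", "satisfied"]
    print(effect_size(base, week5), standardized_response_mean(base, week5))
    print(anchor_mcid(base, week5, satisfaction, "satisfied"))
```

ES and SRM share a numerator and differ only in the denominator, so SRM exceeds ES whenever change scores vary less across patients than baseline scores do.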

Finally, interpretability [51] was assessed with the area under the receiver operating characteristic (ROC) curve (AUC), used to discriminate between different levels of function. AUC values range from 0.5 to 1, with higher values indicating better discriminatory ability [61].
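The AUC equals the probability that a randomly chosen patient from the higher-function group scores above a randomly chosen patient from the lower-function group (the Mann–Whitney interpretation), with ties counted as one half. A minimal sketch with hypothetical group scores; it is not the SPSS ROC procedure used in the study.

```python
from typing import Sequence


def auc(higher_group: Sequence[float], lower_group: Sequence[float]) -> float:
    """Empirical ROC AUC via pairwise comparisons (ties count 0.5)."""
    wins = 0.0
    for p in higher_group:
        for q in lower_group:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(higher_group) * len(lower_group))


if __name__ == "__main__":
    # Hypothetical MHQ totals for patients rated as having good vs poor function.
    good = [78, 85, 69, 90, 74]
    poor = [52, 61, 70, 45, 58]
    print(round(auc(good, poor), 3))
```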

Statistical analysis. Quantitative variables were described with means and standard deviations (SD), and categorical variables with frequencies and percentages. IBM SPSS Statistics 25.0 was used to calculate psychometric properties, the ROC curve and box plots (Figs. 1, 2), and SPSS Amos 20 was used for factor analysis. The sample size was calculated according to the recommendations of Terwee et al. [52] and de Vet et al. [34]: 7 patients per item and a sample larger than 100 patients.

Results

Of the 286 patients invited to participate, 262 with various musculoskeletal conditions of the hand or wrist completed all questionnaire items (Table 1). The remaining 24 were excluded because they did not return the questionnaire after it was administered, returned it unanswered, or had their treatment changed after administration.

Table 1 Demographic data of patients (n = 262)


Of the 262 patients, 145 (55.34%) received surgery and physiotherapy, 105 (40.07%) received conservative treatment with physiotherapy, and 12 (4.59%) received other conservative treatments such as immobilization or medication.

Translation and cross-cultural adaptation

Translation and back-translation were carried out, and a report was prepared for each step. The changes were minor, agreed by consensus, and concerned the following items.

In the first domain (Function), rather than the literal translation of the response options "fair, poor or very poor" as "fair, scarce or very scarce", the adverbs "regular, bad or very bad" were chosen as more appropriate to the question.

In the fourth domain (Pain), "interfere" in the third question was replaced by "caused alterations", and in the fifth question the adjective "unhappy" was replaced by "negatively affected his mood".

The pre-final version was used in the pilot study with a sample of 33 patients, in which content validity was assessed, with an inter-expert agreement of Kendall's W = 0.8 (p < 0.001). Item difficulty was below 15%, and the average time taken to complete the MHQ was 12 min. The final version and all reports were sent to the Michigan Center for Hand Outcomes and Innovation Research for the authors' approval.

Validation of psychometric properties

Internal consistency, test–retest reliability and measurement error

Internal consistency was calculated on the baseline scores of the 262 patients, with adequate Cronbach's alpha values ranging from 0.821 to 0.858, which did not improve when any domain was removed. Test–retest reliability was assessed in a subsample of 64 patients who completed the MHQ a second time, yielding good reliability with ICC values from 0.74 to 0.91. In the measurement error analysis, the SEM was 1.8 and the SDC 4.99 for the total MHQ score, indicating stable individual scores and a relatively small measurement error (Table 2).

Table 2 Internal consistency, test–retest reliability, measurement error and floor/ceiling effects of MHQ
