A meta-analysis of measurement properties of the Western Ontario Meniscal Evaluation Tool (WOMET)
Journal of Orthopaedic Surgery and Research volume 15, Article number: 569 (2020)
We provide a meta-analysis for clinicians and researchers regarding the psychometric properties of the WOMET as a patient-reported outcome measure (PROM) for patients with meniscal pathologies.
A comprehensive literature search following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement identified six eligible papers evaluating WOMET measurement properties in patients with different meniscal injuries and meniscal treatments. The quality of the included studies was evaluated using the four-point Consensus-based Standard for the selection of health Measurement Instruments (COSMIN) Checklist for good measurement properties. The checklist was specifically developed for studies on health-related PROMs.
Our meta-analysis suggests that the WOMET can be used to evaluate patients with different meniscal injuries and meniscal treatments, especially acute or chronic meniscal injuries and traumatic or degenerative meniscal injuries treated operatively or conservatively. The WOMET shows satisfactory internal consistency, test-retest reliability, and construct validity. Due to limitations in both sample sizes and methodologies of the included studies, no conclusions can be drawn regarding the WOMET’s content validity, structural validity, cross-cultural validity, measurement error, or responsiveness. A further limitation of the studies included in this meta-analysis is the lack of cross-cultural validation, even though such validation is recommended by the COSMIN Standards.
The first meta-analysis on measurement properties of the WOMET demonstrates satisfactory internal consistency, test-retest reliability, and construct validity. Further studies are needed, focusing on the methodological deficiencies highlighted in this meta-analysis. To ensure that the WOMET adequately reflects the symptoms, functions, and quality of life of patients with meniscal tears based on COSMIN criteria, it is necessary to assess the structural validity and content validity of this PROM.
In the diagnosis and treatment of knee pain, patient-reported outcome measures (PROMs) are often used to assess symptoms and effects of therapeutic interventions. PROMs focus on patients’ evaluation of their own health status. Since PROMs are a central part of medical research and clinical practice, it is crucial to assess their measurement quality.
Meniscal pathologies are common knee injuries that can be classified into traumatic or sports-related tears and degenerative knee lesions. The assessment of treatment outcomes of knee pathologies has traditionally focused on clinical examination, radiographic imaging, or range of motion. In recent years, PROMs have gained more importance in evaluating treatment effects, considering patients’ expectations and evaluations of interventions.
One such PROM to evaluate patients with meniscal tears is the Western Ontario Meniscal Evaluation Tool (WOMET). The WOMET was developed by Kirkley et al. in 2007 to assess health-related quality of life (HRQoL) in patients with meniscal tears. The WOMET consists of 16 items along three dimensions (section A: physical symptoms (9 items); section B: sports/recreation/work/lifestyle (4 items); section C: emotions (3 items)). All items are measured and weighted on visual analog scales (VAS); the maximum score is 100, which is converted into a percentage score. Although the WOMET has been extensively used, only a few studies have investigated its psychometric properties.
In the present article, we evaluate the quality of the WOMET, using the COSMIN (Consensus-based Standard for the selection of health Measurement Instruments) Checklist for good measurement properties. The COSMIN is a validated and well-accepted tool to rate the quality of PROMs. Specifically, the COSMIN guidelines have been developed to help researchers determine the clinimetric and psychometric soundness of health-related, patient-reported outcomes [3,4,5]. This is the first meta-analysis to assess the psychometric properties of the WOMET as a PROM for patients with meniscal pathologies. In doing so, we aimed to provide statistical evidence for the quality of the WOMET using pooled data from single studies that investigated the measurement qualities of the WOMET.
This meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. To evaluate measurement properties, we used the COSMIN Standard.
A structured search of the electronic databases MEDLINE via PubMed, EMBASE via OVID, and Cochrane Library was conducted in January 2019 with no language restrictions by two authors (NK and MB). In accordance with COSMIN recommendations, we applied the search filter described by Terwee et al. and a Google Scholar reference check for additional studies. The structured search strategies were designed using the following search terms in the categories: Constructs (HR-PRO or HRPRO or HRQL or HRQoL or QL or QoL or quality of life or health status), Target population (meniscal pathology or meniscal tears or Tibial Meniscus Injuries or Menisci, Tibial), Measurement Instrument (WOMET OR Western Ontario Meniscal Evaluation Tool AND meniscal tears OR meniscal injuries), and Measurement properties (sensitive COSMIN search filter for measurement properties in MEDLINE).
We included original research articles, systematic reviews, and validation studies that aimed to evaluate at least one measurement property (i.e., reliability, validity, or responsiveness) in accordance with the COSMIN taxonomy. Participants in the selected studies were suffering from any form of meniscal pathology. Thus, meniscal pathology and WOMET were the constructs of interest for the present meta-analysis (i.e., relevant records). Included articles assessed the measurement properties, development, or interpretability of the WOMET in a patient population consisting mostly of adults with meniscal pathologies. A majority is defined as equal to or greater than 50% of the sample.
We excluded studies that evaluated treatment efficacy without assessing measurement properties and studies in which the WOMET was used to validate another instrument (i.e., inappropriate study designs). We also excluded studies in which fewer than 50% of patients had a meniscal tear as the primary diagnosis (i.e., without other significant knee pathologies, for example, concomitant anterior cruciate ligament (ACL) rupture), unless the meniscal tear group was reported separately.
The results of the database searches were examined regarding their titles and abstracts. Then, the selected full-text articles were examined regarding the previously described eligibility criteria (Fig. 1).
We used spreadsheets to extract sample characteristics and measurement property results for each selected study.
Evaluation of studies and measurement properties
The quality of the included studies was evaluated by the authors using the four-point COSMIN Checklist for good measurement properties [5, 8]. The checklist was specifically developed for studies on health-related PROMs. It can be used to assess whether a study meets the standard for good methodological quality regarding the following properties: internal consistency, reliability, measurement error, content validity, construct validity, criterion validity, and responsiveness. The checklist is a validated tool comprising 10 sections, each assessing a separate measurement property. Scoring of single studies follows the “worst score counts” principle in order to account for poor methodological aspects in the meta-analysis. For example, if one item in a COSMIN box is rated as “inadequate” for a validation study, the overall methodological quality of that validation study is rated as “inadequate.” For studies for which quality ratings already existed on the COSMIN website, we adopted those pre-existing scores for our meta-analysis (i.e., the scores obtained in the systematic review by Abram et al.). We also rated the evidence from the studies regarding its quality and bias according to the COSMIN risk of bias checklist (see Appendix, Tables 3 and 4).
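The “worst score counts” principle can be illustrated with a minimal Python sketch. The four rating labels follow the wording of the COSMIN risk of bias checklist; the function name and the example ratings are hypothetical and are not taken from the included studies.

```python
# Four-point rating labels, ordered from worst to best.
# Assumption: labels as used in the COSMIN risk of bias checklist.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def worst_score_counts(item_ratings):
    """Return the overall rating of one measurement property in one study.

    The overall rating equals the lowest rating given to any single item.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # list.index raises ValueError for labels outside the four-point scale.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one validation study (illustration only).
example_ratings = ["very good", "adequate", "inadequate", "very good"]
print(worst_score_counts(example_ratings))
```

Under this rule a single “inadequate” item determines the overall rating, regardless of how well the remaining items were rated.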
We conducted the meta-analysis using the MAVIS (Meta-Analysis via Shiny) software (version 2.1; Hamilton, Aydin, and Mizumoto, 2014) to assess the general effect sizes of the measurement properties of the WOMET. We used a random effects model to analyze all six studies. The random effects model assumes that the studies in the meta-analysis come from populations with different average effect sizes; that is, population effect sizes can be explained as being sampled from a larger universe of studies. Random effects models therefore allow the findings to be generalized beyond the studies included. We used the random effects model because the studies were drawn from populations that differ from each other in ways that could impact the effects (e.g., patients with traumatic or degenerative meniscal tears, meniscal tears with or without osteoarthritis).
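To make the random effects approach concrete, the following Python sketch pools correlation coefficients after a Fisher z-transformation and estimates the between-study variance with the DerSimonian-Laird method. This is an illustration under stated assumptions: the text does not report which variance estimator the software applied, the function name is hypothetical, and the input values are invented for demonstration rather than extracted from the included studies.

```python
import math


def pool_correlations_random_effects(rs, ns):
    """Pool Pearson correlations with a DerSimonian-Laird random effects model."""
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two studies with matching r and n")
    # Fisher z-transformation; the sampling variance of z is 1 / (n - 3).
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    w = [1.0 / v for v in vs]
    # Fixed-effect mean and Cochran's Q, needed to estimate tau^2.
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at 0
    # Random effects weights add tau^2 to every study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lower, upper = z_re - 1.96 * se, z_re + 1.96 * se
    # Back-transform the pooled estimate and its 95% CI to the r metric.
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(lower), math.tanh(upper)),
        "tau2": tau2,
        "z": z_re / se,
    }


# Hypothetical correlations and sample sizes (illustration only).
result = pool_correlations_random_effects([0.85, 0.92, 0.78], [60, 120, 45])
print(result)
```

When the estimated between-study variance is zero, the random effects weights reduce to the fixed-effect weights and both models give the same pooled estimate.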
We calculated the effect sizes using Pearson’s correlation coefficient r. This coefficient is a standardized form of covariance between two variables and is able to measure the strength of the relationship between continuous variables. After calculating the effect sizes from each study, we tested the heterogeneity of the effect sizes for each measurement property.
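The heterogeneity statistics reported in the results (H and I2) can be derived from Cochran's Q. The sketch below assumes Fisher z-transformed correlations with inverse-variance weights and the usual definitions H = sqrt(Q / df) and I2 = (Q − df) / Q, both truncated at their lower bounds; the function names and the example inputs are hypothetical.

```python
import math


def heterogeneity(rs, ns):
    """Return Cochran's Q, H, and I2 (in percent) for a set of correlations."""
    if len(rs) != len(ns) or len(rs) < 2:
        raise ValueError("need at least two studies with matching r and n")
    zs = [math.atanh(r) for r in rs]   # Fisher z-transformation
    w = [n - 3 for n in ns]            # inverse of the variance 1 / (n - 3)
    z_bar = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    h = max(1.0, math.sqrt(q / df))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, h, i2


def i2_from_h(h):
    """Convert H to I2 (in percent) via I2 = (H^2 - 1) / H^2."""
    return (h ** 2 - 1.0) / h ** 2 * 100.0


# Hypothetical correlations and sample sizes (illustration only).
print(heterogeneity([0.55, 0.70, 0.62], [80, 95, 60]))
# Consistency check of H and I2 pairs reported in the results.
print(round(i2_from_h(1.62)), round(i2_from_h(2.10)))
```

The conversion reproduces the reported pairs: H = 1.62 corresponds to an I2 of about 62%, and H = 2.10 to about 77%.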
After the literature search, the evaluation of measurement properties, and the quality rating of the six included studies [2, 11,12,13,14,15], we tested the heterogeneity of the effect sizes of the included studies in order to examine the overall effect (see Table 1 for an overview of the psychometric properties reported in the single studies and Table 2 for an overview of the methodological qualities of the single studies).
The test for heterogeneity revealed that the effect sizes for the measurement property internal consistency did not significantly differ between the five studies included, p = 0.677, indicating homogeneous effects of the five studies (H = 1.00, 95% CI [1.00, 1.67], I2 = 0%). We excluded the study by van der Wal et al. because it did not report the overall internal consistency of the WOMET, but instead reported the internal consistency of its subscales. The estimated overall correlation coefficient was r = 0.91, 95% CI [0.90, 0.92], z = 42.20, p < 0.001 (see Fig. 2 in Appendix). The test revealed no indication for publication bias, with 3005 non-significant studies necessary to make the result non-significant.
The test for heterogeneity revealed that the effect sizes for the measurement property test-retest reliability did significantly differ between the five studies included, p < 0.001, indicating heterogeneous effects of the five studies (H = 2.57, 95% CI [1.72, 3.83], I2 = 84%). The estimated overall correlation coefficient was r = 0.88, 95% CI [0.80, 0.93], z = 9.90, p = 0.001 (see Fig. 3 in Appendix). The test revealed no indication for publication bias, with 1231 non-significant studies necessary to make the result non-significant.
The included studies evaluated construct validity using different scales. The studies by Celik and Tong tested the WOMET against different subscales of the short-form health survey 36 (SF-36). Kirkley, Celik, and Sihvonen tested the WOMET against the Lysholm Scale, and Tong and van der Wal used the International Knee Documentation Committee Subjective Knee Form (IKDC) to assess the WOMET’s construct validity.
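The number of non-significant studies needed to render a pooled result non-significant is commonly computed as a fail-safe N. The sketch below assumes Rosenthal's approach with a one-tailed alpha of 0.05 and rounding up to the next whole study; the text does not state which fail-safe method was applied, and the function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def rosenthal_fail_safe_n(rs, ns, alpha=0.05):
    """Rosenthal's fail-safe N for a set of correlations (one-tailed alpha)."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("need matching, non-empty lists of r and n")
    # Per-study z score: Fisher z divided by its standard error 1 / sqrt(n - 3).
    z_scores = [math.atanh(r) * math.sqrt(n - 3) for r, n in zip(rs, ns)]
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    # N = (sum of z)^2 / z_alpha^2 - k, where k is the number of studies.
    n_fs = sum(z_scores) ** 2 / z_alpha ** 2 - len(rs)
    return max(0, math.ceil(n_fs))


# Hypothetical correlations and sample sizes (illustration only).
print(rosenthal_fail_safe_n([0.85, 0.92, 0.78], [60, 120, 45]))
```

A large fail-safe N relative to the number of included studies indicates that many unpublished null results would be required to overturn the pooled effect.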
The following subscales from the SF-36 were investigated: physical functioning, bodily pain, role-emotional, and mental health.
For the subscales physical functioning, bodily pain, and mental health, the test for heterogeneity revealed that the effect sizes for construct validity did not significantly differ between the two studies, ps > 0.103, indicating homogeneous effects of the two studies (physical functioning: H = 1.62, I2 = 62%; bodily pain: H = 1.00, I2 = 0%; mental health: H = 1.00, I2 = 0%). The estimated overall correlation coefficients were moderate to high (physical functioning: r = 0.62, 95% CI [0.46, 0.74], z = 6.42, p = 0.001; bodily pain: r = 0.64, 95% CI [0.55, 0.71], z = 10.91, p = 0.001; mental health: r = 0.33, 95% CI [0.20, 0.44], z = 4.95, p = 0.001; see Figs. 4, 5, and 6 in Appendix). The tests revealed no indication for publication bias, with a moderate to high number of non-significant studies necessary to make the result non-significant (physical functioning n = 80; bodily pain n = 86; mental health n = 16).
For the subscale role-emotional, the test for heterogeneity revealed that the effect sizes for construct validity did significantly differ between the two studies, p = 0.036, indicating heterogeneous effects of the two studies (H = 2.10, I2 = 77%). The estimated overall correlation coefficient was r = 0.18, 95% CI [−0.11, 0.43], z = 1.23, p = 0.217 (see Fig. 7 in Appendix). There was no indication for publication bias (n = 4 studies necessary for non-significant results).
For the Lysholm Score, the test for heterogeneity revealed that the effect sizes for construct validity did not significantly differ between the three studies, p = 0.343, indicating homogeneous effects of the three studies (H = 1.03, 95% CI [1.00, 3.04], I2 = 6.3%). The estimated overall correlation coefficient was r = 0.56, 95% CI [0.46, 0.64], z = 9.60, p = 0.001 (see Fig. 8 in Appendix). The test revealed no indication for publication bias, with 108 non-significant studies necessary to make the result non-significant.
For the IKDC, the test for heterogeneity revealed that the effect sizes for construct validity did not significantly differ between the two studies, p = 0.300, indicating homogeneous effects of the two studies (H = 1.03, I2 = 6.6%). The estimated overall correlation coefficient was r = 0.72, 95% CI [0.65, 0.78], z = 12.44, p = 0.001 (see Fig. 9 in Appendix). Further, there was no indication for publication bias (n = 122 studies necessary for non-significant results).
Overall, pooled data from multiple studies indicate that the WOMET has high internal consistency (pooled r = 0.91, 95% CI [0.90, 0.92]). Further, when administered multiple times, scores of the WOMET were reliable across studies (pooled r = 0.88, 95% CI [0.80, 0.93]). Lastly, the WOMET showed high construct validity when compared with other scales such as the SF-36, the Lysholm Score, or the IKDC.
By means of this meta-analysis, we combined previous results within a new statistical framework. The results suggest that the WOMET can be used as a patient-reported outcome measure for the evaluation of patients with different meniscal injuries. Our paper is the first to present statistical evidence on the psychometric properties of the WOMET. The WOMET showed good psychometric properties across the studies included, and the meta-analytic results suggest that effect sizes are not due to publication or selection bias. The WOMET shows satisfactory internal consistency, test-retest reliability, and construct validity. Due to limitations in both sample sizes and methodologies of the included studies, no conclusions can be drawn regarding the WOMET’s content validity, structural validity, cross-cultural validity, measurement error, or responsiveness. A further limitation of the studies included in this meta-analysis is the lack of cross-cultural validation, even though such validation is recommended by the COSMIN Standards. A PROM is seen as cross-culturally valid if, for example, a Dutch and an English version are comparable in that they obtain similar results in a comparable population. Future studies should evaluate existing and new language versions of the WOMET regarding their cross-cultural validity.
Abram et al. published a systematic review to evaluate PROMs for patients with meniscal pathologies. They found that the Lysholm Score, the IKDC, and the KOOS (Knee Injury and Osteoarthritis Outcome Score) showed limited ability to assess symptoms and functional status in patients with meniscal tears. In contrast, they found that the WOMET had high content validity. In accordance with the results obtained by Abram et al., our meta-analysis provides the first statistical evidence to recommend the WOMET as an instrument to evaluate treatment effects in patients with different meniscal injuries and meniscal treatments, especially acute or chronic meniscal injuries and traumatic or degenerative meniscal injuries treated operatively or conservatively. Whereas Abram et al. qualitatively summarized results from previous studies on measurement properties of the WOMET, we performed quantitative analyses on statistically pooled data from multiple studies in order to draw conclusions about overall effects.
Compared to other instruments, such as the IKDC or the Lysholm Score, the WOMET shows superior content validity, because it is the only measurement instrument that was developed based on patient ratings of its comprehensiveness, feasibility, and comprehensibility. In contrast, the Lysholm Score was developed as a disease-specific measurement for patients suffering from knee ligament injuries, whereas the IKDC score was developed as a global knee-specific measurement. One study suggests that measurement error may limit the ability of the WOMET to detect the minimal important change (MIC) in scores of patients with meniscal pathologies. Generally, strong methodological evaluations of the structural validity of many PROMs are still lacking. When conducting a systematic review or a meta-analysis, the lack of studies that report methodological details is an essential problem, because it limits the possibility to assess the methodological quality of the studies. In the areas of randomized controlled trials or diagnostic research, there exist guidelines for primary studies, such as the Consolidated Standards of Reporting Trials (CONSORT) statement or the Standards for Reporting of Diagnostic accuracy studies (STARD) statement. In research on measurement properties, however, there are wide variations in the names given to specific measurement properties, and different definitions are used for the same property. For example, the measurement property reliability has also been referred to as reproducibility or stability. We therefore use and recommend the COSMIN checklist as an adequate guideline on evaluating measurement properties.
For most of the studies included in this meta-analysis, the COSMIN methodology rating was poor for the reported measurement properties. Internal consistency was even rated poor in all studies. A key reason for this is the failure of most studies to analyze the factor structure of the WOMET. Without the assessment of the factor structure, there is no possibility of a clear interpretation of internal consistency (see also Tables 3 and 4 in the Appendix). The same holds for the interpretation of change scores: future studies should assess change scores of the WOMET in individual patients. Further, potential publication bias should be taken into account when considering the results of the present meta-analysis.
This is the first meta-analysis on measurement properties of the WOMET. We found satisfactory internal consistency, test-retest reliability, and construct validity.
Due to the lack of methodological quality as recommended by the COSMIN standards, we are unable to report on the structural validity or content validity of the WOMET. To ensure that the WOMET adequately reflects the symptoms, functions, and quality of life of patients with meniscal tears based on COSMIN criteria, it is necessary to assess the structural validity and content validity of this PROM in further studies.
Availability of data and materials
Since no original data were collected for this research, we did not store the data in a repository. Any materials and data of the present meta-analysis are available on request from the first author.
PROMs: Patient-reported outcome measures
WOMET: Western Ontario Meniscal Evaluation Tool
HRQoL: Health-related quality of life
COSMIN: Consensus-based Standard for the selection of health Measurement Instruments
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
SF-36: Short-form health survey 36
IKDC: International Knee Documentation Committee Subjective Knee Form
KOOS: Knee Injury and Osteoarthritis Outcome Score
CONSORT: Consolidated Standards of Reporting Trials
STARD: Standards for Reporting of Diagnostic accuracy studies
1. Rodkey WG, Stone KR, Steadman JR. Replacement of the irreparably injured meniscus. Sports Med Arthrosc. 1993;1:168–76.
2. Kirkley A, Griffin S, Whelan D. The development and validation of a quality of life-measurement tool for patients with meniscal pathology: the Western Ontario Meniscal Evaluation Tool (WOMET). Clin J Sport Med. 2007;17:349–56.
3. Mokkink LB, de Vet HCW, Prinsen CAC, Patrick DL, Alonso J, Bouter LM, Terwee CB. COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27:1171–9.
4. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC. The COSMIN study reached international consensus on taxonomy, terminology and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737–45.
5. Terwee CB, Mokkink LB, Knol DL, Ostelo RW, Bouter LM, de Vet HC. Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist. Qual Life Res. 2012;21:651–7.
6. Abram SGF, Middleton R, Beard DJ, Price AJ, Hopewell S. Patient-reported outcome measures for patients with meniscal tears: a systematic review of measurement properties and evaluation with the COSMIN checklist. BMJ Open. 2017;7:e017247. https://doi.org/10.1136/bmjopen-2017-017247.
7. Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. PLoS Med. 2009;6. https://doi.org/10.1371/journal.pmed.1000100.
8. Terwee CB, Jansma EP, Riphagen II, de Vet HCW. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18:1115–23.
9. Hamilton W, Aydin B, Mizumoto A. MAVIS: R package for running a meta-analysis through an interactive web interface with Shiny. Retrieved from http://kylehamilton.net/shiny/MAVIS/ [Accessed 8 Jan 2019].
10. Kovalchik S. Tutorial on meta-analysis in R. 2013. URL: http://edii.uclm.es/~useR-2013/Tutorials/kovalchik/kovalchik_meta_tutorial.pdf [Accessed 8 Jan 2019].
11. Sihvonen R, Järvelä T, Aho H, Järvinen TLN. Validation of the Western Ontario Meniscal Evaluation Tool (WOMET) for patients with a degenerative meniscal tear. J Bone Joint Surg Am. 2012;e65:1–8.
12. Celik D, Demirel M, Kus G, Erdil M, Özdincler AR. Translation, cross-cultural adaption, reliability and validity of the Turkish version of the Western Ontario Meniscal Evaluation Tool (WOMET). Knee Surg Sports Traumatol Arthrosc. 2015;23:816–25.
13. Tong WW, Wang W, Cu WD. Development of a Chinese version of the Western Ontario Meniscal Evaluation Tool: cross-cultural adaption and psychometric evaluation. J Orthop Surg Res. 2016;11. https://doi.org/10.1186/s13018-016-0424-8.
14. van der Wal RJP, Heemskerk BTJ, van Arkel ERA, Mokkink LB, Thomassen BJW. Translation and validation of the Dutch Western Ontario Meniscal Evaluation Tool. J Knee Surg. 2017;30:314–22.
15. Sgroi M, Däxle M, Kocal S, Reichel H, Kappe T. Translation, validation and cross-cultural adaption of the Western Ontario Meniscal Evaluation Tool (WOMET) into German. Knee Surg Sports Traumatol Arthrosc. 2017. https://doi.org/10.1007/s00167-017-4535-5.
16. Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–83.
17. Lysholm J, Gillquist J. Evaluation of knee ligament surgery results with special emphasis on use of a scoring scale. Am J Sports Med. 1982;10:150–4.
18. Hefti F, Müller W, Jakob RP, Stäubli HU. Evaluation of knee ligament injuries with the IKDC form. Knee Surg Sports Traumatol Arthrosc. 1993;1:226–34.
19. Roos EM, Roos HP, Lohmander LS, Ekdahl C, Beynnon BD. Knee Injury and Osteoarthritis Outcome Score (KOOS): development of a self-administered outcome measure. J Orthop Sports Phys Ther. 1998;28:88–96.
20. Schulz KF, Altman DG, Moher D, the CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Ann Intern Med. 2010;152:726–32.
21. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD Initiative. Ann Intern Med. 2003;138:40–4.
22. Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res. 2003;12:349–62.
This research did not receive any grant from funding agencies in the public, commercial, or non-profit sectors.
Ethics approval and consent to participate
Since this research is a meta-analysis of existing studies, we did not obtain ethical approval or consent to participate (not applicable).
Consent for publication
Since this research did not involve any individual person’s data, we did not obtain consent for publication (not applicable).
All authors declare that they have no competing interests.
Cite this article
Krott, N.L., Betsch, M. & Wild, M. A meta-analysis of measurement properties of the Western Ontario Meniscal Evaluation Tool (WOMET). J Orthop Surg Res 15, 569 (2020). https://doi.org/10.1186/s13018-020-02103-9