The relationship between thoracic kyphosis and age, and normative values across age groups: a systematic review of healthy adults

Background
Thoracic kyphosis is reported to increase with ageing. However, this relationship has not been systematically investigated. People's kyphosis often exceeds 40°, yet 40° is the widely accepted cut-off for normality; consequently, patients may be misclassified. Because accurate restoration of kyphosis is important to avoid complications following spinal surgery, specific reference values are needed. The objectives of this review were to explore the relationship between thoracic kyphosis and age, to provide normative values of kyphosis for different age groups, and to investigate the influence of gender and ethnicity.

Methods
Two reviewers independently searched seven databases and the Spine Journal from inception to April 2020. Quantitative observational studies of healthy adults (aged 18 years or older) with no known pathologies that measured kyphosis with Cobb's method, a flexicurve or a kyphometer were included. Study selection, data extraction and study quality assessment (AQUA tool) were performed independently by two reviewers, and study authors were contacted when clarification was necessary. Correlation analysis and inferential statistics were performed in Microsoft Excel, and the results are presented narratively. A modified GRADE approach was used to assess the quality of the evidence.

Results
Thirty-four studies (24 moderate-quality, 10 high-quality) were included (n = 7633). A moderate positive correlation between kyphosis and age was found (Spearman's ρ = 0.52, p < 0.05, T5-T12). Kyphosis exceeded 40° in 65% of cases, and it was significantly smaller in individuals younger than 40 years (x < 40) than in those older than 60 years (x > 60) in 75% of the analyses (p < 0.05). No differences between genders were found, although greater kyphosis angles were observed in North Americans and Europeans.

Conclusion
Kyphosis increases with ageing and differs significantly between people younger than 40 (x < 40) and those older than 60 (x > 60). Kyphosis also appears to be influenced by ethnicity but not by gender. Thoracic sagittal curvature frequently exceeds 40°.

Trial registration
The review protocol was devised following the PRISMA-P guidelines and was registered on PROSPERO (CRD42020175058) before study commencement.

Supplementary Information
The online version contains supplementary material available at 10.1186/s13018-021-02592-2.

The objectives of this review were to:
1. Explore the relationship between kyphosis and age
2. Provide reference values of kyphosis for different age groups
3. Examine the data for differences between genders or ethnic groups

Protocol and registration
The review's protocol followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for protocols (PRISMA-P) [11] and was registered on PROSPERO (CRD42020175058). The methods were informed by the Cochrane Handbook [12]. The manuscript adhered to the PRISMA [13] and the Synthesis Without Meta-analysis (SWiM) guidelines [14] for reporting.

Eligibility criteria
The research question was informed by the Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER) tool [15]; details are given in Table 1.

Information sources
Two reviewers (MZ/SL) independently searched for eligible articles in MEDLINE, EMBASE and PsycINFO through Ovid, and in AMED, the Index of Chiropractic Literature and CINAHL through EBSCO, from inception to April 2020. The Spine Journal, the reference lists of the included studies, and grey literature in SIGLE (through Open Grey) were also searched. The search was limited to studies published in English.

Search
Keyword selection was informed by a scoping review and researcher expertise (NRH). Following consultation with a librarian, the search strategy was individualised for each database, combining keywords, Medical Subject Headings and Boolean operators. The keywords selected were middle back, dorsal spine, middle spine, mid-back, thoracic spine, kyphosis, hyperkyphosis, Dowager's hump, hunchback, rounded back, and sagittal curvature (see Additional file 1 for search strategy examples).

Study selection
The screening process was conducted independently by MZ and SL, who then sought agreement. In case of disagreement, a third reviewer (NRH) acted as moderator.
Studies were screened first by title and abstract, then by full text [8].

Data collection
The data collection process was informed by the Cochrane Handbook [16]. The data extraction form was piloted, with data extraction performed independently by MZ and SL and then cross-checked. If further information was needed to reach consensus within the research team, MZ contacted the study authors.

Data items
Data extraction was informed by the recommendations for reviews in clinical anatomy [8]. Extracted items included study title, author's name, publication year, method for measuring kyphosis, degrees of kyphosis and range, sample size, age, age range, gender, body mass index, the standard deviation (SD) of the measures, and ethnicity, defined as a group of people sharing cultural, geographical and social attributes.

Risk of bias in individual studies
The studies' quality assessment was performed independently by MZ and SL, with NRH acting as moderator in case of disagreement. The Anatomical Quality Assessment (AQUA) tool, devised for assessing the quality of anatomical studies [17], was used. As suggested by Chhapola et al. [18], a supplementary table was created to improve the tool's performance (see Additional file 2). The AQUA tool comprises five domains (i.e. objective(s) and subject characteristics, study design, methodology characterisation, descriptive anatomy, and reporting of results), each with a specific set of questions answered yes, no or unclear, which allows readers to evaluate the study's quality.
Currently, guidance exists only on how to evaluate each individual domain of the AQUA tool. To be considered at low risk of bias in a domain, a study must receive yes answers to all the questions in that domain; otherwise, it is considered at high risk [17]. Each domain was evaluated following this procedure. However, since no guidance exists on how to classify the overall quality of a study, the research team agreed that a study would be considered high-quality overall only if it was at low risk of bias in all five domains; studies at low risk in three or four domains were considered moderate-quality, and all others low-quality. Before study commencement, MZ and SL piloted the tool on five articles, and interrater agreement was computed according to McHugh [19]. Perfect agreement was achieved (κ = 1).
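The overall-quality rule agreed by the research team, together with the Cohen's kappa statistic used to quantify interrater agreement, can be sketched in Python as follows. This is an illustrative sketch, not the authors' tooling (the analysis was carried out in Microsoft Excel); the domain keys, function names and pilot labels are hypothetical, and any answer other than yes (including unclear) is assumed to place a domain at high risk, as the domain rule above implies.

```python
from collections import Counter

# The five AQUA domains listed above (the keys are hypothetical labels).
AQUA_DOMAINS = (
    "objectives_and_subjects",
    "study_design",
    "methodology_characterisation",
    "descriptive_anatomy",
    "reporting_of_results",
)


def domain_at_low_risk(answers):
    """Low risk of bias only if every question in the domain is answered 'yes'."""
    return all(answer == "yes" for answer in answers)


def overall_quality(study):
    """Review-specific rule: 5 low-risk domains -> high, 3-4 -> moderate, else low.

    `study` maps each domain key to its list of 'yes'/'no'/'unclear' answers.
    """
    low_risk_domains = sum(domain_at_low_risk(study[d]) for d in AQUA_DOMAINS)
    if low_risk_domains == 5:
        return "high"
    if low_risk_domains >= 3:
        return "moderate"
    return "low"


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        # Both raters used one identical category for every item; kappa is
        # undefined here, so 1.0 is returned by convention for perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical pilot ratings of five articles by two reviewers.
    reviewer_1 = ["high", "moderate", "moderate", "low", "high"]
    reviewer_2 = ["high", "moderate", "moderate", "low", "high"]
    print(cohens_kappa(reviewer_1, reviewer_2))  # 1.0
```

Identical ratings give κ = 1 regardless of how the categories are distributed, which is what "perfect agreement" denotes.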

Summary measures
Data were analysed with Microsoft Excel (Microsoft Office 365). Since kyphosis varies depending on the body references used to calculate it [6,20], analyses compared only measurements taken between the same body references. The mean kyphosis and mean age of each study were used for correlation analysis. Pearson's or Spearman's correlation coefficient was computed, depending on whether the data were normally distributed. Data distribution was investigated with the Kolmogorov-Smirnov test, and correlation strength was interpreted as recommended [21].
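Spearman's coefficient is Pearson's correlation computed on ranks, with tied values given the average of their ranks. The plain-Python sketch below illustrates this; it is not the authors' Excel workflow, and it omits the Kolmogorov-Smirnov normality check and p-values. The strength bands in `interpret` are an assumption (a common rule of thumb: 0.3-0.5 low, 0.5-0.7 moderate, 0.7-0.9 high, 0.9 and above very high), chosen because it matches the labels reported for 0.52 and 0.45; reference [21] may define the bands differently.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def interpret(r):
    """Label correlation strength (assumed rule-of-thumb bands, see lead-in)."""
    r = abs(r)
    if r >= 0.9:
        return "very high"
    if r >= 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "negligible"
```

Here `x` would be the study-level mean ages and `y` the corresponding mean kyphosis values. Under the assumed bands, `interpret(0.52)` returns "moderate" and `interpret(0.45)` returns "low", matching the labels reported for T5-T12 and T4-T12.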
The means and their precision estimates were used to calculate the reference (normative) values, or ranges, of kyphosis for each age group. Because SDs represent the dispersion of the values around their means, whereas confidence intervals are used to assess a treatment's efficacy [22], SDs were deemed more appropriate for establishing ranges. The mean kyphosis was used for group comparisons. The age groups for analysis were defined from previous evidence on the relationship between kyphosis and age [2,4,6]: people younger than 40 years (x < 40), people between 40 and 60 (40 < x < 60), people older than 60 (x > 60), people younger than 50 (x < 50), and people older than 50 (x > 50). Inferential statistics were performed using the independent two-tailed t-test for two-group comparisons (x < 50 vs x > 50) and one-way ANOVA for the multiple-group comparison (x < 40, 40 < x < 60, x > 60). Gender and ethnic group differences were investigated within each age group using the independent two-tailed t-test. Levene's test was used to assess the equality of variances between groups. The alpha level was set at 0.05, and the Bonferroni correction was applied to post hoc comparisons after ANOVA to reduce the chance of type I error [23,24].
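The test statistics named above can be sketched with Python's standard library. Each group is assumed to be a list of study-level mean kyphosis values, since mean kyphosis was used for group comparisons; the function names are hypothetical. The sketch computes only the statistics and their degrees of freedom: p-values would come from the t and F distributions, for example via `scipy.stats.ttest_ind`, `scipy.stats.f_oneway` and `scipy.stats.levene` (note that SciPy's `levene` centres on the median by default unless `center='mean'` is passed).

```python
from statistics import mean, variance


def student_t(a, b):
    """Independent two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), (df_between, df_within)


def levene_w(*groups):
    """Levene's W: a one-way ANOVA on absolute deviations from each group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_anova_f(*deviations)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons
```

With three age groups, the post hoc step after ANOVA involves three pairwise comparisons, so `bonferroni_alpha(0.05, 3)` gives a per-comparison threshold of about 0.0167.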

Synthesis of results and risk of bias across studies
Because important clinical and methodological heterogeneity was observed during the scoping review, meta-analysis was not performed [25]. Data were synthesised narratively, and descriptive statistics are presented [26]. The overall level of evidence was evaluated using a modified Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system [27]. Although the evidence was limited to observational studies, the overall quality was upgraded from low to moderate if the results were consistent (> 80% concordant results) [28], precise, and obtained predominantly from high-quality studies. For the correlation analysis, consistency was assessed from the direction of the correlation (positive or negative); for the reference values and for the gender and ethnic group comparisons, it was assessed from the statistical significance of the difference between group means. To be considered precise, a correlation had to be statistically significant; for the normative values and for the gender and ethnic group comparisons, the ranges of groups with significantly different means were required not to overlap, and the difference between them had to exceed the standard error of measurement (SEM) of the modality used to measure kyphosis. These values were 2.4° for the kyphometer [29], 0.4 cm for the flexicurve [30], and 3° for Cobb's method [7]. If the results were inconsistent, imprecise and derived primarily from low-quality studies, their quality was downgraded to very low.
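The precision and consistency criteria of this modified GRADE assessment can be expressed as small predicates. This is a hypothetical sketch: ranges are assumed to be mean ± 1 SD, "consistent" is read as more than 80% of results sharing the most common direction or outcome, and combinations of criteria that the text does not mention are assumed to leave the rating at low. The SEM values are those quoted in the paragraph above; the flexicurve SEM is in centimetres, so it should only be compared with flexicurve measurements.

```python
from collections import Counter
from dataclasses import dataclass

# Standard errors of measurement quoted in the text (flexicurve value is in cm).
SEM = {"kyphometer_deg": 2.4, "flexicurve_cm": 0.4, "cobb_deg": 3.0}


@dataclass
class GroupRange:
    """A normative range summarised as mean ± 1 SD (an assumption)."""
    mean: float
    sd: float

    @property
    def low(self):
        return self.mean - self.sd

    @property
    def high(self):
        return self.mean + self.sd


def ranges_precise(a, b, sem):
    """Ranges must not overlap, and the gap between them must exceed the SEM."""
    lower, upper = sorted((a, b), key=lambda g: g.mean)
    gap = upper.low - lower.high  # negative when the ranges overlap
    return gap > 0 and gap > sem


def consistent(outcomes):
    """More than 80% of results agree with the most common outcome or direction."""
    top_count = Counter(outcomes).most_common(1)[0][1]
    return top_count / len(outcomes) > 0.8


def grade_level(is_consistent, is_precise, mostly_high_quality, mostly_low_quality):
    """Start at low (observational evidence), then move up or down per the stated rules."""
    if is_consistent and is_precise and mostly_high_quality:
        return "moderate"
    if not is_consistent and not is_precise and mostly_low_quality:
        return "very low"
    return "low"  # assumed: any other combination keeps the starting level
```

For example, Cobb ranges of 30 ± 5° and 45 ± 5° leave a 5° gap, which exceeds the 3° SEM and would count as precise under this reading, whereas 30 ± 5° and 42 ± 5° would not.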

Study selection
A total of 12,366 records were retrieved, of which 68 were selected for full-text screening. Thirty-eight studies were excluded after full-text screening, and four were added following reference-list review, resulting in a total of 34 studies included in the review [6,7,20, …] (Fig. 1).

Relationship between kyphosis and age
Only studies measuring kyphosis with Cobb's method were included in this analysis, because they provided a larger sample size and hence greater statistical power [23]; studies using a flexicurve included only women, which limits their generalisability. No analysis was performed for C7-T12 and T3-T12 because those data came from single studies.
A positive correlation between kyphosis and age was found (see Table 4). The correlation was moderate for T5-T12 (Spearman's ρ = 0.52) and low for T4-T12 (Spearman's ρ = 0.45). The sample size for T5-T12 was more than double that for T4-T12 [25], giving more confidence in the findings for T5-T12. Table 4 details the mean kyphosis and normative values of kyphosis for different age groups, as well as the between-group mean differences in kyphosis and the sample sizes. The same studies used to investigate the relationship between kyphosis and age were also used to calculate the reference values. Only 12 studies divided their sample by age group [6,7,20,31,32,35,41,43,48,50,58,59]. The ranges exceeded 40° in 58.3% of cases for people younger than 60 and in 75% of cases for those older, which questions the accuracy of the current cut-off for normality.
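The text does not state exactly how study-level means and SDs were combined into the age-group ranges. One standard option, assumed here, is to pool each study's sample size, mean and SD through the decomposition of the total sum of squares into within-study and between-study parts, which is equivalent to the Cochrane Handbook formula for combining groups. The second helper counts the share of ranges whose upper limit (mean + SD) exceeds a cut-off, which is one possible reading of "the ranges exceeded 40°". Both functions are hypothetical illustrations, not the authors' procedure.

```python
import math


def pool_studies(studies):
    """Combine per-study (n, mean, sd) tuples into one sample's n, mean and SD.

    Total sum of squares = within-study part + between-study part, which gives
    the same result as pooling the raw observations of all studies.
    """
    n_total = sum(n for n, _, _ in studies)
    grand_mean = sum(n * m for n, m, _ in studies) / n_total
    ss = sum((n - 1) * sd**2 + n * (m - grand_mean) ** 2 for n, m, sd in studies)
    return n_total, grand_mean, math.sqrt(ss / (n_total - 1))


def share_above(ranges, cutoff):
    """Fraction of (mean, sd) ranges whose upper limit (mean + SD) exceeds the cut-off."""
    return sum(m + sd > cutoff for m, sd in ranges) / len(ranges)
```

Pooling the studies (3, 2.0, 1.0) and (3, 5.0, 1.0) returns n = 6, mean 3.5 and SD √3.5, exactly the mean and sample SD of the underlying values 1 to 6, which is why this decomposition is preferred over simply averaging the SDs.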

Gender and ethnic group differences
Fourteen studies specified the ethnicity of their sample [20, 32, 34-38, 41-46, 59]; consequently, geographical provenance was the main determinant for ethnic group subdivision. Two studies were excluded from the sub-analysis between ethnicities: one [60] did not divide its sample by age group and did not report the SD of the mean, whereas in the other [48] the sample size was too small to rule out a type II error. Fifteen of the included studies presented their results by gender [6, 31, 32, 34, 36, 37, 40-43, 45, 48, 53, 58, 59], and only eight of those divided their sample by age [6,31,32,41,43,48,58,59]. The results are reported in Table 4. No differences between genders were observed, but North Americans and Europeans showed a greater thoracic curvature than Asians (Fig. 2).

Synthesis of results
There is moderate-quality evidence that a moderate positive correlation exists between age and kyphosis and that kyphosis does not differ between genders. The quality of the evidence for the normative values presented and for the differences in kyphosis observed between ethnicities is low (Table 5).

Discussion
This is the first review to explore the relationship between kyphosis and age and to provide normative kyphosis values for different ages, ethnic groups and genders. The findings show a positive correlation between kyphosis and age, as well as an influence of ethnicity on kyphosis. Gender, by contrast, does not appear to influence thoracic sagittal curvature.

Relationship between kyphosis and age
Muscle strength, vertebral body shape and intervertebral disc morphology can affect the kyphosis angle [3]. However, vertebral body shape and intervertebral disc morphology together account for 86-93% of thoracic spine curvature [62]. Disc morphology has a stronger negative correlation with ageing than vertebral morphology [62,63]. Therefore, the increase in thoracic kyphosis observed with ageing may be related to changes occurring in the intervertebral discs. Most of these changes occur in the middle section of the thoracic spine [64], which may explain why statistical significance was reached only when kyphosis was measured from T4/5. For these reasons, and because of the technical difficulty of visualising the vertebrae above T4 on lateral radiographs [2], measuring kyphosis from T5 may provide more accurate measurements.

Table 3 Risk of bias within studies

Normative values
The normative values exceeded 40° in 65% of the analyses. This finding challenges the accuracy of the current threshold used to define normality (i.e. 40°). This cut-off was first introduced by Roaf in 1960 [1], without supporting evidence. Although subsequent studies showed that healthy children, adolescents and adults can have thoracic curvatures exceeding 40° [6,65], this value is still used in practice [3,4]. Some authors have suggested moving the cut-off to 50° [2]. However, even this may not reduce the chance of misclassifying patients, since 35% of the ranges presented in this review exceeded 50°. A range of 20-60° [9] may seem more appropriate, since the ranges provided never exceeded 60°. Nonetheless, people younger than 40 (x < 40) appeared to have significantly smaller kyphosis than those older than 60 (x > 60); consequently, using the same reference values for both groups may still lead to misclassification. When kyphosis was measured from T4/5 to T12, it also differed significantly between people younger than 50 (x < 50) and those older than 50 (x > 50). This may indicate greater measurement precision when those body references are used. Thoracic kyphosis varied depending on the body references selected to calculate it, with a trend towards greater values when higher vertebrae were included. Therefore, using specific reference values that account for age and body references, like those presented in this review, could be the most accurate alternative for clinicians.

Gender and ethnic group differences
Thoracic kyphosis does not seem to be influenced by gender, since the between-group mean difference never reached statistical significance. Although the precision of these results may have been affected by the small number of studies subdividing their sample by both age group and gender, these findings align with previous evidence [7,57]. Significant differences in kyphosis were seen between ethnic groups, with Europeans and North Americans showing greater kyphosis than Asians. Genetic differences may explain this result: a twin study found that thoracic kyphosis is influenced by genetics and negatively correlates with bone mineral density [66], which is also genetically influenced [67]. However, lifestyle factors, such as sport, could also influence thoracic curvature [68], but no data were available to investigate those relationships. Since only 14 studies specified the ethnicity of their sample [20, 32, 34-38, 41-46, 59], people were grouped by geography. This is a limitation, since some areas have inhabitants from different socio-cultural backgrounds. Most of the studies that specified sample ethnicity included people from Asia [32, 34, 35, 37, 38, 41-43, 45, 59] or Europe [20,36,38,44], which further reduces the reliability of the results for North America.

Strengths and limitations
This review employed rigorous methods with transparent reporting (PRISMA and SWiM guidelines); a completed PRISMA checklist for this article can be found in Additional file 3. The main strength of this review lies in the high quality of the included studies and the large sample size used to compute the values presented, which strengthen confidence in the findings. No information about kyphosis measured with a kyphometer or flexicurve was provided because of poor information retrieval, perhaps due to the limited sensitivity of the search tool [15]. The AQUA tool was used to assess study quality, but data regarding its validity and reliability are lacking [17]. Since clinical and methodological heterogeneity can preclude a meta-analysis [25], and concerns exist regarding the reliability of the results of meta-analyses carried out on observational studies [69], the authors considered a narrative synthesis most appropriate. Finally, the sample used to create the normative values was not randomly selected from the general population but was created by combining the samples of the individual studies included in the review, which could represent a form of selection bias. However, the rigorous methodology employed and the size and heterogeneity of the sample may partially mitigate this limitation.

Fig. 2 Ethnic group comparison. Data presented as mean ± standard deviation. *Statistical significance for p < 0.05 (t-test). x < 40, people younger than 40 years old; 40 < x < 60, people between 40 and 60 years old; x > 60, people older than 60 years old; x < 50, people younger than 50 years old; x > 50, people older than 50 years old. (−) Ranges did not overlap in 2 out of 6 cases. In 1 of the 2 cases, the between-range difference was greater than the SEM.

Clinical implications
Surgical interventions to correct adult spinal deformities are recommended for progressive deformities, significant neural compromise, or pain or functional limitations that have not responded to conservative management [9]. Different surgical approaches are available to these patients, from minimally invasive operations, such as laminectomies, to deformity correction and vertebral fusion surgeries. These more invasive interventions may target a limited, specific number of vertebrae in mild and moderate cases, or extensive portions of the thoracic and lumbar spine in more severe cases [70], in some instances reaching as high as T3-T4 [71]. They are associated with a high risk of complications and worse functional outcomes if the surgical correction is suboptimal; thus, careful surgical planning is paramount [70]. The patient characteristics to consider when planning surgery include age [72] and ethnicity [71]; consequently, we believe that the normative values provided in this review, which account specifically for these characteristics, may prove beneficial in a clinical context despite being supported by low-quality evidence. This information may help clinicians decide on and plan their interventions.

Conclusion
This review provides evidence that a positive correlation exists between kyphosis and age. It also shows that thoracic kyphosis does not seem to be influenced by gender but varies with ethnicity, age and the body references used to measure it. The normative values of kyphosis currently used in clinical practice do not account for these characteristics; they may therefore not reduce the chance of misclassifying patients, and they may not be precise enough to inform clinicians when planning and performing corrective spinal surgery. Using specific reference values that account for body reference, age and ethnicity, such as those presented in this study, when assessing and treating patients may therefore be the most accurate solution for clinicians.