Construction and validation of nomogram to predict distant metastasis in osteosarcoma: a retrospective study

Background Osteosarcoma (OS) is the most common malignant bone tumor, and OS patients with metastasis have a poor prognosis. Because few tools are available to assess metastasis, we aimed to establish a nomogram to evaluate the risk of metastasis in osteosarcoma. Methods Data on patients with osteosarcoma were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database for retrospective analysis. We identified risk factors through univariate and multivariate logistic regression analysis. Based on the results of the multivariate analysis, we established a nomogram to predict metastasis in patients with osteosarcoma and used the concordance index (C-index) and calibration curves to test the model. Results A total of 1015 cases were obtained from the SEER database. In the univariate and multivariate logistic regression analysis, age, primary site, grade, T stage, and surgery were associated with metastasis. The nomogram for metastasis was constructed based on these factors. The C-index was 0.754 in the training cohort and 0.716 in the validation cohort, indicating acceptable discrimination, and the calibration plots also showed good prediction performance of the nomogram. Conclusion We developed a nomogram that can predict metastasis in patients with osteosarcoma and requires only basic patient information. The nomogram can help clinicians better predict metastasis in OS and determine postoperative treatment strategies.


Introduction
Osteosarcoma (OS) arises in the skeletal system throughout the body. Its incidence peaks in children and adolescents during bone growth [1], with a second peak after 50 years of age [2]. OS is the most common primary malignant tumor of the skeletal system. Metastasis is generally regarded as an important factor affecting the prognosis of osteosarcoma patients [3]. Since the introduction of chemotherapy, the prognosis of patients with non-metastatic osteosarcoma has improved markedly [4]. However, the prognosis of osteosarcoma with metastasis remains poor [3,5]. For instance, compared with the 70% 5-year overall survival of patients with non-metastatic osteosarcoma, that of OS patients with lung metastasis is only 30% [6].
Treatment strategies differ between metastatic and non-metastatic OS patients. Surgery has always been the standard treatment for osteosarcoma [7]. However, when the tumor recurs locally or metastasizes to the lungs and cannot be removed, radiotherapy or chemotherapy can be considered first [8]. At present, there is no practical method to evaluate metastasis status. Therefore, tools to predict distant metastasis of osteosarcoma are urgently needed to guide clinical work.
The nomogram is a statistical tool that combines all independent risk factors to estimate the probability of the endpoint event of interest. Nomograms have been widely applied to predict metastasis in patients with other cancers, such as renal cell carcinoma [9], gastrointestinal stromal tumor [10], and thyroid carcinoma [11].

Data source and inclusion criteria
Demographic and clinicopathological characteristics of osteosarcoma patients were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The freely available SEER data come from cancer registries in 18 regions, which cover approximately 30% of the US population. The database includes patients' demographic characteristics, tumor pathological characteristics, therapy details, and follow-up records [12]. We selected 1015 osteosarcoma cases from the SEER database according to the following inclusion criteria: (a) the patient was diagnosed between 2010 and 2015, (b) primary osteosarcoma was diagnosed pathologically or clinically, (c) the metastasis status was known, and (d) follow-up was complete. Exclusion criteria were as follows: (a) pathology other than osteosarcoma and (b) unknown age, race, sex, tumor size, primary site, grade, or T stage.

Study variables
The variables included in the study were age at diagnosis, sex, race, tumor size (CS tumor size, 2004+), primary site, grade, metastasis, T stage, and surgery. Age at diagnosis was divided into under 20 years, 20-49 years, and 50 years or older. Race was classified as white, black, and other. The pathological grade was divided into high grade (including grades I-II) and low grade (including grades III-IV) according to the variable "ICD-O-3 grade". Tumor size was classified as <5 cm, 5-10 cm, and ≥10 cm. The primary site was classified as external, axial, and other. The T stage included T1, T2, and T3 according to Derived AJCC T, 7th ed. Surgery refers to the record of surgery at the primary site.

Statistical analysis
All patients (n = 1015) were randomly divided into a training cohort (n = 610) and a validation cohort (n = 405) to construct and validate the nomogram, respectively. Kaplan-Meier survival analysis was performed to compare patients with and without metastasis. Metastasis here means that osteosarcoma had spread to a distant site. We used univariate logistic regression analysis and the log-rank test to identify potential factors associated with metastasis. The significant factors from the univariate analysis were then entered into multivariate logistic regression analysis to confirm independent risk factors. The logistic regression model was used to calculate the odds ratio of each variable with the corresponding 95% confidence interval (CI). To assess multicollinearity, we calculated the condition number (kappa) of the model in the training cohort; the value was 8.081531, which indicates weak multicollinearity. We applied stepwise regression to select the best logistic regression model. Randomized grouping and univariate and multivariate regression were performed using R version 4.0.2 (https://www.r-project.org/).
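The random split and the conversion of a logistic regression coefficient into an odds ratio with a Wald 95% CI can be illustrated with a minimal Python sketch. The study itself used R; the function names, the seed, and the example coefficient below are assumptions made for illustration and are not taken from the analysis.

```python
import math
import random


def split_cohort(ids, n_train, seed=0):
    """Randomly partition patient ids into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def odds_ratio_ci(beta, se, z=1.96):
    """Turn a logistic coefficient and its standard error into
    an odds ratio with a Wald confidence interval (95% when z = 1.96)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Same cohort sizes as in the text; the seed is arbitrary.
train_ids, valid_ids = split_cohort(range(1015), n_train=610, seed=42)
print(len(train_ids), len(valid_ids))  # 610 405

# Hypothetical coefficient and standard error, NOT values from the study.
odds_ratio, lower, upper = odds_ratio_ci(beta=-0.5, se=0.2)
print(f"OR = {odds_ratio:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

An odds ratio below 1 with a CI that excludes 1 would correspond to a factor associated with lower odds of metastasis.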

Development and validation of nomogram
Based on the results of the multivariate logistic regression analysis, we constructed the nomogram to predict the risk of metastasis. The nomogram was built in the training cohort and then tested in the validation cohort to assess its accuracy. The concordance index (C-index), which reflects the agreement between predicted probability and observed outcome, was used to evaluate the predictive performance of the nomogram. The C-index ranges from 0.5 to 1.0, where 0.5 represents random prediction and 1.0 represents a perfect match. The higher the C-index, the better the agreement between the prediction and the observed result. A C-index of at least 0.7 is generally taken to indicate that the nomogram prediction is meaningful. In addition, an internal calibration plot and the external validation cohort were used to evaluate the predictive ability of the nomogram. The nomogram, receiver operating characteristic (ROC) curve, and calibration curve were produced using R version 4.0.2 (https://www.r-project.org/). A two-sided P<0.05 was considered statistically significant.
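For a binary outcome such as metastasis, the C-index equals the proportion of patient pairs (one with the event, one without) in which the patient with the event received the higher predicted probability, with ties counted as half. The following Python sketch shows that calculation; the function name and the toy data are hypothetical and serve only as an illustration.

```python
def c_index(probabilities, outcomes):
    """Concordance index for a binary outcome.

    Compares every patient with the event (outcome 1) against every
    patient without it (outcome 0); a pair is concordant when the
    patient with the event has the higher predicted probability.
    Tied predictions count as half a concordant pair.
    """
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    controls = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the event")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data, not from the study: 8 of the 9 case-control pairs are concordant.
predicted = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
observed = [1, 1, 1, 0, 0, 0]
print(round(c_index(predicted, observed), 3))  # 0.889
```

The pairwise loop is quadratic in cohort size, which is acceptable for a cohort of about a thousand patients; rank-based formulas give the same value more efficiently.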

Kaplan-Meier survival analysis and univariate and multivariate logistic regression analysis
The Kaplan-Meier survival analysis showed that overall survival was significantly poorer in osteosarcoma patients with metastasis than in those without metastasis (Fig. 1). Logistic regression analysis was applied to screen for factors associated with metastasis. According to the univariate logistic regression analysis and the log-rank test of osteosarcoma patients (Table 2), external primary site, low grade, and high T stage were associated with a higher risk of metastasis. In contrast, patients who received surgery and those aged 20-49 had a lower risk of metastasis. There were no significant differences by race or sex.
On the basis of the univariate logistic regression analysis, factors with P<0.05 that may affect the metastasis risk of osteosarcoma were selected for multivariate logistic regression analysis to identify independent risk variables. Accordingly, age, tumor size, primary site, grade, T stage, and surgery were entered into the multivariate logistic regression analysis. The multivariate analysis demonstrated that age 20

Construction and validation of nomogram
Based on the results of the multivariate logistic regression analysis, we constructed the nomogram with age, primary site, T stage, grade, and surgery (Fig. 2).
To verify the accuracy of the nomogram, we performed internal and external validation using the concordance index (C-index) and calibration plots. The C-indexes of the training and validation cohorts were 0.754 and 0.7169, respectively (Fig. 3), which indicates that the nomogram discriminates well between osteosarcoma patients with and without metastasis. In addition, the calibration plots in Fig. 4 show that the predicted and observed outcomes for tumor metastasis were highly consistent in both the training and validation cohorts. These results indicate that the nomogram has good prediction performance.

Discussion
An estimated 3600 new bone tumor cases and about 1720 deaths from the disease were projected for the United States in 2020 [13]. Osteosarcoma is the most common of these tumors. Former studies focused on finding factors that influence osteosarcoma prognosis in order to evaluate overall survival or cancer-specific survival [14,15]. To our knowledge, no study has yet focused on predicting metastasis in osteosarcoma, although metastasis is the most important factor for cancer prognosis [16]. Nomograms have, however, been applied to predict metastasis in other cancer types. Cai et al. combined age, grade, histology, T stage, lymph node metastasis, and tumor size to predict metastasis in T1 and T2 gallbladder cancer [17]. In pancreatic ductal adenocarcinoma, demographic and clinicopathological characteristics were used to construct a nomogram to evaluate metastasis [18]. No comparable model is available for osteosarcoma. The nomogram is an accurate and convenient mathematical model that can predict a specific end point [19]. It is a reliable tool for quantifying and assessing risk, which can help clinicians better diagnose and determine treatment options. It is therefore worthwhile to evaluate metastasis status with a nomogram. A study by Cao et al. [20] identified several metastasis-associated genes, and this approach may be effective for OS patients. The skeletal microenvironment, composed of mesenchymal stem cells (MSC), osteoblasts, osteoclasts, osteocytes, fibroblasts, fat cells, etc., provides an ideal growth site for many cancers. For example, bone is the most common metastasis site for breast and prostate cancer [21,22]. This special tumor microenvironment is an ideal place for the occurrence, development, and metastasis of osteosarcoma. The tumor microenvironment also changes with age, tumor location, size, and grade. However, genomic sequencing is quite expensive, and not every patient can afford it.
Therefore, an economical model to evaluate metastasis status is urgently needed. In this study, age, tumor size, primary site, grade, T stage, and surgery were significant factors for OS metastasis in the univariate logistic regression analysis. After stepwise logistic regression, age, grade, primary site, T stage, and surgery were identified as the most meaningful factors. Age is usually thought to be a factor that affects prognosis [23,24], and it has been reported to be related to lung metastasis in OS [3]. Our results also show that OS patients aged 20-49 had fewer metastases than children and older patients. We speculate that this may be related to developmental status: a child's body is not fully developed, whereas older people are aging. Human aging is accompanied by cell aging, which includes nuclear genome instability and changes in protein and metabolism [25,26]. These changes may be involved in the occurrence and development of tumors [27]. Tumor grade describes a tumor based on how abnormal the tumor cells and tumor tissue look under a microscope. According to the National Cancer Institute, it is an indicator of how quickly a tumor is likely to grow and spread (https://www.cancer.gov/about-cancer/diagnosis-staging/prognosis/tumor-grade-fact-sheet). In our study, low grade was a risk factor for metastasis. Besides tumor grade, other tumor-related characteristics are potential candidates for identifying high-risk patients, such as tumor size, location, histological subtype, and biological characteristics [28]. The study by Kim et al. [28] demonstrated that initial tumor size is related to the histological response and survival time of patients with osteosarcoma. Surgery is the core treatment for osteosarcoma [29]. Although the effect of surgery is influenced by many factors, complete resection of the primary tumor blocks tumor progression, including metastasis, to some extent [30,31].
In our study, surgery was associated with a lower risk of OS metastasis. However, the ability of a single factor to predict the metastasis of osteosarcoma is limited, so we combined multiple prognostic factors to predict metastasis. The nomogram, which combines multiple variables to predict tumor risk, has long been widely accepted. We developed the nomogram to predict metastasis with age, primary site, T stage, grade, and surgery. Each prognostic factor is assigned a point value according to its weight in the model. Summing the point values that correspond to a patient's individual information gives a total score, which is used to predict metastasis risk. For example, for a patient with osteosarcoma, the clinician finds the points for each item of the patient's information in the nomogram, adds all the points, and reads off the probability of metastasis that corresponds to the total score.
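The point-scoring procedure described above can be sketched in Python. A common way to draw a logistic nomogram is to rescale each factor's coefficient so that the factor with the widest effect spans 0 to 100 points, and then map the total points back to a probability through the logistic function. The coefficients, intercept, factor levels, and function names below are hypothetical placeholders chosen for illustration; they are not the fitted values of the nomogram in this study.

```python
import math

# Hypothetical log-odds coefficients per category (reference level = 0.0).
# Illustrative only; NOT the fitted coefficients from the study.
COEFS = {
    "age": {"<20": 0.0, "20-49": -0.6, ">=50": 0.1},
    "surgery": {"no": 0.0, "yes": -1.2},
    "t_stage": {"T1": 0.0, "T2": 0.7, "T3": 1.5},
}
INTERCEPT = -1.0  # hypothetical


def build_point_table(coefs):
    """Rescale coefficients so the widest-ranging factor spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    points_per_unit = 100.0 / widest
    table = {
        factor: {
            level: (beta - min(levels.values())) * points_per_unit
            for level, beta in levels.items()
        }
        for factor, levels in coefs.items()
    }
    return table, points_per_unit


def predict(patient, coefs=COEFS, intercept=INTERCEPT):
    """Return (total points, predicted probability) for one patient."""
    table, points_per_unit = build_point_table(coefs)
    total_points = sum(table[factor][patient[factor]] for factor in coefs)
    # Zero points corresponds to the lowest-risk level of every factor.
    baseline = intercept + sum(min(lv.values()) for lv in coefs.values())
    linear_predictor = baseline + total_points / points_per_unit
    return total_points, 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"age": "<20", "surgery": "no", "t_stage": "T3"}
points, probability = predict(patient)
print(f"total points = {points:.1f}, predicted probability = {probability:.3f}")
```

Because the points are a linear rescaling of the log-odds, reading the probability from the total score gives the same result as evaluating the logistic regression equation directly.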
Our research has some limitations. First, we only used patient records from the SEER database. Although the SEER database represents 30% of the US population, some patients inevitably have missing information; including other databases, grey literature resources, meeting records, or non-English articles might provide additional information that would make the predictions more accurate, although this possibility is small. Second, some information on osteosarcoma patients is not available for analysis, for example surgical margin status, and the radiotherapy and chemotherapy data in the SEER database are limited, which may lead to inaccurate inferences.
In conclusion, the nomogram showed good accuracy in both internal and external validation. Applying our nomogram in prospective studies or other databases would help to further verify the accuracy of this model. The nomogram can help clinicians better predict metastasis risk and determine postoperative treatment strategies for patients with osteosarcoma.