Artificial intelligence to diagnosis distal radius fracture using biplane plain X-rays

Oka, Kunihiro; Shiode, Ryoya; Yoshii, Yuichi; Tanaka, Hiroyuki; Iwahashi, Toru; Murase, Tsuyoshi

doi:10.1186/s13018-021-02845-0

Research article
Open access
Published: 25 November 2021

Kunihiro Oka ORCID: orcid.org/0000-0002-7770-4634¹,
Ryoya Shiode¹,
Yuichi Yoshii²,
Hiroyuki Tanaka¹,
Toru Iwahashi¹ &
Tsuyoshi Murase¹

Journal of Orthopaedic Surgery and Research volume 16, Article number: 694 (2021)

Abstract

Background

Although automatic fracture diagnosis using artificial intelligence (AI) has recently been reported to be more accurate than diagnosis by orthopedic specialists, deep learning of a convolutional neural network (CNN) requires large datasets of at least 1000 images to improve diagnostic accuracy. The aim of this study was to develop an AI system capable of diagnosing distal radius fractures with high accuracy even when trained on a relatively small dataset, by learning from biplanar X-ray images.

Methods

VGG16, a pretrained image recognition model, was used as the CNN. It was modified into a network with two output layers to identify fractures in plain X-ray images. We augmented 369 plain X-ray anteroposterior images and 360 lateral images of distal radius fractures, as well as 129 anteroposterior images and 125 lateral images of normal wrists, to conduct training and diagnostic tests. Similarly, diagnostic tests for fractures of the styloid process of the ulna were conducted using 189 plain X-ray anteroposterior images of fractures and 302 images of the normal styloid process. To diagnose a distal radius fracture, an anteroposterior image of the wrist is first entered into the trained AI. If the AI identifies a fracture, the final diagnosis is fracture. If the anteroposterior image is judged normal, the lateral image of the same patient is entered: if a fracture is identified, the final diagnosis is fracture; if the lateral image is also judged normal, the final diagnosis is normal.

Results

The diagnostic accuracies for distal radius fractures and fractures of the styloid process of the ulna were 98.0 ± 1.6% and 91.1 ± 2.5%, respectively. The areas under the receiver operating characteristic curve were 0.991 (n = 540; 95% confidence interval (CI), 0.984–0.999) and 0.956 (n = 450; 95% CI, 0.938–0.973), respectively.

Conclusions

Our method resulted in a good diagnostic rate, even when using a relatively small amount of data.

Background

Distal radius fracture is a frequent injury among the elderly [1]. In elderly patients with minimally displaced fractures, favorable wrist function can be maintained through appropriate immobilization with a cast or splint. However, residual deformity can result in complications such as wrist pain, restricted range of wrist motion, and decreased grip strength. Thus, an appropriate initial diagnosis is essential [2, 3]. Artificial intelligence (AI) has been applied to various medical technologies. In medical image processing, techniques such as the automatic segmentation of individual internal organs from computed tomography (CT) data [4] and the diagnosis of lesions from skin images [5] are already in practical use. Highly accurate automatic diagnosis of fractures from plain X-rays using AI has also been investigated [6, 7]. One obstacle to developing highly reliable AI for diagnosing disease is that large datasets of at least 1000 images must be used for training. Orthopedic surgeons usually diagnose fractures using plain X-ray images taken in two directions. In this study, we hypothesized that an AI system with highly accurate diagnostic ability for distal radius fractures can be developed using two-direction plain X-rays, even when trained on a relatively small dataset.

Methods

This study was approved by the Ethical Review Board of our institution (approval number: 18137). We followed the principles of the Helsinki Declaration, revised in 2000. Each author has affirmed that the organization to which they belong has approved the protocols for humans. Each step performed in this study follows the ethical principles of research.

Data collection

We obtained 369 plain X-ray anteroposterior images and 360 lateral images of distal radius fractures from 369 patients over 18 years of age, and 129 plain X-ray anteroposterior images and 125 lateral images of normal wrists from 129 people, at three affiliated hospitals. Because plain X-rays of the normal wrist had not been taken for all patients with fractures, plain X-rays of wrists diagnosed with tenosynovitis (wrist sprain without fracture) were included in the normal image data. Nine lateral fracture images were excluded because they had been taken in an oblique position owing to severe wrist pain. Each plain X-ray image measured 500 × 625 pixels, with a pixel size of 0.4 mm × 0.4 mm. The clinical diagnoses made by specialized orthopedic surgeons, based on the circumstances of the injury, clinical findings, and imaging in the clinical setting, were used as the gold standard for fracture diagnosis. Images of the uninjured side of patients with distal radius fractures, free of existing disorders such as trauma, arthritis, or bone tumor, were used as images of normal wrists.

Convolutional neural network

VGG16, a pretrained image recognition model, was used as the CNN [8]. It is an openly available, already trained network that comprises 16 layers and classifies images into 1000 categories. It was modified into a network with two output layers to identify the presence of fractures in plain X-ray images (Fig. 1). To implement the CNN, we used Python as the programming language and Keras and TensorFlow as the software libraries.
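
As an illustration of the kind of modification described here, the following is a minimal Keras sketch, not the authors' code. It reads "two output layers" as a two-unit softmax output (fracture versus normal). The hidden layer size, the frozen convolutional base, the optimizer, the learning rate, and the function name `build_fracture_classifier` are all assumptions that the paper does not report. Grayscale X-rays would also need to be replicated to three channels to match the pretrained input, which is likewise an assumption.

```python
# Sketch only: layer sizes, optimizer, and freezing policy are assumptions,
# not details reported in the paper.
INPUT_SHAPE = (224, 224, 3)  # VGG16's standard input: 224 x 224 pixels, 3 channels
NUM_CLASSES = 2              # fracture vs. normal


def build_fracture_classifier(weights="imagenet", hidden_units=256):
    """Return a VGG16-based two-class classifier with a hypothetical head."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    from tensorflow.keras import layers, models, optimizers
    from tensorflow.keras.applications import VGG16

    # Pretrained convolutional base without the 1000-class ImageNet head.
    base = VGG16(weights=weights, include_top=False, input_shape=INPUT_SHAPE)
    base.trainable = False  # assumption: keep the pretrained features fixed

    # Hypothetical replacement head ending in a two-unit softmax.
    x = layers.Flatten()(base.output)
    x = layers.Dense(hidden_units, activation="relu")(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=optimizers.SGD(learning_rate=1e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_fracture_classifier()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random initialization.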

Dataset

We used digital imaging and communications in medicine (DICOM) data of plain wrist X-rays, with 16 bits per pixel, as the original files. Images of left wrists were mirrored so that every X-ray image showed a right wrist. From the 729 fracture images (369 anteroposterior and 360 lateral) and 254 normal wrist images (129 anteroposterior and 125 lateral), images were randomly selected to produce three dataset patterns (A, B, C) in order to examine the effect of data selection on the results. The training dataset contains 569 fracture images (299 anteroposterior and 270 lateral) and 174 normal wrist images (91 anteroposterior and 83 lateral). The validation dataset contains 80 fracture images (30 anteroposterior and 50 lateral) and 40 normal wrist images (18 anteroposterior and 22 lateral). The test dataset contains 80 fracture images (40 anteroposterior and 40 lateral) and 40 normal wrist images (20 anteroposterior and 20 lateral). These datasets are presented in Table 1.

To increase the amount of learning data, the original images were augmented by stretching, rotation, shearing, and parallel translation using the affine transformation \(\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}\), where \(x\) and \(y\) are the original coordinates, \(x'\) and \(y'\) are the coordinates after the transformation, \(a\), \(b\), \(c\), and \(d\) are the coefficients of the linear part, and \(t_x\) and \(t_y\) are the parallel translations. The resulting 3245 fracture images and 3210 normal wrist images were used for training and validation (Fig. 2). To match the input format of VGG16, the images were converted to 224 × 224 pixels with 32-bit floating-point pixel values.

Moreover, to identify fractures of the styloid process of the ulna, which often accompany distal radius fractures, 189 anteroposterior images of fractures of the styloid process of the ulna and 302 images of the styloid process of the ulna without fractures were augmented into 845 and 1360 images, respectively, and three dataset patterns (A, B, C) were prepared using the same approach (Table 1).
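
The affine transformation used for augmentation in this section can be illustrated with a short pure-Python sketch. It composes rotation, shearing, and stretching into the four coefficients a, b, c, d and then adds the translation (tx, ty). The composition order (rotation applied after shear and scale), the function names, and the example parameter values are assumptions made for illustration. The paper does not report the parameter ranges it used.

```python
import math


def affine_matrix(angle_deg=0.0, scale_x=1.0, scale_y=1.0, shear=0.0):
    """Compose rotation @ shear @ scale into the coefficients (a, b, c, d)."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # shear @ scale = [[scale_x, shear * scale_y], [0, scale_y]]
    a = cos_t * scale_x
    b = cos_t * shear * scale_y - sin_t * scale_y
    c = sin_t * scale_x
    d = sin_t * shear * scale_y + cos_t * scale_y
    return a, b, c, d


def apply_affine(point, matrix, translation=(0.0, 0.0)):
    """Map (x, y) to (x', y') = (a*x + b*y + tx, c*x + d*y + ty)."""
    x, y = point
    a, b, c, d = matrix
    tx, ty = translation
    return a * x + b * y + tx, c * x + d * y + ty


if __name__ == "__main__":
    # Hypothetical augmentation: small rotation, slight stretch, shift in pixels.
    m = affine_matrix(angle_deg=5.0, scale_x=1.1, scale_y=0.95, shear=0.05)
    print(apply_affine((100.0, 200.0), m, translation=(4.0, -3.0)))
```

In practice the same matrix would be applied to every pixel coordinate of an image, usually through an image library's warp routine rather than point by point.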

Table 1 Dataset
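
The distal radius counts stated in the Dataset section can be tallied as a quick consistency check. The sketch below uses only the numbers given in the text. The dictionary layout and the helper `total` are illustrative choices, and the split of the styloid process images is omitted because the text does not state it.

```python
# Image counts per split for distal radius images, as stated in the Dataset section.
SPLITS = {
    "training": {
        "fracture": {"AP": 299, "lateral": 270},
        "normal": {"AP": 91, "lateral": 83},
    },
    "validation": {
        "fracture": {"AP": 30, "lateral": 50},
        "normal": {"AP": 18, "lateral": 22},
    },
    "test": {
        "fracture": {"AP": 40, "lateral": 40},
        "normal": {"AP": 20, "lateral": 20},
    },
}


def total(label, view=None):
    """Sum images of one label over all splits, optionally for a single view."""
    count = 0
    for split in SPLITS.values():
        views = split[label]
        count += views[view] if view is not None else sum(views.values())
    return count


if __name__ == "__main__":
    for label in ("fracture", "normal"):
        print(label, total(label), total(label, "AP"), total(label, "lateral"))
```

The totals agree with the 729 fracture images (369 anteroposterior, 360 lateral) and 254 normal images (129 anteroposterior, 125 lateral) reported in the text.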

Training

Training was conducted by entering the augmented training dataset into the network. The validation dataset was then used for validation, and the weights \(w\) and biases \(b\) were updated by back-propagation so that the output approximated the correct answer (\(w\leftarrow w-\eta \frac{\partial E}{\partial w},\ b\leftarrow b-\eta \frac{\partial E}{\partial b}\), where \(E\) is the error and \(\eta\) is the learning rate). Because test results differ slightly between runs even when the same dataset is used for training, three training runs of approximately 40 epochs each were conducted for each of the three dataset patterns (A, B, C), giving nine training runs in total. For each run, the parameters from the epoch at which the diagnostic rate peaked were adopted, and three diagnostic tests were conducted for each pattern (nine tests in total). The nonaugmented original images were used for the diagnostic tests of distal radius fractures and fractures of the styloid process of the ulna.
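
The update rule and the epoch selection described above can be sketched on a toy problem. The example below applies the same gradient step to a single linear unit with a squared error, which is an assumption chosen for brevity. The actual network is far larger and its loss function is not reported. The function names and the sample accuracy values are hypothetical.

```python
def gradient_step(w, b, x, target, eta):
    """One update w <- w - eta*dE/dw, b <- b - eta*dE/db.

    Uses E = 0.5 * (w*x + b - target)**2, so dE/dw = error*x and dE/db = error.
    """
    error = (w * x + b) - target
    return w - eta * error * x, b - eta * error


def best_epoch(validation_accuracy):
    """Return the 1-based epoch at which validation accuracy peaks (first peak on ties)."""
    peak = max(validation_accuracy)
    return validation_accuracy.index(peak) + 1


if __name__ == "__main__":
    w, b = 0.0, 0.0
    for _ in range(50):
        w, b = gradient_step(w, b, x=2.0, target=1.0, eta=0.1)
    print("prediction after training:", w * 2.0 + b)

    # Hypothetical per-epoch validation accuracies for one training run.
    print("selected epoch:", best_epoch([0.80, 0.91, 0.89, 0.91]))
```

Keeping the parameters from the peak validation epoch, rather than from the final epoch, is a form of early stopping and limits overfitting to the augmented training images.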

Test and diagnosis

To determine a distal radius fracture, an anteroposterior image of the wrist is first entered into the trained AI. If the AI identifies a fracture, the final diagnosis is fracture. If the anteroposterior image is judged normal, the lateral image of the same patient is entered. If a fracture is identified in the lateral image, the final diagnosis is fracture; if the lateral image is also judged normal, the final diagnosis is normal (Fig. 3). Because a fracture of the styloid process of the ulna overlaps with the distal radius in the lateral image and is therefore difficult to identify, it is diagnosed using only the anteroposterior images (Fig. 4).
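
The two-stage decision rule for distal radius fractures can be written as a short function. In this sketch, `classify` stands in for the trained network and is a hypothetical callable that returns True when it identifies a fracture. The function name and the string labels are also illustrative.

```python
def diagnose_distal_radius(ap_image, lateral_image, classify):
    """Two-stage rule: anteroposterior image first, lateral image only if needed.

    `classify` is a hypothetical callable returning True for "fracture".
    """
    if classify(ap_image):
        return "fracture"  # stage 1: the anteroposterior image is decisive
    if classify(lateral_image):
        return "fracture"  # stage 2: the lateral image overrides a normal AP result
    return "normal"        # both views were judged normal


if __name__ == "__main__":
    # Stub classifier for demonstration: images are plain labels here.
    stub = lambda image: image == "fractured view"
    print(diagnose_distal_radius("normal view", "fractured view", stub))
```

Because either view alone can yield a fracture diagnosis, the rule behaves as a logical OR of the two views. This tends to raise sensitivity at some possible cost in specificity, which matches the direction of the changes reported in the Results.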

Assessment

To evaluate the developed AI, we assessed its diagnostic accuracy, sensitivity, and specificity using 40 images of distal radius fractures, 20 images of normal wrists, 20 images of fractures of the styloid process of the ulna, and 30 images of a normal styloid process of the ulna. We used the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to evaluate diagnostic ability. We also measured the time required for the whole process, from extraction of the input data from the plain X-ray DICOM data to diagnosis by the AI.
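
For reference, the assessment metrics named here can be computed as in the following pure-Python sketch. Labels use 1 for fracture and 0 for normal. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. The function names and the toy data are illustrative, and the paper does not state which software it used for these calculations.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = fracture)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """Probability that a random fracture case scores above a random normal case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not results from the study.
    y_true = [1, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.2, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(confusion_metrics(y_true, y_pred), auc(y_true, scores))
```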

Results

The number of epochs at which validation accuracy peaked in the diagnosis of distal radius fracture was 9, 3, and 25 for patterns A, B, and C, respectively; the corresponding diagnostic accuracies were 89.2%, 93.5%, and 93.6%. Diagnostic accuracy mostly plateaued after these points (Fig. 5). Using only the anteroposterior images, the diagnostic accuracy for distal radius fractures at the optimal epoch of patterns A, B, and C was 95.7 ± 1.7%, with a sensitivity of 95.0 ± 3.1% and a specificity of 97.2 ± 2.6%. When the lateral images were also input, the diagnostic accuracy increased to 98.0 ± 1.6%, with a sensitivity of 98.6 ± 1.8% and a specificity of 96.7 ± 3.5%. The diagnostic accuracy for fractures of the styloid process of the ulna at the optimal epoch of the three patterns was 91.1 ± 2.5%, with a sensitivity of 92.2 ± 5.7% and a specificity of 90.4 ± 3.9%. Figure 6a shows the ROC curves of the diagnostic tests for distal radius fractures. The AUC of the diagnostic test using only the anteroposterior images was 0.990 (n = 540; 95% confidence interval (CI), 0.984–0.996), and that of the test using both anteroposterior and lateral images was 0.991 (n = 540; 95% CI, 0.984–0.999). Figure 6b shows the ROC curve of the diagnostic tests for fractures of the styloid process of the ulna. The AUC for the diagnosis of fractures of the styloid process of the ulna using anteroposterior images was 0.956 (n = 450; 95% CI, 0.938–0.973). The time required for image conversion from the DICOM data and diagnosis by the AI was approximately 30 s.
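
The sample sizes quoted with the AUC values are consistent with pooling all nine diagnostic tests: nine tests of 40 fracture and 20 normal wrists give 540, and nine tests of 20 fractured and 30 normal styloid processes give 450. This reading is an inference from the numbers and is not stated explicitly in the paper. The sketch below checks the arithmetic and shows one way a mean ± standard deviation over runs could be computed. The use of the sample standard deviation and the example values are assumptions.

```python
import statistics

N_TESTS = 3 * 3  # three dataset patterns x three training runs each

# Test cases per run, taken from the Assessment section.
RADIUS_CASES = 40 + 20  # distal radius fractures + normal wrists
ULNA_CASES = 20 + 30    # styloid process fractures + normal styloid processes

radius_n = N_TESTS * RADIUS_CASES
ulna_n = N_TESTS * ULNA_CASES


def summarize(values):
    """Return (mean, sample standard deviation) of per-run results."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    print("pooled n:", radius_n, ulna_n)
    # Hypothetical per-run accuracies in percent, not data from the study.
    print(summarize([96.0, 98.0, 100.0]))
```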

Discussion

The initial diagnosis of trauma, such as distal radius fractures, is often performed by medical interns or emergency room doctors. Displacement of a fracture can be prevented by diagnosing it appropriately and immobilizing it with a cast or splint. Thus, initial diagnosis and treatment are essential [9]. Image recognition by AI has been reported to be more accurate than that by humans, and it has been applied to various fields [10]. An AI-based automatic fracture diagnosis system enables rapid diagnostic support for trauma and prompt initiation of treatment based on the diagnosis, which can be expected to improve overall treatment. Existing reports on bone image diagnosis using AI include a program that diagnoses bone age by learning the shape of the epiphyseal nucleus and bone maturity from more than 10,000 plain X-ray images of children’s hands [11]. Its accuracy in diagnosing bone age is 90.4% within 1 year and 98.1% within 2 years. Although a physician requires a few minutes to diagnose bone age, the AI can do so in less than 2 s. Another report describes a fracture diagnostic program that learned the body part, the plain X-ray direction, the presence of fractures, and the distinction between left and right from more than 250,000 plain X-ray images of the hand and foot [7]. Its accuracy in determining the body part, direction, and left and right is more than 90%. Its accuracy in identifying fractures is 83%, which is comparable to that of specialized orthopedic surgeons. This result is expected to lead to clinical application. Regarding distal radius fractures, an AI trained on approximately 35,000 plain X-ray images of wrists was approved by the United States Food and Drug Administration in 2018 [6]. The program first determines whether a fracture is present. If it identifies a fracture, it displays the location of the fracture as a heat map according to the confidence of the diagnosis. The AUC of its fracture identification ability is 0.975, which is highly accurate. Support from this program has been reported to reduce the diagnostic error rate among emergency room doctors by 47%. Our program was trained on between 1/100 and 1/1000 of the data used in other reports. Nevertheless, with a diagnostic accuracy for distal radius fractures of 98.0 ± 1.6% and an AUC of 0.991, our results were comparable to or better than those of previous reports. We infer that a good diagnostic rate was obtained despite the relatively small amount of data for three reasons: the trained VGG16 model was used as the base, the learning data were increased to a sufficient quantity through image augmentation, and the diagnosis was conducted in two stages using anteroposterior and lateral images, as clinicians do. The system displayed good sensitivity at 98.6 ± 1.8%; hence, it is considered useful as a screening tool for initial diagnosis. Although the diagnostic rate for fractures of the styloid process of the ulna, at 91.1 ± 2.5%, was lower than that for distal radius fractures, we infer that its diagnostic accuracy will improve if the dataset is enlarged to the same level as that used for distal radius fractures.

There are several limitations to this study. First, we used a small amount of learning data. We have not shown whether the same result can be obtained when a different dataset of the same size is used for learning. Second, because clinical diagnoses made by orthopedic surgeons at clinical sites were used as the gold standard for identifying fractures, imaging tests such as CT were not performed in all cases. However, for the cases included in this study that were diagnosed without CT, the fracture was confirmed by callus formation during the subsequent course. Third, we did not examine minute fractures that can be detected only with CT or other imaging modalities, or old fractures such as distal radius malunion and ulnar styloid nonunion. Further learning with more data is necessary to address the diagnosis of fractures with hardly any displacement and of old fractures. Finally, the data used in this study are from adults over 18 years of age, whose epiphyseal lines are already closed. Thus, the AI cannot identify fractures in children’s bones, in which epiphyseal lines remain. A new network must be constructed for fractures in children.

Image diagnosis using AI is expected to improve significantly in the future. However, although imaging is one of the important examinations in the diagnosis of disease, a comprehensive assessment that includes other clinical information, such as clinical histories, physical findings, and blood tests, is essential. In addition, it should be remembered that image diagnosis using AI is only a supplementary diagnosis. Cohort studies and large randomized controlled trials are also needed to increase the reliability of AI-based diagnostics and predictions of injury in the field of orthopedics [12, 13].

Conclusion

In conclusion, our method resulted in a good diagnostic rate even when using a relatively small amount of data. The image diagnostic technologies using AI are speedy. They are highly applicable technologies that can be used in the diagnosis of every disease appearing on plain X-ray, CT, or magnetic resonance imaging (MRI). Thus, further application in clinical sites is expected.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AI: Artificial intelligence
CNN: Convolutional neural network
CI: Confidence interval
CT: Computed tomography
DICOM: Digital imaging and communications in medicine
ROC: Receiver operating characteristic
AUC: Area under curve
MRI: Magnetic resonance imaging

References

1. Rundgren J, Bojan A, Mellstrand Navarro C, Enocson A. Epidemiology, classification, treatment and mortality of distal radius fractures in adults: an observational study of 23,394 fractures from the National Swedish Fracture Register. BMC Musculoskelet Disord. 2020;21:88. https://doi.org/10.1186/s12891-020-3097-8.
2. Arora R, Lutz M, Deml C, Krappinger D, Haug L, Gabl M. A prospective randomized trial comparing nonoperative treatment with volar locking plate fixation for displaced and unstable distal radial fractures in patients sixty-five years of age and older. J Bone Joint Surg Am. 2011;93:2146–53. https://doi.org/10.2106/JBJS.J.01597.
3. Young BT, Rayan GM. Outcome following nonoperative treatment of displaced distal radius fractures in low-demand patients older than 60 years. J Hand Surg Am. 2000;25:19–28. https://doi.org/10.1053/jhsu.2000.jhsu025a0019.
4. Zhou X, Takayama R, Wang S, Hara T, Fujita H. Deep learning of the sectional appearances of 3D CT images for anatomical structure segmentation based on an FCN voting method. Med Phys. 2017;44:5221–33. https://doi.org/10.1002/mp.12480.
5. Fujisawa Y, Otomo Y, Ogata Y, Nakamura Y, Fujita R, Ishitsuka Y, Watanabe R, Okiyama N, Ohara K, Fujimoto M. Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis. Br J Dermatol. 2019;180:373–81. https://doi.org/10.1111/bjd.16924.
6. Lindsey R, Daluiski A, Chopra S, Lachapelle A, Mozer M, Sicular S, Hanel D, Gardner M, Gupta A, Hotchkiss R, Potter H. Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A. 2018;115:11591–6. https://doi.org/10.1073/pnas.1806905115.
7. Olczak J, Fahlberg N, Maki A, Razavian AS, Jilert A, Stark A, Sköldenberg O, Gordon M. Artificial intelligence for analyzing orthopedic trauma radiographs. Acta Orthop. 2017;88(6):581–6. https://doi.org/10.1080/17453674.2017.1344459.
8. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR. arXiv:1409.1556 (2014).
9. Mayne IP, Brydges R, Moktar J, Murnaghan ML. Development and assessment of a distal radial fracture model as a clinical teaching tool. J Bone Joint Surg Am. 2016;98:410–6. https://doi.org/10.2106/JBJS.O.00565.
10. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015).
11. Lee H, Tajmir S, Lee J, Zissen M, Yeshiwas BA, Alkasab TK, Choy G, Do S. Fully automated deep learning system for bone age assessment. J Digit Imaging. 2017;30(4):427–41. https://doi.org/10.1007/s10278-017-9955-8.
12. Maffulli N, Rodriguez HC, Stone IW, Nam A, Song A, Gupta M, Alvarado R, Ramon D, Gupta A. Artificial intelligence and machine learning in orthopedic surgery: a systematic review protocol. J Orthop Surg Res. 2020;15(1):478. https://doi.org/10.1186/s13018-020-02002-z.
13. Kakavas G, Malliaropoulos N, Pruna R, Maffulli N. Artificial intelligence: a tool for sports trauma prediction. Injury. 2020;51(Suppl 3):S63–5. https://doi.org/10.1016/j.injury.2019.08.033.

Acknowledgements

Not applicable.

Funding

This research was funded by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP18K12104. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Author information

Authors and Affiliations

Department of Orthopaedic Surgery, Graduate School of Medicine, Osaka University, 2-2 Yamada-oka, Suita, Osaka, 565-0871, Japan
Kunihiro Oka, Ryoya Shiode, Hiroyuki Tanaka, Toru Iwahashi & Tsuyoshi Murase
Ibaraki Medical Center, Department of Orthopaedic Surgery, Tokyo Medical University, 3-20-1 Chuo, Ami, Inashiki, Ibaraki, 300-0395, Japan
Yuichi Yoshii

Contributions

KO contributed to conceptualization, methodology, validation, resources, data curation, writing (original draft preparation), and funding acquisition. RS contributed to methodology, validation, formal analysis, resources, and writing (review and editing). YY contributed to validation, resources, writing (review and editing), and supervision. HT contributed to formal analysis, investigation, writing (review and editing), and supervision. TI contributed to investigation, resources, and writing (review and editing). TM contributed to conceptualization, methodology, resources, data curation, writing (review and editing), and supervision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kunihiro Oka.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethical Review Board of Osaka University Hospital (approval number: 18137). This study was performed in line with the principles of the Helsinki Declaration, revised in 2000.

Consent for publication

Written consents for publication were obtained from all study participants.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Oka, K., Shiode, R., Yoshii, Y. et al. Artificial intelligence to diagnosis distal radius fracture using biplane plain X-rays. J Orthop Surg Res 16, 694 (2021). https://doi.org/10.1186/s13018-021-02845-0

Received: 03 September 2021
Accepted: 15 November 2021
Published: 25 November 2021
DOI: https://doi.org/10.1186/s13018-021-02845-0