Expected effects of AI in hip fracture diagnosis
As human lifespans lengthen and the elderly population grows, the socioeconomic burden of hip fractures and their postoperative care has become a public concern worldwide [13]. Early diagnosis and treatment are essential to preserve patient function, improve quality of life, and reduce the economic burden. Non-displaced hip fractures can be difficult for human readers to diagnose quickly, and diagnosis sometimes requires additional radiographs, bone scans, CT, or MRI; however, these additional tests are not available in all hospitals. In addition, demineralization and overlying soft tissue may interfere with the diagnosis of hip fracture [18]. Delayed diagnosis and treatment may lead to complications such as malunion, osteonecrosis, and arthritis [19]. Moreover, as the total number of imaging and radiological examinations has increased, radiology departments cannot report all acquired radiographs in a timely manner [7]. For these reasons, several studies on detecting hip fractures using ML have already been reported [1, 7, 8, 11–21]. Early diagnosis of hip fracture by an AI algorithm during the clinical course could help reduce medical costs, facilitate further preventive practices, and increase the quality of health care [20]. It could also improve the allocation of resources, reduce unnecessary consultations, and speed patient disposition. In particular, physicians in high-volume clinics could focus on conceptually more demanding tasks. However, evidence on the effectiveness of early hip fracture diagnosis by AI algorithms remains insufficient, and further studies are needed.
CNN architecture used for hip fracture diagnosis
The included studies used a variety of CNN structures to analyze radiographs for hip fracture diagnosis. Among them, CNNs based on the DenseNet or GoogLeNet architectures were used most often. Both are deep CNNs whose architectures are built from repeating components [22]: inception modules in GoogLeNet and dense blocks in DenseNet. GoogLeNet is a 22-layer CNN that is widely used in image analysis, including radiographs, because of its excellent ability to recognize visual patterns [23]. In addition, GoogLeNet contains 9 inception modules that include 1 × 1 convolutions, which allow diverse features to be derived by combining the feature maps generated in the previous layer [22]. This structure allows GoogLeNet to extract features from different layers without substantially increasing the computational burden [24]. DenseNet (Dense Convolutional Network), a later architecture than GoogLeNet, is a CNN in which each layer receives the feature maps of all preceding layers as input through concatenation. DenseNet improves computational efficiency through a compact network and allows every layer to be trained on a more diverse set of features [25]. In addition, Inception-V3 and Xception, which were also used in the included studies, are more advanced architectures derived from GoogLeNet's inception design. These results suggest that researchers have progressively applied more advanced CNN architectures to hip fracture diagnosis (Table 3).
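The two architectural ideas described above can be made concrete with a small, dependency-free sketch. The first function applies a 1 × 1 convolution, which at every pixel takes a weighted sum across input channels and can therefore reduce (or expand) the number of feature maps without changing spatial size. The second function tracks how many channels enter each layer of a DenseNet dense block when all preceding feature maps are concatenated: with k0 input channels and growth rate k, layer ℓ receives k0 + k(ℓ − 1) channels. DenseNet-BC itself places a 1 × 1 "bottleneck" convolution inside each layer to keep this growing input manageable, which is where the two ideas meet. The function names, the nested-list tensor layout, and the toy values are illustrative assumptions and are not taken from any of the included studies; the example sizes (64 input channels, growth rate 32, 6 layers) match the first dense block of DenseNet-121 as defined in torchvision.

```python
from typing import List

# Feature maps stored as [channel][row][col] nested lists (illustrative layout only).
Tensor = List[List[List[float]]]


def conv1x1(x: Tensor, weights: List[List[float]]) -> Tensor:
    """1 x 1 convolution: each output channel is a per-pixel weighted sum of input channels.

    weights[o][c] is the weight from input channel c to output channel o (no bias term).
    """
    c_in = len(x)
    height, width = len(x[0]), len(x[0][0])
    assert all(len(w) == c_in for w in weights), "each weight row needs one entry per input channel"
    return [
        [
            [sum(w[c] * x[c][i][j] for c in range(c_in)) for j in range(width)]
            for i in range(height)
        ]
        for w in weights  # one output channel per weight row
    ]


def dense_block_input_channels(k0: int, growth_rate: int, num_layers: int) -> List[int]:
    """Channels entering each layer of a dense block, where every layer receives the
    concatenation of the block input and the outputs of all earlier layers."""
    channels = []
    current = k0
    for _ in range(num_layers):
        channels.append(current)  # this layer's input = everything concatenated so far
        current += growth_rate    # the layer appends `growth_rate` new feature maps
    return channels


if __name__ == "__main__":
    # Two 2x2 input channels reduced to a single output channel by a 1 x 1 convolution.
    x = [[[1.0, 2.0], [3.0, 4.0]], [[10.0, 20.0], [30.0, 40.0]]]
    print(conv1x1(x, weights=[[1.0, 0.5]]))       # [[[6.0, 12.0], [18.0, 24.0]]]
    print(dense_block_input_channels(64, 32, 6))  # [64, 96, 128, 160, 192, 224]
```

In this example the block's output after the sixth layer would carry 64 + 6 × 32 = 256 channels, which shows why dense connectivity relies on cheap 1 × 1 reductions between and within blocks.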
Diagnostic accuracy of AI versus humans: Can AI replace the human role in hip fracture diagnosis?
In the articles included in our study, the diagnostic accuracy of AI algorithms for hip fracture exceeded 90%, except in the study of Beyaz et al. [17], and the AUC for fracture diagnosis exceeded 0.9, which is very high. In comparative studies of hip fracture diagnosis between AI and humans, the diagnostic accuracy of AI was also higher. Urakawa et al. presented an AI model that detected intertrochanteric fractures with an accuracy of 95.5% and an AUC of 0.984 [12], which was higher than the human readers' diagnostic accuracy of 92.2% and AUC of 0.969. Adams et al. reported a convolutional neural network model that diagnosed femoral neck fractures with an accuracy of 88.1–94.4% [11]; these figures are comparable to the diagnostic accuracy of experts (93.5%) and residents (92.9%). In the studies of Cheng et al. and Sato et al., human diagnostic accuracy was also lower than that of the AI algorithm [1, 20]. Nevertheless, it is still questionable whether AI can replace the human role in hip fracture diagnosis. Bae et al. trained an AI model on 4,189 images to diagnose femoral neck fractures and obtained a diagnostic accuracy of 97.1%. However, they reported that non-displaced femoral neck fractures remained difficult to detect despite this high overall accuracy [21]. This suggests that AI diagnosis is limited for cases on which the model was not trained or for which it had insufficient training data. In addition, because none of the AI systems included in this study were integrated with other clinical information, we consider that the clinical suspicion of an occult fracture that a physician forms by evaluating the patient's overall condition cannot yet be simulated by an AI algorithm. Mawatari et al. also argued that, because the AUC of experts aided by AI was higher than that of the AI algorithm alone, a valid diagnosis could not be obtained from the radiograph alone and was inevitably affected by the quality of the AI algorithm [18].
Thus, we believe that AI algorithms cannot fully replace human intelligence in the current clinical environment; however, they can complement and augment the abilities and knowledge of physicians.
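The studies compared above report two different summary metrics, accuracy and AUC, which are easy to conflate. Accuracy depends on a chosen decision threshold, whereas the AUC summarizes ranking performance over all thresholds and equals the probability that a randomly chosen fractured case receives a higher score than a randomly chosen non-fractured case, with ties counted as one half. The minimal sketch below computes both from per-image model scores. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not data from any included study.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of images whose thresholded prediction matches the label (1 = fracture)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) formulation; O(n_pos * n_neg) comparisons."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fractured case ranked above non-fractured case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]               # hypothetical: 3 fractures, 3 normal hips
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]   # hypothetical model outputs
    print(accuracy(labels, scores))  # 4 of 6 correct at threshold 0.5
    print(auc(labels, scores))       # 8 of 9 positive-negative pairs ranked correctly
```

In this toy example the same scores give an accuracy of about 0.67 but an AUC of about 0.89, which is why the accuracy and AUC reported for one model are not interchangeable when comparing AI with human readers.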
Increasing human dependence on AI algorithms for hip fracture detection may be another issue: because it is difficult and time-consuming for physicians to form their own clinical judgments by synthesizing the results of examinations performed face-to-face with patients, they may be tempted to defer to the algorithm [20]. To address this issue, Cheng et al. highlighted the fracture site detected by the AI on the displayed image so that physicians could check the output of the AI algorithm and make the final clinical judgment themselves [20]. As the technology develops, AI algorithms will continue to improve, and human reliance on AI is likely to increase further. Further research is needed to find solutions to this problem.
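One simple way to turn a model's per-pixel (or per-patch) fracture-likelihood map into a highlighted region that a physician can verify is to threshold the map and report the bounding box of the cells above the threshold. The sketch below is a generic illustration under that assumption; it is not the visualization method of Cheng et al., and the heatmap values, the 0.5 threshold, and the function name `highlight_box` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top_row, left_col, bottom_row, right_col), inclusive


def highlight_box(heatmap: List[List[float]], threshold: float = 0.5) -> Optional[Box]:
    """Return the bounding box of all cells whose score is >= threshold, or None."""
    hits = [
        (i, j)
        for i, row in enumerate(heatmap)
        for j, value in enumerate(row)
        if value >= threshold
    ]
    if not hits:
        return None  # nothing scores high enough to highlight
    rows = [i for i, _ in hits]
    cols = [j for _, j in hits]
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    # Hypothetical coarse heatmap over a radiograph; higher = more fracture-like.
    heatmap = [
        [0.0, 0.1, 0.1, 0.0],
        [0.1, 0.7, 0.9, 0.1],
        [0.0, 0.6, 0.2, 0.0],
    ]
    print(highlight_box(heatmap))  # (1, 1, 2, 2)
```

Because the box only marks where the model's evidence is concentrated, the final decision still rests with the physician, which is the division of labor that highlighting the detected site is meant to support.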
Efforts toward effective deep learning and high diagnostic accuracy for hip fracture
Because deep learning models learn features automatically and adaptively from data, large and clean datasets are required [17]. The performance of AI in detecting hip fracture depends strongly on the number of training images. In our study, we summarized 2 methods suggested by previous studies to overcome this limitation. The first is data augmentation and generation, in which existing images are manipulated to artificially enlarge the dataset. This is needed because the number of patients visiting a single hospital is limited, and acquiring images from other institutions may risk leakage of personal information. Sato et al. created 10,484 augmented images by classifying the images of 4,851 patients into fractured and normal sides according to when they were taken, and used them to train the AI [1]. Mutasa et al. created 9,063 augmented images from 737 hip fracture images and 326 normal images of 550 patients, and Beyaz et al. generated 2,106 augmented images from 234 radiographs of 65 patients [16, 17]. The second is to use various types of image information. Yu et al. reported that a distinctive fracture line or cortical angular deformity of a femoral neck fracture is easy to detect in a single radiographic view, whereas intertrochanteric fractures, which have complex and multiple fracture lines, require a larger sample size because their fracture morphology spans a wide spectrum [15]. In addition, soft tissue shading or variation in femoral alignment may affect fracture detection by AI [13]. To overcome this, Yamada et al. argued that the fracture detection rate could be increased by adding a lateral view to the hip AP view [19]. Yoon et al. used CT images as well as radiographs to classify intertrochanteric fractures, which reduced the time spent on fracture classification and helped in planning accurate surgery [8]. Mawatari et al. also used MRI as well as CT for hip fracture detection [18].
However, this approach has disadvantages: it incurs additional cost, and a normal hip lateral view is difficult to obtain.
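The two strategies summarized above can be sketched without any imaging library. The first part enlarges a training set by generating label-preserving variants of each radiograph (horizontal flip, intensity gain, and a small horizontal shift); with the default settings each image yields 18 variants, including the unmodified original. The second part shows a simple "late fusion" rule that combines per-view model outputs, such as those for AP and lateral views, into a patient-level call. The transforms, parameter values, and function names are hypothetical choices for illustration; the included studies may have used different augmentations and fusion schemes.

```python
from itertools import product
from typing import List, Sequence

Image = List[List[int]]  # grayscale radiograph as rows of 8-bit pixel values (0-255)


def hflip(img: Image) -> Image:
    """Mirror the image left-right."""
    return [row[::-1] for row in img]


def adjust_gain(img: Image, gain: float) -> Image:
    """Scale pixel intensities and clip them to the 8-bit range."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in img]


def shift_x(img: Image, dx: int, fill: int = 0) -> Image:
    """Translate horizontally by dx pixels (assumes |dx| < width), padding with `fill`."""
    width = len(img[0])
    if dx >= 0:
        return [[fill] * dx + row[: width - dx] for row in img]
    return [row[-dx:] + [fill] * (-dx) for row in img]


def augment(img: Image,
            flips: Sequence[bool] = (False, True),
            gains: Sequence[float] = (0.9, 1.0, 1.1),
            shifts: Sequence[int] = (-2, 0, 2)) -> List[Image]:
    """Return every flip/gain/shift combination (the identity combination is included)."""
    variants = []
    for flip, gain, dx in product(flips, gains, shifts):
        out = hflip(img) if flip else img
        variants.append(shift_x(adjust_gain(out, gain), dx))
    return variants


def fuse_views(view_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Patient-level call from several views (e.g. AP and lateral): positive if any view is."""
    return max(view_probs) >= threshold


if __name__ == "__main__":
    img = [[10, 20, 30, 40], [50, 60, 70, 80]]  # tiny hypothetical "radiograph"
    variants = augment(img)
    print(len(variants))           # 18 = 2 flips x 3 gains x 3 shifts
    print(variants[4] == img)      # True: (no flip, gain 1.0, shift 0) is the original
    print(fuse_views([0.3, 0.8]))  # True: flagged on the second view only
```

Two caveats apply to this kind of pipeline. Augmented copies of one patient's image should stay on the same side of the train/test split (ideally, augmentation is applied only to the training set after a patient-level split); otherwise near-duplicate images inflate test accuracy. And the "any view positive" rule favors sensitivity: a fracture visible only on the lateral view is caught, at the price of more false positives than averaging the view probabilities would produce at the same threshold.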
Because AI can rapidly process large amounts of patient information, it has great potential for diagnosing and classifying patients' diseases [26]. In particular, the usefulness of AI is being studied in trauma prediction, where the number and severity of injuries vary widely between individuals because many external and internal factors are involved [27]. The present study is expected to help verify the effectiveness of AI in diagnosing such specific conditions.
Our study has several limitations. First, we did not consider the type of AI algorithm or the degree of its training. Second, we did not consider the quality of the radiographs used for deep learning. The selected images are likely to be of high quality, and they may represent the characteristics of only a specific age and sex. Third, implants used for surgical treatment of hip fracture were not considered.