Application of machine learning in predicting survival outcomes involving real-world data: a scoping review – BMC Medical Research Methodology

This scoping review search identified a total of 98 studies from PubMed and 159 studies from Embase. After duplicate removal and title and abstract screening, potentially relevant studies were selected for full-text review. Among these, 28 peer-reviewed studies involving at least one unique ML model across a broad range of patient populations and settings were included in this review (Fig. 1).

Fig. 1

Flow diagram for study selection

Study characteristics

Data source and sample size

The majority of these studies (N = 14) used data from the US [26,27,28,29,30,31,32,33,34,35,36,37,38,39]. Among these US studies, six used administrative claims datasets [26, 27, 30, 31, 36, 39], including SEER-Medicare and Veterans Health Administration claims; five used electronic health records or electronic medical records [32,33,34, 37, 38]; and three used patient registry cohorts [28, 29, 35]. The remaining studies used datasets from continental Europe (N = 6) [40,41,42,43,44,45], including Italy, the Netherlands, Denmark, Switzerland, and Germany, or from England (N = 3), China (N = 4), or India (N = 1). The median sample size was 10,614 patients (range: 142–247,960).

Study population and time-to-event outcomes

Most of the studies involving ML-based prediction for survival analyses focused on cancer patients [26, 27, 30, 31, 34, 36, 38, 39, 42, 43, 46, 47] (N = 12); in these oncology studies, ML models were used to predict survival outcomes or cancer recurrence.

The remaining studies focused on patients with cardiovascular disease [28, 35, 48, 49], COVID-19 [37, 50, 51], diabetes [29, 40, 41, 45], schizophrenia [52], or HBV infection [53], as well as hospital inpatients [32], heart transplant recipients [33], and intensive care unit (ICU) patients [54]. Across these non-cancer disease areas, the ML models predicted clinical outcomes such as the development of cardiovascular events [29, 40, 41, 45], the incidence of sudden cardiac arrest, venous thromboembolism, or ventricular fibrillation, and death. Only one study used ML to predict treatment outcomes [52]. A detailed summary of the included studies is provided in e-supporting Table 1.

Table 1 ML algorithms used across the included studies, with the studies featuring each algorithm (N = 28 studies)

Characteristics of ML Models

Use of ML for survival outcomes

The types of ML algorithms used are reported in Table 1. In this review, the most popular ML algorithms for survival analyses were random survival forests (N = 16) [26,27,28, 31,32,33,34, 36, 42, 43, 45,46,47,48,49, 53], boosted tree methods [31, 34, 42, 43, 45, 51, 53], and artificial neural networks [30, 31, 37, 39,40,41, 43, 44, 46, 47, 49, 50]. Support vector methods [34, 35, 42, 53] and regularization methods (LASSO, ridge, elastic net) [43, 49, 52, 53] were also common; other algorithms included naïve Bayes [29, 35, 53], K-nearest neighbors [35], and multi-layer perceptron [34]. Table 2 provides a description of these ML algorithms.
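Survival-specific algorithms such as random survival forests produce risk scores for censored time-to-event data, and such models are commonly evaluated with Harrell's concordance index, the survival analogue of the AUC. As a minimal illustration of what that metric measures (a stdlib-only sketch with hypothetical data, not drawn from any included study):

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable pairs (where the subject with the
    shorter follow-up time had an observed event), the fraction in which the
    model assigned that subject the higher risk score (ties count as 0.5)."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable only if subject i failed before j's follow-up ended
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort: follow-up times, event indicators (1 = event observed,
# 0 = censored), and model-predicted risk scores
times  = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks  = [0.9, 0.3, 0.7, 0.4]
print(concordance_index(times, events, risks))  # → 0.6
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect risk ordering, mirroring the AUC interpretation used throughout this review.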

Table 2 Description of ML methods

ML model performance

Across these studies, three [28, 33, 45] did not report model performance as AUC; the remaining studies reported AUC for model evaluation. Among those reporting AUC, the values varied, with a mean of 0.7852 and a median of 0.789 (IQR: 0.73–0.847; range: 0.6–0.9503). While one study developed an ML model [52] with an AUC below 0.7, most of the studies developed at least one ML model with an AUC above 0.70. The boxplot and beeswarm plot of model performance based on AUC, stratified by type of ML algorithm, are shown in Fig. 2. Descriptive statistics of the AUCs across these ML models are shown in Table 3.
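For a binary outcome at a fixed time horizon, the AUC equals the probability that a randomly chosen patient who experienced the event received a higher predicted score than one who did not (the Mann-Whitney interpretation). A stdlib-only sketch of this rank-based computation, using hypothetical labels and scores rather than data from the review:

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a random positive case outranks
    a random negative case (ties count as 0.5); equivalent to the
    Mann-Whitney U statistic scaled to [0, 1]."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = event observed, 0 = no event
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auc(labels, scores), 3))  # → 0.889
```

An AUC of 0.5 indicates no discrimination and 1.0 perfect discrimination, which is why the 0.7 threshold is often used as a rough marker of acceptable performance.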

Fig. 2

ML performance for survival analyses

Table 3 Descriptive statistics of AUC by ML algorithms

Model validations

Among all included studies, twenty-five (89%) applied model validation; Table 4 details the validation methods used. Nineteen studies used internal validation: fifteen randomly split their datasets into a training set and a test set [26, 27, 29, 31, 32, 36, 38,39,40,41, 44, 46, 49, 50, 53], while four validated model performance using cross-validation [35, 42, 48, 52]. Six studies applied external validation, five validating model performance on an independent dataset [30, 34, 37, 43, 47] and one using prospective validation [51]. The remaining three studies did not report any validation methods [28, 33, 45].
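The two internal-validation strategies described above can be sketched with the Python standard library alone. The patient IDs, the 80/20 split fraction, and k = 5 below are illustrative assumptions, not choices made by the included studies:

```python
import random

def train_test_split(ids, test_frac=0.2, seed=42):
    """Internal validation: randomly hold out a test set from the cohort."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    return ids[n_test:], ids[:n_test]  # (train, test)

def kfold(ids, k=5, seed=42):
    """Internal validation: yield (train, test) ID lists for k-fold
    cross-validation, so every subject is tested exactly once."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    for i in range(k):
        test = set(ids[i::k])
        yield [x for x in ids if x not in test], sorted(test)

patients = list(range(100))           # hypothetical patient IDs
train, test = train_test_split(patients)
print(len(train), len(test))          # → 80 20
```

External validation, by contrast, cannot be simulated this way: it requires fitting on one cohort and scoring an entirely separate dataset.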

Table 4 Overview of methods for model validation across studies (N = 28 studies)

Comparison between model performance of ML vs. CPH

A total of 17 studies (61%) compared the performance of ML models with traditional regression-based CPH models. Most (N = 15, 88%) reported that ML performed better than CPH [26, 30,31,32, 34, 36, 38,39,40,41,42,43, 48,49,50]. Only one study reported that ML algorithms did not surpass the CPH model [27], and one study included CPH but did not make a direct comparison [29]. Details can be found in e-supporting Table 1.

Quality assessment

Among the included studies, the majority were of high quality based on appraisal across the six domains of the QUIPS tool. Details of the quality assessment for all included studies are summarized in e-supporting Table 3.
