Application value of machine learning algorithms and COX nomogram in the survival prediction of hepatocellular carcinoma after resection

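  • Abstract (translated from the Chinese text): Objective: To evaluate machine learning algorithms and a COX nomogram for predicting survival after resection of hepatocellular carcinoma (HCC).
    Methods: A retrospective descriptive study was conducted. Clinicopathological data were collected from 375 patients with HCC who underwent radical hepatectomy at the Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, between January 2012 and January 2017. There were 304 men and 71 women; the median age was 57 years (range, 21-79 years). Using computer-generated random numbers, the 375 patients were divided at a ratio of 8∶2 into a training set of 300 patients and a validation set of 75 patients. Logistic regression, support vector machine, decision tree, random forest, and artificial neural network algorithms were used to build models predicting postoperative survival, and the best-performing machine learning model was selected. A COX nomogram model for predicting postoperative survival was also built, and its performance was compared with that of the best machine learning model. Observation indicators: (1) clinicopathological data of patients in the training and validation sets; (2) follow-up and survival of patients in the training and validation sets; (3) construction and validation of the machine learning prediction models; (4) construction and validation of the COX nomogram prediction model; (5) comparison of the predictive performance of the random forest model and the COX nomogram model. Patients were followed up by outpatient visit or telephone to determine survival status, until December 2019 or death. Normally distributed measurement data were expressed as mean±SD and compared between groups with the paired t test. Skewed measurement data were expressed as M (P25, P75) or M (range) and compared between groups with the Mann-Whitney U test. Count data were expressed as absolute numbers; groups were compared with the χ² test when Tmin (the minimum expected frequency) ≥5 and N≥40, with the continuity-corrected χ² test when 1≤Tmin≤5 and N≥40, and with Fisher's exact test when Tmin<1 or N<40. Survival rates were calculated and survival curves drawn with the Kaplan-Meier method. Univariate analysis used the COX proportional hazards model; variables with P<0.2 were entered into Lasso regression, variables affecting prognosis were selected according to the λ value, and the selected variables were then entered into a multivariate COX proportional hazards model.
    Results: (1) Clinicopathological data of the training and validation sets: microvascular invasion (absent, present) and liver cirrhosis (absent, present) were found in 292 and 8 patients and in 105 and 195 patients in the training set, versus 69 and 6 patients and 37 and 38 patients in the validation set; both differences between the two sets were statistically significant (χ²=4.749, 5.239, P<0.05).
    (2) Follow-up and survival: all patients in both sets were followed up. The 300 patients in the training set were followed up for 1.1-85.5 months (median, 50.3 months), and the 75 patients in the validation set for 1.0-85.7 months (median, 46.7 months). The 1- and 3-year postoperative overall survival rates of the 375 patients were 91.7% and 79.5%. The corresponding rates were 92.0% and 79.7% in the training set and 90.7% and 81.9% in the validation set, with no significant difference in postoperative survival between the two sets (χ²=0.113, P>0.05).
    (3) Construction and validation of the machine learning prediction models. ① Selection of the best machine learning model: the clinicopathological variables were ranked by their information gain for predicting 3-year postoperative survival, combining the rankings from the five algorithms (logistic regression, support vector machine, decision tree, random forest, and artificial neural network). The main predictors were hepatitis B e antigen (HBeAg), surgical procedure, maximum tumor diameter, perioperative blood transfusion, liver capsule invasion, and invasion of liver segment Ⅳ. The top 3, 6, 9, 12, 15, 18, 21, 24, 27, and 29 ranked variables were entered into the five algorithms in turn. With 9 variables, the area under the receiver operating characteristic curve (AUC) of the logistic regression, support vector machine, decision tree, and random forest models tended to stabilize. With more than 12 variables, the AUC of the artificial neural network model fluctuated markedly, the stability of the AUC of the logistic regression and support vector machine models continued to improve, and the AUC of the random forest model approached 0.990, indicating that the random forest model was the best machine learning model. ② Optimization and validation of the random forest model: the 29 predictor variables were entered sequentially into the random forest algorithm to build the best model in the training set. With 10 variables, grid search gave an optimal number of tree nodes of 4 and an optimal number of trees of 1 000; with 10 or more variables, the AUC of the random forest model stabilized at about 0.990. With 10 variables, the random forest model predicted 3-year overall survival with an AUC of 0.992, a sensitivity of 0.629, and a specificity of 0.996 in the training set, and with an AUC of 0.723, a sensitivity of 0.177, and a specificity of 0.948 in the validation set.
    (4) Construction and validation of the COX nomogram prediction model. ① Factors affecting postoperative survival in the training set: univariate analysis showed that HBeAg, alpha-fetoprotein, perioperative blood transfusion, maximum tumor diameter, liver capsule invasion, and degree of tumor differentiation were associated with postoperative survival (hazard ratio=1.958, 1.878, 2.170, 1.188, 2.052, 0.222; 95% confidence interval: 1.185-3.235, 1.147-3.076, 1.389-3.393, 1.092-1.291, 1.240-3.395, 0.070-0.703; P<0.05). Clinicopathological factors with P<0.2 were entered into Lasso regression, which identified sex, HBeAg, alpha-fetoprotein, surgical procedure, perioperative blood transfusion, maximum tumor diameter, tumor location in liver segments Ⅴ and Ⅷ, liver capsule invasion, and degree of tumor differentiation (high, moderate-high, moderate, moderate-low) as factors associated with postoperative survival. In the subsequent multivariate COX regression, HBeAg, surgical procedure, and maximum tumor diameter were independent factors affecting postoperative survival (hazard ratio=1.770, 8.799, 1.142; 95% confidence interval: 1.049-2.987, 1.203-64.342, 1.051-1.242; P<0.05). ② Construction and validation of the COX nomogram: clinicopathological factors with P≤0.1 in the multivariate COX analysis of the training set were entered into RStudio with the rms package to build the COX nomogram model. The model predicted postoperative overall survival with a C-index of 0.723 (se=0.028), and predicted 3-year overall survival with an AUC of 0.760 in the training set and 0.795 in the validation set. The calibration plot for the training set showed that the COX nomogram model predicted postoperative survival well. COX nomogram regression function=0.627 06×HBeAg (normal=0, abnormal=1)+0.134 34×maximum tumor diameter (cm)+2.107 58×surgical procedure (laparoscopic=0, open=1)+0.545 58×perioperative blood transfusion (no=0, yes=1)-1.421 33×high differentiation (not high=0, high=1). COX nomogram risk scores were calculated for all patients, and X-tile software was used to find the optimal cutoff: patients with a risk score ≥2.9 formed the high-risk group and those with a risk score <2.9 the low-risk group. Kaplan-Meier overall survival curves showed a significant difference in postoperative overall survival between the low-risk and high-risk groups in the training set (χ²=33.065, P<0.05) and in the validation set (χ²=6.585, P<0.05). Decision curve analysis further showed that the COX nomogram model combining HBeAg, surgical procedure, perioperative blood transfusion, maximum tumor diameter, and degree of tumor differentiation outperformed each single factor.
    (5) Comparison of the random forest model and the COX nomogram model: the two models were compared on an important variable that both contain (maximum tumor diameter) and by their prediction error curves. For a maximum tumor diameter of 2.2 cm, the 3-year survival rates predicted by the random forest model and the COX nomogram model were 77.17% and 74.77% (χ²=0.182, P>0.05); for 6.3 cm, 57.51% and 61.65% (χ²=0.394, P>0.05); for 14.2 cm, 51.03% and 27.52% (χ²=12.762, P<0.05). The difference between the survival rates predicted by the two models grew as maximum tumor diameter increased. In the validation set, the AUC for predicting 3-year overall survival was 0.723 for the random forest model and 0.795 for the COX nomogram model, a statistically significant difference (t=3.353, P<0.05). Bootstrap cross-validation gave integrated Brier scores for predicting 3-year survival of 0.139 for the random forest model and 0.134 for the COX nomogram model, so the prediction error of the COX nomogram model was lower.
    Conclusion: Compared with the machine learning prediction models, the COX nomogram model predicts 3-year survival after resection of HCC better, and because it uses few variables it is easy to apply clinically.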

     

    Abstract: Objective: To investigate the application value of machine learning algorithms and COX nomogram in the survival prediction of hepatocellular carcinoma (HCC) after resection.
    Methods: A retrospective descriptive study was conducted. The clinicopathological data of 375 patients with HCC who underwent radical resection in the Cancer Hospital of Chinese Academy of Medical Sciences and Peking Union Medical College from January 2012 to January 2017 were collected. There were 304 males and 71 females, aged from 21 to 79 years, with a median age of 57 years. Using computer-generated random numbers, the 375 patients were divided at a ratio of 8∶2 into a training dataset of 300 patients and a validation dataset of 75 patients. Machine learning algorithms including logistic regression (LR), support vector machine (SVM), decision tree (DT), random forest (RF), and artificial neural network (ANN) were used to construct survival prediction models for HCC after resection, so as to identify the optimal machine learning algorithm prediction model. A COX nomogram prediction model for predicting postoperative survival in patients with HCC was also constructed. The performance of the optimal machine learning algorithm prediction model and the COX nomogram prediction model in predicting postoperative survival of HCC patients was compared. Observation indicators: (1) analysis of clinicopathological data of patients in the training dataset and validation dataset; (2) follow-up and survival of patients in the training dataset and validation dataset; (3) construction and evaluation of machine learning algorithm prediction models; (4) construction and evaluation of COX nomogram prediction model; (5) evaluation of prediction performance of the RF machine learning algorithm prediction model and the COX nomogram prediction model. Follow-up was performed by outpatient examination or telephone interview to determine the survival of patients up to December 2019 or death. Measurement data with normal distribution were expressed as Mean±SD, and comparison between groups was analyzed by the paired t test. Measurement data with skewed distribution were expressed as M (P25, P75) or M (range), and comparison between groups was analyzed by the Mann-Whitney U test. Count data were represented as absolute numbers. Comparison between groups was performed using the chi-square test when Tmin (the minimum expected frequency) ≥5 and N ≥40, using the continuity-corrected chi-square test when 1≤ Tmin ≤5 and N ≥40, and using the Fisher exact probability test when Tmin <1 or N <40. The Kaplan-Meier method was used to calculate survival rates and draw survival curves. The COX proportional hazards model was used for univariate analysis, and variables with P<0.2 were included in the Lasso regression analysis. Variables affecting prognosis were screened according to the lambda value and then entered into the COX proportional hazards model for multivariate analysis.
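    The rule for choosing a test for count data can be written as a small decision function. The following is a minimal sketch, not the authors' code: the function name `choose_count_test` and the returned labels are hypothetical, and because the stated ranges overlap at Tmin = 5, the sketch assumes that Tmin = 5 falls under the ordinary chi-square test.

```python
def choose_count_test(t_min: float, n: int) -> str:
    """Pick a test for count data from the minimum expected frequency and sample size.

    t_min: smallest expected cell frequency (Tmin in the Methods).
    n:     total number of observations (N in the Methods).
    """
    # Tmin < 1 or N < 40: Fisher exact probability test
    if t_min < 1 or n < 40:
        return "fisher_exact"
    # Tmin >= 5 and N >= 40: ordinary chi-square test
    # (assumption: the boundary case Tmin == 5 is assigned here)
    if t_min >= 5:
        return "chi_square"
    # 1 <= Tmin < 5 and N >= 40: continuity-corrected chi-square test
    return "corrected_chi_square"


if __name__ == "__main__":
    for t_min, n in [(12.0, 375), (3.2, 375), (0.6, 375), (8.0, 30)]:
        print(t_min, n, choose_count_test(t_min, n))
```

    The example inputs in the `__main__` block are illustrative values, not frequencies taken from the study.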
    Results: (1) Analysis of clinicopathological data of patients in the training dataset and validation dataset: cases without and with microvascular invasion and cases without and with liver cirrhosis were 292, 8, 105, 195 in the training dataset, versus 69, 6, 37, 38 in the validation dataset, showing significant differences between the two groups (χ²=4.749, 5.239, P<0.05).
    (2) Follow-up and survival of patients in the training dataset and validation dataset: all the 375 patients received follow-up. The 300 patients in the training dataset were followed up for 1.1-85.5 months, with a median follow-up time of 50.3 months. The 75 patients in the validation dataset were followed up for 1.0-85.7 months, with a median follow-up time of 46.7 months. The postoperative 1- and 3-year overall survival rates of the 375 patients were 91.7% and 79.5%. The postoperative 1- and 3-year overall survival rates of the training dataset were 92.0% and 79.7%, versus 90.7% and 81.9% of the validation dataset, showing no significant difference in postoperative survival between the two groups (χ²=0.113, P>0.05).
    (3) Construction and evaluation of machine learning algorithm prediction models. ① Selection of the optimal machine learning algorithm prediction model: according to the information gain of variables for prediction of 3-year postoperative survival of HCC, five machine learning algorithms (LR, SVM, DT, RF, and ANN) were used to comprehensively rank the clinicopathological variables of HCC. The main predictive factors screened out were hepatitis B e antigen (HBeAg), surgical procedure, maximum tumor diameter, perioperative blood transfusion, liver capsule invasion, and liver segment Ⅳ invasion. The top 3, 6, 9, 12, 15, 18, 21, 24, 27, and 29 ranked variables were introduced into the 5 machine learning algorithms in turn. The results showed that the area under curve (AUC) of the receiver operating characteristic curve of the LR, SVM, DT, and RF machine learning algorithm prediction models tended to be stable when 9 variables were introduced. When more than 12 variables were introduced, the AUC of the ANN machine learning algorithm prediction model fluctuated markedly, the stability of the AUC of the LR and SVM machine learning algorithm prediction models continued to improve, and the AUC of the RF machine learning algorithm prediction model was nearly 0.990, suggesting the RF machine learning algorithm prediction model as the optimal machine learning algorithm prediction model. ② Optimization and evaluation of the RF machine learning algorithm prediction model: the 29 variables of predictive factors were sequentially introduced into the RF machine learning algorithm to construct the optimal RF machine learning algorithm prediction model in the training dataset. When 10 variables were introduced, the grid search method showed 4 as the optimal number of nodes per DT and 1 000 as the optimal number of DTs. When 10 or more variables were introduced, the AUC of the RF machine learning algorithm prediction model was stable at about 0.990. When 10 variables were introduced, the RF machine learning algorithm prediction model had an AUC of 0.992, a sensitivity of 0.629, and a specificity of 0.996 for 3-year postoperative overall survival in the training dataset, and an AUC of 0.723, a sensitivity of 0.177, and a specificity of 0.948 in the validation dataset.
    (4) Construction and evaluation of COX nomogram prediction model. ① Analysis of postoperative survival factors of HCC patients in the training dataset. Results of univariate analysis showed that HBeAg, alpha fetoprotein (AFP), perioperative blood transfusion, maximum tumor diameter, liver capsule invasion, and degree of tumor differentiation were related factors for postoperative survival of HCC patients [hazard ratio (HR)=1.958, 1.878, 2.170, 1.188, 2.052, 0.222, 95% confidence interval (CI): 1.185-3.235, 1.147-3.076, 1.389-3.393, 1.092-1.291, 1.240-3.395, 0.070-0.703, P<0.05]. Clinicopathological factors with P<0.2 were included in the Lasso regression analysis, and the results showed that sex, HBeAg, AFP, surgical procedure, perioperative blood transfusion, maximum tumor diameter, tumor located at liver segment Ⅴ or Ⅷ, liver capsule invasion, and degree of tumor differentiation (high, moderate-high, moderate, and moderate-low differentiation) were related factors for postoperative survival of HCC patients. The above factors were included in a further multivariate COX analysis, and the results showed that HBeAg, surgical procedure, and maximum tumor diameter were independent factors affecting postoperative survival of HCC patients (HR=1.770, 8.799, 1.142, 95%CI: 1.049-2.987, 1.203-64.342, 1.051-1.242, P<0.05). ② Construction and evaluation of COX nomogram prediction model: the clinicopathological factors with P≤0.1 in the COX multivariate analysis were entered into RStudio software with the rms package to construct the COX nomogram prediction model in the training dataset. The COX nomogram prediction model for predicting postoperative overall survival had a concordance index (C-index) of 0.723 (se=0.028), an AUC of 0.760 for 3-year postoperative overall survival in the training dataset, and an AUC of 0.795 for 3-year postoperative overall survival in the validation dataset. The calibration plot in the training dataset showed that the COX nomogram prediction model had a good prediction performance for postoperative survival. COX nomogram score=0.627 06×HBeAg (normal=0, abnormal=1)+0.134 34×maximum tumor diameter (cm)+2.107 58×surgical procedure (laparoscopy=0, laparotomy=1)+0.545 58×perioperative blood transfusion (without blood transfusion=0, with blood transfusion=1)-1.421 33×high differentiation (non-high differentiation=0, high differentiation=1). The COX nomogram risk scores of all patients were calculated. X-tile software was used to find the optimal threshold of COX nomogram risk scores. Patients with risk scores ≥2.9 were assigned to the high-risk group, and patients with risk scores <2.9 were assigned to the low-risk group. Kaplan-Meier overall survival curves showed a significant difference in postoperative overall survival between the low-risk group and the high-risk group of the training dataset (χ²=33.065, P<0.05). There was also a significant difference in postoperative overall survival between the low-risk group and the high-risk group of the validation dataset (χ²=6.585, P<0.05). Decision curve analysis further showed that the COX nomogram prediction model based on the combination of HBeAg, surgical procedure, perioperative blood transfusion, maximum tumor diameter, and degree of tumor differentiation was superior to any of these individual factors in prediction performance.
    (5) Evaluation of prediction performance of the RF machine learning algorithm prediction model and the COX nomogram prediction model: the difference in prediction between the two models was investigated by analyzing maximum tumor diameter (the important variable shared by both models) and by comparing the prediction error curves of both models. The results showed that the postoperative 3-year survival rates predicted by the RF machine learning algorithm prediction model and the COX nomogram prediction model were 77.17% and 74.77% respectively for tumors with a maximum diameter of 2.2 cm (χ²=0.182, P>0.05), 57.51% and 61.65% for tumors with a maximum diameter of 6.3 cm (χ²=0.394, P>0.05), and 51.03% and 27.52% for tumors with a maximum diameter of 14.2 cm (χ²=12.762, P<0.05). As the maximum tumor diameter increased, the difference between the survival rates predicted by the two models became larger. In the validation dataset, the AUC for 3-year postoperative overall survival of the RF machine learning algorithm prediction model and the COX nomogram prediction model was 0.723 and 0.795, showing a significant difference between the two models (t=3.353, P<0.05). Results of Bootstrap cross-validation for prediction error showed that the integrated Brier scores of the RF machine learning algorithm prediction model and the COX nomogram prediction model for predicting 3-year survival were 0.139 and 0.134, respectively. The prediction error of the COX nomogram prediction model was lower than that of the RF machine learning algorithm prediction model.
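    The COX nomogram score and the 2.9 cutoff reported above can be applied to an individual patient as follows. This is a minimal sketch that uses only the coefficients and threshold given in the abstract; the dictionary keys, the function names, and the two example patients are hypothetical and are not taken from the study.

```python
# Coefficients of the COX nomogram score as reported in the abstract.
# The key names are illustrative labels, not identifiers from the study.
COEFFICIENTS = {
    "hbeag_abnormal": 0.62706,             # normal = 0, abnormal = 1
    "max_tumor_diameter_cm": 0.13434,      # continuous, in cm
    "open_surgery": 2.10758,               # laparoscopy = 0, laparotomy = 1
    "perioperative_transfusion": 0.54558,  # no = 0, yes = 1
    "high_differentiation": -1.42133,      # non-high = 0, high = 1
}
HIGH_RISK_THRESHOLD = 2.9  # cutoff reported from X-tile


def nomogram_score(patient: dict) -> float:
    """Linear score: sum of coefficient times covariate value."""
    return sum(coef * patient[name] for name, coef in COEFFICIENTS.items())


def risk_group(score: float) -> str:
    """Scores >= 2.9 are high risk; scores < 2.9 are low risk."""
    return "high" if score >= HIGH_RISK_THRESHOLD else "low"


if __name__ == "__main__":
    # Two made-up patients, for illustration only.
    patient_a = {"hbeag_abnormal": 1, "max_tumor_diameter_cm": 6.3,
                 "open_surgery": 1, "perioperative_transfusion": 0,
                 "high_differentiation": 0}
    patient_b = {"hbeag_abnormal": 0, "max_tumor_diameter_cm": 2.2,
                 "open_surgery": 0, "perioperative_transfusion": 0,
                 "high_differentiation": 0}
    for p in (patient_a, patient_b):
        s = nomogram_score(p)
        print(round(s, 3), risk_group(s))
```

    Because the coefficient for surgical procedure (2.107 58) is large relative to the 2.9 cutoff, in this sketch an open procedure alone accounts for most of the score needed to reach the high-risk group.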
    Conclusion: Compared with machine learning algorithm prediction models, the COX nomogram prediction model performs better in predicting 3-year postoperative survival of HCC and, with fewer variables, is easier to use clinically.
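    The conclusion rests partly on the Brier score comparison, where a lower score means lower prediction error. The sketch below shows the basic Brier score at a single time point as the mean squared difference between predicted survival probability and observed status. It is a simplification under stated assumptions: it ignores censoring and is not the integrated, Bootstrap cross-validated score used in the study, and the function name and example values are hypothetical.

```python
def brier_score(predicted_survival: list, survived: list) -> float:
    """Mean squared error between predicted survival probabilities
    and observed outcomes (1 = alive at the time point, 0 = dead).

    Simplified: assumes every patient's status at the time point is
    known, i.e. no censoring before that time.
    """
    if len(predicted_survival) != len(survived):
        raise ValueError("inputs must have the same length")
    if not predicted_survival:
        raise ValueError("inputs must not be empty")
    squared_errors = [(p - y) ** 2
                      for p, y in zip(predicted_survival, survived)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Made-up predictions from two hypothetical models for four patients.
    observed = [1, 1, 0, 0]
    model_1 = [0.9, 0.8, 0.3, 0.2]
    model_2 = [0.7, 0.6, 0.5, 0.4]
    print(brier_score(model_1, observed), brier_score(model_2, observed))
```

    In this toy example the first model has the lower score, which is the sense in which the abstract reports the COX nomogram (0.134) as having lower prediction error than the RF model (0.139).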

     
