基于术前磁共振成像多模态数据的机器学习模型预测结直肠癌根治术后异时性肝转移的应用价值

Application value of machine learning models based on preoperative multimodal magnetic resonance imaging data in predicting metachronous liver metastases after radical resection of colorectal cancer

  • 摘要:
    目的 探讨基于术前磁共振成像(MRI)多模态数据构建机器学习模型预测结直肠癌根治术后异时性肝转移(MLM)发生风险的应用价值。
    方法 采用回顾性队列研究方法。收集2019年1月至2023年10月河南省人民医院收治的356例原发性结直肠癌患者的临床病理资料;男213例,女143例;年龄为(61±12)岁。患者通过随机数字表法按7∶3分为训练集(249例)和测试集(107例)。训练集用于构建预测模型,测试集用于验证预测模型。患者术前均行盆腔MRI检查,并行结直肠癌根治术。纳入17项患者的临床和影像学指标,使用最小绝对收缩和选择算子(LASSO)回归和多因素Logistic回归分析从其中筛选独立预测因子。基于独立预测因子构建6种机器学习预测模型。观察指标:(1)患者术前影像学检查与术后随访结果。(2)筛选预测因素。(3)机器学习预测模型构建及性能比较。正态分布的计量资料组间比较采用独立样本t检验。计数资料组间比较采用χ2检验。LASSO回归和Logistic回归模型筛选预测因子。使用Python3.9行数据处理,机器学习预测模型建立及结果可视化输出,绘制受试者工作特征曲线(ROC),并计算曲线下面积(AUC)、灵敏度、特异度、准确度、精确率、F1评分、Brier分数、阴性预测率、Kappa值,使用DeLong检验比较模型的预测能力,采用沙普利加法解释(SHAP)行可解释性分析。
    结果 (1)患者术前影像学检查与术后随访结果:356例患者中,肿瘤部位为结肠171例、直肠185例,肿瘤长径≥5 cm 131例,肿瘤边界不清晰259例,肿瘤动脉期环形强化145例,动脉期瘤周高信号116例,有淋巴结转移222例。356例患者术后均行≥2年随访,随访时间为32.5(24.0~68.5)个月,随访期间92例发生MLM。(2)筛选预测因素:将训练集患者的17个临床及影像学因素纳入LASSO回归,筛选7个非零系数指标显著与MLM相关因素。二元Logistic回归分析结果显示:容量转移常数、癌胚抗原、肿瘤长径、肿瘤动脉期环形强化、肿瘤动脉期瘤周高信号、肿瘤T分期和淋巴结转移均是249例训练集结直肠癌患者根治术后发生MLM的独立影响因素(优势比=3.22、2.49、3.33、10.92、7.46 、2.74、3.55,95%可信区间为1.25~8.32、1.11~5.58、1.51~7.33、4.59~25.94、3.18~17.52、1.19~6.29、1.40~9.00,P<0.05)。(3)机器学习预测模型构建及性能比较:根据多因素分析结果构建6种机器学习预测模型,训练集和测试集模型AUC均>0.8。模型性能评价:测试集6种机器学习预测模型中,逻辑回归模型的灵敏度(0.87)、AUC(0.89,95%可信区间为0.82~0.95)和F1分数(0.68)较突出,而随机森林模型的准确率、精确率和特异度较优。Delong检验结果显示:6种机器学习预测模型间预测效能比较,差异均无统计学意义(P>0.05)。测试集6种机器学习预测模型的混淆矩阵结果显示:逻辑回归模型在真阳性数(26例)表现最佳;而随机森林模型真阴性数(66例)表现更优。测试集中基于逻辑回归模型的特征重要性分析,肿瘤动脉期环形强化对模型预判MLM具有最高贡献。测试集基于逻辑回归模型的SHAP蜂群图显示:模型预测MLM对肿瘤动脉期环形强化的依赖关系最强。
    结论 容量转移常数、癌胚抗原、肿瘤长径、肿瘤动脉期环形强化、肿瘤动脉期瘤周高信号、肿瘤T分期及淋巴结转移均是结直肠癌患者根治术后发生MLM的独立影响因素。基于上述特征构建的6种机器学习预测模型均表现出良好的预测效能,其中逻辑回归模型的综合预测性能最优。

     

    Abstract:
    Objective To explore the application value of machine learning models based on preoperative multimodal magnetic resonance imaging (MRI) data in predicting the risk of metachronous liver metastases (MLM) after radical resection of colorectal cancer.
    Methods A retrospective cohort study was conducted. The clinicopathological data of 356 patients with primary colorectal cancer who were admitted to Henan Provincial People's Hospital from January 2019 to October 2023 were collected. There were 213 males and 143 females, aged (61±12) years. The patients were divided into a training set (249 cases) and a test set (107 cases) at a ratio of 7∶3 using the random number table method. The training set was used to construct the prediction models, and the test set was used to validate them. All patients underwent preoperative pelvic MRI examination and radical resection of colorectal cancer. Seventeen clinical and imaging indicators were included, and independent predictors were screened using least absolute shrinkage and selection operator (LASSO) regression and multivariate Logistic regression analysis. Six machine learning prediction models were constructed based on the independent predictors. Observation indicators: (1) results of preoperative imaging examination and postoperative follow-up of patients; (2) screening of predictive factors; (3) construction of machine learning prediction models and comparison of their performance. Normally distributed measurement data were compared between groups using the independent-samples t test. Count data were compared between groups using the chi-square test. LASSO regression and Logistic regression models were used to screen predictive factors. Python 3.9 was used for data processing, establishment of machine learning prediction models, and visual output of results. Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC), sensitivity, specificity, accuracy, precision, F1 score, Brier score, negative predictive value, and Kappa value were calculated. The DeLong test was used to compare the predictive ability of the models, and SHapley Additive exPlanations (SHAP) was used for interpretability analysis.
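The threshold-based metrics listed in the Methods can all be derived from the four cells of a 2×2 confusion matrix, and the Brier score from the predicted probabilities. The following is a minimal sketch using the standard textbook definitions, not the authors' code; the function names `classification_metrics` and `brier_score` and all counts and probabilities in the usage lines are hypothetical and are not data from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from a 2x2 confusion matrix.

    Assumes every denominator is non-zero (each class and each
    predicted label occurs at least once).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": accuracy,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical counts and probabilities, for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"brier: {brier_score([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]):.4f}")
```

Sensitivity and specificity depend on the classification threshold, whereas the Brier score is computed on the probabilities themselves, so a model can rank well on one kind of measure and poorly on the other.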
    Results (1) Results of preoperative imaging examination and postoperative follow-up of patients: among the 356 patients, the tumor was located in the colon in 171 cases and in the rectum in 185 cases; 131 cases had a tumor diameter ≥5 cm, 259 cases had unclear tumor boundaries, 145 cases had tumor circumferential enhancement in the arterial phase, 116 cases had peritumoral hyperintensity in the arterial phase, and 222 cases had lymph node metastasis. All 356 patients were followed up for ≥2 years after surgery, with a follow-up time of 32.5 (range, 24.0‒68.5) months, during which 92 cases developed MLM. (2) Screening of predictive factors: the 17 clinical and imaging factors of patients in the training set were entered into LASSO regression, and 7 factors with non-zero coefficients that were significantly associated with MLM were selected. Results of binary Logistic regression analysis showed that volume transfer constant, carcinoembryonic antigen, tumor diameter, tumor circumferential enhancement in the arterial phase, peritumoral hyperintensity in the arterial phase, tumor T staging, and lymph node metastasis were independent influencing factors for MLM after radical resection in the 249 colorectal cancer patients of the training set (odds ratio=3.22, 2.49, 3.33, 10.92, 7.46, 2.74, 3.55; 95% confidence interval: 1.25‒8.32, 1.11‒5.58, 1.51‒7.33, 4.59‒25.94, 3.18‒17.52, 1.19‒6.29, 1.40‒9.00; P<0.05). (3) Construction of machine learning prediction models and comparison of their performance: based on the results of multivariate analysis, 6 machine learning prediction models were constructed. The AUC values of all 6 models were >0.8 in both the training set and the test set. Regarding model performance evaluation, among the 6 machine learning prediction models in the test set, the Logistic regression model showed prominent sensitivity (0.87), AUC (0.89, 95% confidence interval: 0.82‒0.95), and F1 score (0.68), while the random forest model had superior accuracy, precision, and specificity. The DeLong test showed no significant difference in predictive efficacy among the 6 machine learning prediction models (P>0.05). The confusion matrices of the 6 machine learning prediction models in the test set showed that the Logistic regression model performed best in the number of true positives (26 cases), while the random forest model performed better in the number of true negatives (66 cases). The feature importance analysis based on the Logistic regression model in the test set showed that tumor circumferential enhancement in the arterial phase made the highest contribution to the model's prediction of MLM. The SHAP beeswarm plot based on the Logistic regression model in the test set showed that the model's prediction of MLM depended most strongly on tumor circumferential enhancement in the arterial phase.
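The odds ratios reported above can be related back to Logistic regression coefficients, since an odds ratio is exp(β). The sketch below assumes the 95% confidence intervals are Wald intervals of the form exp(β ± 1.96·SE), which the abstract does not state; under that assumption the interval is symmetric on the log scale, so β and its standard error can be approximately recovered from the bounds. The helper name `logit_from_or_ci` is hypothetical; the input values are the reported interval for tumor circumferential enhancement in the arterial phase.

```python
import math


def logit_from_or_ci(lower, upper, z=1.96):
    """Recover the log-odds coefficient and its standard error from a
    confidence interval for an odds ratio.

    Assumes a Wald interval, exp(beta +/- z * se), so the bounds are
    symmetric around beta on the log scale.
    """
    log_lower, log_upper = math.log(lower), math.log(upper)
    beta = (log_lower + log_upper) / 2        # midpoint on the log scale
    se = (log_upper - log_lower) / (2 * z)    # half-width divided by z
    return beta, se


# Reported 95% CI for tumor circumferential enhancement: 4.59 to 25.94.
beta, se = logit_from_or_ci(4.59, 25.94)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"implied odds ratio = {math.exp(beta):.2f}")
print(f"Wald z = {beta / se:.2f}")
```

The implied odds ratio agrees with the reported 10.92 to within rounding, which is consistent with, but does not prove, a Wald-type interval.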
    Conclusions Volume transfer constant, carcinoembryonic antigen, tumor diameter, tumor circumferential enhancement in the arterial phase, peritumoral hyperintensity in the arterial phase, tumor T staging, and lymph node metastasis are independent influencing factors for MLM in colorectal cancer patients after radical resection. The six machine learning prediction models based on the above features all demonstrate good predictive efficacy, among which the Logistic regression model shows the best comprehensive predictive performance.

     
