Objective To explore the application value of machine learning models based on preoperative multimodal magnetic resonance imaging (MRI) data in predicting the risk of metachronous liver metastases (MLM) after radical resection of colorectal cancer.
Methods A retrospective cohort study was conducted. The clinicopathological data of 356 patients with primary colorectal cancer who were admitted to Henan Provincial People's Hospital from January 2019 to October 2023 were collected. There were 213 males and 143 females, aged (61±12) years. The patients were divided into a training set (249 cases) and a test set (107 cases) at a ratio of 7∶3 using the random number table method. The training set was used to construct the prediction models, and the test set was used to validate them. All patients underwent preoperative pelvic MRI examination and radical resection of colorectal cancer. Seventeen clinical and imaging indicators were included, and independent predictors were screened using least absolute shrinkage and selection operator (LASSO) regression and multivariate Logistic regression analysis. Six machine learning prediction models were constructed based on the independent predictors. Observation indicators: (1) results of preoperative imaging examination and postoperative follow‑up of patients; (2) screening of predictive factors; (3) construction of machine learning prediction models and comparison of their performance. Normally distributed measurement data were compared using the independent sample t test. Count data were compared between groups using the chi‑square test. LASSO regression and Logistic regression models were used to screen predictive factors. Python 3.9 was used for data processing, establishment of machine learning prediction models, and visual output of results. Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC), sensitivity, specificity, accuracy, precision, F1 score, Brier score, negative predictive value, and Kappa value were calculated.
The DeLong test was used to compare the predictive ability of the models, and SHapley Additive exPlanations (SHAP) analysis was used to interpret the models.
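The screening and evaluation workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it runs on synthetic data from scikit-learn, uses a linear LASSO (`LassoCV`) as a stand-in because the abstract does not say which LASSO variant was fitted, approximates an unpenalized Logistic regression with a very large `C`, and assumes a 0.5 probability cutoff. All variable names are hypothetical, and only one of the six models is shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import (accuracy_score, brier_score_loss, cohen_kappa_score,
                             confusion_matrix, f1_score, precision_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 356 cases, 17 candidate indicators (assumption).
X, y = make_classification(n_samples=356, n_features=17, n_informative=7,
                           n_redundant=0, weights=[0.74, 0.26], random_state=0)

# 7:3 random split, which yields 249 training cases and 107 test cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Standardize using training-set statistics only, so the test set stays unseen.
scaler = StandardScaler().fit(X_train)
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: LASSO screening; keep indicators with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(Z_train, y_train)
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:  # guard: fall back to all indicators if none survive
    selected = np.arange(X.shape[1])

# Step 2: Logistic regression on the screened indicators
# (a very large C makes the default L2 penalty negligible).
clf = LogisticRegression(C=1e6, max_iter=5000).fit(Z_train[:, selected], y_train)
prob = clf.predict_proba(Z_test[:, selected])[:, 1]
pred = (prob >= 0.5).astype(int)  # assumed cutoff

# Step 3: test-set metrics named in the Methods.
tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
metrics = {
    "auc": roc_auc_score(y_test, prob),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "f1": f1_score(y_test, pred, zero_division=0),
    "brier": brier_score_loss(y_test, prob),
    "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    "kappa": cohen_kappa_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The DeLong test and the SHAP analysis are not included here because scikit-learn does not provide them; they would need separate packages.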
Results (1) Results of preoperative imaging examination and postoperative follow‑up of patients: among the 356 patients, 171 had tumors located in the colon and 185 in the rectum; 131 had a tumor diameter ≥5 cm, 259 had unclear tumor boundaries, 145 had tumor circumferential enhancement in the arterial phase, 116 had peritumoral hyperintensity in the arterial phase, and 222 had lymph node metastasis. All 356 patients were followed up for ≥2 years after surgery, with a follow-up time of 32.5 (range, 24.0‒68.5) months, during which 92 patients developed MLM. (2) Screening of predictive factors: 17 clinical and imaging factors of patients in the training set were entered into LASSO regression, and 7 factors with non-zero coefficients, all significantly associated with MLM, were selected. Results of binary Logistic regression analysis showed that volume transfer constant, carcinoembryonic antigen, tumor diameter, tumor circumferential enhancement in the arterial phase, peritumoral hyperintensity in the arterial phase, tumor T staging, and lymph node metastasis were independent influencing factors for MLM after radical resection in the 249 colorectal cancer patients of the training set (odds ratio=3.22, 2.49, 3.33, 10.92, 7.46, 2.74, and 3.55, respectively; 95% confidence interval: 1.25‒8.32, 1.11‒5.58, 1.51‒7.33, 4.59‒25.94, 3.18‒17.52, 1.19‒6.29, and 1.40‒9.00, respectively; all P<0.05). (3) Construction of machine learning prediction models and comparison of their performance: based on the results of multivariate analysis, 6 machine learning prediction models were constructed. The AUC values of all 6 machine learning prediction models were >0.8 in both the training set and the test set.
Regarding model performance evaluation, among the 6 machine learning prediction models in the test set, the Logistic regression model showed prominent sensitivity (0.87), AUC (0.89, 95% confidence interval: 0.82‒0.95), and F1 score (0.68), while the random forest model had superior accuracy, precision, and specificity. The DeLong test showed no significant difference in predictive efficacy among the 6 machine learning prediction models (P>0.05). The confusion matrices of the 6 machine learning prediction models in the test set showed that the Logistic regression model had the largest number of true positives (26 cases), while the random forest model performed better in the number of true negatives (66 cases). The feature importance analysis based on the Logistic regression model in the test set showed that tumor circumferential enhancement in the arterial phase contributed most to the model's prediction of MLM. The SHAP beeswarm plot based on the Logistic regression model in the test set likewise showed that the model's prediction of MLM depended most strongly on tumor circumferential enhancement in the arterial phase.
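For readers less familiar with how odds ratios and their 95% confidence intervals relate to Logistic regression output, the sketch below shows the standard Wald relationship: the odds ratio is exp(β), and the interval is exp(β ± z·SE). The abstract reports neither coefficients nor standard errors, so the example back-calculates them from the reported interval for tumor circumferential enhancement in the arterial phase (4.59‒25.94), assuming it is a Wald interval. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (odds ratio, lower bound, upper bound) for a logistic coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=Z_95):
    """Back-calculate the coefficient and its standard error from a Wald CI."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported 95% CI for arterial-phase circumferential enhancement: 4.59 to 25.94.
beta, se = beta_se_from_ci(4.59, 25.94)
or_est, ci_low, ci_high = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.3f}, OR={or_est:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

The point estimate recovered this way is the geometric mean of the interval bounds, which lies close to the reported odds ratio of 10.92. The small gap comes from rounding in the published figures.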
Conclusions Volume transfer constant, carcinoembryonic antigen, tumor diameter, tumor circumferential enhancement in the arterial phase, peritumoral hyperintensity in the arterial phase, tumor T staging, and lymph node metastasis are independent influencing factors for MLM in colorectal cancer patients after radical resection. The six machine learning prediction models based on the above features demonstrate high predictive efficacy, among which the Logistic regression model shows the best comprehensive predictive performance.