Development of prognostic model and multivariate analysis for breast cancer survival patients using SEER database
Abstract
Background: Breast cancer is the most common carcinoma in women worldwide and the leading cause of cancer death among them. Many studies have used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and found that ML models show strong individualized predictive ability.
Objectives: This study uses the Surveillance, Epidemiology, and End Results (SEER) dataset to classify breast carcinoma cases by vital status (alive or dead). Deep learning and machine learning have been widely used in clinical studies to address classification problems because they can handle large datasets systematically. Pre-processing makes the data suitable for visualization and analysis in support of critical decisions. This study describes a practical ML-based strategy for classifying the SEER breast cancer dataset.
Materials and methods: We applied four well-known ML classification algorithms to classify breast cancer mortality. To identify risk factors, we performed multivariate analysis on the dataset.
Results: The decision tree achieved the highest accuracy (0.914) among all the models. In the multivariate analysis, T4 stage (β=1.4, p<0.001, OR=4.22, 95% CI 2.06-8.64) and N2 stage (β=0.39, p=0.008, OR=1.49, 95% CI 1.111-1.997) were found to be major risk factors for breast cancer mortality.
Conclusion: The significant prognostic variables affecting breast carcinoma survival identified in this study are clinically relevant and could be incorporated into decision support systems in medical practice.
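For readers who want to see how the coefficients, odds ratios, and confidence intervals in the Results relate to one another, the following is a minimal sketch. It assumes that the multivariate analysis was a logistic regression and that the intervals are Wald-type (OR = exp(β), CI = exp(β ± z·SE)); the abstract states neither explicitly. The function names are illustrative only and are not taken from the study.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_summary(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    Assumes a Wald interval: exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def recover_beta_and_se(ci_lower, ci_upper, z=Z_95):
    """Back out the coefficient and its standard error from a Wald CI for the OR.

    On the log scale the interval is symmetric around beta, so beta is the
    midpoint and the half-width equals z * se.
    """
    log_lo, log_hi = math.log(ci_lower), math.log(ci_upper)
    beta = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return beta, se


# Odds ratios and intervals as reported in the abstract.
reported = {
    "T4 stage": {"or": 4.22, "ci": (2.06, 8.64)},
    "N2 stage": {"or": 1.49, "ci": (1.111, 1.997)},
}

for name, r in reported.items():
    beta, se = recover_beta_and_se(*r["ci"])
    or_hat, lo, hi = odds_ratio_summary(beta, se)
    print(
        f"{name}: beta={beta:.2f}, SE={se:.3f}, "
        f"OR={or_hat:.2f} (reported {r['or']}), 95% CI ({lo:.3f}, {hi:.3f})"
    )
```

Under these assumptions, the odds ratio implied by each reported interval can be compared with the reported odds ratio as a quick consistency check; small differences from the rounded β values in the abstract are expected.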
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Personal views expressed by the contributors in their articles are not necessarily those of the Journal of Associated Medical Sciences, Faculty of Associated Medical Sciences, Chiang Mai University.