Development of prognostic model and multivariate analysis for breast cancer survival patients using SEER database

Nihar Ranjan Panda
Kamal Lochan Mahanta
Jitendra Kumar Pati
Soumya Subhashree Satapathy
Ruchi Bhuyan

Abstract

Background: Breast cancer (BC) is the most frequently diagnosed cancer in women worldwide and a leading cause of cancer death among women. Many studies have used machine learning (ML) to forecast the prognosis of BC patients and have reported strong individualized predictive performance.


Objectives: This study aims to use the Surveillance, Epidemiology, and End Results (SEER) dataset to classify breast carcinoma cases by vital status (alive or dead). Deep learning and machine learning have been widely used in clinical studies to address classification problems because they can handle large datasets systematically. Pre-processing makes the data suitable for visualization and analysis in support of critical decisions. This study describes a practical machine learning-based strategy for classifying the SEER breast cancer dataset.


Materials and methods: We applied four well-known ML classification algorithms to classify breast cancer mortality. To identify risk factors, we performed multivariate analysis on the dataset.
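
As a rough illustration of how classifiers can be compared on held-out accuracy, the following minimal sketch may help. It is not the study's pipeline: the abstract does not name the four algorithms, so a k-nearest-neighbours classifier and a majority-class baseline stand in here, and the records are synthetic two-feature points rather than SEER data. All function names (`make_toy_data`, `knn_predict`, `majority_predict`, `accuracy`) are hypothetical.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=100, seed=0):
    """Synthetic two-feature records (NOT SEER data); label 0 = alive, 1 = dead."""
    rng = random.Random(seed)
    rows = []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            rows.append((features, label))
    rng.shuffle(rows)
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_predict(train, x):
    """Baseline: always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predict, train, test):
    """Fraction of held-out records whose predicted label matches the true label."""
    correct = sum(predict(train, x) == y for x, y in test)
    return correct / len(test)


data = make_toy_data()
split = int(0.8 * len(data))  # 80% training, 20% held-out test
train, test = data[:split], data[split:]

scores = {
    "kNN (k=5)": accuracy(knn_predict, train, test),
    "majority baseline": accuracy(majority_predict, train, test),
}
for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Comparing each model against a trivial baseline matters for mortality data, where one class often dominates and raw accuracy alone can look high without the model learning anything useful.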


Results: The decision tree achieved the highest accuracy (0.914) among all the models. In the multivariate analysis, T4 stage (β=1.4, p<0.001, OR=4.22, 95% CI 2.06-8.64) and N2 stage (β=0.39, p=0.008, OR=1.49, 95% CI 1.111-1.997) were found to be major risk factors for breast cancer mortality.
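
The relationship between a regression coefficient, its odds ratio, and the confidence interval can be sketched as follows. This assumes the reported intervals are Wald intervals from a logistic regression, which the abstract does not state; the function names are hypothetical. Under that assumption, OR = exp(β) and the 95% CI is exp(β ± 1.96·SE), so the log-odds coefficient and its standard error can be recovered from the interval bounds.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_from_beta(beta, se):
    """Return (OR, lower, upper) for a Wald 95% interval on the odds-ratio scale."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def beta_and_se_from_ci(lower, upper):
    """Invert a Wald interval: midpoint of the log bounds is beta, half-width / z is SE."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


# Interval bounds as reported in the Results paragraph.
reported = {"T4": (2.06, 8.64), "N2": (1.111, 1.997)}

recovered = {}
for stage, (lower, upper) in reported.items():
    beta, se = beta_and_se_from_ci(lower, upper)
    or_, lo, hi = odds_ratio_from_beta(beta, se)
    recovered[stage] = (beta, se, or_, lo, hi)
    print(f"{stage}: beta = {beta:.2f}, SE = {se:.3f}, OR = {or_:.2f}, 95% CI = {lo:.3f}-{hi:.3f}")
```

Under the Wald assumption, the reported intervals imply coefficients of about 1.44 (T4) and 0.40 (N2), which agree with the reported odds ratios of 4.22 and 1.49. This suggests that the printed β values of 1.4 and 0.39 are rounded or truncated.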


Conclusion: The significant prognostic variables affecting breast carcinoma survival identified in this study are clinically relevant and could be incorporated into decision support systems in medicine.

Article Details

How to Cite
Panda, N. R., Mahanta, K. L., Pati, J. K., Satapathy, S. S., & Bhuyan, R. (2023). Development of prognostic model and multivariate analysis for breast cancer survival patients using SEER database. Journal of Associated Medical Sciences, 57(1), 67–76. Retrieved from https://he01.tci-thaijo.org/index.php/bulletinAMS/article/view/265128
Section
Research Articles
Author Biographies

Kamal Lochan Mahanta, Department of Mathematics, CV Raman Global University, Bhubaneswar, India

Jitendra Kumar Pati, KIIT International School, KIIT University, Bhubaneswar, India
