Predictive model for cerebrovascular disease risk factors using ant-miner algorithm
Keywords:
predictive model, cerebrovascular disease, ant-miner algorithm
Abstract
Stroke remains a leading cause of mortality and disability worldwide, with a significant impact on public health. Early and accurate risk assessment and diagnosis are crucial for effective prevention and treatment. This study aimed to analyze medical data on cerebrovascular disease from an international database, extract risk factors relevant to diagnosis, and build a stroke diagnosis model using data mining techniques. Data came from the Stroke Prediction Dataset, comprising 5,110 patient records with 11 clinical attributes. Multivariate logistic regression was used to identify significant risk factors. The Ant-Miner algorithm, inspired by ant colony behavior, was used to develop a diagnostic model for stroke. Model performance was evaluated using 5-fold cross-validation and compared with other machine learning techniques, including decision trees and random forest. The most significant risk factors for stroke were age over 65 years (OR = 3.72, 95% CI = 2.86-4.84), hypertension (OR = 2.98, 95% CI = 2.31-3.84), heart disease (OR = 2.75, 95% CI = 2.07-3.66), and high blood glucose (>150 mg/dL; OR = 2.45, 95% CI = 1.89-3.17). The Ant-Miner-based model handled imbalanced medical data well and generated interpretable rules that are valuable for clinical decisions. It achieved 98.24% accuracy, 92.00% precision, 90.20% sensitivity, and a 91.09% F1-score. Explainable rules such as “IF (age>65) AND (avg_glucose>150) AND (hypertension=1) THEN high stroke risk with OR = 27.12” are applicable to real-world stroke diagnostics. In conclusion, the Ant-Miner model proved to be an accurate stroke diagnostic model that simply identifies risk factors and diagnostic rules that can aid early detection and prevention of stroke, potentially improving clinical decision-making. Further studies are recommended to validate the model's performance in other diseases and in diverse populations, and to explore its potential applications in clinical practice and public health interventions. Integrating the model with electronic health records and developing user-friendly applications for healthcare professionals could significantly enhance stroke risk assessment and management tools.
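As a small illustration of the figures above, the following Python sketch recomputes the F1-score from the reported precision and sensitivity and encodes the quoted example rule as a plain predicate. The function names `f1_score` and `high_stroke_risk` are hypothetical and chosen only for this illustration; they are not taken from the study, and the sketch does not implement the Ant-Miner algorithm itself.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def high_stroke_risk(age: float, avg_glucose: float, hypertension: int) -> bool:
    """Example rule quoted in the abstract (hypothetical encoding):
    age > 65 AND avg_glucose > 150 mg/dL AND hypertension = 1."""
    return age > 65 and avg_glucose > 150 and hypertension == 1


# Precision and sensitivity as reported in the abstract (92.00%, 90.20%).
reported_f1 = f1_score(0.9200, 0.9020)
print(f"F1 from reported precision and sensitivity: {reported_f1:.4f}")

# Illustrative patient values only; these are not records from the dataset.
print(high_stroke_risk(age=70, avg_glucose=180.0, hypertension=1))
print(high_stroke_risk(age=60, avg_glucose=180.0, hypertension=1))
```

The recomputed value agrees with the reported 91.09% F1-score to two decimal places, which is a quick consistency check on the stated metrics.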
License
Copyright (c) 2025 Journal of Medicine and Health Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.