Data-driven diagnosis of cardiovascular disease in patients with comorbid hypertension, dyslipidemia or diabetes using classification techniques
DOI: 10.55131/jphd/2026/240216
Abstract
Cardiovascular disease (CVD) is a leading cause of death globally, especially among individuals with chronic comorbidities such as hypertension, diabetes, and dyslipidemia. Early identification of high-risk individuals is essential for timely intervention, yet conventional risk models often underperform in complex clinical populations. Using real-world clinical data, this study aimed to develop and evaluate machine learning (ML) models for predicting CVD, with a focus on optimizing sensitivity and interpretability. We used hospital-based clinical data from patients with at least one of the following comorbidities: hypertension, diabetes, or dyslipidemia. The ML algorithms applied were logistic regression, naive Bayes, k-nearest neighbors, decision tree (DT), random forest, support vector machine (SVM), and artificial neural networks. Repeated 10 × 10-fold cross-validation was used for model evaluation, downsampling was used to address class imbalance, and systematic hyperparameter tuning was performed. Model performance was evaluated using accuracy, precision, sensitivity, F1 score, and area under the curve (AUC). Shapley Additive Explanations (SHAP) was applied to interpret the final model and identify key features. The SVM model achieved the highest sensitivity, indicating suitability for use in CVD screening, while the DT also showed relatively strong sensitivity. The final SVM model achieved a sensitivity of 89.29%, a precision of 57.94%, an F1 score of 69.97%, and an AUC of 75.65%. SHAP analysis identified creatinine, systolic blood pressure, low-density lipoprotein, fasting blood sugar, and triglycerides as the top predictors of CVD, in line with established cardiovascular risk factors. We present an interpretable ML model with high sensitivity for CVD prediction in patients with metabolic comorbidities. By integrating SHAP with robust validation techniques, the model supports personalized risk assessment and is suitable for screening applications aimed at early detection and preventive care in clinical settings.
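The validation scheme named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: it assumes that "10 × 10-fold" means 10 repeats of stratified 10-fold cross-validation, that downsampling is applied to the training folds only, and it uses synthetic one-feature data with a hypothetical midpoint-threshold classifier (`fit_midpoint`, `predict_midpoint`) standing in for the SVM. AUC is omitted for brevity.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, rng):
    """Split row indices into k folds with roughly equal class ratios."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds


def downsample(indices, labels, rng):
    """Randomly drop majority-class rows until both classes are equal in size."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def screening_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": prec,
            "sensitivity": sens, "f1": f1}


def fit_midpoint(X, y):
    """Hypothetical stand-in model: threshold halfway between class means."""
    mean1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    mean0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    return (mean0 + mean1) / 2


def predict_midpoint(threshold, x):
    return 1 if x >= threshold else 0


def repeated_cv(X, y, fit, predict, k=10, repeats=10, seed=0):
    """Average per-fold metrics over `repeats` rounds of stratified k-fold CV."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            # Balance the training rows only; the test fold keeps its
            # original class ratio so the metrics reflect real prevalence.
            train_idx = downsample(train_idx, y, rng)
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            preds = [predict(model, X[i]) for i in test_idx]
            scores.append(screening_metrics([y[i] for i in test_idx], preds))
    return {m: sum(s[m] for s in scores) / len(scores) for m in scores[0]}


# Synthetic, imbalanced data (assumption): 50 positives, 200 negatives.
data_rng = random.Random(42)
y = [1] * 50 + [0] * 200
X = [data_rng.gauss(1.5 if label == 1 else 0.0, 1.0) for label in y]

results = repeated_cv(X, y, fit_midpoint, predict_midpoint)
print({name: round(value, 3) for name, value in results.items()})
```

Because each metric is averaged over the 100 held-out folds, the mean F1 score produced this way need not equal the F1 score computed from the mean precision and mean sensitivity.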
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.