Towards precision hematology: Machine learning-driven comparative feature importance analysis of hematologic parameters in newly diagnosed chronic myeloid leukemia vs healthy controls
Abstract
Background: Chronic Myeloid Leukemia (CML) is a myeloproliferative neoplasm characterized by uncontrolled granulocytic proliferation and is often initially suspected based on peripheral blood smear findings. Utilizing machine learning to analyze routinely available Complete Blood Count (CBC) parameters and derived inflammatory indices may facilitate early identification of CML at the time of initial diagnosis.
Objectives: To evaluate, using machine learning-based models, the diagnostic value of routinely available CBC parameters and derived inflammatory indices for identifying CML at the time of initial diagnosis.
Materials and methods: This study included 295 newly diagnosed CML cases and 340 normal control samples. A total of 49 variables were examined, comprising CBC parameters, inflammatory indices, and demographic data. Logistic regression analysis was used to identify relevant variables, yielding 22 selected features. Subsequently, multiple machine learning methods, including Random Forest (RF), Recursive Feature Elimination (RFE), Simulated Annealing (SA), Decision Tree (DT), K-Nearest Neighbor (KNN), and XGBoost (XGB), were applied to evaluate the diagnostic performance of the selected features.
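As an illustration only, the kind of workflow described above (feature selection followed by a tree-based classifier and an importance ranking) might be sketched as follows. This is a minimal sketch on synthetic data, assuming scikit-learn is available; the feature names, the simulated effect size, and the choice of RFE wrapped around a Random Forest are assumptions for the example, not the study's actual dataset, pipeline, or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; NOT the study's dataset.
rng = np.random.default_rng(0)
n_cml, n_ctrl = 295, 340  # group sizes taken from the abstract
# Hypothetical feature names: four informative columns and two pure-noise columns.
feature_names = ["wbc", "neut_pct", "lymph_pct", "rdw_cv", "noise_1", "noise_2"]


def simulate(n, shift):
    """Draw n rows; `shift` moves the four informative columns only."""
    x = rng.normal(size=(n, len(feature_names)))
    x[:, :4] += shift
    return x


X = np.vstack([simulate(n_cml, 2.0), simulate(n_ctrl, 0.0)])
y = np.concatenate([np.ones(n_cml, dtype=int), np.zeros(n_ctrl, dtype=int)])

# Recursive Feature Elimination wrapped around a Random Forest:
# the least important feature is dropped at each step until four remain.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = RFE(forest, n_features_to_select=4).fit(X, y)
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

# Rank the surviving features by impurity-based importance.
X_selected = X[:, selector.support_]
forest.fit(X_selected, y)
ranking = sorted(
    zip(selected, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Five-fold cross-validated accuracy on the selected features.
cv_accuracy = cross_val_score(forest, X_selected, y, cv=5).mean()

for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
print(f"CV accuracy: {cv_accuracy:.3f}")
```

In this sketch the features are selected on the full dataset before cross-validation, which tends to give optimistic accuracy estimates; a more careful design would nest the selection step inside each fold.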
Results: The variables most relevant for distinguishing newly diagnosed CML from normal controls were the WBC count; the relative percentages of neutrophils, monocytes, and lymphocytes; RBC-related indicators such as RBC count and RDW-CV; the inflammatory indices NLR and BLR; and PDW.
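Derived indices such as NLR are simple ratios of CBC differential counts. The sketch below shows one way they might be computed, assuming NLR denotes the neutrophil-to-lymphocyte ratio and BLR the basophil-to-lymphocyte ratio (the abstract does not expand either abbreviation). The function names, dictionary keys, and example values are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if either value is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def inflammatory_indices(cbc):
    """Compute ratio-based indices from absolute differential counts (same units for all counts)."""
    lymphocytes = cbc.get("lymphocytes")
    return {
        # Assumed definition: neutrophil-to-lymphocyte ratio.
        "NLR": safe_ratio(cbc.get("neutrophils"), lymphocytes),
        # Assumed definition: basophil-to-lymphocyte ratio.
        "BLR": safe_ratio(cbc.get("basophils"), lymphocytes),
    }


# Made-up illustrative counts, not patient data.
example = {"neutrophils": 6.0, "lymphocytes": 2.0, "basophils": 0.1}
indices = inflammatory_indices(example)
print(indices)
```

Returning None for a missing or zero denominator keeps incomplete CBC records from raising errors, leaving the choice of how to handle them (exclusion or imputation) to a later step.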
Conclusion: This study demonstrates that machine learning models based solely on routinely available CBC parameters and derived inflammatory indices can support the early identification of CML at the time of initial diagnosis. In addition to leukocyte-related variables, RBC-related and inflammatory indices provided complementary diagnostic information, highlighting their potential value in early-stage CML detection. Applying machine learning techniques could support the development of more user-friendly dashboards that facilitate CML diagnosis at initial presentation.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Personal views expressed by the contributors in their articles are not necessarily those of the Journal of Associated Medical Sciences, Faculty of Associated Medical Sciences, Chiang Mai University.