Comparison of Modeling Approaches in Design of Experiment: Statistical Methods versus Machine Learning Techniques
Abstract
Objective: To compare the modeling performance of Linear Regression (LR) with that of four Machine Learning (ML) algorithms: Decision Tree (DT), Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), and Neural Network (NN).
Methods: Data were obtained from 88 pharmaceutical Design of Experiment (DOE) studies published between October 2021 and October 2022 and retrieved from the PubMed and Scopus databases. Models were built from the raw data in RapidMiner Studio 10, and model performance was evaluated by the coefficient of determination (R²). Statistical significance was tested with the Kruskal–Wallis test.
Results: Across all DOE types combined, the K-NN and DT models achieved significantly higher mean R² values than LR, while NN and SVM performed comparably to K-NN and DT but showed no statistically significant differences. In the Central Composite Design, the K-NN, NN, and SVM models showed significantly higher R² than LR, whereas DT performed similarly to LR. In the Full Factorial Design, K-NN and DT yielded significantly higher R² than LR, and K-NN also outperformed NN and SVM, while NN and SVM were comparable to LR. In the Mixture Design, no significant difference in mean R² was observed between LR and any of the ML algorithms, indicating similar overall model performance.
Conclusion: ML techniques, particularly K-NN, NN, and DT, can improve modeling performance in pharmaceutical DOE, yielding higher R² values than LR across all DOE types combined, in the Central Composite Design, and in the Full Factorial Design. Although no clear difference was found in the Mixture Design, the findings highlight the potential of ML methods to improve experimental data analysis and to serve as promising tools for formulation design in future pharmaceutical research.
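The two quantities named in the Methods, the coefficient of determination and the Kruskal–Wallis statistic, can be illustrated with a minimal Python sketch. This is not the authors' workflow (the study used RapidMiner Studio 10); the function names are hypothetical, the sample R² values are made up for illustration only, and the H statistic is computed without the tie correction that statistical packages usually apply.

```python
from itertools import chain


def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


if __name__ == "__main__":
    # Hypothetical per-study R^2 values for three model types (illustration only).
    lr_r2 = [0.62, 0.71, 0.55, 0.80]
    knn_r2 = [0.91, 0.88, 0.95, 0.84]
    dt_r2 = [0.86, 0.90, 0.79, 0.93]
    print("H =", kruskal_wallis_h(lr_r2, knn_r2, dt_r2))
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom. In practice, a library routine such as `scipy.stats.kruskal` returns both the tie-corrected statistic and the p-value.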
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The research findings and opinions appearing in the articles are the views and responsibility of the authors, not those of the editorial board or the Faculty of Pharmaceutical Sciences, Prince of Songkla University. This does not extend to errors arising from printing. Articles published by the Thai Journal of Pharmacy Practice (วารสารเภสัชกรรมไทย) are the copyright of the journal.