Health Risk Classification Using XGBoost with Bayesian Hyperparameter Optimization
Abstract
Health risk classification is an important task that is difficult to address with conventional analytical techniques. The XGBoost algorithm offers several advantages over traditional methods for risk classification, but its performance depends heavily on Hyperparameter Optimization (HO). Selecting hyperparameters manually requires a large amount of time and computational resources, so automatic HO is needed. Several studies have shown that Bayesian Optimization (BO) works better than Grid Search (GS) or Random Search (RS). This study therefore proposes health risk classification using XGBoost with Bayesian hyperparameter optimization. The goal is to reduce the time required to select the best XGBoost hyperparameters and to improve the accuracy and generalization of XGBoost in health risk classification. The input variables are patient demographic and medical information, including age, blood pressure, cholesterol, and lifestyle variables. The experimental results show that the proposed approach outperforms other well-known machine learning (ML) techniques as well as XGBoost without HO. The proposed method achieves an average accuracy, precision, recall, and F1-score of 0.926, 0.920, 0.928, and 0.923, respectively. Further work is needed to make the method faster and more accurate.
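The abstract does not state the search space, the surrogate model, or the acquisition function used, so the following is only a minimal sketch of a generic Bayesian optimization loop. It assumes a Gaussian-process surrogate with an expected-improvement acquisition over a single hyperparameter (the base-10 logarithm of a learning rate). The objective `toy_cv_error` is a hypothetical stand-in for the cross-validated error of an XGBoost model, and all function names, constants, and ranges are illustrative rather than taken from the study.

```python
import math
import random

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-4):
    """Factor the kernel matrix once and precompute alpha = K^-1 y."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, ys))

def gp_predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation of the GP at x_new."""
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(x, x) == 1
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement for minimization relative to the best value so far."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def toy_cv_error(log10_lr):
    """Hypothetical stand-in for cross-validated error; minimum at log10_lr = -1.2."""
    return 0.08 + 0.05 * (log10_lr + 1.2) ** 2

def bayes_opt(objective, low, high, n_init=3, n_iter=12, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]  # random warm-up points
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        # Standardize observations so the unit-variance GP prior is reasonable.
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu) / sd for y in ys]
        L, alpha = gp_fit(xs, zs)
        best = min(zs)

        def ei(g):
            m, s = gp_predict(xs, L, alpha, g)
            return expected_improvement(m, s, best)

        # Evaluate the objective where expected improvement is largest.
        x_next = max((g for g in grid if g not in xs), key=ei)
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], xs, ys

best_x, best_y, xs, ys = bayes_opt(toy_cv_error, low=-3.0, high=0.0)
print(f"best log10(learning rate) = {best_x:.3f}, error = {best_y:.4f}")
```

In a real experiment, `toy_cv_error` would be replaced by a function that trains XGBoost with the candidate hyperparameters and returns a cross-validated error, and several hyperparameters would typically be searched jointly with an established optimization library rather than this hand-written loop.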
Copyright (c) 2025 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, which is published under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the manuscript published in this journal in other versions (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.