Health Risk Classification Using XGBoost with Bayesian Hyperparameter Optimization

Syaiful Anam; Imam Nurhadi Purwanto; Dwi Mifta Mahanani; Feby Indriana Yusuf; Hady Rasikhun

doi:10.29207/resti.v9i3.6307

Syaiful Anam Brawijaya University
Imam Nurhadi Purwanto Brawijaya University
Dwi Mifta Mahanani Brawijaya University
Feby Indriana Yusuf Brawijaya University
Hady Rasikhun Muhammadiyah University of Mataram


Keywords: health risk classification, hyperparameters, optimization, XGBoost

Abstract

Health risk classification is an important task, but it is difficult to address with conventional analytical techniques. The XGBoost algorithm offers many advantages over traditional methods for risk classification, and Hyperparameter Optimization (HO) is critical for maximizing its performance. Selecting hyperparameters manually requires a large amount of time and computational resources, so automatic HO is needed. Several studies have shown that Bayesian Optimization (BO) works better than Grid Search (GS) or Random Search (RS). To address these problems, this study proposes health risk classification using XGBoost with Bayesian Hyperparameter Optimization. The goal is to reduce the time required to select the best XGBoost hyperparameters and to improve the accuracy and generalization of XGBoost in health risk classification. The input variables are patient demographics and medical information, including age, blood pressure, cholesterol, and lifestyle variables. The experimental results show that the proposed approach outperforms other well-known machine learning (ML) techniques and XGBoost without HO. The proposed method achieves an average accuracy, precision, recall, and F1-score of 0.926, 0.920, 0.928, and 0.923, respectively. Future work should aim at a faster and more accurate method.
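The abstract does not spell out how the Bayesian optimization loop works. The following is a minimal, generic sketch in pure Python, assuming a Gaussian-process surrogate with an RBF kernel and the Expected Improvement (EI) acquisition function; these are common BO choices, not confirmed as the authors' configuration. The objective `validation_score` is a hypothetical, synthetic stand-in for a cross-validated XGBoost accuracy as a function of log10 of the learning rate, and every name and setting here (`length_scale=0.5`, `xi=0.01`, a 201-point candidate grid, 3 random starts plus 10 iterations) is an illustrative assumption.

```python
import math
import random

def rbf(a, b, length_scale=0.5):
    """Squared-exponential (RBF) kernel with unit signal variance (assumed)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-6):
    """Fit a GP surrogate to standardized observations; returns reusable state."""
    y_mean = sum(ys) / len(ys)
    y_std = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / len(ys)) or 1.0
    z = [(y - y_mean) / y_std for y in ys]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, z))  # alpha = K^{-1} z
    return xs, L, alpha, y_mean, y_std

def gp_predict(state, x):
    """Posterior mean and standard deviation at x, in the original score units."""
    xs, L, alpha, y_mean, y_std = state
    k_star = [rbf(x, a) for a in xs]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x, x) - sum(t * t for t in v), 0.0)
    return y_mean + y_std * mu, y_std * math.sqrt(var)

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for maximization: E[max(f(x) - best - xi, 0)] under N(mu, sigma^2)."""
    if sigma <= 1e-12:
        return 0.0
    imp = mu - best - xi
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def validation_score(log10_lr):
    """HYPOTHETICAL stand-in for cross-validated XGBoost accuracy as a function
    of log10(learning_rate); it peaks at learning_rate = 0.1."""
    return 0.93 - 0.05 * (log10_lr + 1.0) ** 2

def bayesian_optimize(objective, low, high, n_init=3, n_iter=10, seed=0):
    """Maximize objective on [low, high] with a GP surrogate and EI."""
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]  # random starting points
    ys = [objective(x) for x in xs]
    grid = [low + (high - low) * i / 200 for i in range(201)]  # candidate points
    for _ in range(n_iter):
        state = gp_fit(xs, ys)
        best = max(ys)
        ei = [expected_improvement(*gp_predict(state, g), best) for g in grid]
        x_next = grid[max(range(len(grid)), key=ei.__getitem__)]
        xs.append(x_next)
        ys.append(objective(x_next))  # one (expensive) evaluation per step
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

best_x, best_score = bayesian_optimize(validation_score, -3.0, 0.0)
print(f"learning_rate ~ {10 ** best_x:.3f}, score = {best_score:.4f}")
```

Each iteration refits the surrogate to every score observed so far and evaluates the objective only at the candidate with the highest EI, which is why BO can need fewer expensive model trainings than an exhaustive grid search. In a real study, `validation_score` would instead train and cross-validate an XGBoost model over several hyperparameters at once, and a maintained library such as scikit-optimize would typically replace this hand-written Gaussian process.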

