Obesity Status Prediction Through Artificial Intelligence and Balanced Label Distribution Using SMOTE

Arif Riyandi; Mahazam Afrad; M Yoka Fathoni; Yogo Dwi Prasetyo

doi:10.29207/resti.v9i3.6204

Arif Riyandi Telkom University
Mahazam Afrad Telkom University
M Yoka Fathoni Telkom University
Yogo Dwi Prasetyo Telkom University

Keywords: obesity prediction, SMOTE, random forest, artificial neural network, AI in healthcare

Abstract

Obesity, a global health challenge shaped by genetic and environmental factors, is characterized by excessive body fat that raises the risk of many diseases. With more than two billion people affected worldwide, addressing it is a public-health priority. This study investigated the use of Artificial Intelligence (AI) to predict obesity status from a dataset of 1,610 individuals comprising demographic and anthropometric data. Four AI algorithms were compared: Artificial Neural Network (ANN), K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM). The Synthetic Minority Over-Sampling Technique (SMOTE) was applied to correct the class imbalance in the dataset. SMOTE substantially improved model performance, particularly recall and F1-score for minority classes such as obesity. After SMOTE, Random Forest achieved the highest accuracy (92%) and recall (92%). The ANN showed a substantial improvement in recall, rising from 77% to 89%, whereas the SVM achieved the highest precision (89%), minimizing false positives. Despite these gains, KNN remained the least effective model. The findings underscore the role of SMOTE in improving the predictive performance of AI models for obesity and identify Random Forest as the most reliable of the four algorithms for supporting clinical decision-making. Limitations, notably the representativeness of the dataset, point to future work on more diverse data and more advanced feature selection techniques. This study offers practical insights into combining AI with preprocessing methods for obesity management.
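The abstract does not describe how SMOTE builds its synthetic records. As a minimal sketch, assuming purely numeric, pre-scaled features, the classic SMOTE step picks a minority-class sample, picks one of its k nearest same-class neighbours, and places a new point at a random position on the line segment between them. The function names `smote` and `k_nearest`, the default k = 5, and the choice to raise every non-majority class to the majority count mirror common defaults but are illustrative assumptions, not the authors' reported settings. Categorical demographic fields would need encoding or a variant such as SMOTE-NC.

```python
import math
import random
from collections import Counter


def k_nearest(points, index, k):
    """Indices of the k points closest to points[index] (Euclidean), excluding itself."""
    anchor = points[index]
    ranked = sorted(
        (math.dist(anchor, other), j) for j, other in enumerate(points) if j != index
    )
    return [j for _, j in ranked[:k]]


def smote(X, y, k=5, seed=0):
    """Oversample every non-majority class up to the majority-class count.

    Hypothetical helper: X is a list of numeric feature vectors, y the matching
    list of labels. Returns (X_resampled, y_resampled) with the original rows
    first, then the synthetic rows appended class by class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]
    y_out = list(y)

    for label, count in counts.items():
        needed = target - count
        if needed == 0:
            continue  # majority class (or tied with it): nothing to add
        members = [list(row) for row, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            raise ValueError(f"class {label!r} needs at least 2 samples")
        k_eff = min(k, len(members) - 1)  # cannot use more neighbours than exist
        neighbours = [k_nearest(members, i, k_eff) for i in range(len(members))]
        for _ in range(needed):
            i = rng.randrange(len(members))  # random minority sample
            j = rng.choice(neighbours[i])    # one of its k same-class neighbours
            gap = rng.random()               # position along the segment, in [0, 1)
            base, other = members[i], members[j]
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical toy data: two scaled features, 3 "obese" rows vs 5 "normal" rows.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [0.9, 1.0], [1.0, 0.9], [0.95, 0.95], [0.8, 0.85], [0.85, 0.8]]
    y = ["obese"] * 3 + ["normal"] * 5
    X_res, y_res = smote(X, y, k=2)
    print(sorted(Counter(y_res).items()))  # [('normal', 5), ('obese', 5)]
```

Because every synthetic row is an interpolation of real records, SMOTE is normally fit on the training split only; oversampling before the train/test split lets near-copies of training points reach the evaluation set and can inflate metrics such as recall.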
