Enhanced Heart Disease Diagnosis Using Machine Learning Algorithms: A Comparison of Feature Selection
Abstract
Heart disease, or cardiovascular disease, is one of the leading causes of death worldwide. According to WHO data, 17.9 million people died from cardiovascular disease in 2019. Without early prevention, the number of victims is likely to rise every year. Rapid advances in technology, particularly in the health sector, are expected to help medical personnel treat patients with a range of conditions, including heart disease. This study focuses on selecting relevant features (attributes) to improve the accuracy of machine learning algorithms. The algorithms used are Random Forest (RF) and Support Vector Machine (SVM). Three feature selection techniques are compared: information gain (IG), chi-square (Chi2), and correlation-based feature selection (CFS). These techniques are used to identify the main features and to reduce the irrelevant features that can slow down model processing. In experiments with a 70:30 data split, CFS-SVM performed best, reaching the highest accuracy of 92.19% with nine features, while CFS-RF achieved its best accuracy of 91.88% with eight features. With feature selection and hyperparameter tuning, the accuracy of SVM increased by 10.88% and that of RF by 9.47%. On the selected relevant features, the proposed CFS-SVM model performs well and efficiently in diagnosing heart disease.
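The abstract does not describe how CFS was implemented, so the following is only a minimal sketch of the general idea behind it. It assumes the commonly used CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The use of Pearson correlation, the greedy forward search, the function names, and the toy data are all assumptions made for illustration and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(vx * vy)


def cfs_merit(columns, labels, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels):
    """Greedy forward search: keep adding the feature that raises the merit."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        scored = [(cfs_merit(columns, labels, selected + [f]), f) for f in remaining]
        merit, feature = max(scored)
        if merit <= best:
            break  # no remaining feature improves the subset
        selected.append(feature)
        remaining.remove(feature)
        best = merit
    return selected, best


# Hypothetical toy data (not from the study): 8 patients, binary diagnosis.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "informative": [1, 2, 1, 2, 5, 6, 5, 6],     # tracks the label closely
    "near_duplicate": [1, 2, 1, 3, 5, 6, 4, 6],  # redundant with "informative"
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],           # uncorrelated with the label
}

selected, merit = cfs_forward_select(toy, labels)
print(selected, round(merit, 4))
```

On this toy data the search keeps only the "informative" column: the near-duplicate adds redundancy without adding class information, and the noise column lowers the average feature-class correlation, so neither raises the merit. In a pipeline like the one the abstract describes, the retained columns would then be passed to the classifier (RF or SVM) in place of the full feature set.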
Copyright (c) 2025 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., depositing it in an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).