K Nearest Neighbor Imputation Performance on Missing Value Data Graduate User Satisfaction
Abstract
Missing values are a common problem in data processing for scientific research and reduce the accuracy of research results. Several methods have been applied to handle missing values: deleting all records that contain a missing value; replacing missing values with a single calculated statistical estimate such as the mean, median, minimum, maximum, or most frequent value; maximum likelihood and expectation maximization; and machine learning methods such as K Nearest Neighbor (KNN). This research uses KNN Imputation (KNNI) to predict missing values. The data come from a questionnaire survey of graduate user satisfaction with seven assessment criteria: ethics, expertise in the field of science (main competence), foreign language skills, use of information technology, communication skills, cooperation, and self-development. KNNI imputation predictions were tested on user satisfaction data for STMIK PPKIA Tarakanita Rahmawati graduates from 2018 to 2021 using five values of k (1, 5, 10, 15, and 20). With k = 5, the RMSE is 0.316 and the MAPE is 3.33%, both smaller than the errors obtained with the other values of k. k = 5 therefore gives the best imputation prediction by both RMSE and MAPE, and its MAPE is below 10%, which is categorized as very good.
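The evaluation described in the abstract (hide known values, impute them with KNN for several values of k, then score the predictions with RMSE and MAPE) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy scores, the masked cells, the function names, the values of k, and the distance measure (root mean squared difference over the columns both rows have observed).

```python
import math

def knn_impute(rows, k):
    """Fill None cells with the mean of that column over the k nearest rows.

    Assumed distance: root mean squared difference over shared observed columns.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbor must have column j observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical survey scores: one row per respondent, seven criteria each.
complete = [
    [4, 3, 3, 4, 4, 4, 3],
    [4, 4, 3, 4, 3, 4, 4],
    [3, 3, 2, 3, 3, 3, 3],
    [4, 4, 3, 4, 4, 4, 4],
    [3, 3, 2, 4, 3, 3, 3],
    [2, 3, 2, 3, 3, 2, 3],
]

# Hide a few known cells so the imputed values can be compared with the truth.
masked_cells = [(0, 2), (3, 5)]
with_gaps = [list(r) for r in complete]
for i, j in masked_cells:
    with_gaps[i][j] = None
actual = [complete[i][j] for i, j in masked_cells]

# Compare several (assumed) values of k by their imputation error.
for k in (1, 3, 5):
    filled = knn_impute(with_gaps, k)
    predicted = [filled[i][j] for i, j in masked_cells]
    print(f"k={k}: RMSE={rmse(actual, predicted):.3f}, MAPE={mape(actual, predicted):.2f}%")
```

The value of k with the smallest RMSE and MAPE on the hidden cells would be preferred, which mirrors how the abstract compares k = 1, 5, 10, 15, and 20.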
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published manuscript in other versions (e.g., posting it to the author's institutional repository or publishing it in a book), with an acknowledgment that it was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.