Educational Data Mining Using Cluster Analysis Methods and Decision Trees based on Log Mining

  • Safira Nuri Safitri Universitas Sebelas Maret
  • Haryono Setiadi Universitas Sebelas Maret
  • Esti Suryani
Keywords: Cluster Analysis, Decision Tree, Educational Data Mining (EDM), Log, SPADA


Educational Data Mining (EDM) is often applied to big data processing in the education sector. One kind of educational data that EDM can process further is the activity log of an e-learning system used in teaching and learning; such logs can be analyzed more specifically with log mining. The purpose of this study was to process log data from the Sebelas Maret University Online Learning System (SPADA UNS) to determine student learning behavior patterns and their relationship to the final results obtained. The data mining methods applied are cluster analysis with the K-means algorithm and decision tree modeling. Clustering is used to find groups of students with similar learning patterns, while the decision tree models the clustering results to support analysis and decision-making. Processing 11,139 SPADA UNS log records produced 3 clusters with a Davies-Bouldin Index (DBI) of 0.229. Each of the three clusters was then modeled with a decision tree. Cluster 0 represents students with a low tendency of learning behavior, whose most frequent activity is viewing courses; its decision tree model obtained an accuracy of 74.42%. Cluster 1 contains students with high learning behavior patterns and a high frequency of viewing discussions; its model obtained an accuracy of 76.47%. Cluster 2 is a group of students whose learning behavior is characterized by a high frequency of submitting assignments; its model obtained an accuracy of 90.00%.
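The abstract reports cluster quality as a Davies-Bouldin Index, where lower values indicate compact, well-separated clusters. As a rough illustration of what that index measures, the following is a minimal sketch using only the Python standard library. The function names and the toy points are hypothetical and are not taken from the SPADA UNS data or from the authors' tooling; the index itself follows the standard definition (average over clusters of the worst ratio of summed within-cluster scatter to centroid distance).

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of clusters (each a list of points).

    Assumes at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance from each point in cluster i to its centroid.
    scatter = [
        fmean(dist(p, ctr) for p in c) for c, ctr in zip(clusters, centers)
    ]
    worst_ratios = []
    for i in range(len(clusters)):
        # For cluster i, find the most similar other cluster j.
        ratios = [
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst_ratios.append(max(ratios))
    return fmean(worst_ratios)


# Hypothetical toy data: three groups of 2-D points (not real log features).
well_separated = [
    [(0, 0), (0, 2)],
    [(10, 0), (10, 2)],
    [(5, 10), (5, 12)],
]
crowded = [
    [(0, 0), (0, 2)],
    [(3, 0), (3, 2)],
    [(1.5, 3), (1.5, 5)],
]

if __name__ == "__main__":
    print("well separated:", davies_bouldin(well_separated))
    print("crowded:       ", davies_bouldin(crowded))
```

In this sketch the well-separated grouping scores lower than the crowded one, which is the sense in which a small DBI such as the reported 0.229 suggests distinct clusters.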






How to Cite
Safitri, S. N., Setiadi, H., & Suryani, E. (2022). Educational Data Mining Using Cluster Analysis Methods and Decision Trees based on Log Mining. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(3), 448-456.
Information Systems Engineering Articles