Employee Education and Training Recommendations using the Apriori Algorithm

The Ministry of Finance (MoF) aims to enhance employee performance through suitable education and training opportunities. Based on data on the implementation of education and training in 2022 at the Central ICT Department of the MoF, only 27.35% of employees participated in education and training that matched the proposed needs for both positions and individuals. This is partly due to mandatory training that must be attended by some or all employees, urgent needs arising in the current year, or substitute participants who are not from the same team or function. To address this issue, the association method of data mining can be used to analyze historical employee data. The study used the Apriori algorithm to analyze historical data on employee positions, organizations, and education and training from 2011 to 2021. The research compared several minimum support values, calculated on the assumption that employees attended at least 2, 3, and 4 training courses. The evaluation results show that the best rules are generated with a minimum support value of 0.013 and a minimum confidence value of 0.6, giving a total of 10 rules. One of the training recommendations is that if an employee has taken the Enterprise Service Bus (ESB)-API Management training, they will take the ESB API Integration Platform training. The rules can be used by the Human Resource Unit to provide education and training aligned with organizational needs and to improve employee competency in line with their duties and functions, leading to better overall organizational performance.


Introduction
To improve the performance of employees at the Ministry of Finance (MoF), each employee needs the opportunity to improve their competency through education and training (hereafter, training) appropriate to their duties and functions. MoF, through the Financial Education and Training Agency (BPPK) [1], provides various types of training, covering both soft skills and hard skills, that all MoF employees can attend. Available training includes Distance Learning conducted through video conference since the pandemic, e-learning and microlearning for self-paced learning through video tutorials and quizzes, workshops and seminars via video conference, and technical training delivered with classical methods, namely face-to-face in the classroom. In addition to the training provided by BPPK, MoF also provides a budget for improving employee competency through training activities organized by educational institutions outside MoF.
Employees can attend training either through proposals submitted to the Human Resource (HR) unit every year or by independently attending available e-learning. For training proposals, employees can propose the desired training to the HR unit at the end of the year for the following year's learning needs plan, and every month the HR unit offers available training to employees. Employees make their own proposals by selecting training from the list provided by BPPK in the learning module of the e-performance application, which currently lists 1444 available trainings in various categories. Employees then need to read the Program Reference Framework (KAP) document for each training to find out its objectives, targets, expected competency standards, and participant requirements. For training needs that BPPK does not cover, employees actively seek information on training organized by educational institutions outside MoF according to their duties and functions. However, training proposals must be submitted to the HR unit in less than a week, making it impossible for employees to fully understand every KAP document or training syllabus. In addition, every month the HR unit confirms the participation of training participants and informs them of the training offers available for that month.
Based on the monthly monitoring of employee training data conducted by the HR unit at the MoF's Central ICT Department in 2022, only 27.35% of employees attended training that was proposed in accordance with their position and individual needs. The remainder comprised 44.66% of employees attending mandatory training, 25.57% attending training that matched their duties and functions but was not included in the initial proposal, and 2.42% attending training that matched neither the proposal nor their duties and functions. It can be concluded that employees who attended training that did not match the proposed needs were either required to attend mandatory training, had sudden training needs in the current year, or replaced other training participants who were not from the same team or function.
To overcome the inefficiency in the training proposal selection process, data mining techniques are applied to employee training data to produce a training recommendation model. The HR unit can use this model to provide training according to the needs of employees. The model is expected to enhance the competency of employees based on their duties and responsibilities.
The literature review covers various studies on data mining using recommendation techniques. One study [2] used the Apriori algorithm and association rules to generate sales strategies based on sales transaction data, with the strongest association rules determined using the lift ratio. Another study [3] examined a book recommendation system for librarians using an association approach with the Apriori algorithm. A further study [4] on book recommendation also adopted an association approach, applying an enhanced Apriori algorithm to book borrowing data to create a personalized book recommendation system. The system's performance using the Apriori algorithm was compared with that of hybrid cooperative and K-means algorithms, and the results showed that CPU usage, memory, and response time were minimal with 50 clients running the system simultaneously. A course recommendation system [5] was also developed using the Apriori algorithm with minimum support and confidence thresholds of 7 and 80%, resulting in a high accuracy rate for study plan recommendations. To improve time efficiency, another study [6] proposed a modified Apriori algorithm that combines combination reduction techniques with an iteration limit, where the limit is taken from the maximum set size of the most frequently occurring transactions because the probability of getting the best association rules is higher.
Other studies use the FP-Growth algorithm in the association method of data mining. One examined violence patterns with a minimum support of 50% and a minimum confidence of 60% to identify frequent acts of violence, which enables DP3AKB to implement suitable preventive or intervention measures [7]. The algorithm has also been used to help students select elective courses [8] and to optimize product placement on shelves based on customer search frequency levels [9]. Another study [10] compared three methods for determining self-development training, namely the C4.5 algorithm, the combination of PCA and C4.5, and the combination of C4.5, discretization, and PCA. The test results show that the combination of PCA, discretization, and C4.5 performs better than the other two methods, with an average accuracy rate of 86.6%. Finally, another study [11] compared the Apriori algorithm and FP-Growth in analyzing sales transaction data and found that while the FP-Growth algorithm generated association rules faster, the Apriori algorithm was superior in terms of itemset variations.
The literature shows that association techniques are used in several fields, including sales strategy recommendations [2], [6], [9], [11], book recommendations [3], [4], study plan recommendations [5], [8], and even violence prevention recommendations [7]. However, research on training recommendations using data mining is very limited [10]. Therefore, the researcher opted to use association data mining techniques with the Apriori method to design training recommendations. The aim was to determine the correlation between trainings, using historical employee data for the period 2011 to 2021, and ultimately to produce a training recommendation model that the personnel unit can use in training planning.

Research Flow
This research begins with problem identification, followed by a literature review to determine the approach for analyzing the data, which in this case is association. The next steps follow the stages of the CRISP-DM framework [12]-[14]. The Business Understanding stage involves document studies, interviews, and observations to gain a comprehensive understanding of the business context. In the Data Understanding stage, data related to the problem is collected over a span of 10 years, including employee training data, employee organizational data, and employee position data. The next step is Data Preparation, where the attributes to be used in the analysis are determined and the collected data is transformed accordingly. This is followed by the Modeling stage, where a model is created using the Apriori method to identify patterns or associations within the data. The model then moves to the Evaluation stage, where its effectiveness is assessed to ensure it meets the business requirements. Finally, in the Deployment stage, conclusions are drawn from the research results, and a research report is prepared to present the findings.
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol. 7 No. 5 (2023), DOI: https://doi.org/10.29207/resti.v7i5.4973, Creative Commons Attribution 4.0 International License (CC BY 4.0)
Figure 1. The Research Stages [14]
Figure 1 presents the flow of the research stages, from the problem identification stage to the deployment stage, as described previously.

Data Mining
Data mining is a process for extracting information and discovering knowledge from large data sources without explicit prior assumptions, where the information obtained must have three characteristics: it is previously unknown, effective, and practical [15]. Data mining can be divided into two main categories, namely predictive and descriptive. Descriptive patterns describe the general nature of the data, for example, through association analysis and clustering analysis [16]. Predictive patterns, meanwhile, draw on current data to make predictions, for example, through classification and regression. As of now [17], the Cross Industry Standard Process for Data Mining (CRISP-DM) is the most commonly used framework for data analysis, data mining, and data science work [18], [19].
CRISP-DM [17] consists of 6 phases: (1) Business understanding: identifying business objectives and data mining goals; (2) Data understanding: collecting, exploring, and validating data; (3) Data preparation: cleaning, transforming, and integrating data; (4) Modeling: selecting modeling techniques, creating models, and evaluating models; (5) Evaluation: evaluating modeling results against business objectives; and (6) Deployment: planning for deployment, monitoring, and maintenance.
Some commonly used data mining techniques include [9] (1) Classification, which assigns data to predefined categories; (2) Clustering, which groups data into several subsets with high similarity within a group and low similarity between groups; (3) Regression, which predicts the value of a continuous variable based on other variable values, assuming a linear or nonlinear dependency model; and (4) Association rules, which detect sets of attributes that frequently occur together and form a number of rules.

Association Rules
Association rule mining is a data mining technique used to analyze or detect associations between items in a set of item combinations. The goal is to find the relationship between items in itemsets [20]. Association rules are also known as market basket analysis. An illustration is the analysis of products purchased in a clothing store, which yields the likelihood of customers buying pants and shirts together. This can help the store owner organize inventory or offer promotions, such as special discounts for frequently purchased item combinations, which will increase sales [6].
The basic stages of association rules are divided into two [3], [21]. The first is frequent pattern analysis. This stage searches for item combinations that meet the minimum support criterion, where support is the proportion of transactions in which the combination occurs. The support value of an item is calculated by Formula 1:
$$\text{Support}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}} \tag{1}$$
The support value for two items is given by Formula 2 and Formula 3:
$$\text{Support}(A, B) = P(A \cap B) \tag{2}$$
$$\text{Support}(A, B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}} \tag{3}$$
The second is association rule formation. After the frequent patterns are found, association rules that meet the minimum confidence criterion are sought by calculating the confidence value of the "If A then B" rule using Formula 4:
$$\text{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A} \tag{4}$$
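The support and confidence definitions can be illustrated with a small calculation. The following is a minimal sketch in Python; the five transactions and the item names "A" to "D" are placeholders chosen for illustration and are not taken from the study's data.

```python
# Toy transactions: each set holds the trainings one employee attended.
# The item names are illustrative placeholders, not the study's data.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


support_a = support({"A"}, transactions)            # A is in 4 of 5 transactions
support_ab = support({"A", "B"}, transactions)      # A and B are in 3 of 5
conf_a_b = confidence({"A"}, {"B"}, transactions)   # 3 of the 4 containing A
```

In this toy data the rule "If A then B" has a support of 3/5 and a confidence of 3/4, so it would pass a minimum confidence of 0.6 but not one of 0.8.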

The Apriori Method
The Apriori algorithm is a basic and popular algorithm for mining association rules, introduced by Agrawal and Srikant in 1994. In the Apriori algorithm, each transaction is treated as an itemset, and the algorithm identifies the itemsets that are subsets of at least a minimum threshold number of transactions. The approach is "bottom-up": frequent itemsets are extended one item at a time, a step called candidate generation. The algorithm uses a breadth-first search and a hash tree structure to count the candidate itemsets efficiently. Each group of candidates is tested against the data, and a candidate is pruned if it has infrequent subpatterns. This process is repeated until no more extensions are found. The Apriori algorithm is considered a brute-force method because it considers every k-itemset as a frequent itemset candidate [22].
Although Apriori is a clear and simple algorithm, its main weakness is the long time it takes to manage a large number of candidates, caused by the repetitive process for each candidate. Therefore, when the available resources, in this case CPU and memory, are limited, the algorithm is not efficient [23].
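The level-wise search described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the study: it counts candidates with plain set comparisons instead of a hash tree, and the function name `apriori_frequent_itemsets` and the toy transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # Count each candidate against the data and keep the frequent ones.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: single items.
    items = {item for t in transactions for item in t}
    current = frequent_among({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Placeholder transactions; item names are illustrative only.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
```

With a minimum support of 0.4 (at least 2 of 5 transactions), the single items A, B, and C and all three of their pairs are frequent, while D and the triple {A, B, C} are not. The repeated pass over all transactions for every candidate is the cost that the paragraph above identifies as the algorithm's main weakness.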

Business Understanding
The business understanding phase aims to understand how the training proposal process and its supporting applications work. The process includes document studies, such as the Training Needs Analysis request memorandum and the previous year's training program list, interviews with employees in the HR unit and the person in charge (PIC) of the workgroup, and observations. The supporting applications used in the training proposal process include the Learning module in the E-Performance application and the Employee Profile module in the HRIS application.

Data Understanding
The data understanding phase is carried out to understand and identify the required data and collect it.
Based on the identification results, the data used are historical training data, historical employee position data, and historical organizational data, all obtained from the HRIS application.
The historical data on employee training for the period 2011 to 2021, with a total of 7,615 records as shown in Table 1, contains the name of each education and training program that an employee has participated in, along with the date of implementation and the name of the organizer.
Table 2 shows the historical employee position data for the period 2011 to 2021, with a total of 4,950 records, which includes employee ID, position ID, start date, end date, and decree number.
An examination of the table shows that the date formats used in the start date and end date columns differ from one another. This inconsistency must be resolved during the data preparation phase.
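One way to resolve such a date inconsistency is to parse every value against a list of known formats and store the result as a proper date. The text does not state which formats actually appear in the two columns, so the formats in `ASSUMED_FORMATS` below are hypothetical; the sketch only shows the general approach.

```python
from datetime import date, datetime

# Candidate formats are assumptions for illustration; the actual formats
# in the start date and end date columns are not given in the text.
ASSUMED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def parse_date(text):
    """Parse a date string by trying each assumed format in turn."""
    cleaned = text.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


# Hypothetical values standing in for a start date and an end date.
start = parse_date("2015-03-01")
end = parse_date("31/12/2016")
```

Once both columns hold `date` objects, they can be compared directly, which the later mapping of trainings to position periods depends on.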
Table 3 contains records of employee reassignments within the organization for the period 2011 to 2021, with a total of 2,701 records comprising Employee ID, Position ID, Position Name, Unit Name, and Echelon Name.

Data preparation
The next step is data preparation, also known as data preprocessing, which aims to remove or improve certain characteristics so that the data is ready for use in the analysis of training recommendations. Data preprocessing is performed in the Python programming language by first importing the data from Microsoft Excel format into Jupyter Notebook [24].
The following are the data preprocessing steps that were applied, which include (1)
The results of the fifth step in the data preparation stage can be seen in Table 4, which maps the roles of employee positions at the time the training was carried out, with a total of 3,181 records. For example, the employee with ID 104471 attended the Cisco BGP Advanced and Networking Advance training while in the DCOPS role and attended the Enterprise Service Bus: API Integration Platform training while in the ARCHITECTURE role.
The binary table resulting from the data preparation stage contains 312 records, as seen in Table 5. Table 5 shows how each employee ID and SubRole corresponds to the various training programs. The information is displayed in binary format, where a "true" value indicates that the employee participated in the training and a "false" value indicates that they did not. As an illustration, the employee with ID 75956, who belongs to the APL subrole, attended both the Implementing Cisco IP Routing and Implementing Cisco IP Switched Networks training sessions.
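The role mapping and the binary table can be sketched as follows. This is an illustration under stated assumptions rather than the study's preprocessing code: the employee IDs, dates, and training names are invented, and the helper `role_at` is a hypothetical name. It assumes that a training belongs to the sub-role whose position period covers the training date.

```python
from datetime import date

# Hypothetical records; IDs, roles, dates, and training names are made up.
positions = [
    # (employee_id, sub_role, start_date, end_date)
    (1, "DCOPS", date(2015, 1, 1), date(2017, 12, 31)),
    (1, "ARCHITECTURE", date(2018, 1, 1), date(2021, 12, 31)),
    (2, "APL", date(2014, 1, 1), date(2021, 12, 31)),
]
trainings = [
    # (employee_id, training_name, training_date)
    (1, "Training X", date(2016, 5, 10)),
    (1, "Training Y", date(2017, 2, 3)),
    (1, "Training Z", date(2019, 8, 20)),
    (2, "Training X", date(2015, 3, 15)),
]


def role_at(employee_id, when):
    """Return the sub-role whose date interval covers `when`, else None."""
    for emp, role, start, end in positions:
        if emp == employee_id and start <= when <= end:
            return role
    return None


# Basket: one transaction per (employee, sub-role) pair.
baskets = {}
for emp, name, when in trainings:
    role = role_at(emp, when)
    if role is not None:
        baskets.setdefault((emp, role), set()).add(name)

# Binary table: one row per transaction, one True/False column per training.
columns = sorted({name for items in baskets.values() for name in items})
binary_table = {
    key: [name in items for name in columns]
    for key, items in baskets.items()
}
```

Here employee 1 produces two transactions, one for each sub-role held, which mirrors the way a single employee can appear in Table 5 under more than one SubRole.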

Descriptive Analysis
After the data preparation stage, descriptive analysis was conducted by visualizing training uptake trends per year, the most frequently taken trainings, and the number of employees with an excess or insufficient amount of training. Figure 4 shows that in the period 2011 to 2021, 69.07% of employees had an 'Insufficient' training status, while 12.36% had an 'Adequate' status and 18.57% had an 'Excess' status. Then, the itemset results for each minimum support value were used to calculate the average confidence value.
Table 6 shows the calculation using a minimum support value of 0.006, resulting in 252 rows of itemsets.
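The three minimum support values used in this study are described as being derived from the assumption that employees attended at least 2, 3, and 4 trainings. The exact formula is not given, but the reported values are numerically close to dividing that count by the 312 transactions in the binary table. The sketch below shows this reading; the function name `min_support_for` is hypothetical.

```python
N_TRANSACTIONS = 312  # rows in the binary table reported in the paper


def min_support_for(min_count, n_transactions=N_TRANSACTIONS):
    """Relative support of an itemset that occurs in `min_count` rows.

    Assumed interpretation: the source does not state this formula.
    """
    return min_count / n_transactions


thresholds = {k: min_support_for(k) for k in (2, 3, 4)}
```

Under this assumption the three thresholds come out at roughly 0.0064, 0.0096, and 0.0128, which is in line with the values 0.006, 0.009, and 0.013 used in the modeling.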
Association rules were then created from the prepared baskets, resulting in a table with 436 rules, as shown in Table 7. The calculation using a minimum support value of 0.009 resulted in 118 rows of itemsets, as shown in Table 8.
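The rule creation step can be illustrated by splitting each frequent itemset into an antecedent and a consequent and keeping the splits whose confidence meets the threshold. The following is a minimal sketch; the function name `generate_rules`, the support values, and the items "A" and "B" are assumptions for illustration and do not reproduce the study's tables.

```python
from itertools import combinations


def generate_rules(itemset_support, min_confidence):
    """Build 'antecedent -> consequent' rules from frequent itemset supports.

    `itemset_support` maps frozenset -> support and is assumed to contain
    every non-empty subset of each itemset, as Apriori output does.
    """
    rules = []
    for itemset, supp in itemset_support.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                conf = supp / itemset_support[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, supp, conf))
    return rules


# Hypothetical supports for two placeholder trainings "A" and "B".
supports = {
    frozenset({"A"}): 0.8,
    frozenset({"B"}): 0.6,
    frozenset({"A", "B"}): 0.6,
}
rules = generate_rules(supports, min_confidence=0.9)
```

In this example only the rule "If B then A" survives, with a confidence of 1.0, because every transaction containing B also contains A. The reverse rule has a confidence of 0.75 and is filtered out.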

Evaluation
The model was evaluated and validated by calculating the average confidence value for each minimum support value. The minimum confidence value was then set from the highest average confidence value, 0.6, as seen in Table 12, to obtain higher accuracy.
To obtain the rules with the best accuracy, a recalculation was performed using a minimum confidence value of 0.6. The number of rules obtained from the recalculation can be seen in the last column of Table 13. The details of the rules generated with a minimum support of 0.013 and a minimum confidence of 0.6 are displayed in Table 14. According to these evaluation results, the 10 rules with the best accuracy can be used by the HR unit as input for determining training that matches organizational needs and improves employee competency according to their duties and functions.
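The evaluation procedure can be read as two steps: average the rule confidences for each minimum support, then reuse the highest average as the minimum confidence and recount the rules. The sketch below follows that reading with invented confidence values; it does not reproduce the figures in Table 12 or Table 13, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical rule confidences per minimum support; not the study's results.
confidences_by_support = {
    0.006: [0.30, 0.50, 0.70],
    0.009: [0.30, 0.65, 0.70],
    0.013: [0.40, 0.65, 0.75],
}

# Average confidence of the rules produced at each minimum support.
average_confidence = {
    s: mean(values) for s, values in confidences_by_support.items()
}

# Use the highest average confidence as the minimum confidence threshold.
min_confidence = max(average_confidence.values())

# Recount the rules that survive the chosen minimum confidence.
rules_kept = {
    s: sum(1 for c in values if c >= min_confidence)
    for s, values in confidences_by_support.items()
}
```

With these placeholder values the highest average is about 0.6, and applying it as the threshold leaves one or two rules per minimum support. The same recount on the study's data is what yields the rule counts in the last column of Table 13.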

Conclusion
This research was conducted using three datasets covering the period from 2011 to 2021. The datasets comprise historical employee training data with 7,615 records, historical employee organizational data with 2,701 records, and historical employee position data with 4,950 records. These three datasets were processed and integrated during the data preparation stage to create a single binary table dataset of 312 records, which encompasses employee information, roles, and training names.
In the modeling phase, a descriptive analysis was conducted on the historical employee training data to identify trends in training uptake per year, the most frequently taken trainings, and the number of employees with an excess or insufficient amount of training. The descriptive analysis revealed that the highest number of internal training sessions attended by employees occurred in 2019, with a total of 1,557 sessions. For external training, the highest number of sessions attended was in 2021, with a total of 48 sessions. The most commonly attended internal training format was E-Learning. In terms of training adequacy, 69.07% of employees have an insufficient status, meaning that they participate in fewer than 2 internal/external training sessions per year. Therefore, HR needs to further analyze the causes of this occurrence.
Next, modeling was conducted using the Apriori algorithm on the prepared binary table dataset to identify correlations between the trainings attended by employees. This research used three minimum support values, calculated on the assumption that employees attended at least 2, 3, and 4 trainings. The resulting minimum support values were 0.006, 0.009, and 0.013, respectively. The minimum confidence value was set at 0.6, based on the highest average confidence value across the minimum support values. Based on the model evaluation results, the best rules were generated with a minimum support value of 0.013 and a minimum confidence value of 0.6, resulting in 10 rules with strong correlations between antecedents and consequents and a potential accuracy of more than 60%. Further research can compare the performance of the Apriori algorithm with other algorithms to find the algorithm with the best performance.

Figure 2. Training Uptake Trends Per Year
Figure 2 shows the trend in the number of trainings taken per year during the period from 2011 to 2021, where the highest number of internal trainings was taken in 2019 with a total of 1,557 trainings and the highest number of external trainings was taken in 2015 with a total of 48 trainings. Next, an analysis was conducted on the 5 internal trainings most frequently taken by employees during the same period. Figure 3 shows that 4 of the 5 internal trainings most often attended by employees are in the form of E-learning. To determine the adequacy of trainings attended by employees, both internal and external, classification is

Figure 4. Status Of Training Uptake By Employees

Modeling with Apriori Algorithm
When creating a model using the Apriori algorithm, it is important to establish the minimum support and confidence values beforehand, as they will have an impact on the number of rules generated. The greater the support and confidence values used, the fewer the

Table 3. Historical Employee Position Data
) converting the data format in the basket to a binary table of 312 transactions, as shown in Table 5, where the columns are transaction items containing true or false values. Each transaction refers to one employee in one subrole, and each item column is the name of a training.

Table 4. Mapping Of Employee Position Role At The Time Of The Training

Table 5. Binary Table From Preprocessing Data

Based on Table 7, one of the recommendations is that if an employee has taken the Working Load Analysis and 40-hour TOEFL Preparation Course training, they will take the Job Analysis training with a probability of 100%.

Table 6. Calculation With Minimum Support Of 0.006

Table 8. Calculation With Minimum Support Of 0.009

Table 9. Result Of Apriori Algorithm With Minimum Support 0.009

Table 10. Calculation With Minimum Support Of 0.013

Table 11 contains a total of 36 association rules that were derived from the basket in Table 10. The information presented in Table 11 suggests that if an employee participates in the IT Service Management Foundation training, then they are also likely to enroll in the Prediction test of TOEIC training.
Table 12 provides a concise summary of the outcomes that were achieved as a result of implementing the Apriori algorithm model.
Arief Wibowo, Vasthu Imaniar Ivanoti, Megananda Hervita Permata Sari

Table 12. Summary Of Modeling Results

Table 13. Summary Of Modeling Results

Table 14. Apriori Algorithm Calculation Results With A Minimum Support Of 0.013 And A Minimum Confidence Of 0.6
Based on the results of the descriptive analysis, there are 69.07% of employees with an insufficient status, where employees participate in fewer than 2 internal/external training sessions per year. Therefore, HR needs to further analyze the causes of this occurrence.