Footstep Recognition Using Mel Frequency Cepstral Coefficients and Artificial Neural Network

Footstep recognition is a relatively new biometric based on learning from footstep signals captured as people walk across a sensing area. Footstep signal classification for security systems still has low accuracy, so a classification system with high accuracy is needed. Most existing systems are built on geometric and holistic features but still yield high error rates. This research proposes a new system that uses Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction, because their frequency scale mimics the human hearing system, and an Artificial Neural Network (ANN) as the classification algorithm, because it achieves good accuracy. The dataset consists of 500 footstep recordings. The classification results show that the proposed system reaches a highest validation loss of 57.3, a highest testing accuracy of 92.0%, a highest training loss of 193.8, and a highest training accuracy of 100%. These results serve as an evaluation of the system for improving footstep signal recognition in security systems for the smart home environment.


Introduction
Every human being has distinguishing characteristics, one of which is the way they walk. Footsteps differ from one person to another in many features, such as the swing, rhythm, and length of the steps. Many technologies based on measurements of the human body, commonly referred to as biometrics, such as fingerprints, eye corneas, and faces, are used in the security field to prevent crime.
According to 2017 crime statistics, the number of crime incidents in Indonesia has not decreased significantly. An estimated 140 people per 100,000 population were at risk of crime in 2013, 131 in 2014, and 140 in 2015 [1]. National police registration data show that crime incidents in Indonesia fluctuated from 2013 to 2015. According to [2], crime continues to occur every year. The rise of such incidents has caused unrest and made the community wary about home security. A security system is therefore needed to prevent crime, and a footstep recognition system is one option available to homeowners.
Many earlier studies have built recognition systems based on human footsteps, and different systems have been proposed. Riwurohi et al. [3] studied a recognition system that focuses on data acquisition, applying scenarios to obtain a variety of recorded footsteps on existing media in order to recognize people's footsteps from three sides: left, middle, and right (for a general, unspecified person). They applied MFCC feature extraction and an ANN backpropagation algorithm for classification. The ANN architecture used five input neurons, two hidden layers of 30 neurons each, and ten output neurons (representing the number of people involved in the study). The dataset covered ten people, five men and five women, wearing wooden or other footwear of the same type. It was recorded in a closed room with four microphones, for a total of 720 footsteps. The data were split 70:30 between training and testing. That study obtained 98.8% for the recognition of people who walked on the left side of the track, 98.8% for the middle of the track, and 95% for the recognition of people who ran on the right side of the track. In contrast, the system built in this work recognizes footstep signals as the identity of the homeowner and is applied to a security system in a smart home environment (a specific person, identified by the characteristics of his or her footsteps), using MFCC feature extraction and an ANN algorithm (backpropagation and feed-forward) for classification. The result is reported as the accuracy with which the system recognizes the footstep signal of the smart home owner.
The development of digital technology has moved many security systems from manual to digital form. In [4], footstep signals were applied as a security system for smart homes, obtaining an error rate of 13% with SVM as the classification algorithm. In [5], a security system identified people in a room using structural footsteps, with an accuracy of 83% using uncertain footsteps; when footsteps are unknown, identification accuracy increases to 96.5%. In [6], a body safety system detected the severity of asthma from breath sounds using Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction and K-Nearest Neighbor (KNN) as the classification algorithm, reaching an average accuracy of 97.5%. In [7], a safety system for advanced driver assistance systems (ADAS) classified the driver's intention to change lanes from measured and estimated information, using machine learning-based estimation with an Artificial Neural Network (ANN) and a Support Vector Machine (SVM); the accuracy reached 90% under any surface conditions. ANN is one classification algorithm that can be used for such tasks and has been applied to a variety of classification problems. In [8][9], ANN and SVM were compared, and ANN performed better, with an accuracy of 88.4% compared with 78.2% for SVM. In [10][11], an ANN algorithm and MFCC feature extraction were used for Malayalam speech recognition, reaching 96.4%. In [12][13], MFCC feature extraction was used to translate speech into digital text with an accuracy of around 87.5%.
Therefore, this research proposes MFCC feature extraction and an ANN algorithm for footstep signal recognition and classification. The research focuses on testing footstep signal recognition in order to improve footstep recognition for security systems in the smart home environment.

Research Method
This research is experimental. This section explains the research methodology, biometric theory, MFCCs, and ANN.

Research Methodology
The system is designed as a sequence of processes leading up to the data split (data separation), after which the process divides into two parts, training and testing, as illustrated in Figure 1. The first process converts the footstep signal using MFCC feature extraction, dividing the signal into several frames and transforming it into a cepstrum to obtain the vector values. The MFCC results are labeled and used as the dataset in this study. Next, the dataset is separated into two groups, training data and testing data. In the training process, the system is trained on the training data with the ANN algorithm to obtain the best classification model for footstep recognition. Once the model is obtained, testing is carried out on the test data using the ANN classifier produced by the training process. These tests yield the accuracy of the footstep recognition system.

Biometrics
According to [15], biometrics is the science of identifying a person based on unique features of the human body, such as fingerprints, voice, face, retina, and footsteps. Good biometrics have certain characteristics: they are unique to each person, they can be obtained quickly, they have high-resolution capabilities, and the features extracted from them do not change over time, that is, they are fixed. The research in [16] stated that biometrics are very important for safety applications. Biometrics can be applied as a security system in various environments.
In [4], biometrics is applied in the smart home environment. It can also be applied in smart parking areas [17]. Beyond security systems, biometric features can be used in recognition systems [18].

Mel Frequency Cepstral Coefficients (MFCCs)
MFCCs model the bandwidth of human hearing in order to copy the mechanism of human hearing. Their frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz. The advantages of MFCCs are high performance in audio recognition accuracy, the ability to capture the characteristics of a sound, and low complexity [19]. With these characteristics and this performance, MFCCs can be applied in a signal translator system [20].
MFCCs are computed in the following steps. First, the signal is divided into several frames and a Hamming window is applied to each frame. The spectrum of each frame is then obtained using equation (1):

$$Y(k) = \sum_{n=0}^{N-1} y(n)\, e^{-j 2 \pi k n / N}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where y(n) is the windowed frame, Y(k) is the Discrete Fourier Transform (DFT) of the frame, and N is the number of samples in the frame. After the spectrum is obtained, the next step maps frequencies onto a perceptual pitch scale (the mel scale). Equation (2) gives the approximation used to compute mels:

$$\mathrm{mel}(f) = 2595 \times \log_{10}\left(1 + \frac{f}{700}\right) \tag{2}$$

where f is the frequency in Hz [21]. The mel-scaled result is then filtered with the mel filterbank, and the output value of each filter is taken. The Discrete Cosine Transform (DCT) is then applied to the log mel spectrum, as given in equation (3).
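The MFCC steps described in this section can be sketched in Python as follows. This is a minimal illustration that uses only the standard library, not the implementation used in the study. The function names, the number of filters and coefficients, the triangular filter shape, and the DCT-II form used for the final step are all assumptions made for the example.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Equation (2): frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of equation (2)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hamming(frame):
    """Apply a Hamming window to one frame."""
    size = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1)))
            for n, x in enumerate(frame)]


def power_spectrum(frame):
    """Equation (1): direct DFT, returned as |Y(k)|^2 for k = 0 .. N/2."""
    size = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                    for n in range(size))) ** 2
            for k in range(size // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    centres_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in centres_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def dct_ii(values, n_coeffs):
    """DCT-II of the log filterbank energies, keeping the first n_coeffs terms."""
    total = len(values)
    return [sum(v * math.cos(math.pi * c * (m + 0.5) / total)
                for m, v in enumerate(values))
            for c in range(n_coeffs)]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=12):
    """Window -> DFT -> mel filterbank -> log -> DCT for a single frame."""
    spectrum = power_spectrum(hamming(frame))
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies, n_coeffs)


# Example: one 256-sample frame of a 1000 Hz tone sampled at 8000 Hz
# (synthetic input chosen for illustration, not a footstep recording).
SAMPLE_RATE = 8000
frame = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(256)]
coefficients = mfcc_frame(frame, SAMPLE_RATE)
print(len(coefficients), "coefficients for this frame")
```

A full recording would be cut into many such frames, and the per-frame coefficients would be concatenated into one feature vector. In practice a library routine with an FFT would replace the direct DFT, which is written out here only to mirror equation (1).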

Artificial Neural Network (ANN)
ANN is a classification algorithm that achieves good accuracy and performance [10]. It can be used to classify various features in recognition systems [22]. The structure of an ANN consists of three kinds of layers: input, hidden, and output. Each layer consists of several neurons that are connected to neurons in other layers. Each connection carries a weight that contributes to the neuron's output through an appropriate activation function. The output of a neuron is the weighted combination given in equation (4), whose weights are optimized during training:

$$o = f\left(\sum_{i} w_i x_i + b\right) \tag{4}$$

where o is the neuron output, w_i is the connection weight, x_i is the input variable, b is the bias, and f is the activation function.
ANN models are developed through knowledge, training, and testing. Training is the process of learning from a database, after which the model is evaluated with data that were not used in training. ANN training involves preparing the database and selecting a sufficiently large number of variables to reduce training error; the range of the data must also be broad so that training reaches an optimal value. To validate a trained ANN, the model must be tested on a dataset that was not used during training. The predictions on the test dataset can then be compared with the actual data using the root mean square error (RMSE) in equation (5):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2} \tag{5}$$

where $\hat{y}_i$ is the value predicted by the ANN, $y_i$ is the actual value, and n is the number of data points used [23].
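The RMSE calculation can be illustrated with a short Python sketch. The function name and the sample values are assumptions chosen for the example and are not taken from the study.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical network outputs and the true labels they are compared with.
predicted = [0.9, 0.2, 0.8, 0.4]
actual = [1.0, 0.0, 1.0, 0.0]
error = rmse(predicted, actual)
print(error)  # approximately 0.25
```

The squared errors here are 0.01, 0.04, 0.04, and 0.16, whose mean is 0.0625, so the RMSE is 0.25.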

Dataset
The dataset is obtained by converting footstep signals with MFCC feature extraction. The data consist of 500 footstep recordings from three people (400 recordings from the first person and 50 from each of the other two). Each footstep was recorded with a single standard microphone, and each recording lasts 2-3 seconds. Recording was done on the floor, without footwear, with the microphone beside the feet, about 10-15 cm apart. The MFCC pseudo-code (Program of MFCCs) describes how each footstep signal is converted into a feature vector.

Program of MFCCs
The next step is to build the dataset, as described by the dataset pseudo-code. The conversion result of each footstep signal is stored as a single row, as in Table 1, and each row has 1260 columns. Each conversion result is then assigned a category. The results after categorization are shown in Table 2, where label 1 marks data that the system should recognize as the actual data and label 0 marks data that are not genuine (used for comparison during classification). The categorized results form the dataset used for footstep signal training and testing. The dataset is then divided into two parts, 75% for training data and 25% for testing data, as described by the split-data pseudo-code.
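The labeling and splitting steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's code: the helper names are hypothetical, the feature vectors are zero-filled placeholders instead of real MFCC output, and treating the first person (400 records) as the label 1 class is an assumption, since the text does not say which person carries that label.

```python
import random


def label_records(records_by_person, owner):
    """Attach label 1 to the owner's records and label 0 to all others."""
    dataset = []
    for person, records in records_by_person.items():
        label = 1 if person == owner else 0
        dataset.extend((features, label) for features in records)
    return dataset


def split_dataset(dataset, train_fraction=0.75, seed=0):
    """Shuffle a copy of the dataset and split it into train and test parts."""
    shuffled = dataset[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder vectors with 1260 columns each; real rows would hold MFCC values.
N_FEATURES = 1260
records_by_person = {
    "person_1": [[0.0] * N_FEATURES for _ in range(400)],
    "person_2": [[0.0] * N_FEATURES for _ in range(50)],
    "person_3": [[0.0] * N_FEATURES for _ in range(50)],
}
dataset = label_records(records_by_person, owner="person_1")
train, test = split_dataset(dataset)
print(len(train), len(test))  # 375 125
```

With 500 records, the 75:25 split yields 375 training records and 125 testing records. Fixing the shuffle seed keeps the split reproducible between runs.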

Training and Testing
Figure 2 (Diagram Schema of Training and Testing) shows the flow of the footstep signal training and testing processes. First, the previously processed dataset is separated into training data and testing data. The data are fed to the input layer, whose outputs are forwarded to the hidden layer for computation. The results are then processed by the output layer, which produces the output. The footstep signal training data are used to build the model: the ANN learns from all the training data and optimizes the classification. The ANN classification module uses a three-layer structure: the input layer and the hidden layer have 50 neurons each with ReLU activation, and the output layer has one neuron with sigmoid activation. A total of 375 records were used for training. As illustrated in Figure 3 (Layer Structure of ANN Classification), the input layer receives the training data and supplies the input vectors to the network, the hidden layer computes on these inputs, and its results are used as input to the output layer. The output layer then delivers the output signals of the network.
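The forward pass through the described layer structure can be sketched in plain Python. This is only an illustration of how data flow through two 50-neuron ReLU layers and one sigmoid output neuron. The weights are random and untrained, the input size of 1260 is assumed from the number of dataset columns, and the helper names are hypothetical; training by backpropagation is not shown.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_layer(n_inputs, n_neurons, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def dense(inputs, layer, activation):
    """Each neuron computes activation(sum(w_i * x_i) + b)."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(features, layers):
    """Input layer -> hidden layer -> single sigmoid output."""
    first, hidden, output = layers
    h1 = dense(features, first, relu)
    h2 = dense(h1, hidden, relu)
    return dense(h2, output, sigmoid)[0]


rng = random.Random(0)
N_FEATURES = 1260  # assumed input size, taken from the dataset's column count
layers = [
    make_layer(N_FEATURES, 50, rng),  # 50 neurons, ReLU
    make_layer(50, 50, rng),          # 50 neurons, ReLU
    make_layer(50, 1, rng),           # 1 neuron, sigmoid
]
features = [rng.random() for _ in range(N_FEATURES)]  # placeholder input vector
score = forward(features, layers)
print(score)  # a value between 0 and 1
```

The sigmoid output can be read as a score for label 1, so a threshold such as 0.5 turns it into a binary decision.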

Testing
After training, the test data are used to test and evaluate the model that has been built. The model classifies data it has not seen before, and the accuracy is calculated. A total of 125 records were used for testing.
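The accuracy calculation can be illustrated with a short Python sketch. The 0.5 decision threshold, the function name, and the sample scores are assumptions for the example and are not taken from the study.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of records whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical network outputs for five unseen records and their true labels.
scores = [0.91, 0.15, 0.62, 0.48, 0.77]
labels = [1, 0, 1, 1, 1]
print(accuracy(scores, labels))  # 0.8
```

Four of the five predictions match their labels, giving 0.8. By the same arithmetic, 115 correct classifications out of 125 test records would give an accuracy of 0.92.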

Result and Discussion
Testing was carried out in Python on the Google Colaboratory platform. The device used in this study was a laptop with an AMD Ryzen 5 2500U processor, 4 GB of RAM, a 128 GB SSD plus a 1 TB hard disk, and Windows 10 as the operating system.

Cepstrum Result of MFCCs
Each footstep recording is converted into a collection of vectors using MFCC feature extraction.
In the MFCC process, the recording is partitioned into several frames and a Hamming window is applied to each frame. A spectrum is obtained from each frame by applying the FFT. From these spectra, a matrix and the cepstrum are obtained and converted into a vector. The cepstrum results obtained with MFCC feature extraction are used as the dataset in testing.

Classification Test Result of ANN
The data are partitioned to determine the sizes of the training set and the testing set. Training and testing are then carried out to obtain the highest accuracy, as shown in Figure 5. In Figure 5, the x-axis represents the training and testing process over the range 0-500, and the y-axis represents the accuracy value obtained in each experiment. The graph shows the accuracy values obtained on the training data and the test data throughout footstep signal training.
Figure 6 (Loss of Training and Testing) shows the loss values obtained on the training data and the test data throughout training. The highest values obtained across all training are listed in Table 3. Based on the classification results of the ANN algorithm in Table 3, the validation loss (error during testing) reached its highest value of 57.3 at epoch 1. The validation accuracy reached its highest value of 92.0% at epoch 94. The loss (error during training) reached its highest value of 193.8 at epoch 0. Finally, the training accuracy reached its highest value of 100% at epoch 108.
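A summary of this kind, giving the highest value of each metric and the epoch at which it occurred, can be derived from a per-epoch training history as sketched below. The dictionary keys follow a common Keras-style naming that is assumed here, and the numbers are made-up placeholders, not the study's results.

```python
def highest_value(values):
    """Return (epoch, value) for the largest value; epochs count from 0."""
    epoch = max(range(len(values)), key=lambda i: values[i])
    return epoch, values[epoch]


# Placeholder per-epoch history for four epochs (illustrative numbers only).
history = {
    "loss": [2.0, 1.1, 0.6, 0.4],
    "accuracy": [0.55, 0.70, 0.85, 0.90],
    "val_loss": [1.5, 1.8, 0.9, 0.7],
    "val_accuracy": [0.50, 0.65, 0.80, 0.78],
}
summary = {name: highest_value(values) for name, values in history.items()}
for name, (epoch, value) in summary.items():
    print(f"{name}: highest value {value} at epoch {epoch}")
```

For the accuracy metrics the highest value is the best one, whereas for the loss metrics the highest value is the worst point of training, which typically occurs in the first epochs.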

Conclusion
This research implemented Mel Frequency Cepstral Coefficients (MFCCs) feature extraction and an Artificial Neural Network (ANN) algorithm for a footstep signal recognition system. The dataset of converted footstep signals contains 500 records from three people (400 records from the first person and 50 from each of the other two). Each footstep was recorded with a standard microphone, and each recording is approximately 2-3 seconds long. Recording was performed on the floor, without footwear, with the microphone beside the feet, about 10-15 cm apart, and each record was given a category label of 1 or 0 as the output label. The results show that MFCC feature extraction and the ANN algorithm were successfully used to classify footstep signals, with a validation loss of 57.3, a testing accuracy of 92.0%, a training loss (training error) of 193.8, and a training accuracy of 100%. These classification results are expected to support further research on implementing security systems.