Analysis of the Effect of Data Scaling on the Performance of the Machine Learning Algorithm for Plant Identification

Analisis Pengaruh Data Scaling Terhadap Performa Algoritma Machine Learning untuk Identifikasi Tanaman

  • Agus Ambarwari Universitas Teknokrat Indonesia
  • Qadhli Jafar Adrian Universitas Teknokrat Indonesia
  • Yeni Herdiyeni Institut Pertanian Bogor
Keywords: min-max normalization, standardization, zero-mean normalization, machine learning algorithms

Abstract

Data scaling is an important preprocessing step that affects the performance of machine learning algorithms. This study analyzes the effect of min-max normalization and standardization (zero-mean normalization) on the performance of machine learning algorithms. Both techniques were applied to a dataset of leaf venation features, and the normalized datasets were then tested with four machine learning algorithms: KNN, Naïve Bayesian, ANN, and SVM with RBF and linear kernels. The analysis covers the results of model evaluation using 10-fold cross-validation and of validation using test data. The results show that Naïve Bayesian has the most stable performance under both min-max normalization and standardization. The KNN algorithm is fairly stable compared with SVM and ANN. However, combining min-max normalization with SVM using the RBF kernel can give the best performance. SVM with a linear kernel, on the other hand, performs best with standardization (zero-mean normalization). For the ANN algorithm, a number of trials are needed to find the normalization technique that best suits it.
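
As an illustration of the two scaling techniques compared in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. The function names and the sample values are hypothetical and are not taken from the leaf venation dataset. The sketch assumes that scaling parameters are learned from the training data only and then reused on the test data, and that standardization uses the population standard deviation.

```python
from statistics import mean, pstdev


def fit_min_max(column):
    """Learn the minimum and maximum of one feature from training values."""
    return min(column), max(column)


def apply_min_max(column, lo, hi):
    """Min-max normalization: x' = (x - min) / (max - min)."""
    span = hi - lo
    if span == 0:
        # Constant feature: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def fit_standard(column):
    """Learn the mean and population standard deviation of one feature."""
    return mean(column), pstdev(column)


def apply_standard(column, mu, sigma):
    """Standardization (zero-mean normalization): x' = (x - mean) / std."""
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical values of a single feature (illustrative only).
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# Parameters come from the training split and are reused on the test split.
lo, hi = fit_min_max(train)
train_mm = apply_min_max(train, lo, hi)
test_mm = apply_min_max(test, lo, hi)

mu, sigma = fit_standard(train)
train_z = apply_standard(train, mu, sigma)
test_z = apply_standard(test, mu, sigma)

print("min-max train:", train_mm)
print("min-max test: ", test_mm)
print("z-score train:", train_z)
print("z-score test: ", test_z)
```

After min-max normalization the training values lie in [0, 1], while a test value outside the training range (here 10.0) falls outside that interval. After standardization the training values have zero mean and unit variance. In scikit-learn, `MinMaxScaler` and `StandardScaler` provide the same transforms through `fit` and `transform`.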

Published
2020-02-09
How to Cite
Ambarwari, A., Jafar Adrian, Q., & Herdiyeni, Y. (2020). Analysis of the Effect of Data Scaling on the Performance of the Machine Learning Algorithm for Plant Identification. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(1), 117–122. https://doi.org/10.29207/resti.v4i1.1517
Section
Information Technology Articles