Classification of Fruits Based on Shape and Color using Combined Nearest Mean Classifiers

Fruit classification is an important task in many agricultural industries. A fruit classification system can be used to identify the types and prices of fruit. Manual classification is inefficient for large quantities of fruit, and advances in information technology have made it possible for fruit classification to be done by a machine. This research proposes a fruit classification methodology based on shape and color. To reduce the effect of lighting variability, color normalization is carried out prior to feature extraction. The color features used in this research are the mean and standard deviation. The shape features are area, perimeter, and compactness.


Introduction
Manual fruit classification is inefficient and inaccurate for large quantities of fruit. Advances in information technology have made it possible for fruit classification to be done by a machine or computer. Fruit classification is useful in various fields, e.g. industry, plantation, farming, and trading [1].
Combining multiple classifiers, or the ensemble method, is considered a general solution for pattern classification tasks. Multiple classifier combination aims to reach a final decision by integrating the predictions of several individual classifiers. Experimental studies have shown that combining several classifiers is very helpful in improving classification accuracy [2], [3]. Several ensemble and deep learning methods for fruit classification have been proposed [4]-[7]. However, deep learning requires large amounts of data, and many deep learning algorithms use multiple layers of neural networks, which makes computation more complex. In this paper, a new, simple method of combining multiple classifiers, namely combined nearest mean classifiers, is proposed for fruit classification. This method needs only a few fruit samples in the training process.
This research treats fruit classification based on shape and color similarity. Classification is performed per fruit item, not per group of fruit. The features used are color features (mean and standard deviation of color) and shape features (area, perimeter, and compactness). The results will be useful in accelerating the sorting and grading of fruit variants and in simplifying fruit pricing decisions based on shape (e.g. big or small) and color (e.g. red or green).

Research Methods
The proposed multiple nearest mean classifier (NMC) for fruit classification consists of three phases, namely image preprocessing, feature extraction, and classifier combination, as depicted in Figure 1. The fruit image is first preprocessed so that its features can be obtained. Operations such as background subtraction and color normalization are performed on the fruit image. Background subtraction separates the image of the fruit from its background. Color normalization is then performed to eliminate the influence of different lighting.
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol.
Features of the fruit image are extracted and placed in feature vectors. The color features are the mean and standard deviation of each red, green, and blue (RGB) channel. The shape features are area, perimeter, and compactness. The area of the fruit reflects the actual fruit size or weight. The perimeter of the fruit is the length of its boundary. The compactness of the fruit is the ratio of its area to the area of a circle with the same perimeter. The multiple NMC combination consists of three nearest mean classifiers [8], [9]. The input features of the first and second classifiers are the color mean and the color standard deviation, respectively, while the input features of the third classifier are the area, perimeter, and compactness. The output of each classifier is a similarity value between the unknown object and the samples (training patterns). The similarity value is obtained by calculating the Euclidean distance between the feature vector of the unknown object and the mean feature vector of each sample class.
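The compactness definition above can be made concrete. A circle with perimeter P has radius P / (2π) and therefore area P² / (4π), so the ratio reduces to 4πA / P². The following is a minimal sketch under that reading; the function name `compactness` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def compactness(area: float, perimeter: float) -> float:
    """Ratio of a region's area to the area of a circle with the same perimeter."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    # A circle with perimeter P has radius P / (2*pi) and area P**2 / (4*pi).
    circle_area = perimeter ** 2 / (4.0 * math.pi)
    return area / circle_area


# Illustrative shapes (hypothetical values): a circle of radius 10
# and a 10 x 10 square.
r = 10.0
circle_c = compactness(math.pi * r ** 2, 2.0 * math.pi * r)
square_c = compactness(100.0, 40.0)
```

Under this definition a circle reaches the maximum value of 1 and less round outlines give smaller values, which is what makes the feature useful for separating round fruits from elongated ones.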
A sample of 84 fruit images corresponding to 12 categories was used, from which the reference values for each category were formed. The data were divided into a training set (43%) and a testing set (57%), with one to three training samples used per category. All images were 640 x 480 pixels in 24-bit true color (256 levels per channel) using the RGB color model. The types of fruit were limited to variants of apples, mangoes, oranges, pears, and durian.
According to Sen [10], [11], this kind of classification is called supervised classification, since the classes are known in advance and labeled data samples are available. To perform supervised classification, the computer system must first acquire knowledge, which is built by learning from the samples and recording them in a database [12].
The fruit classification system follows the structure of a pattern recognition system proposed by Yan and Gao [13], which comprises a sensor, preprocessing, feature extraction, and a classifier algorithm. Fruits are classified indirectly, by capturing an image of the fruit object with the sensor. Objects whose images have identical features are assumed to belong to the same class in reality [14].
The sensor used for image capture in this system is a digital camera (or webcam), as shown in Figure 2. A tripod is used as a supporting tool so that every image is captured at the same distance of 35 cm. The background is the same for every fruit image: black, to avoid shadows of the fruit object. The lighting intensity is fixed and not too bright, to avoid reflections on glossy surfaces, which can cause a loss of color information.
According to [15], [16], several approaches are used in computer vision; one of them is the statistical approach, which uses two phases, i.e. a training phase and a testing phase. In general, fruit classification consists of two main processes: class formation (training phase) and fruit classification (recognition phase).
A system can be composed of sub-systems [17]. The fruit classification system is arranged into two sub-systems: the class formation system (SPK), which undertakes the training process, and the fruit classification system (SKB), which assigns an unknown fruit to a certain class. Following Meshram et al. [18], the architecture of the fruit classification system is shown in Figure 3. In the training phase, the image of a fruit sample is captured through the sensor. Several fruit samples are used for each category, after which preprocessing is performed.
The first process is background subtraction, which separates the fruit from its background using a pixel subtraction operation. The result is the absolute difference between the fruit image and its background [19], as given by equation (1):
$$Q(i, j) = \left| P_1(i, j) - P_2(i, j) \right| \qquad (1)$$
where $P_1$ is the fruit image, $P_2$ is the background image, and $Q$ is the resulting difference image. A pixel whose R, G, and B intensities in the difference image are less than the threshold value of 75 is considered background. This threshold was found to work well in several trials.
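The background subtraction step can be sketched with NumPy as follows. This is an illustration only: it assumes that a pixel counts as background when all three channel differences fall below the threshold of 75, and the function name `subtract_background` and the tiny synthetic images are hypothetical.

```python
import numpy as np

THRESHOLD = 75  # threshold value reported in the text


def subtract_background(fruit: np.ndarray, background: np.ndarray,
                        threshold: int = THRESHOLD):
    """Return the absolute difference image and a boolean foreground mask."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(fruit.astype(np.int16) - background.astype(np.int16))
    diff = diff.astype(np.uint8)
    # Assumption: background means every channel difference is below threshold.
    is_background = np.all(diff < threshold, axis=-1)
    return diff, ~is_background


# Synthetic example: a black backdrop with a 2 x 2 reddish "fruit" patch.
background = np.zeros((4, 4, 3), dtype=np.uint8)
fruit = background.copy()
fruit[1:3, 1:3] = (200, 40, 30)
diff, mask = subtract_background(fruit, background)
```

The returned mask can then be used to restrict feature extraction to the fruit pixels.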
The second process is color normalization, which removes the influence of different lighting [20]. Color normalization uses equations (2), (3), and (4), since they suit color features measured in each RGB channel. The normalization is applied at each pixel p, where R(p), G(p), and B(p) are the color intensities of the R (red), G (green), and B (blue) components at pixel p.
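The exact form of equations (2) to (4) is not reproduced in this text, so the sketch below assumes one common choice, chromaticity normalization, in which each channel is divided by R(p) + G(p) + B(p). This is an illustrative assumption and not necessarily the normalization of [20]; the function name `normalize_color` is hypothetical.

```python
import numpy as np


def normalize_color(image: np.ndarray) -> np.ndarray:
    """Divide each channel by the per-pixel sum R + G + B (assumed form)."""
    rgb = image.astype(np.float64)
    total = rgb.sum(axis=-1, keepdims=True)
    # Pure black pixels have a zero sum; leave them at zero instead of dividing by 0.
    safe_total = np.where(total == 0, 1.0, total)
    return rgb / safe_total


# The same surface color under bright and dim light (hypothetical values).
bright = np.array([[[200, 100, 100]]], dtype=np.uint8)
dim = np.array([[[100, 50, 50]]], dtype=np.uint8)
norm_bright = normalize_color(bright)
norm_dim = normalize_color(dim)
```

Under this assumed form, both pixels map to the same normalized color, which is the sense in which the step reduces the effect of lighting intensity.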
The color features are then extracted from the preprocessed fruit image. Color features can consist of statistical data based on the color histogram [21]. For a fruit image x with P pixels, equations (5), (6), and (7) are used to calculate the color mean, and equations (8), (9), and (10) are used to calculate the color standard deviation. The shape of the fruit image x is represented by $x = (a, p, c)^T$, where a is the area, p is the perimeter, and c is the compactness.
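The color features described above can be sketched as per-channel statistics over the fruit pixels. The sketch assumes the population standard deviation (division by the pixel count P), since the divisor is not stated in this text; the function name `color_features` and the sample image are hypothetical.

```python
import numpy as np


def color_features(image: np.ndarray, mask: np.ndarray):
    """Return per-channel (mean, standard deviation) over the masked pixels."""
    pixels = image[mask].astype(np.float64)  # shape (P, 3)
    if pixels.shape[0] == 0:
        raise ValueError("mask selects no pixels")
    mean = pixels.mean(axis=0)
    # Assumption: population standard deviation (divide by P, not P - 1).
    std = pixels.std(axis=0)
    return mean, std


# A 2 x 2 normalized image in which only the top row belongs to the fruit.
image = np.array([[[0.2, 0.4, 0.4], [0.4, 0.2, 0.4]],
                  [[0.9, 0.05, 0.05], [0.0, 0.0, 0.0]]])
mask = np.array([[True, True], [False, False]])
mean, std = color_features(image, mask)
```

The two three-element vectors correspond to the inputs of the first and second nearest mean classifiers.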
The extracted shape and color features of the fruit are stored in the image feature database, and the SPK creates a class mean for each fruit class. The class mean, or centroid, is calculated using equation (13):
$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{i,j} \qquad (13)$$
where $x_{i,j}$ is the $j$-th sample feature vector from class $i$ and $n_i$ is the number of samples in class $i$. Equivalently, the class means can be produced as a view over the image feature database that averages the features grouped by category or class label.
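Class formation amounts to averaging the training feature vectors that share a label. A minimal sketch follows; the function name `class_means`, the labels, and the shape feature values are hypothetical.

```python
import numpy as np


def class_means(features, labels):
    """Return {label: mean feature vector} over the training samples."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    # .tolist() converts NumPy scalars to plain Python keys.
    return {label: features[labels == label].mean(axis=0)
            for label in np.unique(labels).tolist()}


# Hypothetical shape features (area, perimeter, compactness).
features = [[100.0, 40.0, 0.78],
            [110.0, 42.0, 0.78],
            [120.0, 44.0, 0.78],
            [200.0, 52.0, 0.93],
            [220.0, 54.0, 0.95]]
labels = ["apple", "apple", "apple", "orange", "orange"]
means = class_means(features, labels)
```

One such dictionary would be built per feature group (color mean, color standard deviation, shape), giving each of the three classifiers its own set of centroids.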

Recognition Phase
In the recognition phase, the image of the fruit to be classified is captured through the sensor and undergoes the same preprocessing and feature extraction as in the training phase.
The extracted shape and color features of the fruit form the image query. Classification is done by measuring the shape and color similarity between the image query and the class means from equation (13). The unknown fruit, expressed as feature vector q, is assigned to class i if q is closer to the mean vector of class i than to the mean vector of any other class.
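The assignment rule above, for a single feature group, can be sketched as a search for the closest class mean. The function name `nearest_mean`, the class labels, and the feature values are hypothetical.

```python
import numpy as np


def nearest_mean(query, means):
    """Return (label, distance) of the class mean closest to the query vector."""
    q = np.asarray(query, dtype=np.float64)
    distances = {
        label: float(np.linalg.norm(q - np.asarray(m, dtype=np.float64)))
        for label, m in means.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical normalized color means for two classes and one query.
means = {"red apple": [0.50, 0.25, 0.25], "green apple": [0.30, 0.45, 0.25]}
label, dist = nearest_mean([0.48, 0.27, 0.25], means)
```

Here `np.linalg.norm` with default arguments computes the Euclidean (L2) length of the difference vector.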
Similarity is measured through vector distance: two vectors that are close to each other are similar and differ only slightly [14]. The NMC classifier generally uses the Euclidean distance [22]; according to [23], the NMC is also known as the Nearest Centroid Classifier [24]. Accordingly, the distance metric used in fruit classification is the L2 (Euclidean) metric. According to Malkov [25], the Euclidean distance between two vectors x and w is given by equation (14). The similarity is measured per feature group, which speeds up and simplifies the query process [26]. There are three feature groups: color mean (mean of R, G, and B), color standard deviation (standard deviation of R, G, and B), and shape (area, perimeter, and compactness).
According to [27], [28], the Euclidean distance between the mean color vectors of two images can be measured. If the color mean of the fruit image query is expressed as vector q and that of the class mean as vector x, the Euclidean distance between the two vectors is given by equation (15). The distance between the color standard deviation vectors is given by equation (16). Likewise, if the shape of the fruit image query is expressed as vector q and the shape of the class mean as vector x, the Euclidean distance between the two vectors is given by equation (17).
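For reference, the following is a sketch of how equations (14) to (17) would read under the standard definition of Euclidean distance. The subscripted symbols (μ for channel means, σ for channel standard deviations, the superscripts q and x, and the labels on d) are notation assumed here for illustration and may differ from the original typesetting.

```latex
% (14) Euclidean distance between two n-dimensional vectors x and w
d(x, w) = \sqrt{\sum_{k=1}^{n} (x_k - w_k)^2}

% (15) distance between color means of query q and class mean x
d_{\mu}(q, x) = \sqrt{(\mu_R^{q} - \mu_R^{x})^2 + (\mu_G^{q} - \mu_G^{x})^2 + (\mu_B^{q} - \mu_B^{x})^2}

% (16) distance between color standard deviations
d_{\sigma}(q, x) = \sqrt{(\sigma_R^{q} - \sigma_R^{x})^2 + (\sigma_G^{q} - \sigma_G^{x})^2 + (\sigma_B^{q} - \sigma_B^{x})^2}

% (17) distance between shape vectors (area a, perimeter p, compactness c)
d_{s}(q, x) = \sqrt{(a^{q} - a^{x})^2 + (p^{q} - p^{x})^2 + (c^{q} - c^{x})^2}
```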
After similarity has been measured for each feature group, it is measured over the three feature groups together by adding up the three group distances. Because the distance scale of each feature group is different, each distance is first normalized by dividing it by the maximum distance of that feature group. Each normalized group distance then lies in the range 0-1, so the total similarity distance lies in the range 0-3. The similarity distance is calculated using equation (18) and is measured against all class means. The classification rule according to [14] is stated for two classes w1 and w2. The object vectors are written as {x1,...,xn}; if x1 is the mean of class w1 and x2 is the mean of class w2, a new object z is represented in the same feature space and classified as follows.
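The combination step can be sketched as follows. The sketch assumes that the "maximum distance" of a feature group is the largest distance from the query to any class mean within that group, which the text does not spell out; the function name `combined_distances` and the distance values are hypothetical.

```python
import numpy as np


def combined_distances(group_distances):
    """Sum the three normalized group distances for every class.

    group_distances has shape (n_classes, 3); the columns hold the color mean,
    color standard deviation, and shape distances to each class mean.
    """
    d = np.asarray(group_distances, dtype=np.float64)
    # Assumption: normalize each group by its maximum distance over all classes.
    max_per_group = d.max(axis=0)
    safe_max = np.where(max_per_group == 0, 1.0, max_per_group)
    return (d / safe_max).sum(axis=1)  # each entry lies in [0, 3]


# Hypothetical distances from one query to three class means.
d = [[0.02, 0.01, 50.0],
     [0.20, 0.04, 400.0],
     [0.10, 0.02, 200.0]]
d_sim = combined_distances(d)
best_class = int(np.argmin(d_sim))
```

Dividing by the group maximum puts the raw shape distances, which are on a much larger scale than the color distances, on the same 0-1 footing before they are summed.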
- Classify z to w1 if and only if $d^2(z, x_1) < d^2(z, x_2)$; otherwise, classify z to w2.
The fruit is classified if the minimum value of $d_{sim}(q, x) \le \theta$, with $\theta = 0.75$; otherwise, the fruit is rejected. The threshold value of 0.75 (on the 0-3 scale) was obtained empirically. The similarity percentage is relative to the distance and is 100% if the distance is 0. To be classified, a fruit must have a similarity percentage of at least 75%, i.e. a similarity distance of at most 0.75.
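The accept or reject decision can be sketched as below. Two points are assumptions: the comparison is taken to be "less than or equal to" the threshold, and the similarity percentage is taken to fall linearly from 100% at distance 0 to 0% at distance 3, which is consistent with a distance of 0.75 corresponding to 75%. The function names and the distance values are hypothetical.

```python
THRESHOLD = 0.75     # empirical value reported in the text (0-3 scale)
MAX_DISTANCE = 3.0   # upper end of the combined distance scale


def similarity_percent(d_sim: float) -> float:
    # Assumption: linear mapping, distance 0 -> 100 %, distance 3 -> 0 %.
    return 100.0 * (1.0 - d_sim / MAX_DISTANCE)


def decide(d_sim_by_class: dict, threshold: float = THRESHOLD):
    """Return (label or None, similarity %) for the closest class mean."""
    label = min(d_sim_by_class, key=d_sim_by_class.get)
    d = d_sim_by_class[label]
    if d <= threshold:
        return label, similarity_percent(d)
    return None, similarity_percent(d)  # rejected: no class is close enough


accepted = decide({"red apple": 0.475, "orange": 1.5})
rejected = decide({"red apple": 1.2, "orange": 1.5})
```

Returning `None` for a rejected query keeps unknown fruit types from being forced into the nearest known class.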

Results and Discussions
There are 12 fruit classes. The class formation system forms the class mean for each category label, as shown in Figure 4, which presents the user interface of the class formation system forming class means from three samples in each class. The fruit classification system is tested with image queries; these test images are used to measure the system's success. The user interface of the fruit classification system is shown in Figure 5. To evaluate the performance of the proposed method, 48 fruit images from 12 classes were used as test images. The first test used images of the same fruits as in training, but in different positions. The second test used images of different fruits that were never used in training. The result of the first test is presented in Table 2 and that of the second test in Table 3.
Table 1 shows that the system is able to recognize and classify the fruits used for training even though their positions are different.

Conclusion
Classification of fruits using the proposed multiple nearest mean classifier technique has shown that the technique is capable of producing high accuracy with a small sample size. The number of samples in each class influences the system's ability: the system improves as the number of samples increases. With up to 3 samples in each class, the system was able to classify 48 fruits with a 100% success rate. The image capture process needs to be taken into account so that the color and shape of the fruit are represented well; this can be achieved by using supplemental lighting and a solid-state image sensor. In practice, fruit surfaces are not always spotless: they sometimes have stains and dust whose colors are identical to the background, so the background subtraction result is not perfect. Further algorithm design and image processing techniques are needed to overcome these weaknesses. Hardware for the fruit classification system is also needed so that the fruit sorting and grading process can be done by a machine or robotic system.