Journal of information and communication convergence engineering 2018; 16(3): 160-165
Published online September 30, 2018
https://doi.org/10.6109/jicce.2018.16.3.160
© Korea Institute of Information and Communication Engineering
This paper designs and develops an image processing system that integrates feature extraction and matching using a convolutional neural network (CNN), rather than processing feature extraction and matching separately as conventional image recognition systems do. To implement it, the proposed system runs a CNN, and its performance is compared with that of a conventional image processing system. The system extracts the features of an image using the CNN and then learns them with the neural network. The proposed system achieved a recognition accuracy of 84%. Because the proposed system is a deep learning model that recognizes learned images, it can run in batch mode and works on any platform (including embedded platforms) that can read the required files. In addition, it does not require a separate feature extraction algorithm and matching algorithm to be implemented, which saves time and makes it efficient. As a result, it can be widely used as an image recognition program.
Keywords: Artificial intelligence, Convolutional neural network (CNN), Deep learning, Pattern recognition
The basic image classification system classifies the object types in images using image features. However, because such a system classifies images using these characteristics alone, its classification results can be inaccurate. To solve this drawback, classification systems are designed using machine learning algorithms [1]. Machine learning algorithms include k-nearest neighbor, support vector machine (SVM), and artificial neural network [2].
Companies such as Google have shown great interest in deep learning and neural networks. Since deep neural networks have been applied to image classification, efforts to classify images efficiently using deep learning methods have been actively pursued [3-5]. Depending on the application, deep learning models include the convolutional neural network (CNN), restricted Boltzmann machine (RBM), and deep belief network (DBN) [6-12].
Conventional image recognition systems have the disadvantage that feature extraction and matching are handled independently of each other. In this paper, we propose an image processing system that combines feature extraction and matching using CNN technology.
In this paper, we study deep image classification technology using a CNN, and design and implement an image processing system based on it. Deep image recognition is designed and developed to improve on the low accuracy of traditional image classification methods. The image processing system is constructed by integrating feature extraction and matching through CNN technology. The stochastic gradient descent algorithm and the normalization of weights are also studied and implemented.
The remainder of this paper is organized as follows. Section II describes the design of the CNN-based image processing system, which is the main focus of this paper. Sections III and IV describe the implementation and performance evaluation of the proposed system. Finally, Section V presents the conclusion and future work.
In this section, we design an image processing system based on a CNN. The design consists of four steps, as shown in Fig. 1. In the first step, we design the CNN-based learning model. In the second step, the detailed calculation process is designed. In the third step, the backpropagation/gradient descent algorithm is designed. The final step designs the calculation optimization algorithm for the system. In particular, the optimal parameters of the learning model are calculated through the second and third steps.
The model is designed to build on the strengths of existing CNN learning models. To this end, the proposed model is composed of the following layers. The convolutional layer extracts feature values from the input data using a small kernel. The activation layer applies a nonlinear function to its input values. The pooling layer reduces the dimensionality of the data and can remove noise from the data. The inception layer extracts varied feature values of the data by using two or more convolutional layers in parallel. The skip connection layer is built from the inception layer and the convolutional layer to suppress the loss of features that occurs as the model becomes deeper.
The proposed composite model contains a group consisting of an inception layer and a skip connection layer, as shown in Fig. 2. The inception layer processes the input data with kernels of several sizes at the same time and outputs the concatenation of the parallel results. The skip connection layer can reduce the amount of calculation and increase the calculation speed as the model becomes deeper. The structure of the proposed model can be adapted to the purpose of the learning by modifying the skip connection layer and the inception layer.
The detailed calculation process includes the following parts. The computation of the convolutional layer is designed using the filter operation, and the computation of the pooling layer is designed using max pooling and average pooling. The computation of the activation layer applies the rectified linear unit (ReLU) nonlinear function to the output values of the convolutional layer.
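The three layer computations named above can be illustrated with a minimal sketch in plain Python. It is not the authors' Keras implementation: the function names (`conv2d`, `relu`, `pool2d`), the stride-1 unpadded filter operation, the 2 × 2 non-overlapping pooling window, and the sample values are all assumptions chosen to keep the example small.

```python
def conv2d(image, kernel):
    """Filter operation: slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling; rows/columns that do not fill a window are dropped."""
    reduce = max if mode == "max" else (lambda window: sum(window) / len(window))
    pooled = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


# Hypothetical 5 x 5 input and 2 x 2 kernel, for illustration only.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 1],
    [0, 2, 1, 0, 2],
]
kernel = [[1, -1], [-1, 1]]

features = relu(conv2d(image, kernel))       # 4 x 4 feature map
pooled_max = pool2d(features, mode="max")    # 2 x 2 after max pooling
pooled_avg = pool2d(features, mode="avg")    # 2 x 2 after average pooling
```

Max pooling keeps the strongest response in each window, while average pooling smooths the window; both halve each spatial dimension here.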
The calculation of the inception layer consists of three steps. In the first step, the input data is processed by convolutional and pooling layers with filters of various sizes (two or more kinds). In the second step, the results of the first step are concatenated using the CONCAT function. In the third step, the result of the second step is subjected to a dimension scaling (zoom) transformation.
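The three inception steps can be sketched as follows in plain Python. Several points are assumptions rather than details from the paper: the branches are a 1 × 1 convolution, a 3 × 3 convolution, and a 3 × 3 max pooling, all padded so that the spatial size is preserved; the "dimension scaling" of the third step is interpreted as a 1 × 1 (pointwise) convolution that changes the number of channels; and all function names and weights are hypothetical.

```python
def pad(fmap, p, value=0.0):
    """Surround a 2-D feature map with p rows/columns of `value`."""
    width = len(fmap[0]) + 2 * p
    border = [[value] * width for _ in range(p)]
    body = [[value] * p + list(row) + [value] * p for row in fmap]
    return border + body + [row[:] for row in border]


def windows(fmap, k, pad_value):
    """Yield (i, j, window) for every k x k neighbourhood, keeping the spatial size."""
    padded = pad(fmap, k // 2, pad_value)
    for i in range(len(fmap)):
        for j in range(len(fmap[0])):
            yield i, j, [padded[i + a][j + b] for a in range(k) for b in range(k)]


def conv_same(fmap, kernel):
    """Convolution with an odd square kernel and zero padding."""
    flat = [w for row in kernel for w in row]
    out = [[0.0] * len(fmap[0]) for _ in fmap]
    for i, j, win in windows(fmap, len(kernel), 0.0):
        out[i][j] = sum(v * w for v, w in zip(win, flat))
    return out


def maxpool_same(fmap, k=3):
    """Stride-1 max pooling; padding with -inf never wins the max."""
    out = [[0.0] * len(fmap[0]) for _ in fmap]
    for i, j, win in windows(fmap, k, float("-inf")):
        out[i][j] = max(win)
    return out


def concat(*branches):
    """CONCAT: join lists of channels along the channel axis."""
    return [channel for branch in branches for channel in branch]


def pointwise(channels, weights):
    """1 x 1 convolution: each output channel is a weighted sum of input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [[sum(wt * ch[i][j] for wt, ch in zip(ws, channels)) for j in range(w)]
         for i in range(h)]
        for ws in weights
    ]


def inception(fmap, k1, k3, out_weights):
    # Step 1: process the same input with filters of different sizes in parallel.
    b1, b3, bp = conv_same(fmap, k1), conv_same(fmap, k3), maxpool_same(fmap)
    # Step 2: concatenate the branch outputs channel-wise.
    stacked = concat([b1], [b3], [bp])
    # Step 3: scale the channel dimension (assumed here to be a 1 x 1 convolution).
    return pointwise(stacked, out_weights)


fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = inception(
    fmap,
    k1=[[2.0]],
    k3=[[1.0] * 3 for _ in range(3)],
    out_weights=[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],  # 3 channels in, 2 channels out
)
```

Because every branch preserves the 3 × 3 spatial size, the branch outputs can be stacked directly; the final pointwise step reduces three channels to two.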
As shown in Fig. 3, the skip connection is computed in three steps. In the first step, the convolutional/pooling layer and the inception layer produce multidimensional output from the input data. In the second step, the number of dimensions of the input data is converted so that it can be added to the first output. In the third step, the first output and the second output are summed using the Add function. The advantage of this structure is that it can extract, combine, and optimally learn various features of the data in three ways: depth, width, and skip.
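A minimal sketch of the three skip connection steps is given below, again in plain Python rather than Keras. The names `skip_connection`, `pointwise`, `add`, and `toy_main_path` are hypothetical, the dimension conversion of the second step is assumed to be a 1 × 1 projection, and the stand-in main path deliberately outputs an all-zero channel to show how the shortcut preserves the input features.

```python
def pointwise(channels, weights):
    """1 x 1 projection: each output channel is a weighted sum of input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [[sum(wt * ch[i][j] for wt, ch in zip(ws, channels)) for j in range(w)]
         for i in range(h)]
        for ws in weights
    ]


def add(a, b):
    """Add: element-wise sum of two tensors with identical shape."""
    if len(a) != len(b):
        raise ValueError("channel counts must match before Add")
    return [
        [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(a, b)
    ]


def skip_connection(x, main_path, proj_weights):
    main = main_path(x)                    # step 1: conv/pooling + inception output
    shortcut = pointwise(x, proj_weights)  # step 2: convert the input's channel count
    return add(main, shortcut)             # step 3: Add the two outputs


def toy_main_path(x):
    """Stand-in for the real layers: one scaled channel and one 'lost' channel."""
    scaled = [[0.5 * v for v in row] for row in x[0]]
    lost = [[0.0 for _ in row] for row in x[0]]
    return [scaled, lost]


x = [[[1.0, 2.0], [3.0, 4.0]]]            # one input channel, 2 x 2
y = skip_connection(x, toy_main_path, proj_weights=[[1.0], [1.0]])
```

In the second output channel the main path contributes nothing, yet the result still equals the input, which is the feature-preserving behavior the text attributes to the skip connection.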
The backpropagation calculation uses the error between the predicted value and the target value to compute the correction for the parameters of each layer of the model, and is designed in the following steps. In the first step, the partial derivative of the error between the predicted value and the target value is calculated using (1). In the second step, the partial derivative is calculated for the last hidden layer using (2). In the final step, the partial derivative is calculated using (3) for each hidden layer of the learning model. Gradient descent then updates each parameter of the system using (4), moving it in the direction opposite to the gradient.
The symbols in (1)-(4) denote, in order, the loss value, output value, prediction value, truth value, weight value, layer (hierarchy) index, learning rate, and backpropagated value.
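The backpropagation and gradient descent steps can be illustrated on a tiny network. This sketch does not reproduce equations (1)-(4): it assumes a squared-error loss, a single ReLU hidden unit, and scalar weights, and the names `forward`, `backward`, and `sgd_step` are hypothetical.

```python
def forward(w1, w2, x):
    """x -> hidden pre-activation z -> ReLU output h -> prediction y."""
    z = w1 * x
    h = max(0.0, z)
    y = w2 * h
    return z, h, y


def loss(y, t):
    """Assumed squared-error loss between prediction y and truth t."""
    return 0.5 * (y - t) ** 2


def backward(w1, w2, x, t):
    """Chain rule from the output error back to each weight."""
    z, h, y = forward(w1, w2, x)
    dy = y - t                          # derivative of the loss w.r.t. the prediction
    dw2 = dy * h                        # gradient for the output-layer weight
    dh = dy * w2                        # error propagated back to the hidden layer
    dz = dh * (1.0 if z > 0 else 0.0)   # through the ReLU
    dw1 = dz * x                        # gradient for the hidden-layer weight
    return dw1, dw2


def sgd_step(w1, w2, x, t, lr):
    """Gradient descent: move each weight against its gradient."""
    dw1, dw2 = backward(w1, w2, x, t)
    return w1 - lr * dw1, w2 - lr * dw2


w1, w2, x, t = 0.5, -1.0, 2.0, 1.0
loss_before = loss(forward(w1, w2, x)[2], t)
new_w1, new_w2 = sgd_step(w1, w2, x, t, lr=0.1)
loss_after = loss(forward(new_w1, new_w2, x)[2], t)
```

For these sample values, one update step lowers the loss, because each weight moves in the direction opposite to its gradient.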
The calculation optimization algorithm consists of the following parts. The initial parameter values are set using the initialization algorithm (5) to suppress the parameter loss problem. Mini-batch normalization is designed based on the normalization algorithm (6) to suppress the overfitting problem. The optimized gradient descent uses a corrected learning rate and momentum based on the Adam algorithm (7) to improve the computational inefficiency of plain gradient descent.
The symbols in (5)-(7) denote, in order, the input value, variance (dispersion) value, mean value, step value, learning coefficient, small constant, and verification value.
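The three optimization parts can be sketched in plain Python as follows. The paper does not state which initialization rule (5) is, so He initialization is assumed here; the batch normalization and Adam updates follow their standard published forms, with default hyperparameters that are assumptions rather than the authors' settings. All names are hypothetical.

```python
import math
import random


def he_init(fan_in, fan_out, rng):
    """Assumed initializer: zero-mean Gaussian with variance 2 / fan_in."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch to zero mean and unit variance, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


class Adam:
    """Adam update for a single scalar parameter."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0   # running mean of gradients (momentum)
        self.v = 0.0   # running mean of squared gradients
        self.t = 0     # step counter

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


weights = he_init(fan_in=4, fan_out=3, rng=random.Random(0))
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])

# Minimize the toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
optimizer = Adam(lr=0.1)
w = 0.0
for _ in range(300):
    w = optimizer.step(w, 2.0 * (w - 3.0))
```

On the first step the bias-corrected moments make the update size approximately equal to the learning rate regardless of the gradient magnitude, which is one way Adam adapts the step relative to plain gradient descent.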
This section deals with the implementation and test results of the CNN-based image processing system.
The system is implemented in Python using OpenCV, Keras, CUDA, and NumPy. It runs on platforms such as Windows 10 and Linux. System performance was verified on Windows 10 (Intel Pentium CPU 3.20 GHz, 4.00 GB; 2.3 GHz, 2 GB).
Fig. 4 shows the results of processing an image (248 × 300) with the convolutional layer, the pooling layer, and the activation layer using filters of various sizes.
Fig. 5 shows the image processing results as the model deepens when the optimization calculation algorithm is used, and Fig. 6 shows the corresponding results without the optimization calculation. With the optimization calculation, fitting problems were suppressed.
Fig. 7 shows the training and testing of the image processing system. The system calculates the optimal parameters using 80% of the data (training data) and measures its accuracy on the remaining 20% of the data (test data).
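The 80%/20% evaluation protocol can be sketched as below. The shuffling, the fixed seed, and the function names `train_test_split` and `accuracy` are assumptions for illustration; the paper does not describe how the split was drawn.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, labels):
    """Percentage of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical dataset of 100 sample identifiers.
train, test = train_test_split(range(100))
```

Parameters are fitted on `train` only, and `accuracy` is then reported on `test`, so the reported figure reflects images the model has not seen during learning.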
Table 1 compares the performance of the proposed image processing system with that of the basic image recognition system. The test methodology sets the convolution size and the number of parameters of each model, and records the accuracy and learning time over time.
System type | Detailed structure | Accuracy (%) |
---|---|---|
Proposed system | Skip connection, Inception | 84.78-92.41 |
Most existing systems | SVM | 36 |
In Fig. 8, the proposed system is trained and tested on the data, and its accuracy is plotted by epoch. The solid line (accuracy in the test process) lies above the dotted line (accuracy in the learning process). The accuracy of the proposed system was 84.78%-92.41%.
According to the test results, the accuracy of the proposed system was higher than the 36% accuracy of the basic SVM model for the classification of 10 kinds of images. The SVM is a machine learning algorithm for pattern recognition and data analysis, and the basic SVM model is a general model used in image processing systems.
Figs. 9 and 10 show the test results of the proposed system on the PC and Android platforms.
In this paper, we studied deep learning technology and designed and implemented a convolution-based image processing system. The proposed system achieves 84.78%-92.41% accuracy in classifying 10 kinds of images and can be implemented on multiple platforms. In addition, the overfitting and parameter loss of the basic system are suppressed by using normalization and initialization methods.
The proposed system has the following advantages and functions. The accuracy of the proposed system (about 84%-92.41%) is more than twice the accuracy of the basic system (about 36%), and the system can be set up and supported on multiple platforms. The proposed system suppresses problems of the deep CNN such as parameter/gradient loss and overfitting through specific algorithms (initial value setting, normalization) and pursues optimal accuracy.
Future research should, first, improve the classification performance of the proposed system on problems with more than 100 kinds of classes and, second, implement the function of locating objects in images.
received the B.S. and M.S. degrees from the Department of Electronic Engineering of Hanbat National University, Korea, in 2006 and 2011, respectively and the Ph.D. degree in 2015 from the Department of Computer Engineering of Pai Chai University, Korea. Since 2006, he has worked in the Department of Music & Sound Technology at Korea University of Media Arts as a professor. His current research interests include multimedia sound, broadcast sound, sound reinforcement (SR), sound engineering, and database.
received the B.S. degree in 2001 and M.S. degree in 2011 from the Department of Information Technology of Chonbuk National University, Korea. Since 2001, he has worked for K-Water as a manager especially in Water Treatment Plants. He is currently in a Doctorate course in Department of Computer Engineering of Pai Chai University. His current research interests include 5G, cloud computing, machine learning, bigdata, and artificial intelligence.
received the M.S. degree in 1987 and Ph.D. degree in 1993 from the Department of Computer Engineering of Kwangwoon University, Korea. From 1994 to 2005, he worked for ETRI as a researcher. Since 1994, he has worked in the Department of Computer Engineering at Pai Chai University, where he now works as a professor. His current research interests include multimedia document architecture modeling, machine learning, IoT, bigdata, and artificial intelligence.
Hankil Kim, Jinyoung Kim, Hoekyung Jung
Korea University of Media Arts, Pai Chai University