Journal of Information and Communication Convergence Engineering 2018; 16(3): 173-178
Published online September 30, 2018
https://doi.org/10.6109/jicce.2018.16.3.173
© Korea Institute of Information and Communication Engineering
Accurate classification of cloud images is a challenging task. Almost all existing methods rely on hand-crafted feature extraction, whose limitation is low discriminative power. In recent years, deep learning with convolutional neural networks (CNNs), which can automatically extract features, has achieved promising results in many computer vision and image understanding fields. However, deep learning approaches usually need large datasets. This paper proposes a deep learning approach for the classification of cloud image patches on small datasets. First, we design a deep learning model suited to small datasets using a CNN, and then we apply data augmentation and dropout regularization techniques to improve the generalization of the model. Experiments with the proposed approach were performed on the small SWIMCAT dataset with k-fold cross-validation. The experimental results demonstrated perfect classification accuracy for most classes on every fold, confirming both the high accuracy and the robustness of the proposed model.
Keywords: Cloud classification, CNN, Data augmentation, SWIMCAT dataset
Research on clouds and their characteristics plays a very important role in many applications, e.g., climate modeling, weather prediction, meteorology, solar energy production, and satellite communication [1-6]. Cloud classification plays an essential role in cloud observation. At present, however, cloud classification is performed by professionally trained observers. This method is highly time consuming and depends on the experience of the observers; moreover, some problems cannot be handled well by human observers [7]. Therefore, automatic cloud classification is a much-needed task.
Much research on the classification of cloud images has been conducted. Buch and Sun [8] applied binary decision trees to classify pixels in whole sky imager (WSI) images into five cloud types. Singh and Glennen [9] proposed five different feature extraction methods (autocorrelation, co-occurrence matrices, edge frequency, Laws' features, and primitive length) and used k-nearest neighbor and neural network classifiers for cloud classification. Calbo and Sabburg [10] applied a Fourier transform for cloud-type recognition. Heinle et al. [11] predefined several statistical features to describe color and texture and used a k-nearest neighbor classifier. Liu et al. [12] used an illumination-invariant completed local ternary pattern descriptor for cloud classification. Liu et al. [13] extracted several cloud structure features and used a simple classifier called the rectangle method to classify infrared cloud images. Liu et al. [14] proposed a salient local binary pattern for cloud classification. Liu et al. [15] used a weighted local binary descriptor. Dev et al. [16] proposed a modified texton-based approach to categorize cloud image patches. Luo et al. [17] combined manifold features and texture features and then used a support vector machine (SVM) to classify cloud images. Gan et al. [18] proposed cloud type classification using duplex norm-bounded sparse coding. Nevertheless, most of these approaches are based on hand-crafted features, so each method requires empirically tuned parameters.
Recently, deep learning has developed rapidly; in particular, convolutional neural networks (CNNs) have shown outstanding performance in image classification [19]. CNNs are able to "learn" features from the image data, so no separate feature extraction method is needed. Some recent studies have used CNNs for cloud classification. Shi et al. [20] proposed a CNN model to extract features and used an SVM to classify cloud images. Ye et al. [21, 22] improved feature extraction by using both a CNN and Fisher vectors, and used an SVM to classify cloud images. Zhang et al. [23] proposed transferring deep visual information. Generally, these deep learning approaches achieved promising results; however, all of them used the CNN only to extract features: they needed other methods for classification and relied on pre-trained CNN models for feature extraction. For small datasets, the lack of sufficient image samples makes it difficult for end-to-end learning to converge.
This paper proposes a deep learning approach for the classification of cloud images on small datasets. First, we design a deep learning model that includes both a feature extraction part and a classification part using a CNN, and then we apply two regularization techniques, data augmentation and dropout, to improve the generalization of our model. Our experiments are performed on the SWIMCAT dataset.
The Singapore Whole-sky IMaging CATegories (SWIMCAT) dataset was introduced by Dev et al. [16]. The images were captured over 17 months, from January 2013 to May 2014, in Singapore using the Wide Angle High-Resolution Sky Imaging System (WAHRSIS), a calibrated ground-based whole-sky imager [24]. The dataset has five distinct categories (clear sky, patterned clouds, thick dark clouds, thick white clouds, and veil clouds), defined on the basis of the visual characteristics of sky/cloud conditions and consultation with experts from the Singapore Meteorological Services.
The SWIMCAT dataset has 784 images of sky/cloud patches: clear sky, 224 images; patterned cloud, 89 images; thick dark cloud, 251 images; thick white cloud, 135 images; and veil cloud, 85 images. The dimensions of all the images are 125 × 125 pixels. Some random images of each class from the SWIMCAT dataset are shown in columns in Fig. 1.
CNNs were inspired by the human visual system [25, 26]. They are state-of-the-art approaches not only for pattern recognition tasks but also for object detection tasks, especially with the growth of computing capacity. Krizhevsky et al. [19] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 competition [27, 28] with a deep CNN, demonstrating the great power of deep CNNs.
Unlike many other pattern recognition algorithms, CNNs combine both feature extraction and classification. A schematic representation of a basic CNN (inspired by [29]) is shown in Fig. 2. The given network consists of five different layers: input, convolution, pooling, fully-connected, and output. The input layer specifies a fixed size for the input images, i.e., images may have to be resized accordingly. The image is then convolved with multiple learned kernels using shared weights. Next, the pooling layers reduce the size of the image while trying to maintain the contained information. These two layers comprise the feature extraction part. Afterwards, the extracted features are weighted and combined in the fully-connected layers. This represents the classification part of the CNN. Finally, there is one output neuron for each object category in the output layer.
As mentioned above, the basic architecture of a CNN follows this pattern:

IN → CONV → POOL → FC → OUT
where IN is the input layer, CONV is a convolution layer, POOL is a pooling layer, FC is a fully connected layer, and OUT is the output layer. For "deep learning", however, these layers are stacked in a particular pattern that yields a deeper CNN model. In the most common form of CNN architecture, CONV and POOL layers are repeated until the width and height of the volume are small, at which point some FC layers are applied. The most common CNN architectures therefore have the following pattern [30]:

IN → [CONV*N → POOL?]*M → FC*K → OUT
where “*” indicates repetition and “?” indicates an optional pooling layer. For simplicity, we do not show activations, but by default an activation always follows each CONV layer and FC layer. Theoretically, the larger the numbers of stacked convolutional layers (N and M) are, the deeper the features the model can learn; however, deeper models also require more training data and computation.
We selected N = 1, M = 3, and K = 1, i.e., three convolution-pooling groups followed by one fully connected hidden layer (see Table 2), which keeps the model small enough to train on our small dataset.
In this research, we used the SWIMCAT dataset, which consists of only 784 image patches: a very small number for deep learning, which makes overfitting very likely. To overcome this problem, we used two regularization methods: the first, data augmentation, augments the data passed into the network for training; the second, dropout, modifies the network architecture.
Data augmentation generates new training samples from the original samples by applying a number of random transformations that do not change the class labels. The purpose of data augmentation is to increase the generalizability of the model; it improves robustness and helps prevent overfitting.
In this study, we performed data augmentation by applying random geometric transforms. The detailed configuration parameters of each augmentation method are shown in Table 1. We applied random rotation within a range of 40°, and random translation, both vertical and horizontal, within a range of 20%. Random shear and random zoom were applied within a range of 20%. Finally, horizontal and vertical flips were performed randomly. Consequently, during training our model never sees exactly the same image twice. We applied data augmentation only at training time, not when testing or evaluating the trained networks. (A Keras-style sketch of these settings follows Table 1.)
Table 1. Data augmentation parameters

No. | Parameter | Value |
---|---|---|
1 | Rotation (°) | 40 |
2 | Width shift (%) | 20 |
3 | Height shift (%) | 20 |
4 | Shear (%) | 20 |
5 | Zoom (%) | 20 |
6 | Horizontal flip | Yes |
7 | Vertical flip | Yes |
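To make these settings concrete, the following is a minimal sketch of how the Table 1 configuration maps onto Keras' ImageDataGenerator (Keras is the library used in our experiments; see Section IV). The rescaling step and the variable names are illustrative assumptions, not part of the reported configuration.

```python
# Sketch of the Table 1 augmentation settings with Keras' ImageDataGenerator.
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=40,        # random rotation up to 40 degrees
    width_shift_range=0.2,    # horizontal translation up to 20%
    height_shift_range=0.2,   # vertical translation up to 20%
    shear_range=0.2,          # shear transformation up to 20%
    zoom_range=0.2,           # random zoom up to 20%
    horizontal_flip=True,     # random horizontal flip
    vertical_flip=True,       # random vertical flip
    rescale=1.0 / 255)        # assumption: scale pixels to [0, 1]

# No augmentation at test/evaluation time; only the same pixel scaling.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)
```

Note that the generator applies a fresh random transformation each time an image is drawn, which is what guarantees the network never sees exactly the same input twice.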
As mentioned above, with data augmentation the network never sees the same input twice, but the inputs it sees are still heavily correlated because they come from a small number of original images. The process cannot produce new information; it can only remix existing information. Data augmentation alone is therefore not enough to completely eliminate overfitting, so we applied one more regularization technique: dropout.
Dropout was proposed by Srivastava et al. [31]; it randomly drops units from the neural network with probability p during training. Fig. 3 visualizes the dropout concept for a given dropout probability p.
In this study, we used dropout in both the extraction part and the classification part. In the extraction part, we applied dropout between the second and third convolution groups with probability p = 0.25; in the classification part, we applied dropout between the flatten layer and the fully connected layer with probability p = 0.5 (see Table 2).
The detailed architecture of our implemented CNN model, including the dropout and activation layers, is shown in Table 2. We used 32 filters for the first and second convolutional layers and 16 filters for the third convolutional layer. All activations are ReLU, except for the output layer, which uses softmax. We applied 3 × 3 kernels for the CONV layers and 2 × 2 windows for the POOL layers. (A Keras sketch of this architecture follows Table 2.)
Table 2. Detailed architecture of the proposed CNN model

No. | Layer | Output size | Filter/stride size | Dropout |
---|---|---|---|---|
1 | Input | 125×125×3 | - | - |
2 | Convolution | 123×123×32 | 3×3 | - |
3 | ReLU | 123×123×32 | - | - |
4 | Max pooling | 61×61×32 | 2×2 | - |
5 | Convolution | 59×59×32 | 3×3 | - |
6 | ReLU | 59×59×32 | - | - |
7 | Max pooling | 29×29×32 | 2×2 | - |
8 | Dropout | 29×29×32 | - | 0.25 |
9 | Convolution | 27×27×16 | 3×3 | - |
10 | ReLU | 27×27×16 | - | - |
11 | Max pooling | 13×13×16 | 2×2 | - |
12 | Flatten | 1×1×2704 | - | - |
13 | Dropout | 1×1×2704 | - | 0.5 |
14 | Fully connected | 1×1×32 | - | - |
15 | ReLU | 1×1×32 | - | - |
16 | Fully connected | 1×1×5 | - | - |
17 | Softmax | 1×1×5 | - | - |
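To make the table concrete, below is a minimal sketch of the Table 2 architecture using the Keras Sequential API; the layer types, output sizes, and dropout rates follow the table, while the code organization is our own. Calling model.summary() on this sketch reproduces the output sizes listed in Table 2.

```python
# Sketch of the Table 2 architecture (Keras Sequential API).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu',
           input_shape=(125, 125, 3)),      # -> 123x123x32
    MaxPooling2D((2, 2)),                   # -> 61x61x32
    Conv2D(32, (3, 3), activation='relu'),  # -> 59x59x32
    MaxPooling2D((2, 2)),                   # -> 29x29x32
    Dropout(0.25),                          # dropout in the extraction part
    Conv2D(16, (3, 3), activation='relu'),  # -> 27x27x16
    MaxPooling2D((2, 2)),                   # -> 13x13x16
    Flatten(),                              # -> 2704 features
    Dropout(0.5),                           # dropout before the classifier
    Dense(32, activation='relu'),           # fully connected hidden layer
    Dense(5, activation='softmax')])        # one output per cloud class
```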
The gold standard for machine learning model evaluation is k-fold cross-validation. In this study, we used 5-fold cross-validation: the dataset was divided into five folds, and in each round four folds were used for training and the remaining fold for testing, so every sample was tested exactly once.
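The splitting procedure is not detailed further here, so the sketch below is only one plausible implementation using scikit-learn; the shuffling, stratification by class, and placeholder data loading are assumptions.

```python
# Illustrative 5-fold split of the 784 SWIMCAT patches.
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder arrays with SWIMCAT's shape; in practice, load the image
# patches and class labels from disk (loading code omitted/assumed).
images = np.zeros((784, 125, 125, 3), dtype='float32')
labels = np.repeat(np.arange(5), [224, 89, 251, 135, 85])  # per-class counts

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(images, labels), 1):
    x_train, y_train = images[train_idx], labels[train_idx]
    x_test, y_test = images[test_idx], labels[test_idx]
    print('Fold %d: %d train / %d test' % (fold, len(train_idx), len(test_idx)))
    # build a fresh model, then train and evaluate it on this fold
```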
The proposed network was implemented on a single PC with an Intel Core i5-7500 CPU and an NVIDIA GeForce GTX 1060 GPU with 4 GB of memory. The software was developed in the Python programming language using the Keras deep learning library [32] with TensorFlow [33] as the back-end.
We evaluated the approach described in Section III on the SWIMCAT database, using a batch size of 32 images and the RMSProp optimizer; a sketch of this training configuration follows Table 3. The experimental results on the five test sets, after running 1,000 epochs for each fold, are shown in Table 3. We obtained an average accuracy of 98.59%, with minimum and maximum fold accuracies of 97.45% and 99.36%, respectively.
Table 3. Classification accuracy for each fold of 5-fold cross-validation

No. | Fold | Accuracy (%) |
---|---|---|
1 | Fold 1 | 98.73 |
2 | Fold 2 | 99.36 |
3 | Fold 3 | 99.36 |
4 | Fold 4 | 97.45 |
5 | Fold 5 | 98.06 |
6 | Average | 98.59 |
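For reference, the reported training configuration (RMSProp optimizer, batch size 32, 1,000 epochs per fold) could be wired up as sketched below; the categorical cross-entropy loss is an assumption implied by the softmax output, and the variables come from the earlier sketches.

```python
# Training sketch for one fold (assumed loss and generator wiring).
from keras.utils import to_categorical

# One-hot encode the five class labels for the current fold.
y_train_cat = to_categorical(y_train, num_classes=5)
y_test_cat = to_categorical(y_test, num_classes=5)

model.compile(optimizer='rmsprop',               # RMSProp, as reported
              loss='categorical_crossentropy',   # assumption for softmax output
              metrics=['accuracy'])

model.fit_generator(
    train_datagen.flow(x_train, y_train_cat, batch_size=32),
    steps_per_epoch=len(x_train) // 32,          # one pass per epoch (assumed)
    epochs=1000,
    validation_data=(x_test / 255.0, y_test_cat))  # match generator rescaling

loss, acc = model.evaluate(x_test / 255.0, y_test_cat)
```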
The confusion matrices are shown in Fig. 5. In each confusion matrix, each column represents the instances in a predicted class, and each row represents the instances in the actual class. Our proposed approach achieves perfect classification accuracy for most classes; the clear sky, patterned cloud, and thick dark cloud classes achieve 100% accuracy on all five folds.
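Each per-fold matrix in Fig. 5 can be reproduced as sketched below; scikit-learn's confusion_matrix follows the same convention (rows are actual classes, columns are predicted classes), and the variables are assumed from the previous sketches.

```python
# Confusion matrix for one fold (rows: actual, columns: predicted).
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(x_test / 255.0), axis=1)  # softmax argmax
print(confusion_matrix(y_test, y_pred))
```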
This paper presented a deep learning solution for the classification of cloud image patches on small datasets. For this research, we used the SWIMCAT dataset, which consists of only 784 images: very small for deep learning applications and thus prone to overfitting. To solve this problem, we designed a CNN model with a number of convolutional layers appropriate for small datasets, and we applied two regularization methods, data augmentation and dropout, to thoroughly treat the overfitting problem. In addition, the gold standard for machine learning model evaluation, k-fold cross-validation, was used to confirm the high accuracy and robustness of the proposed model.
Van Hiep Phung received his B.S. degree in Automatic Control from Hanoi University of Science and Technology, Hanoi, Vietnam, in 2005. In 2009, he received his M.S. degree in Automation and Control from the Graduate Institute of Automation and Control, National Taiwan University of Science and Technology (TaiwanTech). He is currently conducting research on deep learning for computer vision in the Artificial Intelligence and Computer Vision Lab, Graduate School of Information and Communications, Hanbat National University. He is interested in deep learning, computer vision, and pattern recognition.
Eun Joo Rhee has been a Professor in the Department of Computer Engineering, College of Information Technology, Hanbat National University, Daejeon, Korea, since 1989. He received his Ph.D. degree in electronics engineering from Chungnam National University in 1989. He was a postdoctoral fellow at the Graduate School of Image Science and Technology, Tokyo Institute of Technology, Japan, from 1994 to 1995, and a visiting professor at the Oregon Graduate Institute of Science and Technology, USA, from 1998 to 1999. His research interests include image processing, pattern recognition, computer vision, and artificial intelligence.