Journal of information and communication convergence engineering 2023; 21(3): 198-207
Published online September 30, 2023
https://doi.org/10.56977/jicce.2023.21.3.198
© Korea Institute of Information and Communication Engineering
Chae-Rim Han1*, Sun-Jin Lee2, and Il-Gu Lee1,2*
1Department of Convergence Security Engineering, Sungshin Women’s University, 02844, Korea
2Department of Future Convergence Technology Engineering, Sungshin Women’s University, 02844, Korea
Correspondence to: Il-Gu Lee (E-mail: iglee@sungshin.ac.kr)
Department of Convergence Security Engineering, Sungshin Women’s University, 02844, Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Responding to changes in artificial intelligence models and the data environment is crucial for increasing the data-learning accuracy and inference stability of industrial applications. A learning model that is overfitted to specific training data exhibits poor learning performance and reduced flexibility. Therefore, an early stopping technique is used to stop learning at an appropriate time. However, this technique does not consider the homogeneity and independence of the data collected by heterogeneous nodes in a distributed network environment, resulting in low learning accuracy and degraded system performance. In this study, the generalization performance of neural networks is maximized while the effect of dataset homogeneity is minimized, achieving an accuracy of 99.7%. This corresponds to a decrease in delay time by a factor of 2.33 and an improvement in performance by a factor of 2.5 compared with the conventional method.
Keywords: Deep reinforcement learning, Early stopping, Neural network, Overfitting
Owing to their fault tolerance and parallelism, neural networks can recover from damaged neurons or distorted data and can model relationships between complex data, allowing the prompt learning of nonlinear relationships in large-scale data. Because of these advantages, neural networks have been used in various fields, such as text, voice, and image recognition, in addition to natural language processing.
During the training of an artificial intelligence (AI) model, overfitting occurs when the computational volume increases or the model becomes excessively optimized to the dataset [1]. When overfitting occurs, the model achieves high accuracy on the training data but lower accuracy on new data owing to insufficient generalization, which significantly degrades neural network and network performance. As a solution to this problem, early stopping saves the model at the optimal epoch by terminating learning when the validation loss no longer decreases after a particular epoch [2]. The data should be independent and homogeneous for early stopping to be valid. However, in a heterogeneous distributed network environment where learning data are scarcer than the data required for local processing, the data collected from a node are not independently and identically distributed (IID); therefore, the accuracy is low [3].
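As an illustration of this rule, the following minimal, framework-agnostic Python sketch stops training once the validation loss has not improved for a fixed number of consecutive epochs; the functions `train_one_epoch` and `evaluate` are placeholders rather than components of the cited works.

```python
# Minimal early stopping sketch: stop once the validation loss has not improved
# for `patience` consecutive epochs, and report the best epoch found so far.
import math

def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, patience=3):
    """train_one_epoch() runs one training epoch; evaluate() returns the validation loss."""
    best_loss = math.inf
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = evaluate()
        if val_loss < best_loss:
            best_loss = val_loss
            best_epoch = epoch
            epochs_without_improvement = 0   # improvement: reset the counter
        else:
            epochs_without_improvement += 1  # no improvement this epoch
        if epochs_without_improvement >= patience:
            break                            # stop early; keep the best epoch
    return best_epoch, best_loss
```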
Numerous studies have been conducted to optimize the generalization capability of neural networks. The index data division (IDD) algorithm [4,5], a representative method, distributes data by dividing them according to an index-distance rule. This method demonstrates high performance when the rules by which the data are split are known; however, in an environment where the quality of the learning data is non-homogeneous, the learning accuracy degrades significantly. The random data division (RDD) algorithm [6,7] is a fast method that segments data randomly; however, it can degrade network performance. The block data division (BDD) algorithm [8,9] outperforms the alternative methods because it arranges data randomly and evenly. However, its learning precision differs significantly depending on the data arrangement rules.
Recent studies have optimized the generalization capabilities of neural networks using clustering techniques, including fuzzy c-means clustering (FMC) [11,12], center-based clustering [14], density-based clustering [15], and hierarchical clustering [16,17]. These methods improve the clustering precision of neural networks through silhouette analysis. However, they do not address the class imbalance problem, because their accuracy varies depending on the homogeneity of the dataset. In addition, the original dataset may be modified because the data are balanced using an oversampling scheme.
To improve the accuracy of conventional FMC techniques, the early stopping hyperparameter should be optimized by weighting it according to the data-learning ratio. Therefore, this study proposes an early stopping algorithm that applies an optimal patience hyperparameter. The main contributions of this study are as follows.
• The data homogeneity and learning accuracy were improved by applying an early stopping method based on an optimal patience hyperparameter.
• By applying FMC algorithms to classify the data, we mitigated the network degradation and precision loss associated with the unrefined datasets of conventional methods.
• The proposed method achieved an accuracy of 99.7%, corresponding to a decrease in delay time by a factor of 2.33 and improvement in performance by a factor of 2.5 compared with the conventional method.
The remainder of this paper is organized as follows. Section II presents a comparison and analysis of related studies. Section III proposes a novel data measurement method that improves the conventional early stopping method. Section IV presents an evaluation of the latency, loss, and accuracy of the proposed method in comparison with the FMC and existing early stopping methods (IDD and RDD). Finally, Section V concludes the study.
This section presents an analysis of previous research and the limitations of the data segmentation and clustering methods used for neural network learning. The latency and accuracy of each algorithm were measured in the same environment at 3,000 iterations and 30 epochs, respectively.
During the early stopping algorithm training process, the data segmentation scheme determines the precision of the neural network. In this case, the data division refers to the operation of dividing data into training, validation, and test sets. Table 1 presents an analysis of the conventional early stopping algorithm data division method.
Table 1. Previous research on the data segmentation method of the early stopping algorithm
Previous research | Feature | Limitation | Latency | Accuracy |
---|---|---|---|---|
IDD [4,5] | Splits data based on manually specified indices | • Slow processing time (262.74 s) • Difficult to quantify objective performance | 1.7 s | 94.6% |
RDD [6,7] | • Splits data based on automatically generated percentages • High performance for large datasets | Overfitting occurs with fewer than 10,000 samples | 2.3 s | 87.08% |
BDD [8,9] | • Random data division algorithm with block-wise memory allocation • Reduced memory usage | Low accuracy (83.41%) for data arranged by specific rules | 2.64 s | Depends on the state of the data arrangement |
The IDD algorithm improves segmentation performance by segmenting the data according to a manually specified index. Moreover, IDD demonstrates high performance when the data are segmented according to a particular segmentation rule; however, in most cases, quantifying the objective performance is challenging because the rules by which the datasets are sorted are not known. In addition, the speed is low because the user must manually specify an index.
The RDD algorithm divides data according to an automatically generated percentage. Although it is faster than IDD and BDD, its performance is high only when the dataset is large, because overfitting is highly likely to occur with fewer than 10,000 samples.
The BDD algorithm divides data by applying block-wise memory allocation to the RDD algorithm, which can reduce memory usage compared with the other algorithms. However, because only randomly specified training datasets are used for training, data arranged according to specific rules yield a low accuracy of 83.41% [8,9].
In the experiments, the latencies of IDD, RDD, and BDD were 1.7, 2.3, and 2.64 s, respectively. The accuracies of IDD and RDD were 94.6 and 87.08%, respectively, whereas the accuracy of BDD varied depending on the arrangement of the data. Therefore, IDD and RDD were selected as control groups.
Clustering is an unsupervised learning method that groups data with similar characteristics. It is classified into soft clustering [10], which assigns data to clusters based on the probability of membership; hard clustering [13]; and hierarchical clustering [16,17], which groups data according to the distance between clusters. Table 2 presents an analysis of the related clustering studies.
Table 2. Related work on clustering
Previous research | Feature | Limitation | Latency | Accuracy |
---|---|---|---|---|
Soft clustering: FMC [10-12] | • Represents the degree of membership as a probability • Calculates weights by maximizing similarities within clusters and minimizing similarities between clusters | • Accuracy varies with respect to the homogeneity of the dataset • Variations in the original dataset | 2.04 s | Depends on data homogeneity |
Hard clustering [13]: center-based [14] | Clusters points closest to the cluster center | • Low accuracy (88.7%) when the number of features increases • Increase in latency and sensitivity to outliers when the number of iterations increases | 1.71 s | 96.76% |
Hard clustering [13]: density-based [15] | Clusters by moving the center of a cluster to a dense data location | • Increase in latency when the number of features increases • Performance depends on the cluster radius variable | 1.56 s | 90.53% |
Hierarchical clustering [16,17] | Clusters by calculating the distance between clusters and merging similar data pairs with short distances into one cluster | Increase in latency when the number of features increases | 1.65 s | 97.2% |
The FMC proposed by J. C. Dunn is a soft clustering method and the most advanced of the clustering methods considered here; however, its precision varies with the homogeneity of the dataset because it does not solve the class imbalance problem arising during training. Furthermore, the original dataset may be altered during training because the data are balanced using an oversampling method.
Hard clustering is divided into center-based clustering [14], which allocates data based on cluster centers, and density-based clustering [15], which allocates data based on distance. Center-based clustering groups the points closest to each cluster center. As the number of features increases, the precision of the method decreases. Furthermore, because center-based clustering uses an iterative algorithm, the latency and the sensitivity to outliers increase with the number of iterations. Density-based clustering forms clusters by moving a cluster center to a dense data location. Clustering can thus be achieved without setting the number of clusters; however, the performance depends significantly on the cluster radius variable, and the latency increases.
Hierarchical clustering assumes each data point to be a cluster, calculates the distance between clusters, and generates and clusters similar data pairs with short distances into one cluster. Hierarchical clustering is limited because the delay time increases with the number of features.
To overcome the limitations of the algorithms presented above, this study proposes a novel early stopping algorithm based on the FMC algorithm with optimal clustering performance.
This section describes the framework of the enhanced FMC (EFMC), an early stopping algorithm based on the FMC algorithm. Fig. 1 shows a structural diagram of the proposed algorithm. As shown in Fig. 1, EFMC divides the data using FMC and conducts training with the optimal patience hyperparameter value.
In the early stopping mechanism, the patience hyperparameter determines how many additional epochs training continues when performance no longer improves with the number of epochs, and it exerts a significant influence on neural network training [18].
Average precision (AP) is expressed in (1) [19], where p(r) denotes the precision as a function of the recall r, so that AP corresponds to the area under the precision-recall curve:

AP = ∫₀¹ p(r) dr.  (1)
The mean average precision (mAP) was derived by calculating the area below the AP graph in (1) and averaging over the classes. Because initialization was performed randomly during training, each experiment was repeated five times to ensure accuracy. The optimal parameter was determined by comparing the mAP values for patience hyperparameters of 2, 3, 4, and 5. Fig. 2 presents the average mAP values for the validation and test sets with respect to the patience hyperparameter value. The mAPs for the validation set with patience hyperparameters of 2-5 were 86.74, 89.07, 88.45, and 0, whereas those for the training set were 86.29, 88.32, 87.26, and 0, respectively. In an environment where the optimal hyperparameter varies depending on the dataset, the EFMC framework sets the patience hyperparameter to 3, which exhibited the optimal mAP value on the dataset used in the experiment, through the above process.
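The selection procedure described above can be illustrated with the following hypothetical sketch, which evaluates each candidate patience value over five repeated runs and keeps the value with the highest mean mAP; `train_and_validate` is a placeholder for a routine that trains with early stopping and returns the validation mAP.

```python
# Hypothetical patience selection: average the validation mAP over repeated runs
# (random initialization) and keep the patience value with the best mean score.
import statistics

def select_patience(train_and_validate, candidates=(2, 3, 4, 5), repeats=5):
    mean_map = {}
    for patience in candidates:
        scores = [train_and_validate(patience=patience) for _ in range(repeats)]
        mean_map[patience] = statistics.mean(scores)
    best = max(mean_map, key=mean_map.get)  # patience with the highest mean mAP
    return best, mean_map
```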
Conventional techniques are limited in that learning performance degrades when a non-homogeneous dataset is input, because the datasets fed to early stopping algorithms are not refined. To overcome this limitation, the data were sorted using FMC [20,21]. In total, 7,860 character samples and 6,000 32 × 32 image samples were collected from the Modified National Institute of Standards and Technology (MNIST) and Canadian Institute for Advanced Research (CIFAR-10) datasets, respectively. Both datasets were divided into training, validation, and test sets in a ratio of 60:20:20 [22]. Table 3 lists the number of data samples used in the experiments, and a minimal sketch of this split follows the table.
Table 3. Experimental dataset
Dataset category | MNIST dataset | CIFAR-10 dataset |
---|---|---|
Training set | 4,720 | 3,600 |
Validation set | 1,570 | 1,200 |
Test set | 1,570 | 1,200 |
Total | 7,860 | 6,000 |
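The following is a minimal sketch of the 60:20:20 split in Table 3 using scikit-learn; the paper does not state which splitting utility was used, so the function below is only illustrative.

```python
# Illustrative 60:20:20 split: hold out 40% of the data, then split that portion
# evenly into validation and test sets, preserving the class distribution.
from sklearn.model_selection import train_test_split

def split_60_20_20(X, y, seed=0):
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=seed, stratify=y)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed, stratify=y_rest)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```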
The data were sorted with respect to their distance from the cluster center and then distributed according to the dataset division ratio. The proposed algorithm can improve precision because it divides the data based on the optimal patience hyperparameter value, and the homogeneity of the dataset has only a slight influence because the division is independent of the data distribution rule.
The homogeneity of the dataset was calculated as the average distance between each data point and the center of the cluster used by the FMC; as this average distance increases, the homogeneity index increases.
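The following sketch computes the average point-to-center distance used as the homogeneity index; assigning each point to its nearest cluster center is an assumption made for illustration, since the paper derives the distances from the FMC clustering result.

```python
# Average distance between each data point and its cluster center, used here as
# the homogeneity index described in the text.
import numpy as np

def average_distance_to_centers(X, centers):
    """X: (n, d) data matrix; centers: (k, d) cluster centers."""
    # Distance from every point to every center, then keep the nearest one.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    return float(nearest.mean())
```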
The operation process of the FMC-based early stopping algorithm is shown in Fig. 3 [24,25].
The corresponding membership and cluster-center calculation formulas follow the standard FMC update rules; a sketch of this procedure is given below.
As shown in Fig. 3, EFMC operates in four stages. When the dataset is input, cluster initialization is first performed; the cluster memberships and centers are then updated iteratively until the termination condition is satisfied, after which the clustered data are divided and training proceeds with the optimal patience hyperparameter.
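Because the paper does not list its exact FMC settings, the following sketch shows the standard fuzzy c-means procedure on which the clustering stage of EFMC is based; the fuzzifier m, tolerance, and initialization scheme are assumptions.

```python
# Standard fuzzy c-means sketch: initialize memberships, then alternate between
# center updates and membership updates until the membership matrix converges.
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Stage 1: initialize the membership matrix U (each row sums to 1).
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Stage 2: update cluster centers from the fuzzified memberships.
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Stage 3: update memberships from the distances to the new centers.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Stage 4: terminate once the memberships stop changing.
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```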
This section describes the evaluation environment for the proposed early stopping method and compares and analyzes its speed and accuracy on the training and validation sets against IDD, RDD, and FMC.
The collected image data were preprocessed through one-hot encoding and then augmented using the ImageDataGenerator class in the TensorFlow Keras library. Augmentation increases the amount of data by adding noise-like variations while maintaining the nature of the original data.
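A sketch of this preprocessing step is shown below; the specific augmentation parameters are assumptions, as the paper only states that noise-like variations were added while preserving the original data.

```python
# One-hot encode the labels and build an augmented batch generator with Keras.
import tensorflow as tf

def build_augmented_generator(x_train, y_train, num_classes=10, batch_size=32):
    """x_train: 4-D image array (samples, height, width, channels); y_train: integer labels."""
    y_onehot = tf.keras.utils.to_categorical(y_train, num_classes)  # one-hot encoding
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=10,        # small random rotations (assumed)
        width_shift_range=0.1,    # small horizontal shifts (assumed)
        height_shift_range=0.1,   # small vertical shifts (assumed)
        zoom_range=0.1)           # mild zoom jitter (assumed)
    datagen.fit(x_train)
    return datagen.flow(x_train, y_onehot, batch_size=batch_size)
```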
The collected data were extracted and combined with a pixel vector x of the same size taken from a randomly selected sample, producing the non-uniform dataset evaluated below.
Fig. 4 shows the cumulative probability distribution of the Kolmogorov-Smirnov (KS) test for 7,860 non-uniform data samples extracted from 64,580 randomly selected MNIST data samples. In the graph, the x-axis represents the vector x, whereas the y-axis represents the cumulative probability; the KS test evaluates the hypothesis by comparing the two cumulative distributions.
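A two-sample KS check of this kind can be sketched as follows; the significance level of 0.05 is an assumption, and the function simply compares the cumulative distributions of the extracted subset and the source data.

```python
# Two-sample Kolmogorov-Smirnov check between the extracted subset and the source data.
import numpy as np
from scipy import stats

def ks_check(subset, source, alpha=0.05):
    statistic, p_value = stats.ks_2samp(np.ravel(subset), np.ravel(source))
    same_distribution = p_value > alpha  # fail to reject H0 at the chosen level
    return statistic, p_value, same_distribution
```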
The TensorFlow Keras library was used to create the convolutional neural networks (CNNs) and the patience hyperparameter, which was updated whenever learning was conducted using the perception class. The model was optimized using cross-entropy loss and the Adam optimizer, and early stopping was triggered if the model remained unchanged for more than three epochs.
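The training setup can be sketched as follows; the layer sizes and input shape are illustrative assumptions rather than the paper's exact architecture, while the Adam optimizer, cross-entropy loss, and a patience of three follow the description above.

```python
# Small Keras CNN trained with Adam, categorical cross-entropy, and early stopping
# (patience = 3, best weights restored).
import tensorflow as tf

def build_and_train(x_train, y_train, x_val, y_val, input_shape=(28, 28, 1), num_classes=10):
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    stopper = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True)
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              epochs=30, callbacks=[stopper])
    return model
```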
Data clustering used a fuzzy c-means package compatible with scikit-learn, and the accuracy was evaluated with a logistic regression classifier. Finally, the results were visualized with pyplot from the Matplotlib library, together with Pandas. The hardware specifications for the experiment are presented in Table 4.
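The accuracy evaluation with logistic regression can be sketched as follows; flattening the image samples into feature vectors is an assumption made so that the classifier can be fitted directly.

```python
# Evaluate a data split with a scikit-learn logistic regression classifier.
from sklearn.linear_model import LogisticRegression

def evaluate_split(X_train, y_train, X_test, y_test):
    clf = LogisticRegression(max_iter=1000)
    # Flatten image samples into feature vectors before fitting.
    clf.fit(X_train.reshape(len(X_train), -1), y_train)
    return clf.score(X_test.reshape(len(X_test), -1), y_test)  # mean test accuracy
```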
Table 4. Hardware specifications
Hardware | Specification |
---|---|
CPU | Intel(R) Core™ i7-1065G7 CPU @ 1.30 GHz |
GPU | Intel(R) Iris(R) Plus Graphics |
GPU Memory | 7.9 GB |
RAM | 16 GB |
SSD | 477 GB |
1) Latency
Fig. 5 presents a comparison of the delay times of EFMC, IDD, RDD, and FMC with respect to the number of iterations and the homogeneity of the MNIST character dataset. In the graph, the x-axis represents the homogeneity of the dataset, whereas the y-axis represents the iteration. Homogeneity is an index measured by the ratio of non-homogeneous noise distributed in a dataset. The noise was generated by multiplying the maximum loss value of the loss function by the epsilon parameter. The epsilon parameter denotes the degree of damage to the image data and was set in the range of 0-0.5 in the experiment. The number of epochs was set to 30, and iterations and epochs were treated as equivalent factors. The time required for one epoch was measured as 6 s.
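The construction of the non-homogeneous portion of the dataset is not fully specified, so the following sketch is only one interpretation: a given fraction of samples is perturbed with noise whose amplitude is the maximum loss value scaled by epsilon (0-0.5).

```python
# Hypothetical noise injection: perturb a fraction of the samples with noise whose
# amplitude is epsilon times the maximum loss value, then clip to the valid range.
import numpy as np

def inject_noise(images, fraction, epsilon, max_loss, seed=0):
    rng = np.random.default_rng(seed)
    noisy = images.astype(np.float32).copy()
    n_noisy = int(fraction * len(images))            # portion of non-homogeneous samples
    idx = rng.choice(len(images), n_noisy, replace=False)
    amplitude = epsilon * max_loss                    # noise scale from the loss maximum
    noisy[idx] += rng.uniform(-amplitude, amplitude, size=noisy[idx].shape)
    return np.clip(noisy, 0.0, 1.0)
```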
Latency and iterations were inversely proportional, and the latencies of the four models were the same when the homogeneity was 0%. However, when the homogeneity was 10% or higher, the latency of the EFMC was optimal. The latency of EFMC in the homogeneity range of 0-100% and iteration range of 0-8,000 improved by an average factor of 2.05 with a homogeneity of 50% and a factor of up to 2.33 over 3,000 iterations.
Table 5 compares the latencies of the EFMC, IDD, RDD, and FMC with respect to the homogeneity of CIFAR-10. The number of iterations was set to 4,000, and the other conditions were the same as those shown in Fig. 4.
Table 5. Latency (s) of EFMC, IDD, RDD, and FMC with respect to homogeneity (%)
Homogeneity (%) | EFMC | IDD | RDD | FMC |
---|---|---|---|---|
0 | 8 | 8 | 8 | 8 |
10 | 4.7 | 6.13 | 6.88 | 6.35 |
20 | 4.27 | 5.97 | 6.3 | 6.21 |
30 | 3.92 | 5.4 | 6.02 | 5.98 |
40 | 2.43 | 3.06 | 3.47 | 3.19 |
50 | 1.71 | 2.7 | 2.98 | 2.84 |
60 | 1.29 | 2.13 | 2.74 | 2.3 |
70 | 1.32 | 2.17 | 2.8 | 2.32 |
80 | 1.3 | 2.21 | 2.83 | 2.3 |
90 | 1.33 | 2.22 | 2.88 | 2.35 |
100 | 1.33 | 2.22 | 2.87 | 2.34 |
As with the results on the MNIST character dataset, the homogeneity and latency were inversely proportional, and the latencies of the four models were the same when the homogeneity was 0%; however, EFMC had the best latency when the homogeneity was more than 10%. In the homogeneity range of 0-100%, the latency of EFMC improved by an average factor of 1.87 and factor of up to 2.12 with a homogeneity of 60%.
2) Accuracy
Fig. 6 presents a comparison graph of the loss values and accuracies of EFMC, IDD, RDD, and FMC on the MNIST character dataset. Fig. 6(a) presents the measurement results for the loss value with respect to the number of epochs, and Fig. 6(b) presents the evaluation results in terms of the accuracy with respect to the number of epochs. In this case, a complementary relationship between the loss value and the accuracy was established. As the number of epochs increased, the loss value decreased and accuracy improved.
According to the experimental results, the loss value of the proposed EFMC was lower than those of the other models, with an average improvement by a factor of 2.2 over the other three models. At epoch 30, the loss improved by a factor of up to 2.71 while presenting the smallest deviation among the four models. In addition, EFMC demonstrated the highest accuracy, with an average improvement by a factor of 1.16. The accuracy improved by a factor of up to 1.24 at epoch 0, again with the smallest deviation among the four models.
Table 6 presents a comparison of the epoch accuracies of the CIFAR-10 image dataset. Similar to the results for the MNIST character dataset, the epoch and accuracy exhibited a proportional relationship, and EFMC achieved the best accuracy. At 0-30 epochs, EFMC improved by an average factor of 1.1 and achieved the highest accuracy of up to 98.3% at epoch 30.
Table 6. Accuracy (%) of EFMC, IDD, RDD, and FMC with respect to the number of epochs
Epoch | EFMC | IDD | RDD | FMC |
---|---|---|---|---|
0 | 91.72 | 87 | 78.86 | 84.98 |
5 | 96.48 | 89.09 | 80.77 | 86.6 |
10 | 97.86 | 90.7 | 81.4 | 87.55 |
15 | 97.9 | 92.28 | 82 | 88.71 |
20 | 98.04 | 92.66 | 85.95 | 91.24 |
25 | 98.21 | 93.67 | 86.08 | 93.3 |
30 | 98.3 | 93.8 | 86.54 | 93.63 |
Fig. 7 presents a distribution diagram comparing the accuracies of EFMC, IDD, RDD, and FMC with respect to the homogeneity of the dataset. Fig. 7(a) presents the result for epoch 30, whereas Fig. 7(b) presents the result for epoch 150. In this experiment, the accuracy was measured using the coefficient of determination (R²).
The EFMC algorithm with the proposed early stopping method exhibited the best overall clustering performance, improving the delay time, loss value, and accuracy by factors of up to 2.33, 2.71, and 2.5, respectively, compared with the conventional models.
This study proposed an early stopping algorithm based on the FMC algorithm and analyzed its performance. Conventional techniques such as IDD, BDD, and RDD exhibit low accuracy because they are trained on unrefined datasets, and the network performance is degraded. FMC, center-based clustering, density-based clustering, and hierarchical clustering vary in precision depending on the homogeneity of the dataset and are limited by the transformation of the original dataset during training. Herein, the accuracy was improved by dividing the data according to the optimal patience hyperparameter value derived for the neural networks. On this basis, the early stopping algorithm was improved such that the homogeneity of the dataset did not affect the FMC algorithm. The experimental results revealed that the proposed technique improved the clustering performance with respect to latency, loss value, and accuracy by factors of 2.33, 2.71, and 2.5, respectively, compared with the conventional method, with fewer epochs.
The proposed EFMC heuristically sets the patience hyperparameter in advance during data preprocessing according to the dataset. Although accuracy and homogeneity are guaranteed, it has a structural limitation in that it slows down as the types of datasets become more diverse. Future research will focus on increasing efficiency by establishing an experimental environment that considers various heterogeneous nodes and by automatically setting the optimal patience hyperparameter.
This work was partially supported by the Korean Government (MOTIE) (P0008703, the Competency Development Program for Industry Specialist), and the MSIT under the ICAN (ICT Challenge and Advanced Network of HRD) program (No. IITP-2022-RS-2022-00156310) supervised by the Institute of Information & Communication Technology Planning & Evaluation (IITP).
is a researcher at the CSE Lab of Convergence Security Engineering, Sungshin Women’s University, Seoul, Korea. Her research interests include convergence security and deep reinforcement learning.
received her B.S. degree in convergence security engineering at Sungshin Women’s University in 2021, and her M.S. degree in the department for future convergence technology engineering at Sungshin Women’s University in 2023. She is currently pursuing a Ph.D. degree in future convergence technology engineering at Sungshin Women’s University, Seoul, Korea. Her research interests include wireless networks, endpoint security, and convergence security using deep learning.
received his B.S. Degree in electrical engineering at Sogang University, Seoul, Korea, in 2003, and M.S. degree from the department of information and communications engineering at Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2005. He received his Ph.D. degree from the Graduate School of Information Security in computer science & engineering, KAIST, in 2016. He is currently a Professor at the Department of Convergence Security Engineering, Sungshin Women’s University (SWU), Seoul, Korea. Before joining SWU in March 2017, he was a Senior Researcher at the Electronics and Telecommunications Research Institute (ETRI) during 2005-2017 and served as a Principal Architect and Project Leader for Newratek (KR) and Newracom (US) during 2014-2017. His research interests are wireless/mobile networks with an emphasis on information security, networks, wireless circuits, and systems. He has authored/coauthored more than 160 technical papers in the areas of information security, wireless networks, and communications, and holds approximately 160 patents. In addition, he is an active participant in and contributor to the IEEE 802.11 WLAN standardization committee.