Journal of information and communication convergence engineering 2021; 19(4): 241-247
Published online December 31, 2021
https://doi.org/10.6109/jicce.2021.19.4.241
© Korea Institute of Information and Communication Engineering
Particulate matter has emerged as a serious global problem, necessitating highly reliable information on its concentration. Therefore, various algorithms have been used in studies to predict particulate matter. In this study, we compared the prediction performance of neural network models that have been actively studied for particulate matter prediction. Among the neural network algorithms, a deep neural network (DNN), a recurrent neural network (RNN), and long short-term memory (LSTM) were used, and the optimal prediction model for each was designed through a hyper-parameter search. In the comparative analysis, the DNN model showed a lower root mean square error (RMSE) than the other models when RMSE and accuracy were used as evaluation metrics. The stability of the RNN was slightly lower than that of the other algorithms, although its accuracy was higher.
Keywords: Neural network, Deep neural network, Recurrent neural network, Long short-term memory, Particulate matter
Particulate matter (PM) is dust whose particles are too small to be seen with the naked eye. According to some studies, PM has a variety of negative effects, including increasing the risk of cardiovascular, respiratory, and cerebrovascular diseases, as well as harming the body’s defense system. The International Agency for Research on Cancer, under the World Health Organization, has designated PM as a Group 1 carcinogen. PM has also been identified as a cause of declining economic activity within societies [1-7]. PM is formed from air pollutants emitted by automobiles, factories, and cooking processes, among other sources, and industrial activities based on fossil fuels, such as coal and petroleum, have a significant impact on its formation. As a result, many people are increasingly interested in PM and request information that allows them to prepare for its occurrence in advance. In South Korea, PM forecasts have been implemented by combining various numerical model results with the prediction results of the Community Multiscale Air Quality (CMAQ) model, an air quality prediction model [8, 9]. However, people are demanding higher forecast accuracy because the accuracy of existing models has not met expectations.
Consequently, numerous studies have been conducted to increase the accuracy of PM prediction, and studies using neural networks are actively underway. Regarding PM prediction based on a deep neural network (DNN), Dedovic et al. used three years of weather variables and PM concentrations measured in Sarajevo to make predictions with an artificial neural network consisting of a single hidden layer. Their study confirms that PM concentration prediction can be improved using an extended input dataset containing PM concentration data from previous years [10]. In a study on recurrent neural network (RNN)-based PM prediction, Lim et al. proposed an RNN model that uses sequential data of previous PM concentrations to predict air pollutants. The parameters yielding the best prediction performance were identified by varying the length of the input data, the optimization function, and the number of layers and nodes, and it was demonstrated that these parameters improved the PM prediction performance [11]. In a study on long short-term memory (LSTM)-based PM prediction, Kang et al. performed LSTM-based PM prediction using weather data as inputs. The data were normalized to match the ranges of the weather variables, and the PM concentration 1 h ahead was predicted using 3 h of data, or the concentration 12 h ahead was predicted using 24 h of data. The results demonstrate that prediction performance can be improved using related weather factors [12].
In this study, we constructed PM10 prediction models based on several neural network algorithms, trained them using the same data, and then conducted a comparative analysis of their performance. The aim was to investigate the impact of each algorithm's characteristics on PM prediction; the DNN, RNN, and LSTM algorithms were used to compare the performance of the three prediction models. We used the overall accuracy, the detailed accuracy for each air quality index (AQI) category, and the root mean square error (RMSE) to evaluate the performance of each model.
To design and test the prediction models, data were compiled based on the results of prior studies. The major variables, measured at 1 h intervals in Cheonan City, South Korea, during 2009-2018, were collected as shown in Table 1. Some values were missing owing to environmental and equipment conditions; for efficient training, all records sharing a timestamp with a missing value were removed when the dataset was constructed.
Table 1. Data collected in Cheonan City (2009-2018, 1 h intervals)

| Category | Variable | Number of Data | Number of Missing Data |
|---|---|---|---|
| Meteorological elements | Temperature | 87,622 | 26 |
| | Wind Speed | 87,612 | 36 |
| | Wind Direction | 87,601 | 47 |
| Air pollutants | PM10 | 249,268 | 13,676 |
| | O3 | 255,464 | 8,480 |
| | CO | 252,924 | 10,020 |
| | NO2 | 254,244 | 8,700 |
| | SO2 | 252,623 | 10,321 |
The data used for training consisted of numerical and categorical variables. Differences in data scale can affect the effectiveness of training and thereby degrade the performance of the prediction models. Therefore, the collected data had to be preprocessed to make them suitable for training. Because the wind direction was categorical, taking one of 16 compass directions, it was converted into 0s and 1s using one-hot encoding. The remaining variables were numerical data with different scales, so min-max scaling was applied to convert them into values between zero and one and thus unify their scales. Hence, after preprocessing, all data were represented as values between zero and one.
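The preprocessing described above can be sketched as follows. This is a minimal illustration assuming the data sit in a pandas DataFrame; the column names and file name are hypothetical, as the paper does not publish its code.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column names; the actual field names are not given in the paper.
numeric_cols = ["temperature", "wind_speed", "PM10", "O3", "CO", "NO2", "SO2"]

df = pd.read_csv("cheonan_2009_2018.csv")  # hourly records, placeholder file name

# Remove every record that shares a timestamp with a missing value.
df = df.dropna()

# One-hot encode the 16-direction categorical wind direction into 0/1 columns.
df = pd.get_dummies(df, columns=["wind_direction"])

# Min-max scale the numerical variables into [0, 1] so all features share one scale.
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])
```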
In the field of artificial intelligence, an artificial neural network (ANN) is an architecture modeled after the information-processing structure of the human brain and is used in machine learning and cognitive science. An ANN is defined by the connection pattern between neurons (nodes), which are the smallest units of a neural network; the learning process that updates the weights of the connections; and the activation function, which converts the weighted inputs of a neuron into an activation-level output. Sigmoid, hyperbolic tangent, and rectified linear unit (ReLU) functions are commonly used as activation functions; depending on the function, the input is mapped to an output between zero and one (sigmoid) or between minus one and one (hyperbolic tangent), or negative inputs are clipped to zero (ReLU), and the result is then delivered to the next neuron.
The layers of a neural network primarily consist of input, hidden, and output layers. If a single hidden layer is included, the neural network is classified as an ANN; if multiple hidden layers are included, it is classified as a DNN. The synapses connecting the neurons of each layer have their own weights. A DNN first initializes them to certain values and then updates the weights in the direction that reduces the loss at the final output layer. To this end, the DNN uses gradient descent, which updates the weights of all connections in the network using the derivatives of the loss function. Gradient descent consists of two steps, forward propagation and backpropagation: the final loss is calculated during forward propagation, and each weight is updated in the loss-reducing direction during backpropagation [10, 13-16].
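In standard notation (the paper does not write the update rule out explicitly), each weight is adjusted during backpropagation as

$$ w \leftarrow w - \eta \, \frac{\partial L}{\partial w}, $$

where $w$ is a connection weight, $\eta$ is the learning rate, and $L$ is the loss evaluated at the output layer.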
The training of the prediction model is therefore controlled by various hyper-parameters, and the training outcome is sensitive to their values. Accordingly, we derived the optimal values of the main parameters through a hyper-parameter search and applied them. Table 2 shows the top three ranked results of the hyper-parameter search; the first-ranked parameter set was applied. ReLU was used as the activation function of the model, Adam was used as the optimization function, and 100 epochs were applied in the design of the prediction model.
Table 2. Top three hyper-parameter search results for each model

| Model | Rank | Layers | Hidden Nodes | L2 | Dropout Rate | Batch Size |
|---|---|---|---|---|---|---|
| DNN | 1 | 2 | 100 | 0.01 | 0.1 | 100 |
| | 2 | 3 | 140 | 0.001 | 0.5 | 60 |
| | 3 | 2 | 60 | 0.01 | 0.1 | 80 |
| RNN | 1 | 1 | 80 | 0 | 0.1 | 20 |
| | 2 | 1 | 60 | 0 | 0 | 20 |
| | 3 | 2 | 100 | 0 | 0 | 20 |
| LSTM | 1 | 2 | 80 | 0 | 0.3 | 60 |
| | 2 | 2 | 40 | 0 | 0.2 | 40 |
| | 3 | 2 | 40 | 0 | 0.1 | 20 |
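For concreteness, the first-ranked DNN configuration in Table 2 (two hidden layers of 100 nodes, L2 = 0.01, dropout = 0.1, batch size 100, ReLU, Adam, 100 epochs) could be expressed in Keras roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the framework, input width, and whether regularization applies to every hidden layer are not specified in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

n_features = 25  # placeholder: depends on the one-hot encoded feature set

# Rank-1 DNN from Table 2; L2 and dropout applied to both hidden layers (assumption).
model = tf.keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(100, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.1),
    layers.Dense(100, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.1),
    layers.Dense(1),  # predicted PM10 concentration
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=100, batch_size=100, validation_data=(x_val, y_val))
```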
An RNN is a type of neural network specialized for time-series data, which have a sequential order, as in natural language processing or speech recognition. An RNN facilitates the effective modeling of time-series information, such as text or speech, because it has a recursive structure in which the output of the hidden layer affects the output of the next state [15, 17]. Unlike feedforward neural networks such as a DNN, the RNN is recursive between the input and output through a cyclic loop in its internal nodes. Furthermore, the recurrent weights connecting the internal layers share the same values across all time steps. A hyperbolic tangent or ReLU function is commonly used as the activation function of an RNN [18, 19].
As with a DNN, the training of the RNN prediction model is adjusted using various parameters. The RNN has the same types of parameters as the DNN, with the addition of a timestep parameter for recurrent training. A period of one day was used as the standard for the timesteps, so a fixed value of 24 was applied to the hourly data. For the other main parameters, the optimal values were determined using a hyper-parameter search. Table 2 shows the top three ranked results, and the first-ranked parameter set was applied.
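Under the same caveats as the DNN sketch, the first-ranked RNN configuration (one recurrent layer, 80 hidden nodes, 24 timesteps, batch size 20) might look like the following; the feature count is again a placeholder, and the paper does not state the RNN's epoch count or chosen activation explicitly.

```python
import tensorflow as tf
from tensorflow.keras import layers

timesteps, n_features = 24, 25  # 24 hourly steps; feature count is a placeholder

model = tf.keras.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.SimpleRNN(80),  # single recurrent layer, rank-1 setting in Table 2
    layers.Dense(1),       # predicted PM10 concentration
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=100, batch_size=20)  # epochs assumed to match the other models
```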
Proposed by Hochreiter and Schmidhuber in 1997, LSTM is a type of RNN that mitigates the vanishing gradient problem, which makes a plain RNN difficult to train on long sequences. The basic structure of LSTM is similar to that of an RNN; however, the memory cell that remembers the previous state is modified to resolve the vanishing gradient problem. Both the RNN and LSTM are recursive between the input and output. However, the RNN controls information transfer and propagation through recursion on a single layer, whereas LSTM controls information transfer and propagation through four fully connected layers inside the cell: three gates, namely a forget gate, an input gate, and an output gate, together with the cell state [20, 21].
Every gate of an LSTM outputs a value between zero and one through a sigmoid operation, and a weight is set individually for each gate. The forget gate applies a sigmoid operation to the previous output and the current input; it retains the previous information if the result is close to one and deletes it if the result is close to zero. The input gate determines the degree to which the current information is reflected: the data to be stored in the cell state are determined by the product of a hyperbolic tangent and a sigmoid, both computed from the current input and the previous output. The cell state of the current stage is then updated using the information obtained from the forget and input gates. Finally, the output is determined by the product of the sigmoid of the previous output and current input and the hyperbolic tangent of the cell state [22-24].
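In the standard formulation, consistent with the description above though not written out in the paper, the gate operations are

$$
\begin{aligned}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell-state update)}\\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(output)}
\end{aligned}
$$

where $\sigma$ is the sigmoid function, $\odot$ denotes the elementwise product, $h_{t-1}$ is the previous output, and $x_t$ is the input of the current stage.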
As with the other algorithms, the training of the LSTM model is adjusted using various parameters, and the training outcome is sensitive to their values. Accordingly, we derived the optimal values of the main parameters using a hyper-parameter search. As in the RNN model, the timesteps were set to 24. ReLU and Adam were used as the activation and optimization functions, respectively, and 100 epochs were applied. Table 2 shows the top three ranked search results, and the first-ranked parameter set was applied to the model design.
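A corresponding sketch of the first-ranked LSTM configuration (two layers of 80 nodes, dropout 0.3, 24 timesteps, batch size 60) follows, under the same assumptions as the earlier sketches; the stacking order of dropout layers is illustrative, not specified by the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

timesteps, n_features = 24, 25  # feature count is a placeholder

# Rank-1 LSTM from Table 2: 2 layers, 80 nodes, dropout 0.3.
model = tf.keras.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.LSTM(80, return_sequences=True),
    layers.Dropout(0.3),
    layers.LSTM(80),
    layers.Dropout(0.3),
    layers.Dense(1),  # predicted PM10 concentration
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=100, batch_size=60)
```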
The optimal values derived from the hyper-parameter search were applied to the design of each model, and the training set constructed as shown in Fig. 1 was used to train each model. Subsequently, the test set was used to evaluate performance based on each model's predicted values. The RMSE was used as the basis for the performance evaluation, and the accuracy for each AQI category was used to determine the detailed prediction accuracy. Fig. 2 shows the predictions of each model for the sections of the test data with rapid changes in concentration, and Table 3 shows the RMSE and detailed prediction accuracy of each model.
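The RMSE follows the standard definition,

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, $$

where $y_i$ is the measured PM10 concentration, $\hat{y}_i$ is the predicted value, and $n$ is the number of test samples.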
Table 3. RMSE and prediction accuracy of each model

| Indicator | DNN | RNN | LSTM |
|---|---|---|---|
| RMSE | 8.3459 | 8.3653 | 8.3964 |
| Overall accuracy | 87.38% | 87.58% | 87.1% |
| Accuracy for “good” AQI | 79.9% | 84.28% | 79.2% |
| Accuracy for “moderate” AQI | 93.14% | 91.22% | 92.8% |
| Accuracy for “bad” AQI | 75.5% | 75.24% | 76.83% |
| Accuracy for “very bad” AQI | 65.44% | 72.79% | 66.18% |
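As a sketch of how these metrics could be computed, the following assumes hypothetical PM10 AQI breakpoints (30/80/150 μg/m³ separating good/moderate/bad/very bad); the paper does not state its category boundaries, so treat the bins as placeholders.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over measured and predicted concentrations."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def aqi_class(pm10):
    # Assumed PM10 breakpoints (ug/m3); not stated in the paper.
    bins = [30, 80, 150]  # good / moderate / bad / very bad
    return np.digitize(pm10, bins)

def per_class_accuracy(y_true, y_pred):
    """Fraction of samples in each true AQI class whose prediction falls in the same class."""
    true_cls, pred_cls = aqi_class(y_true), aqi_class(y_pred)
    for c, name in enumerate(["good", "moderate", "bad", "very bad"]):
        mask = true_cls == c
        if mask.any():
            acc = np.mean(pred_cls[mask] == c)
            print(f"{name}: {acc:.2%}")
```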
For the DNN, as shown in Fig. 2, the difference between the actual and predicted PM concentrations was small; however, in the 150-200 μg/m³ range, where the concentration changed rapidly, the predicted PM concentrations over- or underestimated the actual values. Furthermore, for PM concentrations above 100 μg/m³ in that range, the error was relatively large compared with the other sections. For the RNN, the errors between the actual and predicted PM concentrations were not large, but the concentration was overestimated at high concentrations above 100 μg/m³; in contrast, at low concentrations, the actual and predicted values were relatively close. For the LSTM, the concentration was likewise overestimated at high concentrations above 100 μg/m³, and in the 150-170 μg/m³ range in particular, where rapid changes in concentration occurred, the error between the actual and predicted values was large.
When the overall RMSE values were compared, the RMSE of the DNN was 8.3459, lower than those of the other models. In terms of overall accuracy, however, the RNN scored 87.58%, higher than the other models. For a “good” AQI, the RNN model showed the highest accuracy at 84.28%, and for a “moderate” AQI, the DNN model showed the highest accuracy at 93.14%. For a “bad” AQI, the accuracy of the LSTM model was 76.83%, higher than that of the other models, and for a “very bad” AQI, the accuracy of the RNN model was 72.79%, the highest. When the DNN and RNN models were compared in terms of RMSE and overall accuracy, the prediction accuracy of the RNN model was higher than that of the DNN model; however, the margin of error on the samples that were not successfully predicted was narrower for the DNN model. In this respect, the evaluation results showed that the DNN model was more stable than the RNN model.
The air pollution problem caused by increases in population, automobiles, and industrial activities has emerged as a major environmental problem worldwide. In particular, as the effect of PM on the human body has been revealed, people are using PM forecasts in daily life to respond preemptively to the air environment. Accordingly, people are demanding a high level of PM prediction accuracy, and prediction studies using a variety of methods are underway. Among them, studies are actively being conducted on the prediction of PM concentrations using neural networks.
In this study, we used neural network algorithms that have been applied in many PM concentration prediction studies and conducted a performance analysis and evaluation of each model to select an algorithm suitable for PM concentration prediction. To this end, we collected weather and air pollutant data monitored over a 10-year period in the Cheonan region of South Korea. Training, validation, and test sets were constructed from the collected data for use in training the prediction models. The data were then preprocessed to minimize the learning problems that arise when the characteristics of the data differ. Because the wind direction is categorical data expressed as 16 directions, it was converted into 0/1 vectors using one-hot encoding; because the other variables had different scales, min-max scaling was used to convert their numerical ranges into values between zero and one. DNN, RNN, and LSTM were selected as the neural network algorithms for the performance evaluation of the prediction models. Optimal prediction performance requires optimal values of the parameters applied to each algorithm; a hyper-parameter search was conducted to obtain them, and the results were applied to the model design. The same data were used to train the designed models and evaluate their performance.
To compare the performance of the prediction models constructed with the neural network algorithms, we examined the trends of the actual and predicted values, and the detailed prediction accuracies were evaluated using the RMSE and the AQI categories. The trends derived by the models differed only minimally. However, in the performance comparison using the RMSE and the accuracy, the DNN model showed a lower RMSE than the other algorithms, indicating stable predictions, whereas the stability of the RNN was slightly lower than that of the other algorithms although its accuracy was higher.
For a predicted PM concentration, the accuracy is important in terms of consistency between the actual and predicted values; however, considering the wide concentration range, it was determined that a DNN-based prediction model with a small margin of error is suitable. In the future, we plan to use neural network algorithms to conduct a study to reduce the margin of error in PM concentration prediction. It is expected that the construction of a prediction model with better performance will increase the use of reliable prediction information.
Yongjin Jung received his B.S. in Electronics Engineering from Kongju National University, Cheonan, South Korea, in 2014, and his M.S. in Electrical, Electronics, and Communication Engineering from Korea University of Technology and Education (KOREATECH) in 2016. He is currently pursuing a Ph.D. in Electrical, Electronics, and Communication Engineering at KOREATECH. His research interests include machine learning, data analysis, and deep learning.
Chang-Heon Oh received his B.S. and M.S.E. degrees in telecommunication and information engineering from Korea Aerospace University, Kyunggi-Do, Korea, in 1988 and 1990, respectively, and his Ph.D. degree in avionics engineering from Korea Aerospace University in 1996. From February 1990 to August 1993, he worked at Hanjin Electronics Co., where he was involved in the research and development of radio communication and monitoring systems. From October 1993 to February 1999, he worked at the CDMA R&D Center of Samsung Electronics Co., where he was involved in the design and development of CDMA cellular and PCS systems for the successful commercial deployment of CDMA in Korea. Since March 1999, he has been with the School of Electrical, Electronics and Communication Engineering, Korea University of Technology and Education (KOREATECH), Cheonan, Korea, where he is currently a professor. His research interests include wireless/mobile communication, wireless localization, IoT, and engineering education.