Journal of information and communication convergence engineering 2023; 21(3): 208-215
Published online September 30, 2023
https://doi.org/10.56977/jicce.2023.21.3.208
© Korea Institute of Information and Communication Engineering
Correspondence to : Somrudee Deepaisarn (E-mail: somrudee@siit.tu.ac.th)
Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Deep learning techniques provide powerful solutions to several pattern-recognition problems, including Raman spectral classification. However, these networks require large amounts of labeled data to perform well. The scarcity of labeled data, which are typically obtained in a laboratory, can potentially be alleviated by data augmentation. This study investigated various data augmentation techniques and applied multiple deep learning methods to Raman spectral classification. Raman spectra yield fingerprint-like information about chemical compositions, but are prone to noise when the particles of the material are small. Five augmentation models were investigated to build robust deep learning classifiers: weighted sums of spectral signals, imitated chemical backgrounds, extended multiplicative signal augmentation, and generated Gaussian and Poisson-distributed noise. We compared the performance of nine state-of-the-art convolutional neural networks with all the augmentation techniques. The LeNet5 model with background noise augmentation yielded the highest accuracy, 88.33%, when tested on real-world Raman spectral classification. A class activation map of the model was generated to provide a qualitative observation of the results.
Keywords: Raman Spectral Classification, Data Augmentation, Convolutional Neural Networks (CNN), Hard Disk Drive, Small Particle
Hard disk drives (HDD) are essential for data storage in computer systems. To achieve a high areal density, the flying height at the head-disk interface (HDI) is minimized to the order of nanometers [1]. At this scale, even small contaminants can enter the HDI and potentially cause HDD failure. There are two main sources of contamination [2]: inside the HDD itself, because of cracked particles in the disk, and in the HDD assembly production line. In this work, we are interested in identifying the contaminating particles from the HDD assembly production line so that we can specify the contamination stage and prevent it from recurring.
Spectroscopic techniques are commonly used to study and investigate the characteristics of contaminating particles in HDD. Examples of these techniques are time‐of‐flight‐secondary ion mass spectrometry (TOF‐SIMS), Fourier transform infrared (FT‐IR) spectroscopy, X‐ray photoelectron spectroscopy (XPS), energy-dispersive X-ray spectroscopy (EDS), and Raman spectroscopy [3, 4].
Raman spectroscopy can be used to identify material types. The technique is based on measuring molecular vibrational energy states, which reveal the unique characteristics of the tested materials. A contaminating particle can be represented by a Raman spectrum, which is a plot of signal intensity against wavenumber. Because of the unique spectral pattern of each sample (
Several studies have focused on applying deep-learning (DL) models to identify substances of interest using Raman spectra. The studies in [5-8] showed that DL models can automatically extract useful features and outperform conventional methods based on machine learning, such as K-nearest neighbors. However, DL methods require a large amount of labeled data to train models. Obtaining numerous Raman spectra is time-consuming and tedious. Data augmentation provides reasonable solutions to boost the size of the training data by simulating samples with some added variations to the original samples, which helps improve the robustness and generalization of the models.
In this study, we examine the impact of noise augmentation in terms of the performance and robustness of Raman spectral classification, such that the material identification of contaminants present in HDD is improved. The following five types of augmentation were examined: (1) weighted sums of the spectral signals; (2) imitated chemical backgrounds,
DL approaches have shown potential in computer vision and natural language processing [10]. One of the most popular models is a convolutional neural network (CNN), which comprises a series of convolutional layers for feature extraction, followed by layers for classification. Notable DL models include LeNet-5 [11], AlexNet [12], VGG16 [13], GoogLeNet [14], ResNet [15], SqueezeNet [16], Xception [17], DenseNet [18], and MobileNet [19]. Several state-of-the-art CNN models have been applied to Raman spectral classification in many application areas. Ho et al. [20] proposed a CNN based on ResNet to classify 30 bacterial pathogens using Raman spectra. The proposed CNN model yielded an accuracy of 82%, outperforming the baseline models (
DL is a technique that requires a large amount of data to build models of the relationships between input and output pairs. Data augmentation was necessary to increase the number of samples in the training dataset by adding useful variations to the collected samples. Several methods have been introduced and applied, such as adding Gaussian noise [21], using an autoencoder [24], and using a generative adversarial network [25]. In this study, we investigated five types of data augmentation and compared them based on the classification performances of nine CNN models.
In this study, we investigated five augmentation methods to boost the number of training spectra by comparing the classification performance of various CNN models. The framework is illustrated in Fig. 1. Considering 10 substances (
All computational experiments in this study were performed using a dataset containing 500 noisy spectra of substances that presumably contaminated the HDD during fabrication. Fig. 2 shows a schematic diagram of the process of Raman spectral acquisition from a small contaminating particle in the HDD. Unfortunately, the smaller the particle, the more challenging it is to acquire a clean Raman spectrum. In this study, the samples were prepared in the form of micro- to nano-sized particles suspended in a suitable solvent to obtain noisy spectral signals that imitate the contamination of HDD. The dispersion was then dropped onto the HDD. After solvent evaporation, the particles were deposited on the surface of the HDD to acquire the Raman spectra. To generate noisy Raman spectra, a laser was fired at the rims of the particles. This produced spectra resembling those of small contaminants, whose sizes generally vary from micron to submicron levels. Clean spectra were acquired for larger particles.
Ten common contaminants were observed in HDD product lines. These contaminations can cause disk failure if present in sensitive parts such as the magnetic head. The ten classes of selected samples were cellulose, polycarbonate (PC), low-density polyethylene (LDPE), high-density polyethylene (HDPE), polyethylene terephthalate (PET), polyvinyl chloride (PVC), polytetrafluoroethylene (PTFE), polyoxymethylene (POM), polyether ether ketone (PEEK), and polypropylene (PP). Each class contained 50 spectra.
The spectra were acquired using a Raman spectrometer (Model: DXR, Thermo Fisher Scientific Inc., USA) equipped with a 532 nm laser (visible green light). Appropriate calibration was performed prior to acquisition. The laser power was set at a constant value of 10 mW. The acquisition range was 99-3500 cm−1. Each spectrum was generated from the accumulated signals of 100 cycles using a 50 μm pinhole. Photobleaching was performed for up to 3 min.
We applied an improved modified polynomial (IModPoly) baseline algorithm [26] to remedy fluorescence noise. The baseline-corrected spectra of each substance are shown in Fig. 3. Thereafter, for each substance, 20 spectra were selected to create a synthetic training dataset, and the remaining 30 spectra were retained in the test set. Specifically, the test dataset consisted of 300 measured noisy spectra, with each class containing 30 spectra.
In this study, we consider a situation in which the number of measured noisy spectra is limited, which is insufficient for effective model training. As previously mentioned, 20 noisy spectra measured per class were selected to create a synthetic training dataset. Our goal was to obtain 200 synthetic spectra per class (2,000 synthetic spectra in total). Therefore, we synthesized 10 spectra from every spectrum in the training set using different augmentation methods and compared the results of each method.
We investigated five spectral augmentation techniques: weighted sums of spectral signals, imitated chemical backgrounds from substrates and air, extended multiplicative signal augmentation (EMSA) [27], generated Gaussian-distributed noise, and generated Poisson-distributed noise. We sought a data augmentation approach that performs well in classifying noisy Raman spectra, compared against a baseline approach of generating randomly weighted sums of signal intensity.
1) Weighted-Sum Augmentation
Using this method, a synthetic spectrum
2) Background Noise Augmentation
In this study, we were interested in the contaminations found on the following substrates in the hard disk head: aluminum oxide (Al2O3), aluminum oxide coated with diamond-like carbon (Al2O3 + DLC), aluminum oxide-titanium carbide (Al2O3 + TiC), and aluminum oxide-titanium carbide coated with diamond-like carbon (Al2O3 + TiC + DLC). As a result, the measured Raman spectrum of a substance can be affected by substrate and air noise. We recorded 10 spectra for each substrate and air noise. Based on the background noise augmentation, each synthetic spectrum was generated by superimposing the measured, substrate, and air noise spectra.
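The superposition described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `background_augment`, the uniform random weights, and the `max_weight` cap are hypothetical, since the text does not say how the substrate and air components were scaled before being added to the measured spectrum.

```python
import random

def background_augment(measured, substrates, airs, rng, max_weight=0.5):
    """Superimpose one substrate and one air spectrum onto a measured spectrum.

    The uniform weights and max_weight are assumptions for illustration;
    the source only states that the three spectra are superimposed.
    """
    substrate = rng.choice(substrates)      # pick one recorded substrate spectrum
    air = rng.choice(airs)                  # pick one recorded air spectrum
    w_sub = rng.uniform(0.0, max_weight)    # hypothetical substrate contribution
    w_air = rng.uniform(0.0, max_weight)    # hypothetical air contribution
    return [m + w_sub * s + w_air * a
            for m, s, a in zip(measured, substrate, air)]

rng = random.Random(0)
measured = [0.0, 1.0, 4.0, 1.0, 0.0]                   # toy 5-point spectrum
substrates = [[0.2] * 5, [0.1, 0.2, 0.3, 0.2, 0.1]]    # placeholder substrate spectra
airs = [[0.05] * 5]                                    # placeholder air spectrum
# Ten synthetic spectra per measured spectrum, matching the 20 -> 200 per-class goal.
synthetic = [background_augment(measured, substrates, airs, rng) for _ in range(10)]
```

Drawing a fresh substrate, air spectrum, and weight pair for every call is what gives the ten synthetic spectra their variation; with real data the toy lists would be replaced by the recorded spectra on a shared wavenumber grid.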
3) Extended Multiplicative Signal Augmentation
Extended multiplicative signal augmentation (EMSA) is based on the theoretical concepts underlying extended multiplicative signal correction (EMSC), which is a baseline correction algorithm that is applied to Raman and infrared spectral data. The EMSC eliminates unwanted backgrounds, where modes of variation due to chemical, instrumental, and physical background signals can be modeled with a mathematical expression. EMSA uses knowledge from the extracted background noise to suggest spectral augmentation by adding known captured noise variations [27]. To vary the distortion, Gaussian-distributed random numbers with zero mean and varying standard deviations were applied to alter the parameter coefficients in the EMSA model, resulting in various augmented spectra [26].
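As a rough illustration of the idea of perturbing model coefficients with zero-mean Gaussian random numbers, the sketch below applies a multiplicative factor plus a constant and linear baseline to a spectrum. It is a deliberately simplified stand-in, not the EMSA procedure of [27]: the function name `emsa_like_augment`, the choice of terms, and the standard deviations are all assumptions.

```python
import random

def emsa_like_augment(spectrum, rng, sd_scale=0.1, sd_offset=0.02, sd_slope=0.02):
    """Perturb a spectrum with a random scale, offset, and linear baseline.

    All three standard deviations are hypothetical values for illustration.
    """
    n = len(spectrum)
    b = 1.0 + rng.gauss(0.0, sd_scale)   # multiplicative effect around 1
    a = rng.gauss(0.0, sd_offset)        # constant baseline shift
    d = rng.gauss(0.0, sd_slope)         # slope of a linear baseline
    augmented = []
    for i, x in enumerate(spectrum):
        # Map the index to [-1, 1] so the slope term is independent of length.
        nu = 2.0 * i / (n - 1) - 1.0 if n > 1 else 0.0
        augmented.append(b * x + a + d * nu)
    return augmented

rng = random.Random(0)
toy_spectrum = [0.0, 0.5, 2.0, 0.5, 0.0]
emsa_variants = [emsa_like_augment(toy_spectrum, rng) for _ in range(10)]
# With all standard deviations set to zero the spectrum is returned unchanged.
unchanged = emsa_like_augment(toy_spectrum, rng, 0.0, 0.0, 0.0)
```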
4) Statistical Noise Augmentation
There are two popular statistical noise models for simulating the behaviors of natural noise that occur in instrumental data acquisition processes [28]: Gaussian and Poisson noise. The Gaussian noise model represents environmental and electrical noises. Based on a Gaussian noise model, random variations drawn from the same statistical distribution were added to the measured spectrum. By contrast, the Poisson noise model denotes signal-dependent shot noise. Random Poisson noise can be simulated by calculating the square root of signal intensity multiplied by a Gaussian random number [29,30].
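Both noise models can be sketched with the Python standard library. The Gaussian version below sets the noise standard deviation from a target signal-to-noise ratio, assuming SNR is defined as mean signal power over noise variance; the Poisson-like version follows the description of multiplying the square root of the intensity by a Gaussian random number. The function names and the `scale` parameter are hypothetical.

```python
import math
import random

def add_gaussian_noise(spectrum, snr_db, rng):
    """Add zero-mean Gaussian noise sized to reach the requested SNR (in dB).

    Assumes SNR = mean signal power / noise variance.
    """
    signal_power = sum(x * x for x in spectrum) / len(spectrum)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in spectrum]

def add_poisson_like_noise(spectrum, scale, rng):
    """Add signal-dependent noise: sqrt(intensity) times a Gaussian number.

    scale is a hypothetical knob for the overall noise level; negative
    intensities are clipped to zero before taking the square root.
    """
    return [x + scale * math.sqrt(max(x, 0.0)) * rng.gauss(0.0, 1.0)
            for x in spectrum]

rng = random.Random(0)
clean = [1.0] * 20000                         # flat toy signal with unit power
gaussian_noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)
poisson_noisy = add_poisson_like_noise([0.0, 4.0, 9.0], scale=0.1, rng=rng)
```

Note the qualitative difference: the Gaussian noise has the same spread at every wavenumber, whereas the Poisson-like noise vanishes where the intensity is zero and grows at the peaks.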
The range of the signal intensities (either synthetic or measured) varied from one spectrum to another. Min-max normalization was applied to scale the signals into a common range. In this study, we scaled all spectra using min-max normalization over the training and test datasets. Let
We investigated and compared the augmentation methods based on the performance of nine state-of-the-art CNN models: LeNet5 [11], AlexNet [12], VGG16 [13], GoogLeNet [14], ResNet [15], SqueezeNet [16], Xception [17], DenseNet [18], and MobileNet [19]. As they were proposed for image processing, their structures originally consisted of two-dimensional (2D) convolutional and 2D pooling layers. In spectral classification applications, we implemented one-dimensional (1D) versions of these CNN models by replacing any 2D layers with the corresponding 1D layers. Hyperparameters, such as the number of filters and filter sizes, were kept the same as in the original models.
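To make the 2D-to-1D substitution concrete, the following sketch implements a single-channel 1D convolution and 1D average pooling in plain Python. It is only meant to show how a layer operates along the wavenumber axis; the function names are hypothetical, and the length-5 kernel and pool size of 2 are assumed 1D analogues of a 5×5 filter and 2×2 pooling, not values stated in the text.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel along the signal without padding.

    As in most CNN libraries, this is a cross-correlation (no kernel flip).
    Output length is len(signal) - len(kernel) + 1.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def avg_pool1d(signal, size):
    """Average non-overlapping windows of the given size; drop any remainder."""
    return [sum(signal[i:i + size]) / size
            for i in range(0, len(signal) - size + 1, size)]

spectrum = [float(i % 4) for i in range(32)]   # toy 32-point spectrum
kernel = [0.2] * 5                              # toy length-5 smoothing filter
features = conv1d_valid(spectrum, kernel)       # 32 - 5 + 1 = 28 values
pooled = avg_pool1d(features, 2)                # 28 / 2 = 14 values
```

A real 1D CNN layer would apply many such kernels across multiple input channels and learn their weights; the shape arithmetic, however, is the same.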
The structure of the CNN model used in the spectral classification is shown in Fig. 4. It consists of a feature extractor and classifier. A CNN model was applied to extract a set of features (
We generated five synthetic training datasets from 200 measured noisy spectra using the weighted sum, background noise, EMSA, Gaussian noise, and Poisson noise methods. The last two methods are statistical noise-augmentation methods. Each training dataset consisted of 2,000 synthetic spectra (10 classes, 200 spectra per class). All the trained models were evaluated using the same test dataset containing 300 measured noisy spectra.
The models were trained by minimizing the categorical cross-entropy. The Adam optimizer was used with the following default values: learning rate = 0.001, β1 = 0.9, β2 = 0.999, and ε = 10^-7. The batch size and the number of epochs were 32 and 30, respectively.
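For readers unfamiliar with these quantities, the sketch below shows the categorical cross-entropy for a single sample and one textbook Adam update on a scalar parameter, using the hyperparameter values quoted above as defaults. The function names are hypothetical, and deep learning libraries may differ in details such as where ε is applied, so this should be read as an illustration rather than the exact training code.

```python
import math

def categorical_cross_entropy(probs, true_index):
    """Loss for one sample: negative log of the probability of the true class."""
    return -math.log(probs[true_index])

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

loss = categorical_cross_entropy([0.1, 0.9], true_index=1)
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient magnitude.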
The classification performances were measured using the accuracy score (%), which was obtained from Equation (1):
where
This section compares the five augmentation methods. Table 1 lists the accuracy scores of the nine CNN models trained using five synthetic training datasets from the weighted-sum, background noise, EMSA, Gaussian, and Poisson methods, as explained in Section III-C. The signal-to-noise ratios for the Gaussian and Poisson methods were set to 20 dB. Accuracy scores were obtained by classifying the test dataset, which consisted of 300 measured noisy spectra.
Table 1. Accuracy scores (%) of CNN models trained using synthetic spectra from different augmentation methods. Bold indicates the highest accuracy score for each CNN model.
Augmentation | LeNet5 | AlexNet | VGG16 | GoogLeNet | ResNet | SqueezeNet | Xception | DenseNet | MobileNet |
---|---|---|---|---|---|---|---|---|---|
Weighted Sum | 48.67 | 44.67 | 46.67 | 52.67 | 44.00 | 44.00 | 59.33 | 63.33 | 47.00 |
Background Noise | **88.33** | **82.33** | **69.33** | **77.33** | **79.00** | **73.67** | **88.00** | **82.33** | **77.00** |
EMSA | 51.33 | 45.00 | 44.33 | 33.33 | 21.33 | 19.33 | 48.00 | 52.00 | 16.33 |
Gaussian | 68.67 | 54.33 | 64.33 | 67.67 | 50.00 | 46.33 | 57.33 | 67.67 | 52.00 |
Poisson | 82.67 | 65.00 | 63.67 | 73.67 | 43.00 | 63.67 | 58.67 | 64.67 | 59.00 |
In addition, we compared the accuracy scores with those of CNN models built using the weighted-sum training dataset. In this method, each synthetic spectrum was computed using a random weighted sum of 20 measured noisy spectra of the same class. Similarly, 200 synthetic spectra were generated for each class. The weighted-sum method represents the baseline performance because it eliminates the effects of noise and produces more data with fewer variations.
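The weighted-sum baseline can be sketched as follows. The text says only that the weights are random; drawing them uniformly and normalizing them to sum to one is an assumption made here so that each synthetic spectrum stays within the intensity range of its class. The function name is hypothetical.

```python
import random

def weighted_sum_augment(class_spectra, rng):
    """Combine all spectra of one class using random weights that sum to one.

    Uniform weights normalized to unit sum are an assumption for illustration.
    """
    weights = [rng.random() for _ in class_spectra]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_points = len(class_spectra[0])
    return [sum(w * s[i] for w, s in zip(weights, class_spectra))
            for i in range(n_points)]

rng = random.Random(1)
# 20 toy "measured" spectra of one class, each with 4 points.
class_spectra = [[float(i + j) for j in range(4)] for i in range(20)]
# 200 synthetic spectra per class, as in the experiment described above.
class_synthetic = [weighted_sum_augment(class_spectra, rng) for _ in range(200)]
```

Because each output is an average over the class, independent noise in the individual spectra tends to cancel, which is consistent with the remark that this method eliminates the effects of noise and yields fewer variations.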
The following findings were obtained. First, unlike the weighted-sum method, the other augmentation methods created training datasets by adding more variations to the original sample spectra. Consequently, they achieved higher accuracy scores than the weighted-sum method in many cases, except for EMSA. Second, background noise augmentation resulted in the highest accuracy score among the methods of interest. Third, LeNet5 achieved the highest accuracy scores in many cases, even though it is a simple and shallow CNN model. Finally, LeNet5 trained using the dataset created with the background noise method achieved the highest accuracy score of 88.33%. EMSA provides augmented spectra with known functional variations. Similarly, the Gaussian and Poisson noise methods add known statistical distributions to augmented spectra. In contrast, the background noise method adds practical complexity to the training data. This allowed the classification models to seek and learn from the actual variations within the labeled spectra rather than the underlying backgrounds, making the generated spectra well suited for training robust DL models.
We further analyzed the classification performance using a confusion matrix. Fig. 5 shows the confusion matrix according to LeNet5 trained using the dataset created using the background noise method (which offers the best accuracy score, as previously mentioned). The results were based on the classification of the test dataset, which consisted of 30 spectra per substance. The rows represent the actual substances, and the columns denote the predicted substances. The model performed very well in classifying all the substances except POM. Only 17 POM spectra were correctly identified, whereas 10 were misclassified as PVC.
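The quantities read from a confusion matrix can be computed as in the sketch below, where rows are actual classes and columns are predicted classes. The 3-class matrix is a made-up toy example, not the matrix of Fig. 5; only the last two lines use figures from the text, checking that 17 of 30 POM spectra corresponds to a recall of about 56.67% and that 88.33% of 300 test spectra is consistent with 265 correct predictions.

```python
def accuracy_percent(confusion):
    """Overall accuracy: diagonal (correct) counts over all counts, in percent."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

def recall_percent(confusion, cls):
    """Per-class recall: correct predictions over actual members of the class."""
    return 100.0 * confusion[cls][cls] / sum(confusion[cls])

# Hypothetical 3-class confusion matrix with 30 test spectra per class.
toy_confusion = [
    [30, 0, 0],
    [2, 27, 1],
    [0, 10, 20],
]
toy_accuracy = accuracy_percent(toy_confusion)     # 77 correct out of 90
toy_recall = recall_percent(toy_confusion, 2)      # 20 correct out of 30

pom_recall = round(100.0 * 17 / 30, 2)             # POM figures from the text
overall_check = round(100.0 * 265 / 300, 2)        # 265 of 300 correct
```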
A class activation map (CAM) [31] was used to understand how a model predicted the output. In spectral classification, CAMs can be applied for building heat maps to show the values and their wavenumbers on which the model focuses when computing the prediction. In Fig. 6, we used HiResCAM [32] to show the CAMs of 10 substances according to LeNet5 trained on the dataset created from the background noise method. Compared with the manner in which human experts recognize spectra, the model clearly focuses on peak patterns to differentiate substances, except for LDPE and HDPE. For LDPE and HDPE, we observed common peak locations with the other substances. Consequently, the model seeks other locations to indicate LDPE and HDPE.
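As we understand HiResCAM [32], the attribution at each position is the element-wise product of a convolutional layer's activations and the gradients of the class score with respect to those activations, summed over channels. The sketch below computes that quantity for a 1D feature map from precomputed toy values; obtaining real activations and gradients requires a deep learning framework and is outside this illustration. The function name and the toy numbers are hypothetical.

```python
def hirescam_1d(activations, gradients):
    """Sum over channels of activation * gradient at each position.

    activations and gradients are lists of channels, each a list of values
    along the (downsampled) wavenumber axis, with identical shapes.
    """
    n_positions = len(activations[0])
    return [sum(a[i] * g[i] for a, g in zip(activations, gradients))
            for i in range(n_positions)]

# Two channels, three positions (toy values).
activations = [[1.0, 2.0, 0.0],
               [0.0, 1.0, 3.0]]
gradients = [[0.5, 0.5, 0.5],
             [-1.0, 0.0, 1.0]]
cam = hirescam_1d(activations, gradients)
# Position 2 receives the largest attribution in this toy example.
peak_position = max(range(len(cam)), key=lambda i: cam[i])
```

To draw a heat map over a spectrum, the attribution vector would then be resized to the spectrum length and aligned with the wavenumber axis.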
Practical approaches for Raman spectral data augmentation, specifically to improve the DL classification of small contaminants in HDD, were investigated. Nine deep learning classification models were applied to the data from five different augmentation techniques. The results were examined by comparing the classification performance and robustness of the classifiers, assuming that they resulted from the variations generated during augmentation. Augmented data are required to mimic the real-world behavior of noise in Raman spectra to achieve good classification performance. Consequently, an appropriate noise augmentation model can enhance the performance of classifiers when the amount of original spectral data is insufficient for training the DL models. The background noise augmentation resulted in the highest accuracy scores, and the LeNet5 model achieved the highest accuracy scores in many cases, even though it is a simple CNN model. LeNet5 combined with the background noise method achieved the highest accuracy score of 88.33%. The CAM of LeNet5 provides evidence of the robustness of the model because the highlighted wavenumbers and peaks match those used by chemists to characterize these substances.
Moreover, the gain in accuracy of DL models, regardless of the model, indicates that background noise augmentation has the potential to characterize the variations of noisy spectra acquired from small particles in real-world practice. This allows classification models to learn actual variations overlaid on noisy backgrounds. Hence, the results demonstrate the application of computed noise and data augmentation in a data-centric approach to artificial intelligence, which allows the training of high-performance DL models, even if the available spectral dataset is small.
This study paves the way for future research. One direction is a detailed analysis of the statistical noise behavior underlying Raman spectra, which could yield better approximations of real spectra and further improve the performance of DL classification. Another is the development of classification pipelines that apply deep learning models to denoise the spectra before attempting to classify them. Denoising may also help classification models achieve greater accuracy, while the resulting clean spectra are informative by-products that can be investigated using various techniques.
Seksan Laitrakun 1, Somrudee Deepaisarn 1*, Sarun Gulyanon 2, Chayud Srisumarnk 1, Nattapol Chiewnawintawat 1, Angkoon Angkoonsawaengsuk 1, Pakorn Opaprakasit 1, Jirawan Jindakaew 1, and Narisara Jaikaew1
1Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
2College of Interdisciplinary Studies, Thammasat University, Pathum Thani 12120, Thailand
Correspondence to:Somrudee Deepaisarn (E-mail: somrudee@siit.tu.ac.th)
Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Deep learning techniques provide powerful solutions to several pattern-recognition problems, including Raman spectral classification. However, these networks require large amounts of labeled data to perform well. Labeled data, which are typically obtained in a laboratory, can potentially be alleviated by data augmentation. This study investigated various data augmentation techniques and applied multiple deep learning methods to Raman spectral classification. Raman spectra yield fingerprint-like information about chemical compositions, but are prone to noise when the particles of the material are small. Five augmentation models were investigated to build robust deep learning classifiers: weighted sums of spectral signals, imitated chemical backgrounds, extended multiplicative signal augmentation, and generated Gaussian and Poisson-distributed noise. We compared the performance of nine state-of-the-art convolutional neural networks with all the augmentation techniques. The LeNet5 models with background noise augmentation yielded the highest accuracy when tested on real-world Raman spectral classification at 88.33% accuracy. A class activation map of the model was generated to provide a qualitative observation of the results.
Keywords: Raman Spectral Classification, Data Augmentation, Convolutional Neural Networks (CNN), Hard Disk Drive, Small Particle
Hard disk drives (HDD) are essential for data storage in computer systems. To achieve a high areal density, the flying height at the head-disk interface (HDI) is minimized to the order of nanometers [1]. At this scale, even small contaminants can enter the HDI and potentially cause HDD failure. There are two main sources of contamination [2]: inside the HDD itself, because of cracked particles in the disk, and in the HDD assembly production line. In this work, we are interested in identifying the contaminating particles from the HDD assembly production line so that we can specify the contamination stage and prevent it from recurring.
Spectroscopic techniques are commonly used to study and investigate the characteristics of contaminating particles in HDD. Examples of these techniques are time‐of‐flight‐secondary ion mass spectrometry (TOF‐SIMS), Fourier transform infrared (FT‐IR) spectroscopy, X‐ray photoelectron spectroscopy (XPS), energy-dispersive X-ray spectroscopy (EDS), and Raman spectroscopy [3, 4].
Raman spectroscopy can be used to identify the material types. This principle is based on measuring the molecular vibrational energy stages, which reveal the unique characteristics of the testing materials. A contaminated particle can be represented by a Raman spectrum, which is a plot of signal intensity against wavenumber. Because of the unique spectral pattern of each sample (
Several studies have focused on applying deep-learning (DL) models to identify substances of interest using Raman spectra. The studies in [5-8] showed that DL models can automatically extract useful features and outperform conventional methods based on machine learning, such as K-nearest neighbors. However, DL methods require a large amount of labeled data to train models. Obtaining numerous Raman spectra is time-consuming and tedious. Data augmentation provides reasonable solutions to boost the size of the training data by simulating samples with some added variations to the original samples, which helps improve the robustness and generalization of the models.
In this study, we examine the impact of noise augmentation in terms of the performance and robustness of Raman spectral classification, such that the material identification of contaminants present in HDD is improved. The following five types of augmentation were examined: (1) weighted sums of the spectral signals; (2) imitated chemical backgrounds,
DL approaches have shown potential in computer vision and natural language processing [10]. One of the most popular models is a convolutional neural network (CNN), which comprises a series of convolutional layers for feature extraction, followed by layers for classification. Notable DL models include LeNet-5 [11], AlexNet [12], VGG16 [13], GoogLeNet [14], ResNet [15], SqueezeNet [16], Xception [17], DenseNet [18], and MobileNet [19]. Several state-ofthe- art CNN models have been applied to Raman spectral classification in many application areas. Ho et al. [20] proposed a CNN based on ResNet to classify 30 bacterial pathogens using Raman spectra. The proposed CNN model yielded an accuracy of 82%, outperforming the baseline models (
DL is a technique that requires a large amount of data to build models of the relationships between input and output pairs. Data augmentation was necessary to increase the number of samples in the training dataset by adding useful variations to the collected samples. Several methods have been introduced and applied, such as adding Gaussian noise [21], using an autoencoder [24], and using a generative adversarial network [25]. In this study, we investigated five types of data augmentation and compared them based on the classification performances of nine CNN models.
In this study, we investigated five augmentation methods to boost the number of training spectra by comparing the classification performance of various CNN models. The framework is illustrated in Fig. 1. Considering 10 substances (
All computational experiments in this study were performed using a dataset containing 500 noisy spectra of substances that presumably contaminated the HDD during fabrication. Fig. 2 shows the process of Raman spectral acquisition from a small contaminating particle in the HDD with a schematic diagram. Unfortunately, the smaller the particle, the more challenging the Raman spectrum. In this study, the samples were prepared in the form of micro- to nano-sized particles suspended in a suitable solvent to obtain noisy spectral signals that imitate the contamination of HDD. The dispersion was then dropped onto the HDD. After solvent evaporation, the particles were deposited on the surface of the HDD to acquire the Raman spectra. To generate noisy Raman spectra, a laser was fired at the rims of the particles. This helped achieve small-sized contamination-like spectra, which generally varied from micron to submicron levels. Clean spectra were acquired for larger particles.
Ten common contaminants were observed in HDD product lines. These contaminations can cause disk failure if present in sensitive parts such as the magnetic head. The ten classes of selected samples were cellulose, polycarbonate (PC), lowdensity polyethylene (LDPE), high-density polyethylene (HDPE), polyethylene terephthalate (PET), polyvinyl chloride (PVC), polytetrafluoroethylene (PTFE), polyoxymethylene (POM), polyether ether ketone (PEEK), and polypropylene (PP). Each class contained 50 spectra.
The modality used for the spectral acquisition was the Raman spectrometer (Model: DXR, Thermo Fisher Scientific Inc., USA), which is equipped with the 532 nm laser (visible green light). Appropriate calibration was performed prior to acquisition. The laser power was set at a constant value of 10 mW. The acquisition range was 99-3500 cm−1. A spectrum was generated from accumulated signals of 100 cycles using a 50 μm pinhole. Photobleaching was performed for up to 3 min.
We applied an improved modified polynomial (IModPoly) baseline algorithm [26] to remedy fluorescence noise. The baseline-corrected spectra of each substance are shown in Fig. 3. Thereafter, for each substance, 20 spectra were selected to create a synthetic training dataset, and the remaining 30 spectra were retained in the test set. Specifically, the test dataset consisted of 300 measured noisy spectra, with each class containing 30 spectra.
In this study, we consider a situation in which the number of measured noisy spectra is limited, which is insufficient for effective model training. As previously mentioned, 20 noisy spectra measured per class were selected to create a synthetic training dataset. Our goal was to obtain 200 synthetic spectra per class (2,000 synthetic spectra in total). Therefore, we synthesized 10 spectra from every spectrum in the training set using different augmentation methods and compared the results of each method.
We investigated five spectral augmentation techniques: weighted sums of spectral signals imitated chemical backgrounds from substrates and air, extended multiplicative signal augmentation (EMSA) [27], generated Gaussian-distributed noise, and generated Poisson-distributed noise. We determined a data augmentation approach that performed well in classifying noisy Raman spectra against a baseline approach of generating randomly weighted sums of signal intensity.
1) Weighted-Sum Augmentation
Using this method, a synthetic spectrum
2) Background Noise Augmentation
In this study, we were interested in the contaminations found on the following substrates in the hard disk head: aluminum oxide (Al2O3), aluminum oxide coated with diamond- like carbon (Al2O3 + DLC), aluminum oxide-titanium carbide (Al2O3 + TiC), and aluminum oxide-titanium carbide coated with diamond-like carbon (Al2O3 + TiC + DLC). As a result, the measured Raman spectrum of a substance can be affected by substrate and air noise. We recorded 10 spectra for each substrate and air noise. Based on the background noise augmentation, each synthetic spectrum was generated by superimposing the measured, substrate, and air noise spectra.
3) Extended Multiplicative Signal Augmentation
Extended multiplicative signal augmentation (EMSA) is based on the theoretical concepts underlying extended multiplicative signal correction (EMSC), which is a baseline correction algorithm that is applied to Raman and infrared spectral data. The EMSC eliminates unwanted backgrounds, where modes of variation due to chemical, instrumental, and physical background signals can be modeled with a mathematical expression. EMSA uses knowledge from the extracted background noise to suggest spectral augmentation by adding known captured noise variations [27]. To vary the distortion, Gaussian-distributed random numbers with zero mean and varying standard deviations were applied to alter the parameter coefficients in the EMSA model, resulting in various augmented spectra [26].
4) Statistical Noise Augmentation
There are two popular statistical noise models for simulating the behaviors of natural noise that occur in instrumental data acquisition processes [28]: Gaussian and Poisson noise. The Gaussian noise model represents environmental and electrical noises. Based on a Gaussian noise model, random variations drawn from the same statistical distribution were added to the measured spectrum. By contrast, the Poisson noise model denotes signal-dependent shot noise. Random Poisson noise can be simulated by calculating the square root of signal intensity multiplied by a Gaussian random number [29,30].
The range of the signal intensities (either synthetic or measured) varied from one spectrum to another. Min-max normalization was applied to scale the signals into a common range. In this study, we scaled all spectra using min-max normalization over the training and test datasets. Let
We investigated and compared the augmentation methods based on the performance of nine state-of-the-art CNN models: LeNet5 [11], AlexNet [12], VGG16 [13], GoogLeNet [14], ResNet [15], SqueezeNet [16], Xception [17], DenseNet [18], and MobileNet [19]. Because these models were proposed for image processing, their structures originally consisted of two-dimensional (2D) convolutional and 2D pooling layers. For spectral classification, we implemented one-dimensional (1D) versions of these CNN models by replacing all 2D layers with the corresponding 1D layers. Hyperparameters, such as the number of filters and filter sizes, were kept the same as in the original models.
The structure of the CNN model used in the spectral classification is shown in Fig. 4. It consists of a feature extractor and classifier. A CNN model was applied to extract a set of features (
We generated five synthetic training datasets from 200 measured noisy spectra using the weighted sum, background noise, EMSA, Gaussian noise, and Poisson noise methods. The last two methods are statistical noise-augmentation methods. Each training dataset consisted of 2,000 synthetic spectra (10 classes, 200 spectra per class). All the trained models were evaluated using the same test dataset containing 300 measured noisy spectra.
The models were trained by minimizing the categorical cross-entropy. The Adam optimizer was used with the following default values: learning rate = 0.001, β1 = 0.9, β2 = 0.999, and ε = 10⁻⁷. The batch size and the number of epochs were 32 and 30, respectively.
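For reference, the loss and the optimizer update can be written out for a single sample and a single scalar parameter. This is a minimal sketch of the standard textbook formulation of categorical cross-entropy and Adam, using the hyperparameter values listed above as defaults. It is not the training code of the study, and the function names are illustrative.

```python
import math


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, epsilon=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v


# Toy values, not taken from the study.
loss = categorical_cross_entropy([0, 1, 0], [0.25, 0.5, 0.25])
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step, the bias-corrected update has a magnitude close to the learning rate regardless of the gradient scale, which is one reason the default learning rate of 0.001 works across differently scaled models.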
The classification performances were measured using the accuracy score (%), which was obtained from Equation (1):
where
This section compares the five augmentation methods. Table 1 lists the accuracy scores of the nine CNN models trained using five synthetic training datasets from the weighted-sum, background noise, EMSA, Gaussian, and Poisson methods, as explained in Section III-C. The signal-to-noise ratios for the Gaussian and Poisson methods were set to 20 dB. Accuracy scores were obtained by classifying the test dataset, which consisted of 300 measured noisy spectra.
Table 1. Accuracy scores (%) of CNN models trained using synthetic spectra from different augmentation methods. Bold indicates the highest accuracy score for each CNN model.
Augmentation | LeNet5 | AlexNet | VGG16 | GoogLeNet | ResNet | SqueezeNet | Xception | DenseNet | MobileNet |
---|---|---|---|---|---|---|---|---|---|
Weighted Sum | 48.67 | 44.67 | 46.67 | 52.67 | 44.00 | 44.00 | 59.33 | 63.33 | 47.00 |
Background Noise | **88.33** | **82.33** | **69.33** | **77.33** | **79.00** | **73.67** | **88.00** | **82.33** | **77.00** |
EMSA | 51.33 | 45.00 | 44.33 | 33.33 | 21.33 | 19.33 | 48.00 | 52.00 | 16.33 |
Gaussian | 68.67 | 54.33 | 64.33 | 67.67 | 50.00 | 46.33 | 57.33 | 67.67 | 52.00 |
Poisson | 82.67 | 65.00 | 63.67 | 73.67 | 43.00 | 63.67 | 58.67 | 64.67 | 59.00 |
In addition, we compared the accuracy scores with those of CNN models built using the weighted-sum training dataset. In this method, each synthetic spectrum was computed using a random weighted sum of 20 measured noisy spectra of the same class. Similarly, 200 synthetic spectra were generated for each class. The weighted-sum method represents the baseline performance because it eliminates the effects of noise and produces more data with fewer variations.
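The weighted-sum baseline can be sketched as below. The weights are drawn uniformly at random and normalized to sum to one, which is an assumption, because the text only specifies a random weighted sum of 20 spectra of the same class. The function name is illustrative.

```python
import random


def weighted_sum_spectrum(spectra, rng):
    """Combine same-class spectra with random weights normalized to sum to one."""
    # 1.0 - random() lies in (0, 1], so the total is never zero.
    weights = [1.0 - rng.random() for _ in spectra]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_points = len(spectra[0])
    return [sum(w * s[i] for w, s in zip(weights, spectra)) for i in range(n_points)]


# Toy class of 20 noisy copies of one spectrum, not taken from the study.
rng = random.Random(0)
clean = [0.0, 1.0, 5.0, 1.0, 0.0]
noisy_class = [[y + rng.gauss(0.0, 0.2) for y in clean] for _ in range(20)]
synthetic = weighted_sum_spectrum(noisy_class, rng)
```

Because the result is a convex combination, independent noise in the inputs is averaged down. This matches the observation above that the method reduces noise and produces data with fewer variations.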
The following findings were obtained. First, unlike the weighted-sum method, the other augmentation methods created training datasets by adding more variations to the original sample spectra. Consequently, they achieved higher accuracy scores than the weighted-sum method in many cases, with the exception of EMSA. Second, background noise augmentation resulted in the highest accuracy scores among the methods of interest. Third, LeNet5 achieved the highest accuracy scores in many cases, even though it is a simple and shallow CNN model. Finally, LeNet5 trained on the dataset created using the background noise method achieved the highest accuracy score of 88.33%. EMSA provides augmented spectra with known functional variations. Similarly, the Gaussian and Poisson noise methods add known statistical distributions to augmented spectra. In contrast, the background noise method adds practical complexity to the training data. This allowed the classification models to seek and learn from the actual variations within the labeled spectra rather than the underlying backgrounds, making the generated spectra well suited to training robust DL models.
We further analyzed the classification performance using a confusion matrix. Fig. 5 shows the confusion matrix of LeNet5 trained on the dataset created using the background noise method (which offers the best accuracy score, as previously mentioned). The results were based on the classification of the test dataset, which consisted of 30 spectra per substance. The rows represent the actual substances, and the columns denote the predicted substances. The model performed very well in classifying all the substances except POM. Only 17 of the 30 POM spectra were correctly identified, whereas 10 were misclassified as PVC.
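The quantities discussed here can be computed as in the following sketch, which builds a confusion matrix with rows as actual classes and columns as predicted classes, and then derives the overall accuracy and the per-class recall. The function names and the two-class toy labels are illustrative and are not the data of Fig. 5. With the figures reported above, the recall for POM is 17/30, or approximately 56.67%.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy_percent(matrix):
    """Share of correctly classified samples (diagonal over total), in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def recall_percent(matrix, i):
    """Share of class i samples that were predicted as class i, in percent."""
    return 100.0 * matrix[i][i] / sum(matrix[i])


# Toy labels for two classes, not the results of the study.
labels = ["POM", "PVC"]
actual = ["POM"] * 4 + ["PVC"] * 4
predicted = ["POM", "POM", "PVC", "PVC"] + ["PVC"] * 4
matrix = confusion_matrix(actual, predicted, labels)
overall = accuracy_percent(matrix)
pom_recall = recall_percent(matrix, 0)
```

Reading the matrix row by row shows where the errors of a given substance go, which the single accuracy score cannot reveal.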
A class activation map (CAM) [31] was used to understand how a model predicted the output. In spectral classification, CAMs can be applied for building heat maps to show the values and their wavenumbers on which the model focuses when computing the prediction. In Fig. 6, we used HiResCAM [32] to show the CAMs of 10 substances according to LeNet5 trained on the dataset created from the background noise method. Compared with the manner in which human experts recognize spectra, the model clearly focuses on peak patterns to differentiate substances, except for LDPE and HDPE. For LDPE and HDPE, we observed common peak locations with the other substances. Consequently, the model seeks other locations to indicate LDPE and HDPE.
Practical approaches for Raman spectral data augmentation, specifically to improve the DL classification of small contaminants in HDDs, were investigated. Nine deep learning classification models were applied to the data from five different augmentation techniques. The results were examined by comparing the classification performance and robustness of the classifiers, assuming that they resulted from the variations generated during augmentation. Augmented data must mimic the real-world behavior of noise in Raman spectra to achieve good classification performance. Consequently, an appropriate noise augmentation model can enhance the performance of classifiers when the amount of original spectral data is insufficient for training the DL models. Background noise augmentation resulted in the highest accuracy scores, and LeNet5 achieved the highest accuracy scores in many cases, even though it is a simple CNN model. LeNet5 combined with the background noise method achieved the highest accuracy score of 88.33%. The CAM of LeNet5 provides evidence of the robustness of the model because the highlighted wavenumbers and peaks match those used by chemists to characterize these substances.
Moreover, the gain in accuracy of DL models, regardless of the model, indicates that background noise augmentation has the potential to characterize the variations of noisy spectra acquired from small particles in real-world practice. This allows classification models to learn actual variations overlaid on noisy backgrounds. Hence, the results demonstrate the application of computed noise and data augmentation in a data-centric approach to artificial intelligence, which allows the training of high-performance DL models, even if the available spectral dataset is small.
This study paves the way for future research in two directions. The first is a detailed analysis of the statistical noise behavior underlying Raman spectra, which could yield better approximations of real spectra and further improve the performance of DL classification. The second is the development of classification pipelines that apply deep learning models for denoising to improve the quality of the spectra before attempting to classify them. The latter may also help classification models achieve greater accuracy, while the clean spectra are informative by-products that can be investigated using various techniques.