Journal of information and communication convergence engineering 2024; 22(2): 165-171
Published online June 30, 2024
https://doi.org/10.56977/jicce.2024.22.2.165
© Korea Institute of Information and Communication Engineering
Correspondence to: Thanh-Nghi Do (E-mail: dtnghi@ctu.edu.vn)
Department of Computer Networks, Can Tho University, Can Tho 94165, Viet Nam
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In this study, we present a novel approach for enhancing chest X-ray image classification (normal, Covid-19, edema, mass nodules, and pneumothorax) by combining contrastive learning and machine learning algorithms. A vast amount of unlabeled data was leveraged to learn representations, which improves data efficiency and addresses the limited availability of labeled X-ray images. Our approach involves training classification algorithms on the features extracted from a linear fine-tuned Momentum Contrast (MoCo) model. The MoCo architecture with a Resnet34, Resnet50, or Resnet101 backbone is trained to learn features from unlabeled data. Instead of only fine-tuning the linear classifier layer on the MoCo-pretrained model, we propose training nonlinear classifiers as substitutes for softmax in deep networks. The empirical results show that the linear fine-tuned ImageNet-pretrained models achieved a highest accuracy of 82.9% and the linear fine-tuned MoCo-pretrained models a highest accuracy of 84.8%, whereas our proposed method achieved a highest accuracy of 87.9%.
Keywords: Chest X-ray image, Contrastive learning, Image classification, Self-supervised learning
The lung is an important human organ. Lung diseases can affect health and even lead to death. According to the World Health Organization, deaths attributed to lung-related diseases exceeded 3.3 million in 2017. The Covid-19 pandemic has caused more than 6.9 million deaths and approximately 768 million infections as of June 28, 2023 (WHO, https://www.who.int). Chest X-ray is the most cost-effective diagnostic tool and a common method for diagnosing and screening lung diseases. However, disease diagnosis based on chest X-ray images requires highly skilled radiologists. The potential subjectivity of lung disease detection from X-ray images can lead to inaccurate diagnostic results and less effective treatments. Therefore, a system that can assist in disease diagnosis from chest radiographs will provide significant treatment cost benefits to patients and improve their physical and mental health.
Deep learning has been widely applied in recent years to problems involving medical image data such as X-ray, computed tomography (CT), and magnetic resonance imaging (MRI) images, and has achieved promising results [1-3]. However, deep learning requires large amounts of annotated data. Unfortunately, the availability of labeled medical data is limited owing to a number of factors such as a lack of domain experts, privacy regulations, and expensive and time-consuming data labeling processes. Self-supervised learning is an alternative approach in which the wealth of available unlabeled data is leveraged to learn useful representations or features without explicit human annotation. Impressive results have been obtained with this approach by using unlabeled images to generate pretrained models, which were subsequently fine-tuned using a limited amount of labeled data [4-7]. The pretrained model is constructed by maximizing the agreement between different views of the same image and minimizing the agreement between different images based on a loss function. In the cited studies, self-supervised pretraining improved accuracy compared to purely supervised learning on labeled data [4-8].
In this study, a novel approach is proposed for improving the X-ray image classification of normal lungs and lungs with four lung diseases comprising Covid-19, edema, mass nodule, and pneumothorax. In our approach, a self-supervised learning technique is combined with supervised learning algorithms to efficiently classify X-ray images of lung diseases. The linear fine-tuned model obtained from Momentum Contrast (MoCo) contrastive learning is used as a feature extractor for labeled X-ray images. The extracted features are then used to train classification algorithms comprising support vector machine (SVM) [9], LightGBM [10], XGBoost [11], and CatBoost [12], which substitute for softmax in deep networks. The experimental results show that our proposed approach achieves better accuracy than linear fine-tuned MoCo-pretrained models.
The remainder of this paper is organized as follows: The related work on X-ray image lung disease classification is briefly discussed in Section II. Our proposed method is presented in Section III and the experimental results in Section IV. The conclusions and future work are presented in Section V.
Deep learning techniques have been applied to lung disease detection from chest X-ray images. In [13], a convolutional neural network was proposed for a 12-class image classification problem and achieved an accuracy of 86%. Chouhan et al. [14] investigated the combination of five deep learning models with transfer learning for pneumonia detection in chest X-ray images and achieved an accuracy of 96.4% using their ensemble model. CheXNeXt, a CNN architecture for classifying 14 different pathologies from chest X-ray images, was developed in [15]. Bhandary et al. suggested a deep learning framework based on a modified version of AlexNet and an SVM to detect lung abnormalities in chest X-ray and CT images [16]. A novel method that incorporates local variance analysis and a probabilistic neural network for classifying lung carcinomas with 92% accuracy was introduced in [17]. A transfer learning method involving two convolutional neural networks (VGG16 and InceptionV3) was applied in [18] for pneumonia classification in chest X-ray images.
The outbreak of the Covid-19 pandemic has resulted in many studies on diagnosing Covid-19 through the application of deep learning to chest X-ray images. In [19], an SVM model was employed on top of deep networks for detecting Covid-19 in chest X-ray images and a highest accuracy of 96.16% was achieved. In [20], Afshar et al. proposed a framework based on capsule networks named COVID-CAPS for diagnosing Covid-19 from X-ray images. An accuracy of 95.7%, sensitivity of 90%, and specificity of 95.8% were achieved. A new deep learning framework called COVIDX-Net was presented in [21], in which seven deep convolutional neural network architectures were used to classify X-ray images. The highest F1-scores of 89% and 91% were achieved for normal patients and patients infected by Covid-19, respectively. Ozturk et al. developed a model for automatic Covid-19 diagnosis using X-ray images as input [22]. Their system achieved an accuracy of 98.08% for two-class scenarios (Covid-19 and No-Findings) and 87.02% for multiclass scenarios (Covid-19, No-Findings, and pneumonia).
Self-supervised learning has recently received considerable attention from the machine learning community because of its ability to construct pretrained models from unlabeled datasets and improve performance in downstream tasks such as fine-tuning on labeled data. This method is especially crucial for addressing the scarcity of annotated data and improving the accuracy of classification models in medical imaging. Chen et al. proposed a self-supervised learning framework called SimCLR, which improved on earlier self-supervised methods on ImageNet [6,7]. In [4,5], the use of MoCo for contrastive learning was proposed and competitive results were achieved by fine-tuning a linear classifier using the ImageNet dataset. In [8], Sowrirajan et al. proposed a method called MoCo-CXR for X-ray image classification based on MoCo contrastive learning. This method was designed for detecting several lung diseases such as pleural effusion, tuberculosis, and atelectasis. MoCo-CXR performed better than an ImageNet-pretrained model with only fine-tuning.
Promising results have been achieved in self-supervised learning methods such as MoCo [4,5] by leveraging unlabeled data to generate a pretrained model in which a visual representation encoder is trained based on a loss function. In the MoCo architecture, features are learnt from unlabeled data by training two encoders comprising the encoder and the momentum encoder. Both encoders have the same architecture, which is typically a deep neural network such as a convolutional neural network. MoCo is a method for building large and consistent dictionaries for learning from unlabeled data through a contrastive loss function called InfoNCE, which is applied to measure the agreement between positive and negative image pairs. The positive image pairs are created by applying two data-augmentation operators on the same image. The dictionary is used as a queue for samples and updated by enqueuing the current mini-batch and dequeuing the oldest mini-batch. The training process involves updating the parameters of the encoder to minimize the contrastive loss using backpropagation. At the same time, the momentum encoder parameters are updated by applying an exponential moving average to the parameters of the encoder. Labeled data is subsequently used to fine-tune the MoCo-pretrained model.
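The three mechanisms described above (the InfoNCE loss, the momentum update of the second encoder, and the queue-based dictionary) can be illustrated with a minimal sketch in plain Python. This is not the implementation used in this study. The temperature of 0.07, the momentum coefficients, the queue length, and the toy two-dimensional vectors are illustrative assumptions, and keys are enqueued one at a time here rather than per mini-batch.

```python
import math
from collections import deque


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE for one query: negative log-softmax of the positive logit."""
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Fixed-size FIFO dictionary: appending beyond maxlen drops the oldest key.
queue = deque(maxlen=4)
for key in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.6, 0.8]):
    queue.append(key)

# A query close to its positive key gives a small loss; a distant one does not.
loss_aligned = info_nce([1.0, 0.1], [1.0, 0.0], list(queue))
loss_misaligned = info_nce([1.0, 0.1], [-1.0, 0.0], list(queue))

# Toy "parameters" of the momentum encoder drift slowly toward the encoder.
new_key_params = momentum_update([0.0, 1.0], [1.0, 1.0], m=0.9)
```

Only the encoder receives gradients from the loss. The momentum encoder changes slowly, which keeps the keys stored in the queue consistent with one another.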
The overall flowchart of the X-ray image-based lung disease classification process is shown in Fig. 1. The CheXpert dataset [24] is first used to train the MoCo architecture with a backbone based on the Resnet34, Resnet50, or Resnet101 deep learning network architectures [23]. The backbone is initialized with ImageNet-pretrained parameters and then trained with MoCo contrastive learning using image rotation and image flip transformations. MoCo provides several data augmentation methods for unlabeled data, such as random crop, grayscale, jitter, horizontal flip (MoCo v1 [4]), and Gaussian blurring (MoCo v2 [5]). However, because X-ray images are grayscale images and disease diagnosis from X-ray images depends on specific parts of the image, the grayscale, jitter, and blurring transformations are unsuitable: they may alter the image content on which the label depends. We therefore applied two types of data augmentation, random rotation (10°) and horizontal flipping, to create pairs of positive images for contrastive learning through a loss function in a similar manner to [8]. The model obtained from MoCo after the first step is denoted as Model 1.
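The creation of a positive pair can be sketched as follows, using plain Python lists as a stand-in for grayscale images. This is an illustrative sketch, not the pipeline used in this study. The nearest-neighbour rotation, the flip probability of 0.5, the zero fill value, and the function names are all assumptions.

```python
import math
import random


def horizontal_flip(image):
    """Mirror each row of a 2-D grayscale image (a list of rows)."""
    return [row[::-1] for row in image]


def rotate(image, degrees, fill=0):
    """Rotate about the image centre using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = int(round(cos_t * dx + sin_t * dy + cx))
            sy = int(round(-sin_t * dx + cos_t * dy + cy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def random_view(image, rng, max_degrees=10.0, flip_prob=0.5):
    """One augmented view: rotation within +/- max_degrees, then a random flip."""
    view = rotate(image, rng.uniform(-max_degrees, max_degrees))
    if rng.random() < flip_prob:
        view = horizontal_flip(view)
    return view


def positive_pair(image, seed=0):
    """Two independently augmented views of the same image."""
    rng = random.Random(seed)
    return random_view(image, rng), random_view(image, rng)


image = [[r * 4 + c for c in range(4)] for r in range(4)]
view_q, view_k = positive_pair(image, seed=0)
```

Both views keep the original image size, so they can be passed to the encoder and the momentum encoder as the query and key, respectively.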
Supervised learning was performed by fine-tuning Model 1. All the layers of the backbone model were frozen and a linear classifier layer was trained. The labeled dataset was divided into three subsets comprising the training, validation, and test sets. The training and validation sets were passed through Model 1 to fine-tune the linear layer and obtain Model 2. The test set was used for evaluation. We experimented with fine-tuning linear classifiers on both an ImageNet-pretrained model and a MoCo model pretrained with X-ray images.
Our goal is to use the linear fine-tuned MoCo Model 2 to improve X-ray image classification by integrating it with nonlinear classifiers. In other words, the nonlinear classifiers substitute for softmax in the deep networks. In our approach, Model 2 is used as an extractor to extract features (representations) from the labeled dataset. The SVM, LightGBM, XGBoost, and CatBoost nonlinear classifiers are subsequently trained using the extracted features.
The SVM algorithm [9] with a radial basis function (RBF) kernel is a powerful algorithm for multiclass classification. The RBF kernel can represent complex decision boundaries in feature space, which makes the algorithm effective for nonlinear data. The hyperparameters C and γ need to be tuned for optimal performance. LightGBM [10] is a gradient boosting framework developed by Microsoft and known for its high speed and efficiency. It uses techniques such as gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to achieve fast training and low memory usage while maintaining high accuracy. Extreme gradient boosting (XGBoost) [11] is a scalable and efficient gradient boosting system used widely owing to its performance and flexibility. The tree construction process is optimized using a regularized objective function and tree pruning techniques to achieve high prediction accuracy and robustness against overfitting. CatBoost [12] is a gradient boosting algorithm developed by Yandex with native support for categorical features. It employs advanced strategies such as ordered boosting and various regularization techniques to achieve high accuracy with minimal hyperparameter tuning.
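To make the role of γ concrete, the RBF kernel K(x, y) = exp(−γ‖x − y‖²) can be written in a few lines of Python. This sketch shows only the kernel function, not an SVM solver. The example vectors are made up, and γ = 0.0001 is the value reported later in the experiments.

```python
import math


def rbf_kernel(x, y, gamma=1e-4):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x = [0.0, 10.0, 20.0]
y = [0.0, 10.0, 120.0]
k_same = rbf_kernel(x, x)  # identical points have similarity 1.0
k_far = rbf_kernel(x, y)   # squared distance 10,000 gives exp(-1)
```

A small γ makes the kernel decay slowly with distance, so each training point influences a wide region of the feature space. A larger γ gives more local and more flexible decision boundaries.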
We demonstrate that better results can be achieved by performing feature extraction with Model 2 to train a nonlinear classifier compared to fine-tuning a MoCo model without trained nonlinear classifiers. The detailed experimental results are presented in Section IV.
An actual chest X-ray dataset was obtained from public sources. The CheXpert X-ray dataset [24] was used as an unlabeled dataset to train the contrastive learning with MoCo. This dataset was published by a Stanford University research team in 2019. We used 120,000 images from CheXpert for training with MoCo.
The labeled dataset was assembled from published datasets [24-34] and covers five classes: healthy patients (normal), patients infected by Covid-19, and patients with three other lung diseases (edema, mass nodule, and pneumothorax). It contains 98,996 images in total, each tagged with one of the five classes. Examples of chest X-ray images in the five classes are shown in Fig. 2. All the images were resized to 224 × 224 pixels. The labeled dataset was divided into three subsets with 70% of the images in the training set, 15% in the validation set, and 15% in the test set. The details of the dataset are listed in Table 1.
Table 1. Labeled chest X-ray dataset
Label | Train set | Valid set | Test set |
---|---|---|---|
Normal | 18,425 | 3,949 | 3,948 |
Covid-19 | 14,252 | 3,054 | 3,054 |
Edema | 23,240 | 4,980 | 4,980 |
Mass-nodule | 4,077 | 873 | 874 |
Pneumothorax | 9,303 | 1,993 | 1,994 |
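The counts in Table 1 can be checked against the stated total of 98,996 images and the 70/15/15 split with a short script. The counts below are copied from Table 1. The variable names are illustrative.

```python
# Per-class counts copied from Table 1: (train, valid, test).
TABLE_1 = {
    "Normal": (18425, 3949, 3948),
    "Covid-19": (14252, 3054, 3054),
    "Edema": (23240, 4980, 4980),
    "Mass-nodule": (4077, 873, 874),
    "Pneumothorax": (9303, 1993, 1994),
}

# Column totals for the train, valid, and test sets.
split_totals = [sum(counts[i] for counts in TABLE_1.values()) for i in range(3)]
grand_total = sum(split_totals)
split_fractions = [round(total / grand_total, 3) for total in split_totals]

# Share of each class in the whole labeled dataset.
class_shares = {
    label: round(sum(counts) / grand_total, 3) for label, counts in TABLE_1.items()
}
```

The totals are 69,297, 14,849, and 14,850 images, which sum to 98,996 and correspond to the 70%, 15%, and 15% split. The class shares also show that the dataset is imbalanced: mass-nodule images account for roughly 6% of the data, whereas edema images account for about a third.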
The X-ray image lung disease classification program was written in Python. The Resnet34, Resnet50, and Resnet101 [23] deep learning network architectures were implemented using the Keras, TensorFlow [35], Scikit-learn [36], and PyTorch libraries. All the experimental results were obtained on a computer running Ubuntu 20.04.5 with an Intel(R) Core(TM) i5-10400 CPU @ 2.90 GHz × 12, 16 GB of RAM, and a 12 GB GDDR6 NVIDIA GeForce RTX 3060 with 3584 CUDA cores.
The training process for lung disease classification in chest radiographs comprises three main stages. In the first stage, we trained MoCo with 120,000 images extracted from the CheXpert dataset [24]. The hyperparameters for contrastive learning with MoCo comprise a batch size of 32, learning rate of 10⁻³, momentum of 0.9, weight decay of 10⁻³, the Adam optimizer, and 20 epochs. The checkpoint obtained in the first stage was used to fine-tune the linear layer on the labeled dataset in the second stage. Training was performed using three backbones comprising Resnet34, Resnet50, and Resnet101.
In the third stage, the features that the linear fine-tuned MoCo model extracted from the training set were used to determine the best hyperparameters for the SVM model: a nonlinear RBF kernel with γ = 0.0001 and a positive cost constant C = 10⁵, which controls the trade-off between margin size and errors. LightGBM and CatBoost were trained with a max_depth of 10, a learning_rate of 0.1, and the objective set to multiclass. XGBoost was trained with the objective set to multi:softprob, a learning_rate of 0.1, and a max_depth of 8.
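For reference, the third-stage settings can be collected in one configuration mapping. This is a sketch under stated assumptions rather than the configuration file used in this study. The cost constant is read as 10⁵ and the XGBoost objective as multi:softprob. The keys follow the parameter names that the scikit-learn, LightGBM, CatBoost, and XGBoost Python packages are understood to accept, with the CatBoost names depth and loss_function assumed to correspond to the max_depth and objective mentioned in the text. No library is imported, so nothing is fitted here.

```python
# Third-stage classifier settings as reported in the text (assumed mapping
# onto library parameter names; verify against each library's documentation).
CLASSIFIER_PARAMS = {
    "SVM": {"kernel": "rbf", "gamma": 1e-4, "C": 1e5},
    "LightGBM": {"objective": "multiclass", "max_depth": 10, "learning_rate": 0.1},
    "CatBoost": {"loss_function": "MultiClass", "depth": 10, "learning_rate": 0.1},
    "XGBoost": {"objective": "multi:softprob", "max_depth": 8, "learning_rate": 0.1},
}


def describe(name):
    """Return a one-line summary of the settings for one classifier."""
    params = CLASSIFIER_PARAMS[name]
    settings = ", ".join(f"{key}={value!r}" for key, value in params.items())
    return f"{name}: {settings}"


summaries = [describe(name) for name in CLASSIFIER_PARAMS]
```

Each inner dictionary could be unpacked into the corresponding estimator's constructor, for example as keyword arguments to an RBF-kernel support vector classifier.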
The classification results from the various methods are presented in Table 2 and Fig. 3. Model 0 (M0) was trained by fine-tuning a linear layer in an ImageNet-pretrained model using the labeled dataset. Model 2 (M2) was obtained by fine-tuning the linear layer in the MoCo model using chest X-ray images. We experimented with feature extraction using the M0 and M2 models and trained SVM, LightGBM, XGBoost, and CatBoost classifiers.
Table 2. Classification results on the test set (%)

(a) Resnet34

Methods | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
Model 0 (M0) | 77.5 | 72.4 | 67.8 | 69.6 |
Features(M0)+SVM | 83.0 | 78.6 | 78.0 | 78.0 |
Features(M0)+LightGBM | 82.1 | 78.4 | 72.8 | 74.6 |
Features(M0)+CatBoost | 81.2 | 77.0 | 71.8 | 73.4 |
Features(M0)+XGBoost | 81.7 | 78.6 | 72.4 | 74.2 |
Model 2 (M2) | 82.1 | 78.8 | 74.8 | 75.6 |
Features(M2)+SVM | 86.7 | 84.0 | 82.4 | 83.0 |
Features(M2)+LightGBM | 86.5 | 84.4 | 80.8 | 82.4 |
Features(M2)+CatBoost | 85.4 | 82.4 | 79.0 | 80.2 |
Features(M2)+XGBoost | 86.3 | 84.0 | 80.4 | 81.8 |

(b) Resnet50

Methods | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
Model 0 (M0) | 82.9 | 79.2 | 80.0 | 79.4 |
Features(M0)+SVM | 83.7 | 80.0 | 80.4 | 80.0 |
Features(M0)+LightGBM | 84.4 | 82.2 | 77.6 | 79.2 |
Features(M0)+CatBoost | 83.2 | 80.2 | 75.8 | 77.2 |
Features(M0)+XGBoost | 84.4 | 82.6 | 77.4 | 79.2 |
Model 2 (M2) | 84.8 | 81.8 | 80.0 | 80.8 |
Features(M2)+SVM | 87.7 | 85.4 | 84.0 | 84.6 |
Features(M2)+LightGBM | 87.4 | 85.4 | 82.2 | 83.4 |
Features(M2)+CatBoost | 86.3 | 83.6 | 80.4 | 81.6 |
Features(M2)+XGBoost | 87.3 | 85.4 | 81.8 | 83.0 |

(c) Resnet101

Methods | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
Model 0 (M0) | 82.5 | 78.8 | 77.8 | 78.2 |
Features(M0)+SVM | 83.5 | 80.0 | 79.4 | 79.8 |
Features(M0)+LightGBM | 84.1 | 81.4 | 76.6 | 78.2 |
Features(M0)+CatBoost | 82.6 | 78.8 | 74.6 | 75.8 |
Features(M0)+XGBoost | 83.7 | 81.0 | 76.0 | 77.6 |
Model 2 (M2) | 84.6 | 81.6 | 80.2 | 81.0 |
Features(M2)+SVM | 87.4 | 85.2 | 83.2 | 84.2 |
Features(M2)+LightGBM | 87.9 | 86.2 | 83.4 | 84.6 |
Features(M2)+CatBoost | 87.2 | 85.2 | 82.4 | 83.6 |
Features(M2)+XGBoost | 87.8 | 86.0 | 83.4 | 84.4 |
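The four metrics reported in Table 2 can be computed from predicted and true labels as in the sketch below. The averaging scheme for precision, recall, and F1-score is not stated here, so macro averaging over classes is assumed. The toy labels are made up and are not results from this study.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n_classes = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


y_true = ["normal", "normal", "edema", "edema", "covid", "covid"]
y_pred = ["normal", "edema", "edema", "edema", "covid", "normal"]
metrics = macro_metrics(y_true, y_pred)
```

Under macro averaging every class contributes equally, so a small class such as mass-nodule weighs as much as a large one. This is one possible reason why recall and F1-score in Table 2 can sit well below accuracy on an imbalanced dataset.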
The comprehensive X-ray image lung disease classification results are presented in Table 2 and Fig. 3. The experimental results on the three backbone architectures show that our proposed method, in which features are extracted from a linear fine-tuned MoCo model combined with a trained nonlinear classifier, achieved better results compared to solely fine-tuning a linear classifier on the ImageNet and MoCo pretrained models. In the experiment, the SVM, LightGBM, CatBoost, and XGBoost classifiers were trained on features extracted from the linear fine-tuned ImageNet and MoCo models pretrained on X-ray images. The proposed method achieved improved accuracy on all three network architectures and four classification algorithms for both M0 and M2. For the linear fine-tuned MoCo model with a Resnet34 backbone, the SVM classifier achieved the highest accuracy of 86.7%, followed by LightGBM (86.5%), XGBoost (86.3%), and CatBoost (85.4%). The same ranking order for accuracy holds for Resnet50 combined with the same classifiers. Accuracies exceeding 87% were obtained by using features extracted by M2 (Resnet101) to train the four classifiers with the highest accuracy obtained from LightGBM (87.9%) followed by XGBoost (87.8%), whereas the remaining classifiers provided accuracies of 87.2% to 87.4%.
We found that, compared to the MoCo model with only linear fine-tuning, our proposed method improved the accuracy on the test set by at least 2.5 percentage points in every case except M2_Resnet50 + CatBoost (1.5 percentage points), as detailed in Table 3 and Fig. 4. The SVM classifier improved the classification accuracy by 4.6, 2.9, and 2.8 percentage points compared with the linear fine-tuned MoCo model with a Resnet34, Resnet50, and Resnet101 backbone, respectively. All four classifiers improved the accuracy of the Resnet34 backbone by more than 3.0 percentage points, with improvements of 4.6, 4.4, 4.2, and 3.3 percentage points for SVM, LightGBM, XGBoost, and CatBoost, respectively.
Table 3. Accuracy improvement on the test set of our proposed method compared to the linear fine-tuned MoCo model (%)
Method | Resnet34 | Resnet50 | Resnet101 |
---|---|---|---|
Features(M2)+SVM | 4.6 | 2.9 | 2.8 |
Features(M2)+LightGBM | 4.4 | 2.6 | 3.3 |
Features(M2)+CatBoost | 3.3 | 1.5 | 2.6 |
Features(M2)+XGBoost | 4.2 | 2.5 | 3.2 |
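The entries of Table 3 are the differences between the accuracy of each classifier trained on M2 features and the accuracy of the linear fine-tuned M2 model in Table 2. The following sketch recomputes them from the accuracy values copied from Table 2. The variable names are illustrative.

```python
# Test accuracy (%) of the linear fine-tuned MoCo model (M2), from Table 2.
LINEAR_M2 = {"Resnet34": 82.1, "Resnet50": 84.8, "Resnet101": 84.6}

# Test accuracy (%) of classifiers trained on M2 features, from Table 2.
FEATURES_M2 = {
    "SVM": {"Resnet34": 86.7, "Resnet50": 87.7, "Resnet101": 87.4},
    "LightGBM": {"Resnet34": 86.5, "Resnet50": 87.4, "Resnet101": 87.9},
    "CatBoost": {"Resnet34": 85.4, "Resnet50": 86.3, "Resnet101": 87.2},
    "XGBoost": {"Resnet34": 86.3, "Resnet50": 87.3, "Resnet101": 87.8},
}

# Improvement in percentage points, rounded to one decimal as in Table 3.
gains = {
    classifier: {
        backbone: round(accuracy - LINEAR_M2[backbone], 1)
        for backbone, accuracy in accuracies.items()
    }
    for classifier, accuracies in FEATURES_M2.items()
}

all_gains = [gain for per_backbone in gains.values() for gain in per_backbone.values()]
smallest_gain, largest_gain = min(all_gains), max(all_gains)
```

The recomputed values match Table 3. The smallest improvement is 1.5 percentage points (CatBoost with Resnet50) and the largest is 4.6 percentage points (SVM with Resnet34).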
We presented a novel approach for improving the performance of X-ray image classification by incorporating self-supervised learning and classification algorithms. Contrastive learning was employed to learn features (representations) from the abundant pool of unlabeled data to enhance data efficiency and address the limited availability of labeled X-ray images. We gathered two datasets comprising an unlabeled dataset (120,000 images) for self-supervised learning and a labeled dataset (98,996 images) with five classes. A linear fine-tuned MoCo model was used to extract features for training nonlinear classifiers (SVM, LightGBM, CatBoost, and XGBoost) to improve classification accuracy. The results of experiments with three ResNet architectures show that the linear fine-tuned ImageNet-pretrained models, with the exception of Resnet34 (77.5%), achieved accuracies of at least 82.5% on the test set. Although the linear fine-tuned MoCo models with Resnet34, Resnet50, and Resnet101 backbones achieved accuracies of 82.1, 84.8, and 84.6%, respectively, our proposed method further increased the accuracy by 1.5 to 4.6 percentage points compared to MoCo with only fine-tuned linear layers. In combination with the SVM, LightGBM, XGBoost, and CatBoost classification algorithms, our proposed approach improved the accuracy by 4.6, 4.4, 4.2, and 3.3 percentage points, respectively, for a Resnet34 backbone. Our method outperformed solely fine-tuning a linear classifier layer on the MoCo-pretrained model, with a highest accuracy of 87.9%.
In the near future, we intend to collect more chest X-ray images of patients with other lung diseases and conduct experiments on the combination of other self-supervised learning methods with other deep networks.
Tri-Thuc Vo was born in Vinhlong in 1989. He received his MSc degree in informatics from the University of Brest in 2018. He is currently a lecturer at the College of Information Technology, Cantho University, Vietnam. His research interests include medical data analysis and machine learning.
Thanh-Nghi Do was born in Cantho in 1974. He received his PhD degree in informatics from the University of Nantes in 2004. He is currently an associate professor at the College of Information Technology, Cantho University, Vietnam. He is also an associate researcher at UMI UMMISCO 209 (IRD/UPMC), Sorbonne University, and the Pierre and Marie Curie University, France. His research interests include data mining with support vector machines, kernel-based methods, decision tree algorithms, ensemble-based learning, and information visualization. He has served on the program committees of international conferences and is a reviewer for journals in his fields of expertise.
Tri-Thuc Vo1 and Thanh-Nghi Do2,3*
1Department of Computer Science, Can Tho University, Can Tho 94000, Viet Nam
2Department of Computer Networks, Can Tho University, Can Tho 94000, Viet Nam
3UMI UMMISCO 209, IRD/UPMC, Paris 75000, France
Correspondence to:Thanh-Nghi Do (E-mail: dtnghi@ctu.edu.vn)
Department of Computer Networks, Can Tho University, Can Tho 94165, Viet Nam
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In this study, we present a novel approach for enhancing chest X-ray image classification (normal, Covid-19, edema, mass nodules, and pneumothorax) by combining contrastive learning and machine learning algorithms. A vast amount of unlabeled data was leveraged to learn representations so that data efficiency is improved as a means of addressing the limited availability of labeled data in X-ray images. Our approach involves training classification algorithms using the extracted features from a linear fine-tuned Momentum Contrast (MoCo) model. The MoCo architecture with a Resnet34, Resnet50, or Resnet101 backbone is trained to learn features from unlabeled data. Instead of only fine-tuning the linear classifier layer on the MoCo-pretrained model, we propose training nonlinear classifiers as substitutes for softmax in deep networks. The empirical results show that while the linear fine-tuned ImageNet-pretrained models achieved the highest accuracy of only 82.9% and the linear fine-tuned MoCo-pretrained models an increased highest accuracy of 84.8%, our proposed method offered a significant improvement and achieved the highest accuracy of 87.9%.
Keywords: Chest X-ray image, Contrastive learning, Image classification, Self-supervised learning
The lung is an important human organ. Lung diseases can affect health and even lead to death. According to the World Health Organization, the number of deaths attributed to lung-related diseases exceeded 3.3 million individuals in 2017. The Covid-19 pandemic has caused more than 6.9 million deaths and approximately 768 million infections as of June 28, 2023 (WHO, https://www.who.int). Chest X-ray is the most cost-effective diagnostic tool and a common method for diagnosing and screening lung diseases. However, disease diagnosis based on chest X-ray images requires highly skilled radiologists. The potential subjectivity of lung diseases detection from X-ray images can lead to inaccurate diagnostic results and less effective treatments. Therefore, a system that can assist in disease diagnosis from chest radiographs will provide significant treatment cost benefits to patients and improve their physical and mental health.
Deep learning has been widely applied to solve problems related to medical image data such as X-ray, computed tomography (CT), and magnetic resonance imaging (MRI) images in recent years and achieved promising results [1-3]. However, deep learning requires large amounts of annotated data. Unfortunately, the availability of labeled medical data is limited owing to a number of factors such as a lack of domain experts, privacy regulations, and expensive and timeconsuming data labeling processes. Self-supervised learning is an alternative approach in which the wealth of available unlabeled data is leveraged for the learning of useful representations or features without explicit human annotation. Impressive results have been obtained in this approach by utilizing unlabeled data images to generate pretrained models, which were subsequently fine-tuned using a limited amount of labeled data [4-7]. The pretrained model was constructed by maximizing the agreement between different views of the same image and minimizing the agreement between different images based on a loss function. Self-supervised learning offers improved accuracy compared to supervised learning using labeled data [4-8].
In this study, a novel approach is proposed for improving the X-ray image classification of normal lungs and lungs with four lung diseases comprising Covid-19, edema, massnodule, and pneumothorax. In our approach, a self-supervised learning technique is combined with supervised learning algorithms to efficiently classify X-ray images of lung diseases. The linear fine-tuned model obtained from Momentum Contrast (MoCo) contrastive learning is used as a feature extractor for labeled X-ray images. The extracted features are then trained on classification algorithms comprising support vector machine (SVM) [9], LightGBM [10], XGBoost [11], and CatBoost [12], which are used to substitute for softmax in deep networks. The experimental results show that our proposed approach achieves better accuracy than those of linear fine-tuned MoCo-pretrained models.
The remainder of this paper is organized as follows: The related work on X-ray image lung disease classification is briefly discussed in Section II. Our proposed method is presented in Section III and the experimental results in Section IV. The conclusions and future work are presented in Section V.
Deep learning techniques have been applied to lung disease detection from chest X-ray images. A convolutional neural network for image classification problems with 12 classes was proposed and an accuracy of 86% achieved [13]. Chouhan et al. [14] investigated the combination of five deep learning models with transfer learning for pneumonia detection in chest X-ray images and achieved an accuracy of 96.4% using their ensemble model. A CNN architecture called CheXNeXt for classifying 14 different pathologies using chest X-ray images as the input was developed [15]. Bhandary et al. suggested a deep learning framework based on a modified version of AlexNet and a SVM to detect lung abnormalities in chest X-ray and CT images [16]. A novel method that incorporates local variance analysis and a probabilistic neural network for classifying lung carcinomas with 92% accuracy was introduced in [17]. A transfer learning method involving two convolutional neural networks (VGG16 and InceptionV3) was applied in [18] for pneumonia classification in chest X-ray images.
The outbreak of the Covid-19 pandemic has resulted in many studies on diagnosing Covid-19 through the application of deep learning to chest X-ray images. In [19], a SVM model was employed on top of deep networks for detecting Covid-19 in chest X-ray images and the highest accuracy of 96.16% was achieved. In [20], Afshar et al. proposed a model framework based on capsule networks named COVID-CAPS for diagnosing Covid-19 from X-ray images. An accuracy of 95.7%, sensitivity of 90%, and specificity of 95.8% were achieved. A new deep learning framework called COVIDX-Net was presented in [21], in which seven deep convolutional neural network architectures were used to construct a framework for classifying X-ray images. The highest F1-scores of 89% and 91% were achieved for normal patients and patients infected by Covid-19, respectively. Ozturk et al. developed a model for automatic Covid-19 diagnosis using X-ray images as input [22]. Their system achieved an accuracy of 98.08% for two-class scenarios (Covid-19 and No-Findings) and 87.02% for multiclass scenarios (Covid-19, No-Findings, and pneumonia).
Self-supervised learning has recently received considerable attention from the machine learning community because of its ability to construct trained models from unlabeled datasets and improve performance in downstream tasks such as fine-tuning labeled data. This method is especially crucial for addressing the scarcity of annotated data and improving the accuracy of classification models in medical imaging. Chen et al. proposed a self-learning framework called Sim-CLR, which is an improvement over ImageNet [6,7]. In [4,5], the use of MoCo for contrastive learning was proposed and competitive results achieved by fine-tuning a linear classifier using the ImageNet dataset. In [8], Sowrirajan proposed a method called MoCo-CXR for X-ray image classification based on MoCo contrastive learning. This method was designed for detecting several lung diseases such as pleural effusion, tuberculosis, and atelectasis. MoCo-CXR performed better than an ImageNet-pretrained model with only fine-tuning.
Promising results have been achieved in self-supervised learning methods such as MoCo [4,5] by leveraging unlabeled data to generate a pretrained model in which a visual representation encoder is trained based on a loss function. In the MoCo architecture, features are learnt from unlabeled data by training two encoders comprising the encoder and the momentum encoder. Both encoders have the same architecture, which is typically a deep neural network such as a convolutional neural network. MoCo is a method for building large and consistent dictionaries for learning from unlabeled data through a contrastive loss function called InfoNCE, which is applied to measure the agreement between positive and negative image pairs. The positive image pairs are created by applying two data-augmentation operators on the same image. The dictionary is used as a queue for samples and updated by enqueuing the current mini-batch and dequeuing the oldest mini-batch. The training process involves updating the parameters of the encoder to minimize the contrastive loss using backpropagation. At the same time, the momentum encoder parameters are updated by applying an exponential moving average to the parameters of the encoder. Labeled data is subsequently used to fine-tune the MoCo-pretrained model.
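The three mechanisms described above (the InfoNCE loss, the momentum update, and the queue-based dictionary) can be sketched in plain Python. This is a minimal illustration on toy vectors, not the authors' implementation: the function names, the temperature of 0.07, and the momentum coefficient of 0.999 are assumptions chosen for the example.

```python
import math
from collections import deque


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query: cross-entropy with the positive key as target."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp over the positive and all negatives.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# The dictionary is a first-in, first-out queue of keys.
# deque(maxlen=...) discards the oldest entry when a new one is appended.
queue = deque(maxlen=4)
for key in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.6, 0.8]):
    queue.append(key)

query = [0.6, 0.8]
negatives = list(queue)[:3]
loss_aligned = info_nce(query, [0.6, 0.8], negatives)
loss_misaligned = info_nce(query, [-0.6, -0.8], negatives)
```

The loss is small when the query agrees with its positive key and large when it does not, which is the signal that drives the encoder update; only the encoder receives gradients, while the momentum encoder follows it through `momentum_update`.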
The overall flowchart of the X-ray image-based lung disease classification process is shown in Fig. 1. The CheXpert dataset [24] is first used to train the MoCo architecture with a Resnet34, Resnet50, or Resnet101 backbone [23]. The backbone is initialized with ImageNet-pretrained parameters and then trained with MoCo contrastive learning using image rotation and image flip transformations. MoCo provides several data augmentation methods for unlabeled data, such as random crop, grayscale, jitter, horizontal flip (MoCo v1 [4]), and Gaussian blurring (MoCo v2 [5]). However, because X-ray images are grayscale and disease diagnosis from X-ray images depends on specific parts of the image, the grayscale, jitter, and blurring transformations are unsuitable for X-ray images: they may alter the content on which the image label depends. We therefore applied two types of data augmentation, random rotation (10°) and horizontal flipping, to create pairs of positive images for contrastive learning, in a similar manner to [8]. The model obtained from MoCo after this first step is denoted as Model 1.
Supervised learning was then performed by fine-tuning Model 1. All the layers of the backbone model were frozen and only a linear classifier layer was trained. The labeled dataset was divided into three subsets: the training, valid, and test sets. The training and valid sets were passed through Model 1 to fine-tune the linear layer, which yielded Model 2. The test set was used for evaluation. We experimented with fine-tuning linear classifiers on both an ImageNet-pretrained model and a MoCo model pretrained with X-ray images.
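As an illustration of how a positive pair can be produced from one image with only these two augmentations, the following dependency-free sketch rotates and flips a small 2D pixel grid. It assumes that "random rotation (10°)" means an angle drawn uniformly from [-10°, 10°] and that the flip is applied with probability 0.5; neither detail is stated in the text, and the function names are hypothetical. A real pipeline would normally use an image library rather than nested lists.

```python
import math
import random


def horizontal_flip(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def rotate(img, degrees, fill=0):
    """Rotate a 2D image about its centre with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel that lands on (x, y).
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def augment(img, rng, max_degrees=10.0, flip_prob=0.5):
    """One random view: small rotation, then an optional horizontal flip."""
    view = rotate(img, rng.uniform(-max_degrees, max_degrees))
    if rng.random() < flip_prob:
        view = horizontal_flip(view)
    return view


def positive_pair(img, rng):
    """Two independently augmented views of the same image."""
    return augment(img, rng), augment(img, rng)


image = [[r * 4 + c for c in range(4)] for r in range(4)]
view_a, view_b = positive_pair(image, random.Random(0))
```

Both views keep the original size and intensity values, so the grayscale content on which a diagnosis depends is not recoloured or blurred.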
Our goal is to use the linear fine-tuned MoCo Model 2 to improve X-ray image classification by integrating it with nonlinear classifiers. In other words, the nonlinear classifiers substitute for softmax in the deep networks. In our approach, Model 2 is used as an extractor to extract features (representations) from the labeled dataset. The SVM, LightGBM, XGBoost, and CatBoost nonlinear classifiers are subsequently trained using the extracted features.
The SVM algorithm [9] with a radial basis function (RBF) kernel is a powerful algorithm for multiclass classification. The RBF kernel allows it to handle complex decision boundaries in feature space, which makes it effective for data that are not linearly separable. The hyperparameters C and γ (gamma) need to be tuned for optimal performance. LightGBM [10] is a gradient boosting framework developed by Microsoft and known for its high speed and efficiency. It uses techniques such as gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to achieve fast training and low memory usage while maintaining high accuracy. Extreme gradient boosting (XGBoost) [11] is a scalable and efficient gradient boosting system used widely owing to its performance and flexibility. The tree construction process is optimized using a regularized objective function and tree pruning techniques to achieve high prediction accuracy and robustness against overfitting. CatBoost [12] is a gradient boosting algorithm developed by Yandex with built-in support for categorical features. It employs strategies such as ordered boosting and various regularization techniques to achieve high accuracy with minimal hyperparameter tuning.
We demonstrate that training a nonlinear classifier on features extracted with Model 2 achieves better results than using the linear fine-tuned MoCo model alone. The detailed experimental results are presented in Section IV.
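For reference, the RBF kernel that gives the SVM its nonlinear decision boundary is K(x, z) = exp(-γ‖x − z‖²). A minimal sketch follows; the function name and the toy vectors are illustrative only.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value exp(-gamma * ||x - z||^2) for two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
near = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=1e-4)
far = rbf_kernel([0.0, 0.0], [300.0, 400.0], gamma=1e-4)
```

A smaller γ makes the kernel decay more slowly with distance, so each support vector influences a wider region of the feature space; C then controls how strongly training errors are penalized relative to the margin width.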
Real chest X-ray datasets were obtained from public sources. The CheXpert X-ray dataset [24], published by a Stanford University research team in 2019, was used as the unlabeled dataset for contrastive learning with MoCo. We used 120,000 images from CheXpert for training with MoCo.
The labeled dataset was assembled from published datasets [24-34] and has five classes: normal (healthy patients), Covid-19, and three lung diseases other than Covid-19 (edema, mass nodule, and pneumothorax). It contains 98,996 images in total, each tagged with one of the five classes. Examples of chest X-ray images in the five classes are shown in Fig. 2. All the images were resized to 224 × 224 pixels. The labeled dataset was divided into three subsets, with 70% of the images in the training set, 15% in the valid set, and 15% in the test set. The details of the dataset are listed in Table 1.
Table 1. Labeled chest X-ray dataset.
Label | Train set | Valid set | Test set |
---|---|---|---|
Normal | 18,425 | 3,949 | 3,948 |
Covid-19 | 14,252 | 3,054 | 3,054 |
Edema | 23,240 | 4,980 | 4,980 |
Mass-nodule | 4,077 | 873 | 874 |
Pneumothorax | 9,303 | 1,993 | 1,994 |
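The per-class counts in Table 1 are each close to a 70/15/15 split, which is consistent with splitting class by class, although the text does not describe the exact procedure. A minimal sketch of such a stratified split, with a hypothetical function name, an arbitrary seed, and toy labels, could look like this:

```python
import random
from collections import defaultdict


def stratified_split(labels, train_pct=70, valid_pct=15, seed=0):
    """Return index lists (train, valid, test), split separately for each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, valid, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(idxs) * train_pct // 100
        n_valid = len(idxs) * valid_pct // 100
        train += idxs[:n_train]
        valid += idxs[n_train:n_train + n_valid]
        test += idxs[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


labels = ["normal"] * 100 + ["covid19"] * 60 + ["edema"] * 40
train, valid, test = stratified_split(labels)
```

Splitting within each class keeps the class proportions the same across the three subsets, which matters here because the classes are imbalanced (mass-nodule has far fewer images than edema).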
The X-ray image lung disease classification program was written in Python. The Resnet34, Resnet50, and Resnet101 [23] deep learning network architectures were implemented using the Keras, TensorFlow [35], Scikit-learn [36], and PyTorch libraries. All the experimental results were obtained on a computer running Ubuntu 20.04.5 with an Intel(R) Core(TM) i5-10400 CPU @ 2.90 GHz × 12, 16 GB of RAM, and an NVIDIA GeForce RTX 3060 GPU with 12 GB of GDDR6 memory and 3584 CUDA cores.
The training process for lung disease classification in chest radiographs comprises three main stages. In the first stage, we trained MoCo with 120,000 images extracted from the CheXpert dataset [24]. The hyperparameters for contrastive learning with MoCo were a batch size of 32, a learning rate of 10⁻³, a momentum of 0.9, a weight decay of 10⁻³, the Adam optimizer, and 20 epochs. The checkpoint obtained in the first stage was used to fine-tune the linear layer on the labeled dataset in the second stage. Training was performed using three backbones: Resnet34, Resnet50, and Resnet101.
In the third stage, features were extracted from the training set with the linear fine-tuned MoCo model and used to determine the best hyperparameters for the SVM model: a nonlinear RBF kernel with γ = 0.0001 and a positive cost constant C = 10⁵, which controls the trade-off between margin size and errors. LightGBM and CatBoost were trained with a max_depth of 10, a learning_rate of 0.1, and the objective set to multiclass. XGBoost was trained with the objective set to multi:softprob, a learning_rate of 0.1, and a max_depth of 8.
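For convenience, the third-stage settings can be collected in one place. The dictionaries below only restate the values reported above, reading the SVM cost constant as 10^5; the key names follow the wording of the text and are an assumption, since each library may spell its parameters differently (for example, XGBoost's multiclass probability objective is written `multi:softprob`).

```python
# Hyperparameters as reported in the text; key names are illustrative.
SVM_PARAMS = {"kernel": "rbf", "gamma": 1e-4, "C": 1e5}

LIGHTGBM_PARAMS = {"objective": "multiclass", "max_depth": 10, "learning_rate": 0.1}

# The text reports the same settings for CatBoost as for LightGBM.
CATBOOST_PARAMS = {"objective": "multiclass", "max_depth": 10, "learning_rate": 0.1}

XGBOOST_PARAMS = {"objective": "multi:softprob", "max_depth": 8, "learning_rate": 0.1}

CLASSIFIER_PARAMS = {
    "SVM": SVM_PARAMS,
    "LightGBM": LIGHTGBM_PARAMS,
    "CatBoost": CATBOOST_PARAMS,
    "XGBoost": XGBOOST_PARAMS,
}
```

Any setting not listed here is not reported in the text and would have to be treated as a library default.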
The classification results from the various methods are presented in Table 2 and Fig. 3. Model 0 (M0) was trained by fine-tuning a linear layer in an ImageNet-pretrained model using the labeled dataset. Model 2 (M2) was obtained by fine-tuning the linear layer in the MoCo model using chest X-ray images. We experimented with feature extraction using the M0 and M2 models and trained SVM, LightGBM, XGBoost, and CatBoost classifiers.
Table 2. Classification results on the test set (%).
(a) Resnet34
Methods | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
Model 0 (M0) | 77.5 | 72.4 | 67.8 | 69.6 |
Features(M0)+SVM | 83.0 | 78.6 | 78.0 | 78.0 |
Features(M0)+LightGBM | 82.1 | 78.4 | 72.8 | 74.6 |
Features(M0)+CatBoost | 81.2 | 77.0 | 71.8 | 73.4 |
Features(M0)+XGBoost | 81.7 | 78.6 | 72.4 | 74.2 |
Model 2 (M2) | 82.1 | 78.8 | 74.8 | 75.6 |
Features(M2)+SVM | 86.7 | 84.0 | 82.4 | 83.0 |
Features(M2)+LightGBM | 86.5 | 84.4 | 80.8 | 82.4 |
Features(M2)+CatBoost | 85.4 | 82.4 | 79.0 | 80.2 |
Features(M2)+XGBoost | 86.3 | 84.0 | 80.4 | 81.8 |
(b) Resnet50
Methods | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
Model 0 (M0) | 82.9 | 79.2 | 80.0 | 79.4 |
Features(M0)+SVM | 83.7 | 80.0 | 80.4 | 80.0 |
Features(M0)+LightGBM | 84.4 | 82.2 | 77.6 | 79.2 |
Features(M0)+CatBoost | 83.2 | 80.2 | 75.8 | 77.2 |
Features(M0)+XGBoost | 84.4 | 82.6 | 77.4 | 79.2 |
Model 2 (M2) | 84.8 | 81.8 | 80.0 | 80.8 |
Features(M2)+SVM | 87.7 | 85.4 | 84.0 | 84.6 |
Features(M2)+LightGBM | 87.4 | 85.4 | 82.2 | 83.4 |
Features(M2)+CatBoost | 86.3 | 83.6 | 80.4 | 81.6 |
Features(M2)+XGBoost | 87.3 | 85.4 | 81.8 | 83.0 |
(c) Resnet101
Methods | Accuracy | Precision | Recall | F1-score |
---|---|---|---|---|
Model 0 (M0) | 82.5 | 78.8 | 77.8 | 78.2 |
Features(M0)+SVM | 83.5 | 80.0 | 79.4 | 79.8 |
Features(M0)+LightGBM | 84.1 | 81.4 | 76.6 | 78.2 |
Features(M0)+CatBoost | 82.6 | 78.8 | 74.6 | 75.8 |
Features(M0)+XGBoost | 83.7 | 81.0 | 76.0 | 77.6 |
Model 2 (M2) | 84.6 | 81.6 | 80.2 | 81.0 |
Features(M2)+SVM | 87.4 | 85.2 | 83.2 | 84.2 |
Features(M2)+LightGBM | 87.9 | 86.2 | 83.4 | 84.6 |
Features(M2)+CatBoost | 87.2 | 85.2 | 82.4 | 83.6 |
Features(M2)+XGBoost | 87.8 | 86.0 | 83.4 | 84.4 |
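The four columns of Table 2 can be computed from true and predicted labels as sketched below. The text does not state how precision, recall, and F1-score are averaged over the five classes; this sketch assumes macro averaging (an unweighted mean of the per-class values), and the function name and toy labels are hypothetical.

```python
def macro_metrics(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = macro_metrics(y_true, y_pred, classes=[0, 1, 2])
```

Under macro averaging every class counts equally, so a small class such as mass-nodule affects precision, recall, and F1-score as much as a large class, whereas accuracy is dominated by the larger classes.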
The comprehensive X-ray image lung disease classification results are presented in Table 2 and Fig. 3. The experimental results on the three backbone architectures show that our proposed method, in which features are extracted from a linear fine-tuned MoCo model combined with a trained nonlinear classifier, achieved better results compared to solely fine-tuning a linear classifier on the ImageNet and MoCo pretrained models. In the experiment, the SVM, LightGBM, CatBoost, and XGBoost classifiers were trained on features extracted from the linear fine-tuned ImageNet and MoCo models pretrained on X-ray images. The proposed method achieved improved accuracy on all three network architectures and four classification algorithms for both M0 and M2. For the linear fine-tuned MoCo model with a Resnet34 backbone, the SVM classifier achieved the highest accuracy of 86.7%, followed by LightGBM (86.5%), XGBoost (86.3%), and CatBoost (85.4%). The same ranking order for accuracy holds for Resnet50 combined with the same classifiers. Accuracies exceeding 87% were obtained by using features extracted by M2 (Resnet101) to train the four classifiers with the highest accuracy obtained from LightGBM (87.9%) followed by XGBoost (87.8%), whereas the remaining classifiers provided accuracies of 87.2% to 87.4%.
We found that, compared to the MoCo model with only linear fine-tuning, our proposed method improved the accuracy on the test set by at least 2.5 percentage points, except for M2_Resnet50 + CatBoost (1.5 points), as detailed in Table 3 and Fig. 4. The SVM classifier improved the classification accuracy by 4.6, 2.9, and 2.8 points compared with the linear fine-tuned MoCo model with a Resnet34, Resnet50, and Resnet101 backbone, respectively. All four classifiers improved the accuracy of the Resnet34 backbone by more than 3.0 points, with improvements of 4.6, 4.4, 4.2, and 3.3 points for SVM, LightGBM, XGBoost, and CatBoost, respectively.
Table 3. Accuracy improvement on the test set of our proposed method compared to the linear fine-tuned MoCo model (%).
Method | Resnet34 | Resnet50 | Resnet101 |
---|---|---|---|
Features(M2)+SVM | 4.6 | 2.9 | 2.8 |
Features(M2)+LightGBM | 4.4 | 2.6 | 3.3 |
Features(M2)+CatBoost | 3.3 | 1.5 | 2.6 |
Features(M2)+XGBoost | 4.2 | 2.5 | 3.2 |
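Each entry of Table 3 is the difference between a Features(M2) accuracy and the corresponding Model 2 accuracy in Table 2. The short check below reproduces the table from those accuracies; the variable names are illustrative.

```python
# Test-set accuracies (%) copied from Table 2.
M2_ACCURACY = {"Resnet34": 82.1, "Resnet50": 84.8, "Resnet101": 84.6}

FEATURES_M2_ACCURACY = {
    "SVM": {"Resnet34": 86.7, "Resnet50": 87.7, "Resnet101": 87.4},
    "LightGBM": {"Resnet34": 86.5, "Resnet50": 87.4, "Resnet101": 87.9},
    "CatBoost": {"Resnet34": 85.4, "Resnet50": 86.3, "Resnet101": 87.2},
    "XGBoost": {"Resnet34": 86.3, "Resnet50": 87.3, "Resnet101": 87.8},
}

# Absolute improvement in percentage points, rounded to one decimal place.
improvement = {
    clf: {bb: round(acc - M2_ACCURACY[bb], 1) for bb, acc in accs.items()}
    for clf, accs in FEATURES_M2_ACCURACY.items()
}

all_gains = [gain for gains in improvement.values() for gain in gains.values()]
smallest_gain, largest_gain = min(all_gains), max(all_gains)
```

The smallest gain is 1.5 points (CatBoost on Resnet50) and the largest is 4.6 points (SVM on Resnet34); these are absolute differences in accuracy, not relative percentages.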
We presented a novel approach for improving the performance of X-ray image classification by incorporating self-supervised learning and classification algorithms. Contrastive learning was employed to learn features (representations) from the abundant pool of unlabeled data to enhance data efficiency and address the limited availability of labeled data in X-ray images. We gathered two datasets: an unlabeled dataset (120,000 images) for self-supervised learning and a labeled dataset (98,996 images) with five classes. A linear fine-tuned MoCo model was used to extract features for training nonlinear classifiers (SVM, LightGBM, CatBoost, and XGBoost) to improve classification accuracy. The results of experiments with three ResNet architectures show that the linear fine-tuned ImageNet-pretrained models, with the exception of Resnet34 (77.5%), achieved accuracies of at least 82.5% on the test set. Although the linear fine-tuned MoCo models with Resnet34, Resnet50, and Resnet101 backbones achieved noteworthy accuracies of 82.1, 84.8, and 84.6%, respectively, our proposed method further increased the accuracy by 1.5 to 4.6 percentage points (Table 3) compared to MoCo with only fine-tuned linear layers. Through combination with the SVM, LightGBM, XGBoost, and CatBoost classification algorithms, the accuracy for a Resnet34 backbone was improved by 4.6, 4.4, 4.2, and 3.3 points, respectively. Our method achieved superior performance compared to solely fine-tuning a linear classifier layer on the MoCo-pretrained model, with the highest accuracy of 87.9%.
In the near future, we intend to collect more chest X-ray images of patients with other lung diseases and conduct experiments on the combination of other self-supervised learning methods with other deep networks.