Journal of information and communication convergence engineering 2023; 21(2): 139-144
Published online June 30, 2023
https://doi.org/10.56977/jicce.2023.21.2.139
© Korea Institute of Information and Communication Engineering
Correspondence to : Youngbok Cho (E-mail: ybcho@dju.ac.kr)
Department of Information Security, Daejeon University, Daejeon 300-716, Republic of Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Engineers prefer deep neural networks (DNNs) for solving computer vision problems. However, DNNs pose two major problems. First, neural networks require large amounts of well-labeled data for training. Second, covariate shift is common in computer vision tasks. Domain adaptation has been proposed to mitigate this problem. Recent work on adversarial-learning-based unsupervised domain adaptation (UDA) has explained transferability and enabled the model to learn robust features. Despite this advantage, current methods do not guarantee the distinguishability of the latent space unless they consider class-aware information of the target domain. Furthermore, source and target examples alone cannot efficiently extract domain-invariant features from the encoded spaces. To alleviate the problems of existing UDA methods, we propose applying mixup regularization to the adversarial discriminative domain adaptation (ADDA) method. We validated the effectiveness and generality of the proposed method by performing experiments under three adaptation scenarios: MNIST to USPS, SVHN to MNIST, and MNIST to MNIST-M.
Keywords: Adversarial discriminative domain adaptation (ADDA), Domain-invariant, Mixup, Unsupervised domain adaptation (UDA)
Adversarial discriminative domain adaptation (ADDA) was first introduced in [1] and is categorized as a feature-based domain adaptation method. In recent unsupervised domain adaptation (UDA) techniques, adversarial learning has been employed to learn invariant features across domains. Using a min-max two-player game in which generators learn to confuse domain discriminators, ADDA models learn discriminative representations and invariant features. Although ADDA is highly successful at various tasks, such as image classification and semantic segmentation, it has two major shortcomings. First, domain classifiers only differentiate features as sources or targets, and do not consider task-specific decision boundaries between classes. Second, it attempts to perfectly align the feature distributions between different domains on the basis of the characteristics of each domain. To address these UDA issues, we applied category mixup regularization (CMR) to ADDA in this study. We implemented it on the source and target domain samples to make the model predictions insensitive to perturbations. This approach helps the model learn meaningful representations across domains. Our main contributions are summarized as follows:
- We propose a category mixup-regularized learning technique to improve ADDA generality, which maps the source and target domains to a common latent code and transfers the learned knowledge from the annotated source domain to the label-free target domain.
- Experimental results show that regularization techniques enable deep-learning models to learn better discriminative and domain-invariant representations, which effectively reduce large domain shifts, such as that in the SVHN to MNIST adaptation task.
- We validated the effectiveness and generality of the proposed method by evaluating it on three adaptation tasks. According to the experimental results, our improved ADDA method is a strong competitor to previous UDA methods.
- Regularization techniques using CMR also address the problem of unstable training in deep networks.
The remainder of this paper is organized as follows: In Section 2, interpolation-based regularization and domain adaptation are described in related works, and in Section 3, the proposed method is presented in detail. In Section 4, the experimental setup and results are described, and the conclusions are presented in Section 5.
Interpolation-based regularization [2,3] was recently proposed for supervised learning and can mitigate model uncertainty in adversarial training and vulnerability to adversarial samples. In particular, [3] suggested that mixup is a superior approach, training models on virtual examples formed as linear combinations of inputs and labels. This is achieved by regularizing the output of the model for a convex combination of two inputs. Mixup [4] was used to enforce consistent predictions across the data distribution. In recent years, several versions of mixup have been studied. Among them, manifold mixup [1] was proposed to interpolate latent space representations.
Domain adaptation (DA) transfers knowledge from one domain with abundant labeled data to another domain with no labeled data. DA discovers shared latent representations across the source and target domains and adapts them to minimize marginal and conditional inconsistencies in the embedded space. Recently, UDA methods have been considered for learning domain-invariant representations using adversarial training. Cycle-consistent adversarial domain adaptation (CyCADA) [5] transforms feature representations at both the pixel and feature levels while enforcing pixel and semantic consistency. The cycle-consistency term forces the cross-domain transformation to retain pixel information, whereas the semantic-loss term forces semantic consistency. Multi-adversarial domain adaptation (MADA) [6] exploits the generative interconnection between feature representations and label predictions to execute adversarial learning. Generate-to-adapt (GTA) [7] introduces a novel adversarial image generation method that learns a latent representation that reduces the domain shift between the source and target domains. Pixel-level adaptation [8] transforms source-domain images into target-like images. Two main approaches have been attempted. The first is to determine a mapping function from the source domain representation to the target domain representation. The second is to identify domain-invariant representations that are not restricted to one domain. Existing DAs learn the target encoders according to the target task. However, because it learns to separate the target and reduces the domain shift between real and fake images at the pixel level, it is less affected by the tasks.
Mixup [3] was proposed as a data- and domain-agnostic technique. Deep neural networks (DNNs) can memorize corrupted labels. To address this problem, mixup combines the features of different samples such that the network is not overconfident about the relationship between features and labels. This technique can be extended to various data modalities such as computer vision, natural language processing, and speech recognition.
Equation (1) expresses the mixed input vector and the corresponding one-hot label encoding. The mixing values lie in the range [0,1] and are sampled from the beta distribution.
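As an illustration of the mixup operation described above, the following minimal sketch assumes the standard formulation of [3]: a mixed input is λ·x_i + (1−λ)·x_j, the mixed label is the same convex combination of the one-hot labels, and λ is drawn from Beta(α, α). The function names `mixup` and `one_hot`, the flat-list representation of inputs, and the default `alpha=1.0` are assumptions made for this example and are not taken from the paper.

```python
import random


def one_hot(label, num_classes):
    """Return the one-hot encoding of an integer class label as a list."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Mix two examples (flat lists) with a coefficient lam ~ Beta(alpha, alpha).

    alpha is a hypothetical hyperparameter; the value used in the paper is not stated.
    """
    lam = rng.betavariate(alpha, alpha)  # lam lies in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    rng = random.Random(0)
    x_mix, y_mix, lam = mixup([0.0, 1.0], one_hot(0, 3), [1.0, 0.0], one_hot(2, 3), rng=rng)
    print(lam, x_mix, y_mix)
```

Because the label is mixed with the same coefficient as the input, the mixed label remains a valid probability vector that sums to one.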
Neural networks share two common attributes for successful application. First, to reduce the mean error over the training data, a deep learning model is trained using empirical risk minimization (ERM) [3]. Second, the performance of these state-of-the-art neural networks [3] is improved by increasing the amount of training data. However, the predictions of neural networks trained with ERM are unstable when tested on samples with different data distributions. In recent years, mixup regularization has been used to alleviate this problem when training neural networks using annotated data.
A feature-based domain adaptation method called ADDA [1] ensures that target mapping reduces the distance between the source and target domains. Training robust deep networks using generative-adversarial loss reduces domain shifts and allows the generator network to produce synthetic samples across various domains without sharing weights with the discriminator. To fool the discriminator network, ADDA learns a novel feature representation. This feature representation is learned from two encoder networks. The source encoder is designed to yield superior features for learning tasks in the source domain. The task is learned using a task network conditioned on the source encoder. The target encoder and discriminator are trained adversarially. The parameters of the four networks are optimized using a two-step algorithm, in which the source and task networks are fitted first, as shown by the following optimization problem:
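For reference, the two-step objective of the original ADDA formulation [1] is commonly written as below. The notation is assumed here for illustration and may differ from that of Eq. (2): M_s and M_t denote the source and target encoders, C the task (classification) network, D the discriminator, and K the number of classes. The first line is the supervised source step; the remaining two lines are the adversarial step with the inverted-label GAN loss.

```latex
\begin{aligned}
\min_{M_s,\,C}\ \mathcal{L}_{\mathrm{cls}}(X_s, Y_s)
  &= -\,\mathbb{E}_{(x_s, y_s)\sim(X_s, Y_s)}
     \sum_{k=1}^{K} \mathbb{1}_{[k = y_s]} \log C\big(M_s(x_s)\big)_k, \\
\min_{D}\ \mathcal{L}_{\mathrm{adv}_D}(X_s, X_t, M_s, M_t)
  &= -\,\mathbb{E}_{x_s\sim X_s}\big[\log D\big(M_s(x_s)\big)\big]
     -\,\mathbb{E}_{x_t\sim X_t}\big[\log\big(1 - D\big(M_t(x_t)\big)\big)\big], \\
\min_{M_t}\ \mathcal{L}_{\mathrm{adv}_M}(X_s, X_t, D)
  &= -\,\mathbb{E}_{x_t\sim X_t}\big[\log D\big(M_t(x_t)\big)\big].
\end{aligned}
```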
In Eq. (2), (
This section describes the training process for improved ADDA using CMR. Fig. 1 illustrates the network structure proposed in this study. Feature extractors learn domain-invariant features while extracting high-level representations of scenes from input images (e.g., the shape of the digit 1 in the foreground of an image). The discriminator determines whether the input data come from the source domain or the target domain by analyzing their distribution. This step is performed without sharing weights with the feature extractor network. Digit classification is performed using a classifier. According to Fig. 1, the CMR mechanism is implemented in both the source and target domains to improve the latent representation. To improve the generality of deep-learning models, it is essential to use a consistent regularization technique. The CMR forces model prediction to be insensitive to perturbed inputs by separating feature representations. During the testing phase, we evaluate the improved ADDA generality on the target test data using the learned feature extractor and classifier. We provide more details on updating the gradients of the feature extractor, classifier, and discriminator networks in the Optimization Problem section.
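The adversarial game between the discriminator and the feature extractor can be sketched as follows. This is a minimal illustration that assumes the inverted-label GAN loss of the original ADDA [1], with the discriminator outputting the probability that a feature comes from the source domain; the function names and the `EPS` constant are hypothetical, and the exact losses of the proposed method may differ.

```python
import math

EPS = 1e-12  # small constant to avoid log(0); an assumption of this sketch


def discriminator_loss(d_source, d_target):
    """Discriminator loss: source features should score near 1, target features near 0.

    d_source and d_target are lists of discriminator outputs in [0, 1].
    """
    src = -sum(math.log(p + EPS) for p in d_source) / len(d_source)
    tgt = -sum(math.log(1.0 - p + EPS) for p in d_target) / len(d_target)
    return src + tgt


def target_encoder_loss(d_target):
    """Feature-extractor loss: target features try to be scored as source (inverted labels)."""
    return -sum(math.log(p + EPS) for p in d_target) / len(d_target)


if __name__ == "__main__":
    # A fully confused discriminator outputs 0.5 everywhere.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(target_encoder_loss([0.5, 0.5]))
```

The two losses pull in opposite directions on the target features, which is what drives the extractor toward domain-invariant representations.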
Our method uses source and target images to update the gradients of the feature extractors, classifiers, and discriminators. We perform adversarial learning to learn invariant features across domains. For the source domain, because the label information is available, we employ mixed-source examples and their corresponding labels to implement consistent predictions.
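The two mixup consistency terms can be sketched as follows. This is an illustration under stated assumptions rather than the exact losses of the paper: the source term is taken to be the cross-entropy between the prediction on a mixed input and the mixed one-hot labels, and the target term, for which no labels exist, is taken to be the L1 distance between the prediction on a mixed input and the same mixture of the two individual predictions. The toy linear classifier `predict` stands in for the feature extractor followed by the classifier, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Toy linear classifier: a stand-in for feature extractor + classifier."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)


def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length lists."""
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]


def source_mixup_loss(weights, x_a, y_a, x_b, y_b, lam):
    """Cross-entropy between the prediction on the mixed input and the mixed labels."""
    p = predict(weights, mix(x_a, x_b, lam))
    y = mix(y_a, y_b, lam)
    return -sum(t * math.log(q + 1e-12) for t, q in zip(y, p))


def target_mixup_loss(weights, x_a, x_b, lam):
    """L1 distance between the prediction on the mixed input and the mixed predictions."""
    p_mix = predict(weights, mix(x_a, x_b, lam))
    p_interp = mix(predict(weights, x_a), predict(weights, x_b), lam)
    return sum(abs(u - v) for u, v in zip(p_mix, p_interp))


if __name__ == "__main__":
    W = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical 2-class weights
    print(source_mixup_loss(W, [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], 0.3))
    print(target_mixup_loss(W, [1.0, 0.0], [0.0, 1.0], 0.3))
```

The target term is zero whenever the model behaves linearly between the two target samples, so minimizing it penalizes abrupt prediction changes along the interpolation path.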
In this section, we provide details regarding the experimental environment setup and results. Fig. 1 shows that our network consists of three components: a generator, a classifier, and a discriminator. All the networks were built using DNNs. The experiments were run on the Google Colab platform, whose session provided a two-core Intel CPU with a clock speed of 2.30 GHz, 12 GB of memory, and a Tesla T4 GPU; the Linux operating system version was 20.04.5 LTS. All the experiments were implemented in PyTorch (Torch version 1.13.0+cu116). All the networks were trained from scratch using the Adam optimizer. The learning rate and batch size were set to 0.0001 and 64, respectively.
The aim of this study was to improve the ADDA method. Further investigation was conducted to determine how CMR enhances the generality of the ADDA method. Table 1 shows that all adaptation tasks performed poorly after the category mixup regularization was removed. Specifically, the performance of the M→MM and S→M adaptation tasks degraded significantly by 9% and 8%, respectively. The results also suggest that the proposed approach improves the generality of the ADDA method. Compared to previous state-of-the-art approaches, such as pixel-level adaptation [8], CyCADA [5], and MCD [9], our approach is considered a strong competitor. We further visualized the feature distribution of the target domain in the MNIST to MNIST-M (M→MM), MNIST to USPS (M→U), and SVHN to MNIST (S→M) unsupervised adaptation tasks using the t-SNE tool. The t-SNE visualizations are shown in Fig. 3. The t-SNE plots indicate that the features of the different classes were clearly distinguishable in the latent space. All the experiments show that the proposed consistent mixup regularization enables deep learning models to learn meaningful representations.
Table 1. Comparison of the performance between our proposed method and existing UDA methods.
The training algorithm of our improved ADDA approach, which uses the Adam optimizer, is presented as pseudocode in Algorithm 1. In every iteration, samples from the source domain are first mixed at the pixel level and fed into the feature extractor to obtain an embedding, which is then utilized by the classifier to predict the source label. The second step involves mixing the target samples at the pixel level and feeding them into the feature extractor to obtain the embedding. Because label information is unavailable, we calculate the L1-norm as a penalty term. The CMR losses are calculated using Eqs. (3) and (4) for the source and target domains, respectively. In addition, the discriminator network gradients are updated using the
Three adaptation scenarios were considered: M→U, S→M, and M→MM. The MNIST dataset contains 60,000 handwritten digit images. SVHN contains 73,257 examples of digits and numbers in natural settings, and USPS contains 7,291 images. MNIST-M contains 60,000 MNIST images with color patches. The number of input channels was set to three for the experiments on the S→M and M→MM adaptation tasks, whereas it was set to one for the M→U adaptation task only. We evaluated the trained model on the target test datasets of MNIST, USPS, and MNIST-M at the end of the training phase.
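One common way to feed single-channel digits into a three-channel network, as needed when a grayscale dataset such as MNIST is paired with a color dataset, is to replicate the grayscale plane three times. The sketch below assumes this approach, which is not stated in the paper; the function name `to_three_channels` and the nested-list image layout (H × W in, 3 × H × W out) are hypothetical.

```python
def to_three_channels(gray):
    """Replicate an H x W grayscale image (list of rows) into a 3 x H x W image.

    Each channel is an independent copy, so later edits to one channel
    do not affect the others or the input.
    """
    return [[row[:] for row in gray] for _ in range(3)]


if __name__ == "__main__":
    image = [[0.0, 0.5], [1.0, 0.25]]
    rgb = to_three_channels(image)
    print(len(rgb), len(rgb[0]), len(rgb[0][0]))
```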
The big-data environment in the era of the “fourth Industrial Revolution” is changing significantly. Well-annotated datasets are required to improve deep-learning models. UDA learns domain-invariant features using adversarial learning. In this paper, we propose an innovative learning mechanism for the ADDA method. By generating neighboring training examples, mixup regularization can regularize the output distribution. It penalizes per-pixel inconsistent predictions between neighboring and training examples. This learning mechanism also considers the categorical distribution of the target domain during the training phase. Specifically, deep neural network training was stabilized after plugging the CMR into ADDA. The t-SNE plots also indicated that the models could learn distinctive representations. According to our comparison results, our proposed method can compete with existing UDA methods.
This study was supported by a Daejeon University Research Grant (2021).
Bayarchimeg Kalina completed her bachelor’s degree in computer science from the Information and Network Security Department of Mongolian University of Science and Technology. She is now a master's student in the Computer and Information Security Department of Daejeon University. Currently, she is interested in deep learning, meta-learning, and natural language processing.
Young-Bok Cho is an Associate Professor at the Department of Computer & Information Security, Daejeon University, Daejeon, South Korea. She received her M.S. and Ph.D. degrees in Computer Science from Chungbuk National University, in 2006 and 2012, respectively. She has been involved in national research and has been an expert adviser of SME in Korea to follow-up research projects and an evaluator of Korea R&D, a national research proposal. She is currently working in the department of information security, Daejeon University. Her research interests include network security, medical information security, and medical image processing.
Bayarchimeg Kalina1 and Youngbok Cho1*
1Department of Information Security, Daejeon University, Daejeon 300-716, Republic of Korea