Regular paper

Journal of information and communication convergence engineering 2023; 21(2): 110-116

Published online June 30, 2023

https://doi.org/10.56977/jicce.2023.21.2.110

© Korea Institute of Information and Communication Engineering

Design and Implementation of a Body Fat Classification Model using Human Body Size Data

Taejun Lee1, Hakseong Kim1, and Hoekyung Jung1*, Member, KIICE

1Department of Computer Engineering, PaiChai University, Daejeon 155-40, Republic of Korea

Correspondence to: Hoekyung Jung (E-mail: hkjung@pcu.ac.kr, Tel: +82-42-520-5640)
Department of Computer Engineering, PaiChai University, Daejeon, 35345 Republic of Korea

Received: October 30, 2021; Revised: December 6, 2021; Accepted: December 9, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Machine learning is increasingly applied in the healthcare field, and deep learning in particular has been used for tasks such as electrocardiogram examination and body composition analysis with wearable devices such as smart watches. Securing data is the most important step in applying deep learning, and it requires human intervention, such as data classification. In this study, we propose a model that uses K-means clustering to label body fat according to gender and age from body size items, such as chest circumference and waist circumference, and then classifies body fat into five groups, from high risk to low risk, using a convolutional neural network (CNN). In model validation, the average accuracy, precision, and recall all exceeded 95%. The proposed method can therefore support rational decision making in the field of healthcare or obesity analysis.

Keywords: Classification, CNN, fat rate percentage, human body size, K-means clustering

I. INTRODUCTION

Deep learning is being applied in various fields, including healthcare. In particular, as the personal smartphone and wearable device markets grow, services such as body composition analysis, electrocardiogram testing, and diet management for improving eating habits have become easily accessible.

Securing data is important when applying deep learning. In the past, only medical records, medical insurance information, and data measured by users were used; however, information collected through various channels, such as public data, social media services, and genetic data, has recently been used as well [1].

Pre-processing of the collected information is arguably the most important step. It requires human intervention, such as removing missing values or outliers, and it can involve clustering, which belongs to the domain of unsupervised learning [2-6].

In this study, using only easy-to-measure items, such as waist circumference, from the Size Korea public dataset provided by the National Institute of Technology and Standards, the body fat percentage data were clustered according to gender and age to create target labels. Accordingly, we propose a model that includes a pre-processing step to generate the desired target labels.

We expect that a model combining supervised and unsupervised learning can be used in obesity analysis or body shape management services.

II. RELATED WORKS

In this section, we examine the human body size data provided by the National Institute of Technology and Standards and related research.

A. Human Body Size Data

Since 1978, the National Institute of Technology and Standards, governed by the Ministry of Trade, Industry, and Energy, has been promoting the human body dimensional survey dissemination project. The project measures and standardizes the body information of Koreans so that ergonomically designed products that are convenient for Koreans to use can be produced [7].

The latest data are provided in the 7th human body size dataset released in December 2015. A total of 6,413 people were measured, and they were categorized by age and gender as reported in Table 1.

Table 1. Participant groups included in the 7th Korean human body size survey project

Age group    Men      Women    Total
16-19        998      926      1,924
20-29        869      668      1,537
30-39        655      675      1,330
40-49        311      359      670
50-59        221      358      579
60-69        145      228      373
Total        3,199    3,214    6,413


B. Related Research Conducted using the Human Body Size Data

Various studies have been conducted in line with the human body size survey project. One study classified the lower body types of men in their 30s to design appropriate pants patterns [8]. Another study divided the upper body types of male college students in their 20s into four clusters by extracting factors through principal component analysis and K-means cluster analysis [9]. The first study was significant in that it suggested changes in clothing design or dimensions by statistically analyzing changes in human body dimensions.

III. DATA PREPROCESSING AND MODEL DESIGN

In this section, we discuss the overall structure of the proposed model, the pre-processing of the training data, and the model design.

A. Overall Structure

Fig. 1 shows the data pre-processing and model training processes for the human body size data. The data pre-processing procedure is discussed in Section III-B, and the convolutional neural network (CNN) classification process is discussed in Section III-C.

Fig. 1. Overall structure.

B. Data Item Selection and Pre-processing

Several items of the human body size data were originally considered for the model. However, only nine of them were used: bust circumference, under-bust circumference, waist circumference (natural indentation), waist circumference (omphalion), hip circumference, thigh circumference, knee circumference, calf circumference, and minimum leg circumference. The names set by the National Institute of Technology and Standards were used as is. No data were available for the under-bust circumference of men. Fig. 2 depicts the nine items on a model photo.

Fig. 2. Nine human body sizes selected for this study.

The nine items were extracted as columns from the human body size data, and rows with missing values were removed. The data were then standardized so that each item had a mean of 0 and a variance of 1.
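The pre-processing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `drop_incomplete_rows` and `standardize_columns` and the sample measurements are assumptions, and the population standard deviation is used so that each column ends up with a variance of exactly 1.

```python
from statistics import fmean, pstdev


def drop_incomplete_rows(rows):
    """Remove every row that contains a missing value (None)."""
    return [row for row in rows if all(value is not None for value in row)]


def standardize_columns(rows):
    """Scale each column to mean 0 and variance 1 (z-score)."""
    columns = list(zip(*rows))
    means = [fmean(column) for column in columns]
    stds = [pstdev(column) for column in columns]  # population std, assumed non-zero
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical measurements in cm (waist, hip, thigh); not taken from Size Korea.
raw = [
    [78.0, 94.0, 55.0],
    [85.5, 98.5, 58.0],
    [70.0, None, 51.5],  # incomplete row, dropped before scaling
    [92.0, 101.0, 60.5],
    [74.5, 91.0, 53.0],
]

complete = drop_incomplete_rows(raw)
scaled = standardize_columns(complete)
for row in scaled:
    print([round(value, 3) for value in row])
```

A column whose values are all identical has a standard deviation of 0 and would need separate handling; the sketch assumes this does not occur.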

Figs. 3 and 4 show box plots of the data distribution before and after standardization for men in their 20s. The items listed above are shown in order from left to right.

Fig. 3. Box plot for men in their 20s before standardization.

Fig. 4. Box plot for men in their 20s after standardization.

After the above process, clustering was performed according to body fat percentage using the K-means algorithm. Fig. 5 shows the pseudocode of the K-means algorithm [6].

Fig. 5. K-means clustering algorithm.

The elbow method was used to determine the value of k. Fig. 6 shows the results for men in their 20s; similar results were obtained for the other age groups and genders.

Fig. 6. Value of k when the elbow method is applied to men in their 20s.
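To make the procedure concrete, the following sketch runs K-means (Lloyd's algorithm) for several values of k and prints the inertia, that is, the within-cluster sum of squared distances that the elbow method inspects. It is an illustration under stated assumptions, not the implementation used in this study: the data are a toy set of three well-separated groups, and the centroids are seeded with a deterministic farthest-first rule instead of random initialization so that the run is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(axis) / len(members) for axis in zip(*members)])
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = updated
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia


# Toy data: three well-separated groups of four points each (hypothetical).
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
    [20.0, 0.0], [20.0, 1.0], [21.0, 0.0], [21.0, 1.0],
]

inertia_by_k = {k: k_means(points, k)[2] for k in range(1, 7)}
for k, inertia in inertia_by_k.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

On this toy set the inertia falls sharply up to k = 3 and changes little afterwards, which is the "elbow" the method looks for. On real measurements the bend is usually less pronounced and has to be read from a plot such as Fig. 6.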

However, if each group is divided into only three clusters (below-average body fat percentage, average, and above-average), one cluster inevitably contains a large share of the data, which makes detailed classification difficult.

Therefore, in this study, we did not use the value suggested by the elbow method but divided the data into five clusters, from the lowest to the highest body fat percentage; the corresponding results are reported in Table 2.

Table 2. Clustering results (average body fat percentage of each cluster)

Age group    Cluster    Men's fat rate (%)    Women's fat rate (%)
20           Lowest     13.7741935483         24.3294478527
20           Low        16.1882575757         27.9261682242
20           Average    19.6666666666         31.6592039800
20           High       23.8366197183         34.8061538461
20           Highest    29.2052631578         42.7833333333

30           Lowest     14.9828124999         24.7364197530
30           Low        18.3919540229         28.4421874999
30           Average    20.2247311827         31.4851190476
30           High       23.0694267515         35.4985714285
30           Highest    27.6900000000         40.4210526315

40           Lowest     16.5818181818         26.3729411764
40           Low        19.3210526315         29.4008771929
40           Average    20.9579439252         32.4777777777
40           High       24.5845070422         36.7283018867
40           Highest    28.1173913043         42.7222222222

50           Lowest     17.275                28.2071428571
50           Low        18.6491525423         31.0783132530
50           Average    21.3803571428         32.9857142857
50           High       24.2875000000         35.22125
50           Highest    25.0644444444         37.6268292682

60           Lowest     16.3800000000         29.0653061224
60           Low        21.2761904761         33.1511627906
60           Average    23.2384615384         33.7983870967
60           High       26.1259259259         34.7949999999
60           Highest    28.0785714285         39.6866666666


The cluster averages of body fat percentage reported in Table 2 show that, for men and women in their 20s to 40s, the five clusters are evenly spaced.

From the 50s onward, however, the differences between the average body fat percentages of adjacent clusters were not significant. The values shown in bold in Table 2 mark cases where the difference was 1 or less. This is because the number of samples for the 50s and 60s groups was much smaller than for the other groups in the 7th body size dataset, as reported in Table 1.
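The narrow intervals can be checked with simple arithmetic on the Table 2 values. The sketch below assumes that "a difference of 1 or less" refers to the gap between the averages of adjacent clusters; the helper name `narrow_gaps` is hypothetical.

```python
CLUSTERS = ["Lowest", "Low", "Average", "High", "Highest"]

# Cluster averages for the 50s and 60s groups, copied from Table 2.
table2 = {
    ("50s", "men"): [17.275, 18.6491525423, 21.3803571428, 24.2875, 25.0644444444],
    ("50s", "women"): [28.2071428571, 31.0783132530, 32.9857142857, 35.22125, 37.6268292682],
    ("60s", "men"): [16.38, 21.2761904761, 23.2384615384, 26.1259259259, 28.0785714285],
    ("60s", "women"): [29.0653061224, 33.1511627906, 33.7983870967, 34.7949999999, 39.6866666666],
}


def narrow_gaps(means, threshold=1.0):
    """Return (lower cluster, upper cluster, gap) for adjacent clusters
    whose averages differ by at most `threshold` percentage points."""
    gaps = []
    for i in range(len(means) - 1):
        gap = means[i + 1] - means[i]
        if gap <= threshold:
            gaps.append((CLUSTERS[i], CLUSTERS[i + 1], round(gap, 3)))
    return gaps


for group, means in table2.items():
    print(group, narrow_gaps(means))
```

Under this reading, the narrow intervals are High to Highest for men in their 50s, and Low to Average and Average to High for women in their 60s.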

C. CNN Classification Model Design

The CNN classification model was built by adapting the Inception-v2 module developed by Google to fit the dimensions of the human body size dataset [10]. This module was chosen because it extracts features at different levels through parallel branches and then combines them again, an approach that is widely used.

Fig. 7 shows the Inception-v2 module modified to fit the dimensions of the human body data.

Fig. 7. CNN classification model proposed in this study using the Inception-v2 module.

As shown in Fig. 7, the Inception module cannot be imported and used directly from the Keras library, because the input data dimensions must match the dimensions of the preprocessed data.

For men, the input layer has the shape (?, 8, 1): eight items are given as input, namely the nine human body size items without the under-bust circumference. For women, the input layer has the shape (?, 9, 1), and all nine items are used.

The first flow from the left extracts features across three convolutional layers, extending them to 32 dimensions. The second flow extracts features across two convolutional layers, also extending them to 32 dimensions. The third flow uses an average pooling layer to compute summary statistics and halve the number of features, and then extends the result to 32 dimensions. The fourth flow extracts features by extending them to 32 dimensions using only one convolutional layer.

Because all four flows have the same output dimension of 32, they are combined (concatenation layer), flattened into one dimension (flatten layer), and passed through a dense layer with only 50 nodes. A dropout layer that retains only 70% of the nodes (a ratio of 0.7) follows, and the final output has the shape (?, 5) to classify the five groups. All convolutional layers used the rectified linear unit (ReLU) activation.

Because this was a multi-class classification task, the softmax function was used as the final activation function. Assuming that $n$ is the number of neurons in the output layer, $a_k$ is the input signal from the previous layer, and $y_k$ is the $k$-th output, the softmax function can be defined as stated in Eq. (1).

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)} \tag{1}$$
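As a small worked example of Eq. (1), the function below computes the softmax of five hypothetical input signals, one per body fat class. Subtracting the maximum input before exponentiation is a common numerical safeguard added here as an assumption; it cancels in the ratio and leaves the result of Eq. (1) unchanged.

```python
import math


def softmax(a):
    """Softmax of Eq. (1): y_k = exp(a_k) / sum_i exp(a_i)."""
    shift = max(a)  # cancels in the ratio; only prevents overflow in exp
    exps = [math.exp(value - shift) for value in a]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical input signals a_1..a_5 for the five classes.
scores = [2.0, 1.0, 0.1, -1.0, 0.5]
probabilities = softmax(scores)

print([round(p, 4) for p in probabilities])
print("predicted class index:", probabilities.index(max(probabilities)))
```

The outputs are positive and sum to 1, so they can be read as class probabilities, and the largest input always receives the largest probability.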

D. Setting Model Hyperparameters

For the CNN classification model, the categorical cross-entropy error was used as the loss function. Assuming that $k$ indexes the output dimensions, $y_k$ is the value estimated by the neural network, and $t_k$ is the target (the correct answer), the cross-entropy error can be defined as expressed in Eq. (2).

$$\mathrm{CEE} = -\sum_{k} t_k \log y_k \tag{2}$$
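Eq. (2) can be illustrated with a one-hot target and two hypothetical predictions. The small constant `eps` is an assumption added to avoid taking the logarithm of zero; it is not part of Eq. (2).

```python
import math


def cross_entropy(t, y, eps=1e-12):
    """Cross-entropy error of Eq. (2): CEE = -sum_k t_k * log(y_k)."""
    return -sum(t_k * math.log(y_k + eps) for t_k, y_k in zip(t, y))


# One-hot target: the correct class is the third of five.
target = [0, 0, 1, 0, 0]

# Two hypothetical softmax outputs.
confident = [0.02, 0.03, 0.90, 0.03, 0.02]
uniform = [0.2, 0.2, 0.2, 0.2, 0.2]

loss_confident = cross_entropy(target, confident)
loss_uniform = cross_entropy(target, uniform)
print(f"confident: {loss_confident:.4f}, uniform: {loss_uniform:.4f}")
```

With a one-hot target only the term of the correct class remains, so the loss reduces to the negative logarithm of the probability assigned to that class: it is small for the confident prediction and equals log 5 for the uniform one.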

Adam was used as the optimizer, and the number of training iterations (epochs) was set to 100. The batch size was set to 1000 for both training and validation.

For model evaluation, the data were randomly shuffled and split into training and validation sets at a ratio of 80 to 20. Stratified sampling was used to prevent the class ratio from being biased toward either the training or the validation set. Table 3 lists the hardware specifications.

Table 3. Hardware specifications

Component              Specification
CPU                    AMD Ryzen 7 2700X (8 cores, 3.70 GHz)
RAM                    16 GB
GPU                    NVIDIA GeForce GTX 1660 Ti
Storage                SSD 500 GB / HDD 2 TB
OS                     Windows 10
Virtual environment    Anaconda 3
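The stratified 80/20 split described in this subsection can be sketched with the standard library alone. The function `stratified_split` and the class counts are hypothetical; the sketch only shows how splitting each class separately keeps the class ratio the same in both sets.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Return (train_indices, validation_indices), splitting each class
    separately so that both sets keep the original class ratio."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train, validation = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)  # random shuffle within the class
        n_validation = round(len(indices) * validation_fraction)
        validation.extend(indices[:n_validation])
        train.extend(indices[n_validation:])

    rng.shuffle(train)  # mix the classes again after the per-class split
    rng.shuffle(validation)
    return train, validation


# Hypothetical cluster labels for 100 samples with an uneven class ratio.
labels = [0] * 50 + [1] * 30 + [2] * 20
train_idx, val_idx = stratified_split(labels)

print("train:", Counter(labels[i] for i in train_idx))
print("validation:", Counter(labels[i] for i in val_idx))
```

Each class contributes 20% of its samples to the validation set, so a rare cluster cannot end up entirely in one of the two sets by chance.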

IV. MODEL TRAINING RESULTS

In this section, we present the performance indicators, the final training results of the model, a discussion of those results, and the inference results.

A. Performance Indicators

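Accuracy, precision, recall, and final loss were used as performance indicators. Accuracy is defined in Eq. (3), and precision and recall are defined in Eq. (4), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The loss function was described in Section III-D.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{3}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$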
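The following sketch evaluates Eqs. (3) and (4) on a small hypothetical set of predictions, treating one class as positive and all others as negative. How the indicators were aggregated over the five classes is not stated here, so this one-vs-rest counting is an assumption made for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count (TP, FP, FN, TN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Eq. (3)."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fp):
    """Eq. (4), left."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Eq. (4), right."""
    return tp / (tp + fn)


# Hypothetical labels: 1 is the positive class, 0 stands for all other classes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=1)
print("accuracy:", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
```

Precision and recall are undefined when their denominators are zero, for example when a class is never predicted; the sketch assumes this does not occur.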

B. Learning Results

Table 4 reports the final learning results for both men and women in their 20s to 60s according to the performance indicators. The last column represents the learning time.

Table 4. Final learning results

Age group    Gender    Accuracy    Precision    Recall    Loss      Time [s]
20           Men       0.8745      0.8962       0.8600    0.2943    5.52
20           Women     0.9015      0.9249       0.8864    0.2359    3.29
30           Men       0.9173      0.9324       0.9019    0.2217    3.31
30           Women     0.9259      0.9451       0.9241    0.1955    3.13
40           Men       0.8992      0.9053       0.8871    0.2450    3.28
40           Women     0.8819      0.9164       0.8750    0.2681    3.14
50           Men       0.8864      0.9048       0.8636    0.2709    3.65
50           Women     0.8951      0.9197       0.8811    0.2711    3.26
60           Men       0.9204      0.9174       0.8850    0.2676    3.33
60           Women     0.9344      0.9543       0.9126    0.2138    3.38


Table 5 summarizes the verification results based on the performance indicators.

Table 5. Final verification results

Age group    Gender    Accuracy    Precision    Recall    Loss
20           Men       0.9770      0.9770       0.9770    0.1079
20           Women     0.9548      0.9548       0.9548    0.1408
30           Men       0.9694      0.9694       0.9694    0.1018
30           Women     0.9555      0.9555       0.9555    0.1014
40           Men       0.9354      0.9354       0.9354    0.1675
40           Women     0.9166      0.9166       0.9166    0.1553
50           Men       0.9772      0.9772       0.9772    0.1405
50           Women     0.9861      0.9861       0.9861    0.0893
60           Men       1.0         1.0          1.0       0.1245
60           Women     0.9347      0.9318       0.8913    0.1489


C. Discussion

Averaged over all age and gender groups, training of the proposed model yielded an accuracy of 0.90366, a precision of 0.92165, a recall of 0.88768, a loss of 0.249, and a learning time of 3.529 seconds.

In validation, the corresponding averages were 0.96067, 0.96038, 0.95633, and 0.12779 for accuracy, precision, recall, and loss, respectively.
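These averages can be reproduced from the per-group values in Tables 4 and 5, assuming an unweighted mean over the ten age and gender groups. The dictionary layout below is only an illustration.

```python
from statistics import fmean

# Per-group values copied from Table 4 (training), ordered
# 20s men, 20s women, 30s men, ..., 60s women.
training = {
    "accuracy": [0.8745, 0.9015, 0.9173, 0.9259, 0.8992, 0.8819, 0.8864, 0.8951, 0.9204, 0.9344],
    "precision": [0.8962, 0.9249, 0.9324, 0.9451, 0.9053, 0.9164, 0.9048, 0.9197, 0.9174, 0.9543],
    "recall": [0.8600, 0.8864, 0.9019, 0.9241, 0.8871, 0.8750, 0.8636, 0.8811, 0.8850, 0.9126],
    "loss": [0.2943, 0.2359, 0.2217, 0.1955, 0.2450, 0.2681, 0.2709, 0.2711, 0.2676, 0.2138],
    "time_s": [5.52, 3.29, 3.31, 3.13, 3.28, 3.14, 3.65, 3.26, 3.33, 3.38],
}

# Per-group values copied from Table 5 (validation), same order.
validation = {
    "accuracy": [0.9770, 0.9548, 0.9694, 0.9555, 0.9354, 0.9166, 0.9772, 0.9861, 1.0, 0.9347],
    "precision": [0.9770, 0.9548, 0.9694, 0.9555, 0.9354, 0.9166, 0.9772, 0.9861, 1.0, 0.9318],
    "recall": [0.9770, 0.9548, 0.9694, 0.9555, 0.9354, 0.9166, 0.9772, 0.9861, 1.0, 0.8913],
    "loss": [0.1079, 0.1408, 0.1018, 0.1014, 0.1675, 0.1553, 0.1405, 0.0893, 0.1245, 0.1489],
}

training_mean = {name: fmean(values) for name, values in training.items()}
validation_mean = {name: fmean(values) for name, values in validation.items()}

for name, value in training_mean.items():
    print(f"training {name}: {value:.5f}")
for name, value in validation_mean.items():
    print(f"validation {name}: {value:.5f}")
```

The unweighted means agree with the figures quoted above. The only small deviation is the training loss, which averages to about 0.2484 from the rounded table entries against the quoted 0.249, presumably a rounding difference.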

This result indicates that the clustering algorithm labeled the data appropriately using the nine items of the human body size data, after which the average body fat percentage of each cluster was calculated.

However, the classification cannot be considered fully correct for participants in their 50s and 60s, because for these groups the training data do not cluster the average body fat percentage as clearly as for participants in their 20s to 40s.

D. Inference Results

The trained clustering models for each age and gender group were saved as Pickle files, and the CNN classification models were saved as H5 files, so that inference could be run separately. The average body fat percentage of each cluster, by age and gender, was saved in a CSV file.

The environment in which the user enters data and checks the inference result was implemented with the Flask web framework. The processing procedure comprises only the loading and execution of the aforementioned files. Fig. 8 shows the form used to input the data, and Fig. 9 shows the corresponding results.

Fig. 8. Human body size input form using Flask.

Fig. 9. Result for the test represented in Fig. 8.

V. CONCLUSIONS

In this study, nine easy-to-measure items, such as the chest and waist circumferences, were taken from the human body size dataset, a public dataset provided by the National Institute of Technology and Standards. For each age and gender group, clustering was performed using the K-means algorithm, and the average body fat percentage of each cluster was calculated to obtain meaningful clustering results.

Based on these results, we proposed a model that modifies the Inception-v2 module according to the data shape and classifies the data into five groups, from the highest to the lowest body fat percentage. In validation, average accuracy, precision, and recall of more than 95% were obtained.

In the 50s and 60s groups, however, the clusters were not separated as clearly as in the 20s to 40s groups.

Future research should address this limitation by using a better clustering algorithm or CNN architecture. In addition, we expect that rational decision-making will be possible in the field of healthcare or obesity analysis by using a device that measures circumferences through 3D scanning.

REFERENCES

  1. Smart & Company Ltd., Convergence of artificial intelligence and healthcare, growing the technology commercialization market [Internet]. Available: http://m.elec4.co.kr/article/articleView.asp?idx=26993.
  2. J. Kim, B. Kang, and H. Jung, Determination of coagulant input rate in water purification plant using K-means algorithm and GBR algorithm, Journal of the Korea Institute of Information and Communication Engineering, vol. 25, no. 6, pp. 792-798, Jun. 2021. DOI: 10.6109/jkiice.2021.25.6.792.
  3. P.-H. Huynh, V. H. Nguyen, and T.-N. Do, Enhancing gene expression classification of support vector machines with generative adversarial networks, Journal of Information and Communication Convergence Engineering, vol. 17, no. 1, pp. 14-20, Mar. 2019. DOI: 10.6109/jicce.2019.17.1.14.
  4. R. Chefira and S. Rakrak, A knowledge extraction pipeline between supervised and unsupervised machine learning using Gaussian mixture models for anomaly detection, Journal of Computing Science and Engineering, vol. 15, no. 1, pp. 1-17, Mar. 2021. DOI: 10.5626/JCSE.2021.15.1.1.
  5. J. Li, G. Huang, and Y. Zhou, A sentiment classification approach of sentences clustering in webcast barrages, Journal of Information Processing Systems, vol. 16, no. 3, pp. 718-732, Jun. 2020. DOI: 10.3745/JIPS.04.0174.
  6. J. Kim, E. Park, K. Han, J. Lee, and H. J. Lee, A two-stage learning method of CNN and K-means RGB cluster for sentiment classification of images, Journal of Intelligence and Information Systems, vol. 27, no. 3, pp. 139-156, 2021. DOI: 10.13088/jiis.2021.27.3.139.
  7. Korea Agency for Technology and Standards, Size Korea (Korean human body size measurements) [Internet]. Available: https://sizekorea.kr/.
  8. E.-K. Kim and Y. R. Nam, Analysis of lower body shape of men in their 30s for pants pattern designs - focus on changes in human dimensions and body type classification, Journal of the Korea Fashion & Costume Design Association, vol. 23, no. 2, pp. 133-146, 2021. DOI: 10.30751/kfcda.2021.23.2.133.
  9. C. S. Joung, A study on the types of upper body shape of male university students, Journal of The Korean Society Design Culture, vol. 25, no. 1, pp. 453-463, Mar. 2019. DOI: 10.18208/ksdc.2019.25.1.453.
  10. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, pp. 2818-2826, 2016. DOI: 10.1109/CVPR.2016.308.

Taejun Lee

He received his bachelor's degree and master’s degree in computer science from Pai Chai University in 2020 and 2023, respectively. Since 2023, he has been pursuing a Ph.D. at Pai Chai University. His current research interests are deep learning, machine learning, big data, and computer vision.


Hakseong Kim

He received his bachelor's degree in electronics from Jeon Buk University in 1998. Then, he received his master's degree in information and communication engineering from Chung Nam University in 2010. His current research interests are deep learning, machine learning, big data, AI, and instrumentation & control.


Hoekyung Jung

He received his master’s degree in 1987 and Ph.D. in 1993 from the Department of Computer Engineering of Kwangwoon University, Korea. From 1994 to 1995, he worked for ETRI as a researcher. Since 1994, he has worked in the Department of Computer Engineering at Pai Chai University, where he now works as a professor. His current research interests include multimedia document architecture modeling, information processing, embedded system, machine learning, big data, and IoT.


Article

Regular paper

Journal of information and communication convergence engineering 2023; 21(2): 110-116

Published online June 30, 2023 https://doi.org/10.56977/jicce.2023.21.2.110

Copyright © Korea Institute of Information and Communication Engineering.

Design and Implementation of a Body Fat Classification Model using Human Body Size Data

Taejun Lee 1, Hakseong Kim 1, and Hoekyung Jung1* , Member, KIICE

1Department of Computer Engineering, PaiChai University, Daejeon 155-40, Republic of Korea

Correspondence to:Hoekyung Jung (E-mail: hkjung@pcu.ac.kr, Tel:+82-42-520-5640)
Department of Computer Engineering, PaiChai University, Daejeon, 35345 Republic of Korea

Received: October 30, 2021; Revised: December 6, 2021; Accepted: December 9, 2021

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Recently, as various examples of machine learning have been applied in the healthcare field, deep learning technology has been applied to various tasks, such as electrocardiogram examination and body composition analysis using wearable devices such as smart watches. To utilize deep learning, securing data is the most important procedure, where human intervention, such as data classification, is required. In this study, we propose a model that uses a clustering algorithm, namely, the K-means clustering, to label body fat according to gender and age considering body size aspects, such as chest circumference and waist circumference, and classifies body fat into five groups from high risk to low risk using a convolutional neural network (CNN). As a result of model validation, accuracy, precision, and recall results of more than 95% were obtained. Thus, rational decision making can be made in the field of healthcare or obesity analysis using the proposed method.

Keywords: Classification, CNN, fat rate percentage, human body size, K-means clustering

I. INTRODUCTION

Deep learning is being applied in various fields, including the healthcare field. Particularly, as the personal smartphone and wearable device markets grow, body composition analysis, electrocardiogram testing, and diet management are provided in an easily accessible manner to improve eating habits.

Securing data is important when applying deep learning. In the past, only medical records, medical insurance information, and data measured by users were used; however, information collected through various channels, such as the public, social media services, and genetic data, have recently been used [1].

It can be said that the pre-processing process of the collected information is the most important step. The preprocessing requires human intervention, such as removing missing values or outliers and clustering, which is in the domain of unsupervised learning [2-6].

In this study, using only easy-to-measure aspects, such as waist circumference, reported in the Size Korea public dataset provided by the National Institute of Technology, the body fat percentage data were clustered according to gender and age to create a target setpoint. Accordingly, we propose a model that undergoes a pre-processing step generate the desired setpoint.

It is believed that a model using both supervised and unsupervised learning can be used in obesity analysis or body shape management service cases.

II. RELATED WORKS

In this section, we examine human body size data provided by the National Institute of Technology and Standards and related research.

A. Human Body Size Data

Since 1978, the National Institute of Technology and Standards governed by the Ministry of Trade, Industry, and Energy has been promoting the human body dimensional survey dissemination project to measure and standardize the human body information of Koreans and produce products that are convenient for Koreans to use with ergonomic designs [7].

The latest data are provided in the 7th human body size dataset released in December 2015. The total number of people measured was 6,413, which were categorized based on age and gender as reported in Table 1.

Table 1 . People groups included in the 7th Korean human body size survey project.

ManWomanTotal
16-199989261,924
20-298696681,573
30-396556751,330
40-49311359670
50-59221358579
60-69145228373
Total3,1993,2146,413


B. Related Research Conducted using the Human Body Size Data

There are various studies that are in line with the human body size survey project. Specifically, there is a study that divides the lower body types of men in their 30s to design appropriate pant patterns [8]. In addition, another study divided the upper body type of male college students in their 20s into four clusters by extracting factors through principal component and K-means cluster analyses [9]. The first study was significant in that it suggested changes in clothing design or dimensions by statistically analyzing changes in human body dimensions.

III. DATA PREPROCESSING AND MODEL DESIGN

In this section, we discuss the overall structure of the proposed model, pre-processing of the training data, and design.

A. Overall Structure

Fig. 1 shows the data pre-processing and model training processes for the human body size data. The data pre-processing procedure is discussed in Section III-B, and the convolutional neural network (CNN) classification process is discussed in Section III-C.

Figure 1. Overall structure.

B. Data Item Selection and Pre-processing

Several items of human body size data were originally considered in the model. However, only nine of them were used, namely, the bust circumference, under bust circumference, waist circumference (natural indentation), waist circumference (omphailon), hip circumference, thigh circumference, knee circumference, calf circumference, and minimum leg circumference. The names set by the National Institute of Technology and Standards were used as is, and there were no data regarding the under bust circumference for men. Fig. 2 depicts the nine human body aspects using a model photo.

Figure 2. Nine human body sizes selected for this study.

Nine items were cut in the column direction from the existing human body size data, and rows with missing values were removed. Afterwards, the data were scaled so that the mean was 0 and variance was 1, followed by standardization.

Figs. 3 and 4 show box plots of the data distribution before and after standardization for men in their 20s. The nine items mentioned above were sequentially expressed from left to right.

Figure 3. Box plot for men in their 20s before standardization.

Figure 4. Box plot for men in their 20s after standardization.

After the above process, clustering was performed according to body fat percentage using the K-means algorithm. Fig. 5 shows the pseudocode of the K-means algorithm [6].

Figure 5. K-means clustering algorithm.

The elbow method was used to determine the value of k. Fig. 6 shows the results when only men in their 20s were considered; similar results were shown for other age groups and genders.

Figure 6. Value of K when the elbow method is applied to men in 20s

However, if the group is divided into three groups, such as a cluster representing below average body fat percentage, average cluster, and cluster representing above average, results with a large amount of data in one cluster inevitably occur, making detailed classification difficult.

In this study, we did not use the optimal value but divided the values into five categories from the lowest to the highest case; the corresponding results are reported in Table 2.

Table 2 . Clustering results.

Cluster numberMen’s fat rateWomen’s fat rate
Lowest13.774193548324.3294478527
Low16.188257575727.9261682242
20Average19.666666666631.6592039800
High23.836619718334.8061538461
Highest29.205263157842.7833333333

Lowest14.982812499924.7364197530
Low18.391954022928.4421874999
30Average20.224731182731.4851190476
High23.069426751535.4985714285
Highest27.690000000040.4210526315

Lowest16.581818181826.3729411764
Low19.321052631529.4008771929
40Average20.957943925232.4777777777
High24.584507042236.7283018867
Highest28.117391304342.7222222222

Lowest17.27528.2071428571
Low18.649152542331.0783132530
50Average21.380357142832.9857142857
High24.287500000035.22125
Highest25.064444444437.6268292682

Lowest16.380000000029.0653061224
Low21.276190476133.1511627906
60Average23.238461538433.7983870967
High26.125925925934.7949999999
Highest28.078571428539.6866666666


From the cluster results reported in Table 2, observe that when the average body fat percentage in each cluster is calculated, men and women in their 20s to 40s are evenly distributed.

However, after 50 years of age, the difference in the average body fat percentage interval corresponding to each group was not significant. The values represented in bold in Table 2 indicate cases where there was a difference of 1 or less. These results are because of the fact that the number of samples from the 50s onwards was much smaller than the number of people represented in the 7th body size dataset reported in Table 1.

C. CNN Classification Model Design

A CNN classification model was used to transform the Inception-v2 module developed by Google to fit the dimensions of the human body size dataset [10]. The reason for using this model is that it suggests extracting features from different hierarchies from different stems and combining them again, which is widely used.

Fig. 7 shows the model modified to fit the dimensions of the human body data to the Inception-v2 module.

Figure 7. CNN classification model proposed in this study using the inception-v2 module.

As shown in Fig. 7, the Inception module can be imported and used directly from the Keras library. This is because the input data dimensions must match the preprocessed data dimensions.

For the input layer (?, 8, 1), eight items of data should be given as input; in the case of men, this means eight items of the human body size without the data for under bust circumference. For women (?, 9, 1) all nine items of human body size are used.

The first flow from the left is a section in which the features are extracted across three convolutional layers by extending them to 32 dimensions. The second flow is an interval in which features are extracted across two convolutional layers by extending them to 32 dimensions. The third flow is an average pooling layer to calculate the statistics and reduce the number of features by half to 32. This is a section that increases in dimensions. In the fourth flow, features are extracted by extending them to 32 dimensions using only one convolutional layer.

Because all four flows mentioned above have the same output dimension of 32, they are combined (concatenation layer) into one dimension (flatten layer) and then go through the dense layer using only 50 nodes. The final output dimensions became (?, 5) to classify the final five items using the dropout layer, which was used by leaving only 0.7% of the nodes. All convolutional layers used a rectified linear unit (ReLU).

Because this is a multi-class classification task, the softmax function was used as the final activation function. Assuming that $n$ is the number of neurons in the output layer, $a_k$ is the input signal from the previous layer, and $y_k$ is the $k$-th output, the softmax function can be defined as stated in Eq. (1).

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)} \tag{1}$$
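As an illustration of Eq. (1), the following sketch computes the softmax of five scores using only the Python standard library. The input values are hypothetical, and subtracting the maximum score is a common numerical safeguard that does not change the result.

```python
import math


def softmax(a):
    """Return y_k = exp(a_k) / sum_i exp(a_i) for a list of scores."""
    shift = max(a)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in a]
    total = sum(exps)
    return [e / total for e in exps]


# Five hypothetical input signals, one per body fat group.
scores = [2.0, 1.0, 0.1, -1.0, 0.5]
probs = softmax(scores)
```

The outputs are positive and sum to 1, so they can be read as class probabilities; the largest score always receives the largest probability.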

D. Setting Model Hyperparameters

For the CNN classification model, the categorical cross-entropy error was used as the loss function. Assuming that $k$ indexes the data dimensions, $y_k$ is the value estimated by the neural network, and $t_k$ is the target (correct answer), the cross-entropy function can be defined as expressed in Eq. (2).

$$\mathrm{CEE} = -\sum_{k} t_k \log y_k \tag{2}$$
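The loss in Eq. (2) can be checked by hand for a single sample. The sketch below uses a hypothetical one-hot target and prediction; the small `eps` constant only guards against `log(0)` and is not part of the equation.

```python
import math


def categorical_cross_entropy(t, y, eps=1e-12):
    """CEE = -sum_k t_k * log(y_k) for one sample."""
    return -sum(tk * math.log(max(yk, eps)) for tk, yk in zip(t, y))


# Hypothetical sample: the correct answer is the third of five groups.
target = [0, 0, 1, 0, 0]
predicted = [0.05, 0.10, 0.70, 0.10, 0.05]
loss = categorical_cross_entropy(target, predicted)
```

With a one-hot target, only the predicted probability of the correct class contributes, so the loss here equals `-log(0.70)`.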

Adam was used as the optimizer. The number of training epochs was set to 100 and the batch size to 1000, for both training and validation.

For model evaluation, the dataset was randomly shuffled and split into training and validation sets at a ratio of 80 to 20. Stratified sampling was used so that the class ratio would not be biased toward either the training or the validation dataset. Table 3 lists the hardware specifications.

Table 3. Hardware specifications.

| Item | Specification |
| --- | --- |
| CPU | AMD Ryzen 7 2700X (8 cores, 3.70 GHz) |
| RAM | 16 GB |
| GPU | Nvidia GeForce GTX 1660 Ti |
| Storage | SSD 500 GB / HDD 2 TB |
| OS | Windows 10 |
| Virtual environment | Anaconda 3 |
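The stratified 80:20 split described in this subsection can be illustrated with a short standard-library sketch. The function name, the seed, and the label counts are hypothetical; the source does not show its splitting code.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) with every class split at the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random shuffle within each class
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    rng.shuffle(train_idx)
    rng.shuffle(val_idx)
    return train_idx, val_idx


# Hypothetical labels: five groups with unequal sizes.
labels = [0] * 50 + [1] * 100 + [2] * 200 + [3] * 100 + [4] * 50
train_idx, val_idx = stratified_split(labels)
```

In practice, scikit-learn's `train_test_split` offers the same behavior through its `stratify` argument, which is probably what the term "stratify" in the text refers to.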

IV. MODEL TRAINING RESULTS

In this section, we present the performance indicators, a discussion, and the inference results obtained from the final training of the model.

A. Performance Indicators

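Accuracy, precision, recall, and final loss were used as performance indicators. Accuracy is defined in Eq. (3), and precision and recall are defined in Eq. (4), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. The loss function was described in Section III-D.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{3}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \tag{4}$$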
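Eqs. (3) and (4) are written for a binary case. One way to apply them to a five-group task is to treat one class as positive and the rest as negative (one-vs-rest). That choice is an assumption here, because the text does not state how the multi-class values were aggregated. The labels below are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fn + fp + tn)  # Eq. (3)


def precision(tp, fp):
    return tp / (tp + fp)  # Eq. (4), left


def recall(tp, fn):
    return tp / (tp + fn)  # Eq. (4), right


# Hypothetical labels for eight samples; class 1 is treated as positive.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive=1)
```

For these labels the counts are TP = 3, TN = 4, FP = 1, and FN = 0, giving an accuracy of 0.875, a precision of 0.75, and a recall of 1.0 for class 1.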

B. Learning Results

Table 4 reports the final learning results for both men and women in their 20s to 60s according to the performance indicators. The last column represents the learning time.

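Table 4. Final learning results.

| Age | Gender | Accuracy | Precision | Recall | Loss | Time [s] |
| --- | --- | --- | --- | --- | --- | --- |
| 20 | Men | 0.8745 | 0.8962 | 0.8600 | 0.2943 | 5.52 |
| 20 | Women | 0.9015 | 0.9249 | 0.8864 | 0.2359 | 3.29 |
| 30 | Men | 0.9173 | 0.9324 | 0.9019 | 0.2217 | 3.31 |
| 30 | Women | 0.9259 | 0.9451 | 0.9241 | 0.1955 | 3.13 |
| 40 | Men | 0.8992 | 0.9053 | 0.8871 | 0.2450 | 3.28 |
| 40 | Women | 0.8819 | 0.9164 | 0.8750 | 0.2681 | 3.14 |
| 50 | Men | 0.8864 | 0.9048 | 0.8636 | 0.2709 | 3.65 |
| 50 | Women | 0.8951 | 0.9197 | 0.8811 | 0.2711 | 3.26 |
| 60 | Men | 0.9204 | 0.9174 | 0.8850 | 0.2676 | 3.33 |
| 60 | Women | 0.9344 | 0.9543 | 0.9126 | 0.2138 | 3.38 |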


Table 5 summarizes the verification results based on the performance indicators.

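Table 5. Final verification results.

| Age | Gender | Accuracy | Precision | Recall | Loss |
| --- | --- | --- | --- | --- | --- |
| 20 | Men | 0.9770 | 0.9770 | 0.9770 | 0.1079 |
| 20 | Women | 0.9548 | 0.9548 | 0.9548 | 0.1408 |
| 30 | Men | 0.9694 | 0.9694 | 0.9694 | 0.1018 |
| 30 | Women | 0.9555 | 0.9555 | 0.9555 | 0.1014 |
| 40 | Men | 0.9354 | 0.9354 | 0.9354 | 0.1675 |
| 40 | Women | 0.9166 | 0.9166 | 0.9166 | 0.1553 |
| 50 | Men | 0.9772 | 0.9772 | 0.9772 | 0.1405 |
| 50 | Women | 0.9861 | 0.9861 | 0.9861 | 0.0893 |
| 60 | Men | 1.0 | 1.0 | 1.0 | 0.1245 |
| 60 | Women | 0.9347 | 0.9318 | 0.8913 | 0.1489 |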


C. Discussion

Averaging the training results of the proposed model over all age and gender groups gives an accuracy of 0.90366, a precision of 0.92165, a recall of 0.88768, a loss of 0.249, and a learning time of 3.529 s.

For validation, the corresponding averages were 0.96067, 0.96038, 0.95633, and 0.12779 for accuracy, precision, recall, and loss, respectively.
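The averages quoted above can be reproduced from the tables. The following sketch recomputes the mean accuracy from the ten rows of Table 4 (training) and Table 5 (validation); the variable names are illustrative.

```python
from statistics import mean

# Accuracy column of Table 4 (training), in row order.
train_accuracy = [0.8745, 0.9015, 0.9173, 0.9259, 0.8992,
                  0.8819, 0.8864, 0.8951, 0.9204, 0.9344]

# Accuracy column of Table 5 (validation), in row order.
val_accuracy = [0.9770, 0.9548, 0.9694, 0.9555, 0.9354,
                0.9166, 0.9772, 0.9861, 1.0, 0.9347]

train_mean = round(mean(train_accuracy), 5)
val_mean = round(mean(val_accuracy), 5)
```

The two means come out as 0.90366 and 0.96067, matching the values reported in the text.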

These results indicate that the clustering algorithm labeled the data appropriately using the nine items of human body size data, after which the average value of each cluster was calculated.

However, for participants in their 50s and 60s, correct classification cannot be achieved to the same degree as for those in their 20s to 40s, because the human body size data used for learning do not properly separate the average body fat percentage of the groups.

D. Inference Results

The cluster model files corresponding to the previously learned age and gender were saved as Pickle files, and the CNN classification model files were saved as H5 files so that inferences could be made separately. The average body fat percentage of each group was divided based on age and gender and saved in a CSV file.
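The lookup step at inference time can be sketched as follows: the class with the highest probability is mapped to a group name, and the stored average for that age and gender is read from the CSV data. The CSV column names and the mapping from class index to group are assumptions. The numbers are the Table 2 averages for men in their 20s, and the probability vector is hypothetical.

```python
import csv
import io

# Hypothetical CSV layout; values are the Table 2 averages for men in their 20s.
CSV_TEXT = """age,gender,group,avg_body_fat
20,men,Lowest,13.7741935483
20,men,Low,16.1882575757
20,men,Average,19.6666666666
20,men,High,23.8366197183
20,men,Highest,29.2052631578
"""

# Assumed mapping from CNN class index to group name.
GROUPS = ["Lowest", "Low", "Average", "High", "Highest"]


def load_averages(text):
    """Read (age, gender, group) -> average body fat percentage."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["age"], row["gender"], row["group"])
        table[key] = float(row["avg_body_fat"])
    return table


def interpret(probabilities, age, gender, averages):
    """Pick the most probable class and look up its stored average."""
    class_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    group = GROUPS[class_index]
    return group, averages[(age, gender, group)]


averages = load_averages(CSV_TEXT)
group, avg_fat = interpret([0.02, 0.08, 0.15, 0.70, 0.05], "20", "men", averages)
```

Here the fourth class has the highest probability, so the result is the "High" group with its stored average of about 23.84.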

A web environment in which the user can enter data and check the inference results was implemented with the Flask web framework. Processing consists only of loading and executing the aforementioned files. Fig. 8 shows the form used by the user to input the data, and Fig. 9 shows the corresponding results.

Figure 8. Human body size input form using Flask.

Figure 9. Result for the test represented in Fig. 8.

V. CONCLUSIONS

In this study, nine easy-to-measure items, such as chest and waist measurements, were taken from the human body size dataset, a public dataset provided by the National Institute of Technology and Standards. Based on age and gender, clustering was performed using the K-means algorithm, and the average body fat percentage of each cluster was calculated to obtain meaningful clustering results.

Based on these results, a model was proposed that modifies the Inception-v2 module according to the data shape and classifies the data into five groups, from the highest to the lowest body fat percentage. In validation, accuracy, precision, and recall of more than 95% were obtained.

For the 50s and 60s age groups, the differences between the clusters were not significant, in contrast to the 20s to 40s age groups.

Future research should supplement the current results by using a better clustering algorithm or CNN architecture. In addition, combining the proposed method with devices that measure circumferences through 3D scanning is expected to enable rational decision-making in the field of healthcare or obesity analysis.

ACKNOWLEDGMENTS

This research was supported by a 2023 Pai Chai University research grant.

Figure 1. Overall structure.

Figure 2. Nine human body sizes selected for this study.

Figure 3. Box plot for men in their 20s before standardization.

Figure 4. Box plot for men in their 20s after standardization.

Figure 5. K-means clustering algorithm.

Figure 6. Value of K when the elbow method is applied to men in their 20s.

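Table 1. People groups included in the 7th Korean human body size survey project.

| Age | Men | Women | Total |
| --- | --- | --- | --- |
| 16-19 | 998 | 926 | 1,924 |
| 20-29 | 869 | 668 | 1,573 |
| 30-39 | 655 | 675 | 1,330 |
| 40-49 | 311 | 359 | 670 |
| 50-59 | 221 | 358 | 579 |
| 60-69 | 145 | 228 | 373 |
| Total | 3,199 | 3,214 | 6,413 |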

Table 2. Clustering results.

| Age | Cluster | Men's fat rate | Women's fat rate |
| --- | --- | --- | --- |
| 20 | Lowest | 13.7741935483 | 24.3294478527 |
| 20 | Low | 16.1882575757 | 27.9261682242 |
| 20 | Average | 19.6666666666 | 31.6592039800 |
| 20 | High | 23.8366197183 | 34.8061538461 |
| 20 | Highest | 29.2052631578 | 42.7833333333 |
| 30 | Lowest | 14.9828124999 | 24.7364197530 |
| 30 | Low | 18.3919540229 | 28.4421874999 |
| 30 | Average | 20.2247311827 | 31.4851190476 |
| 30 | High | 23.0694267515 | 35.4985714285 |
| 30 | Highest | 27.6900000000 | 40.4210526315 |
| 40 | Lowest | 16.5818181818 | 26.3729411764 |
| 40 | Low | 19.3210526315 | 29.4008771929 |
| 40 | Average | 20.9579439252 | 32.4777777777 |
| 40 | High | 24.5845070422 | 36.7283018867 |
| 40 | Highest | 28.1173913043 | 42.7222222222 |
| 50 | Lowest | 17.275 | 28.2071428571 |
| 50 | Low | 18.6491525423 | 31.0783132530 |
| 50 | Average | 21.3803571428 | 32.9857142857 |
| 50 | High | 24.2875000000 | 35.22125 |
| 50 | Highest | 25.0644444444 | 37.6268292682 |
| 60 | Lowest | 16.3800000000 | 29.0653061224 |
| 60 | Low | 21.2761904761 | 33.1511627906 |
| 60 | Average | 23.2384615384 | 33.7983870967 |
| 60 | High | 26.1259259259 | 34.7949999999 |
| 60 | Highest | 28.0785714285 | 39.6866666666 |

References

  1. Smart & Company Ltd, Convergence of artificial intelligence and healthcare, growing the technology commercialization market [Internet]. Available: http://m.elec4.co.kr/article/articleView.asp?idx=26993.
  2. J. Kim, B. Kang, and H. Jung, Determination of coagulant input rate in water purification plant using K-means algorithm and GBR algorithm, Journal of the Korea Institute of Information and Communication Engineering, vol. 25, no. 6, pp. 792-798, Jun. 2021. DOI: 10.6109/jkiice.2021.25.6.792.
  3. P.-H. Huynh, V. H. Nguyen, and T.-N. Do, Enhancing gene expression classification of support vector machines with generative adversarial networks, Journal of Information and Communication Convergence Engineering, vol. 17, no. 1, pp. 14-20, Mar. 2019. DOI: 10.6109/jicce.2019.17.1.14.
  4. R. Chefira and S. Rakrak, A knowledge extraction pipeline between supervised and unsupervised machine learning using gaussian mixture models for anomaly detection, Journal of Computing Science and Engineering, vol. 15, no. 1, pp. 1-17, Mar. 2021. DOI: 10.5626/JCSE.2021.15.1.1.
  5. J. Li, G. Huang, and Y. Zhou, A sentiment classification approach of sentences clustering in webcast barrages, Journal of Information Processing Systems, vol. 16, no. 3, pp. 718-732, Jun. 2020. DOI: 10.3745/JIPS.04.0174.
  6. J. Kim, E. Park, K. Han, J. Lee, and H. J. Lee, A two-stage learning method of cnn and k-means rgb cluster for sentiment classification of images, Journal of Intelligence and Information Systems, vol. 27, no. 3, pp. 139-156, 2021. DOI: 10.13088/jiis.2021.27.3.139.
  7. Korea Agency for Technology and Standards, Size korea (korean human body size measurements) [Internet]. Available: https://sizekorea.kr/.
  8. E.-K. Kim and Y. R. Nam, Analysis of lower body shape of men in their 30s for pants pattern designs - focus on changes in human dimensions and body type classification, Journal of the Korea Fashion & Costume Design Association, vol. 23, no. 2, pp. 133-146, 2021. DOI: 10.30751/kfcda.2021.23.2.133.
  9. C. S. Joung, A study on the types of upper body shape of male university students, Journal of The Korean Society Design Culture, vol. 25, no. 1, pp. 453-463, Mar. 2019. DOI: 10.18208/ksdc.2019.25.1.453.
  10. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception architecture for computer vision, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, pp. 2818-2826, 2016. DOI: 10.1109/CVPR.2016.308.