Journal of information and communication convergence engineering 2023; 21(2): 110-116
Published online June 30, 2023
https://doi.org/10.56977/jicce.2023.21.2.110
© Korea Institute of Information and Communication Engineering
Correspondence to : Hoekyung Jung (E-mail: hkjung@pcu.ac.kr, Tel:+82-42-520-5640)
Department of Computer Engineering, PaiChai University, Daejeon, 35345 Republic of Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
As machine learning has spread through the healthcare field, deep learning has been applied to tasks such as electrocardiogram examination and body composition analysis using wearable devices such as smart watches. Securing data is the most important step in applying deep learning, and it requires human intervention, such as data classification. In this study, we propose a model that uses K-means clustering to label body fat by gender and age from body size measurements, such as chest and waist circumference, and then uses a convolutional neural network (CNN) to classify body fat into five groups from high risk to low risk. In model validation, the average accuracy, precision, and recall each exceeded 95%. The proposed method can thus support rational decision making in healthcare and obesity analysis.
Keywords: Classification, CNN, fat rate percentage, human body size, K-means clustering
Deep learning is being applied in various fields, including the healthcare field. Particularly, as the personal smartphone and wearable device markets grow, body composition analysis, electrocardiogram testing, and diet management are provided in an easily accessible manner to improve eating habits.
Securing data is important when applying deep learning. In the past, only medical records, medical insurance information, and data measured by users were used; recently, however, information collected through various channels, such as public data, social media services, and genetic data, has also been used [1].
Pre-processing the collected information is arguably the most important step. It requires human intervention, such as removing missing values or outliers, and can include clustering, which belongs to the domain of unsupervised learning [2-6].
In this study, using only easy-to-measure items, such as waist circumference, from the Size Korea public dataset provided by the National Institute of Technology and Standards, the body fat percentage data were clustered according to gender and age to create a target label. Accordingly, we propose a model that includes a pre-processing step to generate the desired target label.
We expect that a model combining supervised and unsupervised learning can be used in obesity analysis or body shape management services.
In this section, we examine human body size data provided by the National Institute of Technology and Standards and related research.
Since 1978, the National Institute of Technology and Standards governed by the Ministry of Trade, Industry, and Energy has been promoting the human body dimensional survey dissemination project to measure and standardize the human body information of Koreans and produce products that are convenient for Koreans to use with ergonomic designs [7].
The latest data are provided in the 7th human body size dataset released in December 2015. A total of 6,413 people were measured; Table 1 breaks them down by age and gender.
Table 1. Participant groups included in the 7th Korean human body size survey project
Age | Men | Women | Total |
---|---|---|---|
16-19 | 998 | 926 | 1,924 |
20-29 | 869 | 668 | 1,537 |
30-39 | 655 | 675 | 1,330 |
40-49 | 311 | 359 | 670 |
50-59 | 221 | 358 | 579 |
60-69 | 145 | 228 | 373 |
Total | 3,199 | 3,214 | 6,413 |
Various studies build on the human body size survey project. One study classified the lower body types of men in their 30s to design appropriate pant patterns [8]. Another divided the upper body types of male college students in their 20s into four clusters by extracting factors through principal component analysis and K-means cluster analysis [9]. The first study was significant in that it suggested changes in clothing design or dimensions by statistically analyzing changes in human body dimensions.
In this section, we discuss the overall structure of the proposed model, pre-processing of the training data, and design.
Fig. 1 shows the data pre-processing and model training processes for the human body size data. The data pre-processing procedure is discussed in Section III-B, and the convolutional neural network (CNN) classification process is discussed in Section III-C.
Several items of human body size data were originally considered for the model. However, only nine of them were used, namely, the bust circumference, under bust circumference, waist circumference (natural indentation), waist circumference (omphalion), hip circumference, thigh circumference, knee circumference, calf circumference, and minimum leg circumference. The names set by the National Institute of Technology and Standards were used as is. No data were available for the under bust circumference of men. Fig. 2 depicts the nine human body items using a model photo.
The columns for the nine items were extracted from the existing human body size data, and rows with missing values were removed. Afterwards, the data were standardized so that each item had a mean of 0 and a variance of 1.
Figs. 3 and 4 show box plots of the data distribution before and after standardization for men in their 20s. The nine items mentioned above are shown in order from left to right.
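The missing-value removal and standardization described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name `standardize_columns` and the sample measurements are hypothetical and do not come from the Size Korea data.

```python
from statistics import fmean, pstdev

def standardize_columns(rows):
    """Drop rows with missing values, then z-score each column."""
    # Keep only complete rows (None marks a missing measurement).
    complete = [row for row in rows if all(v is not None for v in row)]
    columns = list(zip(*complete))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    # Subtract the column mean and divide by the column standard deviation.
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in complete
    ]

# Hypothetical measurements in cm: (waist, hip, thigh).
sample = [
    (78.0, 94.0, 55.0),
    (85.5, 98.0, 58.5),
    (None, 91.0, 52.0),   # removed: the waist value is missing
    (92.0, 103.0, 61.0),
    (70.5, 89.5, 50.5),
]
scaled = standardize_columns(sample)
```

After this step every column of `scaled` has a mean of 0 and a variance of 1, so items measured on different scales contribute comparably to the distance computations used in clustering.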
After the above process, clustering was performed according to body fat percentage using the K-means algorithm. Fig. 5 shows the pseudocode of the K-means algorithm [6].
The elbow method was used to determine the value of $K$, the number of clusters.
However, if each group is divided into only three clusters, such as below-average, average, and above-average body fat percentage, a large share of the data inevitably falls into one cluster, making detailed classification difficult.
Therefore, in this study, we did not use the optimal value but divided the data into five clusters, from the lowest to the highest body fat percentage; the corresponding results are reported in Table 2.
Table 2. Clustering results
Age group | Cluster | Men's fat rate (%) | Women's fat rate (%) |
---|---|---|---|
20 | Lowest | 13.7741935483 | 24.3294478527 |
20 | Low | 16.1882575757 | 27.9261682242 |
20 | Average | 19.6666666666 | 31.6592039800 |
20 | High | 23.8366197183 | 34.8061538461 |
20 | Highest | 29.2052631578 | 42.7833333333 |
30 | Lowest | 14.9828124999 | 24.7364197530 |
30 | Low | 18.3919540229 | 28.4421874999 |
30 | Average | 20.2247311827 | 31.4851190476 |
30 | High | 23.0694267515 | 35.4985714285 |
30 | Highest | 27.6900000000 | 40.4210526315 |
40 | Lowest | 16.5818181818 | 26.3729411764 |
40 | Low | 19.3210526315 | 29.4008771929 |
40 | Average | 20.9579439252 | 32.4777777777 |
40 | High | 24.5845070422 | 36.7283018867 |
40 | Highest | 28.1173913043 | 42.7222222222 |
50 | Lowest | 17.275 | 28.2071428571 |
50 | Low | 18.6491525423 | 31.0783132530 |
50 | Average | 21.3803571428 | 32.9857142857 |
50 | High | 24.2875000000 | 35.22125 |
50 | Highest | 25.0644444444 | 37.6268292682 |
60 | Lowest | 16.3800000000 | 29.0653061224 |
60 | Low | 21.2761904761 | 33.1511627906 |
60 | Average | 23.2384615384 | 33.7983870967 |
60 | High | 26.1259259259 | 34.7949999999 |
60 | Highest | 28.0785714285 | 39.6866666666 |
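The labeling step behind Table 2 can be illustrated with a small sketch: cluster the standardized body measurements with K-means, compute the mean body fat percentage of each cluster, and name the clusters from "Lowest" to "Highest" by that mean. The code below is a plain standard-library version of Lloyd's algorithm, written for illustration only. The paper does not show its clustering code, so the function names `kmeans` and `rank_clusters_by_fat`, the fixed random seed, and all sample values are assumptions.

```python
import random
from statistics import fmean

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(fmean(axis) for axis in zip(*members))
    return labels

def rank_clusters_by_fat(labels, body_fat, names):
    """Map each cluster index to (name, mean body fat), lowest mean first."""
    means = {
        c: fmean(f for f, l in zip(body_fat, labels) if l == c)
        for c in set(labels)
    }
    ordered = sorted(means, key=means.get)
    return {c: (names[i], means[c]) for i, c in enumerate(ordered)}

# Hypothetical standardized (waist, hip) values and body fat percentages.
points = [(-1.6, -1.5), (-1.4, -1.6), (-0.8, -0.7), (-0.7, -0.9), (0.0, 0.1),
          (0.1, -0.1), (0.8, 0.7), (0.7, 0.9), (1.5, 1.6), (1.6, 1.4)]
body_fat = [13.1, 14.0, 16.2, 16.9, 19.5, 20.1, 23.4, 24.0, 28.8, 29.5]
NAMES = ["Lowest", "Low", "Average", "High", "Highest"]

labels = kmeans(points, k=5)
ranked = rank_clusters_by_fat(labels, body_fat, NAMES)
```

Note that body fat is not an input to the clustering here; it is only used afterwards to order and name the clusters, which is one reading of the procedure described in the text.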
The cluster results in Table 2 show that, for men and women in their 20s to 40s, the average body fat percentages of the five clusters are evenly spaced.
However, from the 50s onward, the gaps between the average body fat percentages of adjacent clusters were not significant. The values represented in bold in Table 2 indicate cases where the difference was 1 or less. This is likely because, as reported in Table 1, the 7th body size dataset contains far fewer samples for people in their 50s and 60s than for the younger groups.
The CNN classification model was built by adapting the Inception-v2 module developed by Google to fit the dimensions of the human body size dataset [10]. This widely used module was chosen because it extracts features at different levels along parallel branches and then combines them again.
Fig. 7 shows the Inception-v2 module modified to fit the dimensions of the human body data.
As shown in Fig. 7, the Inception module cannot be imported and used directly from the Keras library because the input data dimensions must match the preprocessed data dimensions.
For men, the input layer has the shape (?, 8, 1): eight human body size items are given as input, that is, the nine items without the under bust circumference. For women, the input layer has the shape (?, 9, 1), and all nine items are used.
The first flow from the left extracts features across three convolutional layers, extending them to 32 dimensions. The second flow extracts features across two convolutional layers, also extending them to 32 dimensions. The third flow applies an average pooling layer, which summarizes the features statistically and halves their number, and outputs 32 dimensions. The fourth flow extracts features with a single convolutional layer that extends them to 32 dimensions.
Because all four flows have the same output dimension of 32, they are concatenated (concatenation layer), flattened into one dimension (flatten layer), and passed through a dense layer with only 50 nodes. A dropout layer, which was used by leaving only 0.7% of the nodes, is then applied, and the final output has dimensions (?, 5) to classify the five groups. All convolutional layers used a rectified linear unit (ReLU) activation.
Because this was a multi-class classification task, the SoftMax function was used as the final activation function. Assuming that $n$ is the number of neurons in the output layer, $a_k$ is the input signal from the previous layer, and $y_k$ is the $k$th output, the SoftMax function can be defined as stated in Eq. (1).

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n} \exp(a_i)} \tag{1}$$
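For the CNN classification model, the categorical cross-entropy error was used as the loss function. Assuming that the number of data dimensions is $n$, $t_k$ is the $k$th element of the one-hot target label, and $y_k$ is the $k$th output of the network, the loss can be defined as stated in Eq. (2).

$$E = -\sum_{k=1}^{n} t_k \log y_k \tag{2}$$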
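As a small numerical illustration of the SoftMax activation and the categorical cross-entropy loss named above, the sketch below computes both for a single five-class example using only the standard library. The input values, the one-hot target, and the small constant `eps` that guards against `log(0)` are assumptions made for the example; in practice a deep learning framework computes these internally.

```python
import math

def softmax(a):
    """Turn raw output-layer inputs into probabilities that sum to 1."""
    shift = max(a)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - shift) for x in a]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(t, y, eps=1e-12):
    """Cross-entropy between a one-hot target t and predicted probabilities y."""
    return -sum(tk * math.log(yk + eps) for tk, yk in zip(t, y))

# Hypothetical output-layer inputs for the five body fat groups.
logits = [2.0, 1.0, 0.1, -1.0, 0.5]
probs = softmax(logits)

# One-hot target: the true class is the first group.
target = [1, 0, 0, 0, 0]
loss = categorical_cross_entropy(target, probs)
```

With a one-hot target, only the term for the true class survives the sum, so the loss reduces to the negative logarithm of the probability assigned to the correct group.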
Adam was used as the optimizer. The number of epochs was set to 100 and the batch size to 1000 for both training and validation.
For model evaluation, the dataset was randomly shuffled and split into training and validation sets at a ratio of 80 to 20. Stratified sampling was used to prevent the class ratio from being biased toward either the training or the validation dataset. Table 3 lists the hardware specifications.
Table 3. Hardware specifications
Component | Specification |
---|---|
CPU | AMD Ryzen 7 2700X (8 cores, 3.70 GHz) |
RAM | 16 GB |
GPU | Nvidia GeForce GTX 1660 Ti |
Storage | SSD 500 GB / HDD 2 TB |
OS | Windows 10 |
Virtual environment | Anaconda 3 |
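The stratified 80/20 split described above can be sketched with the standard library as follows. The idea is to shuffle and split each class separately so that both subsets keep the original class ratio. The function name `stratified_split`, the fixed seed, and the toy label counts are assumptions for illustration; in scikit-learn the same effect is obtained with the `stratify` argument of `train_test_split`.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, val_ratio=0.2, seed=0):
    """Return (train_indices, val_indices) that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random shuffle within a class
        n_val = round(len(indices) * val_ratio)   # 20% of this class
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx

# Hypothetical, imbalanced cluster labels for 100 samples.
labels = [0] * 50 + [1] * 30 + [2] * 20
train_idx, val_idx = stratified_split(labels)
val_counts = Counter(labels[i] for i in val_idx)
```

Here `val_counts` holds 10, 6, and 4 samples for classes 0, 1, and 2, which matches the 50:30:20 ratio of the full label list.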
In this section, we discuss the performance indicators, considerations, and inference results of the final training results of the model.
Accuracy, precision, recall, and final loss were used as performance indicators. The accuracy, precision, and recall are defined as expressed in Equations 3 and 4. The loss values were previously mentioned in Section III-D.
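For reference, the sketch below computes the three indicators from predicted and true labels using the standard definitions: accuracy is the share of correct predictions, precision is TP / (TP + FP), and recall is TP / (TP + FN). Averaging precision and recall over classes (macro averaging) is an assumption made here; the averaging used by the authors is not stated in this text, and the labels below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision_recall(y_true, y_pred, classes):
    """Per-class precision and recall, averaged over the given classes."""
    precisions, recalls = [], []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
        fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
        fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)

# Hypothetical true and predicted cluster labels for six samples.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

acc = accuracy(y_true, y_pred)
precision, recall = macro_precision_recall(y_true, y_pred, classes=[0, 1, 2])
```

In this toy case four of six predictions are correct, so the accuracy is 4/6, while precision and recall differ because the errors are not spread evenly across the classes.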
Table 4 reports the final learning results for both men and women in their 20s to 60s according to the performance indicators. The last column represents the learning time.
Table 4. Final learning results
Age group | Gender | Accuracy | Precision | Recall | Loss | Time [s] |
---|---|---|---|---|---|---|
20 | Men | 0.8745 | 0.8962 | 0.8600 | 0.2943 | 5.52 |
20 | Women | 0.9015 | 0.9249 | 0.8864 | 0.2359 | 3.29 |
30 | Men | 0.9173 | 0.9324 | 0.9019 | 0.2217 | 3.31 |
30 | Women | 0.9259 | 0.9451 | 0.9241 | 0.1955 | 3.13 |
40 | Men | 0.8992 | 0.9053 | 0.8871 | 0.2450 | 3.28 |
40 | Women | 0.8819 | 0.9164 | 0.8750 | 0.2681 | 3.14 |
50 | Men | 0.8864 | 0.9048 | 0.8636 | 0.2709 | 3.65 |
50 | Women | 0.8951 | 0.9197 | 0.8811 | 0.2711 | 3.26 |
60 | Men | 0.9204 | 0.9174 | 0.8850 | 0.2676 | 3.33 |
60 | Women | 0.9344 | 0.9543 | 0.9126 | 0.2138 | 3.38 |
Table 5 summarizes the verification results based on the performance indicators.
Table 5. Final verification results
Age group | Gender | Accuracy | Precision | Recall | Loss |
---|---|---|---|---|---|
20 | Men | 0.9770 | 0.9770 | 0.9770 | 0.1079 |
20 | Women | 0.9548 | 0.9548 | 0.9548 | 0.1408 |
30 | Men | 0.9694 | 0.9694 | 0.9694 | 0.1018 |
30 | Women | 0.9555 | 0.9555 | 0.9555 | 0.1014 |
40 | Men | 0.9354 | 0.9354 | 0.9354 | 0.1675 |
40 | Women | 0.9166 | 0.9166 | 0.9166 | 0.1553 |
50 | Men | 0.9772 | 0.9772 | 0.9772 | 0.1405 |
50 | Women | 0.9861 | 0.9861 | 0.9861 | 0.0893 |
60 | Men | 1.0 | 1.0 | 1.0 | 0.1245 |
60 | Women | 0.9347 | 0.9318 | 0.8913 | 0.1489 |
Averaged over all groups, training yielded an accuracy of 0.90366, a precision of 0.92165, a recall of 0.88768, a loss of 0.249, and a learning time of 3.529 seconds.
In validation, the average values were 0.96067, 0.96038, 0.95633, and 0.12779 for accuracy, precision, recall, and loss, respectively.
These results indicate that labeling with the clustering algorithm on the nine human body size items worked as intended. The average body fat percentage of each resulting cluster was then calculated.
However, for participants in their 50s and 60s, the human body size data used for training did not separate the average body fat percentage into clusters as clearly as for participants in their 20s to 40s, so a correct classification cannot be achieved for these groups.
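The validation averages quoted above can be reproduced directly from the rows of Table 5. The short sketch below does so with the standard library; the dictionary layout and variable names are illustrative choices, while the numbers are copied from the table.

```python
from statistics import fmean

# Rows of Table 5: (accuracy, precision, recall, loss) per age group and gender.
validation = {
    ("20", "Men"):   (0.9770, 0.9770, 0.9770, 0.1079),
    ("20", "Women"): (0.9548, 0.9548, 0.9548, 0.1408),
    ("30", "Men"):   (0.9694, 0.9694, 0.9694, 0.1018),
    ("30", "Women"): (0.9555, 0.9555, 0.9555, 0.1014),
    ("40", "Men"):   (0.9354, 0.9354, 0.9354, 0.1675),
    ("40", "Women"): (0.9166, 0.9166, 0.9166, 0.1553),
    ("50", "Men"):   (0.9772, 0.9772, 0.9772, 0.1405),
    ("50", "Women"): (0.9861, 0.9861, 0.9861, 0.0893),
    ("60", "Men"):   (1.0, 1.0, 1.0, 0.1245),
    ("60", "Women"): (0.9347, 0.9318, 0.8913, 0.1489),
}

metric_names = ("accuracy", "precision", "recall", "loss")
# Average each column over the ten age and gender groups.
averages = {
    name: fmean(row[i] for row in validation.values())
    for i, name in enumerate(metric_names)
}
```

Rounded to five decimal places, `averages` gives 0.96067 for accuracy, 0.96038 for precision, 0.95633 for recall, and 0.12779 for loss.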
The clustering models trained for each age and gender group were saved as Pickle files, and the CNN classification models were saved as H5 files so that inference could be run separately. The average body fat percentage of each cluster, by age and gender, was saved in a CSV file.
The environment in which users enter data and check the inference results was implemented with the Flask web framework. Processing consists only of loading and executing the aforementioned files. Fig. 8 shows the form used to input the data, and Fig. 9 shows the corresponding results.
In this study, we used nine easy-to-measure items, such as chest and waist circumference, from the human body size dataset, a public dataset provided by the National Institute of Technology and Standards. Clustering was performed by age and gender using the K-means algorithm, and the average body fat percentage of each cluster was calculated to obtain meaningful clustering results.
Based on these results, we proposed a model that modifies the Inception-v2 module according to the data shape and classifies body fat into five groups, from the highest to the lowest body fat percentage. In validation, the average accuracy, precision, and recall each exceeded 95%.
In the 50s and 60s groups, however, the cluster averages were not clearly separated, unlike those in the 20s to 40s groups.
Future research should supplement the current results with a better clustering algorithm or CNN architecture. In addition, we expect that combining the proposed method with a device that measures circumferences through 3D scanning will enable rational decision-making in healthcare and obesity analysis.
This research was supported by a 2023 Pai Chai University research grant.
He received his bachelor's degree and master’s degree in computer science from Pai Chai University in 2020 and 2023, respectively. Since 2023, he has been pursuing a Ph.D. at Pai Chai University. His current research interests are deep learning, machine learning, big data, and computer vision.
He received his bachelor's degree in electronics from Jeon Buk University in 1998. Then, he received his master's degree in information and communication engineering from Chung Nam University in 2010. His current research interests are deep learning, machine learning, big data, AI, and instrumentation & control.
He received his master’s degree in 1987 and Ph.D. in 1993 from the Department of Computer Engineering of Kwangwoon University, Korea. From 1994 to 1995, he worked for ETRI as a researcher. Since 1994, he has worked in the Department of Computer Engineering at Pai Chai University, where he now works as a professor. His current research interests include multimedia document architecture modeling, information processing, embedded system, machine learning, big data, and IoT.
Taejun Lee 1, Hakseong Kim 1, and Hoekyung Jung1* , Member, KIICE
1Department of Computer Engineering, PaiChai University, Daejeon 155-40, Republic of Korea
Correspondence to:Hoekyung Jung (E-mail: hkjung@pcu.ac.kr, Tel:+82-42-520-5640)
Department of Computer Engineering, PaiChai University, Daejeon, 35345 Republic of Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Recently, as various examples of machine learning have been applied in the healthcare field, deep learning technology has been applied to various tasks, such as electrocardiogram examination and body composition analysis using wearable devices such as smart watches. To utilize deep learning, securing data is the most important procedure, where human intervention, such as data classification, is required. In this study, we propose a model that uses a clustering algorithm, namely, the K-means clustering, to label body fat according to gender and age considering body size aspects, such as chest circumference and waist circumference, and classifies body fat into five groups from high risk to low risk using a convolutional neural network (CNN). As a result of model validation, accuracy, precision, and recall results of more than 95% were obtained. Thus, rational decision making can be made in the field of healthcare or obesity analysis using the proposed method.
Keywords: Classification, CNN, fat rate percentage, human body size, K-means clustering
Deep learning is being applied in various fields, including the healthcare field. Particularly, as the personal smartphone and wearable device markets grow, body composition analysis, electrocardiogram testing, and diet management are provided in an easily accessible manner to improve eating habits.
Securing data is important when applying deep learning. In the past, only medical records, medical insurance information, and data measured by users were used; however, information collected through various channels, such as the public, social media services, and genetic data, have recently been used [1].
It can be said that the pre-processing process of the collected information is the most important step. The preprocessing requires human intervention, such as removing missing values or outliers and clustering, which is in the domain of unsupervised learning [2-6].
In this study, using only easy-to-measure aspects, such as waist circumference, reported in the Size Korea public dataset provided by the National Institute of Technology, the body fat percentage data were clustered according to gender and age to create a target setpoint. Accordingly, we propose a model that undergoes a pre-processing step generate the desired setpoint.
It is believed that a model using both supervised and unsupervised learning can be used in obesity analysis or body shape management service cases.
In this section, we examine human body size data provided by the National Institute of Technology and Standards and related research.
Since 1978, the National Institute of Technology and Standards governed by the Ministry of Trade, Industry, and Energy has been promoting the human body dimensional survey dissemination project to measure and standardize the human body information of Koreans and produce products that are convenient for Koreans to use with ergonomic designs [7].
The latest data are provided in the 7th human body size dataset released in December 2015. The total number of people measured was 6,413, which were categorized based on age and gender as reported in Table 1.
Table 1 . People groups included in the 7th Korean human body size survey project.
Man | Woman | Total | |
---|---|---|---|
16-19 | 998 | 926 | 1,924 |
20-29 | 869 | 668 | 1,573 |
30-39 | 655 | 675 | 1,330 |
40-49 | 311 | 359 | 670 |
50-59 | 221 | 358 | 579 |
60-69 | 145 | 228 | 373 |
Total | 3,199 | 3,214 | 6,413 |
There are various studies that are in line with the human body size survey project. Specifically, there is a study that divides the lower body types of men in their 30s to design appropriate pant patterns [8]. In addition, another study divided the upper body type of male college students in their 20s into four clusters by extracting factors through principal component and K-means cluster analyses [9]. The first study was significant in that it suggested changes in clothing design or dimensions by statistically analyzing changes in human body dimensions.
In this section, we discuss the overall structure of the proposed model, pre-processing of the training data, and design.
Fig. 1 shows the data pre-processing and model training processes for the human body size data. The data pre-processing procedure is discussed in Section III-B, and the convolutional neural network (CNN) classification process is discussed in Section III-C.
Several items of human body size data were originally considered in the model. However, only nine of them were used, namely, the bust circumference, under bust circumference, waist circumference (natural indentation), waist circumference (omphailon), hip circumference, thigh circumference, knee circumference, calf circumference, and minimum leg circumference. The names set by the National Institute of Technology and Standards were used as is, and there were no data regarding the under bust circumference for men. Fig. 2 depicts the nine human body aspects using a model photo.
Nine items were cut in the column direction from the existing human body size data, and rows with missing values were removed. Afterwards, the data were scaled so that the mean was 0 and variance was 1, followed by standardization.
Figs. 3 and 4 show box plots of the data distribution before and after standardization for men in their 20s. The nine items mentioned above were sequentially expressed from left to right.
After the above process, clustering was performed according to body fat percentage using the K-means algorithm. Fig. 5 shows the pseudocode of the K-means algorithm [6].
The elbow method was used to determine the value of
However, if the group is divided into three groups, such as a cluster representing below average body fat percentage, average cluster, and cluster representing above average, results with a large amount of data in one cluster inevitably occur, making detailed classification difficult.
In this study, we did not use the optimal value but divided the values into five categories from the lowest to the highest case; the corresponding results are reported in Table 2.
Table 2 . Clustering results.
Cluster number | Men’s fat rate | Women’s fat rate | |
---|---|---|---|
Lowest | 13.7741935483 | 24.3294478527 | |
Low | 16.1882575757 | 27.9261682242 | |
20 | Average | 19.6666666666 | 31.6592039800 |
High | 23.8366197183 | 34.8061538461 | |
Highest | 29.2052631578 | 42.7833333333 | |
Lowest | 14.9828124999 | 24.7364197530 | |
Low | 18.3919540229 | 28.4421874999 | |
30 | Average | 20.2247311827 | 31.4851190476 |
High | 23.0694267515 | 35.4985714285 | |
Highest | 27.6900000000 | 40.4210526315 | |
Lowest | 16.5818181818 | 26.3729411764 | |
Low | 19.3210526315 | 29.4008771929 | |
40 | Average | 20.9579439252 | 32.4777777777 |
High | 24.5845070422 | 36.7283018867 | |
Highest | 28.1173913043 | 42.7222222222 | |
Lowest | 17.275 | 28.2071428571 | |
Low | 18.6491525423 | 31.0783132530 | |
50 | Average | 21.3803571428 | 32.9857142857 |
High | 24.2875000000 | 35.22125 | |
Highest | 25.0644444444 | 37.6268292682 | |
Lowest | 16.3800000000 | 29.0653061224 | |
Low | 21.2761904761 | 33.1511627906 | |
60 | Average | 23.2384615384 | 33.7983870967 |
High | 26.1259259259 | 34.7949999999 | |
Highest | 28.0785714285 | 39.6866666666 |
From the cluster results reported in Table 2, observe that when the average body fat percentage in each cluster is calculated, men and women in their 20s to 40s are evenly distributed.
However, after 50 years of age, the difference in the average body fat percentage interval corresponding to each group was not significant. The values represented in bold in Table 2 indicate cases where there was a difference of 1 or less. These results are because of the fact that the number of samples from the 50s onwards was much smaller than the number of people represented in the 7th body size dataset reported in Table 1.
A CNN classification model was used to transform the Inception-v2 module developed by Google to fit the dimensions of the human body size dataset [10]. The reason for using this model is that it suggests extracting features from different hierarchies from different stems and combining them again, which is widely used.
Fig. 7 shows the model modified to fit the dimensions of the human body data to the Inception-v2 module.
As shown in Fig. 7, the Inception module can be imported and used directly from the Keras library. This is because the input data dimensions must match the preprocessed data dimensions.
For the input layer (?, 8, 1), eight items of data should be given as input; in the case of men, this means eight items of the human body size without the data for under bust circumference. For women (?, 9, 1) all nine items of human body size are used.
The first flow from the left is a section in which the features are extracted across three convolutional layers by extending them to 32 dimensions. The second flow is an interval in which features are extracted across two convolutional layers by extending them to 32 dimensions. The third flow is an average pooling layer to calculate the statistics and reduce the number of features by half to 32. This is a section that increases in dimensions. In the fourth flow, features are extracted by extending them to 32 dimensions using only one convolutional layer.
Because all four flows mentioned above have the same output dimension of 32, they are combined (concatenation layer) into one dimension (flatten layer) and then go through the dense layer using only 50 nodes. The final output dimensions became (?, 5) to classify the final five items using the dropout layer, which was used by leaving only 0.7% of the nodes. All convolutional layers used a rectified linear unit (ReLU).
Because this was a multi-classification task, the SoftMax function was used as the final activation function. Assuming that is the number of neurons in the output layer, is the input signal from the previous layer, and is the th output, the Soft- Max function can be defined as stated in Eq. (1).
For the CNN classification model, the categorical crossentropy error was used as the loss function. Assuming that the number of data dimensions is
Adam was used as the optimizer where the number of training iterations (epochs) was set to 100 for both training and validation. The batch size was set to 1000 for both training and validation.
For the model evaluation, the training and validation dataset proportion was 80 to 20, and a random shuffle was used. The Stratify technique was used to prevent the class ratio from being biased toward either training or validation datasets. Table 3 lists the hardware specifications.
Table 3 . Hardware specifications.
CPU | AMD Ryzen 7 2700x (Core x8, 3.70 Ghz) |
RAM | 16 GB |
GPU | Nvidia Geforce GTX 1660ti |
Storage | SSD 500 GB / HDD 2TB |
OS | Windows 10 |
Virtual | Environment Anaconda 3 |
In this section, we discuss the performance indicators, considerations, and inference results of the final training results of the model.
Accuracy, precision, recall, and final loss were used as performance indicators. The accuracy, precision, and recall are defined as expressed in Equations 3 and 4. The loss values were previously mentioned in Section III-D.
Table 4 reports the final learning results for both men and women in their 20s to 60s according to the performance indicators. The last column represents the learning time.
Table 4 . Final learning results.
Accuracy | Precision | Recall | Loss | Times [s] | ||
---|---|---|---|---|---|---|
20 | Men | 0.8745 | 0.8962 | 0.8600 | 0.2943 | 5.52 |
Women | 0.9015 | 0.9249 | 0.8864 | 0.2359 | 3.29 | |
30 | Men | 0.9173 | 0.9324 | 0.9019 | 0.2217 | 3.31 |
Women | 0.9259 | 0.9451 | 0.9241 | 0.1955 | 3.13 | |
40 | Men | 0.8992 | 0.9053 | 0.8871 | 0.2450 | 3.28 |
Women | 0.8819 | 0.9164 | 0.8750 | 0.2681 | 3.14 | |
50 | Men | 0.8864 | 0.9048 | 0.8636 | 0.2709 | 3.65 |
Women | 0.8951 | 0.9197 | 0.8811 | 0.2711 | 3.26 | |
60 | Men | 0.9204 | 0.9174 | 0.8850 | 0.2676 | 3.33 |
Women | 0.9344 | 0.9543 | 0.9126 | 0.2138 | 3.38 |
Table 5 summarizes the verification results based on the performance indicators.
Table 5 . Final verification results.
Accuracy | Precision | Recall | Loss | ||
---|---|---|---|---|---|
20 | Men | 0.9770 | 0.9770 | 0.9770 | 0.1079 |
Women | 0.9548 | 0.9548 | 0.9548 | 0.1408 | |
30 | Men | 0.9694 | 0.9694 | 0.9694 | 0.1018 |
Women | 0.9555 | 0.9555 | 0.9555 | 0.1014 | |
40 | Men | 0.9354 | 0.9354 | 0.9354 | 0.1675 |
Women | 0.9166 | 0.9166 | 0.9166 | 0.1553 | |
50 | Men | 0.9772 | 0.9772 | 0.9772 | 0.1405 |
Women | 0.9861 | 0.9861 | 0.9861 | 0.0893 | |
60 | Men | 1.0 | 1.0 | 1.0 | 0.1245 |
Women | 0.9347 | 0.9318 | 0.8913 | 0.1489 |
As a result of the proposed model training, when averaged by performance index, observe that the accuracy is 0.90366, precision is 0.92165, recall is 0.88768, loss value is 0.249, and learning time is 3.529 seconds.
When learning was verified correctly, the average values were 0.96067, 0.96038, 0.95633, and 0.12779 for accuracy, precision, recall, and loss, respectively.
This result indicates that clustering was performed correctly when labeling was performed with the clustering algorithm using the nine items of the human body size data. Subsequently, the average value of the corresponding cluster was calculated.
However, a correct classification cannot be achieved because the human body size data, which are the learning data, do not properly cluster the average body fat percentage in the groups classifying participants in their 50s to 60s compared to those classifying the participants in their 20s to 40s.
The cluster model files corresponding to the previously learned age and gender were saved as Pickle files, and the CNN classification model files were saved as H5 files so that inferences could be made separately. The average body fat percentage of each group was divided based on age and gender and saved in a CSV file.
The environment in which the user enters data and checks the inference result was implemented with the Flask web framework. The processing procedure consists only of loading and executing the aforementioned files. Fig. 8 shows the form used by the user to input the data, and Fig. 9 shows the corresponding results.
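The save-and-load flow for the Pickle and CSV files can be sketched as follows. This is only an illustration under stated assumptions: the cluster model is replaced by a hypothetical dictionary of centroids over two measurements instead of nine, nearest-centroid assignment stands in for the trained models, the file names and values are invented, and loading the H5 CNN model and the Flask routes are omitted.

```python
import csv
import math
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a fitted cluster model: centroids over two
# illustrative measurements (chest, waist in cm). The study used nine items.
centroids = {0: (88.0, 72.0), 1: (96.0, 84.0), 2: (104.0, 96.0)}

# Hypothetical average body fat percentage per cluster (not from Table 2).
avg_fat = {0: 15.0, 1: 20.0, 2: 27.0}


def save_artifacts(folder: Path, group: str) -> None:
    """Write the cluster model as a Pickle file and the averages as CSV."""
    with open(folder / f"{group}.pkl", "wb") as f:
        pickle.dump(centroids, f)
    with open(folder / f"{group}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["cluster", "avg_body_fat"])
        for cluster, fat in sorted(avg_fat.items()):
            writer.writerow([cluster, fat])


def infer(folder: Path, group: str, measurements):
    """Load both files and return (cluster, average body fat) for one input."""
    with open(folder / f"{group}.pkl", "rb") as f:
        model = pickle.load(f)
    with open(folder / f"{group}.csv", newline="") as f:
        fat_by_cluster = {
            int(row["cluster"]): float(row["avg_body_fat"])
            for row in csv.DictReader(f)
        }
    # Nearest centroid by Euclidean distance (an assumption for this sketch).
    cluster = min(model, key=lambda c: math.dist(model[c], measurements))
    return cluster, fat_by_cluster[cluster]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    save_artifacts(folder, "men_20s")
    result = infer(folder, "men_20s", (95.0, 85.0))

print(result)
```

Keeping one set of files per age and gender group means a request handler only needs to select the group from the user's input and then load and run the matching files.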
In this study, easy-to-measure data comprising nine items, such as the chest and waist measurements, were taken from the human body size dataset, a public dataset provided by the National Institute of Technology and Standards. Clustering was performed for each age and gender group using the K-means algorithm, and the average body fat percentage of each cluster was calculated to obtain meaningful clustering results.
Based on these results, a model was proposed that modifies the Inception-v2 module according to the data shape and classifies the data into five groups, from the highest body fat percentage to the lowest. In verification, accuracy, precision, and recall of more than 95% were obtained.
However, the clusters in the 50s and 60s groups did not differ from one another as clearly as the clusters in the 20s to 40s groups.
Future research should address this limitation by using a better clustering algorithm or CNN architecture. In addition, combining the proposed method with a device that measures circumferences through 3D scanning is expected to support rational decision-making in the field of healthcare or obesity analysis.
This research was supported by a 2023 Pai Chai University research grant.
Table 1. People groups included in the 7th Korean human body size survey project.
Age | Men | Women | Total |
---|---|---|---|
16-19 | 998 | 926 | 1,924 |
20-29 | 869 | 668 | 1,537 |
30-39 | 655 | 675 | 1,330 |
40-49 | 311 | 359 | 670 |
50-59 | 221 | 358 | 579 |
60-69 | 145 | 228 | 373 |
Total | 3,199 | 3,214 | 6,413 |
Table 2. Clustering results.
Age group | Cluster | Men’s fat rate (%) | Women’s fat rate (%) |
---|---|---|---|
20 | Lowest | 13.7741935483 | 24.3294478527 |
20 | Low | 16.1882575757 | 27.9261682242 |
20 | Average | 19.6666666666 | 31.6592039800 |
20 | High | 23.8366197183 | 34.8061538461 |
20 | Highest | 29.2052631578 | 42.7833333333 |
30 | Lowest | 14.9828124999 | 24.7364197530 |
30 | Low | 18.3919540229 | 28.4421874999 |
30 | Average | 20.2247311827 | 31.4851190476 |
30 | High | 23.0694267515 | 35.4985714285 |
30 | Highest | 27.6900000000 | 40.4210526315 |
40 | Lowest | 16.5818181818 | 26.3729411764 |
40 | Low | 19.3210526315 | 29.4008771929 |
40 | Average | 20.9579439252 | 32.4777777777 |
40 | High | 24.5845070422 | 36.7283018867 |
40 | Highest | 28.1173913043 | 42.7222222222 |
50 | Lowest | 17.275 | 28.2071428571 |
50 | Low | 18.6491525423 | 31.0783132530 |
50 | Average | 21.3803571428 | 32.9857142857 |
50 | High | 24.2875000000 | 35.22125 |
50 | Highest | 25.0644444444 | 37.6268292682 |
60 | Lowest | 16.3800000000 | 29.0653061224 |
60 | Low | 21.2761904761 | 33.1511627906 |
60 | Average | 23.2384615384 | 33.7983870967 |
60 | High | 26.1259259259 | 34.7949999999 |
60 | Highest | 28.0785714285 | 39.6866666666 |
Table 3. Hardware specifications.
Component | Specification |
---|---|
CPU | AMD Ryzen 7 2700X (8 cores, 3.70 GHz) |
RAM | 16 GB |
GPU | NVIDIA GeForce GTX 1660 Ti |
Storage | SSD 500 GB / HDD 2 TB |
OS | Windows 10 |
Virtual environment | Anaconda 3 |