Journal of Information and Communication Convergence Engineering 2024; 22(2): 139-144
Published online June 30, 2024
https://doi.org/10.56977/jicce.2024.22.2.139
© Korea Institute of Information and Communication Engineering
Seung Su Jeong1, Nam Ho Kim2*, and Yun Seop Yu1*, Member, KIICE

Correspondence to: Yun Seop Yu1 and Nam Ho Kim2 (E-mail: ysyu@hknu.ac.kr, namo@kopo.ac.kr, Tel: +82-31-670-5293, +82-31-696-8851)
1ICT & Robotics Engineering and IITC, Hankyong National University, Anseong 17579, Republic of Korea
2Bundang Convergence Technology Campus of Korea Polytechnic, Seongnam 13590, Republic of Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In this study, four types of fall detection systems, designed with YOLO-Pose, principal component analysis (PCA), convolutional neural network (CNN), and long short-term memory (LSTM) architectures, were developed and compared in the detection of everyday falls. The experimental dataset encompassed seven types of activities: walking, lying, jumping, jumping in activities of daily living, falling backward, falling forward, and falling sideways. Keypoints extracted by YOLO-Pose were fed into the following architectures: RAW-LSTM, PCA-LSTM, RAW-CNN-LSTM, and PCA-CNN-LSTM. For the PCA architectures, the reduced input size stemming from dimensionality reduction enhanced operational efficiency in terms of computation time and memory at the cost of decreased accuracy. In contrast, the addition of a CNN resulted in higher complexity and lower accuracy. The RAW-LSTM architecture, which included neither PCA nor a CNN, had the fewest parameters, resulting in the best computation time and memory efficiency while also achieving the highest accuracy.
Keywords: Fall detection, The elderly, Long short-term memory (LSTM), Principal component analysis (PCA), Convolutional neural network (CNN)
According to the World Health Organization (WHO), falls are the second leading cause of unintentional injury death [1]. Furthermore, falls often result in serious injuries such as femoral neck fractures, neurological damage, and skin burns [2]. During physical activity, falls can be detected using sensors, images, or videos. Advances in computer vision and network technologies have made it possible to detect falls quickly and transmit the relevant information, enabling rapid response. In addition, a system has been developed to track and monitor users in real time, allowing feedback to be sent to medical professionals [3]. Users can be monitored via wearable sensors, such as gyroscopes and accelerometers [4-7]; visual sensors, such as cameras [8-12]; or ambient sensors, such as active infrared sensors, RFID, ultrasonic sensors, radar, and microphones [13-16].
In video-based approaches, motion information is acquired as features extracted from image frames captured by a camera. Human movements can be visually classified using techniques such as convolutional neural networks (CNNs), human pose estimation (HPE), object detection, and optical flow. A fall detection system based on principal component analysis (PCA) has previously been developed using optical flow [17,18], which refers to the apparent movement of brightness patterns in a video. Optical flow reflects changes in lighting that stem from motion within an image and connects them over time; specifically, the flow is calculated using the Lucas-Kanade algorithm to determine movement. However, this method does not differentiate between the motion of people and that of objects. Furthermore, it is vulnerable to fast-moving objects, and even when no motion has occurred, the system may misinterpret changes in contrast caused by camera noise as motion. To compensate for these shortcomings, HPE techniques [8-12] have been developed to identify people through deep learning networks such as CNNs. These techniques track identified persons by locating keypoints, such as those corresponding to the human skeleton. Among recently developed HPE methods, YOLO-Pose can simultaneously identify multiple people within an image or video, enabling relatively accurate real-time processing [19,20]; the model predicts 17 skeletal keypoints for each person in a 2D image (a minimal extraction sketch is shown after the architecture list below). In fall detection systems, data collected over time are used to effectively identify occurrences of falling [9]. Recurrent neural networks (RNNs) [21], long short-term memory (LSTM) networks [22], and gated recurrent units (GRUs) [10] have all been employed for fall detection. In particular, LSTM and GRU architectures, as well as CNN-LSTM hybrid models, have recently been utilized to solve the vanishing gradient problem inherent to RNNs [12,15]. The following fall detection architectures have been developed for HPE methods:
Architecture 1: Keypoints corresponding to the skeleton are measured using kinetic sensors, such as the Microsoft Kinect RGB-D camera [12], or HPE methods [22], and passed directly into an LSTM network.
Architecture 2: Keypoints extracted via HPE are passed to a CNN or CNN-LSTM network. In the latter, convolutional layers are used to extract input data features while the LSTM network supports sequence prediction [9,10,12,23,24].
Architecture 3: Geometric information pertaining to the human body is calculated to reduce the input dimensionality, and the extracted features are passed to an LSTM model and support vector machine (SVM) to classify falls and activities of daily living (ADL) [25].
Architecture 4: Data preprocessed via dimensionality reduction, such as extracted keypoints, are passed to a CNN or CNN-LSTM network.
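All four architectures consume the same per-frame skeleton keypoints. As a rough illustration of this common front end, the sketch below extracts 17 keypoints per frame using the ultralytics YOLOv8 pose model as a stand-in for YOLO-Pose; the checkpoint name and video path are illustrative assumptions, not the authors' exact setup.

```python
# Minimal keypoint-extraction sketch. The ultralytics YOLOv8 pose model
# is used here as a stand-in for YOLO-Pose; the checkpoint name and
# video path are illustrative assumptions, not the authors' setup.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")  # any pose-capable checkpoint

cap = cv2.VideoCapture("fall_clip.mp4")  # hypothetical input video
frames_keypoints = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    kp = result.keypoints
    if kp is not None and kp.xy.shape[0] > 0:
        # (17, 2) x/y pixel coordinates of the first detected person,
        # flattened to the 34 values per frame used by the architectures
        frames_keypoints.append(kp.xy[0].cpu().numpy().flatten())
cap.release()
```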
To our knowledge, no previous study has comparatively evaluated these four architectures in terms of accuracy, computation time, and memory efficiency.
This article presents four fall detection algorithms that classify ADL and falls using keypoint data representing four types of ADL and three types of falls extracted by YOLO-Pose. In a comparative evaluation, the accuracy of inference, computation time, and memory efficiency were calculated for all four algorithms. Section 2 describes the subjects and datasets used throughout this study, as well as the proposed methods, and Section 3 presents the results of the comparative evaluation. Finally, Section 4 concludes the study.
A dataset of fall and ADL occurrences comprising 1190 image sequences was compiled [18]; 20 recordings of each fall type and 10 of each ADL type were obtained from seven and three participants, respectively. To enhance the diversity of the training data, the images were augmented by horizontal reflection, for a total of 2380 sequences. Seven activities were included in the dataset: walking, lying, jumping, jumping in ADL, falling backward, falling forward, and falling sideways [18]. All individuals represented in the data were 20-50 years old, 160-180 cm tall, and weighed 50-85 kg. Images were recorded at 50 frames per second (FPS), with the framerate dropping to 25 FPS in the event of a fall. The video data were then sampled at 10 FPS, reflecting the environmental and systemic variability of CCTV systems, for which this framerate is considered a minimal standard. Furthermore, a framerate of 10 FPS consumes very little memory, making the training process highly efficient.
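One lightweight way to realize the horizontal-reflection augmentation, applied here to the extracted keypoint sequences rather than the raw images, is sketched below; COCO keypoint ordering (which YOLO-Pose follows) and a known frame width are assumptions of this illustration, not the authors' exact code.

```python
import numpy as np

# COCO keypoint ordering used by YOLO-Pose: indices of left/right pairs
# (eye, ear, shoulder, elbow, wrist, hip, knee, ankle).
LR_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]

def hflip_sequence(seq: np.ndarray, frame_width: float) -> np.ndarray:
    """Horizontally reflect a (T, 17, 2) keypoint sequence.

    Mirrors x coordinates about the image center and swaps each
    left/right keypoint pair so anatomical labels stay consistent.
    """
    out = seq.copy()
    out[..., 0] = frame_width - out[..., 0]  # mirror x coordinates
    for l, r in LR_PAIRS:
        out[:, [l, r], :] = out[:, [r, l], :]  # swap left/right joints
    return out
```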
Fig. 1 depicts the four fall detection architectures, showing the 17 keypoints extracted by YOLO-Pose, the eigenvector matrices U1 and U2 obtained via PCA, and the 1D CNN and LSTM networks. In the RAW-LSTM and RAW-CNN-LSTM architectures, keypoints were passed directly to the LSTM, whereas in the PCA-LSTM and PCA-CNN-LSTM architectures, the keypoints were first dimensionally reduced via PCA. The YOLO-Pose network was used to extract 34 data points corresponding to the x and y coordinates of the 17 keypoints identified in each video frame. In the PCA-enabled architectures, the center point (mx, my) and slope (θ) of the keypoint set were extracted as features.

When a CNN was added, a feature map was generated from either the feature points obtained via PCA or the raw keypoint data. To construct the feature map, a 1D filter was used to extract feature points, with padding applied to preserve the time axis. The stride was set to 1 and the filter size to 2; no additional layers were used. Although CNNs typically extract features from images using 2D filters, the data in this study were separated into x and y components, so 1D filters were used instead. The CNN consisted of two layers, with 64 and 128 filters in the first and second layers, respectively.

The LSTM network was trained to classify the seven activities using three types of input: raw skeleton keypoints obtained via YOLO-Pose, feature points extracted via PCA, and feature maps extracted by the CNN. The network comprised three LSTM layers and one dense layer, with 64, 128, and 128 units in the first, second, and third LSTM layers, respectively. The dense layer mapped the final feature vector to the seven activity classes, with Softmax predicting the most likely behavior. The adaptive moment estimation (Adam) optimizer was used. In the hybrid CNN-LSTM architectures, data features are extracted by the CNN before being passed to the LSTM [9,10,12,23,24].

Fig. 2 presents an overall flowchart of the four algorithms. The processed data were split into training and validation sets at an 8:2 ratio [26]. To prevent overfitting, L2 regularization was applied with a lambda value of 0.065, determined using the parameter optimization method proposed in [6]. Upon completion of training, each model's accuracy was evaluated on the validation data. Min-max normalization [6] was applied to the feature points extracted via PCA, with the normalized (mx, my) and θ serving as indicators of speed and slope for fall detection, respectively.
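The following Keras sketch assembles the two extreme variants, RAW-LSTM and PCA-CNN-LSTM, using the layer widths, filter settings, L2 lambda, and Adam optimizer stated above; everything else (the 50-frame window implied by Tables 1 and 2, the loss function, and the padding mode) is a plausible assumption rather than the authors' exact configuration.

```python
# A minimal Keras sketch of the RAW-LSTM and PCA-CNN-LSTM variants.
from tensorflow.keras import layers, models, regularizers

T = 50  # frames per input sequence (time axis in Tables 1 and 2)
reg = regularizers.l2(0.065)  # L2 lambda stated in the text

def build_raw_lstm(n_features=34):
    """RAW-LSTM: 34 raw keypoint coordinates -> 3 LSTM layers -> softmax."""
    return models.Sequential([
        layers.Input((T, n_features)),
        layers.LSTM(64, return_sequences=True, kernel_regularizer=reg),
        layers.LSTM(128, return_sequences=True, kernel_regularizer=reg),
        layers.LSTM(128, kernel_regularizer=reg),
        layers.Dense(7, activation="softmax"),  # seven activity classes
    ])

def build_pca_cnn_lstm(n_features=3):
    """PCA-CNN-LSTM: (mx, my, theta) -> two Conv1D layers -> LSTM stack."""
    return models.Sequential([
        layers.Input((T, n_features)),
        # 1D filters of size 2, stride 1; 'same' padding keeps the time axis
        layers.Conv1D(64, kernel_size=2, strides=1, padding="same"),
        layers.Conv1D(128, kernel_size=2, strides=1, padding="same"),
        layers.LSTM(64, return_sequences=True, kernel_regularizer=reg),
        layers.LSTM(128, return_sequences=True, kernel_regularizer=reg),
        layers.LSTM(128, kernel_regularizer=reg),
        layers.Dense(7, activation="softmax"),
    ])

model = build_raw_lstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # first LSTM layer: 4 x ((34 + 1) x 64 + 64^2) = 25,344
```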
The number of parameters in the CNN components of the hybrid models can be expressed as [27]

$$N_{\mathrm{param,CNN}} = N_{\mathrm{input}} \times N_{fh} \times N_{fw} \times N_{\mathrm{kernel}} + N_{\mathrm{kernel}},$$

where $N_{\mathrm{input}}$, $N_{fh}$, $N_{fw}$, and $N_{\mathrm{kernel}}$ represent the input size, filter height, filter width, and number of kernels, respectively. The number of parameters in the LSTM network is expressed as [28]

$$N_{\mathrm{param,LSTM}} = 4 \times \left[ (N_{\mathrm{input}} + 1) \times N_{\mathrm{output}} + N_{\mathrm{output}}^{2} \right],$$

where $N_{\mathrm{input}}$ and $N_{\mathrm{output}}$ are the input and output sizes, respectively.
When PCA was employed to reduce the input size, the number of parameters also decreased. This in turn reduced algorithm complexity, thereby enhancing memory efficiency and computation speed. Tables 1 and 2 summarize the output shape and number of parameters in each of the proposed architectures, with parentheses indicating the parameter counts after applying PCA. For example, the first LSTM layer of the RAW-LSTM architecture has input and output sizes of 34 and 64, respectively; thus, the total number of parameters is 4 × ((34 + 1) × 64 + 64²) = 25,344. In contrast, the first CNN layer of the PCA-CNN-LSTM architecture has a 1D CNN input size, filter height, filter width, and kernel count of 3, 1, 2, and 64, respectively; thus, the number of parameters is 3 × 1 × 2 × 64 + 64 = 448.
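As a sanity check, the two formulas can be evaluated directly; the snippet below reproduces the worked examples above and the first-layer entries of Tables 1 and 2.

```python
# Quick verification of the two parameter-count formulas.
def lstm_params(n_input: int, n_output: int) -> int:
    # 4 gates, each with (n_input + 1) * n_output input/bias weights
    # plus n_output^2 recurrent weights
    return 4 * ((n_input + 1) * n_output + n_output ** 2)

def conv1d_params(n_input: int, n_fh: int, n_fw: int, n_kernel: int) -> int:
    # one weight per (input channel x filter cell x kernel), plus one
    # bias per kernel
    return n_input * n_fh * n_fw * n_kernel + n_kernel

assert lstm_params(34, 64) == 25344         # RAW-LSTM, first LSTM layer
assert lstm_params(3, 64) == 17408          # PCA-LSTM, first LSTM layer
assert conv1d_params(34, 1, 2, 64) == 4416  # RAW-CNN-LSTM, first Conv1D
assert conv1d_params(3, 1, 2, 64) == 448    # PCA-CNN-LSTM, first Conv1D
```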
Table 1. Details of LSTM layers

| Layer (Type) | Output Shape | Number of Parameters |
|---|---|---|
| LSTM_1 (LSTM) | (None, 50, 64) | 25,344 (17,408) |
| LSTM_2 (LSTM) | (None, 50, 128) | 98,816 |
| LSTM_3 (LSTM) | (None, 128) | 131,584 |
| Dense_1 (Dense) | (None, 7) | 903 |

Total number of parameters: 256,647 (248,711)
Table 2. Details of CNN-LSTM layers

| Layer (Type) | Output Shape | Number of Parameters |
|---|---|---|
| CONV1D_1 (Conv1D) | (None, 50, 64) | 4,416 (448) |
| CONV1D_2 (Conv1D) | (None, 50, 128) | 8,256 |
| LSTM_1 (LSTM) | (None, 50, 64) | 33,024 |
| LSTM_2 (LSTM) | (None, 50, 128) | 98,816 |
| LSTM_3 (LSTM) | (None, 128) | 131,584 |
| Dense_1 (Dense) | (None, 7) | 903 |

Total number of parameters: 276,999 (273,031)
Fig. 3 shows the confusion matrices obtained by the four architectures on the validation data, and Table 3 lists the corresponding evaluation results (a sketch of how these binary metrics can be derived from the confusion matrices is given after Table 3). Because the data were split between training and validation sets at a ratio of 8:2, 1904 and 476 activities were used for training and validation, respectively. The results show a slight decrease in accuracy following the application of PCA, which can be regarded as the cost of the improved computation time and memory efficiency. The validation accuracy was also slightly lower when the CNN was applied, which additionally increased model complexity. Notably, the RAW-LSTM architecture achieved a maximal validation accuracy of 100% while also exhibiting the best memory use and computation time, as it requires fewer trainable parameters than the hybrid architectures.
Table 3. Validation results of the four proposed fall detection architectures

| Architecture | Sensitivity [%] | Specificity [%] | Accuracy [%] |
|---|---|---|---|
| RAW-LSTM | 100 | 100 | 100 |
| PCA-LSTM | 99.51 | 99.26 | 99.37 |
| RAW-CNN-LSTM | 99.51 | 99.63 | 99.58 |
| PCA-CNN-LSTM | 99.02 | 99.26 | 99.16 |
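For reference, the binary sensitivity and specificity in Table 3 can be obtained by collapsing each 7-class confusion matrix into fall-versus-ADL counts. The sketch below assumes that rows are true classes, columns are predicted classes, and the last three class indices are the fall types; this index layout is an assumption, not stated in the paper.

```python
import numpy as np

# Assumed class order: first four indices are ADL, last three are falls.
FALL_CLASSES = [4, 5, 6]

def fall_metrics(cm: np.ndarray):
    """Collapse a 7x7 confusion matrix into binary fall-vs-ADL metrics."""
    pos = np.isin(np.arange(cm.shape[0]), FALL_CLASSES)
    tp = cm[np.ix_(pos, pos)].sum()    # fall predicted as (any) fall
    fn = cm[np.ix_(pos, ~pos)].sum()   # fall predicted as ADL
    fp = cm[np.ix_(~pos, pos)].sum()   # ADL predicted as fall
    tn = cm[np.ix_(~pos, ~pos)].sum()  # ADL predicted as ADL
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / cm.sum()
    return sensitivity, specificity, accuracy
```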
In this study, four types of fall detection architectures were investigated in terms of accuracy, computation time, and memory efficiency. When PCA was applied, the reduced input dimensionality improved computation time and memory efficiency while slightly decreasing accuracy. In contrast, the addition of a 1D CNN led to both increased complexity and decreased accuracy. Overall, the RAW-LSTM architecture, which included neither PCA nor a CNN, achieved the best accuracy, computation time, and memory efficiency.
This research was supported by the Basic Science Research Program through the NRF of Korea, funded by the Ministry of Education (NRF-2019R1F1A1060383).
Seung Su Jeong
received his BS from the Department of Electrical, Electronic, and Control Engineering, Hankyong National University, Anseong, Republic of Korea, in 2022. He is currently an MS student at the Department of ICT and Robotics Engineering, Hankyong National University. His current research is focused on the compact modeling of next-generation devices based on deep learning applications.
Nam Ho Kim
received his BS and MS from the Department of Electronics Engineering, Korea University, Seoul, Republic of Korea, in 1996 and 1998, respectively. He received his PhD in signal processing from the Graduate School of Biotechnology & Information Technology, Hankyong National University. He is currently an Associate Professor at the Bundang Convergence Technology Campus of Korea Polytechnic, Seongnam, Republic of Korea. His research focuses on machine learning, deep learning, and embedded systems.
Yun Seop Yu
received his BS, MS, and PhD from the Department of Electronics Engineering, Korea University, Seoul, Republic of Korea, in 1995, 1997, and 2001, respectively. From 2001 to 2002, he worked as a guest researcher at the Electronics and Electrical Engineering Laboratory, NIST, Gaithersburg, Maryland, USA. From 2014 to 2015, he worked as a visiting scholar at the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA. He is now a full professor at the School of ICT, Robotics, and Mechanical Engineering, Hankyong National University, Anseong, Republic of Korea. His main research interests are in the fields of modeling various nanodevices for efficient circuit simulation, as well as future memory, logic, and sensor designs using these devices. He is also interested in the fabrication and characterization of various nanodevices. He has authored and coauthored 100 internationally refereed journal publications.