Journal of information and communication convergence engineering 2024; 22(2): 109-120
Published online June 30, 2024
https://doi.org/10.56977/jicce.2024.22.2.109
© Korea Institute of Information and Communication Engineering
Correspondence to : Soo Young Shin (E-mail : wdragon@kumoh.ac.kr)
Room 112, Digital Bldg., 61 Daehakro, Yanghodong, Gumi 39177, Gyeongbuk
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Trash, or garbage, is one of the most dangerous health and environmental problems and a major contributor to pollution, which affects nature, human life, and wildlife. In this paper, we propose a modern solution for cleaning the environment of trash pollution by enforcing strict action against people who dump trash inappropriately on streets, outside homes, and in other unsuitable places. Artificial intelligence (AI), especially deep learning (DL), has been used to automate the solution of many real-world problems. We took this as an opportunity to develop a system that identifies trash using a deep convolutional neural network (CNN). This paper proposes a real-time garbage identification system based on a deep CNN architecture trained on a dataset with eight distinct classes. After garbage is identified, the CCTV camera captures a video of the individual placing the trash in an incorrect location, and the system sends an alert to the relevant authority.
Keywords Convolutional neural network, CCTV camera, deep learning, garbage detection, real-time detection
Trash and garbage pollution are among the most severe problems in Karachi. According to global indexing reports, Karachi ranks second among the most polluted cities worldwide. In addition to commercial and industrial garbage, the city generates up to 16,000 tons of household waste daily, whereas daily disposal at landfills ranges between 7,000 and 8,000 tons. Officials have claimed that almost 2.5 million tons of waste have been dumped in the city's residential zones. Identifying appropriate landfill sites has become one of the most challenging urban planning problems [1]. Karachi requires suitable dumping sites, and finding vacant land in the city is difficult. Many forms of garbage, including household, construction, and material waste, are thrown everywhere. Heaps of rotting trash and plastic bags can be spotted around grocery stores and hospitals. Karachi, the country's commercial and financial hub, has grown from an urban area into a metropolis. Masses of trash are thrown at street corners, behind mosques, in shuttered schools, and elsewhere. Public garbage bins fill up quickly, and many overflow because the city lacks the resources and personnel to collect the trash. This leads to overcrowded roads, bad odors, and adverse health and environmental effects.
As modern societies develop their cities and try to make life more comfortable by reducing garbage pollution, smart CCTV cameras are being deployed on streets, in apartment buildings, and in every other place that needs 24/7 monitoring against littering. Trash detection is a challenging task that can be performed using a convolutional neural network (CNN). Proper garbage disposal is essential to treat waste in a comprehensive and safe manner. In one approach, a system based on image detection was designed to recognize and classify different kinds of garbage: real-time RGB images of garbage disposal lines were captured using a CCD camera and converted into the HSI color format to reduce the impact of ambient light [2].
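The RGB-to-HSI step in [2] separates intensity from chromatic information, so that changes in ambient lighting mainly affect the I channel. The exact conversion used in [2] is not given; the sketch below assumes the standard textbook RGB-to-HSI formulas for a single pixel with channels scaled to [0, 1], and `rgb_to_hsi` is a hypothetical helper name.

```python
import math


def rgb_to_hsi(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel (channels in [0, 1]) to (H in degrees, S, I)."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0  # pure black: hue and saturation undefined, report 0
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # achromatic (gray) pixel: hue undefined, report 0
    else:
        # Clamp to [-1, 1] to guard against rounding error before acos.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity
```

Scaling all three channels by the same factor, as a uniform lighting change would, leaves H and S unchanged and alters only I, which is the property that motivates the conversion.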
Many other methods have been introduced using DL, an area of machine learning concerned with developing and running artificial neural networks whose structure is loosely inspired by the brain. DL plays a significant role in text recognition, particularly in recognizing fonts from images; for example, font recognition for the Turkish alphabet has been performed using DL. Beyond font recognition, DL methods can be used to classify brain cancer and other diseases [3].
CNNs have attracted considerable research interest because they learn features automatically during the training stage. CNN models can extract abstract features and exhibit excellent image classification and object detection performance [3]. CNNs have been used for identification, detection, processing, and segmentation, and their performance continues to improve because they extract features from images automatically. Several CNN-based object detectors have been introduced, the two most important families of which are region-based CNNs (R-CNNs) and single-shot detectors (SSDs). R-CNN detectors require a powerful graphics processing unit (GPU) because they are computationally intensive, whereas SSDs require only the convolutional network itself [4]. The CNN classification model is comparatively intuitive. In a CNN, the first layer identifies the simplest structures in the image; in a standard two-dimensional image, these structures are edges, and each filter responds to a particular edge pattern. Different filters respond to edges at different orientations. The feature maps produced by the first layer therefore record where each filter's edge structure appears in the image. The next convolutional layer operates on these edge maps, and each convolutional layer analyzes the correlations among the combination structures found in the preceding layer. Increasing the number of layers increases the complexity of the features that can be represented, which helps separate semantic information from structural information.
The convolutional layer, which receives the image input, operates as follows. The filter matrices are initialized with random values, and the input image has a height, width, and depth. The depth of a filter equals the number of channels of the input, and the spatial size of the filters is small: a filter applied to a grayscale image has a depth of one, whereas a filter applied to an RGB image has a depth of three. The input volume is convolved with each filter.
To do this, the filter is placed over the input volume and slid spatially across it, and the dot product between the filter and the region it covers is computed at every position. The resulting values form an activation map for each filter [4].
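The sliding dot-product operation described above can be sketched in pure Python. This is a minimal illustration, not the paper's implementation: it assumes a single filter, no padding, and a configurable stride, and, as in most CNN frameworks, it computes cross-correlation (the filter is not flipped). The filter depth must equal the number of input channels, which is why a grayscale input takes a depth-1 filter and an RGB input a depth-3 filter. The edge kernel in the example is hypothetical.

```python
def conv2d(volume, kernel, bias=0.0, stride=1):
    """Valid 2-D convolution (cross-correlation) of one filter over an input volume.

    volume: nested lists [C][H][W]; kernel: nested lists [C][kh][kw].
    Returns the activation map as nested lists [out_h][out_w].
    """
    if len(volume) != len(kernel):
        raise ValueError("filter depth must equal the number of input channels")
    channels = len(volume)
    h, w = len(volume[0]), len(volume[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    activation = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = bias
            # Dot product between the filter and the patch it currently covers.
            for ch in range(channels):
                for u in range(kh):
                    for v in range(kw):
                        acc += volume[ch][i * stride + u][j * stride + v] * kernel[ch][u][v]
            activation[i][j] = acc
    return activation
```

For a 3 × 4 grayscale image whose right half is bright, the hypothetical kernel [[-1, 1], [-1, 1]] responds only where the intensity jumps, giving the activation map [[0, 2, 0], [0, 2, 0]]; this is the kind of edge response attributed to the first CNN layer.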
This article proposes a real-time CCTV-based garbage-detection method for modern societies using a deep CNN with person identification. The contributions of this study are as follows.
A real-time CCTV-based garbage-detection scheme is proposed to identify people and detect trash using CCTV cameras.
A large custom dataset was collected to train the model. The custom dataset consists of eight trash classes. Each class has more than 300 images, and the compiled dataset includes over 2100 images.
New layers were added to improve the accuracy of the model; this slightly reduced its speed.
CCTV records a video of any person throwing trash. The CNN model quickly identifies the person, so that further action can be taken against them.
The remainder of this article is structured as follows: Section II reviews previous studies, Section III describes the proposed scheme, Section IV presents the experimental results and findings, and Section V concludes the paper with a discussion of future research.
This study reviews prior work on CNN-based image classification and on support vector machines (SVMs). AlexNet is a well-known and successful CNN architecture for image classification; its design is easy to understand, and it classifies images well. AlexNet, an influential and widely used model from the ImageNet challenge, initiated the CNN trend and achieved state-of-the-art image classification [5].
Background filtering is often considered an outdated approach [6], whereas the Haar cascade is another classifier used to find objects [7]. In a project from the TechCrunch Disrupt Hackathon [8], a team created “Auto-Trash,” an automated garbage-sorting system that uses a Raspberry Pi and a camera module to segregate trash.
In another approach, discussed in [9], TrashNet is used for a comparative analysis of image classification methods; half of the dataset is used for testing, without data augmentation. In [10], the authors developed an automatic urban garbage detection system based on the smart city concept.
Moreover, action recognition can help detect garbage-dumping behavior [11]. However, adopting a conventional action recognition method is complex because the data required to build a dumping-behavior model with DL algorithms are still insufficient [12].
Waste identification is required prior to segregation and can be accomplished using various machine learning and image processing methods. CNNs are frequently selected for this stage because they can extract individual visual features and categorize them into specified groups [13]. GPUs now offer substantially greater computational capacity, allowing large image collections to be analyzed in a short time. Consequently, CNNs have attracted significant attention in recent years [12].
Plastic and non-plastic waste can be classified by capturing images of the trash with a camera and transferring these images to a CNN for preprocessing [14].
CNNs can learn to perform major tasks by extracting important properties directly from camera images [10]. One such study learned robotic grasping movements from monocular images using reinforcement learning; more than 14 robotic manipulators made 800,000 grasp attempts. Real-world applications of such a strategy are impractical because of the high computing resources and long learning times required. If a trajectory optimizer supplies guiding trajectories, however, the CNN policy search is directed toward a local optimum and the procedure is accelerated [15].
A CNN model has been used to identify garbage in images captured by a robot-mounted web camera [16]. Google researchers introduced MobileNets, a family of CNNs designed to be “mobile-first”: they are easy to use and run quickly on mobile devices. MobileNets have two hyperparameters, the width multiplier and the resolution multiplier, which are set to balance the tradeoff between resource use and accuracy. The resolution multiplier scales the input image size, whereas the width multiplier thins the network by reducing the number of channels in each layer. Together, these adjustments uniformly shrink every layer of the network [17].
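The effect of the two multipliers can be made concrete with the cost model from the original MobileNets paper (Howard et al., 2017), which this article does not reproduce: a depthwise separable layer with a D_K × D_K kernel, M input channels, N output channels, and a D_F × D_F feature map costs D_K·D_K·αM·ρD_F·ρD_F + αM·αN·ρD_F·ρD_F multiply-adds, where α is the width multiplier and ρ the resolution multiplier. The sketch below evaluates that expression; the function names are hypothetical.

```python
def standard_conv_mult_adds(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-adds of a standard dk x dk convolution, m -> n channels, df x df output."""
    return dk * dk * m * n * df * df


def separable_conv_mult_adds(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable layer with width (alpha) and
    resolution (rho) multipliers, following the MobileNets cost expression."""
    m_a, n_a = alpha * m, alpha * n  # thinned channel counts
    df_r = rho * df                  # reduced feature-map size
    depthwise = dk * dk * m_a * df_r * df_r  # one dk x dk filter per input channel
    pointwise = m_a * n_a * df_r * df_r      # 1 x 1 convolution mixing channels
    return depthwise + pointwise
```

For the layer used as an example in the MobileNets paper (D_K = 3, M = N = 512, D_F = 14), a standard convolution needs about 462 million multiply-adds, the separable version about 52.3 million, and α = 0.75 lowers this to about 29.6 million, since the cost scales roughly with α² and ρ².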
The current state of waste reuse is affected by (1) government budgets and programs: inadequate government legislation and municipal solid waste (MSW) management budgets; (2) home education: the diminishing importance placed on recycling at home; (3) technology: ineffective reuse technology; and (4) administrative costs: manual separation incurs high costs. Recent developments in DL-based computer vision have helped make such a system possible [18].
Sakr et al. attempted to improve the efficiency of waste sorting using machine learning algorithms. They compared two common techniques, DL with CNNs and SVMs; the SVMs achieved a classification accuracy of 94.8%, compared with 83% for the CNNs [19].
Recycling is an essential part of a sustainable society [19]. However, the recycling process entails substantial hidden costs associated with the proper selection, processing, and categorization of recovered materials. In many countries, the way garbage is sorted may make it difficult to find the correct landfill for the wide variety of materials involved. Hence, a suitable strategy is needed for industry and for a knowledge-based society, one that has favorable environmental consequences and economic benefits. Although some CNN parameters are simple to train, several CNN-based models for waste segregation remain open to further exploration. This research classified a single object in an image as a recyclable material such as plastic, paper, or metal [20].
The identification of trash in densely populated areas is vital to the environment. Unfortunately, human monitoring is time-consuming and labor-intensive. Consequently, a system was developed to monitor the spread of garbage in densely populated regions using airborne hyperspectral image (HSI) data, which makes the method well suited to large-scale environmental monitoring. A novel hyperspectral image classification network, MSCNN, was proposed for garbage identification; it classifies HSI pixels to produce a binary garbage segmentation map. Garbage detection comprises two steps: unsupervised object detection and HSI classification. The authors manually labeled their HSI garbage dataset because no public dataset is available. DL has demonstrated robust performance in several sectors, and many HSI classification techniques are based on it. In this approach, a CNN-like high-level feature extractor is applied to the HSI data, and classification is performed using a multilayer perceptron (MLP) [21].
On-street waste appears at random locations, so the city's waste managers spend a significant amount of time and money cleaning it. Visual inspections of street cleanliness are essential for collecting cleanliness data; however, they are neither real-time nor automated, and current evaluation methods have several other drawbacks. In the study discussed here, a high-resolution camera was mounted on a vehicle to capture street images for the first time. The images were sent to mobile-edge servers, that is, servers located at the edge of the network, and the processed street data were then sent to the cloud data center of the city network. In addition, two distinct edge servers were used for mobile-edge processing to perform two tasks. The first task improves the overall performance of the system: before object detection, the images are resized to the input size required by the CNN. In the second task, the image data are preprocessed on the edge server to decrease the total system latency.
The improved algorithm automatically resizes each image when it is transferred to the edge server. Municipal waste-collection trucks serve as the mobile stations: the roofs of the trash pickup vans are equipped with high-resolution cameras capable of network transmission, covering a frontal distance of 50 m. The vans capture images regularly and transmit them to the edge server. Urban residents also contribute data: images gathered on their mobile devices are sent to the edge server, which requires the devices to connect to wireless data networks to complete part of the transaction [22].
Finding garbage in water channels is more difficult because object sizes vary widely and objects change shape with the water flow. Various efforts have been made to address object size variation, including image pyramids, feature fusion networks, attention mechanisms, and thinning U-shaped modules. In one study on detecting a specific category of trash, trash images were manually collected and annotated, and a new layer that implicitly captures smaller objects was added to the architecture to help detect trash. An attention layer was also implemented to improve the detection of tiny garbage particles [23].
An automated garbage classification technique was proposed to enhance recycling. Each image in the dataset contains two or more waste objects from various categories. The first step of the process is to localize the objects in the image and divide them into three groups: paper, recyclables, and landfill. The automatic waste classifier takes an input image with a resolution of 768 × 1024 containing at least two waste objects on a white background; these objects may or may not overlap and may be of different sizes. A Faster R-CNN model previously trained on the PASCAL VOC dataset was fine-tuned, and the adapted model generates anchors (region proposals) and classifies the objects [24].
Another study presented an intelligent unmanned aerial vehicle (UAV) controller for detecting marine trash. The system uses a UAV to identify marine waste effectively and to notify the relevant authorities about trash pollution in real time. The primary objective of the project was to develop a UAV and IoT architecture consisting of a data-streaming platform with a video-streaming server, a web service, a MongoDB database, and a Kafka messaging system. Internet technology and a human-machine interface were used for system communication, and monitoring was performed remotely. A garbage detector based on the YOLO object detection approach was constructed through algorithm selection, hyperparameter tuning, and model evaluation to determine the model with the lowest generalization error [25].
Another study presented a robot for social education that identifies garbage and classifies it as organic or inorganic waste. The robot moves independently and searches public spaces for waste. When it visually recognizes a piece of garbage, the robot measures the distance, moves closer to the object, and emits a sound to draw attention to itself, prompting people nearby to pick up the trash it has found and place it in the robot's mounted trash can [26].
Several techniques have been proposed for trash detection. The TACO trash dataset was developed [27] and used with Mask R-CNN [28]. In [29], the authors proposed an automatic garbage detection system based on an improved version of the YOLOv2 [30] model. Similarly, the researchers in [31] modified the loss function of YOLOv3 [32] and developed a UAV-based automated floating-garbage monitoring system. Although these studies offer various helpful waste detection techniques, they did not include real-time data analysis and monitoring.
The research in [33] used the Haar cascade method to let a robot first detect the presence of garbage. Well-known object detectors include YOLO-v3-Tiny [34] and SSD [35], in contrast to YOLO-v3 [36] and PeleeNet [37]. However, these detectors could not detect smaller objects in our visual trash detection scenario. To address this problem, an attention layer can be used to force the algorithm to prioritize smaller objects.
In [38], five DL models, including DenseNet121, DenseNet169, InceptionResNetV2, and MobileNet, were compared for garbage classification, with the precision of the models falling between 89% and 95%. In [39], the CNN-based classification model WasteNet achieved an accuracy of 97%. Another study [40] used a Faster R-CNN for classification, with accuracies in the range of 70%-84%. Similarly, [41] employed an R-CNN to classify trash with an accuracy of 64%. Except for [41], all of these investigations used the TrashNet dataset.
Compared with other convolutional neural networks, the Faster R-CNN with the expanded dataset used in [42] obtained a higher mean average precision (mAP) of 68.3%. In [43], a VGG-16 CNN was employed, and a validation accuracy of 88.42% was achieved by tuning the CNN through hyperparameter, architectural, and classifier adjustments to the fully connected layers. A recent study using the TrashNet dataset yielded superior results. In [44], a Faster R-CNN object detector based on Inception V2 and pretrained on the COCO dataset achieved an accuracy of 84.2% and a recall of 87.8%. Using DenseNet121 and Inception-ResNetV2, the study in [45] reported test accuracies of 95% and 87%, respectively, and its improved RecycleNet model achieved 81% test accuracy.
Using a pretrained VGG-16 network, the findings in [46] revealed a test accuracy of 93%. Adedeji and Wang achieved 87% accuracy using a ResNet50 network with an SVM classifier [47].
In [48], automated trash sorting using DL reached an accuracy of 86%; however, the best accuracy to date, 98.7%, was achieved in [49] using MobileNetV2 for feature extraction and an SVM classifier, although that study has not yet been peer reviewed. In [50], a proprietary model and a lightweight neural network based on AlexNet-SSD were employed for garbage detection, whereas in [51], a custom trash dataset was used to train a multilayer hybrid DL model for waste classification and recycling. Most of these algorithms report the presence and category of an object in an image. Using hierarchical representations such as those learned by DL approaches, [52] investigated the detection of aluminum profiles in images with a CNN that performs object recognition through image segmentation.
This study proposes a method that is efficient and easy to implement, complementing other methods that have been used to address improper disposal in communities, apprehend those responsible, and protect the environment from contamination. The proposed additions to the deep CNN model increase its precision. The model was first trained on the dataset using the default CNN model, as shown in Fig. 1; however, the results were unsatisfactory. The model was therefore extended with 1024 × 1024 CNN layers, which improved its accuracy while decreasing its speed. The input images are divided into a G × G grid of cells, with G = 13. During the learning phase, each grid cell is assigned the probability that an object is present in it. The images were labeled using the LabelImg program, which creates a bounding box for each object; each box is described by five predictions. Each cell predicts a rectangular bounding box, whose prediction comprises the horizontal and vertical coordinates of the box center (“X” and “Y”), the box height and width (“H” and “W”), and a confidence score (“Cs”). The confidence score reflects how certain the model is that the box contains the predicted object, and it is used to decide whether the object inside the bounding box has been properly identified.
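A minimal sketch of the box representation and grid assignment described above, assuming a YOLO-style convention in which the box center (X, Y) and size (W, H) are normalized by the image dimensions and the cell containing the box center is responsible for predicting it. The class and function names, and the 416 × 416 image size in the example, are hypothetical; only G = 13 and the five quantities X, Y, W, H, and Cs come from the text.

```python
from dataclasses import dataclass

GRID = 13  # G: the image is divided into G x G cells (13 x 13 in the text)


@dataclass
class Box:
    x: float  # "X": box-center abscissa, normalized by image width
    y: float  # "Y": box-center ordinate, normalized by image height
    w: float  # "W": box width, normalized by image width
    h: float  # "H": box height, normalized by image height
    cs: float = 1.0  # "Cs": confidence score (1.0 for a ground-truth label)


def from_corners(xmin, ymin, xmax, ymax, img_w, img_h, cs=1.0):
    """Convert a pixel-corner box (as drawn in a labeling tool) to normalized center form."""
    return Box(
        x=(xmin + xmax) / 2 / img_w,
        y=(ymin + ymax) / 2 / img_h,
        w=(xmax - xmin) / img_w,
        h=(ymax - ymin) / img_h,
        cs=cs,
    )


def responsible_cell(box, grid=GRID):
    """Return (row, col) of the grid cell that contains the box center."""
    col = min(int(box.x * grid), grid - 1)  # clamp so x == 1.0 maps to the last cell
    row = min(int(box.y * grid), grid - 1)
    return row, col
```

For example, a box from (100, 200) to (180, 260) in a 416 × 416 image has its center at (140, 230) pixels, which falls in row 7, column 4 of the 13 × 13 grid.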
When ReLU is used as the activation function, gradients for negative inputs vanish and the activations become sparse. Two activation functions, ReLU and Mish, were used to propagate information through the deeper layers: ReLU prevents gradients from vanishing for positive inputs, whereas Mish is unbounded above and therefore avoids capping. In the first 15 layers, this prevents saturation, which would otherwise slow training, and Mish's other characteristics produce a regularization effect. The enhanced CNN model replaced the Mish activation function with a LeakyReLU activation function. There is a tradeoff between training time and the final number of convolutional layers; through an extended trial-and-error procedure, the computational complexity was reduced to obtain a simpler model with a small prediction error while preserving precision.
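For reference, the three activation functions named above can be written in a few lines. These are the standard definitions, with Mish(x) = x·tanh(softplus(x)) and softplus(x) = ln(1 + eˣ); the LeakyReLU slope of 0.1 is an assumption, since the text does not give the value used.

```python
import math


def relu(x: float) -> float:
    # Zero for negative inputs (sparse activations), identity otherwise.
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.1) -> float:
    # Small non-zero slope for x < 0 so negative inputs still pass a gradient.
    return x if x > 0 else slope * x


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x): avoids overflow of exp() for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Smooth, unbounded above (no capping), bounded below (minimum about -0.31).
    return x * math.tanh(softplus(x))
```

Evaluating `mish` on a dense grid of inputs shows its minimum of roughly -0.31 near x ≈ -1.2, while for large positive x it approaches the identity, which is the unbounded-above behavior described as avoiding capping.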
In this study, the deep CNN-based architecture shown in Fig. 1 was adopted for real-time trash detection across eight trash categories. The proposed model was deployed with a CCTV camera to evaluate the results, as shown in Fig. 5. When it detects trash thrown by a person, the model notifies the authorities about the dumping.
In this section, the usefulness of the proposed model is discussed. The collected dataset, complete specifications of the experimental findings, and the performance data from Table 2 are provided below.
Table 2 . Performance of default deep CNN model and improved CNN model
Class | Metric | Default CNN model (%) | Improved deep CNN model (%) |
---|---|---|---|
Mask | Precision | 45.24 | 99.11 |
Mask | Sensitivity | 82.93 | 93.93 |
Mask | F1 score | 61.90 | 99.05 |
Mask | F2 score | 79.46 | 99.02 |
Mask | Dice coefficient | 62.47 | 96.45 |
Bottle | Precision | 54.37 | 92.61 |
Bottle | Sensitivity | 88.84 | 96.28 |
Bottle | F1 score | 69.93 | 95.69 |
Bottle | F2 score | 84.44 | 97.65 |
Bottle | Dice coefficient | 66.36 | 94.41 |
Plastic | Precision | 42.27 | 97.19 |
Plastic | Sensitivity | 88.99 | 96.33 |
Plastic | F1 score | 59.06 | 98.08 |
Plastic | F2 score | 77.55 | 98.63 |
Plastic | Dice coefficient | 57.31 | 96.76 |
Juice | Precision | 31.77 | 100 |
Juice | Sensitivity | 75.66 | 91.18 |
Juice | F1 score | 47.66 | 99.49 |
Juice | F2 score | 69.16 | 99.19 |
Juice | Dice coefficient | 44.75 | 95.38 |
Box | Precision | 41.94 | 99.60 |
Box | Sensitivity | 89.27 | 96.35 |
Box | F1 score | 58.74 | 99.29 |
Box | F2 score | 77.32 | 99.11 |
Box | Dice coefficient | 57.07 | 97.95 |
Automobile | Precision | 62.90 | 99.71 |
Automobile | Sensitivity | 78.99 | 92.26 |
Automobile | F1 score | 76.62 | 99.35 |
Automobile | F2 score | 88.16 | 99.14 |
Automobile | Dice coefficient | 70.04 | 95.84 |
Tissue | Precision | 36.17 | 97.27 |
Tissue | Sensitivity | 60.34 | 83.13 |
Tissue | F1 score | 52.83 | 98.12 |
Tissue | F2 score | 73.03 | 98.64 |
Tissue | Dice coefficient | 45.23 | 89.65 |
Diaper | Precision | 38.96 | 99.38 |
Diaper | Sensitivity | 83.77 | 94.32 |
Diaper | F1 score | 55.75 | 99.18 |
Diaper | F2 score | 75.20 | 99.07 |
Diaper | Dice coefficient | 53.19 | 96.78 |
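The F1 and F2 scores in Table 2 belong to the F-beta family, which combines precision P and sensitivity (recall) R as F_β = (1 + β²)·P·R / (β²·P + R); β = 2 weights sensitivity more heavily than precision. The helper below is a minimal sketch of that standard definition; `f_beta` is a hypothetical name, and the precision and recall values in the example are illustrative, not taken from the table.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score from precision and recall (both in [0, 1])."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid 0/0 when nothing is detected correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For instance, with P = 0.5 and R = 1.0, F1 ≈ 0.667 but F2 ≈ 0.833, showing how F2 rewards a detector that misses few objects even when it raises false alarms.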
As there are no publicly accessible datasets for garbage detection, the data collection procedure was performed manually.
The entire dataset for the enhanced CNN model consisted of over 2100 images divided into eight garbage classes, with 300 images in each class. The categories were masks, tissue sheets, plastic bags, boxes, vehicle (automobile) parts, bottles, juice boxes, and diapers. The dataset was not augmented. Instead, we recorded videos of each garbage type from various perspectives and used a Python script to split the videos into individual frames. Fig. 2 shows sample images from the dataset.
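Splitting videos into frames produces many near-duplicate consecutive images, so frames are often subsampled before labeling. The article does not give its sampling rate or file layout; the sketch below assumes a hypothetical target frame rate and a hypothetical per-class folder naming scheme, and only computes which frame indices to keep and where to save them.

```python
def frames_to_keep(total_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """Indices of frames to save when downsampling a video to about target_fps."""
    if video_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(video_fps / target_fps))  # keep every step-th frame
    return list(range(0, total_frames, step))


def frame_filename(class_name: str, video_id: str, frame_idx: int) -> str:
    """Hypothetical layout: one folder per garbage class, zero-padded frame index."""
    return f"{class_name}/{video_id}_{frame_idx:06d}.jpg"
```

With OpenCV, for example, the frames themselves could be read by creating `cap = cv2.VideoCapture(path)` and calling `cap.read()` in a loop, writing the selected frames with `cv2.imwrite`; for a 3 s clip at 30 fps sampled at 5 fps, this keeps 15 of the 90 frames.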
The most difficult task was labeling the dataset with the LabelImg application: more than 2100 images were labeled for the CNN model. After the videos were divided into frames, the CNN model was trained on the labeled dataset. The default model was evaluated with an accuracy of 90.2% and an average loss of approximately 0.0930. With the enhanced layers, the CNN model produced more precise outcomes: the enhanced model achieved a detection accuracy of 97.6% and an average loss of 0.6928, which is consistent with the training results. When trash is detected, the CNN returns a bounding box whose corners are given by the pixel coordinates (xmin, ymin), (xmax, ymin), (xmin, ymax), and (xmax, ymax) [4].
The image center is taken as (0, 0), and the offsets between the object center and the image center are denoted by the errors ex(t) and ey(t).
Consequently, for an object to be detected accurately, ex(t) and ey(t) must be zero or approximately zero; that is, for effective detection, the center of the bounding box must coincide with the center of the image. As illustrated in Fig. 5, the center of the detected bounding box is also used for detection. The CNN model uses a CCTV camera to detect garbage and identify objects. A Logitech C310 720p HD camera was used for video recording; it captured images at 1080p/30 fps, featured autofocus with a full-HD glass lens, and had a 78° field of view, producing high-resolution HD video.
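The center errors ex(t) and ey(t) are defined only in words here. Assuming they are the horizontal and vertical pixel offsets of the bounding-box center from the image center at frame t, they can be computed from the box corners as sketched below; the function names and the 1280 × 720 frame size in the example are hypothetical.

```python
def bbox_center(xmin: float, ymin: float, xmax: float, ymax: float) -> tuple[float, float]:
    """Center of an axis-aligned box given its corner coordinates."""
    return (xmin + xmax) / 2, (ymin + ymax) / 2


def center_errors(bbox: tuple[float, float, float, float], img_w: int, img_h: int) -> tuple[float, float]:
    """(ex, ey): offset of the box center from the image center, taken as (0, 0).

    Pixel coordinates are assumed to grow rightward (x) and downward (y).
    """
    cx, cy = bbox_center(*bbox)
    return cx - img_w / 2, cy - img_h / 2
```

A box spanning (600, 320) to (700, 420) in a 1280 × 720 frame gives (ex, ey) = (10, 10), meaning the object lies slightly right of and below the image center; both errors are zero only when the box is centered.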
The CNN model was enhanced by adding layers that yield more precise results; the flowchart of the resulting CNN architecture is shown in Fig. 1. The CCTV cameras continuously monitor for trash violations, as shown in Fig. 5. Each video frame is passed through the CNN to detect objects, and every identified object (including people) in a frame is marked by a bounding box produced by the CNN-based model. Fig. 5 shows test images in which the detected items are enclosed in bounding boxes. The model operates in real time with a CCTV camera. When the model identifies an item as garbage, it records a video, which is saved, together with the CCTV images, to the model's output path.
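The per-frame logic described above (detect objects, and when garbage appears together with a person, record a clip and alert the authority) can be sketched independently of any particular detector. In this minimal sketch, the detector output is assumed to be a list of labeled detections per frame; the "person" class, the 0.5 confidence threshold, the clip length, and the `alerts` and `clips` lists are hypothetical stand-ins for the actual detector classes, video writer, and notification channel.

```python
from dataclasses import dataclass, field

# Trash classes from Table 2; "person" is assumed to be an additional detector class.
TRASH_CLASSES = {"mask", "bottle", "plastic", "juice", "box", "automobile", "tissue", "diaper"}


@dataclass
class Detection:
    label: str
    confidence: float


@dataclass
class DumpingMonitor:
    threshold: float = 0.5  # hypothetical confidence cut-off
    clip_frames: int = 90   # hypothetical clip length (3 s at 30 fps)
    alerts: list = field(default_factory=list)  # frame ids at which an alert was raised
    clips: list = field(default_factory=list)   # recorded clips, as lists of frame ids
    frames_left: int = 0    # frames still to record for the current clip

    def process(self, frame_id: int, detections: list) -> bool:
        """Handle one frame; return True if it shows a person together with trash."""
        labels = {d.label for d in detections if d.confidence >= self.threshold}
        violation = "person" in labels and bool(labels & TRASH_CLASSES)
        if violation and self.frames_left == 0:
            self.alerts.append(frame_id)  # stand-in for notifying the authority
            self.clips.append([])         # start a new clip
            self.frames_left = self.clip_frames
        if self.frames_left > 0:
            self.clips[-1].append(frame_id)  # stand-in for writing the frame to video
            self.frames_left -= 1
        return violation
```

Once a clip has started, it runs for a fixed number of frames even if the person leaves the view, so a single dumping event produces one alert rather than one per frame.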
The weights were saved every 1,000 iterations, and each saved weight file was assessed by its mean average precision (mAP). After training of the enhanced CNN model was concluded, its maximum mAP was determined to be 97.6%, as shown in Fig. 4(b). The default CNN model reached an accuracy and mAP of 89%, whereas adding layers substantially increased the accuracy of the enhanced model to 97.6%. Table 3 lists the training parameters, including a momentum of 0.9 and a learning rate of 0.001, used in the default and enhanced models. As shown in Table 4, the results suggest that a more robust and comprehensive model can improve precision. In total, 10,000 iterations were performed to refine the model and achieve the highest accuracy, with weight files saved progressively every 1,000 iterations. Model accuracy was calculated during the testing phase using the weight file with the highest mAP. A GTX 1660 Super X GPU was used to train the CNN model with 64 batches, eight subdivisions, and 65 filters. The advanced model detected the item in 3.568.84 ms, whereas the default model detected the object in 7.895 ms. This is because the advanced model was far more in-depth than the default model.
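The checkpointing schedule above (10,000 iterations, weights saved every 1,000 iterations, and the weight file with the highest mAP used for testing) reduces to a simple selection step. The sketch below assumes the mAP of each saved checkpoint has already been evaluated; the function names are hypothetical.

```python
def checkpoint_iterations(max_iter: int = 10_000, every: int = 1_000) -> list[int]:
    """Iterations at which a weight file is written (every 1,000 up to 10,000)."""
    return list(range(every, max_iter + 1, every))


def best_checkpoint(map_by_iteration: dict[int, float]) -> tuple[int, float]:
    """Return (iteration, mAP) of the best weight file; ties go to the earlier iteration."""
    return max(map_by_iteration.items(), key=lambda item: (item[1], -item[0]))
```

If, for example, the checkpoints at 2,000 and 3,000 iterations reached the same highest mAP, this sketch would choose the 2,000-iteration file; preferring the earlier checkpoint on ties is a design choice of the sketch, not something the article specifies.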
Table 3 . Parameters and values of the improved CNN model
Parameters | Value |
---|---|
Momentum | 0.9 |
Iterations (t) | 10,000 |
Optimizer | Stochastic Gradient Descent (SGD) |
Batch size | 32 |
Subdivision | 16 |
Learning Rate | 0.001 |
Exposure | 1.5 |
Filter | 65 |
Stride | 1 |
Hue | 0.1 |
Saturation | 1.5 |
Decay | 0.0005 |
Channels | 3 |
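Table 3 specifies SGD with a momentum of 0.9, a learning rate of 0.001, and a decay of 0.0005. The exact update rule of the training framework is not stated. One common formulation, used for example by PyTorch's `torch.optim.SGD`, folds the decay into the gradient as L2 weight decay and then applies a momentum buffer. The pure-Python sketch below implements that formulation for a flat list of weights and is not necessarily the rule used in the article; `sgd_momentum_step` is a hypothetical name.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.001, momentum=0.9, decay=0.0005):
    """One SGD step with momentum and L2 weight decay; returns (new_weights, new_velocity)."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + decay * w     # weight decay: add decay * w to the gradient
        v = momentum * v + g  # momentum buffer accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity
```

With these settings, the effective step size for a steady gradient approaches lr / (1 - momentum) = 0.01, ten times the nominal learning rate.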
Table 4 . Calculations and differentiation of both models of the CNN
Model | Average loss | Accuracy % |
---|---|---|
CNN (Improved) | 0.0930 | 97.6% |
CNN (Default) | 0.6928 | 90.2% |
The CNN loss distinguishes three components: the IoU (confidence) error, the classification error, and the coordinate error. Each component is computed as a sum-squared error, and the total loss is their weighted sum.
Computing the total loss requires weighting each of its component losses. During training, when the classification and coordinate errors were weighted equally, the model behaved unstably and diverged. Hence, the weight of the coordinate error was set to λcoord = 5. The CNN also uses the weight λnoobj to keep grid cells without objects from being confused with those that contain objects, and the total loss is computed over the training dataset.
In this loss, B denotes the number of prediction boxes per grid cell, and g denotes the number of grid cells.
The center of each predicted box within its cell is (x, y), and its width and height are (w, h). λcoord is the weight of the coordinate (location) loss, C is the confidence of the prediction box, R denotes the class probability of an object, and λnoobj is the weight of the no-object confidence loss. The indicator term equals 1 if an object of the class is present in the cell and 0 otherwise [3].
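The symbols defined above match the sum-squared loss of the original YOLO detector (Redmon et al., 2016). The sketch below implements that standard form under stated assumptions, not the article's exact code. Each cell holds B predicted boxes (x, y, w, h, C) and a class-probability vector R. In a cell containing an object, the responsible box is the one with the highest IoU with the ground truth. Coordinate errors on x and y and on √w and √h are weighted by λcoord = 5, as in the text. The no-object confidence term is weighted by λnoobj = 0.5 (the YOLO default, assumed here). The target confidence of the responsible box is simplified to 1 instead of the IoU used in YOLO.

```python
import math


def iou(a, b):
    """IoU of two center-format boxes (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * max(0.0, min(ay2, by2) - max(ay1, by1))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def yolo_loss(pred_cells, target_cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLO-style loss over all grid cells.

    pred_cells[i]   = (boxes, probs): B tuples (x, y, w, h, C) and class probabilities R.
    target_cells[i] = None for an empty cell, else (x, y, w, h, class_index).
    """
    loss = 0.0
    for (boxes, probs), target in zip(pred_cells, target_cells):
        if target is None:
            # No object in this cell: only penalize confidence, down-weighted.
            loss += lambda_noobj * sum(box[4] ** 2 for box in boxes)
            continue
        tx, ty, tw, th, cls = target
        j = max(range(len(boxes)), key=lambda k: iou(boxes[k][:4], target[:4]))
        for k, (x, y, w, h, c) in enumerate(boxes):
            if k == j:  # responsible predictor
                loss += lambda_coord * ((x - tx) ** 2 + (y - ty) ** 2)
                loss += lambda_coord * ((math.sqrt(w) - math.sqrt(tw)) ** 2
                                        + (math.sqrt(h) - math.sqrt(th)) ** 2)
                loss += (c - 1.0) ** 2  # simplified target confidence of 1
            else:  # other predictors in an occupied cell count as "no object"
                loss += lambda_noobj * c ** 2
        # Classification error, counted once per cell that contains an object.
        loss += sum((p - (1.0 if k == cls else 0.0)) ** 2 for k, p in enumerate(probs))
    return loss
```

A perfect prediction gives zero loss. Shifting the responsible box center by 0.1 adds λcoord·0.01 = 0.05, which shows why λcoord makes localization errors dominate the confidence terms of empty cells.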
This study proposed a real-time CCTV-based garbage-detection method with person identification for modern societies, using a deep convolutional neural network (CNN). The dataset consisted of eight classes; building it was challenging because no suitable trash dataset was available online. Two CNN models were trained to improve the accuracy of the results. The model was first trained using the default CNN architecture, which achieved an accuracy of 90%. To improve on this, layers were added to the default CNN model, raising the accuracy to 97%. This increase in accuracy came at a slight cost in speed.
When trash is detected, the CCTV camera captures a video of the individual who placed it incorrectly and sends an alert to the relevant authority. In experiments with the proposed system, littering in modern societies was observed to decrease drastically as people became aware of this technology. Overall, the GC analysis showed high accuracy and speed. The outcomes were assessed in terms of time, mAP, and precision for the trained models and datasets.
In future work, this research may be extended with a larger and more accurate dataset covering more trash categories, and additional CNN models, such as other detection models, could be examined for training. Thermal CCTV cameras could also be used to detect trash at night with high accuracy. Moreover, other techniques, such as blockchain-based real-time trash detection, could store the data of people who throw trash in the wrong place and notify the authorities so that they can be found. Similarly, as lightweight models are introduced, the deep CNN model could be made lighter and optimized for deployment on edge devices. Mobile applications with new features are another possible future contribution to this project.
Table 1 . Summary of existing studies including drawback
Existing studies | Details | Drawback |
---|---|---|
Donovan et al. [8] | Auto-Trash sorts garbage automatically | These contributions are similar in that each focuses on a specific type of trash. Our contribution focuses on real-time CCTV-based garbage detection using a deep CNN with person identification for eight specific classes of trash. |
Ozkaya et al. [9] | Comparison of fine-tuned models for garbage detection in intelligent urban management | |
Wang et al. [10] | Autonomous garbage detection for intelligent urban management | |
Yun et al. [11] | Vision-based garbage dumping action detection for real-world surveillance platform | |
Hulyalkar et al. [12] | Implementation of smart bin using CNNs | |
Tarun et al. [13] | Segregation of plastic and non-plastic waste using CNNs | |
Valente et al. [14] | Detection of waste containers using computer vision | |
Bai et al. [16] | DL-based robot for automatically picking up garbage on the grass | |
Rabano [17] | Common garbage classification using MobileNet, 2018 IEEE 10th International Conference on Humanoid | |
This work was supported by the Priority Research Centers Program through the National Research Foundation of Republic of Korea (NRF), funded by the Ministry of Education, Science, and Technology (2018R1A6A1A03024003).
This research was supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-RS-2023-00259061) supervised by the IITP (Institute for Information and Communications Technology Planning and Evaluation).
Syed Muhammad Raza
was pursuing an M.S. degree, majoring in IT Convergence Engineering, at the WENS Lab, Kumoh National Institute of Technology (KIT), South Korea, in 2023. Currently, he is working as a graduate fellow and researcher at KIT, South Korea. His research interests mainly include image processing, deep learning, lightweight deep learning models, computer vision, and drones.
Syed Ghazi Hassan
obtained the B.S. degree in Computer Science in 2022 from PAF-KIET University. Currently, he is working as an automation engineer at Bank AL-FALAH, Karachi, Pakistan.
Syed Ali Hassan
obtained the M.S. degree in IT Convergence Engineering in 2020 from WENS Lab, Kumoh National Institute of Technology (KIT), Republic of Korea. Currently, he is working as a post-graduate fellow and researcher at Scuola Superiore Sant'Anna, Pisa, Italy. His research interests mainly include image processing, deep learning, computer vision, and robotics.
Soo Young Shin
received the Ph.D. degree in electrical engineering and computer science from Seoul National University in 2006. He was a postdoctoral researcher at the University of Washington, USA, in 2007, and worked at the WiMAX Design Lab, Samsung Electronics, Suwon, South Korea, from 2007 to 2010. In 2010, he joined the School of Electronics, Kumoh National Institute of Technology, Gumi, South Korea, as a full-time professor, and he is currently an Associate Professor there. He was a visiting scholar at the University of British Columbia, Canada, in 2017. His research interests include 5G/6G wireless communications and networks, signal processing, the Internet of Things, mixed reality, and drone applications.
Syed Muhammad Raza1, Syed Ghazi Hassan2, Syed Ali Hassan3, and Soo Young Shin1*, Member, KIICE
1Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi-si 39177, Republic of Korea
2Department of Computing and Information Sciences, Karachi Institute of Economics and Technology, Karachi 75190, Pakistan
3Department of Bio Robotics, Scuola Superiore Sant'Anna, Pisa 56127, Italy
Modern societies are developing their cities and making life more comfortable by reducing garbage pollution: smart CCTV cameras are deployed on each street, in apartments, and in every other place that needs 24/7 monitoring to protect it from trash. Trash detection is a challenging task that can be performed using a convolutional neural network (CNN). Garbage disposal is essential for treating waste in a thorough and safe manner. In one image-detection-based system, different garbage bins are recognized for classification: real-time RGB images of garbage disposal lines are captured with a CCD camera and converted into the HSI color format to reduce the impact of ambient light [2].
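The RGB-to-HSI step in [2] works because intensity is separated from the chromatic components (hue and saturation), so a change in ambient light mainly affects one channel. The sketch below uses the standard geometric conversion formulas; the exact variant used in [2] is not given, so the formulas, the [0, 1] input range, and the function name are assumptions.

```python
import numpy as np


def rgb_to_hsi(rgb):
    """Convert an RGB image (H x W x 3, values in [0, 1]) to HSI.

    Returns channels (hue in radians in [0, 2*pi), saturation, intensity).
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-10  # avoids division by zero on black or gray pixels

    intensity = (r + g + b) / 3.0
    saturation = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)

    # Hue is the angle of the colour around the intensity axis.
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    hue = np.where(b <= g, theta, 2.0 * np.pi - theta)

    return np.stack([hue, saturation, intensity], axis=-1)
```

Pure red, green, and blue map to hues of about 0, 2π/3, and 4π/3 with saturation 1, while a mid-gray pixel has saturation 0 and intensity 0.5; scaling all three RGB values by a lighting factor changes the intensity but leaves hue unchanged.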
Many other methods have been introduced using DL, an area of machine learning concerned with building and running brain-like artificial neural networks. DL plays a significant role in text recognition, particularly in recognizing fonts in images; for example, DL has been used for font recognition of the Turkish alphabet. Beyond font recognition, DL methods can be used to classify brain cancer and other diseases [3].
CNNs have attracted considerable research interest because feature detection is performed automatically during the learning stage. CNN models can extract abstract features and exhibit excellent image classification and object-detection performance [3]. CNNs have been used for identification, detection, processing, and segmentation, and their performance continues to improve because they extract features from images automatically. Recent developments have introduced several CNN-based object detectors, the two most important families being region-based CNNs (R-CNNs) and single-shot detectors (SSDs). R-CNN detectors require a powerful graphics processing unit (GPU) because they perform computationally intensive tasks, whereas SSDs require only a single convolutional network [4].
The CNN classification model is comparatively intuitive. The first layer identifies the simplest structures in the image; in a standard two-dimensional image, these are mainly edges, and each filter responds to one such pattern. When edge details are considered, different filters respond to edges at different orientations. The output of the first layer is a set of feature maps that record where each filter's pattern occurs, so the results contain the various edge structures present in the image. The next convolutional layer takes these edge maps as input, and each convolutional layer analyzes the correlations between the image data and the combined structures found in the preceding layer. As the number of layers increases, the extracted structures become more complex, which helps separate semantic information from structural information.
The convolutional layer, which receives the input image, operates as follows. The filter matrices are initialized with random values, and the image has specific dimensions: height, width, and depth. The depth of the input equals its number of channels, and each filter is spatially small but matches the input depth: a filter for a grayscale input has a depth of one, whereas a filter for an RGB input has a depth of three. The input volume is convolved with the filter.
The filter is placed over the input volume and moved spatially across it, and at each position the dot product between the filter and the underlying region is computed, covering the entire image. The resulting values form the activation maps of the input image [4].
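The sliding dot product described above can be written directly as a loop. This minimal NumPy sketch assumes no padding and a configurable stride; like most CNN libraries it computes cross-correlation (the filter is not flipped). The function name is illustrative, and real frameworks use much faster vectorized implementations.

```python
import numpy as np


def conv2d_single_filter(volume, kernel, stride=1):
    """Slide one filter across an input volume (no padding) and return its activation map.

    volume: (H, W, C) input; kernel: (k, k, C) filter whose depth equals the input depth.
    As in most CNN libraries, this is cross-correlation (the kernel is not flipped).
    """
    H, W, C = volume.shape
    k = kernel.shape[0]
    if kernel.shape != (k, k, C):
        raise ValueError("filter must be k x k and as deep as the input volume")
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = volume[i * stride:i * stride + k, j * stride:j * stride + k, :]
            out[i, j] = np.sum(patch * kernel)  # dot product of filter and patch
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((5, 5, 3))           # small RGB input: depth 3
    filt = rng.normal(size=(3, 3, 3))     # randomly initialised filter of depth 3
    print(conv2d_single_filter(rgb, filt).shape)  # (3, 3)
```

Applied to a grayscale image (depth 1) whose rows read `0 0 0 1 1`, a vertical-edge filter with rows `-1 0 1` produces the activation map `[[0, 3, 3]]`, while the same filter rotated to detect horizontal edges produces all zeros. This is the orientation selectivity of first-layer filters mentioned earlier.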
This article proposes a real-time CCTV-based garbage-detection method for modern societies using a deep CNN with person identification. The contributions of this study are as follows.
A real-time CCTV-based garbage-detection scheme is proposed to identify people and detect trash using CCTV cameras.
A large custom dataset was collected to train the model. The custom dataset consists of eight trash classes. Each class has more than 300 images, and the compiled dataset includes over 2100 images.
New layers were added to further improve the accuracy of the model, at the cost of a slight reduction in speed.
CCTV records a video of a person throwing trash. The CNN model can quickly identify the person, and further action can then be taken against them.
The remainder of this article is structured as follows: Section II reviews previous studies, Section III describes the proposed scheme, Section IV presents the experimental results and findings, and Section V concludes the paper with a discussion of future research.
Based on the literature and prior work, this study explores CNN-based image classification and support vector machines (SVMs). AlexNet is a well-known and successful CNN architecture for image classification; its design is easy to understand, and it classifies images well. AlexNet, an influential and widely used model on ImageNet, started the CNN trend, and its image classification performance was state of the art [5].
Background filtration is often considered an outdated approach [6], whereas the Haar cascade is another classifier used to find objects [7]. In a project at the TechCrunch Disrupt Hackathon [8], a team created “Auto-Trash,” an automated garbage-sorting system that uses a Raspberry Pi and a camera module to segregate trash.
In another approach, TrashNet is used for a comparative analysis of image classification methods, as discussed in [9]; half of the dataset is used for testing, without data augmentation. In [10], the authors developed an automatic urban garbage-detection system based on the smart city concept.
Moreover, action recognition helps detect garbage-dumping behavior [11]. However, adopting a conventional action-recognition method is complex because the data required to build a dumping-behavior model with DL algorithms are still insufficient [12].
Waste identification is required before segregation and can be accomplished using various machine learning and image processing methods. CNNs are frequently selected for this stage because they can extract individual visual features and categorize them into specified groups [13]. GPUs have substantially increased computational capacity, allowing large image collections to be analyzed in a short time. Consequently, CNNs have attracted significant attention in recent years [12].
Plastic and non-plastic waste can be classified by capturing images of the trash with a camera and transferring these images to a CNN for preprocessing [14].
A CNN can handle major recognition tasks, extracting important properties directly from camera images [10]. One evaluation learned robotic movements from monocular images gathered via reinforcement learning, in which more than 14 robotic manipulators performed 800,000 grasp attempts. Real-world applications of such a strategy are impractical because of the high computing resources and lengthy learning times required. If an optimizer provides guiding trajectories, the CNN policy search is directed toward a local optimum, and the procedure is accelerated [15].
A CNN model identifies garbage in images captured by a robot-mounted web camera [16]. Google researchers introduced MobileNet, a CNN architecture designed to be “mobile-first”: it is user-friendly and runs quickly on mobile devices. MobileNets have two hyperparameters, the width multiplier and the resolution multiplier, which are set to balance the trade-off between resources and precision. The resolution multiplier reduces the input image size, whereas the width multiplier thins the network. Together, these modifications shrink the internal representation of every layer [17].
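The effect of the two MobileNet multipliers can be checked with the layer cost formulas from the MobileNet paper: a standard convolution costs D_K·D_K·M·N·D_F·D_F multiply-adds, whereas a depthwise separable layer with width multiplier α and resolution multiplier ρ costs D_K·D_K·αM·(ρD_F)² + αM·αN·(ρD_F)². The helper names below are illustrative.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution, M -> N channels, on a df x df map."""
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a MobileNet depthwise separable layer.

    alpha (width multiplier) thins the channel counts; rho (resolution multiplier)
    shrinks the feature-map size.
    """
    m_a, n_a = round(alpha * m), round(alpha * n)
    f = round(rho * df)
    depthwise = dk * dk * m_a * f * f   # one dk x dk filter per input channel
    pointwise = m_a * n_a * f * f       # 1 x 1 convolution mixing channels
    return depthwise + pointwise
```

For a 3 × 3 layer mapping 512 to 512 channels on a 14 × 14 feature map, the standard convolution needs 462,422,016 multiply-adds, the depthwise separable version 52,283,392, and setting α = ρ = 0.5 reduces this further to 3,324,160. The cost falls roughly quadratically in each multiplier, which is the resource-precision trade-off described above.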
The current state of waste reuse is affected by (1) government budgets and programs: inadequate government legislation and municipal solid waste (MSW) management budgets; (2) home education: insufficient emphasis on recycling at home; (3) technology: less effective reuse technology; and (4) administrative costs: manual separation incurs high costs. Recent developments in computer vision and DL have helped make such systems possible [18].
Sakr et al. attempted to improve the efficiency of waste sorting using machine learning algorithms. They compared two common techniques, DL with CNNs and SVMs; the SVMs achieved a separation accuracy of 94.8%, compared with 83% for the CNNs [19].
Recycling is an essential part of a sustainable society [19]. However, the recycling process entails substantial hidden costs associated with the proper selection, processing, and categorization of recovered materials. In many countries, the way garbage is sorted can make it difficult to find the correct landfill for the wide variety of construction materials. Hence, a strategy is needed that suits both industry and a knowledge-based society, with favorable environmental consequences and positive economic benefits. Some CNN parameters are simple to train, and several CNN-based models for waste segregation have been developed for further exploration. This research validated a single object in an image and classified it as a recyclable material such as plastic, paper, or metal [20].
The identification of trash in densely populated areas is vital to the environment. Unfortunately, human monitoring is time-consuming and labor-intensive. Consequently, a system was developed to monitor the spread of garbage in densely populated regions using airborne hyperspectral data, which makes the method well suited to large-scale environmental monitoring. A novel hyperspectral image (HSI) classification network, MSCNN, was proposed for garbage identification; it categorizes HSI pixels to produce a binary garbage segmentation map. Garbage detection comprises two steps: unsupervised object detection and HSI classification. The authors manually labeled the HSI garbage dataset because no such dataset is publicly accessible. DL has demonstrated robust performance in several sectors, and many HSI classification techniques are based on it. In HSI classification, a CNN-like high-level feature extractor is used, and classification is performed using a multilayer perceptron (MLP) [21].
Because waste appears randomly on city streets, on-street waste managers spend a significant amount of time and money cleaning it. Visual inspections of street cleanliness are essential for collecting cleanliness data; however, they are neither conducted in real time nor automated, and the current evaluation methods have several drawbacks. In this work, a high-resolution camera was mounted on a vehicle for the first time to capture street images, which were stored on mobile-edge servers for processing. A server at the edge of the network is known as an edge server. In the city network, processed street data are sent to the cloud data center for further processing. Two distinct edge servers are used for mobile-edge processing to perform two tasks that enhance the overall performance of the system. First, when object detection is conducted, the image data are resized before being provided as input to the CNN. Second, the image data are preprocessed on the edge server to decrease the total system latency.
The improved algorithm automatically resizes each image when it is transferred to the edge server. At mobile stations, municipal trucks dedicated to waste collection are used: the roofs of the trash pickup vans are equipped with high-resolution, network-capable cameras that cover a frontal distance of 50 m. The vans capture images regularly and transmit the data to the edge server. Urban residents also gather waste data, which are sent from their mobile devices to an edge server. A mobile device must connect to a wireless data network to process part of a transaction [22].
Finding garbage in water channels is more difficult because object sizes vary widely and objects change shape with the water flow. Various efforts have been made to address object-size variation, including image pyramids, feature fusion networks, thinning U-shaped modules, and attention mechanisms. In a study on detecting a specific category of trash, the dataset of trash images was collected and annotated manually, and a new layer was added to the architecture to implicitly capture smaller objects, which helps detect trash. An attention layer was implemented to improve detection performance on tiny garbage particles [23].
An automated garbage-sorting technique was proposed to enhance recycling. The dataset includes images containing two or more waste objects from various categories. The first step is to localize the objects in the image and divide the garbage into three groups: paper, recyclables, and landfill. The automatic waste classifier takes an input image with a resolution of 768 × 1024 containing at least two waste objects on a white background; these objects may or may not overlap and may be of different sizes. A Faster R-CNN model, previously trained on the PASCAL VOC dataset, was fine-tuned. The improved model generates anchors (region proposals) and categorizes the objects [24].
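Anchors in Faster R-CNN are reference boxes of several sizes and aspect ratios placed at each feature-map position, which the region proposal network then scores and refines. The sketch below uses the three scales (128, 256, 512 px) and three height-to-width ratios (0.5, 1, 2) from the original Faster R-CNN paper; the settings used in [24] are not stated here, so these defaults and the function name are assumptions.

```python
import numpy as np


def generate_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (xmin, ymin, xmax, ymax) centred at (cx, cy).

    Each anchor has area scale**2; `ratio` is height / width.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)   # keeps w * h == s**2 while h / w == r
            h = s * np.sqrt(r)
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)
```

Calling `generate_anchors(0.0, 0.0)` returns nine boxes; the square 128-pixel anchor is `[-64, -64, 64, 64]`. Repeating this at every feature-map position yields the dense set of candidate regions the detector classifies.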
Another study demonstrated an intelligent unmanned aerial vehicle (UAV) controller for detecting marine trash. The system uses UAVs to identify marine waste effectively and notify the relevant authorities about trash pollution in real time. The primary objective of the project was to develop a UAV and IoT architecture consisting of a data-streaming platform with a video-streaming server, a web service, a MongoDB database, and a Kafka messaging system. Internet technology and a human-machine interface were used for system communication, and remote monitoring was performed from a distant site. A garbage detector based on the YOLO object-detection approach was built through algorithm selection, hyperparameter tuning, and model evaluation to find the model with the lowest generalization error [25].
A study was conducted on a robot that identifies garbage and classifies it as organic or inorganic waste for social education. The robot moves independently and searches public spaces for waste. When it visually recognizes a piece of garbage, the robot measures the distance, moves closer to the object, and emits a sound to draw attention to itself, prompting people nearby to pick up the trash and place it in the robot’s mounted trash can [26].
Several techniques have been proposed for trash-detection systems. The TACO trash dataset [27] was developed and evaluated using Mask R-CNN [28]. The authors of [29] proposed an automatic garbage-detection system based on an improved version of the YOLOv2 [30] model. Similarly, the researchers in [31] modified the loss function of YOLOv3 [32] and developed a UAV-based automated floating-garbage monitoring system. Although these studies offer various useful waste-detection techniques, they do not include real-time data analysis and monitoring.
In [33], a robot uses the Haar cascade method to first detect the presence of garbage. Well-known object detectors include YOLO-v3-Tiny [34] and SSD [35], in contrast to YOLO-v3 [36] and PeleeNet [37]. However, these detectors could not detect smaller objects in our visual trash-detection scenario. To address this problem, an attention layer can be used to force the algorithm to prioritize smaller objects.
In [38], five DL models, including DenseNet121, DenseNet169, InceptionResNetV2, and MobileNet, were compared for garbage classification, with precision ranging from 89% to 95%. In [39], the CNN-based classification model WasteNet achieved an accuracy of 97%. Another study [40] used a Faster R-CNN for classification and achieved accuracies in the range of 70% to 84%. Similarly, [41] employed an R-CNN to classify trash with an accuracy of 64%. Except for [41], all of these studies used the TrashNet dataset.
Compared with other convolutional neural networks and their reported results, the Faster R-CNN with dataset expansion in [42] obtained a higher mean average accuracy of 68.3%. In [43], a VGG-16 CNN was employed for training, and a validation accuracy of 88.42% was achieved by fine-tuning the hyperparameters, the architecture, or the classification layers of the fully connected part of the CNN. A recent study using the TrashNet dataset yielded superior results. In [44], a Faster R-CNN object detector based on Inception V2 and pretrained on the COCO dataset was examined, achieving an accuracy of 84.2% and a recall of 87.8%. Another study reported test accuracies of 95% and 87% for DenseNet121 and Inception-ResNetV2, respectively, and 81% for its improved RecycleNet model [45].
Using a pretrained VGG-16 network, the findings in [46] revealed a test accuracy of 93%. Adedeji and Wang achieved 87% accuracy using a ResNet50 network with an SVM classifier [47].
In [48], automated trash sorting using DL reached an accuracy of 86%; however, the best accuracy to date, 98.7%, was achieved in [49] using MobileNetV2 for feature extraction with an SVM classifier. Nevertheless, that study has not yet been peer-reviewed. In [50], a proprietary model and a lightweight neural network based on the AlexNet-SSD model were employed for garbage detection, whereas in [51], a custom trash dataset was used to train a multilayer hybrid DL model for waste classification and recycling. Most of these algorithms report the presence and category of an object in an image. Using hierarchical representations such as those learned by DL approaches, [52] investigated the detection of aluminum profiles in images with a CNN, performing object recognition through image segmentation.
This study proposes a method that is efficient and easy to implement. Other methods have been used to address improper disposal in communities, apprehend those responsible, and protect the environment from contamination. In this study, the layers added to the deep CNN model increase its precision. The default CNN model, shown in Fig. 1, was first trained on the dataset, but its results were unsatisfactory. Adding 1024 × 1024 CNN layers to the model improved its accuracy at the cost of speed. The input image is divided into a G × G grid of cells, with G fixed at 13. During the learning phase, each grid cell is assigned the probability that an object is present in it. Images are labeled with the LabelImg program, which creates the bounding boxes. Each cell predicts a rectangular bounding box described by five values: the horizontal and vertical coordinates of the box center, the box height and width, and a confidence score. These are denoted “X”, “Y”, “H”, “W”, and “Cs”, respectively. The confidence score “Cs” reflects how reliably the model predicts the object, and it is used to confirm that the object inside the bounding box is correctly identified.
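The 13 × 13 grid assignment can be illustrated by mapping a labeled box center to the cell responsible for it and computing YOLO-style regression targets (center offsets within the cell, size relative to the image). The function name, the 416 × 416 input size in the example, and the exact target encoding are assumptions; the text specifies only the grid size and the five predicted values.

```python
def assign_to_grid(box, img_w, img_h, grid=13):
    """Find the grid cell responsible for a box and its YOLO-style regression targets.

    box: (x_center, y_center, width, height) in pixels.
    Returns ((row, col), (x_offset, y_offset, w_rel, h_rel)), where the offsets lie
    in [0, 1] within the cell and the sizes are relative to the whole image.
    """
    xc, yc, w, h = box
    gx, gy = xc / img_w * grid, yc / img_h * grid   # centre in grid units
    col = min(int(gx), grid - 1)                    # clamp centres on the far edge
    row = min(int(gy), grid - 1)
    return (row, col), (gx - col, gy - row, w / img_w, h / img_h)
```

For a hypothetical 416 × 416 image, a 64 × 32 box centered at (208, 100) falls in cell (row 3, column 6), with offsets (0.5, 0.125) inside that cell. The cell then predicts these offsets together with the size and the confidence score Cs.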
When ReLU is used as the activation function, the gradient can vanish and become sparse. Two activation functions, ReLU and Mish, were used to propagate information to deeper layers: ReLU mitigates gradient vanishing, whereas Mish avoids capping. The first 15 layers prevent saturation in the network, which would otherwise slow training, and the subsequent unbounded characteristics produce a regularization effect. The enhanced CNN model replaced the Mish activation function with a LeakyReLU activation function. There is a trade-off between the training time and the final number of convolutional layers. Through an extended trial-and-error procedure, the computational complexity was reduced to obtain a simpler model with a small prediction error while preserving precision.
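The three activation functions discussed above differ mainly in how they treat negative inputs. This sketch uses the standard definitions; the LeakyReLU negative slope of 0.1 is an assumption (a common choice in YOLO-style configurations) because the text does not state it.

```python
import numpy as np


def relu(x):
    """max(0, x): zero output and zero gradient for all negative inputs."""
    return np.maximum(0.0, x)


def leaky_relu(x, slope=0.1):
    """Keeps a small gradient for negative inputs; slope=0.1 is an assumed value."""
    return np.where(x > 0, x, slope * x)


def mish(x):
    """Mish(x) = x * tanh(softplus(x)); np.logaddexp(0, x) is a stable softplus."""
    return x * np.tanh(np.logaddexp(0.0, x))
```

Mish is unbounded above (for large x it approaches x), so it does not cap activations, and it is bounded below at roughly -0.31, which gives the smooth, slightly negative response associated with its regularization effect. LeakyReLU is cheaper to compute, which fits the speed-accuracy trade-off described above.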
In this study, the deep CNN-based architecture shown in Fig. 1 was adopted for real-time trash detection across eight trash categories. The proposed deep CNN model was deployed on a CCTV camera to evaluate the results, as shown in Fig. 5. After detecting trash thrown by a person, the proposed model notifies the authorities about the trash dumping.
In this section, the usefulness of the proposed model is discussed. The collected dataset, complete specifications of the experimental findings, and the performance data from Table 2 are provided below.
Table 2 . Performance of default deep CNN model and improved CNN model.
Class | Metric | Default CNN model (%) | Improved deep CNN model (%) |
---|---|---|---|
Mask | Precision | 45.24 | 99.11 |
Mask | Sensitivity | 82.93 | 93.93 |
Mask | F1 score | 61.90 | 99.05 |
Mask | F2 score | 79.46 | 99.02 |
Mask | Dice coefficient | 62.47 | 96.45 |
Bottle | Precision | 54.37 | 92.61 |
Bottle | Sensitivity | 88.84 | 96.28 |
Bottle | F1 score | 69.93 | 95.69 |
Bottle | F2 score | 84.44 | 97.65 |
Bottle | Dice coefficient | 66.36 | 94.41 |
Plastic | Precision | 42.27 | 97.19 |
Plastic | Sensitivity | 88.99 | 96.33 |
Plastic | F1 score | 59.06 | 98.08 |
Plastic | F2 score | 77.55 | 98.63 |
Plastic | Dice coefficient | 57.31 | 96.76 |
Juice | Precision | 31.77 | 100 |
Juice | Sensitivity | 75.66 | 91.18 |
Juice | F1 score | 47.66 | 99.49 |
Juice | F2 score | 69.16 | 99.19 |
Juice | Dice coefficient | 44.75 | 95.38 |
Box | Precision | 41.94 | 99.60 |
Box | Sensitivity | 89.27 | 96.35 |
Box | F1 score | 58.74 | 99.29 |
Box | F2 score | 77.32 | 99.11 |
Box | Dice coefficient | 57.07 | 97.95 |
Automobile | Precision | 62.90 | 99.71 |
Automobile | Sensitivity | 78.99 | 92.26 |
Automobile | F1 score | 76.62 | 99.35 |
Automobile | F2 score | 88.16 | 99.14 |
Automobile | Dice coefficient | 70.04 | 95.84 |
Tissue | Precision | 36.17 | 97.27 |
Tissue | Sensitivity | 60.34 | 83.13 |
Tissue | F1 score | 52.83 | 98.12 |
Tissue | F2 score | 73.03 | 98.64 |
Tissue | Dice coefficient | 45.23 | 89.65 |
Diaper | Precision | 38.96 | 99.38 |
Diaper | Sensitivity | 83.77 | 94.32 |
Diaper | F1 score | 55.75 | 99.18 |
Diaper | F2 score | 75.20 | 99.07 |
Diaper | Dice coefficient | 53.19 | 96.78 |
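The per-class metrics in Table 2 are conventionally derived from true positives (TP), false positives (FP), and false negatives (FN). The sketch below shows the standard definitions; the paper does not report the underlying counts, so the example counts are hypothetical, and the exact matching rule used to produce TP, FP, and FN (for example, an IoU threshold) is not specified here.

```python
def detection_metrics(tp, fp, fn):
    """Per-class precision, sensitivity (recall), F1, F2, and Dice from raw counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)

    def f_beta(beta):
        # beta > 1 weights sensitivity more heavily than precision.
        b2 = beta ** 2
        return (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    dice = 2 * tp / (2 * tp + fp + fn)
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f_beta(1.0), "f2": f_beta(2.0), "dice": dice}
```

With hypothetical counts TP = 90, FP = 10, and FN = 30, precision is 0.90, sensitivity is 0.75, F1 is about 0.818, and F2 is about 0.776. When computed from the same counts, the Dice coefficient 2TP / (2TP + FP + FN) is algebraically identical to F1.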
As there are no publicly accessible datasets for garbage detection, the data collection procedure was performed manually.
The entire dataset for the enhanced CNN model consisted of over 2,100 images divided into eight garbage classes, with 300 images in each class: masks, tissue sheets, plastic bags, boxes, vehicle parts, bottles, juice boxes, and diapers. No data augmentation was applied. Instead, we recorded videos of each garbage type from various perspectives and used a Python script to split the videos into individual frames. Fig. 2 shows sample images from the dataset.
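The frame-extraction script itself is not given in the paper. A minimal sketch follows, assuming OpenCV (`opencv-python`) is available; the sampling step of 10 frames and the file-naming scheme are hypothetical choices, not values from the paper. OpenCV is imported inside `extract_frames` so that the two small helpers can be used without it.

```python
from pathlib import Path


def keep_frame(index: int, step: int) -> bool:
    """Keep every `step`-th frame to limit near-duplicate images (step is an assumption)."""
    return index % step == 0


def frame_filename(class_name: str, video_stem: str, index: int) -> str:
    """Hypothetical naming scheme: <class>_<video>_<frame number>.jpg"""
    return f"{class_name}_{video_stem}_{index:06d}.jpg"


def extract_frames(video_path: str, out_dir: str, class_name: str, step: int = 10) -> int:
    """Split one video into JPEG frames and return the number of frames saved."""
    import cv2  # OpenCV, assumed installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    index = saved = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of the video or a read error
                break
            if keep_frame(index, step):
                name = frame_filename(class_name, Path(video_path).stem, index)
                cv2.imwrite(str(out / name), frame)
                saved += 1
            index += 1
    finally:
        cap.release()
    return saved
```

A call such as `extract_frames("videos/mask_01.mp4", "dataset/mask", "mask")` (paths hypothetical) would write every tenth frame of that clip into `dataset/mask`.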
Labeling the dataset with the LabelImg application was the most laborious task; more than 2,100 images were labeled for the CNN model. After the videos were divided into frames, the CNN model was trained on the labeled dataset. The default model achieved an accuracy of 90.2% with an average loss of approximately 0.0930. Adding layers made the CNN model's outputs more precise: the enhanced model reached a detection accuracy of 97.6% with an average loss of 0.6928, consistent with the training results. When trash is detected, the CNN returns its bounding box, whose corners are given by the pixel coordinates (xmin, ymin), (xmax, ymin), (xmin, ymax), and (xmax, ymax) [4].
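As a sketch of how the labeled boxes can be read back, the code below assumes LabelImg's Pascal VOC XML output format (one `<object>` element per box, with `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>` inside `<bndbox>`); the paper does not say which LabelImg export format was used. The helper names are hypothetical. `corners` lists the four box corners in image coordinates, where y grows downward.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text: str) -> list[tuple[str, int, int, int, int]]:
    """Return (label, xmin, ymin, xmax, ymax) for every object in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        # float() first so that coordinates stored with decimals are also accepted.
        xmin, ymin, xmax, ymax = (
            int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, xmin, ymin, xmax, ymax))
    return boxes


def corners(xmin: int, ymin: int, xmax: int, ymax: int) -> list[tuple[int, int]]:
    """Top-left, top-right, bottom-left, bottom-right corners of the box."""
    return [(xmin, ymin), (xmax, ymin), (xmin, ymax), (xmax, ymax)]


sample = (
    "<annotation><object><name>bottle</name><bndbox>"
    "<xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax>"
    "</bndbox></object></annotation>"
)
for label, *box in read_voc_boxes(sample):
    print(label, corners(*box))
```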
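The image center is taken as the origin, (0, 0). The horizontal and vertical errors between the center of the object and the center of the image are denoted ex(t) and ey(t), respectively.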
Accurate detection therefore requires ex(t) and ey(t) to be zero or close to zero; that is, the center of the detected object should coincide with the center of the image. As illustrated in Fig. 5, the midpoint of the detected bounding box is computed and also used for detection. The CNN model processes the CCTV camera feed to detect garbage and identify objects. Video was recorded with a Logitech C310 720p HD camera, which captured images at 1080p/30 fps; its distinctive features are autofocus and a full HD glass lens, and its field of view is 78°, yielding high-resolution HD video and images.
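A minimal sketch of the center-error computation, assuming ex(t) and ey(t) are the pixel offsets of the bounding-box center from the image center. Pixel coordinates normally start at the top-left corner, so the sketch subtracts the image center to make it the origin (0, 0). The 1280×720 frame size in the example and the 10-pixel tolerance are assumptions, not values from the paper.

```python
def box_center(xmin: float, ymin: float, xmax: float, ymax: float) -> tuple[float, float]:
    """Midpoint of an axis-aligned bounding box."""
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def center_error(box: tuple[float, float, float, float],
                 frame_width: int, frame_height: int) -> tuple[float, float]:
    """(ex, ey): offset of the box center from the image center, in pixels."""
    cx, cy = box_center(*box)
    return (cx - frame_width / 2, cy - frame_height / 2)


def is_centered(error: tuple[float, float], tolerance_px: float = 10.0) -> bool:
    """True when both errors are within a (hypothetical) pixel tolerance of zero."""
    ex, ey = error
    return abs(ex) <= tolerance_px and abs(ey) <= tolerance_px


err = center_error((600, 320, 680, 400), 1280, 720)
print(err, is_centered(err))
```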
The CNN model was enhanced by adding layers intended to produce more precise results; Fig. 1 shows the flowchart of the CNN architecture. The CCTV cameras continuously monitor for trash violations, as shown in Fig. 5. Each video frame is passed through the CNN, which detects the objects it contains. All identified objects, including people, are marked in the input images by bounding boxes produced by the CNN-based model; Fig. 5 shows test images in which the detected items are enclosed in bounding boxes. The model operates in real time on the CCTV feed. When the model identifies an item as garbage, it records a video, which is stored together with the CCTV images in the model's output path.
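The recording step can be sketched as follows, assuming the detector returns (label, confidence) pairs for each frame. The class labels, the 0.5 confidence threshold, and the `tail` length are hypothetical, and the detector and alerting code are not shown. Consecutive frames with garbage detections are grouped into one clip, and a clip stays open for up to `tail` frames without a detection, so a brief miss by the detector does not split one event into several alerts.

```python
# Hypothetical labels mirroring the eight classes in Table 2.
GARBAGE_CLASSES = {"mask", "bottle", "plastic", "juice", "box", "automobile", "tissue", "diaper"}

Detection = tuple[str, float]  # (class label, confidence)


def violation_clips(frames: list[list[Detection]], min_score: float = 0.5,
                    tail: int = 2) -> list[tuple[int, int]]:
    """Return inclusive (start, end) frame ranges to save as violation clips."""
    clips: list[tuple[int, int]] = []
    start = last_hit = None
    for i, detections in enumerate(frames):
        hit = any(label in GARBAGE_CLASSES and score >= min_score
                  for label, score in detections)
        if hit:
            if start is None:
                start = i  # open a new clip
            last_hit = i
        elif start is not None and i - last_hit > tail:
            # More than `tail` frames without garbage: close the clip.
            clips.append((start, last_hit + tail))
            start = None
    if start is not None:  # stream ended while a clip was open
        clips.append((start, min(last_hit + tail, len(frames) - 1)))
    return clips


if __name__ == "__main__":
    demo = [[], [("bottle", 0.9)], [], [], [], [], [("mask", 0.3)], [("mask", 0.8)], []]
    for start, end in violation_clips(demo):
        # A deployed system would save these frames and notify the authority here.
        print(f"alert: save frames {start}-{end}")
```

In the demo stream, the low-confidence mask at frame 6 is ignored, so the sketch reports two clips, frames 1 to 3 and frames 7 to 8.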
During training, the weights were saved every 1,000 iterations, and the mean average precision (mAP) of each saved weight file was evaluated. In total, 10,000 iterations were performed to refine the model and achieve the highest accuracy, and during testing, model accuracy was calculated using the weight file with the highest mAP. The maximum mAP of the enhanced CNN model was 97.6%, as shown in Fig. 4(b). The accuracy and mAP of the default CNN model were 89%; adding layers substantially increased the accuracy, to 97.6% for the enhanced model. Table 3 lists the training parameters, including a momentum of 0.9 and a learning rate of 0.001, used for the default and enhanced models. The results in Table 4 suggest that a more robust and comprehensive model can improve precision. The CNN model was trained on a GTX 1660 Super X GPU with a batch size of 64, eight subdivisions, and 65 filters. The advanced model detected an item in 3.568.84 ms, whereas the default model detected an object in 7.895 ms; this difference arises because the advanced model is considerably deeper than the default model.
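The paper does not specify its mAP protocol. As a sketch under that caveat, the code below computes per-class average precision with all-point interpolation (the PASCAL VOC 2010+ style), averages it into mAP, and picks the checkpoint with the highest mAP. Whether a detection counts as a true positive is assumed to be decided beforehand by matching it to a ground-truth box, typically at IoU ≥ 0.5. All function names and the mAP values in the usage note are hypothetical.

```python
def average_precision(detections: list[tuple[float, bool]], num_ground_truth: int) -> float:
    """All-point interpolated AP for one class.

    detections: (confidence, is_true_positive) for every predicted box of the class.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision non-increasing as recall grows.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated precision-recall curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(ap_per_class: dict[str, float]) -> float:
    """Unweighted mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


def best_checkpoint(map_by_iteration: dict[int, float]) -> int:
    """Iteration whose saved weights reached the highest mAP."""
    return max(map_by_iteration, key=map_by_iteration.get)
```

For example, with hypothetical scores `best_checkpoint({1000: 0.62, 2000: 0.81, 3000: 0.78})` returns `2000`, the checkpoint that would be kept for testing.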
Table 3. Parameters and values of the improved CNN model.
Parameters | Value |
---|---|
Momentum | 0.9 |
Iterations (t) | 10,000 |
Optimizer | Stochastic Gradient Descent (SGD) |
Batch size | 32 |
Subdivision | 16 |
Learning Rate | 0.001 |
Exposure | 1.5 |
Filter | 65 |
Stride | 1 |
Hue | 0.1 |
Saturation | 1.5 |
Decay | 0.0005 |
Channels | 3 |
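Table 3 specifies stochastic gradient descent with momentum 0.9, learning rate 0.001, and decay 0.0005. A minimal sketch of one update step on plain Python lists follows, assuming "decay" means L2 weight decay added to the gradient (a common convention that the paper does not spell out). The names batch, subdivision, exposure, saturation, and hue resemble Darknet-style configuration keys; if Darknet semantics apply, which is an assumption, each batch of 32 images is processed in 16 mini-batches of 2 images, as `mini_batch_size` shows.

```python
def sgd_momentum_step(weights: list[float], grads: list[float], velocity: list[float],
                      lr: float = 0.001, momentum: float = 0.9,
                      decay: float = 0.0005) -> tuple[list[float], list[float]]:
    """One SGD update with momentum and L2 weight decay; returns (weights, velocity)."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + decay * w          # L2 weight decay folded into the gradient
        v = momentum * v - lr * g  # velocity accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w + v)
    return new_weights, new_velocity


def mini_batch_size(batch: int = 32, subdivisions: int = 16) -> int:
    """Images per forward pass if subdivisions split each batch (Darknet-style assumption)."""
    if batch % subdivisions:
        raise ValueError("batch must be divisible by subdivisions")
    return batch // subdivisions
```

Momentum smooths successive updates, while the small decay term pulls weights slightly toward zero on every step.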
Table 4. Comparison of the default and improved CNN models.
Model | Average loss | Accuracy % |
---|---|---|
CNN (Improved) | 0.0930 | 97.6% |
CNN (Default) | 0.6928 | 90.2% |
The CNN loss distinguishes three kinds of error: IoU (confidence) errors, classification errors, and coordinate errors. Each is computed as a sum-squared error.
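The IoU errors rest on the intersection over union between a predicted box and a ground-truth box. A minimal sketch for boxes given as (xmin, ymin, xmax, ymax) in continuous pixel coordinates (no +1 pixel convention, which is an assumption):

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 2×2 boxes offset by one pixel in each direction overlap in a 1×1 square, giving an IoU of 1/7.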
To compute the total loss, each component loss must be assigned a weight. During training, when the classification and coordinate errors were weighted equally, the model became unstable and diverged. Hence, the coordinate error is weighted by λcoord = 5. The CNN also applies a weight λnoobj to the confidence error of grid cells that contain no object, so that the many empty cells do not overwhelm the cells that do contain objects. The total loss is computed over the training dataset.
In the loss function, B denotes the number of prediction boxes per grid cell and g denotes the number of grid cells.
(x, y) denotes the center coordinates of a predicted box in each cell, and w and h denote its width and height. λcoord is the weight of the coordinate (localization) loss, C is the confidence of a prediction box, R is the class confidence of an object, and λnoobj is the weight of the confidence loss for cells that contain no object. The indicator term equals 1 if an object of the trained classes is present and 0 otherwise [3].
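The description above (sum-squared error, λcoord = 5, a λnoobj weight for cells without objects, g grid cells with B boxes each) matches the standard YOLO (v1) loss. Under the assumption that the paper uses that form, it can be written in this section's notation as follows; hats denote predicted values, 𝟙ᵢⱼᵒᵇʲ is 1 when box j of cell i is responsible for an object, 𝟙ᵢⱼⁿᵒᵒᵇʲ is its complement, and 𝟙ᵢᵒᵇʲ is 1 when an object appears in cell i. YOLO v1 sets λnoobj = 0.5; this paper does not state its value.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord}\sum_{i=1}^{g}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\Big] \\
&+\lambda_{coord}\sum_{i=1}^{g}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big] \\
&+\sum_{i=1}^{g}\sum_{j=1}^{B}\mathbb{1}_{ij}^{obj}\big(C_i-\hat{C}_i\big)^2
+\lambda_{noobj}\sum_{i=1}^{g}\sum_{j=1}^{B}\mathbb{1}_{ij}^{noobj}\big(C_i-\hat{C}_i\big)^2 \\
&+\sum_{i=1}^{g}\mathbb{1}_{i}^{obj}\sum_{c\,\in\,\text{classes}}\big(R_i(c)-\hat{R}_i(c)\big)^2
\end{aligned}
```

The square roots on w and h make a given pixel error cost more for small boxes than for large ones.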
This study proposed a real-time CCTV-based garbage detection method with person identification for modern societies, using a deep convolutional neural network (CNN). The dataset consisted of eight classes; because no suitable trash dataset was available on the Internet, a custom dataset had to be built, which was challenging. Two CNN models were trained and compared. The model was first trained with the default CNN architecture, which achieved an accuracy of 90%. To improve on this, layers were added to the default CNN model, raising the accuracy to 97%. This gain in accuracy slightly affects the speed of the model.
When trash is detected, the CCTV camera captures a video of the individual who discarded it improperly, and the system sends an alert to the relevant authority. In experiments with the proposed system, we observed a drastic reduction in littering, which we attribute to people's awareness of this technology. Overall, the GC analysis showed outstanding accuracy and speed. The trained models were evaluated on the collected dataset in terms of detection time, mAP, and precision.
In the future, this research may be extended with a larger and more accurate dataset that covers more trash categories, and additional CNN models, such as other detection models, can be examined for training. Thermal CCTV cameras may also be used to detect trash accurately at night. In addition, other techniques, such as blockchain-based real-time trash detection, could be used to store the data of the person who throws trash in the wrong place and to notify the authorities so that the person can be found. The deep CNN model could also be made lightweight and optimized for deployment on edge devices. Mobile applications with these new features are further possible contributions to this project.
Table 1. Summary of existing studies and their drawbacks.
Existing studies | Details | Drawback |
---|---|---|
Donovan et al. [8] | Auto-trash sorts garbage automatically | Focuses on a specific type of trash |
Ozkaya et al. [9] | Comparison of fine-tuned models for garbage detection in intelligent urban management | Focuses on a specific type of trash |
Wang et al. [10] | Autonomous garbage detection for intelligent urban management | Focuses on a specific type of trash |
Yun et al. [11] | Vision-based garbage dumping action detection for a real-world surveillance platform | Focuses on a specific type of trash |
Hulyalkar et al. [12] | Implementation of a smart bin using CNNs | Focuses on a specific type of trash |
Tarun et al. [13] | Segregation of plastic and non-plastic waste using CNNs | Focuses on a specific type of trash |
Valente et al. [14] | Detection of waste containers using computer vision | Focuses on a specific type of trash |
Bai et al. [16] | DL-based robot for automatically picking up garbage on grass | Focuses on a specific type of trash |
Rabano [17] | Common garbage classification using MobileNet, 2018 IEEE 10th International Conference on Humanoid | Focuses on a specific type of trash |

In contrast, our contribution focuses on real-time CCTV-based garbage detection using a deep CNN with person identification for eight specific classes of trash.
This work was supported by the Priority Research Centers Program through the National Research Foundation of Republic of Korea (NRF), funded by the Ministry of Education, Science, and Technology (2018R1A6A1A03024003).
This research was supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-RS-2023-00259061) supervised by the Institute for Information and Communications Technology Planning and Evaluation (IITP).