Journal of information and communication convergence engineering 2021; 19(4): 257-262
Published online December 31, 2021
https://doi.org/10.6109/jicce.2021.19.4.257
© Korea Institute of Information and Communication Engineering

Changsu Kim, Kyunghwan Kim, Hoekyung Jung
PaiChai University
Providing web services to users has become expensive in recent times. To deliver better web services, web servers are equipped with high-performance technology. Tools such as general-purpose graphics processing units (GPGPUs), artificial intelligence, high-performance computing, and three-dimensional simulation are widely used to improve the web service experience. However, graphics processing units (GPUs) are designed for high-speed computation and have seen limited use in general-purpose applications. In this study, we developed a task queue on a GPU to improve the performance of a web service using its multiprocessors and studied how to receive and process user requests in bulk. We propose a GPGPU-based task queue that processes user requests more efficiently than a central processing unit (CPU) thread-based queue, achieving approximately 136% to 233% of its performance, and we show experimentally that the proposed method is effective for web services.
Keywords CUDA, GPGPU, GPU, Task queue, Web service
Web servers continually handle an unspecified number of user requests [1]. However, a large number of user requests can disrupt or prevent smooth web services.
Even web servers that normally handle user requests without problems can fail when a large number of requests arrives suddenly, and the web server sometimes goes down [2-4]. When a web server goes down, the system can be rebooted, or the server can be restored to normal operation by investing considerable time [5].
To solve these problems, cloud services are commonly used, as they can increase web server resources and allocate them dynamically. Most of these approaches are costly and, in some cases, do not offer a complete solution [6, 7].
In this study, to handle user requests for web services, we developed a task queue that exploits the multi-threaded processing power of general-purpose graphics processing units (GPGPUs): hypertext transfer protocol (HTTP) requests are pre-processed on the GPU and stored in the task queue, relieving the burden on the web server [8]. Additionally, because the task queue applies a circular queue algorithm, requests that exceed its capacity are not accepted; the web server therefore reliably serves only as many user requests as it can actually process.
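To make this capacity-limiting behavior concrete, the following is a minimal sketch of a bounded circular queue that rejects pushes when full; the type and constant names (task_queue_t, QUEUE_CAPACITY) are illustrative assumptions, not taken from the paper's code.

```cpp
// Minimal sketch of the bounded circular-queue behavior described above.
#define QUEUE_CAPACITY 1024

struct task_queue_t {
    int front;                 // index of the next slot to read
    int rear;                  // index of the next slot to write
    int data[QUEUE_CAPACITY];  // stands in for task_queue_data entries
};

// Rejects the push instead of overwriting when the queue is full, so the
// web server only accepts as many requests as it can actually process.
bool queue_push(task_queue_t *q, int item)
{
    int next_rear = (q->rear + 1) % QUEUE_CAPACITY;
    if (next_rear == q->front)   // full: reject this request
        return false;
    q->data[q->rear] = item;
    q->rear = next_rear;
    return true;
}

bool queue_pop(task_queue_t *q, int *item)
{
    if (q->front == q->rear)     // empty: nothing to consume
        return false;
    *item = q->data[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;
    return true;
}
```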
Using GPGPU, we design a task queue that can handle a large number of requests and improve central processing unit (CPU) performance by offloading the HTTP packet-processing stage to the GPU.
If the CPU load is heavy and a user request cannot be processed immediately, the task queue can inform the user of the expected waiting time and queue position. Fig. 1 shows a system configuration consisting of a web application server (WAS) and a GPGPU task-queue web server on a node server.
The consumer of the HTTP server generates a task queue on the GPU using GPGPU programming and uploads the C code to be executed on the GPU, which spawns a large number of threads as it runs. The HTTP server consumer creates and manages the thread pool, and the HTTP server producer likewise accesses the task queue on the GPU through GPGPU programming. When the producer receives a user request, it sends the request to the task queue.
Fig. 2 shows the data flow of the producer and consumer of the HTTP server. When a client sends a request to the web server, the producer's HTTP server immediately forwards the packet to the GPU's task queue upon receiving the user's HTTP request. The GPU thread stores the packet data, checks for HTTP packet errors and user data errors, records the error contents in a variable, and waits.
The consumer's HTTP server reads the task queue's data asynchronously and checks whether the GPU thread has flagged any errors. If an error occurred, the consumer thread sends the error code to the producer thread, and the producer thread immediately sends the result to the user. If the data contains no errors, the consumer thread passes it to the web application server.
The producer thread is written using the Transmission Control Protocol/Internet Protocol (TCP/IP) socket server programming model. When an HTTP request arrives from a client, the server module receives the request and immediately creates a thread. The thread pushes the data to the task queue and waits for a response from the consumer thread and the web application server. When the response data arrives, the thread responds to the client and exits. The producer thread flow chart for the HTTP server is shown in Fig. 3.
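As a rough illustration of this flow, the sketch below shows one plausible per-request producer routine using POSIX sockets; task_queue_push() and wait_for_consumer_response() are hypothetical stand-ins for the paper's producer/consumer interface, not its actual API.

```cpp
// Sketch of one plausible per-request producer routine (POSIX sockets).
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

bool        task_queue_push(const char *data, size_t len);  // assumed helper
const char *wait_for_consumer_response(void);               // assumed helper

void *producer_thread(void *arg)
{
    int  client_fd = *(int *)arg;
    char request[4096];

    // 1. Receive the raw HTTP request from the client socket.
    ssize_t n = recv(client_fd, request, sizeof(request) - 1, 0);
    if (n > 0) {
        request[n] = '\0';
        // 2. Push the packet into the GPU task queue.
        if (task_queue_push(request, (size_t)n)) {
            // 3. Wait for the consumer thread / WAS response, then reply.
            const char *response = wait_for_consumer_response();
            send(client_fd, response, strlen(response), 0);
        } else {
            // Circular queue full: the request is rejected up front.
            const char *busy = "HTTP/1.1 503 Service Unavailable\r\n\r\n";
            send(client_fd, busy, strlen(busy), 0);
        }
    }
    close(client_fd);   // 4. Respond, close, and let the thread exit.
    return nullptr;
}
```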
The consumer thread accesses the task queue on the GPU and reads its data asynchronously. By the time the data is read, the GPU has already checked it for data consistency, HTTP errors, and user input errors, among others, and stored the results in the structure. Therefore, when a data error occurs, the consumer refers to the structure's value and immediately returns the error to the producer thread, which then sends the response to the user.
The GPU threads of the task queue check the data of each user request and, when an error occurs, store the error code and an error indicator in a Boolean-typed structure variable. The consumer thread checks this Boolean variable and, if an error occurred, responds directly to the producer with the error code. If there are no errors, it sends the HTTP packet to the web application server.
The consumer thread creates the task queue on the GPU. To create a large number of threads on the GPU, we build a compute unified device architecture (CUDA) application whose threads receive user requests on the GPU; the consumer then waits to receive data from the GPU asynchronously.
When a user request is received from the task queue, the consumer checks the error code and, if an error is present, notifies the producer thread immediately. If there is no error, it sends the data to the web application server to process the user request; the processed result is then returned to the user through the producer thread. The flow of the consumer thread is illustrated in Fig. 4.
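A minimal sketch of this consumer loop might look as follows, assuming the error_flag/error_code layout described above (simplified here); the three helper functions are hypothetical.

```cpp
// Sketch of the consumer loop described in the text.
struct task_queue_data {
    bool error_flag;     // set by the GPU thread during packet checking
    int  error_code;     // filled in when error_flag is true
    char payload[4096];  // the (pre-checked) HTTP request data
};

bool task_queue_pop_async(task_queue_data *out);     // assumed helper
void reply_error_to_producer(int error_code);        // assumed helper
void forward_to_was(const task_queue_data *item);    // assumed helper

void consumer_loop(void)
{
    task_queue_data item;
    for (;;) {
        if (!task_queue_pop_async(&item))
            continue;                                  // queue empty, poll again
        if (item.error_flag)
            reply_error_to_producer(item.error_code);  // GPU flagged an error
        else
            forward_to_was(&item);                     // no error: hand to the WAS
    }
}
```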
The GPGPU-based task queue is implemented using CUDA. The task queue uses a circular queue algorithm, and the CUDA threads operate on the circular queue to keep application processing simple. The HTTP packet error-handling function is called from the CUDA kernel function and runs across many threads on the GPU. It checks the conformity of each HTTP packet, taking the task_queue_data structure and its size (both defined for the CUDA kernel function) as parameters, and returns the result through error_flag. The functions in Fig. 5 are invoked by calling the CUDA kernel function of the GPGPU-based task queue.
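The following is a hedged CUDA sketch of such a kernel: one GPU thread per queue slot checks a packet and writes the result to its slot's error_flag. The names task_queue_data and error_flag follow the text, but the validation logic shown (matching the request-line method token) is an illustrative assumption, not the paper's actual check.

```cpp
// CUDA sketch of the per-slot HTTP conformity check.
#include <cuda_runtime.h>

#define QUEUE_CAPACITY 1024
#define PACKET_SIZE    1024

struct task_queue_data {
    char packet[PACKET_SIZE];
    int  length;
    int  error_flag;   // 0 = ok, nonzero = error code
};

// One GPU thread per queue slot: each thread validates one HTTP packet
// and records the verdict in its slot's error_flag.
__global__ void check_http_packets(task_queue_data *queue, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;

    task_queue_data *d = &queue[i];
    bool ok = false;
    if (d->length >= 4 && d->packet[0] == 'G' && d->packet[1] == 'E' &&
        d->packet[2] == 'T' && d->packet[3] == ' ')
        ok = true;
    else if (d->length >= 5 && d->packet[0] == 'P' && d->packet[1] == 'O' &&
             d->packet[2] == 'S' && d->packet[3] == 'T' && d->packet[4] == ' ')
        ok = true;
    d->error_flag = ok ? 0 : 400;   // 400: malformed request (assumed code)
}

int main(void)
{
    task_queue_data *queue;
    // Unified memory keeps the sketch short; the paper copies explicitly
    // over PCI Express instead.
    cudaMallocManaged(&queue, QUEUE_CAPACITY * sizeof(task_queue_data));
    // ... the producer would fill queue[0..n) here ...
    int n = QUEUE_CAPACITY;
    check_http_packets<<<(n + 255) / 256, 256>>>(queue, n);
    cudaDeviceSynchronize();
    cudaFree(queue);
    return 0;
}
```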
The HTTP server's producer and consumer servers were implemented. The producer server receives the user's request and sends the data to the task queue; the consumer server reads the user's data from the task queue, handles any errors, and forwards the request to the web application. The producer server is a multi-threaded server that receives an HTTP request from a user client, separates the data to be processed into a task_queue_data structure, and stores it in the task queue.
The producer server receives an HTTP request packet from the client, separates the HTTP request data from the raw data into a structure, and stores the data entered by the user separately in the http_user_data structure. Once each piece of data has been stored this way, the task_queue structure is complete and is pushed onto the GPGPU task queue.
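A plausible layout for these nested structures is sketched below; only the names task_queue_data and http_user_data come from the text, and the individual fields are illustrative assumptions.

```cpp
// Plausible layout for the nested structures described above.
#define URI_LEN  256
#define BODY_LEN 2048

// Data entered by the user (query string, form body), stored separately.
struct http_user_data {
    char query_string[URI_LEN];
    char body[BODY_LEN];
};

// One task-queue slot: the separated request fields plus space for the
// GPU checking kernel's verdict.
struct task_queue_data {
    char           method[8];    // e.g., "GET", "POST"
    char           uri[URI_LEN];
    http_user_data user_data;    // the user's input, kept separate
    int            error_flag;   // written by the GPU thread
    int            error_code;
};
```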
Table 1. Execution time and performance ratio for each scenario

| Scenario | Number of Threads | HTTP Requests | Execution Time (Without Task Queue) | Execution Time (With Task Queue) | Performance Ratio |
|---|---|---|---|---|---|
| 1 | 100 | 300 | 2,566 ms | 1,097 ms | 233% |
| 2 | 1,000 | 3,000 | 5,440 ms | 3,775 ms | 144% |
| 3 | 3,000 | 9,000 | 8,937 ms | 6,524 ms | 136% |
The producer shares the shared_memory structure stored in shared memory to share the Rear value of the task queue with the consumer.
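One conventional way to realize this sharing between separate producer and consumer processes is POSIX shared memory, sketched below. The structure name follows the text; the segment name and field layout are assumptions.

```cpp
// Sketch of sharing the queue's Rear (and Front) indices between the
// producer and consumer processes via POSIX shared memory.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

struct shared_memory {
    int rear;    // next write slot, advanced by the producer
    int front;   // next read slot, advanced by the consumer
};

shared_memory *open_shared_queue_state(void)
{
    // Both processes open the same named segment; "/task_queue_state"
    // is an illustrative name.
    int fd = shm_open("/task_queue_state", O_CREAT | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, sizeof(shared_memory)) < 0) { close(fd); return nullptr; }
    void *p = mmap(nullptr, sizeof(shared_memory),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   // the mapping stays valid after the fd is closed
    return p == MAP_FAILED ? nullptr : static_cast<shared_memory *>(p);
}
```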
As shown in Fig. 6, we implemented the HTTP server producer. The consumer server first initializes the shared memory and reads the Front information of the task queue from it. Next, the data are read from the task queue and the error_flag is checked. If there is no error in the data, the HTTP request is forwarded to the WAS.
The consumer server's thread structure is simple because the data-processing program runs in the GPU's task queue, so the consumer can transmit the data to the WAS immediately after checking for errors.
The consumer server is a multi-threaded server program that reads data from the task queue, checks for data errors, and sends the data to the web application server to process the user's request. As shown in Fig. 7, we implemented the consumer of the HTTP server.
Experiments were carried out under identical conditions using three scenarios, comparing a server with the GPGPU-based task queue against a server without it.
Scenario 1 compared the processing speed of the two servers under a stable service load.
Scenario 2 generated 1,000 threads in a short time and issued 3,000 HTTP requests; this experiment compares both stability and performance.
Scenario 3 generated a large number of threads to test the stability and speed of both servers: 9,000 HTTP requests were generated in total while observing whether either server crashed.
Table 1 lists the performance for each scenario. Comparing the two servers shows a performance improvement of approximately 1.3 to 2.3 times.
When the number of threads is small, the performance difference per HTTP request is large; as the number of threads and HTTP requests increases, the speed gap between the two servers narrows. With a sufficiently large number of threads, the performance of the server with the GPGPU-based task queue approaches that of the server without it. This slowdown occurs because a large amount of data must be copied from the CPU to the GPU in the GPGPU-based task queue server.
The GPGPU-based task queue server copies data from the CPU to the GPU over the Peripheral Component Interconnect (PCI) Express interface. Because PCI Express is comparatively slow, increasing the CPU-GPU copy speed can reduce the slowdown caused by the growing number of HTTP requests.
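A standard mitigation, sketched below under the assumption of batched transfers, is to use pinned (page-locked) host memory with asynchronous copies on a CUDA stream, so PCIe transfers can overlap other CPU work. This illustrates the suggestion in the text, not the paper's implementation; the batch size is an assumption.

```cpp
// Sketch of a standard PCIe-copy mitigation: pinned host buffers plus
// asynchronous copies on a CUDA stream.
#include <cuda_runtime.h>

#define BATCH_BYTES (1 << 20)   // 1 MiB of batched requests, illustrative

void copy_batch_to_gpu(void)
{
    char *h_buf, *d_buf;
    cudaStream_t stream;

    // Pinned host memory raises PCIe throughput versus pageable memory
    // and is required for cudaMemcpyAsync to be truly asynchronous.
    cudaMallocHost(&h_buf, BATCH_BYTES);
    cudaMalloc(&d_buf, BATCH_BYTES);
    cudaStreamCreate(&stream);

    // ... the producer would write a batch of requests into h_buf here ...

    cudaMemcpyAsync(d_buf, h_buf, BATCH_BYTES,
                    cudaMemcpyHostToDevice, stream);
    // The CPU can keep accepting connections while the copy is in flight.
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
}
```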
Table 2 lists the error rates of the two servers. In Scenario 1, both servers handle HTTP requests reliably; this scenario checks baseline performance, and under this load HTTP request processing produces no errors.
Table 2. Error rate for each scenario

| Scenario | Number of Threads | HTTP Requests | Error Rate (Without Task Queue) | Error Rate (With Task Queue) |
|---|---|---|---|---|
| 1 | 100 | 300 | 0.0% | 0.0% |
| 2 | 1,000 | 3,000 | 2.43% | 0.0% |
| 3 | 3,000 | 9,000 | 15.4% | 0.0% |
Scenario 2 shows a 2.43% error rate for the server without the GPGPU-based task queue: when many threads are generated in a short time and issue HTTP requests, some requests are rejected. The GPGPU-based task queue server handled all HTTP requests reliably.
Scenario 3 issued more threads and HTTP requests than Scenario 2. The server without the GPGPU-based task queue showed an error rate of 15.4%; however, it did not go down, and the remaining HTTP requests were processed normally. The GPGPU-based task queue server processed all HTTP requests successfully.
As shown in Tables 1 and 2, the GPGPU-based task queue server outperforms the server without the task queue in both performance and stability.
In this study, we implemented a GPGPU-based task queue using NVIDIA's CUDA to handle many web service requests and keep the web server stable.
The experimental results show that the performance of the server using the GPGPU-based task queue improves by 136% to 233% compared with the server that does not use it. Beyond artificial intelligence, gaming, and high-performance scientific computing, GPUs can thus also improve the performance of common applications. Processing a large number of user requests with an existing web service configuration carries the high cost of upgrading the web server or installing and operating multiple web servers. With the GPGPU-based task queue proposed in this paper, affordable, stable, and high-performance web services can be provided by a single GPU-equipped server even under numerous user requests.
In addition, we expect that equipping the GPGPU-based task queue server with multiple GPUs would allow it to handle even more HTTP requests and further improve web server performance. We thus propose a low-cost, GPU-based solution to web service disruption when existing web services face a large number of user requests.
GPGPU-based task queues can improve performance and handle more user requests, for example by supporting multiple specialized task queues for large workloads within the general task queue. They are also expected to reduce the cost of handling a large number of user requests.
Therefore, as the GPGPU-based task queue proposed in this paper evolves, and as GPU performance and the CPU-GPU interface continue to improve, the execution speed of GPGPU-based general-purpose applications will also increase. By adapting the program to run various algorithms in the GPGPU-based task queue, we expect that jobs generated by large numbers of user requests can be processed more quickly and stably than on the CPU alone.
He received the B.S., M.S., and Ph.D. degrees from the Department of Computer Engineering of Paichai University, Korea, in 1996, 1998, and 2002, respectively. From 2005 to 2012, he worked as a professor in the Department of Internet at Chungwoon University. Since 2013, he has worked in the Department of Computer Engineering at Paichai University, where he is now a professor. His current research interests include multimedia document architecture modeling, mobile web, and data mining.
He received a Ph.D. in Computer Engineering from Pai Chai University in 2018. From 2005 to 2011, he worked as Research Director at System Bank's Software Technology Research Institute. He is currently working as the CEO of COMMU, a company specializing in software development. His current research interests include software engineering, deep learning, and 3D simulation.
He received his M.S. and Ph.D. degrees in 1987 and 1993, respectively, from the Department of Computer Engineering, Kwangwoon University, Korea. From 1994 to 1995, he worked at ETRI as a researcher. Since 1994, he has worked in the Department of Computer Engineering at PaiChai University, where he is now a professor. His current research interests include machine learning, IoT, big data, and artificial intelligence.