Journal of information and communication convergence engineering 2022; 20(3): 189-194
Published online September 30, 2022
https://doi.org/10.56977/jicce.2022.20.3.189
© Korea Institute of Information and Communication Engineering
Dongeon Yoon1 and Amsuk Oh2*, Member, KIICE
1Department of Computer Media Engineering, Tongmyong University, Busan 48520, Korea
2Department of Digital Media Engineering, Tongmyong University, Busan 48520, Korea
Correspondence to: *Amsuk Oh (E-mail: asoh@tu.ac.kr, Tel: +82-51-629-1211)
Department of Digital Media Engineering, Tongmyong University, Busan 48520, Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
As non-face-to-face activities have become commonplace, online video conferencing platforms have become popular collaboration tools. However, existing video conferencing platforms have a structure in which one side unilaterally exchanges information, potentially increasing the fatigue of meeting participants. In this study, we designed a video conferencing platform utilizing virtual reality (VR), a metaverse technology, to enable various interactions. A virtual conferencing space and realistic VR video conferencing content authoring tool support system were designed using Meta’s Oculus Quest 2 hardware, the Unity engine, and 3D Max software. With the Photon software development kit, voice recognition was designed to perform automatic text translation with the Watson application programming interface, allowing online video conferencing participants to communicate smoothly even when using different languages. It is expected that the proposed video conferencing platform will enable conference participants to interact and improve their work efficiency.
Keywords: Interaction, Metaverse, Video Conferencing, Virtual Reality, Voice Translation
Virtual reality (VR) is one of the technologies for creating a metaverse environment, allowing various activities in the real world to be performed in a virtual world [1]. Recently, as the quality of VR hardware and software has improved, VR has increasingly been used in realistic games for consumers, product visualizations for social areas and companies, VR collaboration, and business conferencing [2]. Existing online video conferencing platforms are also used for social and business meetings, but they often suffer from problems such as difficulties in interaction and leakage of personal information [3,4]. This is because these platforms tend to repeat a structure in which the speaker attending the online meeting unilaterally transmits information to the listeners. Therefore, there is a need to reduce the fatigue experienced when using existing video conferencing platforms, as well as to address their lack of adequate functionality.
In this paper, we propose the design of a metaverse VR-based video conferencing platform for improving participation and emotional commitment in an online virtual space. The platform design is divided into three systems: a virtual conferencing space design system, a realistic VR video conferencing content authoring tool support system, and a real-time voice chat system. Each system operates organically in connection with the others, thereby combining smooth communication (an advantage of offline meetings) with unlimited space (an advantage of online meetings).
The remainder of this paper is organized as follows. The first section introduces studies related to the proposed platform, such as Spatial and Horizon Workrooms. The second section describes the design process of the three systems necessary to complete the platform. Finally, the paper closes with conclusions and implications for platform design.
Since 2020, the metaverse has begun to grow significantly, proving its value not only in games and business-to-consumer entertainment, but also in business-to-business areas such as telecommuting and corporate events [5]. Bloomberg Intelligence predicted that the metaverse market will grow from $478.7 billion in 2020 to $783.3 billion in 2024 [6]. Global companies such as Meta, Microsoft, and NVIDIA have also begun to compete in developing virtual video conferencing platforms using metaverse technology, and various platforms are being studied [7].
“Spatial” is a platform for helping team members interact by supporting meetings as avatars in virtual spaces [8]. Spatial users can share traditional document files in a virtual space, share their computer screens with colleagues, and talk about work. Post-it notes can also be pasted into virtual spaces.
Figure 1 shows a portion of the Spatial virtual space.
Horizon Workrooms is a VR-based collaboration platform developed by Meta [9]. It is optimized for the company’s Oculus Quest 2, and its advantage is that it feels like meeting and working in the real world. It supports a virtual keyboard for inter-user chats and can expand to multiple virtual screens, beyond the single screen typically sharable in existing online video conferences [10,11].
This study aims to design a metaverse platform that can increase the productivity of video conferencing users based on metaverse technologies, i.e., software related to VR functions and hardware for assisting various interactions in the virtual world. First, the virtual conference space design system provides a virtual conference space without restrictions on size when users access the video conference. Next, the realistic VR video conferencing content authoring tool support system is a VR-based system that helps users use the metaverse by supporting a series of authoring tools, such as laser pointers and avatars, and by executing the applications necessary for meetings and presentations. Finally, the real-time voice chat system analyzes a person's voice pattern using a voice recognition software development kit (SDK) and provides a real-time voice translation function.
Figure 2 shows the configuration diagram of the metaverse VR-based bidirectional video conferencing platform.
In this study, the design environment was divided into hardware for manipulating the content and software for developing it. The hardware comprised an Oculus Quest 2 head-mounted display (HMD), i.e., an image display device worn on the head, and a VR controller for interaction. Unity and 3D Max were used to produce the avatars and the video conferencing platform. Software development was performed in a PC environment with a Ryzen 7-5800H CPU, an RTX 3060 GPU, and 32 GB of RAM.
Table 1 describes the design environment of the metaverse VR-based video conferencing platform.
Table 1. Design environment of the metaverse virtual reality (VR)-based video conferencing platform

| Classification | Type | Name |
|---|---|---|
| Hardware | VR head-mounted display (HMD) | Oculus Quest 2 |
| | Interaction tools | VR controller |
| | PC environment | Ryzen 7-5800H (CPU), RTX 3060 (GPU), 32 GB (RAM), 1 TB (storage) |
| Software | Video conferencing platform feature development tools | Unity (2020.3.30f1) |
| | Metaverse management server | Photon Server |
| | Avatar design tools | 3D Max, N.CAD |
The virtual conference space design system supports remote procedure call (RPC) communication with other users through the hardware and the Photon server required for program execution and operation. The VR HMD and controller hardware are Meta's Oculus Quest 2, and the platform's functionality is developed with the Unity game engine. The Photon server enables RPC communication with other users, and the avatars are produced using voxel design techniques.
The Photon server is provided as a Unity package that helps different objects synchronize over a network. When a client connects to the network, it first connects to a name server. The name server authenticates the application and hands the client over to the master server. The master server holds the information of all rooms, from which the user selects the meeting room they wish to enter. When the user enters the selected meeting room, they are moved to the video conference server to participate in the meeting. On the video conference server, the user can participate in the conference using the authoring tools supported by the platform.
Figure 3 shows the operation structure of the Photon server.
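As a minimal sketch of this connection flow, assuming the Photon Unity Networking (PUN 2) package and a hypothetical room name, the client code might look as follows; `ConnectUsingSettings()` performs the name server and master server hops described above.

```csharp
using Photon.Pun;
using Photon.Realtime;
using UnityEngine;

// Minimal sketch of the connection flow, assuming PUN 2:
// ConnectUsingSettings() contacts the name server, which authenticates the
// application and hands the client to the master server; the client then
// joins (or creates) a meeting room and is moved to the room server.
public class MeetingConnector : MonoBehaviourPunCallbacks
{
    void Start()
    {
        // Connects to the name server using the AppId in PhotonServerSettings.
        PhotonNetwork.ConnectUsingSettings();
    }

    public override void OnConnectedToMaster()
    {
        // Now on the master server: select the desired meeting room.
        // "DesignReview" is a hypothetical room name for illustration.
        PhotonNetwork.JoinOrCreateRoom("DesignReview",
            new RoomOptions { MaxPlayers = 8 }, TypedLobby.Default);
    }

    public override void OnJoinedRoom()
    {
        Debug.Log("Entered meeting room: " + PhotonNetwork.CurrentRoom.Name);
    }
}
```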
RPC is a communication method that allows a procedure on a remote server to be called without directly implementing the connection logic for client-server communication. An interface definition language is used to define the calling conventions between clients and servers, and the parameters used for procedure calls are converted through a stub. Because the stub handles this parameter conversion, calling a procedure on a remote server looks the same as calling a local function.
The client’s RPC communication consists of executing the remote procedure, passing parameters, and receiving return values. The client passes the parameters to the client stub, which converts them into the network transport format. The client then invokes the RPC client runtime library function, which transmits the request to the server. The server’s RPC runtime library function accepts the request and invokes the server stub. The server stub retrieves the parameters from the network buffer, converts them from the network transport format to the format required by the server, and calls the actual procedure on the server. The server completes the communication by returning the value to the client; the process of returning a value from the server to the client mirrors the request process.
Figure 4 shows the RPC operation process.
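In PUN, this RPC mechanism is exposed through the `[PunRPC]` attribute, with the stub's parameter conversion handled by the library. Below is a minimal sketch, assuming PUN 2 and a PhotonView component on the same GameObject; the method and message are illustrative only.

```csharp
using Photon.Pun;
using UnityEngine;

// Sketch of Photon's RPC mechanism, assuming PUN 2. PUN plays the role of
// the stub: it serializes the parameters, ships the call over the network,
// and invokes the [PunRPC]-tagged method on the remote clients.
public class MeetingChat : MonoBehaviourPun
{
    // Called locally; executed on every client in the room.
    public void Announce(string message)
    {
        photonView.RPC(nameof(ReceiveAnnouncement), RpcTarget.All, message);
    }

    [PunRPC]
    void ReceiveAnnouncement(string message)
    {
        Debug.Log("Announcement: " + message);
    }
}
```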
Voxels represent two-dimensional pixels as three-dimensional cubes; in voxel design, objects are expressed by assembling many cubes. Designing in this way consumes less data, and the 3D model has a pixel-art feel, making it a style well suited to VR design, where performance is key. Compared with polygon modeling, it also has the advantage of implementing detailed textures and allowing free deformation. We designed the virtual video conferencing space and optimized its runtime performance by adopting this low-poly design.
Figure 5 represents one of the voxel designs.
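To make the voxel idea concrete, the sketch below assembles an object from unit cubes on an occupancy grid. This is our own minimal illustration in Unity, not the paper's 3D Max / N.CAD modeling pipeline.

```csharp
using UnityEngine;

// Minimal illustration of the voxel idea: an object is expressed as a set of
// filled cells on a 3D grid, each rendered as a unit cube.
public class VoxelObject : MonoBehaviour
{
    // Hypothetical 3x3x3 occupancy grid; true marks a filled voxel.
    readonly bool[,,] cells = new bool[3, 3, 3];

    void Start()
    {
        cells[1, 0, 1] = cells[1, 1, 1] = cells[1, 2, 1] = true; // a small pillar

        for (int x = 0; x < 3; x++)
            for (int y = 0; y < 3; y++)
                for (int z = 0; z < 3; z++)
                    if (cells[x, y, z])
                    {
                        // One unit cube per filled cell, parented to this object.
                        var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);
                        cube.transform.SetParent(transform, false);
                        cube.transform.localPosition = new Vector3(x, y, z);
                    }
    }
}
```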
A person who accesses the metaverse video conferencing platform wearing the HMD and holding the controller is connected to the virtual conferencing space provided by the platform. Video conference participants can also build the virtual conference space through direct modeling. When a user customizes the virtual conference space and starts communicating with others, support from the real-time voice chat system and the realistic VR video conferencing content authoring tool support system begins. The authoring tool support system provides functions such as brainstorming, presentations, team planning, product reviews, collaboration whiteboards, avatars, and interactions.
Figure 6 shows the process of using the virtual conferencing platform.
Voice recognition is a technology for recognizing a speaker’s voice and converting it into text; automatic translation converts the recognized content into the language used by the other party. The real-time voice chat system recognizes another person's voice during video conferencing and automatically translates it into other languages through natural language processing (NLP). NLP is a branch of artificial intelligence that deals with human-machine interaction; by interpreting the natural language used in everyday communication and converting it into other languages, it provides convenience and high immersion in the virtual conferencing space.
After speech recognition is performed within Unity, the real-time speech translation system translates the voices of other participants into each participant's own language for verification. When a video conferencing participant converses through the Photon Voice SDK in the metaverse virtual conference room, the Watson Speech-to-Text application programming interface (API) recognizes the voice and converts it into text. The converted text is then automatically translated via the Watson Language Translator, and a singleton pattern is applied in Unity to prevent memory leaks and allow smooth data sharing between the different classes. The translated data are transmitted through the server to the other clients, completing the system.
Figure 7 shows the real-time voice translation process.
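The singleton pattern mentioned above can be sketched as follows. The Watson Speech-to-Text and Language Translator calls are abstracted behind a hypothetical `OnTranscribed()` hook, since the exact service wiring is project specific.

```csharp
using UnityEngine;

// Sketch of the singleton pattern used for sharing translated text between
// classes. The Watson calls are represented by a hypothetical hook; the real
// system wires this to the Watson Speech-to-Text and Language Translator
// services.
public class TranslationManager : MonoBehaviour
{
    public static TranslationManager Instance { get; private set; }

    void Awake()
    {
        if (Instance != null && Instance != this)
        {
            Destroy(gameObject);            // enforce a single shared instance
            return;
        }
        Instance = this;
        DontDestroyOnLoad(gameObject);      // survive scene loads, avoid re-creation
    }

    // Hypothetical hook: in the real system, text transcribed by Watson
    // Speech-to-Text is passed through Watson Language Translator and the
    // result is broadcast to the other clients via the server.
    public void OnTranscribed(string text)
    {
        Debug.Log("Transcribed: " + text);
        // Translate(text); // assumed Watson Language Translator call
    }
}
```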
The video conferencing platform designed in this study was tested using Unity. The test environment consisted of the avatars of users A and B, a screen displaying the other party's voice chat content, and a screen for document display and object manipulation.
Figure 8 shows the test environment.
The Oculus Quest 2 controller was used to manipulate objects, load documents, and activate the voice chat functions. When the object recognition button on the controller was pressed, a ray was cast to read information about the object it hit. If the ray did not collide with an object, it was rendered as a red line; if it did, it was rendered as a sky-blue line. Depending on whether the hit object was a UI object or a GameObject, the corresponding button was clicked to perform various interactions.
Figure 9 shows an example of object manipulation using the Oculus Quest 2 controller.
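The raycast behavior can be sketched as below, assuming Unity's built-in physics and the Oculus Integration package's OVRInput for the controller button; the trigger mapping and color values are our assumptions.

```csharp
using UnityEngine;

// Sketch of the controller raycast interaction, assuming the Oculus
// Integration package (OVRInput). The ray is drawn red while it hits
// nothing and sky-blue when it hits an object.
public class ControllerPointer : MonoBehaviour
{
    public LineRenderer line;       // assigned in the Inspector
    const float MaxDistance = 10f;

    void Start()
    {
        line.positionCount = 2;     // the ray is a two-point line
    }

    void Update()
    {
        // Index trigger stands in for the "object recognition" button.
        if (!OVRInput.Get(OVRInput.Button.PrimaryIndexTrigger)) return;

        var ray = new Ray(transform.position, transform.forward);
        bool hitSomething = Physics.Raycast(ray, out RaycastHit hit, MaxDistance);

        Vector3 end = hitSomething ? hit.point
                                   : ray.origin + ray.direction * MaxDistance;
        line.SetPosition(0, ray.origin);
        line.SetPosition(1, end);
        line.startColor = line.endColor =
            hitSomething ? new Color(0.53f, 0.81f, 0.92f)  // sky blue on hit
                         : Color.red;                      // red on miss

        if (hitSomething)
            Debug.Log("Pointing at: " + hit.collider.gameObject.name);
    }
}
```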
When user A loads a presentation using the controller and proceeds with it, user B sees the screen shown in Fig. 10. User B can check the related information, the presentation materials, and a transcript generated by converting user A’s voice into text.
In this study, we designed and implemented a metaverse VR-based video conferencing platform by dividing it into a virtual conferencing space design system, a realistic VR video conferencing content authoring tool support system, and a real-time voice chat system. The virtual conference space was designed around the HMD and controller hardware and the Unity engine and Photon server software. The avatars active in VR adopted a voxel design, and the RPC communication process with other users was described. RPC can call procedures quickly by connecting clients and servers automatically. In the realistic VR video conference content authoring tool support system, various interactions, as well as the data sharing and collaboration found in existing online video conferences, were supported to improve immersion. In the real-time voice translation system, voice chat was carried through the Photon Voice SDK supported in Unity, voice recognition was performed using the Watson Speech-to-Text API, and automatic translation was performed using the Watson Language Translator.
The proposed platform will provide experiences similar to real offline conferencing, thereby enhancing interaction and work efficiency among video conferencing participants. Smooth communication is also possible when conducting business with conference participants of various nationalities. In addition, the space is expected to serve various purposes, such as rest and travel, and not just work.
This research was supported by BB21plus and funded by the Busan Metropolitan City and Busan Institute for Talent and Lifelong Education (BIT).