Journal of information and communication convergence engineering 2024; 22(4): 322-329
Published online December 31, 2024
https://doi.org/10.56977/jicce.2024.22.4.322
© Korea Institute of Information and Communication Engineering
Da-Young Kim1, Min-Gyu Kim1, and Sang-Seok Yun2*, Member, KICCE
1Human-Robot Interaction Center, Korea Institute of Robotics & Technology Convergence (KIRO), Pohang 37553, Republic of Korea
2Department of Aeronautical & Mechanical Engineering, Silla University, Busan 46958, Republic of Korea
Correspondence to: Sang-Seok Yun (E-mail: ssyun@silla.ac.kr), Department of Aeronautical & Mechanical Engineering, Silla University, Busan 46958, Republic of Korea
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Robot backchanneling is implemented using verbal expressions, such as reiterating the user's response after remembering it or chiming in with the user. We performed experiments to investigate how the backchannel behavior of the robot affects the user's perception of the robot's empathy and attentiveness toward their reported pain and sleep concerns. The results indicate that users prefer the robot that provides backchannels and perceive it as more intelligent. When the robot backchanneled, the number of conversational turns between the user and the robot decreased compared to when it did not, and the rate at which the robot had to ask the user to repeat a statement (the rejection rate) also decreased. When a robot conducting a medical questionnaire offers backchannels to the user, the overall efficiency of the conversation improves. Users also perceive the robot as understanding and empathetic toward their pain and discomfort, leading to a positive evaluation of the interaction.
Keywords: Human-robot interaction, Backchannel, Chatbot, Medical questionnaire
Patients expect their doctors to be attentive, warm, and understanding [1]. In the medical environment, empathy is an expression of understanding of the emotions and experiences of patients, and patients who feel that they are being empathized with by medical staff are more likely to provide detailed explanations of their symptoms [2]. Furthermore, the empathy, support, and warmth of the doctor toward the patient can encourage the patient to trust the doctor, and this positive relationship can reduce the patient’s stress level and improve treatment outcomes [1,3].
Robots have been introduced to healthcare and medical environments in numerous instances. In Ahn et al. [4], the elderly used a kiosk-type healthcare robot system to check their health status. The system stored and managed the health information of the elderly on a server; the information was transmitted to clinicians or used to provide health-related information to the elderly. Furthermore, research has been consistently conducted on the utility of daily healthcare, such as medication reminders and the provision of health information using social robots [5,6]. Conducting medical questionnaires using a chatbot system is one way robots are applied in medical environments. The medical-questionnaire chatbot system extracts the user's symptoms from system-led conversations and produces diagnoses based on them, simplifying the diagnosis process. Because the medical team can diagnose more efficiently using the collected information, such systems are widely commercialized [7,8]. Owing to its advances, artificial intelligence (AI) technology can rapidly discover diseases and offer proposals for their treatment and management [9]. Existing medical-questionnaire chatbot systems can indicate the expected disease from the user's symptoms or connect users with hospitals. However, most are limited to one-time interactions, so sustained interaction between the chatbot system and the user is not established [10,11].
A backchannel, a listener's response during a speaker’s turn, serves as a social cue that indicates attentiveness and engagement and includes non-verbal expressions (e.g., facial expressions, nodding) or verbal expressions (e.g., “yes,” “uh-huh”) [12]. Jang et al. [13] developed a model that predicts the appropriate backchannel using lexical information from user utterances, enabling the model to simulate human-like backchannel responses more effectively. However, their study primarily focused on model development rather than addressing whether the backchannels effectively conveyed empathy in real interactions with users. In contrast, our study experimentally tested the impact of backchannels on conveying empathy through real interactions between users and robots.
Our goal is to investigate whether the medical-questionnaire robot is recognized as a user-friendly system that facilitates interaction and also understands and sympathizes with the pain and discomfort of users, beyond one-time engagements. To implement this, the robot asks about the user's pain and sleep state, responding with compassionate and empathetic statements in the form of backchannels that are integrated into the conversation system of the medical-questionnaire robot [14]. This approach enables comparative evaluation of user perceptions of the robot in cases with and without backchannel, verifying the effectiveness of backchannel as a tool to convey empathy.
Robot empathy in the field of Human-Robot Interaction (HRI) refers to the ability of a robot to understand and respond to the experiences, thoughts, and emotions of humans [15,16]. According to Pepito et al. [17], empathy is necessary for human-machine interaction to continue progressing. The strategy for developing intelligent humanoid robots is to comprehend how people feel and think and to react accordingly. As such, the emotional and cognitive responses of robots play a significant role in the interaction between robots and humans, as well as in the establishment of rapport [18]. To achieve this, many studies have considered applying backchannels to robots, an interactive strategy that enhances the naturalness of robot behavior and effectively conveys empathy when interacting with humans [19]. ERIKA, an interactive android robot, produced backchannels in the form of fillers and laughter to build more human-like conversations [20]. Blomsma et al. [21] found that incorporating backchannel behaviors into AI agents facilitates the creation of more natural interactions. They also reported that the timing of backchannels, the degree of head movement, and the tone of voice influenced how users perceived the personality of the AI agent.
Meanwhile, Li et al. [22] noted that voice-based systems that use the question-answer format have difficulty creating natural interactions because they do not recognize the emotions and context of the users. Backchannels can be provided either after or during the utterance of the other person; however, it is important to consider the emotions and context of users when delivering them [23,24,25]. In a study by Cho et al. [26], participants perceived the smart speaker Alexa as a more attentive listener when backchannels were provided. However, the participants felt that fixed-interval backchannels often disrupted the flow of conversation, and they expressed a desire for more empathetic and contextually appropriate responses. Similarly, in a study of interactions between children and robots, Park et al. [27] found that children preferred robots that provided backchannels at the right time. Our study aims to enhance empathetic interactions by providing backchannels after the user’s utterance, considering the emotions and intent of the user. This approach minimizes the disruption caused by fixed-timing backchannels and leads to more natural and empathetic interactions, as demonstrated in our experimental validation.
Backchannels are also used in collaborative activities between humans and robots, helping humans perceive the robot as a member of the team. The group that experienced the robot’s backchannel (e.g., “Okay”, “Good idea!”) psychologically felt that the robot was safe and showed a tendency to include the robot as a member of the team [28]. Additionally, Złotowski et al. [29] reported that when robots used filler words or head movements as backchannels to express agreement or disagreement with the user, users perceived that the robot understood their message, which helped reduce their anxiety.
In medical environments, backchannels play a significant role in enhancing empathetic interactions between patients and robots. Trost et al. [30] demonstrated that highly empathetic robots could reduce pain and distress for pediatric patients by adjusting their backchannels (through changes in LED colors and verbal expressions) based on the fear and pain levels expressed by the patient. Ding et al. [31] explored the use of backchannels by a conversational agent in the cognitive assessment of elderly patients. Although their study did not find significant differences between conditions with and without backchannels, it helped validate the usefulness of backchannels in such systems, contributing to future research on enhancing natural interactions in medical settings. In this study, we aim to experimentally demonstrate that medical-questionnaire robots can more effectively convey empathy by providing backchannels at the appropriate time after the user’s utterance. Building upon the existing research findings, the medical-questionnaire robot in our study provides backchannels after understanding the user's utterance, empathizes with their pain and discomfort, and signals the start of the next question.
The robot questionnaire consists of questions about the pain and sleep conditions of the user, as shown in Table 1. The pain-related questions are mainly used when questioning patients at the Department of Neurosurgery in Pusan National University Hospital and consist of pain position, pain aspect, pain-aggravating factor, pain severity, and pain duration [32]. The sleep-related questions were based on the Pittsburgh Sleep Quality Index and included questions about bedtime, sleep latency, daytime dysfunction, sleep duration, and sleep disturbance [33,34].
Table 1. List of pain and sleep questionnaires

| Category | Concept | Questions |
|---|---|---|
| Pain | Position | Q1. Where is the most painful area right now? |
| | Aspect | Q2. How do you feel the pain in the area? |
| | Aggravating factor | Q3. When does the pain get worse? |
| | | Q4. What posture hurts the most? |
| | Severity | Q5. Please rate how severe the pain is on a scale of 0 to 10. |
| | Timing | Q6. How long has the pain been getting worse? |
| Sleep | Bedtime | Q7. What time do you usually go to bed? |
| | Latency | Q8. How long does it take you to fall asleep after going to bed? |
| | Daytime dysfunction | Q9. How often does your sleep disturbance interfere with your daily activities? |
| | Duration | Q10. How many hours have you slept? |
| | Disturbance | Q11. Do you sleep well without waking up in the middle of the night? |
Fig. 1 illustrates the overall procedure for providing backchannel responses in the medical questionnaire system. The robot begins by capturing the user’s response via a speech-to-text module, converting the spoken input into text. The transcribed text then undergoes natural language processing to extract essential details related to the pain and sleep attributes of the user. This information is identified by analyzing key nouns, time expressions, and other contextually relevant terms within the response of the user. Each term is matched against a predefined dictionary, which includes body parts, descriptive terms, and factors related to pain and sleep. This allows the system to accurately categorize and interpret the specifics of the pain or sleep attributes described by the user. After extraction, this information is securely stored in a database and serves as the foundation for generating contextually relevant backchannel responses. Based on these pain and sleep attributes, the system then generates verbal backchannels that align with the specific symptoms of the user. For instance, if the pain is identified in the waist, the robot might respond, “You are uncomfortable in your waist.” Similarly, if the user indicates difficulty with sleep latency, the robot might respond with, “It seems like falling asleep has been challenging.”
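A minimal sketch of this extraction-and-templating step is given below. The dictionary entries, template phrasings, and function names are our own illustrative assumptions rather than the implementation used in the experiments, and the actual system operates on transcribed Korean speech.

```python
# Illustrative sketch of dictionary-based attribute extraction followed by
# template-based backchannel generation. All lexicon/template contents are
# hypothetical stand-ins for the system's predefined dictionary.

# Hypothetical lexicon mapping (category, concept) attributes to trigger terms.
ATTRIBUTE_LEXICON = {
    ("pain", "position"): {"waist", "back", "neck", "shoulder", "knee"},
    ("pain", "aggravating_factor"): {"sitting", "standing", "walking"},
    ("sleep", "latency"): {"fall asleep", "toss and turn", "lie awake"},
}

# Hypothetical verbal-backchannel templates keyed by attribute.
BACKCHANNEL_TEMPLATES = {
    ("pain", "position"): "You are uncomfortable in your {term}.",
    ("pain", "aggravating_factor"): "So it gets worse when {term}.",
    ("sleep", "latency"): "It seems like falling asleep has been challenging.",
}

def extract_attributes(utterance):
    """Match terms in the transcribed utterance against the lexicon."""
    text = utterance.lower()
    return [(attr, term)
            for attr, terms in ATTRIBUTE_LEXICON.items()
            for term in terms if term in text]

def generate_backchannel(utterance):
    """Return a verbal backchannel for the first recognized attribute, if any."""
    for attr, term in extract_attributes(utterance):
        template = BACKCHANNEL_TEMPLATES.get(attr)
        if template:
            return template.format(term=term)
    return None  # no match: move on without a tailored backchannel

print(generate_backchannel("My waist hurts the most right now."))
# -> "You are uncomfortable in your waist."
```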
We employed verbal expressions as part of the robot's backchannel, including techniques such as reiterating the responses of the user to engage in interactive conversation. We performed two experiments, one where users were provided with backchannels and another where they were not, while interacting with the medical-questionnaire robot. These experiments aimed to assess how the backchannel behavior of the robot during discussions of pain and discomfort influences user perceptions. In both experiments, the same robot conducted the same questionnaire at the same place; they differed only in the presence or absence of backchannels. Moreover, to minimize the impact of the first experiment on the second, we conducted them one week apart. After each experiment, participants were asked to respond to a questionnaire about their impressions of the robot.
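The two conditions can be pictured as the same question loop with the backchannel step switched on or off, as in the sketch below; `speak` and `listen` are assumed stand-ins for the robot's text-to-speech and speech-to-text modules, and the loop structure itself is our simplification.

```python
# Simplified sketch of the two experimental conditions: an identical
# question sequence, with the backchannel step enabled or disabled.

QUESTIONS = [
    "Where is the most painful area right now?",
    "How do you feel the pain in the area?",
    # ... remaining items from Table 1
]

def run_questionnaire(speak, listen, with_backchannel):
    """speak/listen are placeholders for the robot's TTS and STT modules."""
    answers = {}
    for question in QUESTIONS:
        speak(question)
        answer = listen()
        answers[question] = answer
        if with_backchannel:
            reply = generate_backchannel(answer)  # from the sketch above
            if reply:
                speak(reply)  # acknowledges the answer and cues the next question
    return answers
```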
The backchannel was provided not only to empathize with the pain and discomfort of the user but also to recognize when the utterance of the user ended and to signal the start of a new question. To analyze the experimental results, we used the Godspeed questionnaire, which assesses the perception of robots, and the Paradise framework, which evaluates spoken dialogue systems. Godspeed is a questionnaire developed by Bartneck et al. [35] to measure the user’s perception of service robots, and consists of five concepts: animacy, anthropomorphism, likability, perceived intelligence, and perceived safety. In this study, it was used to measure users’ perceptions of a medical-questionnaire robot. The contents of the questionnaire are shown in Table 2. The questionnaire responses were assigned points on a 5-point Likert scale, depending on which of the two options (e.g., ‘Dead’ or ‘Alive’) felt closer to the respondent’s perception. The degree of animacy that a user perceives in a robot can be used as a measure of how lifelike the robot seems. Anthropomorphism refers to the extent to which a robot is designed or perceived to have human-like characteristics or behaviors. Likability refers to the degree to which a robot is perceived as friendly and engaging; it is a measure of how much participants like the robot. Perceived intelligence measures the user’s perception of how capable the robot is of understanding, adaptation, and knowledge. Lastly, perceived safety measures the user’s perception of danger and comfort while interacting with the robot. After the experiment, we measured the internal consistency of the responses to the Godspeed questionnaire using Cronbach’s alpha [36] (a computational sketch follows Table 2). Only the concepts with a Cronbach’s alpha of 0.7 or higher, namely animacy, likability, and perceived intelligence (Cronbach’s alpha = 0.89), were included in the analysis of the survey responses.
Table 2. Godspeed questionnaire list

| Concepts | Questions |
|---|---|
| Animacy | Q1. Dead or Alive |
| | Q2. Stagnant or Lively |
| | Q3. Mechanical or Organic |
| | Q4. Artificial or Lifelike |
| | Q5. Inert or Interactive |
| | Q6. Apathetic or Responsive |
| Likability | Q7. Dislike or Like |
| | Q8. Unfriendly or Friendly |
| | Q9. Unkind or Kind |
| | Q10. Unpleasant or Pleasant |
| | Q11. Awful or Nice |
| Perceived Intelligence | Q12. Incompetent or Competent |
| | Q13. Ignorant or Knowledgeable |
| | Q14. Irresponsible or Responsible |
| | Q15. Unintelligent or Intelligent |
| | Q16. Foolish or Sensible |
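As referenced above, a minimal sketch of the Cronbach's-alpha consistency check is shown here; the score matrix is invented illustrative data, not the participants' responses.

```python
# Cronbach's alpha for one Godspeed concept, computed from a
# (participants x items) matrix of 5-point Likert scores.
import numpy as np

def cronbach_alpha(scores):
    """scores: ndarray of shape (n_participants, n_items)."""
    n_items = scores.shape[1]
    sum_item_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of participants' totals
    return (n_items / (n_items - 1)) * (1 - sum_item_vars / total_var)

# Invented example: 5 participants rating the 5 likability items (Q7-Q11).
likability = np.array([
    [4, 4, 5, 4, 4],
    [5, 5, 5, 4, 5],
    [3, 4, 3, 3, 4],
    [4, 5, 4, 4, 4],
    [5, 4, 5, 5, 5],
])
print(round(cronbach_alpha(likability), 2))  # ~0.85 for this invented data
# Concepts with alpha >= 0.7 are retained for analysis, per the criterion above.
```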
The Paradise (PARAdigm for DIalogue System Evaluation) framework, presented by Walker et al. [37], is a framework for evaluating spoken dialogue agents. Its dimensional elements consist of dialogue efficiency, dialogue quality, task success, and system usability, and we examined whether these elements are related to user perception. Dialogue efficiency measures the efficiency of dialogues from the number of user and system turns, with all dialogues between the user and the system recorded automatically. The rejection item of dialogue quality is the number of times the system asks the user to repeat a statement owing to recognition failures and processing errors, and the misrecognition item is the number of misrecognitions found when comparing the system’s voice recognition results with the actual recorded speech of the user [38]. Task success indicates whether the user has successfully achieved the task goal. The spoken dialogue system in this experiment reiterates each question until the questionnaire is completed normally; therefore, task success is always achieved from the point of view of the system, and it was excluded from the measured items [39]. The user satisfaction item, which is the result of the user’s questionnaire response regarding the performance of the spoken dialogue system, was also excluded [8]. The primary objective of this experiment was to investigate whether the measurements of dialogue efficiency and dialogue quality relate to user perception when comparing scenarios with and without backchannel communication. The omission of the user satisfaction item was deliberate, as the goal of the experiment was not to derive a performance function for the system’s spoken dialogue performance from the questionnaire results.
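The sketch below shows one way the turn, rejection, and misrecognition measures could be tallied from a logged dialogue; the log structure, field names, and the normalization of the two rates are all assumptions for illustration.

```python
# Hypothetical dialogue log and Paradise-style measures: turn count
# (dialogue efficiency), plus rejection rate and misrecognition rate
# (dialogue quality). Field names and rate normalization are assumptions.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str       # "user" or "system"
    asr_text: str      # recognizer output ("" for system turns)
    actual_text: str   # ground-truth transcript of the recorded speech
    rejected: bool     # system asked the user to repeat the statement

def dialogue_metrics(turns):
    user_turns = [t for t in turns if t.speaker == "user"]
    n_turns = len(turns)                                # dialogue efficiency
    n_reject = sum(t.rejected for t in turns)
    n_misrec = sum(t.asr_text != t.actual_text for t in user_turns)
    return {
        "turns": n_turns,
        "rejection_rate_pct": 100 * n_reject / max(n_turns, 1),
        "misrecognition_rate_pct": 100 * n_misrec / max(len(user_turns), 1),
    }
```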
The experimental setting is shown in Fig. 2. The medical-questionnaire robot conducted the questionnaire with the participants using the content in Table 1. In the first experiment, with 25 participants (21 males, 4 females; mean age M = 34.0, SD = 5.62), the medical-questionnaire robot did not provide backchannels. In the second experiment, held a week later with 24 participants (21 males, 3 females; mean age M = 34.0, SD = 5.78), the robot provided backchannels. After each experiment, we sent the participants a Google Form survey link to collect their responses.
Fig. 3 shows the comparison of Godspeed scores between the group who interacted with the questionnaire robot with backchannel and the group who interacted with the questionnaire robot without backchannel.
There was no significant difference between the two groups in animacy (nonbackchannel group M = 3.43, SD = 1.04 vs. backchannel group M = 3.43, SD = 1.20; t(292) = −0.03; p = 0.97, two-tailed). The group that experienced backchannels showed significantly higher likability than the group that did not (nonbackchannel group M = 3.85, SD = 0.92 vs. backchannel group M = 4.27, SD = 0.53; t(231) = −3.85; p = 0.0002, two-tailed). In perceived intelligence, the group that experienced backchannels also scored significantly higher (nonbackchannel group M = 3.60, SD = 0.73 vs. backchannel group M = 3.83, SD = 0.52; t(243) = −2.23; p = 0.03, two-tailed).
Finally, to investigate whether backchannels affected the performance of conversations between real users and the robot, we used the Paradise framework to analyze the number of turns in terms of dialogue efficiency and the rejection and misrecognition measures in terms of dialogue quality.
Fig. 4 compares the Paradise framework results between the group that experienced backchannels and the group that did not. Significant differences were found in the number of turns exchanged between the user and the robot (nonbackchannel group M = 29.04, SD = 6.75 vs. backchannel group M = 25.67, SD = 4.28; t(41) = 2.10; p = 0.04, two-tailed), the rejection rate (nonbackchannel group M = 10.74, SD = 7.23 vs. backchannel group M = 5.81, SD = 5.61; t(47) = 2.66; p = 0.01, two-tailed), and the misrecognition rate (nonbackchannel group M = 10.16, SD = 9.33 vs. backchannel group M = 4.81, SD = 4.09; t(33) = 2.62; p = 0.01, two-tailed).
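The unequal degrees of freedom reported above (41, 47, and 33 for groups of 25 and 24) are consistent with an unequal-variance (Welch) t-test, so the sketch below uses that variant; the sample values are invented for illustration and are not the experimental data.

```python
# Two-tailed independent-samples comparison of a per-dialogue measure
# (e.g., turn count) between the two groups, using Welch's t-test.
from scipy import stats

turns_nonbackchannel = [34, 28, 25, 37, 31, 22, 29, 26, 35, 30]  # invented
turns_backchannel    = [26, 24, 28, 22, 25, 27, 23, 29, 26, 24]  # invented

t_stat, p_value = stats.ttest_ind(
    turns_nonbackchannel, turns_backchannel,
    equal_var=False,  # Welch's correction for unequal variances
)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# p < 0.05 (two-tailed) marks a significant group difference, as with the
# turn, rejection, and misrecognition comparisons reported above.
```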
Comparing the cases where the medical-questionnaire robot provided backchannels and where it did not, the robot received significantly higher average Godspeed scores in likability and perceived intelligence when it provided backchannels to users. The personalized backchannel responses, tailored to the user’s specific symptoms and issues, likely reinforced a sense of being understood. This personalization made users feel that the robot was genuinely attentive to their responses, making the robot appear more likable and resulting in higher likability scores. Additionally, because the robot was perceived to understand and remember their responses, users rated it higher in perceived intelligence, seeing it as an intelligent and trustworthy entity. This is consistent with Leite et al. [40] and Pelau et al. [41], who found that the empathetic and intelligent behavior of robots is one of the factors that can make users perceive robots as more friendly. It also echoes the findings of Mitchell et al. [42] that when doctors express interest and empathy, their patients perceive them as more comforting and trustworthy. In contrast, the animacy score was lower than the likability and perceived intelligence scores. This is because the robot used in the experiment made only slight movements and was fixed in one place, without engaging in significant activity. Also, the difference in animacy between the two conditions was not significant, which can be attributed to the fact that the appearance and behavior of the robot, as well as the experimental setting, were the same in both experiments; users therefore may not have felt any difference in the activities of the robot.
We used the Paradise framework to analyze whether backchannels affect user-robot conversation performance in terms of dialogue efficiency and dialogue quality. When the robot provided backchannels to the user, the number of dialogue turns between the user and the robot was lower than when it did not. If the robot proceeds directly to the next question without providing a backchannel to the user’s response, users may not be ready to listen to the new question because they cannot predict when it will start. As a result, users miss the robot’s questions and ask it to repeat them, increasing the number of dialogue turns. This suggests that the backchannel serves to inform the user that their voice response has been acknowledged, allowing them to anticipate the next question [18,19,43,44]. This is consistent with prior research [45,46], which has shown that backchannels play a crucial role in establishing grounding in conversation and enhancing user experience by maintaining the natural flow of interaction. Backchannels facilitate smoother communication, increasing the efficiency of interaction and contributing to a more positive user experience.
Therefore, it can be inferred that an interaction in which the medical-questionnaire robot remembers the user’s response and understands and empathizes with the patient’s pain and discomfort contributes strongly to the user's positive perception of the robot, and that such interaction also improves the efficiency of the conversation.
In this study, we investigated the impact of a robot's use of backchannels to convey empathy and understanding while administering a medical questionnaire. The findings revealed that when the robot expressed empathetic responses, particularly in situations involving user pain and sleep-related discomfort, users perceived the robot as more appealing and intelligent. These results underscore the significant impact of the empathetic capabilities of robots within medical questionnaire dialogue systems on overall user satisfaction.
This study highlights the crucial role of empathy in human-robot interactions. It suggests that with advances in technology related to medical knowledge management and conversation systems, we can foster positive and enduring interactions between patients and robots. Such interactions can facilitate continuous monitoring and management of patient conditions, offering promising prospects for healthcare applications.
This work was supported in part by the National Research Foundation (NRF) grant funded by the government of the Republic of Korea (MSIT) (No. NRF-2021R1F1A1063669) and the Technology Innovation Program (No. 20015052) funded by the Ministry of Trade, Industry & Energy (MOTIE, Republic of Korea).
Da-Young Kim
received her B.S. in Mechanical Engineering from Silla University, Busan, Republic of Korea and is currently pursuing an M.S. in Information Convergence Engineering at Pusan National University, Busan, Republic of Korea. She is currently a researcher at the Human-Robot Interaction Center in the Korea Institute of Robotics & Technology Convergence (KIRO), Pohang, Republic of Korea. Her research interests include human–robot interaction, artificial intelligence, and deep learning.
Min-Gyu Kim
received his Ph.D. in Engineering from the University of Tsukuba, Tsukuba, Japan, in 2012. He conducted postdoctoral research at the Interaction Science Research Institute of Sungkyunkwan University, Suwon, Republic of Korea, in 2013, and at the Department of Industrial Design at Eindhoven University of Technology, Eindhoven, Netherlands, in 2014. Since 2015, he has been affiliated with the Korea Institute of Robotics & Technology Convergence (KIRO), Pohang, Republic of Korea, where he is currently a senior researcher and the head of the Human–Robot Interaction Center. His research interests include human–robot interaction, data analytics, and artificial intelligence.
Sang-Seok Yun
received his B.S. in Mechanical Engineering from Inje University in 2002 and his M.S. in Mechatronics Engineering from Gwangju Institute of Science and Technology (GIST), Republic of Korea, in 2005. He received his Ph.D. in Mechanical Engineering from Korea University in 2013. He was a researcher at the Korea Institute of Science and Technology (KIST) from 2005 to 2017. He is currently an associate professor at Silla University, Republic of Korea. His current research interests include smart healthcare, human–robot interaction, location-based services, and socially assistive robots.