The COVID-19 pandemic prompted healthcare providers to expand their alternatives to in-person service delivery. The result was a significant increase in virtual health care, including a 1.6-fold increase in electronic patient messages to physicians. However, physicians who handle large volumes of patient messages often report burnout: in addition to addressing complex medical questions, they must spend time answering routine requests, such as scheduling appointments. Strategies to offload this new burden have included limiting notifications, charging for telephone responses, and dividing the workload among general staff.
Additionally, these changes to virtual services discouraged some patients from seeking medical care, as they disrupted patients' ability to communicate with their physicians. In the wake of the pandemic, service providers and researchers have continued to explore ways to improve the efficiency, effectiveness, and acceptability of virtual and electronic services. Many fields have recognized the potential of Artificial Intelligence (AI) assistants, such as ChatGPT, as tools to support healthcare delivery. Although promising, formal research on using ChatGPT to support the delivery of medical advice or care is limited. Ayers and colleagues evaluated the ability of an AI assistant to write proficient responses to patient questions by comparing them with physicians' responses to the same questions posted on a social media forum (Reddit).
Methods
The researchers compared how a ChatGPT (version 3.5) chatbot and real physicians answered medical questions. They used questions and answers from a public Reddit forum, r/AskDocs, which hosts questions from real patients and responses from physicians whose credentials are verified by moderators, making it a source of genuine patient–doctor interactions that does not require private medical data. The researchers randomly selected 195 question–answer pairs from October 2022, each containing one patient's question and one physician's response. Each question was entered into a fresh ChatGPT session so that earlier questions in the same conversation could not influence the chatbot's answers.
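The study entered each question into the ChatGPT web interface in a new session. For readers who want to see what reproducing that step programmatically might look like, the sketch below uses the OpenAI Chat Completions API as a stand-in; the model name, file name, and column layout are illustrative assumptions, not the authors' procedure.

```python
# Minimal sketch: generate a chatbot answer for each sampled question in a
# fresh, single-turn request, assuming access to the OpenAI Chat Completions
# API. (The original study used the ChatGPT web interface, one new session per
# question; "gpt-3.5-turbo" and questions.csv are illustrative assumptions.)
import csv

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("questions.csv", newline="", encoding="utf-8") as f:
    # Hypothetical input file with columns: question_id, question_text
    questions = list(csv.DictReader(f))

answers = []
for row in questions:
    # One new, single-turn request per question, so no prior question in the
    # conversation can influence the answer.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": row["question_text"]}],
    )
    answers.append(
        {
            "question_id": row["question_id"],
            "chatbot_answer": response.choices[0].message.content,
        }
    )
```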
An expert panel of licensed healthcare professionals reviewed all the responses. The experts rated anonymized answers on a 5-point scale, with 5 being the highest score, judging which response was better overall, the quality of the medical information provided, and the level of empathy evident in the response. Quality was defined as the level of expertise conveyed in the response; empathy was defined as the degree of compassion, understanding, and bedside manner conveyed. The researchers then compared the average quality ratings, average empathy ratings, and lengths of the responses.
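As a rough illustration of the comparison step, the sketch below averages each question's evaluator ratings and compares chatbot versus physician means. The file name, column layout, and choice of a paired t-test are assumptions for illustration; the published analysis may have used different methods.

```python
# Illustrative sketch of comparing mean ratings per question, assuming a
# hypothetical ratings.csv with one row per (question, evaluator, source):
# columns: question_id, evaluator_id, source ("physician" or "chatbot"),
#          quality (1-5), empathy (1-5)
import pandas as pd
from scipy import stats

ratings = pd.read_csv("ratings.csv")

# Average the evaluators' scores for each question and response source.
per_question = (
    ratings.groupby(["question_id", "source"])[["quality", "empathy"]]
    .mean()
    .unstack("source")
)

for measure in ["quality", "empathy"]:
    chatbot = per_question[(measure, "chatbot")]
    physician = per_question[(measure, "physician")]
    # Paired t-test on per-question means; illustrative only, not necessarily
    # the test used in the published analysis.
    t, p = stats.ttest_rel(chatbot, physician)
    print(
        f"{measure}: chatbot mean={chatbot.mean():.2f}, "
        f"physician mean={physician.mean():.2f}, t={t:.2f}, p={p:.3g}"
    )
```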
Results
Overall, the expert panel rated the ChatGPT responses as better overall, higher in quality, and more empathetic than those written by physicians. While only 22.1% of physician responses were rated good or very good in quality, 78.5% of ChatGPT responses received those ratings. On average, physician responses were rated as slightly empathetic, whereas chatbot responses were rated as empathetic. Physician responses also tended to be much shorter, with chatbot responses averaging nearly four times the length. Even after accounting for response length, the expert panel still rated the ChatGPT responses more favorably.
Conclusion
These results suggest that AI assistants could serve useful purposes in healthcare settings. Rather than drafting every reply themselves, physicians could review AI-written responses, devoting their time to more demanding tasks without sacrificing proper communication with their patients. Simpler patient concerns could also be resolved quickly through AI assistant responses, allowing physicians to allocate more time to critical cases. Collectively, the use of AI assistants may be an avenue to improve patient satisfaction, reduce physician burnout, and allocate clinicians' time more efficiently.
Although these findings illustrate the promise of AI assistants, the study is not without limitations, including a potential disconnect between the kinds of questions physicians receive in practice and those included in this study. The questions studied were isolated, one-off exchanges that lack the personalization of questions patients typically ask their own physicians; in other words, the study used broad, general questions rather than the personal questions a patient might ask their primary physician. The authors also noted that some requests, such as inquiries about appointments, medication refills, and test results, cannot be answered by AI. Overall, further research is necessary to better understand the impact AI support can have in healthcare settings.
Full article: https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2804309