Chatbots as Patient Education Resources for Aesthetic Facial Plastic Surgery: Evaluation of ChatGPT and Google Bard Responses

Neha Garg; Daniel J. Campbell; Angela Yang; Adam McCann; Annie E. Moroco; Leonard E. Estephan; William J. Palmer; Howard Krein; Ryan Heffelfinger

doi:10.1089/fpsam.2023.0368

Neha Garg (Jefferson University Hospitals), Daniel J. Campbell (Jefferson University Hospitals), Angela Yang (Sidney Kimmel Cancer Center), Adam McCann (Jefferson University Hospitals), Annie E. Moroco (Jefferson University Hospitals), Leonard E. Estephan (Jefferson University Hospitals), William J. Palmer (Jefferson University Hospitals), Howard Krein (Jefferson University Hospitals), Ryan Heffelfinger (Jefferson University Hospitals)

Facial Plastic Surgery & Aesthetic Medicine

July 1, 2024

Cited by 14

Abstract

Background: ChatGPT and Google Bard™ are popular artificial intelligence chatbots with utility for patients, including those undergoing aesthetic facial plastic surgery. Objective: To compare the accuracy and readability of chatbot-generated responses to patient education questions regarding aesthetic facial plastic surgery using a response accuracy scale and readability testing. Method: ChatGPT and Google Bard™ were each asked 28 identical questions using four prompts: none, patient friendly, eighth-grade level, and references. Accuracy was assessed using the Global Quality Scale (range: 1–5). Flesch-Kincaid grade level was calculated, and chatbot-provided references were analyzed for veracity. Results: Although 59.8% of responses were good quality (Global Quality Scale ≥4), ChatGPT generated more accurate responses than Google Bard™ on patient-friendly prompting (p < 0.001). Google Bard™ responses were of a significantly lower grade level than ChatGPT responses for all prompts (p < 0.05). Despite eighth-grade prompting, response grade level for both chatbots was high: ChatGPT (10.5 ± 1.8) and Google Bard™ (9.6 ± 1.3). Prompting for references yielded 108 chatbot-generated references, of which 41 (38.0%) were legitimate citations and 20 (18.5%) accurately reported information from the cited reference. Conclusion: Although ChatGPT produced more accurate responses, and at a higher education level, than Google Bard™, both chatbots provided responses above recommended grade levels for patients and failed to provide accurate references.
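For readers unfamiliar with the readability metric named in the abstract, the following is a minimal sketch of how a Flesch-Kincaid grade level can be computed. It uses the standard published formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The abstract does not state which tool the authors used, so the function names, the sentence splitter, and in particular the vowel-group syllable heuristic are illustrative assumptions; dedicated readability software counts syllables more carefully and may give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not a dictionary lookup):
    count groups of consecutive vowels, then drop one for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level using the standard formula."""
    # Naive sentence split on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    # Hypothetical sample answers, not taken from the study.
    simple = "You will rest at home. Most people feel fine in a week."
    dense = ("Postoperative periorbital ecchymosis typically resolves "
             "spontaneously following uncomplicated rhinoplasty procedures.")
    print(round(flesch_kincaid_grade(simple), 1))
    print(round(flesch_kincaid_grade(dense), 1))
```

Short sentences built from short words score low, while long sentences with polysyllabic medical terms score high, which is consistent with how the reported grade levels are to be read: a score near 8 corresponds to eighth-grade text.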
