Towards conversational diagnostic artificial intelligence

Tao Tu (Google (United States)), Mike Schaekermann (Google (United States)), Anil Palepu (Google (United States)), Khaled Saab (Google (United States)), Jan Freyberg (Google (United States)), Ryutaro Tanno (Google (United States)), Amy Wang (Google (United States)), Brenna Li (Google (United States)), Mohamed Amin (Google (United States)), Yong Cheng (Google (United States)), Elahe Vedadi (Google (United States)), Nenad Tomašev (Google (United States)), Shekoofeh Azizi (Google (United States)), K. K. Singhal (Google (United States)), Le Hou (Google (United States)), Albert Webson (Google (United States)), Kavita Kulkarni (Google (United States)), S. Sara Mahdavi (Google (United States)), Christopher Semturs (Google (United States)), Juraj Gottweis (Google (United States)), Joëlle Barral (Google (United States)), Katherine Chou (Google (United States)), Greg S. Corrado (Google (United States)), Yossi Matias (Google (United States)), Alan Karthikesalingam (Google (United States)), Vivek Natarajan (Google (United States))
Nature
April 9, 2025
Cited by 227. Open Access.

Abstract

At the heart of medicine lies physician–patient dialogue, where skillful history-taking enables effective diagnosis, management and enduring trust [1,2]. Artificial intelligence (AI) systems capable of diagnostic dialogue could increase accessibility and quality of care. However, approximating clinicians’ expertise is an outstanding challenge. Here we introduce AMIE (Articulate Medical Intelligence Explorer), a large language model (LLM)-based AI system optimized for diagnostic dialogue. AMIE uses a self-play-based [3] simulated environment with automated feedback for scaling learning across disease conditions, specialties and contexts. We designed a framework for evaluating clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management, communication skills and empathy. We compared AMIE’s performance to that of primary care physicians in a randomized, double-blind crossover study of text-based consultations with validated patient-actors similar to objective structured clinical examination [4,5]. The study included 159 case scenarios from providers in Canada, the United Kingdom and India, 20 primary care physicians compared to AMIE, and evaluations by specialist physicians and patient-actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 30 out of 32 axes according to the specialist physicians and 25 out of 26 axes according to the patient-actors. Our research has several limitations and should be interpreted with caution. Clinicians used synchronous text chat, which permits large-scale LLM–patient interactions, but this is unfamiliar in clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.

