Evaluating the Accuracy and Reliability of Large Language Models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in Answering Item-Analyzed Multiple-Choice Questions on Blood Physiology

Mayank Agarwal (All India Institute of Medical Sciences)

Cureus

April 8, 2025

DOI: 10.7759/cureus.81871

Cited by 17
