After stratospheric levels of hype, early evidence may be bringing generative artificial intelligence down to Earth.
A series of recent research papers by academic hospitals has revealed significant limitations of large language models (LLMs) in medical settings, undercutting common industry talking points that the technology will save time and money and soon liberate clinicians from the drudgery of documentation.
Just in the past week, a study at the University of California, San Diego found that use of an LLM to reply to patient messages did not save clinicians time; another study at Mount Sinai found that popular LLMs are lousy at mapping patients’ illnesses to diagnostic codes; and still another study at Mass General Brigham found that an LLM made safety errors in responding to simulated questions from cancer patients. One reply was potentially lethal.