GPT-4 language model gives limited-quality medical answers in study

Dive Brief:

  • The successor to ChatGPT delivers answers that agree with health guideline recommendations 60% of the time but gives low- to moderate-quality information, according to a study published in the Journal of Medical Internet Research.
  • Researchers asked the text-generating artificial intelligence program 25 questions on guideline recommendations for five conditions. GPT-4's responses matched 15 of the 25 recommendations.
  • Based on the analysis, the researchers concluded that the AI technology provides medical information of similar quality to what is available online and that responses could improve if training datasets were limited to peer-reviewed studies.

Dive Insight:

The ability of generative AI models such as ChatGPT and its successor GPT-4 to interpret and respond to questions is driving interest in using the technologies to provide medical information and help diagnose disease. 

To understand the strengths and limitations of the models, researchers asked GPT-4 25 questions based on the guidelines for five diseases — gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer and hepatocellular carcinoma — and assessed the answers it provided using a tool designed to measure the quality of information available online.
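The paper does not publish its prompting code, but the basic setup is straightforward to reproduce. The following is a minimal sketch, assuming the OpenAI Python SDK; the question wording, model name and parameters are illustrative assumptions rather than the study's protocol, and scoring with the EQIP tool would still be done by human reviewers.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical guideline-derived questions; the study's exact wording is not reproduced here.
    questions = [
        "What is the recommended first-line treatment for symptomatic gallstone disease?",
        "When is radiotherapy indicated in the management of pancreatic cancer?",
    ]

    for question in questions:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
            temperature=0,  # deterministic output makes answers easier to score consistently
        )
        answer = response.choices[0].message.content
        print(f"Q: {question}\nA: {answer}\n")

Each answer would then be rated against the relevant guideline recommendation and scored with a quality instrument such as EQIP.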

The analysis showed the AI provides information of similar quality to what is available online. It achieved an Ensuring Quality Information for Patients (EQIP) score of 16 for gallstone disease, compared with a median of 15 in studies of online information. The researchers attribute the similarity to the fact that GPT-4 is trained on text drawn from the internet.

One limitation of the model is its failure to highlight medical advice that is contested. The researchers found the AI listed surgery, chemotherapy and radiotherapy as treatments for pancreatic cancer in a way that suggested equivalence between the interventions and failed to explain the sequencing of care. The role of radiotherapy is limited and a subject of debate, nuances that the model was unable to express. 

“The AI does not inform its user which medical information is controversial, which information is clearly evidence based and backed by high-quality studies, and even which information represents a standard of care,” the authors wrote. 

In light of those limitations, the researchers propose limiting the medical information used by models such as GPT-4 to peer-reviewed studies and adding a bibliography feature so users can read the papers that underpin answers. If the technology is refined in that way, the authors think, “chatbots might even replace guidelines, as clinicians will be able to rapidly obtain information and guidance, eliminating the need to find, download, and read large documents.”