‘Doctors need to get on top of this’: GPT-4 displays bias in medical tasks

Diagnosis is an especially tantalizing application for generative AI: Even when given tough cases that might stump doctors, the large language model GPT-4 has solved them surprisingly well.

But a new study points out that accuracy isn’t everything, and shows exactly why health care leaders already rushing to deploy GPT-4 should slow down and proceed with caution. When the tool was asked to drum up likely diagnoses or come up with a patient case study, it sometimes produced problematic, biased results.

“GPT-4, being trained off of our own textual communication, shows the same — or maybe even more exaggerated — racial and sex biases as humans,” said Adam Rodman, a clinical reasoning researcher who co-directs the iMED Initiative at Beth Israel Deaconess Medical Center and was not involved in the research.
