Diagnosis is an especially tantalizing application for generative AI: Even when given tough cases that might stump doctors, the large language model GPT-4 has solved them surprisingly well.
But a new study points out that accuracy isn’t everything — and shows exactly why health care leaders already rushing to deploy GPT-4 should slow down and proceed with caution. When the tool was asked to drum up likely diagnoses, or come up with a patient case study, it in some cases produced problematic, biased results.
advertisement
“GPT-4, being trained off of our own textual communication, shows the same — or maybe even more exaggerated — racial and sex biases as humans,” said Adam Rodman, a clinical reasoning researcher who co-directs the iMED Initiative at Beth Israel Deaconess Medical Center and was not involved in the research.
Unlock this article by subscribing to STAT+ and enjoy your first 30 days free!