Study finds healthcare evaluations of large language models lacking in real patient data and bias assessment

A systematic review finds that only 5% of healthcare evaluations of large language models use real patient data, highlighting gaps in how bias and tasks are assessed.