Getting rid of bias in clinical calculators isn’t as simple as taking race out of the equation

Racial bias is everywhere in medicine, including the calculators doctors commonly use to predict a patient’s risk of disease and inform their treatment. A growing movement is encouraging medical specialties and hospitals to reconsider the use of race in those tools.

But a new study shows that removing bias isn’t as simple as taking race out of the equation.

Using records from thousands of colorectal cancer patients in California, researchers from the University of Washington tested the performance of four algorithms that predict the likelihood that cancer will return after a tumor is removed. The model that included race and ethnicity as a predictive variable, they found, performed more equally across groups than a model with race redacted.

“Many groups, including our colleagues in the university, have called for the removal of race in many of the existing clinical algorithms,” said UW Ph.D. student and health economist Sara Khor, who led the research published Thursday in JAMA Network Open. “I think we need to understand what kind of implications that can have and whether that will actually harm patients of color before just removing all variables of race.”

Increasingly, clinicians and policymakers are seeking to strike racial bias from medical practice by updating clinical algorithms that lean on race and ethnicity to calculate risk scores and inform treatment decisions. In recent years, race has been removed from calculators used to estimate kidney function and predict the success of a vaginal birth after a cesarean section.

The findings don’t mean that keeping race in clinical algorithms is the right way to stave off bias. But they do make clear that simply removing it isn’t a guarantee of equally accurate results for patients — or more equitable health outcomes.

“Until we know more, there may be circumstances in which including race may be useful,” said Chyke Doubeni, chief health equity officer at Ohio State University Wexner Medical Center. “But we can’t do that blindly across the board in all cases.”

The study calculated the performance of each model with multiple measures of “algorithmic fairness,” including calibration, a measure of discrimination called area under the curve, false positive and negative rates, and positive and negative predictive value. For some of those measures, it found that performance differed more widely between subgroups of patients — non-Hispanic White, Black or African American, Hispanic, and Asian, Hawaiian, or Pacific Islander — when race was excluded from the model.
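As a rough illustration of what those subgroup checks involve — not the study's own code — here is a minimal Python sketch that computes several of these fairness measures for each racial and ethnic group, assuming a hypothetical data table with columns named "race_ethnicity", "recurrence" (a 0/1 outcome), and "predicted_risk" (the model's score):

```python
# Minimal sketch of per-subgroup fairness checks (hypothetical column
# names; illustrative only, not the study's code).
import pandas as pd
from sklearn.metrics import roc_auc_score, confusion_matrix

def subgroup_metrics(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    rows = []
    for group, sub in df.groupby("race_ethnicity"):
        y = sub["recurrence"]
        score = sub["predicted_risk"]
        pred = (score >= threshold).astype(int)
        tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
        rows.append({
            "group": group,
            # Discrimination: how well the score ranks recurrences above
            # non-recurrences (assumes both outcomes occur in each group).
            "auc": roc_auc_score(y, score),
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
            "positive_predictive_value": tp / (tp + fp) if (tp + fp) else float("nan"),
            "negative_predictive_value": tn / (tn + fn) if (tn + fn) else float("nan"),
        })
    return pd.DataFrame(rows)
```

Large gaps between the rows of the resulting table — say, a much higher false negative rate for one group than another — are the kind of disparity the researchers measured when comparing models with and without race.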

But a fair algorithm doesn’t ensure fair health outcomes. Importantly, the analysis doesn’t show how each algorithm’s output would affect cancer treatment in different racial and ethnic groups, or what its downstream impact on mortality would be. “When an algorithm is inaccurate, then we might be asking patients to come in less frequently than they needed to,” Khor said. That could lead to testing delays, missed cancer recurrences, and ultimately higher death rates.

Those are the results that are critical for patients who are already subject to systemic health disparities — like Black patients, especially Black men, who are far more likely to die of colorectal cancer than other groups.

“Let’s take it a step further and think about what actually matters,” said Ankur Pandya, a health decision scientist at the Harvard T.H. Chan School of Public Health, who co-authored a commentary published alongside the study. “What are the decisions that are going to be triggered by these algorithms, and how would those decisions impact outcomes we care about?”

In the case of the clinical calculator for estimated glomerular filtration rate, a measure of kidney function that historically took race into account, high scores for Black patients can result in lower transplant rates or delayed transplants. “You can see the direct line between the algorithm, the results of the algorithm, and exacerbating the disparity,” said Pandya. That led to the adoption of a new, race-free tool.
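To make that direct line concrete: the 2009 version of the CKD-EPI creatinine equation multiplied its estimate by a fixed race coefficient. A minimal illustrative snippet of that historical formula (shown for explanation only; it was superseded by a race-free equation in 2021 and is not for clinical use):

```python
# Historical 2009 CKD-EPI creatinine equation -- illustrative only.
# Replaced in 2021 by a race-free version; not for clinical use.
def egfr_ckd_epi_2009(creatinine_mg_dl: float, age: int,
                      female: bool, black: bool) -> float:
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(creatinine_mg_dl / kappa, 1.0) ** alpha
            * max(creatinine_mg_dl / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # race coefficient: raises the estimate by ~16%
    return egfr
```

With identical creatinine, age, and sex, the coefficient raised a Black patient’s estimated kidney function by roughly 16 percent — enough, near eligibility cutoffs, to delay referral or transplant waitlisting.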

Concerns over downstream effects are also why race has not yet been removed from a common calculator for atherosclerotic cardiovascular disease risk, said Aisha James, director for racial justice in medicine at Massachusetts General Hospital. “Some of the calculators that we use seem to be quote-unquote ‘protective,’ in that they identify that people from marginalized racial groups have an increased risk of disease,” said James. If you remove race from those calculators, “you may exacerbate inequities that already exist.”

Some health economists, clinicians, and ethicists go further, though, saying that there is no reason for race to be included in clinical algorithms in the first place. 

Often, race in clinical algorithms serves as a proxy for missing variables, including the structural inequities that shape patients’ health. A more complete solution to the problem, then, is to uncover those true drivers and use them to design algorithms that deliver accurate results without leaning on race as a convenient stand-in.

“We need to realize that race is not biological. It’s racism that impacts health outcomes, and we have to think about the structural factors that lead to poor health outcomes,” said Marie Plaisime, a medical sociologist and health and human rights fellow at Harvard. “I think we have to be race-conscious — in figuring out why we are using race — but race can’t be a proxy.”

Khor sees potential in that long-term vision: Over time, it should become increasingly feasible to replace race with meaningful clinical variables that accurately predict patients’ outcomes.

That process will be complex, and is the focus of a growing body of research. But there are reasons for optimism, said Pandya. “We have more and more access to data now than we ever did,” he said. “Some of these old scores probably need to be thrown in the garbage anyway. As we’ve got newer, better data, we might as well start with a new score.”

In the process, though, researchers and clinicians will need to be careful that racial bias does not seep into new algorithms in less obvious ways. Even carefully trained models, assembled without the influence of race or ethnicity data, have the potential to produce biased outcomes because of racial information deeply embedded in a variety of clinical measures — even the language physicians use in their clinical notes.

“There’s so much more to be done,” said Plaisime, who is on the advisory council for the Coalition to End Racism in Clinical Algorithms, a cross-disciplinary effort supporting 12 health systems as they end the use of race adjustment in three clinical algorithms. Meaningful change, she said, needs to include conversations between clinicians, policymakers, and scientists.

“We’re all in these silos, and I hope that we’re all trying to do the same thing,” said Plaisime. “Everyone needs to be at the table to figure out how to create equitable tools.”

This story is part of a series examining the use of artificial intelligence in health care and practices for exchanging and analyzing patient data. It is supported with funding from the Gordon and Betty Moore Foundation.