Doctors use problematic race-based algorithms to guide care every day. Why are they so hard to change?

Pediatrician Alexandra Epee-Bounya had had enough. In her 20 years caring for children in Boston, she had seen hundreds of kids with suspected urinary tract infections. Each time, she’d turn to a calculator, used by all Boston Children’s Hospital clinicians, to judge the youngest children’s risk. Did the infant have a high fever? Add a point. Was she a girl? Add two points.

As she went down the list, one of the factors tripped her up every time: Was the child Black? If not, add a point. The more points, the higher the risk of a UTI, which meant the child would get follow-up testing. How could it be that the color of a child’s skin dictated their care?
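The tally described above can be sketched in a few lines of Python. This is only an illustration of how an additive point score works, not the hospital's actual calculator: the three point values are the ones quoted in this article, while the `Infant` fields, the function names, and the follow-up cutoff of 4 are hypothetical and chosen only to show how a single race item can flip a decision.

```python
from dataclasses import dataclass


@dataclass
class Infant:
    # Hypothetical field names; the article does not list the tool's full inputs.
    high_fever: bool
    is_girl: bool
    recorded_as_black: bool


# Assumption: the article gives no cutoff. 4 is picked purely for illustration.
HYPOTHETICAL_FOLLOWUP_CUTOFF = 4


def legacy_points(patient: Infant) -> int:
    """Add up points using only the three items quoted in the article."""
    points = 0
    if patient.high_fever:
        points += 1
    if patient.is_girl:
        points += 2
    if not patient.recorded_as_black:
        points += 1  # the race item that Epee-Bounya objected to
    return points


def gets_followup_testing(patient: Infant) -> bool:
    """More points means higher estimated risk, which triggers testing."""
    return legacy_points(patient) >= HYPOTHETICAL_FOLLOWUP_CUTOFF


# Two infants with identical symptoms who differ only in recorded race.
first = Infant(high_fever=True, is_girl=True, recorded_as_black=True)
second = Infant(high_fever=True, is_girl=True, recorded_as_black=False)

print(legacy_points(first), gets_followup_testing(first))
print(legacy_points(second), gets_followup_testing(second))
```

Under these assumed values, the two infants present identically, yet only the second crosses the cutoff. That one-point gap is the mechanism the rest of this story turns on.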

“It always rubbed me the wrong way,” said Epee-Bounya, whose mother was from France and father is from Cameroon. She considers herself mixed race, but “in this country — I came when I was 16 — I learned that I was Black,” she said. How would a doctor categorize her, or her children, she wondered?

Her frustration boiled over in the spring of 2019. In a hurried moment between appointments, she shared her concerns with a colleague in the hallway. “I don’t understand this ‘Black race’ characteristic,” she said. “It doesn’t make sense.”

That conversation would lead Boston Children’s to remove or modify the use of race, ethnicity, or ancestry from eight such algorithms used to guide physicians’ decisions about patient care, including for UTI.

But a STAT investigation found that race-based algorithms are still widely used across medicine, on millions of patients a year. Growing numbers of clinicians, researchers, and health care leaders argue that it is wrong to consider people of different races as biologically different, and to incorporate those outdated notions into clinical tools. They had early successes, as in Boston, but now are confronting powerful headwinds, including challenges from the political right. “The minute it’s no longer in vogue, we don’t hear about it,” said Epee-Bounya.

That’s only part of the explanation. In more than a hundred interviews with clinicians and researchers, STAT found a health care system struggling to reassess its scientific and ethical assumptions about race. Clinicians have been locked in fierce debates about the best way to modify their tools to reduce harm and create fairer outcomes for patients. If race is scratched out of a tool, it’s often an exasperating process to get the revamped version used consistently across America’s disjointed health care system. And there’s currently no way to enforce standards for how race is used by clinicians or researchers.

Race-based calculators became a flashpoint after the murder of George Floyd in 2020, which ignited a movement for racial justice that rippled into medicine. Lawmakers and scientists issued calls to eliminate clinical tools that perpetuate bias and may harm patients of color. A kidney health calculator kept Black patients from receiving needed transplants. Lung function testing with race corrections led to missed diagnoses of severe pulmonary disease. And in all likelihood, UTI guidelines for little kids — created to avoid subjecting them to needless catheterizations — left Black girls with undiagnosed infections, and in some cases, long-term kidney damage. Since then, a handful of race-based calculators used nationally have been revised, and several more are subject to debate.


Just a few years later, though, that early progress is stalling. At the University of Pittsburgh, researchers have identified at least 40 clinical algorithms that still include race adjustments. Many are used daily by physicians to help make decisions about the patient in front of them: They punch in patient characteristics, and the algorithm uses a formula to spit out a result that guides care. While a few calculators on that list are under review, dozens that may be harming patients have yet to be changed, or even reexamined. And that doesn’t count hundreds of home-grown tools, like those at Boston Children’s, that are used in individual hospitals.

In the face of this daunting task, leadership is scarce. The work to reconsider race’s role in medical decisions is often being driven by medical trainees: young, energetic, and deeply aware of the insidious way that structural racism can perpetuate inequities. At Boston Children’s, for example, pediatric resident Bobby Rosen assembled a group of volunteers to evaluate the hospital’s guidelines for lingering mentions of race, ethnicity, and ancestry.

While their efforts are laudable, advocates say a grassroots and piecemeal approach isn’t the way to tackle such an entrenched issue. “This is a very difficult problem that cannot be solved through the volunteer labor of trainees,” said Jenny Tsai, an emergency medicine physician in Oakland, Calif., who pushed for changes while a medical student. If medicine needed to reform anatomy training, she added, “they would never get two medical students to work on it over the summer and try to figure it out.”

With organized medicine taking halting steps, the Biden administration is using its powers to encourage changes. In part of a rule that kicks in next May, the U.S. Department of Health and Human Services prohibits discrimination through the use of patient care decision support tools, including commonly used calculators, and requires care providers to take steps to identify and fix discriminatory tools.

But advocates are skeptical the rule will have much impact. The policy is built to drive voluntary compliance more than penalize use of discriminatory tools. And as pediatricians have learned in the case of UTI, responsibility for race-based clinical tools is sprawling: It falls to individual clinicians and health systems, the researchers that develop new tools, and medical specialty societies that vouch for their use. This complex web has often escaped — and will continue to escape — individual pressure to act.

“Accountability is … hugely lacking, both the carrot and the sticks,” said Michelle Morse, chief medical officer for New York City’s health department and leader of the NYC Coalition to End Racism in Clinical Algorithms (CERCA), which has focused on getting health systems to adopt four national race-redacted tools. Health equity advocates like Morse are also calling for the leadership and funding to make meaningful progress across medicine.

“There was really never a coalition,” said Epee-Bounya, that could spread change like hers beyond the walls of her own institution. “Little pockets of work were easily extinguished, because there was not a clear framework on a national level.”

Nader Shaikh, a professor of pediatrics and clinical and translational science at the University of Pittsburgh and a UPMC pediatrician, developed a calculator for UTI risk that included race, but later questioned whether race was necessary at all. Stephanie Strasburg for STAT

When race was put into Boston Children’s UTI calculator, it codified a statistical reality: Over and over, studies had shown that Black kids were less likely to be diagnosed with UTIs. Knocking off a point from their risk tally, clinicians thought, could be a helpful shortcut to avoid delivering inappropriate — and in some cases, painful — care.

If a young child shows up with a fever, doctors need a urine sample to test whether it’s caused by a UTI. For an infant or toddler who isn’t yet potty trained, getting that sample requires time-consuming and invasive catheterization.

It’s no small feat to corral such a tiny, squirming ball of energy. “There’s lots of holding and crying,” said Nader Shaikh, a clinical researcher and pediatrician who practices at UPMC. “The child doesn’t like it, it’s painful, so it’s not a fun procedure.” Pediatricians avoid putting in a catheter unless they have a strong sense that a child really might have a UTI — and race-based clinical tools have been there to help doctors make the call.

The American Academy of Pediatrics’ clinical practice guideline has included race as a UTI risk factor since 2011, and was reaffirmed in 2016, influencing home-grown tools like those at Boston Children’s. And in 2018, when Shaikh and his colleagues published a tool called UTICalc, they followed in kind. Because information about race has been collected in medical studies for decades, albeit haphazardly, it often emerges in epidemiological data as a powerful predictor of disease — even when, as in the case of UTI, doctors don’t understand why.

A racial difference might have appeared in the data, clinicians thought, because of some underlying difference in genetic ancestry or biology. There were some guesses that white kids were more likely to have a “sticky epithelium,” helping bacteria take seed in the urinary tract. But they were just that: guesses. Maybe it was that white kids had better access to care, so they were simply diagnosed and recorded with UTIs more frequently.

During medical training, “I was told, this is what the data show,” said Tiffani Johnson, a Black pediatric emergency medicine physician at UC Davis Health who studies the impact of race and racism on child health. “And so then I just believed it.” Like Epee-Bounya, she used the race-based UTI guidelines for years, despite a nagging feeling that something was off.

That inertia has for decades allowed race-based decision tools to spread and root deeply into clinical specialties. Maia Hightower, CEO of EqualityAI, a company supporting the development of unbiased machine learning algorithms, said “medicine is riddled with these shortcuts and rubrics” that are often tied to the prestige of those who create them rather than data underlying them, a practice she terms “eminence-based medicine.”

For most researchers and physicians, those shortcuts were good enough — as long as they appeared to predict risk reliably. Many clinicians considered them an improvement: Until the National Institutes of Health began mandating the collection of race data and set standards for diversity in clinical trials in the 1990s, disease risk predictions defaulted to reflect the white populations that dominated research. Adding race to the equation, they thought, would help move from whitewashed medicine toward more personalized care and help curb disparities.

But some race-based calculators did just the opposite, advocates say, worsening disparities for the disadvantaged populations they were meant to help.

In the mid-2010s, clinicians began to raise concerns about the misuse of race in clinical tools, as did a new generation of medical students. Using race to inform clinical risk predictions, they argued, perpetuates a harmful fallacy that race plays an immutable, biological role in health outcomes. “If race is a social construct and not a biological construct,” questioned Johnson, “why is race included in this calculator? And if we suspect that race is a proxy for something else, we need to figure out what that something else is.”

These calculators and guidelines also can lead to unfair or inequitable treatment for marginalized groups. Because those groups are dramatically underrepresented in clinical datasets, many tools spit out more accurate predictions for Caucasian patients.

Shaikh’s UTI risk calculator, he later calculated, would correctly identify 98% of non-Black kids with a UTI, sending them on to get the necessary follow-up care. Black kids? Just 82%.

Black girls were especially at risk, because girls get UTIs more easily than boys. “I don’t believe my colleagues woke up one morning and said we’re going to discriminate against little Black girls,” said Joseph Wright, chief health equity officer for the AAP. “However, the algorithm has not been able to definitively say to me or to any of us that we are not potentially creating a cohort of little Black girls who will have missed urinary tract infections and suffer complications.”

Maybe they’d be sent home, only to have to return to the emergency room when their fever didn’t subside. If an infection went untreated long enough, they could end up with renal scarring — which could lead to hypertension, increased preeclampsia risk, and even kidney failure down the line. Clinicians couldn’t be sure how many Black girls were harmed. By the time any scars appeared, the initial, untreated UTI was long gone.

Joseph Wright, chief health equity officer at the American Academy of Pediatrics, at his home on Hilton Head, S.C., on June 24, 2024. Wright is leading efforts to examine the use of race in more than 100 of the academy’s clinical guidelines. Gavin McIntyre for STAT

For health equity advocates like Morse, whose critiques of race-based tools had gained little traction, 2020 was a watershed year. Suddenly, everyone from politicians to medical society leaders to journal editors was swept up by the urgency to confront embedded bias.

At Harvard, Darshali Vyas and a group of her fellow medical trainees had been working on a paper highlighting 13 examples of race-based algorithms, including Shaikh’s UTICalc, and calling for their examination. The paper had been accepted by the New England Journal of Medicine, but sat unpublished. “It wasn’t a burning issue for anyone at the time,” said co-author and Harvard medical ethics professor David Shumway Jones.

On May 25, George Floyd was murdered by a police officer in Minneapolis. Shortly afterward, Jones got a call from the editors at NEJM: Could they publish the new paper ASAP?

Quickly, some of those 13 race-based tools came under political pressure. Democratic lawmakers commissioned a report from the Agency for Healthcare Research and Quality. The chair of the House Ways and Means Committee — where Morse had been stationed as a health policy fellow — issued a call to action to medical societies, which adopt and disseminate clinical guidelines and tools, to root out the misuse of race in care.

By mid-2021, calculators used in obstetrics and nephrology, singled out by committee chair Richard Neal (D-Mass.), had been updated to remove race.

The American Thoracic Society, also name-checked by the House, supported the removal of race from lung function tests in the face of skepticism from many member pulmonologists. “It’s a big change in a conservative discipline. They did it in a way that was not knee-jerk or blind,” said Michael Ieong, an attending pulmonologist at Boston Medical Center, which recently implemented the society’s recommendations.

The American Academy of Pediatrics didn’t get a letter from the House. But its leaders recommended prohibiting “the use of race-based medicine,” and in May 2021, it voted to retire its race-based clinical practice guideline for UTI management in the youngest children.

It was a very unusual decision, said AAP chief executive Mark Del Monte. In his memory, the academy had never retired a clinical guideline early, bypassing an adjudication process that typically takes years. An updated UTI guideline is still under review.

“At the end of the day, as physicians, our goal is do no harm, right?” said Johnson, who co-authored a 2022 AAP policy statement on eliminating race-based medicine. “So let’s remove race, and make sure we’re not causing harm to an already vulnerable and marginalized population.”

In the ensuing years, the credo of do no harm has become a double-edged sword. While some clinicians say leaving race in these clinical tools is clearly harmful, others invoke the phrase as they argue for slower, more considered action on excising race — and sometimes, no action at all.

“The first rule always is do no harm,” said Helen Burstin, CEO of the Council of Medical Specialty Societies. “Everyone wants to be very sure that whatever they’re doing — if they remove something that’s been in place for a long time, that’s the basis of a lot of the guidance that’s already used — that it is done correctly.”

That attitude helps explain the results of a recent survey of specialty societies conducted by the American Medical Association. While 20 said they had eliminated problematic race-based tools or were working on it, 14 reported they had not considered or taken action on tools that incorrectly use race, or didn’t find the goal applicable to their organization. Eight said they planned to make changes but hadn’t started, and another 111 specialty societies didn’t respond at all.

AMA chief health equity officer Aletha Maybank said most groups with responsibility for a harmful algorithm are aware they have a problem to solve. “It’s now about the context of will — and then also, how will they do it?”

Physicians in some specialties believe race is too powerful a predictor to banish from their clinical calculators — a critical signal in the noise of medical data. They think getting rid of race in their algorithms could do further harm to their patients, including those who are most vulnerable.

“We’re scientists first,” said breast cancer oncologist Debra Patt, who sits on the board of the American Society of Clinical Oncology. Several breast cancer risk calculators incorporate race as a predictive variable, and Black women, for example, are more likely to have triple-negative breast cancer. Researchers don’t know whether that’s a result of socioeconomic factors, a gene variant more common in certain racial populations, or something else. But teasing out those underlying causes is a “secondary issue,” she said. “If there are compelling reasons to include race as a variable — which there are — because of the difference in presentation, difference in prognosis, sometimes differences in therapy, then it’s going to be included,” said Patt.

The Society of Thoracic Surgeons, too, has resisted removing race from its operative risk calculators, explaining in a 2023 letter to the Agency for Healthcare Research and Quality that without race, “patients, providers, and other stakeholders would knowingly be given inaccurate information.” In April this year, the STS launched three new calculators to predict the risk of undergoing certain valve surgeries, all of which also use race as a variable. Two of the tools include guidance that they shouldn’t be used to exclude patients from surgery based on high risk associated with a single characteristic such as race; health equity advocates say existing calculators have been doing exactly that for years. The society declined multiple requests from STAT to discuss its approach.

The specialty societies’ varied responses reflect the mixed — and to some, disappointingly wishy-washy — conclusions of the AHRQ review, which was finally published in December, more than three years after it was commissioned. In a review of 63 studies, the authors found that some race-based algorithms led to harm — which is why many experts believe the use of race should be avoided when possible. But they also found that health care algorithms that include race can reduce disparities when they’re built intentionally to do so. Their short, frustrating answer: context matters.

Clinicians would love clear, incontrovertible proof that race-free tools won’t lose their predictive power, or harm patients in unforeseen ways. But generating that evidence would add years to the process: It is expensive and time-consuming to test algorithms on real patients. In the meantime, biased tools can continue to harm patients.

In the absence of conclusive answers, clinicians and medical societies are left to make a value judgment, choosing between maximum predictive accuracy or racial justice.

“Whose care is centered in these algorithms … and who is considered the gold standard,” said CERCA’s Morse. “Those are the critical questions to ask.”

The University of Pittsburgh’s online UTI calculator asked clinicians to mark whether a patient self-identified as Black. Screenshot via University of Pittsburgh

In many instances, clinicians and researchers have been left to wrestle with these choices on their own.

When the American Academy of Pediatrics retired its UTI guideline without a replacement in 2021, pediatricians had to decide for themselves how to treat their youngest patients. Many turned to Shaikh’s independent calculator, part of a huge class of unofficial yet widely used clinical decision-making algorithms.

By then, Shaikh had already begun wrestling with the role race played in his calculator, which also factored in age, body temperature, gender, circumcision status, and likely causes of fever. “I didn’t have expertise or support,” said Shaikh. “Basically, I was trying to figure it out myself.”

In late 2020, he started experimenting with a new version of UTICalc, one that would let parents choose whether to include race in their child’s risk assessment. It was a doomed idea.

“I didn’t appreciate the problem until I was talking to the parents,” said Shaikh. Even if race could make risk prediction more accurate, parents told him they weren’t comfortable with race driving their children’s care. Shaikh had come to agree: If at all possible, race shouldn’t be used as a predictor of disease.

“We were basing testing on race,” he said. “Which is just the definition of racism.”

Now, the question was how to build a tool that accurately predicted UTI risk — without leaning on race. Simply stripping the variable out wouldn’t work. If there was truly some unseen force driving lower UTI rates in Black children, a race-free calculator could result in more false positives and painful catheterizations for Black kids who never had UTIs.

So Shaikh started evaluating other information that seemed like it could influence UTI risk, including a history of previous UTIs or the duration of fever. “When we gathered the data, we realized it works,” said Shaikh. “We could get rid of race completely.” Around the same time, obstetrics researchers were having a similar realization about the calculator that judged the risk of attempting vaginal birth after a C-section, or VBAC: hypertension could replace race.

Shortly after the AAP retired its race-based UTI guideline, Shaikh put a new version of the calculator online, later publishing the work in JAMA Pediatrics. Both new and old versions caught about 95% of true UTIs, he reported. Johnson, at UC Davis, says she and many of her colleagues in pediatric emergency care are now using the revised calculator.

Shaikh’s calculator had cleared the hurdle of accuracy, but Shyam Visweswaran, vice chair of clinical informatics at the University of Pittsburgh, came to Shaikh asking the next critical question: Was it actually fair?

Fairness has no single, simple definition. Because UTICalc is meant to be used as a screening tool, Shaikh and Visweswaran focused on checking that it missed as few UTIs as possible, and whether it erred at the same rate for Black and white children. The original version had been far more sensitive for white kids. But the race-free update showed nearly no disparity.
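One way to picture the check described here is to compute sensitivity, the share of true UTIs a tool flags, separately for each group and then compare. The sketch below is a minimal illustration under stated assumptions, not the researchers' actual analysis: the function names and group labels are hypothetical, and the patient records are synthetic, with counts chosen only to echo the 98% and 82% figures reported earlier in this article.

```python
from typing import Dict, List, Tuple

# Each record is (child truly had a UTI, tool flagged the child for testing).
Record = Tuple[bool, bool]


def sensitivity(records: List[Record]) -> float:
    """Fraction of true UTI cases that the tool flagged."""
    flags_for_true_cases = [flagged for has_uti, flagged in records if has_uti]
    if not flags_for_true_cases:
        raise ValueError("no true UTI cases in this group")
    return sum(flags_for_true_cases) / len(flags_for_true_cases)


def sensitivity_by_group(data: Dict[str, List[Record]]) -> Dict[str, float]:
    """Sensitivity computed separately for each group label."""
    return {group: sensitivity(records) for group, records in data.items()}


def largest_gap(rates: Dict[str, float]) -> float:
    """Difference between the best-served and worst-served group."""
    return max(rates.values()) - min(rates.values())


# Synthetic data, not from any study: 50 true UTIs per group.
synthetic = {
    "group_a": [(True, True)] * 49 + [(True, False)] * 1,
    "group_b": [(True, True)] * 41 + [(True, False)] * 9,
}

rates = sensitivity_by_group(synthetic)
print(rates)
print(round(largest_gap(rates), 2))
```

A screening tool that is fair in this narrow sense would show a gap close to zero. Other definitions of fairness, such as comparing false-positive rates, would need the children without UTIs as well.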

“Taking race out seemed like it would break things,” said Shaikh. “But actually, if anything, it didn’t break anything — maybe it improved things.”

As part of his job, Visweswaran helps clinicians implement algorithms for research. He had never thought much about the role of race in those tools. But medicine’s racial reckoning opened his eyes — and in 2023, he and his colleagues started to assemble a database of as many race-based algorithms as they could find.

The database now includes 48 algorithms scattered across clinical fields — each powered by gut-based distinctions between poorly defined racial categories with little relationship to an individual’s genetic ancestry. In an increasingly multicultural America, Visweswaran realized, these tools are built on a foundation of Jell-O.

Gastroenterologist and health services researcher Shazia Siddique, who reviewed how clinical algorithms could impact disparities, spoke in June at “Together to Catalyze Change for Racial Equity in Clinical Algorithms,” a meeting at the National Academies of Sciences’ Keck Center. Ian Wagreich for The Doris Duke Foundation

The enormous challenge of solidifying that foundation was on full display at a gathering in Washington, D.C. in June. Each of the hundred or so attendees — from federal agencies, health systems, and medical groups — seemed committed to the goal. But at the podium and in the hallways at the National Academies of Sciences, Engineering, and Medicine, their conversations went in circles.

“We’re sitting here revising all these algorithms, all these guidelines, but at the same exact time, new guidelines are being generated with the exact same potential of racial and ethnic bias,” Shazia Siddique, a health systems researcher and gastroenterologist at the University of Pennsylvania who led the AHRQ review, said in one session. In part, that’s because leaders of some specialties still struggle to talk about race. It’s easy to remain in a defensive posture, especially as political pressure mounts against diversity, equity, and inclusion efforts.

“We’ve already seen pushback,” Burstin, the Council of Medical Specialty Societies CEO, said during her introduction. As Black patients have their kidney function reassessed, for example, some have been able to get their transplants sooner, sparking complaints from some that white patients now wait longer. “People have a perception of who is then a winner and a loser,” she said.

Despite their commitment, academic and clinical leaders at the meeting were daunted by the scale of the work ahead. The Doris Duke Foundation has granted $13 million to groups like AAP, where Wright has led work to evaluate 149 of the academy’s policies and guidelines for race-based approaches in pediatrics. But short-term philanthropic grants aren’t enough to tackle the backlog: A full 75 need some form of revision — with new science needed to make judgments.

Speakers called on scientific journals and funders to lead the effort to fill in data gaps needed to craft better algorithms. A report to be published in October from NASEM on the role of race and ethnicity in biomedical research could provide a path forward. But no one seems ready to accept full responsibility for this work, including the National Institutes of Health.

“We have intentionally stayed away from this topic as a research one, and let the societies and the clinical scientists figure it out,” Eliseo Pérez-Stable, director of the National Institute on Minority Health and Health Disparities, said in an interview with STAT.

One federal agency has pulled out a stick of sorts: Just a month before the meeting, a rule issued by HHS explicitly prohibited discrimination through the use of patient care decision-support tools. That includes everything from simple ER triage flowcharts, to the kidney function calculator, to predictive artificial intelligence models.

Attendees were excited about the potential of the rule — but concerned it was far from enough to force change to harmful race-based clinical algorithms across the nation.

In an interview with STAT, HHS Office for Civil Rights director Melanie Fontes Rainer said if a health system identifies a potentially discriminatory algorithm, “they can stop or pause the use of the tool altogether,” or create policies for their use so they don’t result in discrimination. But the rule doesn’t specify definitive steps providers must take to address those risks. And with OCR focused on encouraging voluntary compliance, it’s unclear whether health systems will be motivated to invest in dismantling discriminatory algorithms.

For now, that means the hard questions will remain in the hands of providers. And the difficult conversations will continue, whether in crowded clinic hallways, medical society board rooms, or executive suites of major health systems.

Rosen, the Boston Children’s resident, knows how hard those conversations can be. “People come in really hot,” he said. Some say, “‘What are you talking about? The research is clear: There are racial and ethnic associations.’” Others say, “‘You know, this is crazy. We’re using race and ethnicity in our pathways. That’s racist. How could we be doing that?’”

It’s possible to make progress, he has found, when clinicians tackle the issue of race with rigor, and open themselves to all of its nuance. “Complexity humbles people’s intensity of emotion on both sides,” he said. For now, at least the race of a child won’t mean their UTI is missed.

STAT’s coverage of health inequities is supported by a grant from the Commonwealth Fund. Our financial supporters are not involved in any decisions about our journalism.
