There’s a public health crisis lurking in our data: the Census option ‘some other race’

According to the 2020 Census, the second most common race in America, after white, is “Some other race,” an option chosen by an astonishing one out of seven people. The nationwide failure to accurately measure the variety of races and ethnicities that make up the U.S. population makes underrepresented groups invisible in public health data, resulting in policies informed by inadequate or misleading information.

One of the main drivers of this unfortunate categorization is the widespread practice of data aggregation.

Health data collection often involves aggregating, or grouping, individuals into broad racial and ethnic categories. For example, as Usha Lee McFarling recently reported for STAT, the catch-all term “Asian,” one of the five mandatory racial categories in federal reporting, lumps together groups who face very different health challenges. Liver cancer is more than twice as common among Asian Americans as among white Americans, but the rate for Laotian Americans is more than seven times higher than that of white Americans. This important distinction is lost when these distinct populations are grouped together.

The U.S. Office of Management and Budget’s five mandatory racial categories are white; Black or African American; American Indian or Alaska Native; Asian; and Native Hawaiian or other Pacific Islander. These limited racial categories leave many people without an accurate designation. As a result, nearly 50 million people in the 2020 Census chose “some other race,” up 129% from 2010. Furthermore, these predefined categories often obscure the way that identities intersect for many people.

The persistence of these broad categories is partly a vestigial constraint from a time when paper surveys dominated. Today, we have the tools and infrastructure to collect and process data with greater precision.

Rather than replacing five categories with many, we can implement a more nuanced approach. Surveys, which are now often conducted online or on tablets, can first ask for a person’s broad category — say, Asian — and then dynamically provide options for individuals to specify their exact racial or ethnic identity, in terms they identify with, such as Laotian, Hmong, or Vietnamese. While there should be a balance between specificity and usability, technology has enabled us to offer more choices without overwhelming respondents.
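
As a rough illustration of the two-step question described above, here is a minimal Python sketch. The option list, the write-in label, and the function names are hypothetical and exist only for this example; a real survey would draw on a vetted and much longer list.

```python
# Hypothetical option list for illustration; not an official or complete list.
DETAILED_OPTIONS = {
    "Asian": ["Laotian", "Hmong", "Vietnamese"],
}

# Assumed label for a free-text fallback so no respondent is left without a choice.
WRITE_IN = "Write in your own term"


def follow_up_options(broad_category):
    """Return the detailed choices shown after a broad category is picked."""
    return DETAILED_OPTIONS.get(broad_category, []) + [WRITE_IN]


def record_response(broad_category, detailed_identity):
    """Keep both levels so the detailed answer is never discarded."""
    return {"broad": broad_category, "detailed": detailed_identity}


options = follow_up_options("Asian")
response = record_response("Asian", "Hmong")
```

Storing the detailed answer alongside the broad one means later reports can use either level.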

Allowing survey respondents to choose the terms they most identify with, and building flexibility into the backend for how those terms are grouped, categorized, and ultimately reported, can lead to innovative reporting practices that are not only more accurate but also more effective.

This approach is known as data disaggregation: breaking down the information into smaller, more specific variables. Data disaggregation provides a more profound understanding of populations and trends. By allowing for more flexible data collection and grouping by various parameters such as age, sex, education, race, and socioeconomic variables, we can create a data system that can adapt to specific needs while retaining the data that is already collected. Retaining disaggregated data allows researchers and public health practitioners to define the groups that best fit their methods and research purposes.

To improve public health, government agencies must pursue efforts in data disaggregation.

The data we don’t have is often more consequential for public health than the data we do. For instance, more than three years into the pandemic, Georgia knows the race and ethnicity for only half of the Covid-19 cases it reports to the CDC, making reliable, data-driven decisions virtually impossible. Data disaggregation is fundamentally about designing data systems that are flexible and can change according to what we need to know.

Yet OMB’s guidelines have not changed since 1997, which hinders better data collection and reporting.

A key strength of disaggregated data is that variables are stored as separate, independent fields. That built-in flexibility lets researchers and policymakers represent groups more accurately and rely less on “some other race.” Flexibility in data collection can prevent entire populations from being obscured under a single aggregated racial category, which is vital for health equity. According to a National Academy of Medicine report published earlier this year, the idea that humans can be grouped into discrete categories can lead to failures in capturing the complex patterns of human variation. Researchers caution that this can lead to poor scientific results and misguided interpretations in genetic research. This point is key: researchers can always group detailed data into broader categories, but they cannot ungroup data that was collected only in aggregate.

This principle is similar to the logic of building with LEGO blocks. When we start with the smallest pieces, such as country of origin, race, ethnicity, ancestry, and others, we can build structures of any size or complexity, but if we begin with a pre-assembled chunk, our options for redesign are severely limited. Detailed, granular data offers that flexibility: we can group and categorize it as needed, tailoring our approach to the question at hand.
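
The one-way nature of grouping can be shown in a few lines of Python. This is a minimal sketch under stated assumptions: the records are made up for illustration and carry no real-world meaning, and the `ROLLUP` mapping and `aggregate` helper are hypothetical names for this example.

```python
from collections import Counter

# Made-up example records; the counts carry no real-world meaning.
responses = ["Laotian", "Hmong", "Vietnamese", "Laotian", "Hmong"]

# Hypothetical grouping an analyst might choose for one particular report.
ROLLUP = {"Laotian": "Asian", "Hmong": "Asian", "Vietnamese": "Asian"}


def aggregate(detailed, mapping):
    """Roll detailed answers up into broader groups; unknown terms stay visible."""
    return Counter(mapping.get(term, "Unmapped") for term in detailed)


detailed_counts = Counter(responses)        # the granular view, kept intact
broad_counts = aggregate(responses, ROLLUP) # built from the granular view on demand
```

Going from `detailed_counts` to `broad_counts` takes a single mapping. Nothing in `broad_counts` alone can recover how many respondents chose each detailed term, which is why the granular records are the ones worth retaining.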

There are cases in which populations can and should be grouped, but we now know that overly aggressive aggregation practices can obscure sub-populations and mask real harm.

If we only see part of America, we can only heal part of America. It’s time to shed light on the invisible millions and make them an integral part of the public health narrative. Let us make the term “some other race” obsolete in our public health discourse.

Juan Carlos Gonzalez Jr. is a health equity researcher, currently serving as assistant vice president for the School of Global Health at Meharry Medical College. He is also a public voices fellow of The OpEd Project in partnership with AcademyHealth.