Over the past five years, two unique federal efforts have collected the health records of millions of Americans. They’ve assembled billions of clinical observations, medication logs, lab results, and more with the goal of supercharging public health research.
Now, their dramatically different approaches to data sharing are coming together to put citizens’ real-world health information to work.
In October, the National Institutes of Health announced a $30 million grant to establish the Center for Linkage and Acquisition of Data, a new resource for the expansive All of Us research program. For the most part, the program’s data is contributed by its intentionally diverse participants, who fill out lengthy questionnaires and consent for their records to be shared with the program. The new center will widen its aperture on their medical lives by sucking up even more information from a wider range of sources — without participants having to do a thing.
By attempting to cross-pollinate health data from different worlds in ways that respect people’s agency and privacy, the center will aim to benefit both patients and health research, filling in gaps that lead to disjointed care and unanswered public health questions.
“There’s just not a lot of people out there that have dealt with health data at this scale, and understand what it takes to really be able to bring in and harmonize that data so that you can ultimately make it useful,” said Chris Lunt, chief technology officer for All of Us.
The program found some of those experts in the National Covid Cohort Collaborative, an NIH project developed in response to the need for immediate data on Covid-19 outcomes. During the pandemic, the N3C developed passive data collection techniques that linked sensitive medical data from electronic health records (EHRs) at 82 institutions to many other data types, without revealing patients’ identities. Its database of medical records now includes more than 20 million patients and represents more than 8 million Covid cases.
With the public health emergency over, the health system agreements that enabled that data sharing are soon coming to an end. But elements of the collaborative’s technical approach will now be applied to All of Us, in the new center led by N3C lead investigator Melissa Haendel, chief research informatics officer at the University of Colorado Anschutz Medical Campus. In the center’s first 18-month grant period, it will aim to collect data on medical claims, mortality, and environmental factors for the more than 700,000 All of Us participants.
“If we really want to realize the promise of precision medicine and public health (evidence-based decision-making for both), we’ve got to put the patient back together again,” said Haendel. That means uniting data streams from different health systems, public health agencies, insurers, and more. “Data is more valuable the more it moves, the more people use it, the more they integrate it, the more they analyze it.”
Realizing that goal is an immense challenge. Despite federal efforts to promote interoperability, electronic health records often remain stubbornly stuck in independent health systems, keeping both patients and health researchers from seeing the whole picture.
“Our health care system is full of data silos, and there are overlapping legal constructs of privacy laws, data use agreements, and data access rights,” said Lisa Bari, CEO of Civitas Networks for Health. When academics want to use existing health data for research, they need to be enormously careful to preserve patients’ privacy — and the risk of doing it improperly often keeps health organizations from sharing at all.
“The challenge is that not all these sources have high-quality data, are easy to link with individual patients, and can be used in ways aligned with patient permissions and the commitment to privacy and empowerment,” said Harlan Krumholz, professor of medicine and public health at the Yale School of Medicine and co-director of the Yale Open Data Access Project.
To get over those hurdles, All of Us and N3C have taken dramatically different approaches to real-world health data collection.
The United States has invested more than a billion dollars to recruit at least a million participants into All of Us, asking each of them for consent to use their health records for research, along with a significant time commitment to proactively contribute even more medical information. That includes volunteered genetic sequences, health surveys, and wearables data — “data that we will not have in the N3C,” said Haendel.
That’s because N3C collected all of its 20 million records by working with health systems, not directly with patients. To respond quickly to the need for action during the pandemic, the collaborative developed its data-sharing agreements with medical centers so they wouldn’t need to ask each patient for consent to use their existing medical records. Then they developed privacy-preserving techniques to link patient records to other sources of data — mortality databases, insurance claims, vaccine registries, and more — in ways that protected their identities.
“They’ve built great pipelines because of the work they did already on N3C that we’re going to take advantage of,” said Lunt. “What we want to do is take the experience they’ve had in forming those connections and finding the information, but now layer on this more participant-aware way of addressing collecting that data.”
To build up these new connections through the Center for Linkage and Acquisition of Data, or CLAD, All of Us will get participants’ permission to go on a hunting expedition for new data sources that can be linked to their profiles. “We expect to be able to collect EHR data for people for whom we have no EHR data so far, and that’s a real priority for us,” said Lunt. “But this will also then take people for whom we have data and really add to it substantially.”
That’s an important task if All of Us wants to meet its goal of enhancing the presence of underrepresented groups in medical research. “It’s something we obsess about,” said Lunt. “And it’s constant work.” The program intentionally recruits diverse participants; more than 80% are underrepresented in some way, including 50% who identify as a racial or ethnic minority.
But it has run into challenges collecting equally representative data from all participants. For example, some people are less likely to own a wearable device, and others have responsibilities that make it nearly impossible to complete the program’s extensive health surveys. The new center’s data, passively captured from existing sources, will be less likely to run into that problem.
To grab that data, the program won’t have to share participants’ names and other identifying information to look for matches. Instead, the new center will tokenize identifying information — scrambling it so it’s unreadable without a matching tool. “If you don’t have the underlying information, you can’t read it, so you don’t know who it is,” explained Lunt. “But if your data matches, they’ll go, oh, we both have the same person in here. Now we can have you pass over the information.”
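The article does not say which tokenization scheme the center will use. As a rough illustration only, here is a minimal sketch of one common approach to privacy-preserving record linkage: each site normalizes a person’s name and birth date, then computes a keyed hash (HMAC-SHA256) with a secret both sites share. The key plays the role of the “matching tool” Lunt describes. The key value, the choice of fields, and the helper names (`normalize`, `tokenize`, `match_tokens`) are all hypothetical. Production linkage systems typically generate several tokens per person and handle typos and name changes, which this sketch does not.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret. In a real deployment, a trusted linkage
# service would manage this key, because anyone holding it could test guesses.
SHARED_KEY = b"example-linkage-key"


def normalize(first: str, last: str, dob: str) -> str:
    """Canonicalize identifiers so formatting differences (case, accents,
    spaces, hyphens) do not prevent a match."""
    def clean(s: str) -> str:
        # NFKD splits accented letters into base letter + combining mark;
        # keeping only alphabetic characters then drops the marks.
        s = unicodedata.normalize("NFKD", s)
        return "".join(ch for ch in s if ch.isalpha()).upper()
    return f"{clean(first)}|{clean(last)}|{dob.strip()}"


def tokenize(first: str, last: str, dob: str, key: bytes = SHARED_KEY) -> str:
    """Return an HMAC-SHA256 token. Only the token leaves the site, not the
    raw name or birth date."""
    msg = normalize(first, last, dob).encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def match_tokens(tokens_a: dict[str, str],
                 tokens_b: dict[str, str]) -> list[tuple[str, str]]:
    """Given {token: local_record_id} maps from two sites, return the
    (id_a, id_b) pairs whose tokens are equal, i.e. the same person."""
    shared = tokens_a.keys() & tokens_b.keys()
    return sorted((tokens_a[t], tokens_b[t]) for t in shared)


# Two hypothetical sites tokenize their own records locally.
site_a = {tokenize("Maria", "Lopez", "1980-04-02"): "A-001",
          tokenize("Sam", "Lee", "1975-11-30"): "A-002"}
site_b = {tokenize("MARIA", "López", "1980-04-02"): "B-917",
          tokenize("Ann", "Park", "1990-01-15"): "B-918"}

print(match_tokens(site_a, site_b))  # → [('A-001', 'B-917')]
```

Neither site sees the other’s names. Each site sees only opaque 64-character tokens, and the two sites learn only that record A-001 and record B-917 describe the same person. After that, data can be passed over under whatever agreement governs the link. A keyed hash is used here instead of a plain hash because names and birth dates are guessable: without the secret key, an outsider cannot rebuild tokens by hashing candidate identities.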
Many of those possible partners, like state health information exchanges, weren’t built to enable health research, and it’s not a given that they’ll want to play ball. “The questions remain: Is the fact that the patients have opted in for All of Us going to change the conversation around research use cases for health data, for clinical data, for EHR level data, for data that’s available in HIEs?” said Bari. “Is that going to speed the process of maturing those research use cases?”
And even if All of Us can port that information into its Researcher Workbench, a platform where accredited researchers can access participants’ data for study, analyzing real-world data is full of pitfalls. “We’ve never had access to this much data,” said Haendel. “It’s really messy. You can really make terrible assumptions and have conclusions that are really wrong. There’s a lot to think about how to use these data effectively and ethically.”
But uniting the deep, participant-first data initiated by All of Us with more records could help researchers ask whole new categories of questions. “The work of this group will be to figure out what is possible — and how that knowledge can best advance research progress,” said Krumholz. “What is important to know is that there is much to work out. If successful, their work will have widespread application across the research ecosystem.”
Lunt and Haendel hope that the techniques advanced within the new center will have important trickle-down effects. “If you look at the big picture on this, this is a capacity not just for All of Us,” said Lunt. “This really is a capacity for the NIH as a whole.” He hopes clinical trials will ultimately be empowered to pull in more complete and precise data on participants — without any special effort, and while maintaining patient privacy.
In theory, these techniques could one day be used to benefit all patients. “Every time we go to the doctor, you fill out that darn form that says, ‘What’s going on with you?’” said Haendel. “It’s a paper form, and some person has to enter it again. And the doctor has to make decisions based upon what they see in front of them, which is incredibly incomplete.”
Haendel wants all that information to be shared seamlessly, so that patients can be flagged for appropriate follow-up even if they forget to mention their family history of breast cancer for the fifteenth time. The same should hold, she said, when a local environmental health agency knows a patient might have been exposed to a toxin, or when a lapse in insurance suggests they’ve gone without important preventive care. “It’s about how do we do this everywhere, for everyone.”