Another upload of genetic sequence data from the H5N1 bird flu outbreak in dairy cattle has exacerbated the scientific community’s frustration with the U.S. Department of Agriculture after the agency again failed to include basic information needed to track how the virus is changing as it spreads.
Like a large tranche of sequences that the USDA uploaded to a public database on April 21, this week’s data dump did not include information about where and when the samples were obtained from cows or other animals. All are simply labeled with “USA” and “2024.”
A key goal of monitoring genetic sequences in an outbreak is to track the evolution of a spreading virus, in this case to see if transmission among a new mammalian species is leading to changes that could make H5N1 more transmissible to and among people. Without the equivalent of a time stamp on the individual sequences, that’s much more difficult to do, scientists told STAT.
“We know what was happening a month ago, but we don’t know what’s happening now. Or it’s less clear what’s happening now,” said Thomas Peacock, an influenza virologist at the Pirbright Institute, a British organization that focuses on controlling viral illnesses in animals.
Cows in 36 herds in nine states are known to have tested positive for the virus. But it is widely believed the outbreak, which may have begun late last year, is more widespread than the number of confirmed outbreaks would suggest.
Many of the 87 new sequences that were uploaded to the database of the National Center for Biotechnology Information — run by the National Institutes of Health’s National Library of Medicine — are from samples retrieved from poultry and wild birds, and may not pertain to the dairy cow outbreak. But 10 of the new viral sequences are from cattle, two more are from cats, and another is from a pigeon. These sequences are all believed to be part of the outbreak.
The fact that basic information — called metadata — isn’t being shared about the samples “hinders our efforts a lot,” said Gytis Dudas, a senior researcher in genomic epidemiology and metagenomics at the Vilnius University Life Sciences Center in Lithuania. Dudas is working with a group of U.S. and international researchers to try to make sense of what the genetic sequences say about the H5N1 outbreak in cows.
A number of scientists have openly questioned whether the USDA is deliberately withholding these data, or even removing more specific information.
“I can’t imagine that they’d be getting these samples, running the sequences, and not somehow recording that data for themselves, for what state it came from and what date it was sampled. That’s really extremely basic data,” said Angela Rasmussen, a virologist who studies emerging zoonotic pathogens — disease threats that jump from animals to humans — at the Vaccine and Infectious Disease Organization at the University of Saskatchewan, in Saskatoon, Canada.
A USDA spokesman denied that the department is taking metadata off the sequence files before uploading them. In an email exchange with STAT, he said samples it receives contain only laboratory information numbers when they are sequenced. “Metadata is added by [Animal and Plant Health Inspection Service] staff after the sequencing occurs,” he said. “APHIS adds ‘USA’ and ‘2024’ as metadata tags and posts the sequences as they become available, in order to expedite public access to sequence data.”
The department has committed to sharing raw sequence data as quickly as it is available and has said it will upload what are called “consensus sequences” in an internationally used database, GISAID — the Global Initiative on Sharing All Influenza Data — when they are ready. Consensus sequences are more thoroughly edited and contain the metadata scientists are seeking.
It’s not just academic scientists who are seeking it, Peacock said, noting that international public health agencies trying to assess the risk the U.S. outbreak poses are keen to get more data too. “They’re just being much more quiet about it. But you know they’re all requesting this and not getting it as well, as far as I’m aware.”
The USDA has posted consensus sequences from this outbreak to GISAID only once, in late March. It’s clear, though, that the department has many more than it has shared to date. At an online symposium last week, Rosemary Sifford, the USDA’s chief veterinary officer, showed a phylogenetic tree featuring dozens of sequences, using the figure to explain that the department believes the outbreaks across the country are all linked and began from one spillover of the virus from wild birds to cows, likely in Texas.
A phylogenetic tree is like a family tree of a virus, showing how it is changing over time, but also providing a sense of when the virus spilled over from wild birds into cattle. The genetic sequence data available so far suggest that it occurred in late 2023 or early 2024.
The sequences featured in the phylogenetic tree in Sifford’s presentation would have been consensus sequences, Peacock said. “It does suggest they have them and they’re just not uploading them.”
The group of scientists that includes Peacock, Dudas, and Rasmussen quickly went through the sequences on the slide Sifford showed, harvesting from it the metadata the USDA has so far failed to provide. “That was less than ideal,” Dudas said.