In 2018, during her chemistry Nobel Prize lecture, Frances Arnold noted that scientists had arrived at a point where they could read, write, and edit any sequence of DNA. But composing whole genes or even whole genomes from scratch — that was something only evolution could do.
A few years later, not long after helping to launch the Arc Institute, a nonprofit research center in the Bay Area, molecular engineer Patrick Hsu wondered if it was possible to imitate the forces of evolution that Arnold had been referring to. DNA is a language, after all, and with all the advances in generative AI — chatbots that could hold eerily lifelike conversations if trained on enough text — maybe recreating all the cellular complexity contained in a genome wasn’t that far behind.
Hsu, who is also an assistant professor at the University of California, Berkeley, worked with Brian Hie, a computational biologist at Stanford University and a fellow Arc Institute member. Together they began assembling a team of scientists to train an AI model on vast troves of biological data: 300 billion DNA letters, including long sequences from 80,000 genomes of bacteria and archaea.