Generative AI in novel drug development

Within months of its release, ChatGPT is finding use in almost all industries, including biotech, pharma, and healthcare. ChatGPT and other large language models like it belong to a family of artificial intelligence (AI) technologies dubbed generative AI. Based on statistical associations gathered from large datasets, generative AI models produce content similar to the data they are trained on.

This is how ChatGPT produces realistic text, and AI art generators like Midjourney or DALL-E create images in the style of Picasso or van Gogh. Similarly, generative chemistry models dream up chemicals that could pave the way to new medicines, materials, or agrochemicals. For example, one study reported that an AI model trained on amino acid sequences generated novel proteins.

Labiotech spoke with Petrina Kamya, head of AI platforms at Insilico Medicine, about how the company uses generative AI to design new drugs. The company employs both generative chemistry and large language models in its drug development programs. Its lead drug candidate targets idiopathic pulmonary fibrosis, a rare lung disease.

How does your generative chemistry model work?

Generative AI algorithms are designed to generate new data samples that resemble a given dataset. What we have done is we’ve taken many different generative AI algorithms such as diffusion models or variational auto-encoders and adapted them towards generating novel drug-like molecular structures. They are trained on FDA-approved drugs or molecules destined to become drugs, and they design drug-like molecules.

Are these medicines derivative or can these be novel as well?

They can absolutely be novel. The idea with a generative adversarial network is to design molecules that trick humans into thinking that they have seen them before, but they haven’t. You can generate molecules that look very similar. You can generate one that already exists and molecules that look very, very different. So there’s a spectrum and you can control that to a certain extent. You can control the level of novelty in terms of output.
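
To make the idea of a controllable novelty spectrum concrete, here is a minimal Python sketch. It is an illustration only, not Insilico Medicine's method: it assumes each molecule has already been reduced to a set of integer feature bits (a hypothetical stand-in for a chemical fingerprint) and measures novelty as one minus the Tanimoto similarity to the closest known molecule. The names `tanimoto`, `novelty`, and `within_novelty_band` are invented for this example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical representation: a molecule is a set of integer feature bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of feature bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def novelty(candidate: Fingerprint, known: Iterable[Fingerprint]) -> float:
    """Return 1 minus the similarity to the most similar known molecule."""
    closest = max((tanimoto(candidate, k) for k in known), default=0.0)
    return 1.0 - closest


def within_novelty_band(
    candidates: Iterable[Fingerprint],
    known: Iterable[Fingerprint],
    low: float,
    high: float,
) -> List[Fingerprint]:
    """Keep candidates whose novelty falls inside [low, high]."""
    known_list = list(known)
    return [c for c in candidates if low <= novelty(c, known_list) <= high]
```

Raising `low` pushes the output toward molecules that look very different from anything known, while lowering `high` keeps it close to existing compounds.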

Hallucination is a problem with ChatGPT in that it can sometimes produce outputs that sound reasonable but are nonsensical. Is there an analogous problem in generative chemistry and how does your platform deal with that?

Generative algorithms are pre-trained to generate molecules and then you have reward functions [on top of it] that act as guardrails in terms of the types of molecules that are output from the platform. The reward function is customizable so that the generative algorithms create molecules that satisfy certain criteria and don’t go off and generate bananas or molecules that don’t make any sense.
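
The guardrail role of a customizable reward function can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the company's implementation: the `Candidate` class, the property names, the 0-to-1 property scores, and the weights are all hypothetical. Invalid molecules receive a reward of zero, and valid ones receive a weighted sum of their property scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Candidate:
    """Hypothetical generated molecule with precomputed property scores in [0, 1]."""
    name: str
    is_valid: bool                 # assumed result of a chemical validity check
    properties: Dict[str, float]   # e.g. {"potency": 0.9, "solubility": 0.7}


def make_reward(weights: Dict[str, float]) -> Callable[[Candidate], float]:
    """Build a reward function from user-chosen property weights."""
    def reward(candidate: Candidate) -> float:
        if not candidate.is_valid:
            return 0.0  # guardrail: nonsensical molecules earn nothing
        return sum(
            weight * candidate.properties.get(prop, 0.0)
            for prop, weight in weights.items()
        )
    return reward


def select_candidates(
    candidates: Iterable[Candidate],
    reward: Callable[[Candidate], float],
    threshold: float,
) -> List[Candidate]:
    """Keep candidates at or above the threshold, best first."""
    kept = [c for c in candidates if reward(c) >= threshold]
    return sorted(kept, key=reward, reverse=True)
```

Changing the weights changes which criteria the generated molecules are steered toward, which is one way to read "customizable" in the answer above.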

ChatGPT struggles with languages that we don’t have a lot of training data on. Do generative chemistry models also perform poorly in the case of diseases, say rare diseases, where we have less data?

No. With generative chemistry algorithms, the limitation is the target that you’re working on. If you are generating molecules and sort of screening them willy-nilly without really knowing the target, that’s the limitation. Once you have a validated target, it’s relatively straightforward to generate molecules that produce the desired outcome.

What does generative chemistry offer for rare disease drug development?

The advantage of using generative AI is that you’re able to optimize the properties of the molecules faster than using traditional methods. With the reward function and the active learning cycle of generative chemistry, you can get relatively decent molecules that you can synthesize and screen for any potential toxic effects and blood-brain barrier penetrance. You have fewer iterative cycles as you’re designing them. I would say that the way generative chemistry helps with rare diseases specifically is in generating medicines faster.

You recently integrated ChatGPT into your platform. How does it help with your work?

There’s a lot of textual data that you can mine. And we’ve mapped that data out for particular diseases in a knowledge graph. We use the ChatGPT functionality to query that knowledge graph and tease out the answers to questions. For example, if you’re looking at scleroderma, you can ask it for the latest drugs that are undergoing clinical trials for scleroderma. It will parse through all of that data or come up with what’s publicly available in terms of peer-reviewed journals and clinical trials.
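
As a rough illustration of what querying a disease knowledge graph can look like, the Python sketch below stores facts as subject-relation-object triples and answers the scleroderma question from the example above. Everything here is an assumption made for illustration: the relation names (`investigated_for`, `has_status`), the status value `in_clinical_trial`, and the helper functions are hypothetical. The sketch covers only the structured lookup, not the step in which a language model turns a natural-language question into such a query.

```python
from typing import Iterable, List, NamedTuple, Optional


class Triple(NamedTuple):
    """One fact in the graph: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


def query(
    graph: Iterable[Triple],
    subject: Optional[str] = None,
    relation: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching every field that is not None."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (relation is None or t.relation == relation)
        and (obj is None or t.obj == obj)
    ]


def drugs_in_trials_for(graph: Iterable[Triple], disease: str) -> List[str]:
    """Names of drugs linked to the disease that are also marked as in trials."""
    facts = list(graph)
    linked = {t.subject for t in query(facts, relation="investigated_for", obj=disease)}
    in_trial = {t.subject for t in query(facts, relation="has_status", obj="in_clinical_trial")}
    return sorted(linked & in_trial)
```

Calling `drugs_in_trials_for(graph, "scleroderma")` on a graph populated from the literature would return the matching drug names in alphabetical order.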

What role did generative AI play in the development of the drug candidate against idiopathic pulmonary fibrosis?

We used our platform with 20 different machine learning models to identify targets. And then we used generative chemistry to design molecules that act on those targets. We also used transformers and large language models to predict the probability of our program transitioning from phase to phase.

This story was made possible with support from the National Press Foundation. The Foundation did not influence the research or reporting of this article.