Novel AI Tool for Gene Discovery in Research and Clinical Settings

Scientists from the Agency for Science, Technology and Research (A*STAR) Genome Institute of Singapore (GIS) say they have developed a new tool, named Bambu, which uses artificial intelligence to identify and characterize new genes, enabling adaptable analysis across various species and samples.

By clarifying which genes are expressed in a sample and how, Bambu (named after the bamboo plant) can provide deeper insights into how cells function, according to the researchers. They explain that it is a tool for long-read RNA sequencing data that can be used in both clinical and research settings to discover novel transcripts encoded in DNA and to quantify them.

The team’s study “Context-aware transcript quantification from long-read RNA-seq data with Bambu” appears in Nature Methods.

“Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing,” write the investigators.

“To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity.

“We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.”
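The idea of replacing per-sample score thresholds with a single novel discovery rate can be illustrated with a small sketch. The Python code below is not Bambu's implementation (Bambu itself is distributed as an R package) and rests on assumptions: the `Candidate` class, the `select_novel` function, the example scores, and the reading of the novel discovery rate as "the fraction of accepted transcripts that are not in the reference annotation" are all hypothetical, chosen only to show how one rate-based parameter might stand in for an arbitrary score cutoff.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float     # hypothetical model-predicted probability that the transcript is real
    annotated: bool  # True if the transcript is already in the reference annotation


def select_novel(candidates, max_ndr):
    """Return names of novel candidates accepted under a maximum novel discovery rate.

    Assumption: the rate at rank k is (novel transcripts in the top k) / k,
    with candidates ranked by descending score.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    novel_seen = 0
    cutoff = 0
    for k, cand in enumerate(ranked, start=1):
        if not cand.annotated:
            novel_seen += 1
        # Remember the deepest rank at which the running rate is still acceptable.
        if novel_seen / k <= max_ndr:
            cutoff = k
    return [c.name for c in ranked[:cutoff] if not c.annotated]


# Made-up example data, for illustration only.
EXAMPLE = [
    Candidate("A", 0.99, True),
    Candidate("B", 0.95, True),
    Candidate("C", 0.90, False),
    Candidate("D", 0.85, True),
    Candidate("E", 0.60, False),
    Candidate("F", 0.40, False),
]

if __name__ == "__main__":
    for rate in (0.0, 0.3, 0.45):
        print(rate, select_novel(EXAMPLE, rate))
```

In this toy setting, raising the allowed rate admits more novel candidates and lowering it admits fewer, so the same parameter can be reused across samples whose raw scores are distributed differently.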

To explore the unknown parts of genomes, whether in humans, fish, or flowers, A*STAR’s researchers developed Bambu, which can identify new transcripts and quantify them with a high degree of precision and sensitivity, providing a more comprehensive understanding of an organism’s genetic makeup, say the scientists.

This will allow researchers to identify new players, such as genes, proteins, and other elements, in their field of research and expand their ability to study organisms that are currently understudied. Furthermore, the discovery of new genes, especially from clinical samples, can lead to the identification of biomarkers for the early detection of diseases or of targets for therapeutics.

“It is fascinating to see that scientists are still discovering new genes even in genomes that have been studied for many years, such as the human or mouse genome. However, the key question is if these transcripts are relevant, or if they could be artifacts. To address this, Bambu quantifies the probability that a transcript is real, making transcript and gene discovery much more reliable,” explained Jonathan Göke, PhD, group leader of the laboratory of computational transcriptomics at A*STAR’s GIS and the corresponding author of the study.

“By providing such a measure of confidence, Bambu can more reliably be applied to find new genes that play a role in human diseases such as cancer.”