BRAI – Biology-informed Robust AI Methods for Inferring Complex Gene Regulatory Networks

About the project

Objective
Modern biology and medical science produce abundant unlabeled and noisy data. Estimating biological structures and networks from such data would widen the scope of future AI-based research in biology, with directly actionable effects in medical science. The major technical challenge is to develop robust AI and GenAI methods that can exploit the information hidden in unlabeled and noisy data. A promising path is to use a priori biological knowledge when developing models for signals and systems and when collecting data, and then to use that knowledge to regularize the learning of AI methods.

To address this challenge, we focus on inferring gene regulatory networks (GRNs) from noisy gene expression data, a challenging inverse problem in biology. Knowing a GRN is key to understanding the biological mechanisms that cause diseases such as cancer. While gene expression data is abundant, it is unlabeled because the true underlying GRNs are not known. In addition, the expression data is noisy. So far, AI has seen little use for robust estimation of large GRNs from unlabeled and noisy gene expression data. Indeed, learning from unlabeled and noisy data is challenging for AI methods. This motivates the proposed project, Biology-informed Robust AI (BRAI). The objective of the BRAI project is to develop fundamental theory and tools for inferring complex biological structures and networks from unlabeled and noisy data using a priori biological knowledge, focusing on the challenging inverse problem of GRN inference.

Background
The human reference genome contains roughly 19,000 to 20,000 protein-coding genes. For human cells (e.g., cancer cells), GRNs are large. In practice, GRNs are not observed directly; they are observed only through gene expression data. It is therefore difficult to collect labeled data, that is, paired GRNs and expression data, for training AI and machine learning (ML) models in a standard supervised learning approach. On the other hand, gene expression data is available in abundance as unlabeled data, without the true underlying GRNs.

The actual functional relationship between a GRN matrix and its expression data is governed by complex biophysics. For complex biological systems like cancer cells, the true functional relationship between a GRN and its expression data is unknown and difficult to model. In addition, gene expression data is noisy, because it reflects not only the hidden GRN but also other known and unknown biological events.

The GRN inference problem, estimating a large GRN from its noisy gene expression data without labeled data and without knowing the functional relationship between the two, is therefore a challenging inverse problem.
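As a rough illustration of what "inverse problem with prior knowledge" can mean here, the following toy sketch recovers a small regulator-to-target weight matrix from noisy synthetic expression data. It is not the project's method. It assumes a linear relationship between regulator and target expression, which the real biology does not guarantee, and it encodes a single piece of prior knowledge, that GRNs are sparse, as an L1 penalty solved with the standard ISTA algorithm. All names (`infer_grn_lasso`, `tf_expr`, `target_expr`) and all parameter values are hypothetical, and the data is simulated so that the true network is known, which is exactly what is missing in practice.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of the L1 norm: shrink every entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def infer_grn_lasso(tf_expr, target_expr, lam=0.1, n_iter=500):
    """ISTA for  min_W  (1 / 2n) * ||Y - X W||_F^2 + lam * ||W||_1.

    X = tf_expr (samples x regulators), Y = target_expr (samples x targets).
    The L1 term is the assumed "biological prior": most edges are absent.
    """
    n = tf_expr.shape[0]
    # Step size is 1 / L, where L is the Lipschitz constant of the gradient.
    lipschitz = np.linalg.norm(tf_expr, ord=2) ** 2 / n
    step = 1.0 / lipschitz
    w = np.zeros((tf_expr.shape[1], target_expr.shape[1]))
    for _ in range(n_iter):
        grad = tf_expr.T @ (tf_expr @ w - target_expr) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Synthetic toy data (assumption: linear model with additive Gaussian noise).
rng = np.random.default_rng(0)
n_samples, n_tfs, n_targets = 200, 10, 15

# Sparse ground-truth network: about 15% of possible edges, weights +1 or -1.
edges = rng.random((n_tfs, n_targets)) < 0.15
true_w = np.zeros((n_tfs, n_targets))
true_w[edges] = rng.choice([-1.0, 1.0], size=int(edges.sum()))

tf_expr = rng.standard_normal((n_samples, n_tfs))
noise = 0.3 * rng.standard_normal((n_samples, n_targets))
target_expr = tf_expr @ true_w + noise

est_w = infer_grn_lasso(tf_expr, target_expr)
recovered = np.abs(est_w) > 0.3  # call an edge when the weight is clearly nonzero
accuracy = float((recovered == edges).mean())
```

In this simulated setting `accuracy` can be computed only because `edges` is known. With real expression data there is no such ground truth and no guarantee of linearity, which is why the problem stated above is hard and why richer biological priors than sparsity alone are of interest.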

Cross-disciplinary collaboration
The project will combine methods and techniques from separate research fields: (a) biological knowledge about GRNs from bioinformatics and systems biology, (b) graph theory and topological data analysis for network modeling from mathematics, and (c) robust machine learning (ML) and GenAI from AI/ML.

Project period

01/07/2025 – 30/06/2028

Type of call

Flagship

Societal context

Rich and Healthy Life

Research themes

Learn

Partner

KTH

SciLifeLab

Project status

Ongoing

Contacts