New machine-learning algorithm discovers gene signature characteristic of tumors

How are cancer cells different from healthy cells? A new machine-learning algorithm called Ikarus knows the answer, according to a report in the journal Genome Biology by a team led by MDC bioinformatician Altuna Akalin. The AI program has found a gene signature characteristic of tumors.

When it comes to identifying patterns in mountains of data, humans are no match for artificial intelligence (AI). In particular, a branch of AI called machine learning is often used to find regularities in data sets, whether for stock market analysis, image and speech recognition, or cell classification. To reliably distinguish cancerous cells from healthy cells, a team led by Dr. Altuna Akalin, Head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has developed a machine-learning program called “Ikarus”. The program found a pattern in cancer cells that is common to different types of cancer and consists of a characteristic combination of genes. According to the team’s paper in the journal Genome Biology, the algorithm also detected types of genes in the pattern that had not been clearly linked to cancer before.

Machine learning basically means that an algorithm uses training data to learn how to answer certain questions on its own. It does this by looking for patterns in the data that help it solve problems. After the training phase, the system can generalize from what it has learned in order to evaluate unknown data.
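The train-then-generalize idea can be illustrated with a deliberately tiny example. The sketch below is not Ikarus: it is a nearest-centroid classifier on made-up two-dimensional points, and the function names `train` and `predict` as well as all numbers are assumptions chosen for illustration.

```python
from math import dist

def train(samples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest to the new, unseen point."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Made-up training data: two labeled groups of points.
training_data = [
    ([1.0, 1.0], "healthy"),
    ([1.0, 3.0], "healthy"),
    ([5.0, 5.0], "cancerous"),
    ([7.0, 5.0], "cancerous"),
]

model = train(training_data)            # training phase
prediction = predict(model, [6.0, 4.0])  # evaluating a point not seen in training
print(prediction)
```

The point `[6.0, 4.0]` never appears in the training data, yet the model can still assign it a label because it lies closest to the pattern learned for one group.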

“Obtaining suitable training data, in which experts had already clearly distinguished between ‘healthy’ and ‘cancerous’ cells, was a major challenge.”

Jan Dohmen, first author of the paper

Surprisingly high success rate

In addition, single-cell sequencing data sets are often noisy. This means that the information they contain about the molecular characteristics of individual cells is not very accurate, for instance because a different number of genes is detected in each cell, or because samples are not always processed in the same way. As Dohmen and his colleague Dr. Vedran Franke, co-leader of the study, report, they searched countless publications and contacted quite a few research groups in order to obtain sufficient data sets. The team eventually used data from lung and colorectal cancer cells to train the algorithm before applying it to data sets for other types of tumors.

In the training phase, Ikarus had to find a list of characteristic genes, which it then used to classify cells. “We’ve tried and refined different approaches,” says Dohmen. It was time-consuming work, say the three scientists. “The key was for Ikarus to eventually use two lists: one for cancer genes and one for genes from other cells,” Franke explains. After the learning phase, the algorithm was able to reliably distinguish healthy cells from cancer cells in other cancers as well, such as tissue samples from liver cancer or neuroblastoma patients. Its success rate was unusually high, which surprised even the research group. “We did not expect that there would be a common signature that accurately identifies cancer cells for different types of cancer,” Akalin says. “But we still can’t say if the method works for all types of cancer,” Dohmen adds. To turn Ikarus into a reliable tool for diagnosing cancer, the researchers now want to test it on other types of tumors.
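The two-list idea can be sketched in a few lines. This is a simplified illustration only, not the method from the paper: the gene names, the expression values, and the rule "the higher mean score wins" are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical gene lists; placeholders, not the signature reported in the paper.
TUMOR_GENES = ["geneA", "geneB", "geneC"]
NORMAL_GENES = ["geneX", "geneY", "geneZ"]

def signature_score(expression, genes):
    """Mean expression over a gene list; genes not detected in the cell count as 0."""
    return mean(expression.get(gene, 0.0) for gene in genes)

def classify_cell(expression):
    """Label a cell by whichever of the two gene lists scores higher."""
    tumor = signature_score(expression, TUMOR_GENES)
    normal = signature_score(expression, NORMAL_GENES)
    return "tumor" if tumor > normal else "normal"

# Made-up expression values per cell (gene -> expression level).
cells = {
    "cell_1": {"geneA": 5.0, "geneB": 4.0, "geneX": 0.5},
    "cell_2": {"geneX": 3.0, "geneY": 6.0, "geneA": 1.0},
}

labels = {name: classify_cell(expr) for name, expr in cells.items()}
print(labels)
```

Scoring each cell against both lists, rather than against a cancer list alone, gives the classifier a reference point: a cell is called a tumor cell only when its cancer score exceeds its score for genes typical of other cells.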

Artificial intelligence as a fully automated diagnostic tool

The project aims to go beyond categorizing ‘healthy’ versus ‘cancerous’ cells. In preliminary tests, Ikarus has already demonstrated that the method can also distinguish other types (and certain subtypes) of cells from cancer cells. “We want to make this approach more comprehensive, and develop it further so that it can differentiate all of the possible cell types in a biopsy,” Akalin says.

In hospitals, pathologists tend to examine tumor tissue samples only under a microscope in order to identify the different cell types. This is laborious and time-consuming work. With Ikarus, this step could one day become a fully automated process. Furthermore, Akalin notes that the data can be used to draw conclusions about the immediate environment of the tumor. This can help doctors choose the best treatment, because the makeup of the cancerous tissue and its microenvironment often indicates whether a particular treatment or drug will be effective. Moreover, AI may also be useful in developing new drugs. “Ikarus allows us to identify genes that may be potential drivers of cancer,” Akalin says. New therapeutic agents could then be used to target these molecular structures.

Collaboration between home and office

One remarkable aspect of the publication is that it was prepared entirely during the COVID pandemic. The participants were not in their usual offices at the Berlin Institute for Medical Systems Biology (BIMSB), which is part of the MDC. Instead, they worked from home and communicated with each other only digitally. In Franke’s view, “the project shows that a digital architecture can be created to facilitate scientific work under these conditions.”


Max Delbrück Center for Molecular Medicine in the Helmholtz Association

Journal reference:

Dohmen, J. et al. (2022) Identifying tumor cells at the single-cell level using machine learning. Genome Biology.–022–02683‐1.

