Research suggests that two protein-coding genes can predict severe COVID-19 disease

In a recent study posted to the medRxiv* preprint server, researchers showed that two genes, GTPase, IMAP family member 7 (GIMAP7) and sphingosine-1-phosphate receptor 2 (S1PR2), could predict severe coronavirus disease 2019 (COVID-19) with approximately 90% accuracy.

The study: A transcriptome-wide association meta-analysis predicts the presence of two robust human biomarkers for severe SARS-CoV-2 infection. Image Credit: Marcin Janiec / Shutterstock


The host response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is diverse; accordingly, the demand for biomarkers associated with COVID-19 severity has been steadily increasing. Multiple studies have evaluated associations between disease severity and different aspects of the adaptive immune system and have shown that several factors contribute to the observed differences in COVID-19 severity. However, few studies have explored transcriptional biomarkers that distinguish mild from severe COVID-19.

About the study

In the present study, researchers performed a meta-analysis of available human transcriptome data to identify transcriptional predictive markers that could inform decisions about the care of hospitalized patients with SARS-CoV-2 infection. The team searched for relevant datasets based on three pre-defined criteria, as follows:

i) the host was human;

ii) the data were generated by RNA-sequencing (RNA-seq) experiments;

iii) human peripheral blood or peripheral blood mononuclear cell (PBMC) samples were collected from patients during the acute phase of SARS-CoV-2 infection and were accompanied by descriptive data on COVID-19 severity.

They obtained 358 publicly available human transcriptome samples from three independent RNA-seq studies in the Gene Expression Omnibus (GEO) database. The researchers then processed these samples with a specialized workflow, the Automated Reproducible MOdular workflow for preprocessing and differential analysis of RNA-seq data (ARMOR), to determine gene expression in each patient. This workflow used Salmon to map the reads to the transcript set of human reference genome build 38 (GRCh38).

Next, they used edgeR to calculate differential gene expression (DGE) from the read counts; they then used Camera to identify enriched Gene Ontology (GO) terms from the list of gene identifiers produced by edgeR. A z-score transformation normalized the Salmon read counts for each gene in each GEO sample.
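
The z-score step can be illustrated with a minimal sketch. It assumes the transformation is applied per gene across samples, which the article does not state explicitly; the function name `zscore_by_gene` and the toy counts are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def zscore_by_gene(counts):
    """Z-score each gene's counts across samples.

    counts: dict mapping gene name -> list of per-sample read counts.
    Returns a dict of the same shape holding z-scores.
    Assumption: normalization is done per gene, across samples.
    """
    normalized = {}
    for gene, values in counts.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            # A gene with identical counts in every sample carries no signal.
            normalized[gene] = [0.0 for _ in values]
        else:
            normalized[gene] = [(v - mu) / sigma for v in values]
    return normalized


# Hypothetical counts for three samples (illustrative numbers only).
toy_counts = {"GIMAP7": [10.0, 20.0, 30.0], "S1PR2": [5.0, 5.0, 5.0]}
toy_z = zscore_by_gene(toy_counts)
```

After this step, each gene with non-constant counts has mean 0 and unit variance across samples, so genes with very different raw count ranges become comparable.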

Finally, the team trained a machine-learning algorithm on the read count data to determine which genes best separated patient samples by COVID-19 severity, producing a list of genes ranked by Gini impurity values, a measure related to entropy. Genes whose transcripts had greater Gini impurity values were those that most accurately predicted the COVID-19 phenotype.
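
To make the Gini-based ranking concrete, here is a minimal sketch of how Gini impurity is computed and how much a single expression threshold reduces it. The article does not name the algorithm used, so this assumes a tree-style split in which a gene is scored by the drop in impurity it achieves; the helper names, threshold, and data are hypothetical.

```python
def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p * p for p in proportions)


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity when samples are split at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Hypothetical normalized expression values and severity labels.
expression = [0.2, 0.4, 0.5, 1.6, 1.8, 2.1]
severity = ["severe", "severe", "severe", "mild", "mild", "mild"]
decrease = impurity_decrease(expression, severity, threshold=1.0)
```

In this toy case the threshold separates the two groups perfectly, so the impurity falls from 0.5 to 0. A gene that separates severity classes less cleanly would yield a smaller decrease and rank lower.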


Study samples were labeled as high or low severity and processed to obtain quality-trimmed reads, which were mapped to the human transcriptome to calculate differential expression. Overall, the authors identified 8,176 significant differentially expressed genes (DEGs), the most significant of which were aspartate beta-hydroxylase (ASPH), chromosome 5 open reading frame 30 (C5orf30), diacylglycerol kinase eta (DGKH), and solute carrier family 26 member 6 (SLC26A6).

GO enrichment analysis yielded 90 significant GO terms, including apoptosis, immune response, and I-kappaB kinase/NF-kappaB signaling. Furthermore, the authors used the signaling pathway impact analysis (SPIA) algorithm to evaluate which intracellular signaling pathways were best represented by the DEGs. The analysis identified nine signaling pathways that were significantly affected in severe COVID-19. Of these nine pathways, five were directly associated with T-cell receptor (TCR) signaling, while a sixth involved zeta-chain-associated protein kinase 70 (ZAP-70); notably, all six pathways were inhibited in severe COVID-19.

The team built a table in which the transcripts from each gene and the read-mapping data were represented as columns and rows, respectively, and used it to generate a receiver operating characteristic (ROC) curve. The authors noted an area under the curve (AUC) of 96.6% across all transcripts, indicating that the host transcriptional response is closely linked to COVID-19 severity.

The AUC for the six genes with the highest Gini impurity values was 94.3%. The analysis yielded a combined AUC of 89.8% for the top two genes, GIMAP7 and S1PR2. Furthermore, read counts for both of these genes were, on average, almost three times higher in the samples from patients with low COVID-19 severity.
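
For readers unfamiliar with the AUC figures quoted here, the following minimal sketch computes an AUC from classifier scores using its rank-based definition: the probability that a randomly chosen severe sample receives a higher score than a randomly chosen mild one. The function name, scores, and labels are hypothetical and do not reproduce the study's data or pipeline.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; label 1 = severe, 0 = mild.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 1.0 means every severe sample scores above every mild sample, and 0.5 corresponds to chance. In the toy data, one severe sample scores below one mild sample, which pulls the AUC below 1.0.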


A previous study of SARS-CoV-2 transcriptional biomarkers identified the GIMAP7 gene; however, it did not rank it as a top biomarker. The approach of the current study allowed the researchers to determine the direction of expression change for GIMAP7 and S1PR2. Furthermore, the study results illustrate the up- and down-regulation of genes that characterizes COVID-19 patients in a more diverse population.

Future studies should investigate whether these biomarkers remain consistently predictive of infection severity in patients infected with the latest SARS-CoV-2 variants, such as Omicron. Additional studies are also required to confirm whether the results can be replicated in samples from patients of different ages and risk groups. Nevertheless, the study devised a predictive test that could support the triage of patients at higher risk of severe COVID-19 and help reduce the burden on hospital resources globally.

*Important note

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behaviour, or be treated as established information.

