Attention Based Models for Cell Type Classification on Single-Cell RNA-Seq Data

Basic Information

Abstract

Cell type classification is one of the most fundamental analyses in bioinformatics. It helps discover new cell types, recognize tumor cells in the cancer microenvironment, and facilitate downstream tasks such as trajectory inference. Single-cell RNA-sequencing (scRNA-seq) technology can profile the whole transcriptome of individual cells, thus providing invaluable data for cell type classification. Existing cell type classification methods fall mainly into two categories: statistical models and neural network models. Statistical models either make assumptions about the gene expression distribution that may not be consistent with real data, or rely heavily on prior knowledge such as marker genes for specific cell types. By contrast, neural networks are more robust and flexible, but it is hard to interpret the biological meaning hidden behind a mass of model parameters. Recently, the attention mechanism has been widely applied in diverse fields owing to the good interpretability of the attention weights. In this paper, we examine the effectiveness and interpretability of the attention mechanism by proposing two novel models for the cell type classification task. The first model, the capsule attention network (CAN), classifies cells by performing attention on capsule features extracted for each cell. To align the features with genes, the second model, the Cell-Gene Representation Attention Network (CGRAN), first factorizes the scRNA-seq matrix to obtain representation vectors for all genes and cells, and then performs the attention operation on the cell and gene vectors. Experiments show that our attention-based models achieve higher accuracy in cell type classification than existing methods on diverse datasets. Moreover, the key genes selected by their high attention scores in different cell types perfectly match the acknowledged marker genes.
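
The factorize-then-attend idea behind CGRAN can be illustrated with a minimal sketch. The abstract does not specify the factorization method or the attention formulation, so everything below is an assumption made for illustration: truncated SVD stands in for the factorization, scaled dot-product softmax stands in for the attention operation, and the function names `factorize` and `gene_attention` as well as the toy count matrix are hypothetical.

```python
import numpy as np

def factorize(X, k):
    """Split a cells-by-genes matrix into k-dimensional cell and gene vectors.

    Assumption: truncated SVD is used here only as a stand-in; the
    abstract does not say which factorization the model uses.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:k])
    cell_vecs = U[:, :k] * root      # shape (n_cells, k)
    gene_vecs = Vt[:k, :].T * root   # shape (n_genes, k)
    return cell_vecs, gene_vecs

def gene_attention(cell_vec, gene_vecs):
    """Attention weights of one cell over all genes.

    Assumption: scaled dot-product scores followed by a softmax.
    """
    scores = gene_vecs @ cell_vec / np.sqrt(cell_vec.shape[0])
    scores = scores - scores.max()   # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

# Toy data (hypothetical): 6 cells x 10 genes, log-transformed counts.
rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(2.0, size=(6, 10)).astype(float))

cell_vecs, gene_vecs = factorize(X, k=3)
weights = gene_attention(cell_vecs[0], gene_vecs)

# Attention-weighted summary of the gene vectors for cell 0; a classifier
# head would consume a vector like this.
context = weights @ gene_vecs

# Genes ranked by attention weight: in this sketch, these play the role
# of the candidate key genes for cell 0.
top_genes = np.argsort(weights)[::-1][:3]
```

Because each attention weight is tied to one gene vector, ranking the weights yields a per-cell list of candidate key genes, which is the kind of output that can be compared against known marker genes.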