Exploring Single-Cell Data with Deep Multitasking Neural Networks

Matthew Amodio (Yale University), David van Dijk (Yale University), Krishnan Srinivasan (Yale University), William S. Chen (Yale University), Hussein Mohsen (Yale University), Kevin R. Moon (Yale University), Allison M. Campbell (Yale University), Yujiao Zhao (Yale University), Xiaomei Wang (Yale University), Manjunatha M. Venkataswamy (National Institute of Mental Health and Neurosciences), Anita Desai (National Institute of Mental Health and Neurosciences), Vasanthapuram Ravi (National Institute of Mental Health and Neurosciences), Priti Kumar (Yale University), Ruth R. Montgomery (Yale University), Guy Wolf (Yale University), Smita Krishnaswamy (Yale University)
bioRxiv (Cold Spring Harbor Laboratory)
December 19, 2017
Cited by 47 · Open Access

Abstract

Biomedical researchers are generating high-throughput, high-dimensional single-cell data at a staggering rate. As the cost of data generation decreases, experimental design is moving towards measuring many different single-cell samples in the same dataset. These samples can correspond to different patients, conditions, or treatments. While scaling methods to datasets of this size is a challenge on its own, large-scale experimental design presents a further set of problems, including batch effects and sample comparison issues. Currently, no computational tool can both handle large amounts of data in a scalable manner (many cells) and deal with many samples (many patients or conditions). Moreover, data analysis currently involves different tools that each operate on their own data representation, so a synchronized analysis pipeline is not guaranteed. For instance, the data visualization method can be disjoint from and mismatched with the clustering method. To address these problems, we present SAUCIE, a deep neural network that leverages the high degree of parallelization and scalability offered by neural networks, as well as the deep representation of data that they can learn, to perform many single-cell data analysis tasks, all on a unified representation. A well-known limitation of neural networks is their lack of interpretability. Our key contributions here are newly formulated regularizations (penalties) that render the features learned in hidden layers of the neural network interpretable. When large multi-patient datasets are fed into SAUCIE, the various hidden layers contain denoised and batch-corrected data, a low-dimensional visualization, unsupervised clustering, and other information that can be used to explore the data. We show this capability by analyzing a newly generated 180-sample dataset consisting of T cells from dengue patients in India, measured with mass cytometry. We show that SAUCIE, for the first time, can batch correct and process this 11-million-cell dataset to identify cluster-based signatures of acute dengue infection and create a patient manifold, stratifying immune response to dengue on the basis of single-cell measurements.
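
The idea that different hidden layers of one network can serve different analysis tasks can be illustrated with a minimal sketch. The abstract does not specify layer sizes, activations, or the form of the regularizations, so everything below is an assumption made for illustration: the class name `MultitaskAutoencoder`, the layer widths, the sigmoid coding layer, and the 0.5 threshold are hypothetical, the weights are random and untrained, and the training penalties are omitted entirely. It is not the SAUCIE implementation.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_in, n_out, rng):
    """Return random (untrained) weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, activation):
    """Apply one fully connected layer to a single input vector."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class MultitaskAutoencoder:
    """Hypothetical autoencoder whose hidden layers are read out for several tasks."""

    def __init__(self, n_markers, seed=0):
        rng = random.Random(seed)
        # All layer widths are assumptions chosen for illustration.
        self.encoder = init_layer(n_markers, 16, rng)
        self.embedding = init_layer(16, 2, rng)   # 2-D layer used for visualization
        self.coding = init_layer(2, 8, rng)       # layer binarized into a cluster code
        self.output = init_layer(8, n_markers, rng)

    def forward(self, cell):
        """One forward pass yields every readout from a shared representation."""
        hidden = dense(cell, *self.encoder, relu)
        embedding = dense(hidden, *self.embedding, identity)
        codes = dense(embedding, *self.coding, sigmoid)
        reconstruction = dense(codes, *self.output, identity)
        # Thresholding the coding layer gives a discrete code; cells that share
        # a code would be treated as belonging to the same cluster.
        cluster_code = tuple(int(c > 0.5) for c in codes)
        return {
            "embedding": embedding,
            "cluster_code": cluster_code,
            "reconstruction": reconstruction,
        }


if __name__ == "__main__":
    model = MultitaskAutoencoder(n_markers=5)
    result = model.forward([0.1, 0.5, 0.0, 1.2, 0.3])
    print(result["embedding"])
    print(result["cluster_code"])
```

Because the embedding, the cluster code, and the reconstruction all come from the same forward pass, they are computed on one shared representation rather than by separate tools. In a trained model, regularization terms added to the reconstruction loss would be what makes these layers meaningful; with the random weights used here the outputs carry no biological meaning.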

