Population-level integration of single-cell datasets enables multi-scale analysis across samples

Carlo De Donno (Helmholtz Zentrum München), Soroor Hediyeh-zadeh (Helmholtz Zentrum München), Amir Ali Moinfar (Helmholtz Zentrum München), Marco Wagenstetter (Helmholtz Zentrum München), Luke Zappia (Helmholtz Zentrum München), Mohammad Lotfollahi (Wellcome Sanger Institute), Fabian J. Theis (Wellcome Sanger Institute)
Nature Methods
October 9, 2023
Cited by 108 | Open Access

Abstract

The increasing generation of population-level single-cell atlases has the potential to link sample metadata with cellular data. Constructing such references requires integrating heterogeneous cohorts with varying metadata. Here we present single-cell population-level integration (scPoli), an open-world learner that incorporates generative models to learn sample and cell representations for data integration, label transfer and reference mapping. We applied scPoli to population-level atlases of lung and peripheral blood mononuclear cells, the latter consisting of 7.8 million cells across 2,375 samples. We demonstrate that scPoli can explain sample-level biological and technical variation using sample embeddings, which reveal genes associated with batch effects and biological effects. scPoli is also applicable to single-cell sequencing assay for transposase-accessible chromatin and cross-species datasets, offering insights into chromatin accessibility and comparative genomics. We envision scPoli becoming an important tool for population-level single-cell data integration, facilitating both the use of atlases and their interpretation through multi-scale analyses.

