CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data

CZI Cell Science Program(Wellcome Sanger Institute), Shibla Abdulla(Wellcome Sanger Institute), Brian D. Aevermann(Chan Zuckerberg Initiative (United States)), Pedro Assis(Chan Zuckerberg Initiative (United States)), Seve Badajoz(Chan Zuckerberg Initiative (United States)), Sidney M. Bell(Chan Zuckerberg Initiative (United States)), Emanuele Bezzi(Wellcome Sanger Institute), Batuhan Çakır(Wellcome Sanger Institute), Jim Chaffer(Chan Zuckerberg Initiative (United States)), Signe Chambers(Chan Zuckerberg Initiative (United States)), J. Michael Cherry(Chan Zuckerberg Initiative (United States)), Tiffany Chi(Chan Zuckerberg Initiative (United States)), Jennifer Chien(Chan Zuckerberg Initiative (United States)), Leah C. Dorman(Chan Zuckerberg Initiative (United States)), Pablo E. García-Nieto(Chan Zuckerberg Initiative (United States)), Nayib Gloria(Chan Zuckerberg Initiative (United States)), Mim Hastie(Chan Zuckerberg Initiative (United States)), Daniel Hegeman(Chan Zuckerberg Initiative (United States)), Jason A. Hilton(Chan Zuckerberg Initiative (United States)), Timmy Huang(Chan Zuckerberg Initiative (United States)), Amanda Infeld(Chan Zuckerberg Initiative (United States)), Ana-Maria Istrate(Chan Zuckerberg Initiative (United States)), Ivana Jelic(Chan Zuckerberg Initiative (United States)), Kuni Katsuya(Chan Zuckerberg Initiative (United States)), Yang Joon Kim(Chan Zuckerberg Initiative (United States)), Karen Liang(Chan Zuckerberg Initiative (United States)), Mike C. Lin(Chan Zuckerberg Initiative (United States)), Maximilian Lombardo(Chan Zuckerberg Initiative (United States)), Bailey Marshall(Chan Zuckerberg Initiative (United States)), Bruce Martin(Chan Zuckerberg Initiative (United States)), Fran McDade(Chan Zuckerberg Initiative (United States)), Colin Megill(Chan Zuckerberg Initiative (United States)), Nikhil Patel(Wellcome Sanger Institute), Alexander V. Predeus(Wellcome Sanger Institute), Brian Raymor(Chan Zuckerberg Initiative (United States)), Behnam Robatmili(Chan Zuckerberg Initiative (United States)), Dave Rogers(Stanford University), Erica Rutherford(Chan Zuckerberg Initiative (United States)), Dana Sadgat(Chan Zuckerberg Initiative (United States)), Andrew Shin(Chan Zuckerberg Initiative (United States)), Corinn Small(Chan Zuckerberg Initiative (United States)), Trent M. Smith(Chan Zuckerberg Initiative (United States)), Prathap Sridharan(Chan Zuckerberg Initiative (United States)), Alexander J. Tarashansky(Chan Zuckerberg Initiative (United States)), Norbert K. Tavares(Chan Zuckerberg Initiative (United States)), Harley Thomas(Chan Zuckerberg Initiative (United States)), Andrew Tolopko(Chan Zuckerberg Initiative (United States)), Meghan Urisko(Chan Zuckerberg Initiative (United States)), Joyce Yan(Chan Zuckerberg Initiative (United States)), Garabet Yeretssian(Chan Zuckerberg Initiative (United States)), Jennifer Zamanian(Chan Zuckerberg Initiative (United States)), Arathi Mani(Chan Zuckerberg Initiative (United States)), Jonah Cool(Chan Zuckerberg Initiative (United States)), Ambrose Carr(Chan Zuckerberg Initiative (United States))
Nucleic Acids Research
November 28, 2024
Cited by 296
Open Access

Abstract

Hundreds of millions of single cells have been analyzed using high-throughput transcriptomic methods. The cumulative knowledge within these datasets provides an exciting opportunity for unlocking insights into health and disease at the level of single cells. Meta-analyses that span diverse datasets, building on recent advances in large language models and other machine-learning approaches, open exciting new directions for modeling and extracting insight from single-cell data. Despite the promise of these and emerging analytical tools for analyzing large amounts of data, the sheer number of datasets and data models, and their limited accessibility, remain a challenge. Here, we present CZ CELLxGENE Discover (cellxgene.cziscience.com), a data platform that provides curated and interoperable single-cell data. Available via a free-to-use online data portal, CZ CELLxGENE hosts a growing corpus of community-contributed data of over 93 million unique cells. Curated, standardized and associated with consistent cell-level metadata, this collection of single-cell transcriptomic data is the largest of its kind and growing rapidly via community contributions. A suite of tools and features makes the data accessible and reusable through both computational and visual interfaces, allowing researchers to explore individual datasets, perform cross-corpus analysis, and run meta-analyses of tens of millions of cells across studies and tissues at single-cell resolution.

