CZ CELL×GENE Discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data

CZI Single-Cell Biology Program (Wellcome Sanger Institute), Shibla Abdulla (Wellcome Sanger Institute), Brian D. Aevermann (Chan Zuckerberg Initiative (United States)), Pedro Assis (Chan Zuckerberg Initiative (United States)), Seve Badajoz (Chan Zuckerberg Initiative (United States)), Sidney M. Bell (Chan Zuckerberg Initiative (United States)), Emanuele Bezzi (Wellcome Sanger Institute), Batuhan Çakır (Wellcome Sanger Institute), James Chaffer (Chan Zuckerberg Initiative (United States)), Signe Chambers (Chan Zuckerberg Initiative (United States)), J. Michael Cherry (Chan Zuckerberg Initiative (United States)), Tiffany Chi (Chan Zuckerberg Initiative (United States)), Jennifer Chien (Chan Zuckerberg Initiative (United States)), Leah C. Dorman (Chan Zuckerberg Initiative (United States)), Pablo E. García-Nieto (Chan Zuckerberg Initiative (United States)), Nayib Gloria (Chan Zuckerberg Initiative (United States)), Mim Hastie (Chan Zuckerberg Initiative (United States)), Daniel Hegeman (Chan Zuckerberg Initiative (United States)), Jason A. Hilton (Chan Zuckerberg Initiative (United States)), Timmy Huang (Chan Zuckerberg Initiative (United States)), Amanda Infeld (Chan Zuckerberg Initiative (United States)), Ana-Maria Istrate (Chan Zuckerberg Initiative (United States)), Ivana Jelic (Chan Zuckerberg Initiative (United States)), Kuni Katsuya (Chan Zuckerberg Initiative (United States)), Yang Joon Kim (Chan Zuckerberg Initiative (United States)), Karen Liang (Chan Zuckerberg Initiative (United States)), Mike C. Lin (Chan Zuckerberg Initiative (United States)), Maximilian Lombardo (Chan Zuckerberg Initiative (United States)), Bailey Marshall (Chan Zuckerberg Initiative (United States)), Bruce Martin (Chan Zuckerberg Initiative (United States)), Fran McDade (Chan Zuckerberg Initiative (United States)), Colin Megill (Chan Zuckerberg Initiative (United States)), Nikhil Patel (Wellcome Sanger Institute), Alexander V. Predeus (Wellcome Sanger Institute), Brian Raymor (Chan Zuckerberg Initiative (United States)), Behnam Robatmili (Chan Zuckerberg Initiative (United States)), Dave Rogers (Stanford University), Erica Rutherford (Chan Zuckerberg Initiative (United States)), Dana Sadgat (Chan Zuckerberg Initiative (United States)), Andrew Shin (Chan Zuckerberg Initiative (United States)), Corinn Small (Chan Zuckerberg Initiative (United States)), Trent M. Smith (Chan Zuckerberg Initiative (United States)), Prathap Sridharan (Chan Zuckerberg Initiative (United States)), Alexander J. Tarashansky (Chan Zuckerberg Initiative (United States)), Norbert K. Tavares (Chan Zuckerberg Initiative (United States)), Harley Thomas (Chan Zuckerberg Initiative (United States)), Andrew Tolopko (Chan Zuckerberg Initiative (United States)), Meghan Urisko (Chan Zuckerberg Initiative (United States)), Joyce Yan (Chan Zuckerberg Initiative (United States)), Garabet Yeretssian (Chan Zuckerberg Initiative (United States)), Jennifer Zamanian (Chan Zuckerberg Initiative (United States)), Arathi Mani (Chan Zuckerberg Initiative (United States)), Jonah Cool (Chan Zuckerberg Initiative (United States)), Ambrose Carr (Chan Zuckerberg Initiative (United States))
bioRxiv (Cold Spring Harbor Laboratory)
November 2, 2023
Open Access

Abstract

Hundreds of millions of single cells have been analyzed to date using high-throughput transcriptomic methods, thanks to technological advances driving the increasingly rapid generation of single-cell data. This provides an exciting opportunity to unlock new insights into health and disease through meta-analyses that span diverse datasets and build on recent advances in large language models and other machine learning approaches. Despite the promise of these and emerging tools for analyzing large amounts of data, a major challenge remains: the sheer number of datasets and their inconsistent formats, data models, and accessibility. Many datasets are available only through unique portals and platforms that often lack interoperability. Here, we present CZ CellxGene Discover (cellxgene.cziscience.com), a data platform that provides curated and interoperable data. This single-cell data resource, available via a free-to-use online data portal, hosts a growing corpus of community-contributed data that spans more than 50 million unique cells. Curated, standardized, and associated with consistent cell-level metadata, this collection of interoperable single-cell transcriptomic data is the largest of its kind. A suite of tools and features makes the data accessible and reusable through both computational and visual interfaces, allowing researchers to rapidly explore individual datasets and perform cross-corpus analysis. This functionality is enabling meta-analyses of tens of millions of cells across studies and tissues and providing global views of human cells at single-cell resolution.
