CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data
Hundreds of millions of single cells have been analyzed using high-throughput transcriptomic methods. The cumulative knowledge within these datasets provides an exciting opportunity for unlocking insights into health and disease at the level of single cells. Meta-analyses that span diverse datasets, building on recent advances in large language models and other machine-learning approaches, offer exciting new directions to model and extract insight from single-cell data. Despite the promise of these and emerging analytical tools for analyzing large amounts of data, the sheer number of datasets, data models and accessibility remain a challenge. Here, we present CZ CELLxGENE Discover (cellxgene.cziscience.com), a data platform that provides curated and interoperable single-cell data. Available via a free-to-use online data portal, CZ CELLxGENE hosts a growing corpus of community-contributed data comprising over 93 million unique cells. Curated, standardized and associated with consistent cell-level metadata, this collection of single-cell transcriptomic data is the largest of its kind and growing rapidly via community contributions. A suite of tools and features enables accessibility and reusability of the data via both computational and visual interfaces, allowing researchers to explore individual datasets, perform cross-corpus analysis, and run meta-analyses of tens of millions of cells across studies and tissues at the resolution of single cells.
cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices
Colin Megill, Bruce Martin, Charlotte A. Weaver et al. | bioRxiv (Cold Spring Harbor Laboratory) | 2021
Abstract
Quickly and flexibly exploring high-dimensional datasets, such as scRNAseq data, is underserved but critical for hypothesis generation, dataset annotation, publication, sharing, and community reuse. cellxgene is a highly generalizable, web-based interface for exploring high-dimensional datasets along categorical, continuous and spatial dimensions, as well as feature annotation. cellxgene is differentiated by its ability to performantly handle millions of observations, and it bridges a critical gap by enabling computational and experimental biologists to iteratively ask questions of private and public datasets. In doing so, cellxgene increases the utility and reusability of datasets across the single-cell ecosystem. The codebase can be accessed at https://github.com/chanzuckerberg/cellxgene. For questions and inquiries, please contact cellxgene@chanzuckerberg.com.
CZ CELL×GENE Discover: A single-cell data platform for scalable exploration, analysis and modeling of aggregated data
Abstract
Hundreds of millions of single cells have been analyzed to date using high-throughput transcriptomic methods, thanks to technological advances driving the increasingly rapid generation of single-cell data. This provides an exciting opportunity for unlocking new insights into health and disease, made possible by meta-analyses that span diverse datasets, building on recent advances in large language models and other machine learning approaches. Despite the promise of these and emerging analytical tools for analyzing large amounts of data, a major challenge remains: the sheer number of datasets and their inconsistent formats, data models and accessibility. Many datasets are available via unique portals or platforms that often lack interoperability. Here, we present CZ CellxGene Discover (cellxgene.cziscience.com), a data platform that provides curated and interoperable data. This single-cell data resource, available via a free-to-use online data portal, hosts a growing corpus of community-contributed data that spans more than 50 million unique cells. Curated, standardized, and associated with consistent cell-level metadata, this collection of interoperable single-cell transcriptomic data is the largest of its kind. A suite of tools and features enables accessibility and reusability of the data via both computational and visual interfaces, allowing researchers to rapidly explore individual datasets and perform cross-corpus analysis. This functionality is enabling meta-analyses of tens of millions of cells across studies and tissues and providing global views of human cells at the resolution of single cells.