FlowGrid enables fast clustering of very large single-cell RNA-seq data
Abstract
MOTIVATION: Scalable clustering algorithms are needed to analyze millions of cells in single-cell RNA-seq (scRNA-seq) data. RESULTS: Here, we present an open-source Python package called FlowGrid that can integrate into the Scanpy workflow to perform clustering on very large scRNA-seq datasets. FlowGrid implements a fast density-based clustering algorithm originally designed for flow cytometry data analysis. We introduce a new automated parameter tuning procedure, and show that FlowGrid can achieve clustering accuracy comparable to that of state-of-the-art clustering algorithms, but at a substantially reduced run time for very large scRNA-seq datasets. For example, FlowGrid can complete in about five minutes a clustering task on one million cells that would otherwise take about one hour. AVAILABILITY AND IMPLEMENTATION: https://github.com/holab-hku/FlowGrid. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
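To give a rough idea of why a grid-based, density-based approach can scale to very large inputs, the following is a minimal sketch of the general technique, not FlowGrid's actual implementation or API. It assumes the cells have already been projected to a few dimensions (for example, a handful of principal components). It places each point in an equal-width bin, keeps bins that hold at least a minimum number of points, and merges adjacent dense bins into clusters. The function name `grid_density_cluster` and the parameters `n_bins` and `min_count` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import product

import numpy as np


def grid_density_cluster(X, n_bins=4, min_count=3):
    """Toy grid-based density clustering (illustrative only, not FlowGrid's code).

    Returns one integer label per row of X; -1 marks points in sparse bins.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    # Map every point to integer bin coordinates in [0, n_bins - 1].
    bins = np.minimum(((X - lo) / span * n_bins).astype(int), n_bins - 1)

    # Count points per occupied bin; work on bins from here on, not points.
    keys, inverse, counts = np.unique(
        bins, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    dense = {
        tuple(int(v) for v in key): i
        for i, (key, count) in enumerate(zip(keys, counts))
        if count >= min_count
    }

    # Neighbouring bins differ by at most 1 along every axis.
    # This enumerates 3**d - 1 offsets, so it only suits small d.
    offsets = [o for o in product((-1, 0, 1), repeat=X.shape[1]) if any(o)]

    # Breadth-first search over dense bins to find connected components.
    bin_label = {}
    next_label = 0
    for start in dense:
        if start in bin_label:
            continue
        bin_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cur, off))
                if nb in dense and nb not in bin_label:
                    bin_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Propagate bin labels back to the individual points.
    labels = np.full(len(X), -1)
    for key, idx in dense.items():
        labels[inverse == idx] = bin_label[key]
    return labels


# Two tight groups of five points each, plus one isolated point.
demo = np.array(
    [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2], [0.1, 0.0],
     [10.0, 10.0], [9.9, 9.9], [9.8, 10.0], [10.0, 9.8], [9.9, 10.0],
     [5.0, 5.0]]
)
labels = grid_density_cluster(demo, n_bins=4, min_count=3)
print(labels.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

After the initial binning pass, the search for clusters operates on occupied bins rather than on individual cells, which is the main reason this family of methods can be fast when the number of cells is very large. The bin width and density threshold strongly affect the result, which is the kind of choice an automated parameter tuning procedure is meant to handle.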