DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data
Bobby Ranjan, Wenjie Sun, Jinyu Park et al.|Nature Communications|2021
Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. Additionally, DUBStepR was the only method to robustly deconvolve T and NK heterogeneity by identifying disease-associated common and rare cell types and subtypes in PBMCs from rheumatoid arthritis patients. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.
scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data
BACKGROUND: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. RESULTS: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. CONCLUSIONS: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.
DUBStepR: correlation-based feature selection for clustering single-cell RNA sequencing data
Bobby Ranjan, Wenjie Sun, Jinyu Park et al.|bioRxiv (Cold Spring Harbor Laboratory)|2020
Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single cell clustering pipelines. However, we found that the performance of existing feature selection methods was inconsistent across benchmark datasets, and occasionally even worse than without feature selection. Moreover, existing methods ignored information contained in gene-gene correlations. We therefore developed DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. In a published scRNA-seq dataset from sorted monocytes, DUBStepR sensitively detected a rare and previously invisible population of contaminating basophils. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.
A case of isodicentric chromosome 15 presented with epilepsy and developmental delay
Jon Soo Kim, Jinyu Park, Byung-Joo Min et al.|Korean Journal of Pediatrics|2012
We report a case of isodicentric chromosome 15 (idic(15) chromosome), the presence of which resulted in uncontrolled seizures (including epileptic spasms and tonic seizures) and global developmental delay. A 10-month-old female infant was referred to our pediatric neurology clinic because of uncontrolled seizures and global developmental delay. She had experienced generalized tonic-clonic seizures since 7 months of age. At referral, she could not control her head and presented with generalized hypotonia. Her brain magnetic resonance imaging scans and metabolic evaluation results were normal. Routine karyotyping indicated the presence of a supernumerary marker chromosome of unknown origin (47, XX +mar). An array-comparative genomic hybridization (CGH) analysis revealed amplification from 15q11.1 to 15q13.1. Subsequent fluorescence in situ hybridization analysis confirmed an idic(15) chromosome. Array-CGH analysis is advantageous for determining the origin of a supernumerary marker chromosome, and could be a useful method for the genetic diagnosis of epilepsy syndromes associated with various chromosomal aberrations.
scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data
Bobby Ranjan, Florian Schmidt, Wenjie Sun et al.|bioRxiv (Cold Spring Harbor Laboratory)|2020
Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.