High-sensitivity pattern discovery in large, paired multiomic datasets

Andrew R. Ghazi (Broad Institute), Kathleen Sucipto (Harvard University), Ali Rahnavard (Broad Institute), Eric A. Franzosa (Broad Institute), Lauren J. McIver (Broad Institute), Jason Lloyd‐Price (Broad Institute), Emma Schwager (Harvard University), George Weingart (Harvard University), Yo Sup Moon (Harvard University), Xochitl C. Morgan (University of Otago), Levi Waldron (City University of New York), Curtis Huttenhower (Broad Institute)
Bioinformatics
April 26, 2022
Cited by 79 | Open Access

Abstract

MOTIVATION: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features is essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.

RESULTS: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, between the microbiome and host transcriptome, and between metabolomic profiling and human health phenotypes.

AVAILABILITY AND IMPLEMENTATION: An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
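For orientation, the all-against-all baseline that the abstract compares against can be sketched as follows. This is not HAllA's hierarchical procedure and not its API. It is a minimal illustration that assumes the pairwise p-values have already been computed with some similarity measure, and that assumes Benjamini-Hochberg as the FDR correction, which the abstract does not specify. The function names `benjamini_hochberg` and `all_against_all` are hypothetical.

```python
from typing import List, Tuple


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per p-value under the BH step-up procedure.

    Assumption: BH is used here only as a familiar FDR method.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Step-up rule: reject every hypothesis ranked at or below max_k.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


def all_against_all(p_grid: List[List[float]], alpha: float = 0.05) -> List[Tuple[int, int]]:
    """Apply BH across every (X feature, Y feature) pair in a p-value grid.

    p_grid[i][j] is the (precomputed, hypothetical) p-value for the
    association between feature i of one dataset and feature j of the other.
    Returns the (i, j) index pairs declared significant.
    """
    n_cols = len(p_grid[0]) if p_grid else 0
    flat = [p for row in p_grid for p in row]
    flags = benjamini_hochberg(flat, alpha)
    return [(k // n_cols, k % n_cols) for k, keep in enumerate(flags) if keep]
```

Because every pair is tested individually, the number of hypotheses grows with the product of the two feature counts, which is the multiple-testing burden that block-wise approaches aim to reduce.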

