Metin Balaban

Greengenes2 unifies microbial data in a single reference tree

Daniel McDonald, Yueyu Jiang, Metin Balaban et al.|Nature Biotechnology|2023

Cited by 571|Open Access

Studies using 16S rRNA and shotgun metagenomics typically yield different results, usually attributed to PCR amplification biases. We introduce Greengenes2, a reference tree that unifies genomic and 16S rRNA databases in a consistent, integrated resource. By inserting sequences into a whole-genome phylogeny, we show that 16S rRNA and shotgun metagenomic data generated from the same samples agree in principal coordinates space, taxonomy and phenotype effect size when analyzed with the same tree.

Complexity of avian evolution revealed by family-level genomes

Josefin Stiller, Shaohong Feng, Al-Aabid Chowdhury et al.|Nature|2024

Cited by 271|Open Access

Despite tremendous efforts in the past decades, relationships among main avian lineages remain heavily debated without a clear resolution. Discrepancies have been attributed to diversity of species sampled, phylogenetic method and the choice of genomic regions [1–3]. Here we address these issues by analysing the genomes of 363 bird species [4] (218 taxonomic families, 92% of total). Using intergenic regions and coalescent methods, we present a well-supported tree but also a marked degree of discordance. The tree confirms that Neoaves experienced rapid radiation at or near the Cretaceous–Palaeogene boundary. Sufficient loci rather than extensive taxon sampling were more effective in resolving difficult nodes. Remaining recalcitrant nodes involve species that are a challenge to model due to either extreme DNA composition, variable substitution rates, incomplete lineage sorting or complex evolutionary events such as ancient hybridization. Assessment of the effects of different genomic partitions showed high heterogeneity across the genome. We discovered sharp increases in effective population size, substitution rates and relative brain size following the Cretaceous–Palaeogene extinction event, supporting the hypothesis that emerging ecological opportunities catalysed the diversification of modern birds. The resulting phylogenetic estimate offers fresh insights into the rapid radiation of modern birds and provides a taxon-rich backbone tree for future comparative studies.

Multi-level analysis of the gut–brain axis shows autism spectrum disorder-associated molecular and microbial profiles

James T. Morton, Dong-Min Jin, Robert H. Mills et al.|Nature Neuroscience|2023

Cited by 203|Open Access

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by heterogeneous cognitive, behavioral and communication impairments. Disruption of the gut-brain axis (GBA) has been implicated in ASD although with limited reproducibility across studies. In this study, we developed a Bayesian differential ranking algorithm to identify ASD-associated molecular and taxa profiles across 10 cross-sectional microbiome datasets and 15 other datasets, including dietary patterns, metabolomics, cytokine profiles and human brain gene expression profiles. We found a functional architecture along the GBA that correlates with heterogeneity of ASD phenotypes, and it is characterized by ASD-associated amino acid, carbohydrate and lipid profiles predominantly encoded by microbial species in the genera Prevotella, Bifidobacterium, Desulfovibrio and Bacteroides and correlates with brain gene expression changes, restrictive dietary patterns and pro-inflammatory cytokine profiles. The functional architecture revealed in age-matched and sex-matched cohorts is not present in sibling-matched cohorts. We also show a strong association between temporal changes in microbiome composition and ASD phenotypes. In summary, we propose a framework to leverage multi-omic datasets from well-defined cohorts and investigate how the GBA influences ASD.

TreeCluster: Clustering biological sequences using phylogenetic trees

Metin Balaban, Niema Moshiri, Uyen Mai et al.|PLoS ONE|2019

Cited by 200|Open Access

Clustering homologous sequences based on their similarity is a problem that appears in many bioinformatics applications. The fact that sequences cluster is ultimately the result of their phylogenetic relationships. Despite this observation and the natural ways in which a tree can define clusters, most applications of sequence clustering do not use a phylogenetic tree and instead operate on pairwise sequence distances. Due to advances in large-scale phylogenetic inference, we argue that tree-based clustering is under-utilized. We define a family of optimization problems that, given an arbitrary tree, return the minimum number of clusters such that all clusters adhere to constraints on their heterogeneity. We study three specific constraints, limiting (1) the diameter of each cluster, (2) the sum of its branch lengths, or (3) chains of pairwise distances. These three problems can be solved in time that increases linearly with the size of the tree, and for two of the three criteria, the algorithms have been known in the theoretical computer science literature. We implement these algorithms in a tool called TreeCluster, which we test on three applications: OTU clustering for microbiome data, HIV transmission clustering, and divide-and-conquer multiple sequence alignment. We show that, by using tree-based distances, TreeCluster generates more internally consistent clusters than alternatives and improves the effectiveness of downstream applications. TreeCluster is available at https://github.com/niemasd/TreeCluster.
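
As a rough illustration of the first constraint (a cap on cluster diameter), the sketch below cuts a rooted tree bottom-up so that no two leaves left in the same cluster are farther apart than a threshold. This is not TreeCluster's implementation: the function name `max_diameter_clusters`, the dictionary-based tree encoding, and the toy tree are assumptions made for this example. It sorts child heights at every node, so it runs in O(n log n) rather than the linear time the abstract describes.

```python
def max_diameter_clusters(children, root, threshold):
    """Greedy sketch: split a rooted tree into leaf clusters whose
    pairwise leaf-to-leaf path length never exceeds `threshold`.

    children: dict mapping node -> list of (child, branch_length);
              leaves are nodes with no entry (hypothetical encoding).
    """
    cut = set()  # child nodes whose edge to the parent is removed

    def height(node):
        kids = children.get(node, [])
        if not kids:
            return 0.0
        # Distance from `node` to the farthest still-attached leaf via each child.
        reach = sorted(((height(c) + w, c) for c, w in kids), key=lambda t: t[0])
        # While the two deepest branches form a path longer than the
        # threshold, detach the deepest one.
        while len(reach) >= 2 and reach[-1][0] + reach[-2][0] > threshold:
            cut.add(reach.pop()[1])
        return reach[-1][0]

    height(root)

    clusters = []

    def collect(node, bucket):
        kids = children.get(node, [])
        if not kids:
            bucket.append(node)
        for c, _ in kids:
            if c in cut:
                fresh = []  # a cut edge starts a new cluster
                clusters.append(fresh)
                collect(c, fresh)
            else:
                collect(c, bucket)

    top = []
    clusters.append(top)
    collect(root, top)
    return [sorted(c) for c in clusters]


# Toy tree ((A:1,B:1):1,(C:1,D:1):1); A-B and C-D are 2 apart, A-C is 4 apart.
toy = {
    "root": [("ab", 1.0), ("cd", 1.0)],
    "ab": [("A", 1.0), ("B", 1.0)],
    "cd": [("C", 1.0), ("D", 1.0)],
}
print(max_diameter_clusters(toy, "root", 2.5))  # [['A', 'B'], ['C', 'D']]
```

With a threshold of 4 or more the toy tree stays in one cluster, and with a threshold of 1 every leaf becomes its own cluster.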

Tumour evolution and microenvironment interactions in 2D and 3D space

Chia-Kuei Mo, Jingxian Liu, Siqi Chen et al.|Nature|2024

Cited by 137|Open Access

[…], we here examined a cohort of 131 tumour sections from 78 cases across 6 cancer types by Visium spatial transcriptomics (ST). This was combined with 48 matched single-nucleus RNA sequencing samples and 22 matched co-detection by indexing (CODEX) samples. To describe tumour structures and habitats, we defined 'tumour microregions' as spatially distinct cancer cell clusters separated by stromal components. They varied in size and density among cancer types, with the largest microregions observed in metastatic samples. We further grouped microregions with shared genetic alterations into 'spatial subclones'. Thirty-five tumour sections exhibited subclonal structures. Spatial subclones with distinct copy number variations and mutations displayed differential oncogenic activities. We identified increased metabolic activity at the centre and increased antigen presentation along the leading edges of microregions. We also observed variable T cell infiltrations within microregions and macrophages predominantly residing at tumour boundaries. We reconstructed 3D tumour structures by co-registering 48 serial ST sections from 16 samples, which provided insights into the spatial organization and heterogeneity of tumours. Additionally, using an unsupervised deep-learning algorithm and integrating ST and CODEX data, we identified both immune hot and cold neighbourhoods and enhanced immune exhaustion markers surrounding the 3D subclones. These findings contribute to the understanding of spatial tumour evolution through interactions with the local microenvironment in 2D and 3D space, providing valuable insights into tumour biology.

Top publications by citations