The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity
The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. 
This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
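The triplet and simplex idea described above can be illustrated with a minimal sketch. The abstract does not state how simplex coordinates are computed, so the normalization below (a "splicing ratio" that scales the exon junction chain count by the number of start and end sites) is an assumption, and the function names `triplet_simplex` and `dominant_mechanism` are hypothetical.

```python
from collections import defaultdict


def triplet_simplex(transcripts):
    """Map each gene to a point on a 3-way simplex.

    transcripts: iterable of (gene, tss_id, ec_id, tes_id) triplets, where
    the three IDs label the transcript start site, exon junction chain,
    and transcript end site of one transcript.
    """
    tss, ec, tes = defaultdict(set), defaultdict(set), defaultdict(set)
    for gene, tss_id, ec_id, tes_id in transcripts:
        tss[gene].add(tss_id)
        ec[gene].add(ec_id)
        tes[gene].add(tes_id)

    coords = {}
    for gene in tss:
        n_tss, n_ec, n_tes = len(tss[gene]), len(ec[gene]), len(tes[gene])
        # ASSUMPTION: scale the junction-chain count so that it is
        # comparable to the start- and end-site counts.
        splicing = 2 * n_ec / (n_tss + n_tes)
        total = n_tss + splicing + n_tes
        # Coordinates are non-negative and sum to 1.
        coords[gene] = (n_tss / total, splicing / total, n_tes / total)
    return coords


def dominant_mechanism(point):
    """Return the label of the largest simplex coordinate."""
    labels = ("TSS", "splicing", "TES")
    return labels[max(range(3), key=lambda i: point[i])]


if __name__ == "__main__":
    # Toy gene with three start sites but one junction chain and one end site.
    toy = [
        ("geneA", "tss1", "ec1", "tes1"),
        ("geneA", "tss2", "ec1", "tes1"),
        ("geneA", "tss3", "ec1", "tes1"),
    ]
    point = triplet_simplex(toy)["geneA"]
    print(point, dominant_mechanism(point))
```

A gene whose transcripts differ mostly in their start sites lands near the TSS corner of the simplex, which is one way to read the "clear bias toward one of the three diversity mechanisms" reported above.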
Regulatory enhancer profiling of mesenchymal-type gastric cancer reveals subtype-specific epigenomic landscapes and targetable vulnerabilities
Objective: Gastric cancer (GC) comprises multiple molecular subtypes. Recent studies have highlighted mesenchymal-subtype GC (Mes-GC) as a clinically aggressive subtype with few treatment options. Combining multiple studies, we derived and applied a consensus Mes-GC classifier to define the Mes-GC enhancer landscape revealing disease vulnerabilities. Design: Transcriptomic profiles of ~1000 primary GCs and cell lines were analysed to derive a consensus Mes-GC classifier. Clinical and genomic associations were performed across >1200 patients with GC. Genome-wide epigenomic profiles (H3K27ac, H3K4me1 and assay for transposase-accessible chromatin with sequencing (ATAC-seq)) of 49 primary GCs and GC cell lines were generated to identify Mes-GC-specific enhancer landscapes. Upstream regulators and downstream targets of Mes-GC enhancers were interrogated using chromatin immunoprecipitation followed by sequencing (ChIP-seq), RNA sequencing, CRISPR/Cas9 editing, functional assays and pharmacological inhibition. Results: We identified and validated a 993-gene cancer-cell intrinsic Mes-GC classifier applicable to retrospective cohorts or prospective single samples. Multicohort analysis of Mes-GCs confirmed associations with poor patient survival, therapy resistance and few targetable genomic alterations. Analysis of enhancer profiles revealed a distinctive Mes-GC epigenomic landscape, with TEAD1 as a master regulator of Mes-GC enhancers and Mes-GCs exhibiting preferential sensitivity to TEAD1 pharmacological inhibition. Analysis of Mes-GC super-enhancers also highlighted NUAK1 kinase as a downstream target, with synergistic effects observed between NUAK1 inhibition and cisplatin treatment. Conclusion: Our results establish a consensus Mes-GC classifier applicable to multiple transcriptomic scenarios. 
Mes-GCs exhibit a distinct epigenomic landscape, and TEAD1 inhibition and combinatorial NUAK1 inhibition/cisplatin may represent potential targetable options.
Comprehensive molecular phenotyping of ARID1A-deficient gastric cancer reveals pervasive epigenomic reprogramming and therapeutic opportunities
Objective: Gastric cancer (GC) is a leading cause of cancer mortality, with ARID1A being the second most frequently mutated driver gene in GC. We sought to decipher ARID1A-specific GC regulatory networks and examine therapeutic vulnerabilities arising from ARID1A loss. Design: Genomic profiling of GC patients including a Singapore cohort (>200 patients) was performed to derive mutational signatures of ARID1A inactivation across molecular subtypes. Single-cell transcriptomic profiles of ARID1A-mutated GCs were analysed to examine tumour microenvironmental changes arising from ARID1A loss. Genome-wide ARID1A binding and chromatin profiles (H3K27ac, H3K4me3, H3K4me1, ATAC-seq) were generated to identify gastric-specific epigenetic landscapes regulated by ARID1A. Distinct cancer hallmarks of ARID1A-mutated GCs converged at the genomic, single-cell and epigenomic levels, and were targeted by pharmacological inhibition. Results: We observed prevalent ARID1A inactivation across GC molecular subtypes, associated with distinct mutational signatures and linked to a NFKB-driven proinflammatory tumour microenvironment. ARID1A depletion caused loss of H3K27ac activation signals at ARID1A-occupied distal enhancers but, unexpectedly, gain of H3K27ac at ARID1A-occupied promoters in genes such as NFKB1 and NFKB2. Promoter activation in ARID1A-mutated GCs was associated with enhanced gene expression, increased BRD4 binding, and reduced HDAC1 and CTCF occupancy. Combined targeting of promoter activation and tumour inflammation via bromodomain and NFKB inhibitors confirmed therapeutic synergy specific to ARID1A genomic status. Conclusion: Our results suggest a therapeutic strategy for ARID1A-mutated GCs targeting both tumour-intrinsic (BRD4-associated promoter activation) and extrinsic (NFKB immunomodulation) cancer phenotypes.
Integrative epigenomic and high-throughput functional enhancer profiling reveals determinants of enhancer heterogeneity in gastric cancer
BACKGROUND: Enhancers are distal cis-regulatory elements required for cell-specific gene expression and cell fate determination. In cancer, enhancer variation has been proposed as a major cause of inter-patient heterogeneity; however, most predicted enhancer regions remain to be functionally tested. METHODS: We analyzed 132 epigenomic histone modification profiles of 18 primary gastric cancer (GC) samples, 18 normal gastric tissues, and 28 GC cell lines using Nano-ChIP-seq technology. We applied Capture-based Self-Transcribing Active Regulatory Region sequencing (CapSTARR-seq) to assess functional enhancer activity. An Activity-by-contact (ABC) model was employed to explore the effects of histone acetylation and CapSTARR-seq levels on enhancer-promoter interactions. RESULTS: We report a comprehensive catalog of 75,730 recurrent predicted enhancers, the majority of which are GC-associated in vivo (>50,000) and associated with lower somatic mutation rates inferred by whole-genome sequencing. Applying CapSTARR-seq to the enhancer catalog, we observed significant correlations between CapSTARR-seq functional activity and H3K27ac/H3K4me1 levels. Super-enhancer regions exhibited increased CapSTARR-seq signals compared to regular enhancers, even when decoupled from native chromatin contexture. We show that combining histone modification and CapSTARR-seq functional enhancer data improves the prediction of enhancer-promoter interactions and pinpointing of germline single nucleotide polymorphisms (SNPs), somatic copy number alterations (SCNAs), and trans-acting TFs involved in GC expression. We identified cancer-relevant genes (ING1, ARL4C) whose expression between patients is influenced by enhancer differences in genomic copy number and germline SNPs, and HNF4α as a master trans-acting factor associated with GC enhancer heterogeneity. 
CONCLUSIONS: Our results indicate that combining histone modification and functional assay data may provide a more accurate metric to assess enhancer activity than either platform individually, providing insights into the relative contribution of genetic (cis) and regulatory (trans) mechanisms to GC enhancer functional heterogeneity.
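As a rough illustration of the Activity-by-contact idea mentioned above, the sketch below scores each candidate enhancer for one promoter as activity times contact, normalized over all candidates. Two choices are assumptions rather than the authors' stated method: activity is taken as the geometric mean of H3K27ac and CapSTARR-seq signal, and contact is approximated by a power-law decay with genomic distance instead of measured chromatin contact data. The function name `abc_scores` and its parameters are hypothetical.

```python
import math


def abc_scores(elements, promoter_pos, gamma=1.0, min_dist=5000):
    """Return one ABC-style score per candidate enhancer.

    elements: list of dicts with keys 'pos' (bp), 'h3k27ac', and 'starr'.
    promoter_pos: genomic position (bp) of the target promoter.
    gamma: ASSUMED exponent of the distance-based contact decay.
    min_dist: floor on distance so very close elements do not dominate.
    """
    raw = []
    for el in elements:
        # ASSUMPTION: activity combines both signals as a geometric mean.
        activity = math.sqrt(el["h3k27ac"] * el["starr"])
        dist = max(abs(el["pos"] - promoter_pos), min_dist)
        # ASSUMPTION: contact frequency falls off as distance ** -gamma.
        contact = dist ** (-gamma)
        raw.append(activity * contact)

    total = sum(raw)
    # Normalize so the scores for this promoter sum to 1.
    return [r / total if total > 0 else 0.0 for r in raw]


if __name__ == "__main__":
    candidates = [
        {"pos": 10_000, "h3k27ac": 4.0, "starr": 1.0},
        {"pos": 100_000, "h3k27ac": 4.0, "starr": 1.0},
    ]
    print(abc_scores(candidates, promoter_pos=0))
```

With equal activity, the nearer element receives the larger share of the score; raising either the H3K27ac or the CapSTARR-seq signal of the distal element shifts the balance back, which is the sense in which combining the two data types can change predicted enhancer-promoter links.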
An expanded registry of candidate cis-regulatory elements
Mammalian genomes contain millions of regulatory elements that control the complex patterns of gene expression [1]. Previously, the ENCODE consortium mapped biochemical signals across hundreds of cell types and tissues and integrated these data to develop a registry containing 0.9 million human and 300,000 mouse candidate cis-regulatory elements (cCREs) annotated with potential functions [2]. Here we have expanded the registry to include 2.37 million human and 967,000 mouse cCREs, leveraging new ENCODE datasets and enhanced computational methods. This expanded registry covers hundreds of unique cell and tissue types, providing a comprehensive understanding of gene regulation. Functional characterization data from assays such as STARR-seq [3], massively parallel reporter assay [4], CRISPR perturbation [5,6] and transgenic mouse assays [7] have profiled more than 90% of human cCREs, revealing complex regulatory functions. We identified thousands of novel silencer cCREs and demonstrated their dual enhancer and silencer roles in different cellular contexts. Integrating the registry with other ENCODE annotations facilitates genetic variation interpretation and trait-associated gene identification, exemplified by the identification of KLF1 as a novel causal gene for red blood cell traits. This expanded registry is a valuable resource for studying the regulatory genome and its impact on health and disease.