CORUM: the comprehensive resource of mammalian protein complexes—2019
CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (67%), mouse (15%) and rat (10%). Given the vital functions of these macromolecular machines, their identification and functional characterization are foundational to our understanding of normal and disease biology. The new CORUM 3.0 release encompasses 4274 protein complexes, offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 4473 different genes, representing 22% of the protein-coding genes in humans. Protein complexes are described by a protein complex name, subunit composition, cellular functions and literature references. Information about the stoichiometry of subunits depends on the availability of experimental data. Recent developments include a graphical tool displaying known interactions between subunits. This allows the prediction of structural interconnections within protein complexes of unknown structure. In addition, we present a set of 58 protein complexes with alternatively spliced subunits. These were found to affect cellular functions such as regulation of apoptotic activity and protein complex assembly, or to define cellular localization. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/corum/.
CORUM: the comprehensive resource of mammalian protein complexes–2022
The CORUM database has been providing comprehensive reference information about experimentally characterized mammalian protein complexes and their associated biological and biomedical properties since 2007. Given that most catalytic and regulatory functions of the cell are carried out by protein complexes, their composition and characterization are of the greatest importance in basic and disease biology. The new CORUM 4.0 release encompasses 5204 protein complexes, offering the largest and most comprehensive publicly available dataset of manually curated mammalian protein complexes. The CORUM dataset is built from 5299 different genes, representing 26% of the protein-coding genes in humans. Complex information from 3354 scientific articles is mainly obtained from human (70%), mouse (16%) and rat (9%) cells and tissues. Recent curation work includes sets of protein complexes, Functional Complex Groups, that offer comprehensive collections of published data in specific biological processes and molecular functions. In addition, a new graphical analysis tool was implemented that displays co-expression data from the subunits of protein complexes. CORUM is freely accessible at http://mips.helmholtz-muenchen.de/corum/.
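The record fields and coverage figures quoted for CORUM 3.0 and 4.0 can be illustrated with a minimal Python sketch. The `ProteinComplex` class, its field names, the helper functions and the toy entries are hypothetical illustrations and do not reflect the actual CORUM schema or download format. Only the gene counts (4473, 5299) and coverage percentages (22%, 26%) come from the abstracts, and because those percentages are rounded, the implied gene totals are approximate.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class ProteinComplex:
    """Hypothetical record with the fields the abstracts describe."""
    name: str
    subunits: FrozenSet[str]          # gene symbols of the subunits
    organism: str                     # e.g. "Human", "Mouse", "Rat"
    functions: Tuple[str, ...] = ()   # cellular function annotations
    references: Tuple[str, ...] = ()  # literature references


def distinct_genes(complexes: Iterable[ProteinComplex]) -> Set[str]:
    """Union of subunit genes across all complexes."""
    genes: Set[str] = set()
    for c in complexes:
        genes |= c.subunits
    return genes


def implied_gene_total(genes_in_dataset: int, coverage: float) -> int:
    """Gene total implied by 'N genes represent X% of all genes'."""
    return round(genes_in_dataset / coverage)


# Toy entries with placeholder names (not real CORUM records).
toy = [
    ProteinComplex("complex A", frozenset({"GENE1", "GENE2"}), "Human"),
    ProteinComplex("complex B", frozenset({"GENE2", "GENE3"}), "Human"),
]
print(len(distinct_genes(toy)))          # shared subunits are counted once

# Figures quoted in the abstracts for releases 3.0 and 4.0.
print(implied_gene_total(4473, 0.22))    # approx. 20,300 genes
print(implied_gene_total(5299, 0.26))    # approx. 20,400 genes
```

Both releases imply a similar total of human protein-coding genes, so the reported increase from 22% to 26% coverage is consistent with the growth from 4473 to 5299 genes.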
Parallel sequencing of extrachromosomal circular DNAs and transcriptomes in single cancer cells
Extrachromosomal DNAs (ecDNAs) are common in cancer, but many questions about their origin, structural dynamics and impact on intratumor heterogeneity are still unresolved. Here we describe single-cell extrachromosomal circular DNA and transcriptome sequencing (scEC&T-seq), a method for parallel sequencing of circular DNAs and full-length mRNA from single cells. By applying scEC&T-seq to cancer cells, we describe intercellular differences in ecDNA content while investigating their structural heterogeneity and transcriptional impact. Oncogene-containing ecDNAs were clonally present in cancer cells and drove intercellular oncogene expression differences. In contrast, other small circular DNAs were exclusive to individual cells, indicating differences in their selection and propagation. Intercellular differences in ecDNA structure pointed to circular recombination as a mechanism of ecDNA evolution. These results demonstrate scEC&T-seq as an approach to systematically characterize both small and large circular DNA in cancer cells, which will facilitate the analysis of these DNA elements in cancer and beyond.
Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures
Cancer genomes harbor a broad spectrum of structural variants (SVs) driving tumorigenesis, a relevant subset of which escape discovery using short-read sequencing. We employed Oxford Nanopore Technologies (ONT) long-read sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landscape. We assembled complex rearrangements, including a 1.55-Mbp chromothripsis event, and we uncover a complex SV pattern termed templated insertion (TI) thread, characterized by short (mostly <1 kb) insertions showing prevalent self-concatenation into highly amplified structures of up to 50 kbp in size. TI threads occur in 3% of cancers, with a prevalence up to 74% in liposarcoma, and frequent colocalization with chromothripsis. We also perform long-read-based methylome profiling and discover allele-specific methylation (ASM) effects, complex rearrangements exhibiting differential methylation, and differential promoter methylation in cancer-driver genes. Our study shows the advantage of long-read sequencing in the discovery and characterization of complex somatic rearrangements.