Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.

Pathogenic Germline Variants in 10,389 Adult Cancers

Human polymorphism at microRNAs and microRNA target sites
Matthew A. Saunders, Han Liang, Wen‐Hsiung Li|Proceedings of the National Academy of Sciences|2007
MicroRNAs (miRNAs) function as endogenous translational repressors of protein-coding genes in animals by binding to target sites in the 3' UTRs of mRNAs. Because a single nucleotide change in the sequence of a target site can affect miRNA regulation, naturally occurring SNPs in target sites are candidates for functional variation that may be of interest for biomedical applications and evolutionary studies. However, little is known to date about variation among humans at miRNAs and their target sites. In this study, we analyzed publicly available SNP data in context with miRNAs and their target sites throughout the human genome, and we found a relatively low level of variation in functional regions of miRNAs, but an appreciable level of variation at target sites. Approximately 400 SNPs were found at experimentally verified target sites or predicted target sites that are otherwise evolutionarily conserved across mammals. Moreover, approximately 250 SNPs potentially create novel target sites for miRNAs in humans. If some variants have functional effects, they might confer phenotypic differences among humans. Although the majority of these SNPs appear to be evolving under neutrality, interestingly, some of these SNPs are found at relatively high population frequencies even in experimentally verified targets, and a few variants are associated with atypically long-range haplotypes that may have been subject to recent positive selection.

lncRNA Epigenetic Landscape Analysis Identifies EPIC1 as an Oncogenic lncRNA that Interacts with MYC and Promotes Cell-Cycle Progression in Cancer
Zehua Wang, Bo Yang, Min Zhang et al.|Cancer Cell|2018

Somatic Mutational Landscape of Splicing Factor Genes and Their Functional Consequences across 33 Cancer Types
Hotspot mutations in splicing factor genes have been recently reported at high frequency in hematological malignancies, suggesting the importance of RNA splicing in cancer. We analyzed whole-exome sequencing data across 33 tumor types in The Cancer Genome Atlas (TCGA), and we identified 119 splicing factor genes with significant non-silent mutation patterns, including mutation over-representation, recurrent loss of function (tumor suppressor-like), or hotspot mutation profile (oncogene-like). Furthermore, RNA sequencing analysis revealed altered splicing events associated with selected splicing factor mutations. In addition, we were able to identify common gene pathway profiles associated with the presence of these mutations. Our analysis suggests that somatic alteration of genes involved in the RNA-splicing process is common in cancer and may represent an underappreciated hallmark of tumorigenesis.