Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation
Sequence-based machine-learning models trained on genomics data improve genetic variant interpretation by providing functional predictions describing their impact on the cis-regulatory code. However, current tools do not predict RNA-seq expression profiles because of modeling challenges. Here, we introduce Borzoi, a model that learns to predict cell-type-specific and tissue-specific RNA-seq coverage from DNA sequence. Using statistics derived from Borzoi's predicted coverage, we isolate and accurately score DNA variant effects across multiple layers of regulation, including transcription, splicing and polyadenylation. Evaluated on quantitative trait loci, Borzoi is competitive with and often outperforms state-of-the-art models trained on individual regulatory functions. By applying attribution methods to the derived statistics, we extract cis-regulatory motifs driving RNA expression and post-transcriptional regulation in normal tissues. The wide availability of RNA-seq data across species, conditions and assays profiling specific aspects of regulation emphasizes the potential of this approach to decipher the mapping from DNA sequence to regulatory function.
scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks
Stage-Specific Human Induced Pluripotent Stem Cells Map the Progression of Myeloid Transformation to Transplantable Leukemia
Drosophila Muller F Elements Maintain a Distinct Set of Genomic Properties Over 40 Million Years of Evolution
Wilson Leung, C. Shaffer, Laura K Reed et al. | G3 Genes Genomes Genetics | 2015
The Muller F element (4.2 Mb, ~80 protein-coding genes) is an unusual autosome of Drosophila melanogaster; it is mostly heterochromatic with a low recombination rate. To investigate how these properties impact the evolution of repeats and genes, we manually improved the sequence and annotated the genes on the D. erecta, D. mojavensis, and D. grimshawi F elements and euchromatic domains from the Muller D element. We find that F elements have greater transposon density (25-50%) than euchromatic reference regions (3-11%). Among the F elements, D. grimshawi has the lowest transposon density (particularly DINE-1: 2% vs. 11-27%). F element genes have larger coding spans, more coding exons, larger introns, and lower codon bias. Comparison of the Effective Number of Codons with the Codon Adaptation Index shows that, in contrast to the other species, codon bias in D. grimshawi F element genes can be attributed primarily to selection instead of mutational biases, suggesting that density and types of transposons affect the degree of local heterochromatin formation. F element genes have lower estimated DNA melting temperatures than D element genes, potentially facilitating transcription through heterochromatin. Most F element genes (~90%) have remained on that element, but the F element has smaller syntenic blocks than genome averages (3.4-3.6 vs. 8.4-8.8 genes per block), indicating greater rates of inversion despite lower rates of recombination.
Overall, the F element has maintained characteristics that are distinct from other autosomes in the Drosophila lineage, illuminating the constraints imposed by a heterochromatic milieu.
Individual cell types in C. elegans age differently and activate distinct cell-protective responses
Aging is characterized by a global decline in physiological function. However, by constructing a complete single-cell gene expression atlas, we find that Caenorhabditis elegans aging is not random in nature but instead is characterized by coordinated changes in functionally related metabolic, proteostasis, and stress-response genes in a cell-type-specific fashion, with downregulation of energy metabolism being the only nearly universal change. Similarly, the rates at which cells age differ significantly between cell types. In some cell types, aging is characterized by an increase in cell-to-cell variance, whereas in others, variance actually decreases. Remarkably, multiple resilience-enhancing transcription factors known to extend lifespan are activated across many cell types with age; we discovered new longevity candidates, such as GEI-3, among these. Together, our findings suggest that cells do not age passively but instead react strongly, and individualistically, to events that occur during aging. This atlas can be queried through a public interface.