Mouse genomic variation and its effect on phenotypes and gene regulation
We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
The laboratory mouse has become the workhorse of biomedical research. The draft sequence of the mouse reference genome was published in 2002, but some forms of variation are still poorly documented. Two papers in this issue go a long way towards filling the gaps. The generation and analysis of sequence from 17 key mouse genomes, including most of the commonly used inbred strains and their progenitors, reveal extensive genetic variation and provide insights into the molecular nature of functional variants as well as the phylogenetic history of the lab mouse. The data will be an important resource for a new era of functional analysis. The second paper describes the landscape of structural variants in the genomes of 13 classical and four wild-derived inbred mouse strains, mapping many of them to base-pair resolution. Despite their prevalence, structural variants are shown to have a relatively small impact on phenotypic variation.
Sequence-based characterization of structural variation in the mouse genome
The genomic landscape shaped by selection on transposable elements across 18 mouse strains
BACKGROUND: Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined.
RESULTS: Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected.
CONCLUSIONS: Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.
Elusive Copy Number Variation in the Mouse Genome
BACKGROUND: Array comparative genomic hybridization (aCGH) to detect copy number variants (CNVs) in mammalian genomes has led to a growing awareness of the potential importance of this category of sequence variation as a cause of phenotypic variation. Yet there are large discrepancies between studies, so that the extent of the genome affected by CNVs is unknown. We combined molecular and aCGH analyses of CNVs in inbred mouse strains to investigate this question.
PRINCIPAL FINDINGS: Using a 2.1 million probe array we identified 1,477 deletions and 499 gains in 7 inbred mouse strains. Molecular characterization indicated that approximately one third of the CNVs detected by the array were false positives and we estimate the false negative rate to be more than 50%. We show that low concordance between studies is largely due to the molecular nature of CNVs, many of which consist of a series of smaller deletions and gains interspersed by regions where the DNA copy number is normal.
CONCLUSIONS: Our results indicate that CNVs detected by arrays may be the coincidental co-localization of smaller CNVs, whose presence is more likely to perturb an aCGH hybridization profile than the effect of an isolated, small, copy number alteration. Our findings help explain the hitherto unexplored discrepancies between array-based studies of copy number variation in the mouse genome.
Implementing Agent-based Web Services
As part of the Agentcities project, we have developed a prototype of an Evening Organiser application which allows users to flexibly and dynamically schedule activities within an itinerary. The Evening Organiser and the Web-accessible restaurant and cinema services which it uses have been developed within a generic service environment, which is implemented using the April Agent Platform, the DAML+OIL ontology language, the DAML Query Language and the Java Theorem Prover. This service environment is populated with agents of different kinds, such as service instances and service finders. Service instances represent individual business entities, such as restaurants and cinemas. Service finders represent aggregated views over service instances, such as Yahoo!-hosted restaurants or Citysearch-hosted cinemas. The details of the implementation of these Web Services are described through the use of a motivating scenario.