Identification of a Coordinate Regulator of Interleukins 4, 13, and 5 by Cross-Species Sequence Comparisons
Long-range regulatory elements are difficult to discover experimentally; however, they tend to be conserved among mammals, suggesting that cross-species sequence comparisons should identify them. To search for regulatory sequences, we examined about 1 megabase of orthologous human and mouse sequence for conserved noncoding elements with at least 70% identity over at least 100 base pairs. Ninety noncoding sequences meeting these criteria were discovered, and analysis of 15 of these elements showed that about 70% were conserved across mammals. Characterization of the largest element in yeast artificial chromosome (YAC) transgenic mice revealed it to be a coordinate regulator of three genes spread over 120 kilobases: interleukin-4, interleukin-13, and interleukin-5.
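The conservation criterion above (at least 70% identity over at least 100 bp) can be illustrated with a sliding-window scan over a pairwise alignment. The sketch below is a minimal illustration, not the authors' pipeline: it assumes the human and mouse sequences are already aligned into equal-length strings, counts gaps as mismatches, measures windows in alignment columns rather than genomic coordinates, and leaves the exclusion of coding exons to the caller. The function name `conserved_windows` is hypothetical.

```python
from typing import List, Tuple


def conserved_windows(human: str, mouse: str, window: int = 100,
                      min_identity_pct: int = 70) -> List[Tuple[int, int]]:
    """Hypothetical helper: return merged [start, end) alignment-column
    intervals covered by windows of `window` columns whose identity is at
    least `min_identity_pct` percent. Gaps ('-') count as mismatches."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    n = len(human)
    if n < window:
        return []
    # 1 where both rows carry the same non-gap base, else 0
    match = [1 if a == b and a != "-" else 0
             for a, b in zip(human.upper(), mouse.upper())]
    regions: List[Tuple[int, int]] = []
    count = sum(match[:window])  # matches in the first window
    for start in range(n - window + 1):
        if start > 0:
            # slide one column: drop the left column, add the new right one
            count += match[start + window - 1] - match[start - 1]
        # integer comparison avoids float rounding at the threshold
        if count * 100 >= min_identity_pct * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                # overlaps the previous qualifying window: extend it
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Overlapping qualifying windows are merged, so a reported region can be longer than 100 columns. For example, two perfectly conserved 100-column blocks separated by 200 diverged columns yield two regions that each extend 30 columns into the diverged stretch, because windows straddling the boundary still reach 70% identity.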
ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes
With an increasing number of vertebrate genomes being sequenced in draft or finished form, comparative genome alignments offer unique opportunities for decoding the language of DNA sequence. However, novel tools and strategies are required to accommodate this large volume of genomic information and to transfer the predictions generated by comparative sequence alignment to researchers focused on experimental annotation of genome function. Here, we present the ECR Browser, a tool that provides easy and dynamic access to whole-genome alignments of human, mouse, rat and fish sequences. This web-based tool (http://ecrbrowser.dcode.org) provides a starting point for the discovery of novel genes, the identification of distant gene regulatory elements and the prediction of transcription factor binding sites. The genome alignment portal of the ECR Browser also permits fast, automated alignment of any user-submitted sequence to the genome of choice. The interconnection of the ECR Browser with other DNA sequence analysis tools creates a unique portal for studying and exploring vertebrate genomes.
Genomic deletion of a long-range bone enhancer misregulates sclerostin in Van Buchem disease
Mutations in distant regulatory elements can have a negative impact on human development and health, yet because these critical sequences are difficult to detect, diagnostic efforts focus predominantly on coding sequences. We have undertaken a comparative sequence-based approach to characterize a large noncoding region deleted in patients affected by Van Buchem (VB) disease, a severe sclerosing bone dysplasia. Using BAC recombination and transgenesis, we characterized the expression of human sclerostin (SOST) from normal (SOST(wt)) or Van Buchem (SOST(vbDelta)) alleles. Only the SOST(wt) allele faithfully expressed high levels of human SOST in adult bone and had an impact on bone metabolism, consistent with the model that the VB noncoding deletion removes a SOST-specific regulatory element. By combining cross-species sequence comparisons with in vitro and in vivo enhancer assays, we identified a candidate enhancer element that drives human SOST expression in osteoblast-like cell lines in vitro and in the skeletal anlage of the embryonic day 14.5 (E14.5) mouse embryo, and we discovered a novel function for sclerostin during limb development. Our approach provides a framework for characterizing distant regulatory elements associated with abnormal human phenotypes.
rVista for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites
Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVista, for high-throughput discovery of cis-regulatory elements that combines clustering of predicted transcription factor binding sites (TFBSs) with analysis of interspecies sequence conservation to maximize the identification of functional sites. To assess the ability of rVista to discover true-positive TFBSs while minimizing false-positive predictions, we analyzed the distribution of several TFBSs across 1 Mb of the well-annotated cytokine gene cluster (Hs5q31; Mm11). Because many AP-1, NFAT, and GATA-3 sites have been experimentally identified in this interval, we focused our analysis on the distribution of all binding sites for these transcription factors. Using the orthologous human-mouse dataset eliminated more than 95% of the approximately 58,000 binding sites predicted from the human sequence alone, while still identifying 88% of the experimentally verified binding sites in this region.
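The core rVista idea, keeping only predicted sites that survive a cross-species check, can be sketched as a filter over a human-mouse alignment. This is a simplified illustration under stated assumptions, not rVista's actual algorithm: sites are found with an IUPAC consensus (for example `WGATAR` for a GATA-type site) instead of TRANSFAC position weight matrices, only the forward strand is scanned, and the 10-column flank and 80% flank-identity threshold are hypothetical parameters. A site is kept only if both species carry a gap-free match at the same alignment columns and the surrounding window is conserved. The names `consensus_regex` and `conserved_sites` are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}


def consensus_regex(consensus: str) -> "re.Pattern[str]":
    """Compile an IUPAC consensus such as 'WGATAR' into a regex."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def conserved_sites(human: str, mouse: str, consensus: str, flank: int = 10,
                    min_identity_pct: int = 80) -> List[Tuple[int, int]]:
    """Hypothetical filter: return [start, end) alignment-column spans of
    human consensus hits that mouse shares at the same columns and whose
    flanking window meets the identity threshold (forward strand only)."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    human, mouse = human.upper(), mouse.upper()
    pattern = consensus_regex(consensus)
    # alignment columns that hold a human base (not a gap)
    cols = [i for i, c in enumerate(human) if c != "-"]
    ungapped = "".join(human[i] for i in cols)
    kept: List[Tuple[int, int]] = []
    # zero-width lookahead so overlapping hits are all reported
    for m in re.finditer(f"(?=({pattern.pattern}))", ungapped):
        a, b = cols[m.start(1)], cols[m.end(1) - 1] + 1  # column span
        h_site, m_site = human[a:b], mouse[a:b]
        # both species must carry a gap-free match at the same columns
        if "-" in h_site or "-" in m_site or not pattern.fullmatch(m_site):
            continue
        lo, hi = max(0, a - flank), min(len(human), b + flank)
        same = sum(1 for x, y in zip(human[lo:hi], mouse[lo:hi])
                   if x == y and x != "-")
        if same * 100 >= min_identity_pct * (hi - lo):
            kept.append((a, b))
    return kept
```

Every hit in the human sequence alone is a candidate; the two conservation checks are what prune the candidate list, which is the kind of reduction the abstract quantifies for the cytokine cluster.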
rVISTA 2.0: evolutionary analysis of transcription factor binding sites
Identifying and characterizing the transcription factor binding site (TFBS) patterns of cis-regulatory elements is challenging, but it promises to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under selective constraint and are therefore often conserved between related species. Using this evolutionary principle, we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. Our ability to experimentally identify functional noncoding sequences is extremely limited; rVISTA therefore attempts to fill this gap in genomic analysis by offering a powerful approach for eliminating the TFBSs least likely to be biologically relevant. The rVISTA tool combines TFBS predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are evolutionarily conserved and present in a specific configuration within genomic sequences. Here, we present the newly developed version 2.0 of the rVISTA tool, which can process alignments generated by both the zPicture and blastz alignment programs or use precomputed pairwise alignments of several vertebrate genomes available from the ECR Browser and GALA database. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users either to search for matrices in the TRANSFAC library collection or to search for user-defined consensus sequences. The rVISTA tool is publicly available at http://rvista.dcode.org/.
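The cluster-analysis step, finding places where conserved sites for several factors occur together, can be approximated by a sliding window over site positions. The sketch below is a deliberate simplification, not rVISTA's implementation: it treats a "specific configuration" as nothing more than sites for at least `min_factors` distinct factors lying within `width` bp of one another, and both defaults (200 bp, 2 factors) as well as the function name `find_clusters` are hypothetical. Input is a list of (factor, position) pairs, for instance sites that have already passed a conservation filter.

```python
from collections import Counter
from typing import List, Tuple


def find_clusters(sites: List[Tuple[str, int]], width: int = 200,
                  min_factors: int = 2) -> List[Tuple[int, int]]:
    """Hypothetical helper: return merged (first_pos, last_pos) spans in
    which sites for at least `min_factors` distinct factors fall within
    `width` bp of one another."""
    ordered = sorted(sites, key=lambda s: s[1])
    counts: Counter = Counter()  # sites per factor inside the window
    clusters: List[Tuple[int, int]] = []
    left = 0
    for factor, pos in ordered:
        counts[factor] += 1
        # shrink from the left until the window spans at most `width` bp
        while pos - ordered[left][1] > width:
            old = ordered[left][0]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
            left += 1
        if len(counts) >= min_factors:
            start = ordered[left][1]
            if clusters and start <= clusters[-1][1]:
                # overlaps the previous cluster: extend its end
                clusters[-1] = (clusters[-1][0], pos)
            else:
                clusters.append((start, pos))
    return clusters
```

For example, `find_clusters([("AP-1", 100), ("NFAT", 150), ("GATA-3", 1000)])` returns `[(100, 150)]`: the AP-1 and NFAT sites lie 50 bp apart, while the GATA-3 site has no partner within 200 bp. A two-pointer window keeps the scan linear after sorting, which matters when thousands of candidate sites span a megabase.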