Icahn School of Medicine at Mount Sinai
Publishes on Chromosomal and Genetic Variations, Genomic variations and chromosomal abnormalities, Genomics and Chromatin Dynamics. 67 papers and 4k citations.
We have performed the first genome-wide analysis of the inverted repeat (IR) structure in the human genome, using a novel and efficient software package called Inverted Repeats Finder (IRF). After masking of known repetitive elements, IRF detected 22,624 human IRs characterized by arm size from 25 bp to >100 kb with at least 75% identity, and spacer length up to 100 kb. This analysis required 6 h on a desktop PC. In all, 166 IRs had arm lengths >8 kb. From this set, IRs were excluded if they were in unfinished/unassembled regions of the genome, or clustered with other closely related IRs, yielding a set of 96 large IRs. Of these, 24 (25%) occurred on the X-chromosome, although it represents only approximately 5% of the genome. Of the X-chromosome IRs, 83.3% were ≥99% identical, compared with 28.8% of autosomal IRs. Eleven IRs from Chromosome X, one from Chromosome 11, and seven already described from Chromosome Y contain genes predominantly expressed in testis. PCR analysis of eight of these IRs correctly amplified the corresponding region in the human genome, and six were also confirmed in gorilla or chimpanzee genomes. Similarity dot-plots revealed that 22 IRs contained further secondary homologous structures partially categorized into three distinct patterns. The prevalence of large, highly homologous IRs containing testis genes on the X- and Y-chromosomes suggests a possible role in male germ-line gene expression and/or maintaining sequence integrity by gene conversion.
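To make the terms in the abstract above concrete (arm, spacer, percent identity), here is a minimal Python sketch. It is not the IRF algorithm, which the abstract does not describe; it only checks whether two equal-length arms separated by a spacer form an inverted repeat, using an ungapped comparison. The function names and the ungapped identity measure are assumptions made for illustration; the 0.75 default mirrors the 75% identity threshold quoted in the abstract.

```python
# Illustrative only: an ungapped check of one candidate inverted repeat.
# This is NOT how IRF works internally; names here are hypothetical.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_identity(left_arm: str, right_arm: str) -> float:
    """Fraction of positions where the left arm matches the reverse
    complement of the right arm. Arms must have equal, non-zero length."""
    if not left_arm or len(left_arm) != len(right_arm):
        raise ValueError("arms must be non-empty and of equal length")
    rc = reverse_complement(right_arm).upper()
    matches = sum(a == b for a, b in zip(left_arm.upper(), rc))
    return matches / len(left_arm)


def is_inverted_repeat(seq: str, left_start: int, arm_len: int,
                       spacer_len: int, min_identity: float = 0.75) -> bool:
    """Check whether seq holds an inverted repeat with the given geometry:
    left arm, then spacer, then right arm."""
    left = seq[left_start:left_start + arm_len]
    right_start = left_start + arm_len + spacer_len
    right = seq[right_start:right_start + arm_len]
    if len(left) < arm_len or len(right) < arm_len:
        return False  # geometry runs off the end of the sequence
    return arm_identity(left, right) >= min_identity


if __name__ == "__main__":
    # Left arm AAACCC, spacer TTTT, right arm GGGTTT (its reverse complement).
    example = "AAACCC" + "TTTT" + "GGGTTT"
    print(is_inverted_repeat(example, left_start=0, arm_len=6, spacer_len=4))
```

A real detector would also have to search over arm positions and lengths and tolerate insertions and deletions, which this sketch deliberately leaves out.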
BACKGROUND: Tandemly repeated DNA represents a large portion of the human genome, and accounts for a significant amount of copy number variation. Here we present a genome-wide analysis of the largest tandem repeats found in the human genome sequence. RESULTS: Using Tandem Repeats Finder (TRF), tandem repeat arrays greater than 10 kb in total size were identified, and classified into simple sequence (e.g., GAATG), classical satellites (e.g., alpha satellite DNA), and locus-specific VNTR arrays. Analysis of these large sequenced regions revealed that several "simple sequence" arrays actually showed complex domain and/or higher-order repeat organization. Using additional methods, we further identified a total of 96 additional arrays with tandem repeat units greater than 2 kb (the detection limit of TRF), 53 of which contained genes or repeated exons. An array of tandem 12 kb repeats that spanned a gap on chromosome 8 was found to be 600 kb to 1.7 Mbp in size, representing one of the largest non-centromeric arrays characterized. Several novel megasatellite tandem DNA families were observed that are characterized by repeating patterns of interspersed transposable elements that have expanded, presumably by unequal crossing over. One of these families is found on 11 different chromosomes in >25 arrays, and represents one of the largest and most widespread megasatellite DNA families. CONCLUSION: This study represents the most comprehensive genome-wide analysis of large tandem repeats in the human genome, and will serve as an important resource for understanding the organization and copy number variation of these complex DNA families.
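As a small illustration of what a "simple sequence" tandem array is, the following Python sketch finds the longest run of exact back-to-back copies of a known repeat unit such as GAATG. It is not the TRF algorithm, which detects repeats without a known unit and tolerates mismatches and indels; the function name and the exact-match rule are assumptions made for this example.

```python
# Illustrative only: longest exact tandem run of a KNOWN unit.
# This is NOT how Tandem Repeats Finder works; the name is hypothetical.

def longest_tandem_run(seq: str, unit: str) -> tuple[int, int]:
    """Return (start, copies) for the longest run of exact, back-to-back
    copies of `unit` in `seq`. Returns (-1, 0) if the unit never occurs."""
    if not unit:
        raise ValueError("unit must be non-empty")
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best_start, best_copies = -1, 0
    # Try every start position so runs in any phase are considered.
    for i in range(len(seq) - k + 1):
        j = i
        while seq.startswith(unit, j):
            j += k
        copies = (j - i) // k
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies


if __name__ == "__main__":
    example = "TT" + "GAATG" * 3 + "CC"
    start, copies = longest_tandem_run(example, "GAATG")
    # Array size in bp is simply copies times unit length.
    print(start, copies, copies * len("GAATG"))
```

Multiplying the copy count by the unit length gives the array size, which is the quantity the abstract thresholds at 10 kb.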