T

Ted Johnson

Cochin University of Science and Technology

Publishes on Genomics and Phylogenetic Studies, Complex Network Analysis Techniques, Opinion Dynamics and Social Influence. 7 papers and 608 citations.

7Publications
608Total Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

Molecular Fossils in the Human Genome: Identification and Analysis of the Pseudogenes in Chromosomes 21 and 22
Cited by 193Open Access

We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. Overall, the families in both processed and nonprocessed pseudogene populations occur according to a similar power-law distribution as that found for the occurrence of gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.

The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties
Nicholas M. Luscombe, Jiang Qian, Zhaolei Zhang et al.|Genome biology|2002
Cited by 121Open Access

BACKGROUND: The sequencing of genomes provides us with an inventory of the 'molecular parts' in nature, such as protein families and folds, and their functions in living organisms. Through the analysis of such inventories, it has been shown that different genomes have very different usage of parts; for example, the common folds in the worm are very different from those in Escherichia coli. RESULTS: Despite these differences, we find that the genomic occurrence of generalized parts follows a well-known mathematical framework called the power law, with a few parts occurring many times and most occurring only a few times. This observation is true in a wide variety of genomic contexts. Earlier studies found power laws in a few specific cases, such as the occurrence of protein families. Here, we find many further cases of power-law behavior, for example in the occurrence of pseudogenes and in levels of gene expression. We show comprehensively that this behavior applies across many different genomes, for many different types of parts (DNA words, InterPro families, protein superfamilies and folds, pseudogene families and pseudomotifs), and for the many disparate attributes associated with these parts (their functions, interactions and expression levels). CONCLUSIONS: Power-law behavior provides a concise mathematical description of an important biological feature: the sheer dominance of a few members over the overall population. We present this behavior in a unified framework and propose that all these observations are connected to an underlying DNA duplication process as genomes evolved to their current state.

Studying Macromolecular Motions in a Database Framework: From Structure to Sequence
Mark Gerstein, Ronald Jansen, Ted Johnson et al.|Kluwer Academic Publishers eBooks|2005
Cited by 15

We describe database approaches taken in our lab to the study of protein and nucleic acid motions. We have developed a database of macromolecular motions, which is accessible on the World Wide Web with an entry point at http://bioinfo.mbb.yale.edu/MolMovDB. This attempts to systematize all instances of macromolecular movement for which there is at least some structural information. At present it contains detailed descriptions of more than 100 motions, most of which are of proteins. Protein motions are further classified hierarchically into a limited number of categories, first on the basis of size (distinguishing between fragment, domain, and subunit motions) and then on the basis of packing. Our packing classification divides motions into various categories (shear, hinge, other) depending on whether or not they involve sliding over a continuously maintained and tightly packed interface. We quantitatively systematize the description of packing through the use of Voronoi polyhedra and Delaunay triangulation. In addition to the packing classification, the database provides some indication about the evidence behind each motion (i.e. the type of experimental information or whether the motion is inferred based on structural similarity) and attempts to describe many aspects of a motion in terms of a standardized nomenclature (e.g. the maximum rotation, the residue selection of a fixed core, etc). Currently, we use a standard relational design to implement the database. However, the complexity and heterogeneity of the information kept in the database makes it an ideal application for an object-relational approach, and we are moving it in this direction. The database, moreover, incorporates innovative Internet cooperatively features that allow authorized remote experts to serve as database editors. The database also contains plausible representations for motion pathways, derived from restrained 3D interpolation between known endpoint conformations. These pathways can be viewed in a variety of movie formats, and the database is associated with a server that can automatically generate these movies from submitted coordinates. Based on the structures in the database we have developed sequence patterns for linkers and flexible hinges and are currently using these for the annotation of genome sequence data.