Trinity College Dublin
ORCID: 0000-0003-2552-6220
Publishes on Genomics and Phylogenetic Studies, Chromosomal and Genetic Variations, RNA and protein synthesis mechanisms. 100 papers and 29.8k citations.
Porter is a new system for protein secondary structure prediction in three classes. Porter relies on bidirectional recurrent neural networks with shortcut connections, accurate coding of input profiles obtained from multiple sequence alignments, second stage filtering by recurrent neural networks, incorporation of long range information and large-scale ensembles of predictors. Porter's accuracy, tested by rigorous 5-fold cross-validation on a large set of proteins, exceeds 79%, which is significantly above a copy of the state-of-the-art SSpro server and better than any system published to date.
AVAILABILITY: Porter is available as a public web server at http://distill.ucd.ie/porter/
CONTACT: gianluca.pollastri@ucd.ie
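The evaluation described above can be illustrated with a minimal sketch. The abstract does not name its accuracy metric, so this assumes the per-residue three-class accuracy commonly called Q3, with the class letters H, E and C as an assumed encoding. The function names `q3_accuracy` and `k_fold_indices` are hypothetical and are not part of Porter.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted class matches the observed class.

    Assumption: both arguments are strings over a three-letter alphabet
    (for example H = helix, E = strand, C = coil), one letter per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists so that each item is tested exactly once.

    Assumption: items are whole proteins, so no protein contributes
    residues to both the training and the test side of a fold.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Toy strings, invented purely for illustration.
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))
    for train, test in k_fold_indices(10, k=5):
        print(test, len(train))
```

In a 5-fold setup of this kind, the reported figure would be the accuracy pooled or averaged over the five held-out folds, so that no protein is scored by a model that was trained on it.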
The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence from chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism, leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
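The closing estimate is a simple proportion, and a short check shows that the two figures quoted in the abstract agree with each other. The function name `expected_de_novo_genes` is hypothetical. The inputs are the 0.075% fraction and the 24,000-gene count stated above.

```python
def expected_de_novo_genes(fraction_percent, n_genes):
    """Expected number of genes given a percentage of the gene count."""
    return fraction_percent / 100 * n_genes


if __name__ == "__main__":
    # Figures taken from the abstract: 0.075% of 24,000 protein-coding genes.
    print(round(expected_de_novo_genes(0.075, 24_000)))
```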
About 30% of protein-coding genes in the human genome are related through two whole genome duplication (WGD) events. Although WGD is often credited with great evolutionary importance, the processes governing the retention of these genes and their biological significance remain unclear. One increasingly popular hypothesis is that dosage balance constraints are a major determinant of duplicate gene retention. We test this hypothesis and show that WGD-duplicated genes (ohnologs) have rarely experienced subsequent small-scale duplication (SSD) and are also refractory to copy number variation (CNV) in human populations; they are thus likely to be sensitive to relative quantities (i.e., they are dosage-balanced). By contrast, genes that have experienced SSD in the vertebrate lineage are more likely to also display CNV. This supports the hypothesis of biased retention of dosage-balanced genes after WGD. We also show that ohnologs have a strong association with human disease. In particular, Down Syndrome (DS), which results from trisomy 21, is widely assumed to be caused by dosage effects, and 75% of previously reported candidate genes for this syndrome are ohnologs that experienced no other copy number changes. We propose the remaining dosage-balanced ohnologs on chromosome 21 as candidate DS genes. These observations clearly show a persistent resistance to dose changes in genes duplicated by WGD. Dosage balance constraints simultaneously explain duplicate gene retention and essentiality after WGD.