Publishes on Computational Drug Discovery Methods, Bioinformatics and Genomic Networks, Machine Learning in Bioinformatics. 187 papers and 15.9k citations.
A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.
Background: Contemporary studies suggest that familial hypercholesterolemia (FH) is more frequent than previously reported and increasingly recognized as affecting individuals of all ethnicities and across many regions of the world. Precise estimation of its global prevalence and prevalence across World Health Organization regions is needed to inform policies aiming at early detection and atherosclerotic cardiovascular disease (ASCVD) prevention. The present study aims to provide a comprehensive assessment and more reliable estimation of the prevalence of FH than hitherto possible in the general population (GP) and among patients with ASCVD. Methods: We performed a systematic review and meta-analysis including studies reporting on the prevalence of heterozygous FH in the GP or among those with ASCVD. Studies reporting gene founder effects and focused on homozygous FH were excluded. The search was conducted through Medline, Embase, Cochrane, and Global Health, without time or language restrictions. A random-effects model was applied to estimate the overall pooled prevalence of FH in the general and ASCVD populations separately and by World Health Organization regions. Results: From 3225 articles, 42 studies from the GP and 20 from populations with ASCVD were eligible, reporting on 7 297 363 individuals/24 636 cases of FH and 48 158 patients/2827 cases of FH, respectively. More than 60% of the studies were from Europe. Use of the Dutch Lipid Clinic Network criteria was the commonest diagnostic method. Within the GP, the overall pooled prevalence of FH was 1:311 (95% CI, 1:250–1:397; similar between children [1:364] and adults [1:303], P=0.60; across World Health Organization regions where data were available, P=0.29; and between population-based and electronic health records–based studies, P=0.82). Studies with ≤10 000 participants reported a higher prevalence (1:200–289) compared with larger cohorts (1:365–407; P<0.001).
The pooled prevalence among those with ASCVD was 18-fold higher than in the GP (1:17 [95% CI, 1:12–1:24]), driven mainly by coronary artery disease (1:16 [95% CI, 1:12–1:23]). Between-study heterogeneity was large (I² >95%). Tests assessing bias were nonsignificant (P>0.3). Conclusions: With an overall prevalence of 1:311, FH is among the commonest genetic disorders in the GP, similarly present across different regions of the world, and is more frequent among those with ASCVD. The present results support the advocacy for the institution of public health policies, including screening programs, to identify FH early and to prevent its global burden.
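The random-effects pooling described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which estimator or transformation was used, so the DerSimonian-Laird estimator and the logit transform below are assumptions, as are the function name `pooled_prevalence` and the study counts in the usage lines (they are made up for illustration and are not data from the paper).

```python
import math

def pooled_prevalence(studies):
    """Pool prevalence across studies with a random-effects model.

    studies: list of (cases, sample_size) pairs, with 0 < cases < sample_size
             and at least two studies.
    Assumptions (not from the source): logit transform, DerSimonian-Laird tau^2.
    Returns (pooled, ci_lower, ci_upper, i2_percent).
    """
    # Logit of each study's prevalence and its approximate variance.
    y = [math.log(c / (n - c)) for c, n in studies]
    v = [1 / c + 1 / (n - c) for c, n in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c_term = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_term)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), i2

# Hypothetical counts, for illustration only.
pooled, lower, upper, i2 = pooled_prevalence([(30, 10000), (120, 35000), (15, 6000)])
print(f"pooled prevalence 1:{1 / pooled:.0f} (95% CI 1:{1 / upper:.0f} to 1:{1 / lower:.0f}), I2 = {i2:.0f}%")
```

Reporting the result as "1 in N" simply inverts the pooled proportion, which is why the upper proportion bound maps to the smaller N in the printed interval.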
The antagonistic crosstalk between gibberellic acid (GA) and abscisic acid (ABA) plays a pivotal role in the modulation of seed germination. However, the molecular mechanism of such phytohormone interaction remains largely elusive. Here we show that three Arabidopsis NUCLEAR FACTOR-Y C (NF-YC) homologues NF-YC3, NF-YC4 and NF-YC9 redundantly modulate GA- and ABA-mediated seed germination. These NF-YCs interact with the DELLA protein RGL2, a key repressor of GA signalling. The NF-YC-RGL2 module targets ABI5, a gene encoding a core component of ABA signalling, via specific CCAAT elements and collectively regulates a set of GA- and ABA-responsive genes, thus controlling germination. These results suggest that the NF-YC-RGL2-ABI5 module integrates GA and ABA signalling pathways during seed germination.
Proteins interact with each other to play critical roles in many biological processes in cells. Laboratory experiments for detecting these interactions, although promising, are usually time-consuming and labor-intensive, and their results are often uncertain and not robust. Owing to recent advances in high-throughput technologies, a large amount of proteomics data has been collected, which presents both a significant opportunity and a challenge: to develop computational models that predict protein-protein interactions (PPIs) from these data. In this paper, we present a comprehensive survey of recent efforts towards the development of effective computational models for PPI prediction. The survey introduces the algorithms that can be used to learn computational models for predicting PPIs, and it classifies these models into different categories. To understand their relative merits, the paper discusses different validation schemes and metrics for evaluating prediction performance. Biological databases that are commonly used in different experiments for performance comparison are also described, and their use in a series of extensive experiments to compare different prediction models is discussed. Finally, we present some open issues in PPI prediction for future work and explain how the performance of PPI prediction can be improved if these issues are effectively tackled.
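As an illustration of the kind of evaluation metrics such a survey compares, the sketch below computes four measures commonly reported for binary PPI predictors from confusion-matrix counts. The abstract does not list the specific metrics it covers, so this selection, the function name `ppi_metrics`, and the counts in the usage line are assumptions.

```python
import math

def ppi_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    tp/fp/tn/fn: true positives, false positives, true negatives, false negatives,
    where "positive" means a predicted interacting protein pair.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
print(ppi_metrics(tp=90, fp=10, tn=80, fn=20))
```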
The study of protein-protein interactions (PPIs) can be very important for the understanding of biological cellular functions. However, detecting PPIs in the laboratory is both time-consuming and expensive. For this reason, there has been much recent effort to develop techniques for computational prediction of PPIs, as this can complement laboratory procedures and provide an inexpensive way of predicting the most likely set of interactions at the entire proteome scale. Although much progress has already been achieved in this direction, the problem is still far from being solved. More effective approaches are still required to overcome the limitations of the current ones. In this study, a novel Multi-scale Local Descriptor (MLD) feature representation scheme is proposed to extract features from a protein sequence. This scheme can capture multi-scale local information by varying the length of protein-sequence segments. Based on the MLD, an ensemble learning method, the Random Forest (RF) method, is used as the classifier. The MLD feature representation scheme facilitates the mining of interaction information from multi-scale continuous amino acid segments, making it easier to capture multiple overlapping continuous binding patterns within a protein sequence. When the proposed method is tested with the PPI data of Saccharomyces cerevisiae, it achieves a prediction accuracy of 94.72% with 94.34% sensitivity at a precision of 98.91%. Extensive experiments are performed to compare our method with existing sequence-based methods. Experimental results show that our predictor also outperforms several other state-of-the-art predictors on the H. pylori dataset. These good results can largely be credited to the learning capabilities of the RF model and the novel MLD feature representation scheme.
The experimental results show that the proposed approach is promising for predicting PPIs and can be a useful tool for future proteomic studies.
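The idea of describing a sequence through continuous segments of varying length can be sketched as follows. This is a simplified illustration, not the MLD scheme as published: the abstract does not specify the segment count, the amino acid grouping, or the per-region descriptors, so the four equal segments, the seven-class grouping, the composition-only descriptor, and all function names below are assumptions.

```python
# Assumed grouping of the 20 standard amino acids into 7 classes.
GROUPS = ["AGV", "C", "DE", "FILP", "HNQW", "KR", "MSTY"]
AA_TO_GROUP = {aa: i for i, group in enumerate(GROUPS) for aa in group}

def continuous_regions(n_segments=4):
    """Yield (start, end) segment indices for every run of adjacent segments.

    Runs of length 1 give the finest scale; the single run of length
    n_segments covers the whole sequence.
    """
    for length in range(1, n_segments + 1):
        for start in range(n_segments - length + 1):
            yield start, start + length

def composition(region):
    """Return the fraction of residues in each amino acid class for one region."""
    counts = [0] * len(GROUPS)
    for aa in region:
        idx = AA_TO_GROUP.get(aa)
        if idx is not None:  # skip non-standard residue codes
            counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def multiscale_features(seq, n_segments=4):
    """Concatenate class compositions over all continuous multi-segment regions."""
    # Segment boundaries that split the sequence into roughly equal parts.
    bounds = [round(i * len(seq) / n_segments) for i in range(n_segments + 1)]
    features = []
    for start, end in continuous_regions(n_segments):
        features.extend(composition(seq[bounds[start]:bounds[end]]))
    return features

# Toy sequence, for illustration only.
vector = multiscale_features("ACDEFGHIKLMNPQRSTVWY")
print(len(vector))
```

Under these assumptions, a protein pair could be represented by concatenating the two per-protein vectors and passing the result to a Random Forest classifier, the learner named in the abstract.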