Koon-Kiu Yan

Ranking scientific publications using a model of network traffic

D. Walker, Huafeng Xie, Koon-Kiu Yan et al.|Journal of Statistical Mechanics Theory and Experiment|2007

Cited by 244|Open Access

To account for strong aging characteristics of citation networks, we modify Google's PageRank algorithm by initially distributing random surfers exponentially with age, in favor of more recent publications. The output of this algorithm, which we call CiteRank, is interpreted as approximate traffic to individual publications in a simple model of how researchers find new information. We develop an analytical understanding of traffic flow in terms of an RPA-like model and optimize parameters of our algorithm to achieve the best performance. The results are compared for two rather different citation networks: all American Physical Society publications and the set of high-energy physics theory (hep-th) preprints. Despite major differences between these two networks, we find that their optimal parameters for the CiteRank algorithm are remarkably similar.
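The traffic model described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `citerank`, the parameter names `alpha` (probability that a surfer stops at each step) and `tau` (decay time of the initial age distribution), and their default values are assumptions introduced here. The sketch also assumes that every cited paper has an entry in `ages`.

```python
import math

def citerank(citations, ages, alpha=0.5, tau=2.0, n_steps=50):
    """Approximate traffic to each paper in a citation network.

    citations: dict mapping a paper to the list of papers it cites.
    ages: dict mapping every paper to its age (e.g. in years).
    alpha, tau: hypothetical parameter names with arbitrary defaults.
    """
    papers = list(ages)
    # Initial surfers are distributed exponentially with age,
    # in favor of more recent publications.
    weights = {p: math.exp(-ages[p] / tau) for p in papers}
    total = sum(weights.values())
    rho = {p: w / total for p, w in weights.items()}

    traffic = dict(rho)   # accumulated traffic per paper
    current = dict(rho)   # surfers arriving at the current step
    for _ in range(n_steps):
        nxt = {p: 0.0 for p in papers}
        for p, flow in current.items():
            refs = citations.get(p, [])
            if not refs:
                continue  # no references to follow, so these surfers stop
            # A fraction (1 - alpha) continues, split evenly over references.
            share = (1 - alpha) * flow / len(refs)
            for r in refs:
                nxt[r] += share
        for p in papers:
            traffic[p] += nxt[p]
        current = nxt
    return traffic
```

With a two-paper network in which a new paper cites an old one, the old paper receives its own small initial share plus half of the new paper's surfers when `alpha=0.5`.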

Comparative analysis of regulatory information and circuits across distant species

Alan P. Boyle, Carlos L. Araya, Cathleen Brdlik et al.|Nature|2014

Cited by 232|Open Access

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.

An integrative ENCODE resource for cancer genomics

Jing Zhang, Donghoon Lee, Vineet K. Dhiman et al.|Nature Communications|2020

Cited by 183|Open Access

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

Upstream plasticity and downstream robustness in evolution of molecular networks

Sergei Maslov, Kim Sneppen, Kasper Astrup Eriksen et al.|BMC Evolutionary Biology|2004

Cited by 66|Open Access

BACKGROUND: Gene duplication followed by the functional divergence of the resulting pair of paralogous proteins is a major force shaping molecular networks in living organisms. Recent species-wide data for protein-protein interactions and transcriptional regulation allow us to assess the effect of gene duplication on robustness and plasticity of these molecular networks. RESULTS: We demonstrate that the transcriptional regulation of duplicated genes in baker's yeast Saccharomyces cerevisiae diverges fast, so that on average they lose 3% of common transcription factors for every 1% divergence of their amino acid sequences. The set of protein-protein interaction partners of their protein products changes at a slower rate, exhibiting a broad plateau for amino acid sequence similarity above 70%. The stability of functional roles of duplicated genes at such relatively low sequence similarity is further corroborated by their ability to substitute for each other in single gene knockout experiments in yeast and RNAi experiments in the nematode worm Caenorhabditis elegans. We also quantified the divergence rate of physical interaction neighborhoods of paralogous proteins in the bacterium Helicobacter pylori and the fly Drosophila melanogaster. However, in the absence of system-wide data on transcription factor binding in these organisms, we could not compare this rate to that of transcriptional regulation of duplicated genes. CONCLUSIONS: For all molecular networks studied in this work we found that even the most distantly related paralogous proteins, with amino acid sequence identities around 20%, on average have more similar positions within a network than a randomly selected pair of proteins. For yeast we also found that the upstream regulation of genes evolves more rapidly than downstream functions of their protein products. This is in accordance with a view which puts regulatory changes as one of the main driving forces of evolution. In this context a very important open question is to what extent our results obtained for homologous genes within a single species (paralogs) carry over to homologous proteins in different species (orthologs).

Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

Jacob Bock Axelsen, Koon-Kiu Yan, Sergei Maslov|Biology Direct|2007

Cited by 9|Open Access

BACKGROUND: The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. RESULTS: We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities $p$ generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form $\sim p^{-\gamma}$ with the value of the exponent $\gamma$ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent $\alpha \approx 0.33$. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. CONCLUSION: We separately measure the short-term ("raw") duplication and deletion rates $r^{*}_{\mathrm{dup}}$, $r^{*}_{\mathrm{del}}$, which include gene copies that will be removed soon after the duplication event, and their dramatically reduced long-term counterparts $r_{\mathrm{dup}}$, $r_{\mathrm{del}}$. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates $r^{*}_{\mathrm{del}}$ were shown to systematically increase with $N_{\mathrm{genes}}$. Abnormally flat shapes of sequence identity histograms observed for yeast and human are consistent with lineages leading to these organisms undergoing one or more whole-genome duplications. This interpretation is corroborated by our analysis of the genome of Paramecium tetraurelia, where the $p^{-4}$ profile of the histogram is gradually restored by the successive removal of paralogs generated in its four known whole-genome duplication events.
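As a rough illustration of what fitting a histogram of sequence identities to a power law involves, the sketch below estimates the exponent by ordinary least squares on log-log axes. This is an assumed, simplified procedure rather than the fitting method used in the paper; the function name `fit_power_law_exponent` and its inputs are hypothetical, and all bin counts are assumed to be strictly positive.

```python
import math

def fit_power_law_exponent(identities, counts):
    """Estimate gamma in counts ~ identities**(-gamma).

    identities: bin centers of the sequence-identity histogram (0 < p <= 1).
    counts: number of paralogous pairs in each bin (all > 0).
    Uses a least-squares line fit in log-log space (an assumption here).
    """
    xs = [math.log(p) for p in identities]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The power-law exponent is minus the slope of the log-log line.
    return -slope
```

Applied to synthetic counts generated exactly as `p**-4`, the function returns an exponent of 4 up to floating-point error. Real histograms are noisy, so a maximum-likelihood fit may be preferable in practice.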

Top publications by citations