Xiguo Yuan

Identification of Molecular Pathway Aberrations in Uterine Serous Carcinoma by Genome-wide Analyses

Elisabetta Kuhn, Ren‐Chin Wu, Bin Guan et al.|JNCI: Journal of the National Cancer Institute|2012

Cited by 248 | Open Access

BACKGROUND: Uterine cancer is the fourth most common malignancy in women, and uterine serous carcinoma is its most aggressive subtype. However, the molecular pathogenesis of uterine serous carcinoma is largely unknown. We analyzed the genomes of uterine serous carcinoma samples to better understand the molecular genetic characteristics of this cancer. METHODS: Whole-exome sequencing was performed on 10 uterine serous carcinomas and matched normal blood or tissue samples. Somatically acquired sequence mutations were verified by Sanger sequencing. The most frequent molecular genetic changes were then validated by Sanger sequencing in 66 additional uterine serous carcinomas and in nine serous endometrial intraepithelial carcinomas (the preinvasive precursor of uterine serous carcinoma) isolated by laser capture microdissection. In addition, gene copy number was characterized with single-nucleotide polymorphism (SNP) arrays in 23 uterine serous carcinomas, including the 10 that underwent whole-exome sequencing. RESULTS: Among the 76 uterine serous carcinomas examined, we found frequent somatic mutations in TP53 (81.6%), PIK3CA (23.7%), FBXW7 (19.7%), and PPP2R1A (18.4%). In all nine uterine serous carcinomas with an associated serous endometrial intraepithelial carcinoma, PIK3CA, PPP2R1A, and TP53 mutation status was concordant between the carcinoma and the concurrent intraepithelial component. DNA copy number analysis revealed frequent genomic amplification of the CCNE1 locus (which encodes cyclin E, a known substrate of FBXW7) and deletion of the FBXW7 locus. Among the 23 uterine serous carcinomas subjected to SNP array analysis, none of the seven tumors with FBXW7 alterations (four with point mutations, three with hemizygous deletions) had CCNE1 amplification, and 13 (57%) tumors had either an FBXW7 alteration or CCNE1 amplification. Nearly half of these tumors (48%) harbored a PIK3CA mutation and/or PIK3CA amplification. CONCLUSION: Molecular genetic aberrations involving the p53, cyclin E-FBXW7, and PI3K pathways represent major mechanisms in the development of uterine serous carcinoma.
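The abstract reports most frequencies only as rounded percentages. A minimal sketch for back-calculating the sample counts those percentages imply is shown below. It assumes the mutation frequencies use the 76 sequenced tumors as denominator and that the 57% and 48% figures refer to the 23 SNP-array tumors (the arithmetic is consistent with this, but only the count 13 is stated explicitly). The function name `implied_count` is hypothetical.

```python
def implied_count(percent: float, total: int, decimals: int = 1) -> int:
    """Return the smallest count k (0..total) whose percentage 100*k/total,
    rounded to `decimals` places, equals the reported `percent`."""
    for k in range(total + 1):
        if round(100 * k / total, decimals) == percent:
            return k
    raise ValueError(f"no count out of {total} rounds to {percent}%")


# (label, reported percentage, assumed denominator, decimals shown in the abstract)
reported = [
    ("TP53 mutation", 81.6, 76, 1),
    ("PIK3CA mutation", 23.7, 76, 1),
    ("FBXW7 mutation", 19.7, 76, 1),
    ("PPP2R1A mutation", 18.4, 76, 1),
    ("FBXW7 alteration or CCNE1 amplification", 57, 23, 0),
    ("PIK3CA mutation and/or amplification", 48, 23, 0),
]

for label, pct, n, d in reported:
    print(f"{label}: {implied_count(pct, n, d)}/{n}")
```

With these denominators each rounded percentage matches exactly one integer count, so the back-calculation is unambiguous; the 13/23 result agrees with the count the abstract gives.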

An Overview of Population Genetic Data Simulation

Xiguo Yuan, David J. Miller, Junying Zhang et al.|Journal of Computational Biology|2011

Cited by 103 | Open Access

Simulation studies in population genetics help researchers understand how various evolutionary and demographic scenarios affect sequence variation and sequence patterns, and they also allow investigators to assess and design analytical methods for studying disease-associated genetic factors. To support these studies, it is essential to develop simulators that can accurately generate complex genomic data under various genetic models. A number of efficient simulation software packages for large-scale genomic data are currently available, and new simulation programs with more sophisticated capabilities and features continue to emerge. In this article, we review the three basic simulation frameworks (coalescent, forward, and resampling) and some of the existing simulators within each, comparing them with respect to the evolutionary and demographic scenarios they support, their computational complexity, and their specific applications. We also address some limitations of current simulation algorithms and discuss future challenges in developing more powerful simulation tools.
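To make the contrast between the backward-in-time (coalescent) and forward-in-time frameworks concrete, here is a minimal sketch of each under the simplest neutral model: constant population size, no recombination, no selection. It is not the algorithm of any simulator reviewed in the article, and the resampling framework is not shown. The function names are hypothetical. The coalescent part uses the standard Kingman result that, with k lineages, the waiting time to the next coalescence is exponential with rate k(k-1)/2 (time in units of 2N generations for a diploid population of size N).

```python
import random


def coalescent_tmrca(n: int, rng: random.Random) -> float:
    """Coalescent (backward-time) sketch: sample the time to the most recent
    common ancestor of n lineages under the standard Kingman coalescent.
    Time is in units of 2N generations for a diploid population of size N."""
    if n < 1:
        raise ValueError("need at least one lineage")
    t = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2        # number of lineage pairs that could merge
        t += rng.expovariate(rate)    # exponential waiting time to next merge
        k -= 1                        # two lineages coalesce into one
    return t


def wright_fisher_fixation(p0: float, pop_size: int, rng: random.Random,
                           max_gens: int = 100_000) -> bool:
    """Forward-time sketch: neutral Wright-Fisher model for one biallelic locus
    with pop_size gene copies. Returns True if the allele fixes, False if lost."""
    count = round(p0 * pop_size)
    for _ in range(max_gens):
        if count == 0:
            return False
        if count == pop_size:
            return True
        p = count / pop_size
        # Each gene copy in the next generation picks a parent copy at random,
        # so the new allele count is a Binomial(pop_size, p) draw.
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
    raise RuntimeError("allele neither fixed nor lost within max_gens")


if __name__ == "__main__":
    rng = random.Random(1)
    reps = 5000
    mean_t = sum(coalescent_tmrca(10, rng) for _ in range(reps)) / reps
    print(f"mean TMRCA for n=10: {mean_t:.3f} (theory: 2*(1 - 1/n) = 1.8)")

    rng = random.Random(2)
    fixed = sum(wright_fisher_fixation(0.2, 50, rng) for _ in range(1000)) / 1000
    print(f"fixation frequency from p0=0.2: {fixed:.3f} (neutral theory: 0.2)")
```

The coalescent sketch only tracks the n sampled lineages, while the forward sketch must simulate every gene copy in every generation. That difference in cost is the main trade-off between the two frameworks; the forward approach is what makes it straightforward to add selection or complex demography.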

Comparative study of whole exome sequencing-based copy number variation detection tools

Lanling Zhao, Han Liu, Xiguo Yuan et al.|BMC Bioinformatics|2020

Cited by 75 | Open Access

BACKGROUND: With the rapid development of whole exome sequencing (WES), an increasing number of tools have been proposed for copy number variation (CNV) detection based on this technique. However, no comprehensive guide is available for using these tools in clinical settings, which makes them difficult to apply in practice. To address this problem, we evaluated the performance of four WES-based CNV tools and established guidelines for recommending a suitable tool according to the application requirements. RESULTS: We first selected four WES-based CNV detection tools: CoNIFER, cn.MOPS, CNVkit and exomeCopy. We then evaluated their performance in three respects: sensitivity and specificity, overlapping consistency, and computational cost. This evaluation yielded four main results: (1) As coverage or CNV size increases, sensitivity increases and then stabilizes, while specificity decreases. (2) CoNIFER performs better for CNV insertions than for CNV deletions, while the remaining tools show the opposite trend. (3) CoNIFER, cn.MOPS and CNVkit achieve satisfactory overlapping consistency, which indicates that their results are trustworthy. (4) Among the four tools, CoNIFER has the best space complexity and cn.MOPS has the best time complexity. Finally, we established guidelines for tool usage based on these results. CONCLUSION: No available tool performs excellently under all conditions; however, some tools perform excellently in specific scenarios. Users can obtain a CNV tool recommendation from our paper according to the targeted CNV size, the CNV type, or the computational costs of their projects, as presented in Table 1, which is helpful even for users with limited knowledge of computer science.
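Evaluations like this one typically match predicted CNVs to known CNVs by genomic overlap. The abstract does not state the matching rule, so the minimal sketch below *assumes* a common convention: two CNVs match if they share chromosome and type and their overlap covers at least `min_frac` of each interval (reciprocal overlap). It reports sensitivity (fraction of true CNVs recovered) and precision (fraction of calls supported by the truth set); the paper's "specificity" may be defined differently. The `CNV` record, the `"DEL"`/`"DUP"` labels, and the example intervals are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based start (inclusive)
    end: int    # end coordinate (exclusive)
    kind: str   # hypothetical type label, e.g. "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV, min_frac: float = 0.5) -> bool:
    """True if a and b share chromosome and type and their overlap covers
    at least min_frac of each interval's length."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    return (overlap >= min_frac * (a.end - a.start)
            and overlap >= min_frac * (b.end - b.start))


def sensitivity_precision(calls: List[CNV], truth: List[CNV],
                          min_frac: float = 0.5) -> Tuple[float, float]:
    """Sensitivity: share of true CNVs matched by some call.
    Precision: share of calls matched by some true CNV."""
    detected = sum(any(reciprocal_overlap(t, c, min_frac) for c in calls) for t in truth)
    supported = sum(any(reciprocal_overlap(c, t, min_frac) for t in truth) for c in calls)
    sensitivity = detected / len(truth) if truth else 0.0
    precision = supported / len(calls) if calls else 0.0
    return sensitivity, precision


truth = [CNV("chr1", 1000, 5000, "DEL"), CNV("chr1", 20000, 30000, "DUP"),
         CNV("chr2", 500, 2500, "DEL")]
calls = [CNV("chr1", 1200, 5200, "DEL"), CNV("chr1", 21000, 24000, "DUP"),
         CNV("chr3", 0, 1000, "DEL")]

print(sensitivity_precision(calls, truth))                 # strict 50% reciprocal overlap
print(sensitivity_precision(calls, truth, min_frac=0.25))  # looser criterion
```

Loosening `min_frac` lets the short duplication call match the larger true duplication, raising both measures. This shows how strongly such results depend on the overlap rule. Passing one tool's calls as `truth` and another tool's calls as `calls` gives a simple two-way agreement score, which is one possible reading of "overlapping consistency".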

FHSA-SED: Two-Locus Model Detection for Genome-Wide Association Study with Harmony Search Algorithm

Shouheng Tuo, Junying Zhang, Xiguo Yuan et al.|PLoS ONE|2016

Cited by 73 | Open Access

MOTIVATION: Two-locus models are a typical class of significant disease models to be identified in genome-wide association studies (GWAS). Because of the heavy computational burden and the diversity of disease models, existing methods suffer from low detection power, high computational cost, and a preference for certain types of disease models. METHOD: In this study, two scoring functions (the Bayesian network-based K2-score and the Gini-score) are used to characterize a pair of SNP loci as a candidate model; the two criteria are applied simultaneously to improve identification power and to counter the preference for particular disease models. The harmony search algorithm (HSA) is improved to quickly find the most likely candidate models among all two-locus models, using a local search algorithm with a two-dimensional tabu table that avoids repeatedly evaluating disease models with strong marginal effects. Finally, the G-test statistic is used to further test the candidate models. RESULTS: We evaluate our method, named FHSA-SED, on 82 simulated datasets and a real AMD dataset, and compare it with two typical, recently developed methods based on swarm intelligence search algorithms (MACOED and CSE). The simulation results indicate that our method outperforms the two compared algorithms in terms of detection power, computation time, number of evaluations, sensitivity (TPR), specificity (SPC), positive predictive value (PPV) and accuracy (ACC). Our method identified two SNPs (rs3775652 and rs10511467) that may also be associated with the disease in the AMD dataset.
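All three criteria named above (K2-score, Gini-score, G-test) operate on the same contingency table. A SNP pair yields 3 × 3 = 9 genotype combinations, and each combination is split into control and case counts. The minimal sketch below computes textbook forms of the three statistics on such a table. It assumes the K2-score is the negative log Cooper-Herskovits metric with uniform priors for a binary phenotype, Σᵢ [log((rᵢ+1)!) − Σⱼ log(rᵢⱼ!)] (lower is better), and that the Gini-score is the sample-weighted Gini impurity across genotype cells (lower is better). FHSA-SED's exact normalizations may differ, and the harmony search and tabu-table machinery are not shown. The toy tables are hypothetical.

```python
import math
from typing import Sequence, Tuple

# table[i] = (controls, cases) for genotype combination i of a SNP pair (9 cells).
Table = Sequence[Tuple[int, int]]


def k2_score(table: Table) -> float:
    """Negative log Bayesian-network (K2 / Cooper-Herskovits) score for a binary
    phenotype: sum over cells of log((r_i + 1)!) - sum_j log(r_ij!). Lower is better."""
    score = 0.0
    for cell in table:
        r_i = sum(cell)
        score += math.lgamma(r_i + 2)                    # log((r_i + 1)!)
        score -= sum(math.lgamma(r + 1) for r in cell)   # log(r_ij!)
    return score


def gini_score(table: Table) -> float:
    """Sample-weighted Gini impurity of the phenotype within each genotype cell.
    Lower means the genotype cells separate cases from controls better."""
    n = sum(sum(cell) for cell in table)
    score = 0.0
    for cell in table:
        r_i = sum(cell)
        if r_i == 0:
            continue
        impurity = 1.0 - sum((r / r_i) ** 2 for r in cell)
        score += (r_i / n) * impurity
    return score


def g_statistic(table: Table) -> float:
    """G-test of independence between genotype cell and phenotype:
    G = 2 * sum(O * ln(O / E)); empty cells contribute 0."""
    n = sum(sum(cell) for cell in table)
    col_totals = [sum(cell[j] for cell in table) for j in range(2)]
    g = 0.0
    for cell in table:
        r_i = sum(cell)
        for j, observed in enumerate(cell):
            if observed == 0:
                continue
            expected = r_i * col_totals[j] / n
            g += observed * math.log(observed / expected)
    return 2.0 * g


null_table = [(10, 10)] * 9                                   # no association
assoc_table = [(20, 0)] * 4 + [(0, 20)] * 4 + [(10, 10)]      # strong association
for name, t in [("null", null_table), ("associated", assoc_table)]:
    print(name, round(k2_score(t), 3), round(gini_score(t), 4), round(g_statistic(t), 3))
```

In this toy comparison, both heuristic scores favor the associated table (lower K2 and Gini), and its G statistic is large (320 ln 2 ≈ 221.8). A confirmatory test would compare G against a chi-squared distribution; with 9 cells and 2 classes the naive degrees of freedom are (9 − 1)(2 − 1) = 8, before any adjustment for empty cells.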

Inferring subgroup-specific driver genes from heterogeneous cancer samples via subspace learning with subgroup indication

Jianing Xi, Xiguo Yuan, Minghui Wang et al.|Bioinformatics|2019

Cited by 65 | Open Access

MOTIVATION: Detecting driver genes from gene mutation data is a fundamental task in tumorigenesis research. Because cancer is a heterogeneous disease with various subgroups, subgroup-specific driver genes are key factors in developing precision medicine for heterogeneous cancers. However, existing driver gene detection methods are not designed to identify the subgroup specificity of the driver genes they detect. They therefore cannot indicate which group of patients is associated with those genes, which makes it difficult to provide specific clinical guidance for individual patients. RESULTS: By incorporating the subspace learning framework, we propose a novel bioinformatics method called DriverSub, which can efficiently predict subgroup-specific driver genes when subgroup annotations are not available. When evaluated on simulated datasets with known ground truth and compared with existing methods, DriverSub yields the best prediction of driver genes and the best inference of their related subgroups. When we apply DriverSub to the mutation data of real heterogeneous cancers, its predictions are highly enriched for experimentally validated known driver genes. Moreover, the subgroups inferred by DriverSub are significantly associated with the annotated molecular subgroups, indicating its capability of predicting subgroup-specific driver genes. AVAILABILITY AND IMPLEMENTATION: The source code is publicly available at https://github.com/JianingXi/DriverSub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
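As a rough illustration of how a subspace model can expose patient subgroups and their associated genes without subgroup labels, the sketch below factors a binary patient × gene mutation matrix with plain non-negative matrix factorization (Lee-Seung multiplicative updates). This is *not* DriverSub's model, which adds subgroup indication and is available in the repository above. It is a generic stand-in under the assumption that low-rank non-negative factors capture subgroup structure. The helper names and the block-structured toy matrix are hypothetical.

```python
import numpy as np


def nmf(X: np.ndarray, k: int, n_iter: int = 2000, seed: int = 0, eps: float = 1e-9):
    """Factor non-negative X (patients x genes) as W @ H by minimizing the
    Frobenius reconstruction error with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, k))  # patient loadings on k latent subgroups
    H = rng.uniform(0.1, 1.0, size=(k, m))  # gene weights within each latent subgroup
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def subgroups_and_genes(X: np.ndarray, k: int, top: int = 3, **kwargs):
    """Assign each patient to its dominant component and list the top-weighted
    genes of each component (hypothetical helper for illustration)."""
    W, H = nmf(X, k, **kwargs)
    labels = W.argmax(axis=1)
    top_genes = {c: [int(g) for g in np.argsort(H[c])[::-1][:top]] for c in range(k)}
    return labels, top_genes, W, H


# Toy mutation matrix: patients 0-3 carry mutations in genes 0-2,
# patients 4-7 carry mutations in genes 3-5 (two hidden subgroups).
X = np.zeros((8, 6))
X[:4, :3] = 1.0
X[4:, 3:] = 1.0

labels, top_genes, W, H = subgroups_and_genes(X, k=2)
print("patient subgroups:", labels)
print("top genes per subgroup:", top_genes)
```

Each component of `H` plays the role of a subgroup-specific gene signature, and each row of `W` indicates which subgroup a patient leans toward. Real mutation data are far sparser and noisier than this toy block matrix, which is why dedicated methods add further structure and validation.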


Top publications by citations