Xuzhou Medical College
ORCID: 0000-0002-3398-4267
Publishes on Spinal Cord Injury Research, Gene expression and cancer classification, MicroRNA in disease regulation. 70 papers and 6.7k citations.
Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3–8 of CHD7 in frame, and another two lines carrying PVT1–CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.
The two cancer genome sequences presented in this issue demonstrate how next-generation sequencing technologies can inform us about mutational processes, repair pathways and gene networks associated with cancer development. The first paper presents the genome of a cell line derived from a bone marrow metastasis in a patient who had small-cell lung cancer. This cancer is typical of the type induced by smoking, and the sequence contains mutation signatures characteristic of some of the more than 60 carcinogens present in tobacco smoke. The second paper compares the whole genome sequence of a melanoma cell line to a lymphoblastoid cell line from the same individual. This, the first complete mutational analysis of a solid tumour, reveals a dominant mutational signature reflecting DNA damage due to exposure to ultraviolet light.
Tobacco smoke contains more than sixty carcinogens that bind and mutate DNA. Here, massively parallel sequencing technology is used to sequence a small-cell lung cancer cell line, exploring the mutational burden associated with tobacco smoking. Multiple mutation signatures from the cocktail of carcinogens in tobacco smoke are found, as well as evidence of transcription-coupled repair and another, more general, expression-linked repair pathway.
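The idea of a mutation signature reflecting "proclivities for particular bases" can be illustrated with a minimal sketch that tallies substitutions into the six conventional classes, collapsing each onto its pyrimidine reference base. This is not the authors' pipeline: the function names and the representation of substitutions as `(ref, alt)` pairs are assumptions made for illustration, and sequence context is ignored.

```python
from collections import Counter

# Watson-Crick complements, used to collapse purine-reference substitutions
# onto the pyrimidine strand (for example, G>T is counted as C>A).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Return the strand-collapsed class of a single-base substitution.

    Hypothetical helper: `ref` and `alt` are single-letter bases.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the complementary change
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(substitutions):
    """Return the fraction of substitutions falling in each collapsed class."""
    counts = Counter(substitution_class(r, a) for r, a in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}


# Toy input, not data from the paper.
calls = [("G", "T"), ("C", "A"), ("A", "G"), ("T", "C")]
print(substitution_spectrum(calls))  # {'C>A': 0.5, 'T>C': 0.5}
```

A real signature analysis would also record the bases flanking each substitution, which is what "surrounding sequence context" refers to; the sketch stops at the base-change level.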
BACKGROUND: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resultant variety of existing and emerging methods exacerbates confusion and continuing debate in the microarray community on the appropriate choice of methods for identifying reliable DEG lists.
RESULTS: Using the data sets generated by the MicroArray Quality Control (MAQC) project, we investigated the impact on the reproducibility of DEG lists of a few widely used gene selection procedures. We present comprehensive results from inter-site comparisons using the same microarray platform, cross-platform comparisons using multiple microarray platforms, and comparisons between microarray results and those from TaqMan, the widely regarded "standard" gene expression platform. Our results demonstrate that (1) previously reported discordance between DEG lists could simply result from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion with a non-stringent P-value cutoff filtering, the DEG lists become much more reproducible, especially when fewer genes are selected as differentially expressed, as is the case in most microarray studies; and (3) the instability of short DEG lists solely based on P-value ranking is an expected mathematical consequence of the high variability of the t-values; the more stringent the P-value threshold, the less reproducible the DEG list is. These observations are also consistent with results from extensive simulation calculations.
CONCLUSION: We recommend the use of FC-ranking plus a non-stringent P cutoff as a straightforward and baseline practice in order to generate more reproducible DEG lists. Specifically, the P-value cutoff should not be stringent (too small) and FC should be as large as possible. Our results provide practical guidance to choose the appropriate FC and P-value cutoffs when selecting a given number of DEGs. The FC criterion enhances reproducibility, whereas the P criterion balances sensitivity and specificity.
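The recommended selection rule can be sketched in a few lines: filter genes by a non-stringent P cutoff, rank the survivors by absolute fold change, and compare the resulting lists between sites. This is a minimal sketch under stated assumptions, not the MAQC implementation: the function names, the default cutoff of 0.05, the use of log2 fold changes, and the overlap measure (shared genes as a percentage of the shorter list) are all illustrative choices, and P-values are assumed to be precomputed.

```python
def select_degs(genes, log2_fc, p_values, p_cutoff=0.05, n_top=100):
    """Keep genes with P below a non-stringent cutoff, then rank by |log2 FC|.

    Hypothetical helper: the three input sequences are parallel, one entry
    per gene. Returns at most `n_top` gene names.
    """
    passing = [
        (gene, fc)
        for gene, fc, p in zip(genes, log2_fc, p_values)
        if p < p_cutoff
    ]
    # Fold change, not P, decides the order among genes that pass the filter.
    passing.sort(key=lambda item: abs(item[1]), reverse=True)
    return [gene for gene, _ in passing[:n_top]]


def percent_overlap(list_a, list_b):
    """Percentage of genes shared by two DEG lists, relative to the shorter."""
    if not list_a or not list_b:
        return 0.0
    shared = set(list_a) & set(list_b)
    return 100.0 * len(shared) / min(len(list_a), len(list_b))


# Toy values for two test sites, not MAQC data.
genes = ["A", "B", "C", "D", "E"]
site1 = select_degs(genes, [3.0, -2.5, 0.2, 1.8, -0.1],
                    [0.001, 0.01, 0.5, 0.04, 0.8], n_top=2)
site2 = select_degs(genes, [2.8, -2.2, 1.9, 0.3, 0.1],
                    [0.002, 0.03, 0.04, 0.6, 0.9], n_top=2)
print(site1, site2, percent_overlap(site1, site2))
# ['A', 'B'] ['A', 'B'] 100.0
```

In this framing, loosening `p_cutoff` lets fold change dominate the ranking, while `n_top` sets the list length at which reproducibility is assessed.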
MicroRNAs (miRNAs) are believed to play important roles in developmental and other cellular processes by hybridizing to complementary target mRNA transcripts. This results in either cleavage of the hybridized transcript or negative regulation of translation. Little is known about the regulation or pattern of miRNA expression. The predicted presence of numerous miRNA sequences in higher eukaryotes makes it highly likely that the expression levels of individual miRNA molecules themselves play an important role in regulating multiple cellular processes. Therefore, determining the pattern of global miRNA expression levels in mammals and other higher eukaryotes is essential both to understand the mechanism of miRNA transcriptional regulation and to help identify miRNA-regulated gene expression. Here, we describe a novel method to detect global processed miRNA expression levels in higher eukaryotes, including human, mouse and rat, by using a high-density oligonucleotide array. Array results have been validated by subsequent confirmation of miRNA expression using northern-blot analysis. Major differences in miRNA expression have been detected in samples from diverse sources, suggesting highly regulated miRNA expression and specific gene regulatory functions for individual miRNA transcripts. For example, five different miRNAs were found to be preferentially expressed in human kidney compared with other human tissues. Comparative analysis of surrounding genomic sequences of the kidney-specific miRNA clusters revealed the occurrence of specific transcription factor binding sites located in conserved phylogenetic footprints, suggesting that these may be involved in regulating miRNA expression in kidney.