Yvonne A. Evrard

TPM, FPKM, or Normalized Counts? A Comparative Study of Quantification Measures for the Analysis of RNA-seq Data from the NCI Patient-Derived Models Repository

Yingdong Zhao, Ming‐Chung Li, Mariam M. Konaté et al.|Journal of Translational Medicine|2021

Cited by 511 | Open Access

BACKGROUND: In order to correctly decode phenotypic information from RNA-sequencing (RNA-seq) data, careful selection of the RNA-seq quantification measure is critical for inter-sample comparisons and for downstream analyses, such as differential gene expression between two or more conditions. Several methods have been proposed and continue to be used. However, a consensus has not been reached regarding the best gene expression quantification method for RNA-seq data analysis. METHODS: In the present study, we used replicate samples from each of 20 patient-derived xenograft (PDX) models spanning 15 tumor types, for a total of 61 human tumor xenograft samples available through the NCI patient-derived model repository (PDMR). We compared the reproducibility across replicate samples based on TPM (transcripts per million), FPKM (fragments per kilobase of transcript per million fragments mapped), and normalized counts using coefficient of variation, intraclass correlation coefficient, and cluster analysis. RESULTS: Our results revealed that hierarchical clustering on normalized count data tended to group replicate samples from the same PDX model together more accurately than TPM and FPKM data. Furthermore, normalized count data were observed to have the lowest median coefficient of variation (CV), and highest intraclass correlation (ICC) values across all replicate samples from the same model and for the same gene across all PDX models compared to TPM and FPKM data. CONCLUSION: We provided compelling evidence for a preferred quantification measure to conduct downstream analyses of PDX RNA-seq data. To our knowledge, this is the first comparative study of RNA-seq data quantification measures conducted on PDX models, which are known to be inherently more variable than cell line models. Our findings are consistent with what others have shown for human tumors and cell lines and add further support to the thesis that normalized counts are the best choice for the analysis of RNA-seq data across samples.

lunatic fringe is an essential mediator of somite segmentation and patterning

Yvonne A. Evrard, Yi Lun, Alexander Aulehla et al.|Nature|1998

Cited by 395

Loss of Gcn5l2 leads to increased apoptosis and mesodermal defects during mouse development

Wanting Xu, Diane G. Edmondson, Yvonne A. Evrard et al.|Nature Genetics|2000

Cited by 262

Mediation of CTCF transcriptional insulation by DEAD-box RNA-binding protein p68 and steroid receptor RNA activator SRA

Hongjie Yao, Kevin Brick, Yvonne A. Evrard et al.|Genes & Development|2010

Cited by 239 | Open Access

CCCTC-binding factor (CTCF) is a DNA-binding protein that plays important roles in chromatin organization, although the mechanism by which CTCF carries out these functions is not fully understood. Recent studies show that CTCF recruits the cohesin complex to insulator sites and that cohesin is required for insulator activity. Here we showed that the DEAD-box RNA helicase p68 (DDX5) and its associated noncoding RNA, steroid receptor RNA activator (SRA), form a complex with CTCF that is essential for insulator function. p68 was detected at CTCF sites in the IGF2/H19 imprinted control region (ICR) as well as other genomic CTCF sites. In vivo depletion of SRA or p68 reduced CTCF-mediated insulator activity at the IGF2/H19 ICR, increased levels of IGF2 expression, and increased interactions between the endodermal enhancer and IGF2 promoter. p68/SRA also interacts with members of the cohesin complex. Depletion of either p68 or SRA does not affect CTCF binding to its genomic sites, but does reduce cohesin binding. The results suggest that p68/SRA stabilizes the interaction of cohesin with CTCF by binding to both, and is required for proper insulator function.

Converting tabular data into images for deep learning with convolutional neural networks

Yitan Zhu, Thomas Brettin, Fangfang Xia et al.|Scientific Reports|2021

Cited by 191 | Open Access

Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features, such as speech and imaging. However, most tabular data do not assume a spatial relationship between features, and thus are unsuitable for modeling using CNNs. To meet this challenge, we develop a novel algorithm, image generator for tabular data (IGTD), to transform tabular data into images by assigning features to pixel positions so that similar features are close to each other in the image. The algorithm searches for an optimized assignment by minimizing the difference between the ranking of distances between features and the ranking of distances between their assigned pixels in the image. We apply IGTD to transform gene expression profiles of cancer cell lines (CCLs) and molecular descriptors of drugs into their respective image representations. Compared with existing transformation methods, IGTD generates compact image representations with better preservation of feature neighborhood structure. Evaluated on benchmark drug screening datasets, CNNs trained on IGTD image representations of CCLs and drugs exhibit a better performance of predicting anti-cancer drug response than both CNNs trained on alternative image representations and prediction models trained on the original tabular data.
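The optimization idea described in this abstract can be sketched in a few lines. The code below is a simplified illustration, not the published IGTD implementation: it ranks pairwise distances between features and between pixel coordinates, scores an assignment by the summed absolute rank differences, and improves it with a plain greedy swap search. The Euclidean distance, the tie handling, the swap strategy, and all function names are assumptions, since the abstract does not specify them.

```python
import math
from itertools import combinations


def pair_ranks(points):
    """Rank all pairwise Euclidean distances (0 = smallest).

    Ties are broken by pair order, which is a simplification.
    """
    pairs = list(combinations(range(len(points)), 2))
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    order = sorted(range(len(pairs)), key=lambda k: dists[k])
    return {pairs[k]: rank for rank, k in enumerate(order)}


def assignment_error(feature_ranks, pixel_ranks, assignment):
    """Sum of |rank of a feature pair - rank of the pair of pixels assigned to it|."""
    error = 0
    for (i, j), rank in feature_ranks.items():
        a, b = sorted((assignment[i], assignment[j]))
        error += abs(rank - pixel_ranks[(a, b)])
    return error


def greedy_swaps(feature_ranks, pixel_ranks, assignment, sweeps=10):
    """Keep any pairwise swap of pixel positions that lowers the error."""
    assignment = list(assignment)
    best = assignment_error(feature_ranks, pixel_ranks, assignment)
    for _ in range(sweeps):
        improved = False
        for i, j in combinations(range(len(assignment)), 2):
            assignment[i], assignment[j] = assignment[j], assignment[i]
            error = assignment_error(feature_ranks, pixel_ranks, assignment)
            if error < best:
                best, improved = error, True
            else:
                # Undo a swap that did not help.
                assignment[i], assignment[j] = assignment[j], assignment[i]
        if not improved:
            break
    return assignment, best


# Hypothetical toy data: four features (one value each) and a 1x4 row of pixels.
features = [(0.0,), (1.0,), (3.0,), (7.0,)]
pixels = [(0.0,), (1.0,), (3.0,), (7.0,)]
f_ranks, p_ranks = pair_ranks(features), pair_ranks(pixels)
start = [3, 2, 1, 0]  # feature i is initially placed at pixel start[i]
print(assignment_error(f_ranks, p_ranks, start))
print(greedy_swaps(f_ranks, p_ranks, start))
```

On a real pixel grid many pixel distances are tied, so a practical version would need a more careful ranking scheme than the pair-order tie-break used here.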


Top publications by citations