F

Fábio C. P. Navarro

Universidade Federal de Alfenas

ORCID: 0000-0002-5640-9070

Publishes on Cancer Genomics and Diagnostics, Immunotherapy and Immune Responses, Monoclonal and Polyclonal Antibodies Research. 178 papers and 35.4k citations.

178Publications
35.4kTotal Citations

Is this you? Claim your profile.

Add your photo, update your bio, and get notified when your ranking changes.

Top publicationsby citations

GENCODE reference annotation for the human and mouse genomes
Adam Frankish, Mark Diekhans, Anne-Maud Ferreira et al.|Nucleic Acids Research|2018
Cited by 3.5kOpen Access

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.

GENCODE 2021
Adam Frankish, Mark Diekhans, Irwin Jungreis et al.|Nucleic Acids Research|2020
Cited by 1.5kOpen Access

The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.

Comprehensive functional genomic resource and integrative model for the human brain
Cited by 1.1k

INTRODUCTION Strong genetic associations have been found for a number of psychiatric disorders. However, understanding the underlying molecular mechanisms remains challenging. RATIONALE To address this challenge, the PsychENCODE Consortium has developed a comprehensive online resource and integrative models for the functional genomics of the human brain. RESULTS The base of the pyramidal resource is the datasets generated by PsychENCODE, including bulk transcriptome, chromatin, genotype, and Hi-C datasets and single-cell transcriptomic data from ~32,000 cells for major brain regions. We have merged these with data from Genotype-Tissue Expression (GTEx), ENCODE, Roadmap Epigenomics, and single-cell analyses. Via uniform processing, we created a harmonized resource, allowing us to survey functional genomics data on the brain over a sample size of 1866 individuals. From this uniformly processed dataset, we created derived data products. These include lists of brain-expressed genes, coexpression modules, and single-cell expression profiles for many brain cell types; ~79,000 brain-active enhancers with associated Hi-C loops and topologically associating domains; and ~2.5 million expression quantitative-trait loci (QTLs) comprising ~238,000 linkage-disequilibrium–independent single-nucleotide polymorphisms and of other types of QTLs associated with splice isoforms, cell fractions, and chromatin activity. By using these, we found that >88% of the cross-population variation in brain gene expression can be accounted for by cell fraction changes. Furthermore, a number of disorders and aging are associated with changes in cell-type proportions. The derived data also enable comparison between the brain and other tissues. In particular, by using spectral analyses, we found that the brain has distinct expression and epigenetic patterns, including a greater extent of noncoding transcription than other tissues. The top level of the resource consists of integrative networks for regulation and machine-learning models for disease prediction. The networks include a full gene regulatory network (GRN) for the brain, linking transcription factors, enhancers, and target genes from merging of the QTLs, generalized element-activity correlations, and Hi-C data. By using this network, we link disease genes to genome-wide association study (GWAS) variants for psychiatric disorders. For schizophrenia, we linked 321 genes to the 142 reported GWAS loci. We then embedded the regulatory network into a deep-learning model to predict psychiatric phenotypes from genotype and expression. Our model gives a ~6-fold improvement in prediction over additive polygenic risk scores. Moreover, it achieves a ~3-fold improvement over additive models, even when the gene expression data are imputed, highlighting the value of having just a small amount of transcriptome data for disease prediction. Lastly, it highlights key genes and pathways associated with disorder prediction, including immunological, synaptic, and metabolic pathways, recapitulating de novo results from more targeted analyses. CONCLUSION Our resource and integrative analyses have uncovered genomic elements and networks in the brain, which in turn have provided insight into the molecular mechanisms underlying psychiatric disorders. Our deep-learning model improves disease risk prediction over traditional approaches and can be extended with additional data types (e.g., microRNA and neuroimaging). A comprehensive functional genomic resource for the adult human brain. The resource forms a three-layer pyramid. The bottom layer includes sequencing datasets for traits, such as schizophrenia. The middle layer represents derived datasets, including functional genomic elements and QTLs. The top layer contains integrated models, which link genotypes to phenotypes. DSPN, Deep Structured Phenotype Network; PC1 and PC2, principal components 1 and 2; ref, reference; alt, alternate; H3K27ac, histone H3 acetylation at lysine 27.

Multi-platform discovery of haplotype-resolved structural variation in human genomes
Mark Chaisson, Ashley D. Sanders, Xuefang Zhao et al.|Nature Communications|2019
Cited by 1kOpen Access

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.

Patterns of somatic structural variation in human cancer genomes
Cited by 983Open Access

Abstract A key mutational process in cancer is structural variation, in which rearrangements delete, amplify or reorder genomic segments that range in size from kilobases to whole chromosomes 1–7 . Here we develop methods to group, classify and describe somatic structural variants, using data from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types 8 . Sixteen signatures of structural variation emerged. Deletions have a multimodal size distribution, assort unevenly across tumour types and patients, are enriched in late-replicating regions and correlate with inversions. Tandem duplications also have a multimodal size distribution, but are enriched in early-replicating regions—as are unbalanced translocations. Replication-based mechanisms of rearrangement generate varied chromosomal structures with low-level copy-number gains and frequent inverted rearrangements. One prominent structure consists of 2–7 templates copied from distinct regions of the genome strung together within one locus. Such cycles of templated insertions correlate with tandem duplications, and—in liver cancer—frequently activate the telomerase gene TERT . A wide variety of rearrangement processes are active in cancer, which generate complex configurations of the genome upon which selection can act.