NCBI RefSeq: reference sequence standards through 25 years of curation and annotation

Tamara Goldfarb; Vamsi K. Kodali; Shashikant Pujar; Vyacheslav Brover; Barbara Robbertse; Catherine M. Farrell; Dong‐Ha Oh; Alexander Astashyn; Olga Ermolaeva; Diana Haddad; Wratko Hlavina; Jinna Hoffman; John D. Jackson; Vinita Joardar; David M. Kristensen; Patrick Masterson; Kelly M. McGarvey; Richard McVeigh; Eyal Mozes; Michael R. Murphy; Susan S Schafer; Alexander Souvorov; Brett Spurrier; Pooja K Strope; Hanzhen Sun; Anjana R. Vatsan; Craig Wallin; David Webb; J. Rodney Brister; Eneida Hatcher; Avi Kimchi; William Klimke; Aron Marchler‐Bauer; Kim D. Pruitt; Françoise Thibaud-Nissen; Terence D. Murphy

doi:10.1093/nar/gkae1038


Affiliation (all authors): National Institutes of Health

Nucleic Acids Research

November 11, 2024


Cited by 252. Open Access.


Abstract

Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpreting transcriptomic, proteomic, sequence-variation and comparative data against reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) combines automated processes with expert curation to produce a robust set of genomic, transcript and protein reference sequences spanning the tree of life. RefSeq continues to refine its annotation and quality-control processes, and it uses higher-quality genomes made possible by advances in sequencing technologies, together with RNA-Seq data, to produce high-quality annotated genomes, ortholog predictions for more organisms and other products that are easily accessible through multiple NCBI resources. This report summarizes the current status of the eukaryotic, prokaryotic and viral RefSeq resources, with a focus on eukaryotic annotation, the increase in taxonomic representation and the effect this broader representation will have on comparative genomics. The RefSeq resource is publicly accessible at https://www.ncbi.nlm.nih.gov/refseq.
