NCBI RefSeq: reference sequence standards through 25 years of curation and annotation

Tamara Goldfarb (National Institutes of Health), Vamsi K. Kodali (National Institutes of Health), Shashikant Pujar (National Institutes of Health), Vyacheslav Brover (National Institutes of Health), Barbara Robbertse (National Institutes of Health), Catherine M. Farrell (National Institutes of Health), Dong‐Ha Oh (National Institutes of Health), Alexander Astashyn (National Institutes of Health), Olga Ermolaeva (National Institutes of Health), Diana Haddad (National Institutes of Health), Wratko Hlavina (National Institutes of Health), Jinna Hoffman (National Institutes of Health), John D. Jackson (National Institutes of Health), Vinita Joardar (National Institutes of Health), David M. Kristensen (National Institutes of Health), Patrick Masterson (National Institutes of Health), Kelly M. McGarvey (National Institutes of Health), Richard McVeigh (National Institutes of Health), Eyal Mozes (National Institutes of Health), Michael R. Murphy (National Institutes of Health), Susan S Schafer (National Institutes of Health), Alexander Souvorov (National Institutes of Health), Brett Spurrier (National Institutes of Health), Pooja K Strope (National Institutes of Health), Hanzhen Sun (National Institutes of Health), Anjana R. Vatsan (National Institutes of Health), Craig Wallin (National Institutes of Health), David Webb (National Institutes of Health), J. Rodney Brister (National Institutes of Health), Eneida Hatcher (National Institutes of Health), Avi Kimchi (National Institutes of Health), William Klimke (National Institutes of Health), Aron Marchler‐Bauer (National Institutes of Health), Kim D. Pruitt (National Institutes of Health), Françoise Thibaud-Nissen (National Institutes of Health), Terence D. Murphy (National Institutes of Health)
Nucleic Acids Research
November 11, 2024
Cited by 252 | Open Access

Abstract

Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life. RefSeq continues to refine its annotation and quality control processes and to use higher-quality genomes, made possible by advances in sequencing technologies, together with RNA-Seq data to produce high-quality annotated genomes, ortholog predictions across more organisms and other products that are easily accessible through multiple NCBI resources. This report summarizes the current status of the eukaryotic, prokaryotic and viral RefSeq resources, with a focus on eukaryotic annotation, the increase in taxonomic representation and the effect it will have on comparative genomics. The RefSeq resource is publicly accessible at https://www.ncbi.nlm.nih.gov/refseq.

