RefSeq: an update on mammalian reference sequences

Kim D. Pruitt (National Institutes of Health), Garth Brown (National Center for Biotechnology Information), Susan M. Hiatt (National Center for Biotechnology Information), Françoise Thibaud‐Nissen (National Center for Biotechnology Information), Alexander Astashyn (National Institutes of Health), Olga Ermolaeva (National Institutes of Health), Catherine M. Farrell (National Center for Biotechnology Information), Jennifer Hart (National Center for Biotechnology Information), Melissa Landrum (National Institutes of Health), Kelly M. McGarvey (National Center for Biotechnology Information), Michael R. Murphy (National Institutes of Health), Nuala A. O’Leary (National Institutes of Health), Shashikant Pujar (National Institutes of Health), Bhanu Rajput (National Institutes of Health), Sanjida H. Rangwala (National Institutes of Health), Lillian D. Riddick (National Institutes of Health), Andrei Shkeda (National Center for Biotechnology Information), Hanzhen Sun (National Institutes of Health), Pamela Tamez (National Institutes of Health), Raymond E. Tully (National Institutes of Health), Craig Wallin (National Center for Biotechnology Information), David Webb (National Institutes of Health), Janet A. Weber (National Center for Biotechnology Information), Wendy Wu (National Institutes of Health), Michael DiCuccio (National Center for Biotechnology Information), Paul Kitts (National Center for Biotechnology Information), Donna Maglott (National Center for Biotechnology Information), Terence D. Murphy (National Institutes of Health), James M. Ostell (National Center for Biotechnology Information)
Nucleic Acids Research
November 19, 2013
Cited by 1,011 | Open Access

Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.

