RefSeq: an update on prokaryotic genome annotation and curation

Daniel H. Haft; Michael DiCuccio; Azat Badretdin; Vyacheslav Brover; Vyacheslav Chetvernin; Kathleen O’Neill; Wenjun Li; Farideh Chitsaz; Myra K. Derbyshire; Noreen R. Gonzales; Marc Gwadz; Fu Lu; Gabriele H. Marchler; James S. Song; Narmada Thanki; Roxanne A. Yamashita; Chanjuan Zheng; Françoise Thibaud‐Nissen; Lewis Y. Geer; Aron Marchler‐Bauer; Kim D. Pruitt

doi:10.1093/nar/gkx1068

Author affiliation (all authors): National Institutes of Health

Nucleic Acids Research

October 25, 2017

Open Access

Abstract

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule, BlastRules. Continual curation of supporting evidence and propagation of improved names onto RefSeq proteins ensure that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.
