Database Divisions and Homology Search Files: A Guide for the Perplexed

B. F. Francis Ouellette; Mark S. Boguski

doi:10.1101/gr.7.10.952

B. F. Francis Ouellette (National Institutes of Health), Mark S. Boguski (National Institutes of Health)

Genome Research

October 1, 1997

Cited by 82 | Open Access

Abstract

The exponential growth of DNA sequence data has become a challenge for end users and database curators alike. When one of us (M.S.B.) was finishing graduate school, GenBank (release 42) contained a mere 6.7 Mb in 9700 sequences. However, as we write this, GenBank (Benson et al. 1997) has topped 1000 Mb in >1.6 million sequences (release 102). (Information on GenBank releases is available at ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt.) The National Center for Biotechnology Information (NCBI) and its partners in the international database collaboration, the DNA Database of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL), all strive to collect, manage, and distribute these data in the most efficient and usable manner possible. These organizations also provide homology search, database query, and information retrieval services that serve the general molecular biology community as well as more specialized users. Unfortunately, it is easy to become confused about the many ways in which the data are made available for downloading, homology searching, and more general information retrieval. We hope to clarify some of these issues here, with an emphasis on how high-throughput genomic sequence is processed, distributed, and made available for BLAST searching. We will emphasize services provided through NCBI but also note comparable services at the European Bioinformatics Institute and the slight differences between GenBank, DDBJ, and the EMBL Data Library.
