Nomi L. Harris

Lawrence Berkeley National Laboratory

ORCID: 0000-0001-6315-3707

Publishes on Biomedical Text Mining and Ontologies, Bioinformatics and Genomic Networks, Genomics and Phylogenetic Studies. 156 papers and 26.8k citations.

Top publications by citations

The Genome Sequence of Drosophila melanogaster
Cited by 6k

The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

The Gene Ontology resource: enriching a GOld mine
Seth Carbon, Eric Douglass, Benjamin M. Good et al.|Nucleic Acids Research|2020
Cited by 3.8k|Open Access

The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.

The Gene Ontology knowledgebase in 2023
Cited by 2.7k|Open Access

The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.

Comparative Genomics of the Eukaryotes
Cited by 1.7k|Open Access

A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae (and the proteins they are predicted to encode) was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.

KBase: The United States Department of Energy Systems Biology Knowledgebase
Adam P. Arkin, Robert W. Cottingham, Christopher S. Henry et al.|Nature Biotechnology|2018
Cited by 1.6k|Open Access

The U.S. Department of Energy Systems Biology Knowledgebase (KBase, http://kbase.us) is an open-source software and data platform designed to tackle the grand challenge of systems biology—predicting and designing biological function at scales ranging from the biomolecular to the ecological. KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large analyses on scalable computing infrastructure; and combine experimental evidence and conclusions to model plant and microbial physiology and community dynamics. The KBase platform has extensible analytical capabilities that currently include (meta)genome assembly, annotation, comparative genomics, transcriptomics, and metabolic modeling; a web-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; and a software development kit that enables the community to add functionality to the system.