NCBI Orthologs: Public Resource and Scalable Method for Computing High-Precision Orthologs Across Eukaryotic Genomes

Dong‐Ha Oh(National Institutes of Health), Alexander Astashyn(National Institutes of Health), Barbara Robbertse(National Institutes of Health), Nuala A. O’Leary(National Institutes of Health), WF Anderson(National Institutes of Health), Laurie Breen(National Institutes of Health), Eric Cox(National Institutes of Health), Olga Ermolaeva(National Institutes of Health), Robert Falk(National Institutes of Health), Vichet Hem(National Institutes of Health), J. Bradley Holmes(National Institutes of Health), Patrick Masterson(National Institutes of Health), Kelly M. McGarvey(National Institutes of Health), Eyal Mozes(National Institutes of Health), John Torcivia-Rodriguez(National Institutes of Health), Mirian T. N. Tsuchiya(National Institutes of Health), Craig Wallin(National Institutes of Health), Françoise Thibaud-Nissen(National Institutes of Health), Terence D. Murphy(National Institutes of Health), Vamsi K. Kodali(National Institutes of Health)
Journal of Molecular Evolution
September 25, 2025

Abstract

Orthologs are fundamental for enabling comparative genomics analyses that further our understanding of eukaryotic biology. The unprecedented increase in the availability of high-quality eukaryotic genomes necessitates scalable and accurate methods for orthology inference. The National Center for Biotechnology Information (NCBI) developed "NCBI Orthologs", a resource and computational pipeline designed to meet this challenge within the NCBI RefSeq framework. This system integrates protein similarity, nucleotide alignment, and microsynteny to achieve high-precision ortholog assignments across diverse eukaryotes. The pipeline leverages high-quality RefSeq annotations and processes genomes individually, ensuring scalability. The resulting ortholog data, organized into gene-level anchored sets, enable propagation of functional annotation information and facilitate comparative genomics. Critically, these data are integrated into the NCBI Gene resource, giving users access through multiple entry points. The NCBI Datasets resource provides an intuitive interface for exploring orthologous relationships on the web and allows bulk data download via the web, command-line tools, and an API. We detail the methodology, including anchor species selection and the decision tree used to arrive at high-confidence one-to-one orthology relationships. NCBI Orthologs is a valuable resource for facilitating functional annotation efforts and enhancing our understanding of eukaryotic gene evolution.
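To make the ideas of "gene-level anchored sets" and "processing genomes individually" concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the NCBI decision tree: the `Candidate` fields, the `min_protein`, `min_nuc` and `min_synteny` thresholds, and the helper names `pick_one_to_one` and `add_species` are all hypothetical. Each query genome is compared only against the anchor gene, and a one-to-one ortholog is accepted only when exactly one candidate passes all three evidence checks (protein similarity, nucleotide alignment, microsynteny).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """One candidate ortholog in a query genome (all fields hypothetical)."""
    gene_id: str
    protein_score: float     # assumed normalized protein-similarity score, 0..1
    nucleotide_score: float  # assumed nucleotide-alignment coverage, 0..1
    synteny_support: int     # assumed count of conserved neighboring gene pairs


@dataclass
class AnchoredSet:
    """Gene-level ortholog set keyed by a gene from the anchor species."""
    anchor_gene: str
    members: Dict[str, str] = field(default_factory=dict)  # species -> gene_id


def pick_one_to_one(cands: List[Candidate], min_protein: float = 0.5,
                    min_nuc: float = 0.5, min_synteny: int = 2) -> Optional[Candidate]:
    """Return the single candidate passing every check, or None if zero or several pass."""
    passing = [
        c for c in cands
        if c.protein_score >= min_protein
        and c.nucleotide_score >= min_nuc
        and c.synteny_support >= min_synteny
    ]
    # Ambiguity (several passing candidates) is rejected to favor precision.
    return passing[0] if len(passing) == 1 else None


def add_species(aset: AnchoredSet, species: str, cands: List[Candidate]) -> bool:
    """Process one genome independently of all others; record a hit if unambiguous."""
    best = pick_one_to_one(cands)
    if best is not None:
        aset.members[species] = best.gene_id
    return best is not None
```

For example, `add_species(AnchoredSet("A1"), "sp1", [Candidate("g1", 0.9, 0.8, 4), Candidate("g2", 0.4, 0.3, 0)])` records `g1` for `sp1`, whereas two candidates that both pass all checks leave the species unassigned. Because each genome touches only its own entry in the set, adding a new genome in this sketch does not require recomputing existing assignments, which is one plausible reading of why per-genome processing scales.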

