Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science

Deepak Unni (Lawrence Berkeley National Laboratory), Sierra Moxon (Lawrence Berkeley National Laboratory), Michael Bada (University of Colorado Anschutz Medical Campus), Matthew Brush (University of Colorado Anschutz Medical Campus), Richard Bruskiewich, J. Harry Caufield (Lawrence Berkeley National Laboratory), Paul A. Clemons (Broad Institute), Vlado Dančík (Broad Institute), Michel Dumontier (Maastricht University), Karamarie Fecho (University of North Carolina at Chapel Hill), Gustavo Glusman (Institute for Systems Biology), Jennifer Hadlock (Institute for Systems Biology), Nomi L. Harris (Lawrence Berkeley National Laboratory), Arpita Joshi (Institute for Systems Biology), Tim Putman (University of Colorado Anschutz Medical Campus), Guangrong Qin (Institute for Systems Biology), Stephen A. Ramsey (Oregon State University), Kent Shefchek (University of Colorado Anschutz Medical Campus), Harold R. Solbrig (Johns Hopkins University), Karthik Soman (University of California, San Francisco), Anne Thessen (University of Colorado Anschutz Medical Campus), Melissa Haendel (University of Colorado Anschutz Medical Campus), Chris Bizon (University of North Carolina at Chapel Hill), Chris Mungall (Lawrence Berkeley National Laboratory)
Clinical and Translational Science
June 6, 2022

Abstract

Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs have left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
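To make the abstract's description concrete, the following is a minimal sketch of how hierarchical categories, predicates, and association constraints might fit together. It is an illustration under stated assumptions, not the Biolink Model implementation: the `PARENT` table is a hand-written, simplified excerpt of a class hierarchy, the `ALLOWED` table and the `ancestors` and `is_valid` helpers are hypothetical, and the `EX:` identifiers are placeholders rather than real database identifiers.

```python
from dataclasses import dataclass

# Assumption: a toy, hand-written child -> parent table for illustration.
# It is simplified and is not loaded from a Biolink Model release.
PARENT = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:Disease": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:NamedThing": None,
}


def ancestors(category):
    """Return the category followed by its ancestors, most specific first."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return chain


@dataclass(frozen=True)
class Node:
    id: str        # placeholder identifier, e.g. "EX:gene1"
    category: str  # class (category) of the entity


@dataclass(frozen=True)
class Edge:
    subject: Node
    predicate: str  # relationship between subject and object
    object: Node


# Hypothetical association constraints:
# predicate -> (required subject category, required object category)
ALLOWED = {
    "biolink:gene_associated_with_condition": ("biolink:Gene", "biolink:Disease"),
}


def is_valid(edge):
    """Check an edge against the toy constraints, honoring the hierarchy."""
    rule = ALLOWED.get(edge.predicate)
    if rule is None:
        return False  # unknown predicates are rejected in this sketch
    subject_category, object_category = rule
    return (
        subject_category in ancestors(edge.subject.category)
        and object_category in ancestors(edge.object.category)
    )


gene = Node(id="EX:gene1", category="biolink:Gene")
disease = Node(id="EX:disease1", category="biolink:Disease")
edge = Edge(gene, "biolink:gene_associated_with_condition", disease)
```

In this sketch, `is_valid(edge)` accepts the gene-to-disease edge and rejects the same predicate with subject and object swapped, which is one way a shared schema can let downstream consumers check data from different sources against the same rules.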

