Cameron L. M. Gilchrist

Seoul National University

ORCID: 0000-0001-7798-427X

Publishes on Microbial Natural Products and Biosynthesis, Genomics and Phylogenetic Studies, and RNA and Protein Synthesis Mechanisms. 43 papers and 5.9k citations.

Top publications by citations

Fast and accurate protein structure search with Foldseek
Michel van Kempen, Stephanie Kim, Charlotte Tumescheit et al.|Nature Biotechnology|2023
Cited by 2.3k|Open Access

As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively.

clinker & clustermap.js: automatic generation of gene cluster comparison figures
Cited by 1.6k|Open Access

SUMMARY: Genes involved in biological pathways are often colocalised in gene clusters, the comparison of which can give valuable insights into their function and evolutionary history. However, comparison and visualization of gene cluster similarity is a tedious process, particularly when many clusters are being compared. Here, we present clinker, a Python-based tool, and clustermap.js, a companion JavaScript visualization library, which used together can automatically generate accurate, interactive, publication-quality gene cluster comparison figures directly from sequence files. AVAILABILITY AND IMPLEMENTATION: Source code and documentation for clinker and clustermap.js are available on GitHub (github.com/gamcil/clinker and github.com/gamcil/clustermap.js, respectively) under the MIT license. clinker can be installed directly from the Python Package Index via pip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Fast and accurate protein structure search with Foldseek
Michel van Kempen, Stephanie Kim, Charlotte Tumescheit et al.|bioRxiv (Cold Spring Harbor Laboratory)|2022
Cited by 377|Open Access

As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing the amino acid backbone of proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of DALI, TM-align and CE, respectively.

Clustering predicted structures at the scale of the known protein universe
Cited by 357|Open Access

Proteins are key to all cellular processes and their structure is important in understanding their function and evolution. Sequence-based predictions of protein structures have increased in accuracy [1], and over 214 million predicted structures are available in the AlphaFold database [2]. However, studying protein structures at this scale requires highly efficient methods. Here, we developed a structural-alignment-based clustering algorithm, Foldseek cluster, that can cluster hundreds of millions of structures. Using this method, we have clustered all of the structures in the AlphaFold database, identifying 2.30 million non-singleton structural clusters, of which 31% lack annotations, representing probable previously undescribed structures. Clusters without annotation tend to have few representatives, covering only 4% of all proteins in the AlphaFold database. Evolutionary analysis suggests that most clusters are ancient in origin but 4% seem to be species specific, representing lower-quality predictions or examples of de novo gene birth. We also show how structural comparisons can be used to predict domain families and their relationships, identifying examples of remote structural similarity. On the basis of these analyses, we identify several examples of human immune-related proteins with putative remote homology in prokaryotic species, illustrating the value of this resource for studying protein function and evolution across the tree of life.

cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters
Cameron L. M. Gilchrist, Thomas Booth, Bram van Wersch et al.|Bioinformatics Advances|2021
Cited by 284|Open Access

Motivation: Genes involved in coordinated biological pathways, including metabolism, drug resistance and virulence, are often colocalized as gene clusters. Identifying homologous gene clusters aids in the study of their function and evolution; however, existing tools are limited to searching local sequence databases. Tools for remotely searching public databases are necessary to keep pace with the rapid growth of online genomic data. Results: Here, we present cblaster, a Python-based tool to rapidly detect colocated genes in local and remote databases. cblaster is easy to use, offering both a command line and a user-friendly graphical user interface. It generates outputs that enable intuitive visualizations of large datasets and can be readily incorporated into larger bioinformatic pipelines. cblaster is a significant update to the comparative genomics toolbox. Availability and implementation: cblaster source code and documentation are freely available from GitHub under the MIT license (github.com/gamcil/cblaster). Supplementary information: online.