HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors

Ilya E. Vorontsov (Vavilov Institute of General Genetics), Irina A. Eliseeva (Institute of Protein Research), Arsenii Zinkevich (Lomonosov Moscow State University), Mikhail Nikonov (Lomonosov Moscow State University), Sergey Abramov (Altius Institute for Biomedical Sciences), Alexandr Boytsov (Altius Institute for Biomedical Sciences), Vasily Kamenets (Moscow Institute of Physics and Technology), Alexandra M Kasianova (Skolkovo Institute of Science and Technology), Semyon Kolmykov (Sirius University of Science and Technology), Ivan Yevshin, Alexander V. Favorov (Johns Hopkins University), Yulia A. Medvedeva (Russian Academy of Sciences), Arttu Jolma (University of Toronto), Fedor Kolpakov (Sirius University of Science and Technology), Vsevolod J. Makeev (Moscow Institute of Physics and Technology), Ivan V. Kulakovskiy (Kazan Federal University)
Nucleic Acids Research
November 16, 2023
Open Access

Abstract

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To build this release, we performed motif discovery in peak sets from 14 183 ChIP-Seq experiments and in reads from 2554 HT-SELEX experiments, yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and to remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was passed to automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
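
For readers unfamiliar with position weight matrices, the following is a minimal sketch of how such a matrix can score candidate binding sites in a DNA sequence. It is not the HOCOMOCO pipeline or its benchmarking code. The count matrix, the uniform background, the pseudocount, and all function names are illustrative assumptions, and only the forward strand is scanned.

```python
import math

# Toy position count matrix (hypothetical values, NOT a HOCOMOCO motif).
# Each row is one motif position; columns are counts for A, C, G, T.
COUNTS = [
    [8, 1, 1, 0],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]
ALPHABET = "ACGT"


def counts_to_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert counts to log2-odds weights against a uniform background (assumed)."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position weights of the bases in one window."""
    return sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))


def best_hit(pwm, sequence):
    """Return (score, start) of the best-scoring forward-strand window."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


PWM = counts_to_pwm(COUNTS)
score, position = best_hit(PWM, "TTACGTAA")
print(f"best window starts at {position} with log-odds score {score:.2f}")
```

A practical scan would also score the reverse complement and compare scores against a motif-specific threshold; both are omitted here to keep the sketch short.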

