Data:
  • Complete release data is available at Zenodo.
  • Harmonized list of human transcription factors and their mouse orthologs based on the TFClass classification (extended with Codebook TFs for v13): tf_masterlist.tsv.
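
As a starting point for working with tf_masterlist.tsv, the sketch below parses a tab-separated file with a header row using only the Python standard library. The column names `human_tf` and `mouse_ortholog` and the sample rows are hypothetical placeholders, not the actual layout of the file; inspect the real header before relying on any column name.

```python
import csv
import io

def read_tsv(handle):
    """Parse a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Hypothetical stand-in for tf_masterlist.tsv; the real header may differ.
sample = io.StringIO(
    "human_tf\tmouse_ortholog\n"
    "TF_A\tTf_a\n"
    "TF_B\t\n"
)

rows = read_tsv(sample)

# Look at the header first, then filter on whichever column is relevant.
print(list(rows[0].keys()))
with_ortholog = [row for row in rows if row["mouse_ortholog"]]
print(len(with_ortholog), "of", len(rows), "sample rows have an ortholog")
```

For the downloaded file, the same function can be applied to `open("tf_masterlist.tsv", newline="")` in place of the in-memory sample.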

Many practical motif applications require a set of motifs with reduced redundancy, i.e., one where similar motifs belonging to related transcription factors are grouped together and a single matrix represents each group. To this end, we have created the non-redundant set of HOCOMOCO v13 motifs, a derivative of the HOCOMOCO v13 CORE collection.

We estimated the motif similarities with MacroAPE (see opera.autosome.org/macroape and doi:10.1186/1748-7188-8-23) at a motif P-value cutoff of 0.0005 and the default matrix discretization of 1 (upscaled to 10 for better precision in cases where the similarity estimate at the default discretization exceeded 0.01).
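
The similarity measure described in the cited MacroAPE paper is a Jaccard index over the sets of words each motif recognizes at a given P-value threshold. The toy sketch below illustrates only that idea, by brute-force enumeration for two same-length integer PWMs under a uniform background. It is not MacroAPE's algorithm: motif alignment, orientation, background handling, and discretization are all omitted, and the matrices and the P-value of 0.05 are made up so that the tiny example is non-trivial.

```python
from itertools import product

ALPHABET = "ACGT"

def score(pwm, word):
    """Sum of per-position weights for a word."""
    return sum(pwm[i][ALPHABET.index(ch)] for i, ch in enumerate(word))

def recognized_words(pwm, pvalue):
    """Words passing the strictest-needed threshold so that at most a
    `pvalue` fraction of all words (uniform background) is recognized."""
    words = ["".join(w) for w in product(ALPHABET, repeat=len(pwm))]
    scores = {w: score(pwm, w) for w in words}
    limit = pvalue * len(words)
    hits = set()
    # Add whole score tiers from the top until the next tier would exceed the limit.
    for s in sorted(set(scores.values()), reverse=True):
        tier = {w for w, v in scores.items() if v == s}
        if len(hits) + len(tier) > limit:
            break
        hits |= tier
    return hits

def jaccard_similarity(pwm_a, pwm_b, pvalue):
    """Jaccard index of the two recognized word sets."""
    a = recognized_words(pwm_a, pvalue)
    b = recognized_words(pwm_b, pvalue)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Made-up 3-bp integer matrices; columns are A, C, G, T.
pwm_a = [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 2, 1]]
pwm_b = [[3, 0, 0, 0], [0, 3, 0, 0], [1, 0, 0, 2]]

similarity_ab = jaccard_similarity(pwm_a, pwm_b, pvalue=0.05)
print(sorted(recognized_words(pwm_a, 0.05)), sorted(recognized_words(pwm_b, 0.05)))
print(similarity_ab)
```

Enumeration is only feasible for very short motifs; it is used here purely to make the definition concrete.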

From the pairwise motif similarity matrix, we performed hierarchical clustering with sklearn agglomerative clustering ('average' linkage). The number of clusters was chosen to maximize the silhouette score, yielding 523 clusters at a silhouette score of 0.16.
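
The clustering step can be sketched roughly as follows. This is a minimal illustration rather than the actual release pipeline: it assumes that similarities are converted to distances as 1 - similarity (the text does not state the conversion), and it runs on a made-up 6x6 similarity matrix with three obvious groups instead of the real motif data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def make_model(k):
    """Average-linkage clustering on a precomputed distance matrix."""
    try:
        return AgglomerativeClustering(n_clusters=k, metric="precomputed", linkage="average")
    except TypeError:
        # scikit-learn < 1.2 names this parameter `affinity`.
        return AgglomerativeClustering(n_clusters=k, affinity="precomputed", linkage="average")

def cluster_by_silhouette(similarity, candidate_ks):
    """Return (score, k, labels) for the k with the highest silhouette score."""
    distance = 1.0 - similarity       # assumption: distance = 1 - similarity
    np.fill_diagonal(distance, 0.0)   # precomputed distances need a zero diagonal
    best = None
    for k in candidate_ks:
        labels = make_model(k).fit_predict(distance)
        score = silhouette_score(distance, labels, metric="precomputed")
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best

# Toy similarity matrix: three pairs of near-identical motifs.
similarity = np.full((6, 6), 0.05)
for start in (0, 2, 4):
    similarity[start:start + 2, start:start + 2] = 0.9
np.fill_diagonal(similarity, 1.0)

# The silhouette score is defined for 2 <= k <= n_samples - 1.
best_score, best_k, best_labels = cluster_by_silhouette(similarity, range(2, 6))
print(best_k, best_labels)
```

On the full collection the candidate range would span far more values of k, and scanning every value refits the model each time; that cost is ignored in this sketch.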

For each cluster, a single representative motif was chosen as the one with the best average similarity to the other motifs in the cluster. Only ABC-quality motifs were considered as cluster representatives. The annotation contains the list of motifs that constitute each cluster and the list of the respective TFs (UniProt IDs).
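
The representative selection rule can be illustrated with a short sketch. The function name, the data layout, and the toy values are hypothetical. It also assumes that the average is taken over all other cluster members regardless of their quality and that ties are broken by member order; the text specifies neither detail.

```python
def pick_representative(similarity, members, quality, allowed=("A", "B", "C")):
    """Return the allowed-quality member with the highest mean similarity
    to the other members of the cluster, or None if no member qualifies."""
    candidates = [m for m in members if quality[m] in allowed]
    if not candidates:
        return None

    def mean_similarity(m):
        others = [o for o in members if o != m]
        if not others:  # singleton cluster
            return 0.0
        return sum(similarity[m][o] for o in others) / len(others)

    # max() keeps the first candidate on ties (assumed tie-breaking rule).
    return max(candidates, key=mean_similarity)

# Toy cluster of four motifs indexed 0..3 with made-up similarities.
similarity = [
    [1.0, 0.8, 0.3, 0.9],
    [0.8, 1.0, 0.4, 0.9],
    [0.3, 0.4, 1.0, 0.9],
    [0.9, 0.9, 0.9, 1.0],
]
quality = {0: "A", 1: "B", 2: "C", 3: "D"}

representative = pick_representative(similarity, [0, 1, 2, 3], quality)
print(representative)
```

In this toy cluster, motif 3 has the highest average similarity but is D-quality, so the choice falls on the best-scoring motif among the A, B, and C candidates.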


Tools:
  • MoLoTool - web interface for motif finding.
  • SPRY-SARUS tool for motif finding (Java): jar, readme
  • MACRO-APE tool for motif comparison, P-value and threshold estimation: jar, manual, website
  • PERFECTOS-APE tool for functional annotation of sequence variants overlapping TFBS: jar, manual, website
Citation:
Ilya E Vorontsov, Irina A Eliseeva, Arsenii Zinkevich, Mikhail Nikonov, Sergey Abramov, Alexandr Boytsov, Vasily Kamenets, Alexandra Kasianova, Semyon Kolmykov, Ivan S Yevshin, Alexander Favorov, Yulia A Medvedeva, Arttu Jolma, Fedor Kolpakov, Vsevolod J Makeev, Ivan V Kulakovskiy
Nucleic Acids Research, gkad1077 (16 November 2023)
doi: 10.1093/nar/gkad1077
License: HOCOMOCO motif collection is distributed under WTFPL. If you prefer more standard licenses, feel free to treat WTFPL as CC-BY.

HOCOMOCO v13 subcollections

Subcollection | H13CORE | H13INVIVO | H13INVITRO | H13RSNP
Number of motifs | 1611 (MOUSE subset: 1253) | 1611 (MOUSE subset: 1253) | 1595 (MOUSE subset: 1237) | 1611 (MOUSE subset: 1253)
Complete model annotation (including gene id mapping), all motifs | H13CORE_annotation.jsonl | H13INVIVO_annotation.jsonl | H13INVITRO_annotation.jsonl | H13RSNP_annotation.jsonl
Complete model annotation (including gene id mapping), MOUSE subset | H13CORE-MOUSE_annotation.jsonl | H13INVIVO-MOUSE_annotation.jsonl | H13INVITRO-MOUSE_annotation.jsonl | H13RSNP-MOUSE_annotation.jsonl
PWM, one file per matrix | H13CORE_pwm.tar.gz | H13INVIVO_pwm.tar.gz | H13INVITRO_pwm.tar.gz | H13RSNP_pwm.tar.gz
PWM, flat file | H13CORE_pwms.txt | H13INVIVO_pwms.txt | H13INVITRO_pwms.txt | H13RSNP_pwms.txt
PCM, one file per matrix | H13CORE_pcm.tar.gz | H13INVIVO_pcm.tar.gz | H13INVITRO_pcm.tar.gz | H13RSNP_pcm.tar.gz
PCM, flat file | H13CORE_pcms.txt | H13INVIVO_pcms.txt | H13INVITRO_pcms.txt | H13RSNP_pcms.txt
PFM, one file per matrix | H13CORE_pfm.tar.gz | H13INVIVO_pfm.tar.gz | H13INVITRO_pfm.tar.gz | H13RSNP_pfm.tar.gz
PFM, flat file | H13CORE_pfms.txt | H13INVIVO_pfms.txt | H13INVITRO_pfms.txt | H13RSNP_pfms.txt
Threshold to P-value map | H13CORE_thresholds.tar.gz | H13INVIVO_thresholds.tar.gz | H13INVITRO_thresholds.tar.gz | H13RSNP_thresholds.tar.gz
Matrices in other formats, JASPAR | H13CORE_jaspar_format.txt | H13INVIVO_jaspar_format.txt | H13INVITRO_jaspar_format.txt | H13RSNP_jaspar_format.txt
Matrices in other formats, MEME | H13CORE_meme_format.meme | H13INVIVO_meme_format.meme | H13INVITRO_meme_format.meme | H13RSNP_meme_format.meme
Matrices in other formats, TRANSFAC | H13CORE_transfac_format.txt | H13INVIVO_transfac_format.txt | H13INVITRO_transfac_format.txt | H13RSNP_transfac_format.txt
HOMER