The Latentverse: An Open-Source Benchmarking Toolkit for Evaluating Latent Representations

Yoanna Turura (Massachusetts Institute of Technology), Sam Friedman (Broad Institute), Aurora Cremer (Broad Institute), Mahnaz Maddah (Broad Institute), Sana Tonekaboni (Broad Institute)
bioRxiv (Cold Spring Harbor Laboratory)
April 29, 2025

Abstract

Self-supervised representation learning is a powerful approach for extracting meaningful features without relying on large amounts of labeled data, making it particularly valuable in fields like healthcare. It allows pretrained models to be shared and fine-tuned with minimal data for various downstream applications. However, evaluating the quality and behavior of the resulting representations remains challenging. To address this, we introduce Latentverse, an open-source library and web-based platform for evaluating latent representations. Latentverse generates detailed reports with visualizations and metrics that characterize different properties of representations, such as clustering, disentanglement, generalization, expressiveness, and robustness. It also supports comparison of different representations, enabling developers to refine model architectures and helping users assess how well an embedding model meets the requirements of their specific applications.

Data and Code Availability

The Latentverse code is available at: https://github.com/broadinstitute/ml4h-latentverse

Institutional Review Board (IRB)

This work does not require IRB approval.

