An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching aim was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variation among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advances in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Designing AI-programmable therapeutics with the EDEN family of foundation models
Geraldene Munsamy, Gavin Ayres, Carla Greco et al.|bioRxiv (Cold Spring Harbor Laboratory)|2026
The ability to interpret, modify, and design DNA has driven many of the most significant advances in modern medicine, from diagnostics, biologics, and vaccines to cell and gene therapies. However, the inherent complexity of biological systems means that most modern medicines are still engineered using bespoke, labor-intensive processes. To address the need for a generalisable and programmable approach to therapeutic design, we introduce the EDEN (environmentally-derived evolutionary network) family of metagenomic foundation models, including a 28 billion parameter model trained on 9.7 trillion nucleotide tokens from BaseData 1. This dataset, at the time of training, contained more than 10 billion novel genes from over 1 million new species, and is intentionally enriched for environmental and host-associated metagenomes, phage sequences, and mobile genetic elements, enabling the model to learn from diverse and novel cross-species evolutionary mechanisms and apply them to key challenges in human health. EDEN achieves state-of-the-art performance across a series of predictive and generative genomic and protein benchmarks. To demonstrate the models’ broad applicability across biology, we evaluate EDEN’s capacity for programmable therapeutic design by challenging a single architecture to design biological novelty across three distinct therapeutic modalities, disease areas and biological scales: (i) large gene insertion, (ii) antibiotic peptide design, and (iii) microbiome design. First, we demonstrate AI-programmable Gene Insertion (aiPGI), in which EDEN designs de novo large serine recombinases (LSRs) capable of inserting large pieces of DNA at desired target sites in the human genome when prompted only on 30 nucleotides of DNA sequence from the desired target site.
In low-N experimental validation, EDEN generated multiple active recombinases for all tested disease-associated genomic loci (ATM, DMD, F9, FANCC, GALC, IDS, P4HA1, PHEX, RYR2, USH2A) and 4 potential safe harbor sites in the human genome. EDEN achieves an overall functional hit rate of 63.2% across diverse DNA prompts when prompted on only 30 bp of DNA from outside the training data. Of the EDEN-generated LSRs, 50% were active in human cells, achieving therapeutically relevant levels of CAR insertion in primary human T cells. We also show that EDEN can generate active bridge recombinases when prompted on the associated guide RNA alone, with sequence identities to training and public data as low as 65%. These results pave the way for a new generation of cell and gene therapies by opening the door to rapid, programmable and site-specific integration of large genetic payloads without double-strand breaks. This offers an alternative to the safety, efficiency and payload limitations inherent in viral or nuclease-based editing at thousands of currently intractable human therapeutic targets. Second, we use the same model to generate a focused low-N library of novel antimicrobial peptides where 97% showed activity, with top candidates achieving single-digit micromolar potency against critical-priority multidrug-resistant pathogens. Third, to demonstrate that EDEN captures inter-genomic features, we design a gigabase-scale microbiome with over 94,000 synthetic metagenomic assemblies, including prophage genomes and correct cross-species metabolic pathway completions. The EDEN-generated synthetic microbiome covers 9,067 species with a biome-specific taxonomic accuracy of 99%. Over 1,500 of the generated species were outside the fine-tuning dataset while retaining the correct microecological properties and biome association, thus significantly expanding genetic and taxonomic diversity.
Together, these results establish a new strategic direction for AI-programmable therapeutics, in which a single foundation model architecture designs candidate therapeutics across diverse modalities and disease areas. This suggests that the combination of billions of years of evolutionary data with specific therapeutic records offers a clear, scaling-driven path to making therapeutic design a predictable engineering discipline.
Temporal AI model predicts drivers of cell state trajectories across human aging
Javier Gómez Ortega, Rangarajan D. Nadadur, Akira Kunitomi et al.|bioRxiv (Cold Spring Harbor Laboratory)|2026
Foundational AI models have recently shown promise for predicting the impact of perturbations on cell states. However, current models typically consider only one cell state at a time, limiting their ability to learn how cellular responses unfold over time, particularly across long trajectories such as diseases of aging. Here, we develop a temporal AI model, MaxToki, trained on nearly 1 trillion gene tokens including cell state trajectories across the human lifespan to generate cell states across long timelapses of human aging. MaxToki generalized to unseen trajectories through in-context learning and predicted novel age-modulating targets that were experimentally verified to influence age-related gene programs and functional decline in vivo. MaxToki represents a promising strategy for temporal modeling to accelerate the discovery of interventions for programming therapeutic cellular trajectories.