Herman De Beukelaer

Core Hunter 3: flexible core subset selection

Herman De Beukelaer, Guy Davenport, Veerle Fack|BMC Bioinformatics|2018

Cited by 189 | Open Access

BACKGROUND: Core collections provide genebank curators and plant breeders a way to reduce the size of their collections and populations, while minimizing the impact on genetic diversity and allele frequencies. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions, based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness.
RESULTS: In version 3 of Core Hunter (CH3) we have incorporated two new, improved methods for summarizing distances to quantify the diversity or representativeness of the core collection. A comparison of CH3 and Core Hunter 2 (CH2) showed that these new metrics can be effectively optimized with less complex algorithms than those used in CH2. CH3 is more effective than CH2 at maximizing the improved diversity metric, still ensures a high average and minimum distance, and is faster for large datasets. Using CH3, a simple stochastic hill-climber is able to find highly diverse core collections, and the more advanced parallel tempering algorithm further increases the quality of the core and further reduces variability across independent samples. We also evaluate the ability of CH3 to simultaneously maximize diversity and either representativeness or allelic richness, and compare the results with those of the GDOpt and SimEli methods. CH3 can sample cores that are as representative as those of GDOpt, which was specifically designed for this purpose, and is able to construct cores that are simultaneously more diverse and either more representative or higher in allelic richness than those obtained by SimEli.
CONCLUSIONS: In version 3, Core Hunter has been updated to include two new core subset selection metrics that construct cores for representativeness or diversity, with improved performance. Because it (simultaneously) optimizes a variety of metrics, it combines the strengths of other methods and outperforms them. In addition, CH3 is an improvement over CH2, with the option to use genetic marker data, phenotypic traits, or both, and with improved speed. Core Hunter 3 is freely available on http://www.corehunter.org.
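
The abstract says that a simple stochastic hill-climber already finds highly diverse cores. The sketch below illustrates that idea and is not Core Hunter's implementation. It assumes the diversity measure is the average distance from each selected entry to its nearest other selected entry. The function names and the toy 1-D distance matrix are hypothetical. Each step swaps one selected item for one unselected item and keeps the swap if the score does not drop.

```python
import random
from typing import Sequence, Set, Tuple

def entry_to_nearest_entry(dist: Sequence[Sequence[float]], core: Set[int]) -> float:
    """Assumed diversity score: mean distance from each selected entry to its closest other selected entry."""
    if len(core) < 2:
        return 0.0
    return sum(min(dist[i][j] for j in core if j != i) for i in core) / len(core)

def stochastic_hill_climb(dist, size, steps=2000, seed=0) -> Tuple[Set[int], float]:
    """Hypothetical hill-climber using a random swap neighbourhood."""
    n = len(dist)
    if not 1 <= size < n:
        raise ValueError("core size must be between 1 and n - 1")
    rng = random.Random(seed)
    core = set(rng.sample(range(n), size))  # random initial core
    score = entry_to_nearest_entry(dist, core)
    for _ in range(steps):
        out = rng.choice(sorted(core))                  # item to drop
        new = rng.choice(sorted(set(range(n)) - core))  # item to add
        candidate = (core - {out}) | {new}
        candidate_score = entry_to_nearest_entry(dist, candidate)
        if candidate_score >= score:  # accept ties so the search can cross plateaus
            core, score = candidate, candidate_score
    return core, score

# Toy data (hypothetical): accessions placed on a line, distance = absolute difference.
positions = [0, 1, 2, 3, 10, 11, 12, 20]
dist = [[abs(a - b) for b in positions] for a in positions]
core, score = stochastic_hill_climb(dist, size=3)
print(sorted(core), score)  # the spread-out core at positions 0, 10 and 20
```

Parallel tempering, also mentioned above, would run several such searches at different "temperatures" that occasionally accept worse swaps and exchange solutions between replicas. This sketch omits that.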

Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search

Herman De Beukelaer, Petr Smýkal, Guy Davenport et al.|BMC Bioinformatics|2012

Cited by 75 | Open Access

BACKGROUND: Sampling core subsets from genetic resources while maintaining as much as possible of the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. First, we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Second, we try to gain more insight into the performance of the original Core Hunter search algorithm through comparison with several other heuristics on several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II, with the aim of improving the diversity of the constructed core sets and reducing their generation times.
RESULTS: Our results show that the introduction of minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always obtained when optimizing mean distance alone. Comparison of the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results with lower runtimes than REMC. However, the simpler algorithms perform slightly worse than REMC at lower sampling intensities, and some heuristics clearly struggle with minimum distance measures. In comparison, the new Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC.
CONCLUSION: The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes leading to slightly better results than some of the simpler methods, although it does not always give the best results. By switching to the new Mixed Replica algorithm, overall results and runtimes can be significantly improved. Finally, we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.
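
The recommendation to include minimum distance measures can be shown with a small example. Two cores can share the same mean pairwise distance while one of them still contains a near-duplicate pair, and only the minimum distance exposes this. The helper names and the 1-D toy distances below are hypothetical illustrations, not Core Hunter code.

```python
from itertools import combinations

def line_distances(positions):
    """Toy distance matrix (hypothetical): absolute differences between 1-D positions."""
    return [[abs(a - b) for b in positions] for a in positions]

def pairwise(dist, core):
    """All distances between distinct pairs of selected accessions."""
    return [dist[i][j] for i, j in combinations(sorted(core), 2)]

def mean_distance(dist, core):
    d = pairwise(dist, core)
    return sum(d) / len(d)

def min_distance(dist, core):
    return min(pairwise(dist, core))

dist = line_distances([0, 1, 9, 10, 5])
core_a = {0, 1, 3}  # positions 0, 1, 10: contains a near-duplicate pair
core_b = {0, 3, 4}  # positions 0, 5, 10: evenly spread
print(mean_distance(dist, core_a), min_distance(dist, core_a))
print(mean_distance(dist, core_b), min_distance(dist, core_b))
```

Both cores have a mean distance of 20/3. Their minimum distances are 1 and 5, so an objective based on the mean alone cannot tell them apart.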

Moving Beyond Managing Realized Genomic Relationship in Long-Term Genomic Selection

Herman De Beukelaer, Yvonne M Badke, Veerle Fack et al.|Genetics|2017

Cited by 73 | Open Access

Long-term genomic selection (GS) requires strategies that balance genetic gain with population diversity, to sustain progress for traits under selection, and to keep diversity for future breeding. In a simulation model for a recurrent selection scheme, we provide the first head-to-head comparison of two such existing strategies: genomic optimal contributions selection (GOCS), which limits realized genomic relationship among selection candidates, and weighted genomic selection (WGS), which upscales rare allele effects in GS. Compared to GS, both methods provide the same higher long-term genetic gain and a similar lower inbreeding rate, despite some inherent limitations. GOCS does not control the inbreeding rate component linked to trait selection, and, therefore, does not strike the optimal balance between genetic gain and inbreeding. This makes it less effective throughout the breeding scheme, and particularly so at the beginning, where genetic gain and diversity may not be competing. For WGS, truncation selection proved suboptimal to manage rare allele frequencies among the selection candidates. To overcome these limitations, we introduce two new set selection methods that maximize a weighted index balancing genetic gain with controlling expected heterozygosity (IND-HE) or maintaining rare alleles (IND-RA), and show that these outperform GOCS and WGS in a nearly identical way. While requiring further testing, we believe that the inherent benefits of the IND-HE and IND-RA methods will transfer from our simulation framework to many practical breeding settings, and are therefore a major step forward toward efficient long-term genomic selection.
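
The abstract does not give the IND-HE formula. The sketch below is only a hypothetical illustration of a set-selection index that trades genetic gain against expected heterozygosity. It assumes a linear weight `w` between the mean estimated breeding value (EBV) of the selected set and its expected heterozygosity. Expected heterozygosity is computed from biallelic genotypes coded 0/1/2 as the per-marker 2p(1 - p), averaged over markers. All names are our own, and the two terms are not put on a common scale as a real index would require.

```python
from itertools import combinations

def expected_heterozygosity(genotypes, selected):
    """Mean over markers of 2p(1-p); genotypes[i][m] counts one allele (0, 1 or 2)."""
    n_markers = len(genotypes[0])
    total = 0.0
    for m in range(n_markers):
        p = sum(genotypes[i][m] for i in selected) / (2 * len(selected))
        total += 2 * p * (1 - p)
    return total / n_markers

def weighted_index(ebv, genotypes, selected, w):
    """Hypothetical linear index: w * mean EBV + (1 - w) * expected heterozygosity."""
    gain = sum(ebv[i] for i in selected) / len(selected)
    return w * gain + (1 - w) * expected_heterozygosity(genotypes, selected)

def best_set(ebv, genotypes, k, w):
    """Exhaustive set selection; only feasible for tiny toy examples."""
    return max(combinations(range(len(ebv)), k),
               key=lambda s: weighted_index(ebv, genotypes, s, w))

# Toy candidates: 0 and 1 are genetically identical with high EBV, 2 differs with lower EBV.
ebv = [1.0, 1.0, 0.8]
genotypes = [[2, 2], [2, 2], [0, 0]]
for w in (1.0, 0.5, 0.0):
    print(w, best_set(ebv, genotypes, k=2, w=w))
```

With `w = 1.0` the index selects the two identical top candidates. Lower weights trade a little gain for a more heterozygous set. This contrasts with truncation selection, which ranks candidates one at a time.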

JASPAR 2026: expansion of transcription factor binding profiles and integration of deep learning models

Damla Ovek, Ieva Rauluševičiūtė, Dina Ruud Aronsen et al.|Nucleic Acids Research|2025

Cited by 37 | Open Access

JASPAR (https://jaspar.elixir.no/) is an open-access database that has provided high-quality, manually curated, and non-redundant DNA binding profiles for transcription factors (TFs) as position frequency matrices (PFMs) for over 20 years. We expanded the CORE (306 new profiles, 12% increase) and UNVALIDATED (433 new profiles, 60% increase) collections with new PFMs and updated 13 existing profiles. We updated the TF binding site predictions and genome tracks for eight species. TF binding profile clusters and familial TF binding sites were updated accordingly. We integrate the inMOTIFin software to easily simulate regulatory sequences using JASPAR PFMs. To enrich TF annotations, we provide scientific literature-based human TF target information. Notably, this release features a deep learning (DL) collection, providing a paradigm shift in modeling and characterizing TF-DNA interactions with 1259 BPNet models trained on Homo sapiens ENCODE chromatin immunoprecipitation followed by sequencing (ChIP-seq) datasets from 240 TFs and interpreted to reveal predictive motif patterns for the models. The motifs associated with the same TF were clustered to provide a summary of the binding properties, resulting in 240 primary and 113 alternative motif patterns in the DL collection. The JASPAR 2026 collections lay a foundation for future endeavors in genomic research, serving the scientific community in uncovering the mechanisms of gene regulation.
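
The following sketch shows a common way to use a PFM to score candidate binding sites; it is not code from JASPAR. It converts the counts into a log-odds position weight matrix and slides it along a sequence. The pseudocount of 1.0, the uniform background and the toy two-position matrix are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=1.0, background=None):
    """Convert a count matrix {base: [counts per position]} into log2-odds scores.
    The pseudocount (an assumed value) is split across bases by background frequency."""
    background = background or {b: 0.25 for b in BASES}
    length = len(pfm["A"])
    pwm = {b: [] for b in BASES}
    for pos in range(length):
        column_total = sum(pfm[b][pos] for b in BASES) + pseudocount
        for b in BASES:
            prob = (pfm[b][pos] + pseudocount * background[b]) / column_total
            pwm[b].append(math.log2(prob / background[b]))
    return pwm

def scan(pwm, seq):
    """Log-odds score of every window of motif width along the sequence."""
    width = len(pwm["A"])
    seq = seq.upper()
    return [sum(pwm[base][i] for i, base in enumerate(seq[start:start + width]))
            for start in range(len(seq) - width + 1)]

# Toy PFM (hypothetical) for a two-position motif "AC".
pfm = {"A": [10, 0], "C": [0, 10], "G": [0, 0], "T": [0, 0]}
pwm = pfm_to_pwm(pfm)
scores = scan(pwm, "GACT")
print(scores.index(max(scores)))  # best-scoring window starts at index 1 ("AC")
```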

JAMES: An object‐oriented Java framework for discrete optimization using local search metaheuristics

Herman De Beukelaer, Guy Davenport, Geert De Meyer et al.|Software Practice and Experience|2016

Cited by 19

Summary: This paper describes the Java Metaheuristics Search framework (JAMES, v1.1): an object‐oriented Java framework for discrete optimization using local search algorithms that exploits the generality of such metaheuristics by clearly separating search implementation and application from problem specification. A wide range of generic local searches are provided, including (stochastic) hill climbing, tabu search, variable neighbourhood search and parallel tempering. These can be applied to any user‐defined problem by plugging in a custom neighbourhood for the corresponding solution type. Using an automated analysis workflow, the performance of different search algorithms can be compared in order to select an appropriate optimization strategy. Implementations of specific components are included for subset selection, such as a predefined solution type, generic problem definition and several subset neighbourhoods used to modify the set of selected items. Additional components for other types of problems (e.g. permutation problems) are provided through an extensions module which also includes the analysis workflow. In comparison with existing Java metaheuristics frameworks that mainly focus on population‐based algorithms, JAMES has a much lower memory footprint and promotes efficient application of local searches by taking full advantage of move‐based evaluation. Releases of JAMES are deployed to the Maven Central Repository so that the framework can easily be included as a dependency in other Java applications. The project is fully open source and hosted on GitHub. More information can be found at http://www.jamesframework.org. Copyright © 2016 John Wiley & Sons, Ltd.
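
Two design ideas from this summary are illustrated below: a generic search that is separated from the problem specification, and move-based (delta) evaluation. The sketch is Python, not the JAMES Java API. Every class and function name is hypothetical. The problem exposes a full `evaluate` and a cheap `delta` for a swap move. The hill-climber only knows those two methods and a swap neighbourhood.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Protocol, Tuple

Move = Tuple[int, int]  # hypothetical swap move: (item to remove, item to add)

class SubsetProblem(Protocol):
    """Hypothetical problem interface seen by the generic search."""
    def evaluate(self, selection: FrozenSet[int]) -> float: ...
    def delta(self, selection: FrozenSet[int], move: Move) -> float: ...

@dataclass
class MaxWeightProblem:
    """Toy problem: select items with maximum total weight."""
    weights: List[float]

    def evaluate(self, selection):
        return sum(self.weights[i] for i in selection)

    def delta(self, selection, move):
        # Move-based evaluation: only the two swapped items change the objective.
        out, inn = move
        return self.weights[inn] - self.weights[out]

def random_swap(selection, n, rng):
    """Swap neighbourhood: drop one selected item, add one unselected item."""
    out = rng.choice(sorted(selection))
    inn = rng.choice([i for i in range(n) if i not in selection])
    return out, inn

def hill_climber(problem: SubsetProblem, n, size, steps, seed=0):
    """Generic stochastic hill-climber; knows nothing about the problem's internals."""
    rng = random.Random(seed)
    selection = frozenset(rng.sample(range(n), size))
    value = problem.evaluate(selection)  # one full evaluation at the start
    for _ in range(steps):
        move = random_swap(selection, n, rng)
        d = problem.delta(selection, move)  # cheap delta instead of re-evaluating
        if d > 0:
            out, inn = move
            selection = (selection - {out}) | {inn}
            value += d
    return selection, value

problem = MaxWeightProblem(weights=[5, 1, 4, 2, 3])
selection, value = hill_climber(problem, n=5, size=2, steps=500)
print(sorted(selection), value)
```

Swapping in another problem only requires a new class with `evaluate` and `delta`. The search code stays untouched, which mirrors the separation the summary describes.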
