ChEMBL: towards direct deposition of bioassay data
ChEMBL is a large, open-access bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012, 2014 and 2017 Nucleic Acids Research Database Issues. In the last two years, several important improvements have been made to the database and are described here. These include more robust capture and representation of assay details; a new data deposition system, allowing updating of data sets and deposition of supplementary data; and a completely redesigned web interface, with enhanced search and filtering capabilities.

The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods
Barbara Zdrazil, Eloy Félix, Fiona Hunter et al. | Nucleic Acids Research | 2023
ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.

EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies
EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include both new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.

Calcium silicate hydrate (C-S-H) gel dissolution and pH buffering in a cementitious near field
A cementitious backfill has been proposed in many geological disposal concepts for intermediate-level waste and low-level waste in the UK and elsewhere. In this paper, the main features of the chemical evolution of backfill and the associated changes in the near-field pH are illustrated with results from recent work. For example, interaction of the groundwater with calcium silicate hydrate (C-S-H) phases in a backfill is expected to play an important role in the long-term pH-buffering behaviour. Existing experimental data for the dissolution of C-S-H gels are compared with recent experimental results from leach tests on gels of a lower calcium to silicon ratio (C/S) to provide a consistent set of data across the full C/S range. The results confirm that a congruent dissolution point around C/S = 0.8 is approached by leaching from below (i.e. for gels with 0.29 < C/S < 0.8), as well as from above, as reported elsewhere. In addition, a spreadsheet model has been developed to calculate the volume of backfill required at the vault scale to meet specified pH performance criteria. This model includes the major reactions of the backfill with the groundwater, waste encapsulants and waste components. It can also consider the effects of specific waste packages on local pH performance to allow comparison with the vault-scale calculations.

Network integration and modelling of dynamic drug responses at multi-omics levels
Uncovering cellular responses from heterogeneous genomic data is crucial for molecular medicine, in particular for drug safety. This can be realized by integrating the molecular activities in networks of interacting proteins. As proof-of-concept we challenge network modeling with time-resolved proteome, transcriptome and methylome measurements in iPSC-derived human 3D cardiac microtissues to elucidate adverse mechanisms of anthracycline cardiotoxicity measured with four different drugs (doxorubicin, epirubicin, idarubicin and daunorubicin). Dynamic molecular analysis at in vivo drug exposure levels reveals a network of 175 disease-associated proteins and identifies common modules of anthracycline cardiotoxicity in vitro, related to mitochondrial and sarcomere function as well as remodeling of extracellular matrix. These in vitro-identified modules are transferable and are evaluated with biopsies of cardiomyopathy patients. This study, to our knowledge the most comprehensive on anthracycline cardiotoxicity, demonstrates a reproducible workflow for molecular medicine and serves as a template for detecting adverse drug responses from complex omics data.