Accurate proteome-wide missense variant effect prediction with AlphaMissense
The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
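The two uses of the per-variant score described in this abstract (a three-way classification and a gene-level average) can be illustrated with a minimal Python sketch. The cutoff values, gene names, and function names are hypothetical placeholders chosen for illustration; they are not taken from the abstract and are not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean


def classify(score: float, benign_max: float = 0.3, pathogenic_min: float = 0.7) -> str:
    """Bin a pathogenicity score in [0, 1] into three classes.

    The default cutoffs are placeholders, not values from the source.
    """
    if score < benign_max:
        return "likely benign"
    if score > pathogenic_min:
        return "likely pathogenic"
    return "ambiguous"


def gene_mean_scores(variants):
    """Average the per-variant scores for each gene.

    `variants` is an iterable of (gene, score) pairs.
    """
    by_gene = defaultdict(list)
    for gene, score in variants:
        by_gene[gene].append(score)
    return {gene: mean(scores) for gene, scores in by_gene.items()}


# Toy data with hypothetical gene names and scores.
variants = [("GENE_A", 0.9), ("GENE_A", 0.8), ("GENE_B", 0.1)]
gene_means = gene_mean_scores(variants)
labels = [classify(score) for _, score in variants]
```

In this toy setting a gene with a high mean score would be a candidate for essentiality; how the published analysis ranks genes is not specified in the abstract.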
Specific identification and quantification of circular RNAs from sequencing data
MOTIVATION: Circular RNAs (circRNAs) are a poorly characterized class of molecules that were identified decades ago. Emerging high-throughput sequencing methods as well as first reports on confirmed functions have sparked new interest in this RNA species. However, computational tools for their detection and quantification are still limited. RESULTS: We developed the software tandem DCC and CircTest. DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates. We assessed the detection performance of DCC on a newly generated mouse brain data set and publicly available sequencing data. Our software achieves a much higher precision than state-of-the-art competitors at similar sensitivity levels. Moreover, DCC estimates circRNA versus host gene expression by counting junction and non-junction reads. These read counts are finally used by our R package CircTest to test for host gene-independence of circRNA expression across different experimental conditions. We demonstrate the benefits of this approach on previously reported age-dependent circRNAs in the fruit fly. AVAILABILITY AND IMPLEMENTATION: The source code of DCC and CircTest is licensed under the GNU General Public Licence (GPL) version 3 and available from https://github.com/dieterich-lab/[DCC or CircTest]. CONTACT: christoph.dieterich@age.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
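The idea of comparing circRNA and host gene expression from junction and non-junction read counts can be sketched in Python as a simple per-sample fraction. This is an illustrative simplification: the function name, sample labels, and counts are hypothetical, and the statistical test performed by CircTest is not reproduced here.

```python
import math


def circ_fraction(junction_reads: int, linear_reads: int) -> float:
    """Fraction of reads at a locus that support the back-splice junction.

    junction_reads: reads spanning the back-splice junction (circRNA evidence)
    linear_reads:   non-junction reads assigned to the host gene at that site
    Returns NaN when no reads cover the locus.
    """
    total = junction_reads + linear_reads
    if total == 0:
        return math.nan
    return junction_reads / total


# Hypothetical counts (junction, linear) for one circRNA in two conditions.
samples = {"young": (5, 95), "old": (30, 90)}
fractions = {name: circ_fraction(j, l) for name, (j, l) in samples.items()}
```

A shift in this fraction between conditions suggests that circRNA abundance changes independently of the host gene; deciding whether such a shift is significant requires a proper test across replicates.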
MMSplice: modular modeling improves the predictions of genetic variant effects on splicing
Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing), with which we built the winning model of the CAGI5 exon skipping prediction challenge. The MMSplice modules are neural networks that score exons, introns, and splice sites, each trained on a distinct large-scale genomics dataset. These modules are combined to predict the effects of variants on exon skipping, splice site choice, splicing efficiency, and pathogenicity, with performance matching or exceeding the state of the art. Our models, available in the Kipoi repository, apply to variants, including indels, directly from VCF files.
The Kipoi repository accelerates community exchange and reuse of predictive models for genomics
Advances in the application of molecular diagnostic techniques for the detection of infectious disease pathogens (Review)
Qingqing Liu, Xiaojuan Jin, Jun Cheng et al.|Molecular Medicine Reports|2023
Infectious diseases are a major global cause of morbidity and mortality, seriously affecting public health and socioeconomic stability. Since infectious diseases can be caused by a wide variety of pathogens with similar clinical manifestations and symptoms that are difficult to accurately distinguish, selecting the appropriate diagnostic techniques for the rapid identification of pathogens is crucial for clinical disease diagnosis and public health management. However, traditional diagnostic techniques have low detection rates, long detection times and limited automation, which means that they do not meet the requirements for rapid diagnosis. Recent years have seen continuous developments in molecular detection technology, which has a higher sensitivity and specificity, shorter detection time and increased automation, and plays an important role in the early and rapid detection of infectious disease pathogens. The present study summarizes recent progress in molecular diagnostic technologies such as PCR, isothermal amplification, gene chips and high‑throughput sequencing for the detection of infectious disease pathogens, and compares the technical principles, advantages and disadvantages, applicability and costs of these diagnostic techniques.