MAFFT online service: multiple sequence alignment, interactive sequence choice and visualizationThis article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools for experimental biologists to semiautomatically handle large data are becoming important. We are working on development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.
MAFFT-DASH: integrated protein sequence and structural alignmentHere, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (https://mafft.cbrc.jp/alignment/server/) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10-20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.
Repertoire Builder: high-throughput structural modeling of B and T cell receptorsDimitri Schritt, Songling Li, John Rozewicki et al.|Molecular Systems Design & Engineering|2019 Repertoire Builder (https://sysimm.org/rep_builder/) is a method for generating atomic-resolution, three-dimensional models of B cell receptors (BCRs) or T cell receptors (TCRs) from their amino acid sequences.
Flexible, Functional, and Familiar: Characteristics of SARS-CoV-2 Spike Protein EvolutionThe SARS-CoV-2 S protein is a major point of interaction between the virus and the human immune system. As a consequence, the S protein is not a static target but undergoes rapid molecular evolution. In order to more fully understand the selection pressure during evolution, we examined residue positions in the S protein that vary greatly across closely related viruses but are conserved in the subset of viruses that infect humans. These "evolutionarily important" residues were not distributed evenly across the S protein but were concentrated in two domains: the N-terminal domain and the receptor-binding domain, both of which play a role in host cell binding in a number of related viruses. In addition to being localized in these two domains, evolutionary importance correlated with structural flexibility and inversely correlated with distance from known or predicted host receptor-binding residues. Finally, we observed a bias in the composition of the amino acids that make up such residues toward more human-like, rather than virus-like, sequence motifs.
Methods for sequence and structural analysis of B and T cell receptor repertoiresShunsuke Teraguchi, Dianita S. Saputri, Mara Anaís Llamas-Covarrubias et al.|Computational and Structural Biotechnology Journal|2020 B cell receptors (BCRs) and T cell receptors (TCRs) make up an essential network of defense molecules that, collectively, can distinguish self from non-self and facilitate destruction of antigen-bearing cells such as pathogens or tumors. The analysis of BCR and TCR repertoires plays an important role in both basic immunology as well as in biotechnology. Because the repertoires are highly diverse, specialized software methods are needed to extract meaningful information from BCR and TCR sequence data. Here, we review recent developments in bioinformatics tools for analysis of BCR and TCR repertoires, with an emphasis on those that incorporate structural features. After describing the recent sequencing technologies for immune receptor repertoires, we survey structural modeling methods for BCR and TCRs, along with methods for clustering such models. We review downstream analyses, including BCR and TCR epitope prediction, antibody-antigen docking and TCR-peptide-MHC Modeling. We also briefly discuss molecular dynamics in this context.