Basem Al-Shayeb

Innovative Genomics Institute

ORCID: 0000-0002-3120-3201

Publishes on CRISPR and genetic engineering, SARS-CoV-2 detection and testing, and genomics and phylogenetic studies. 50 papers and 3.8k citations.

50 Publications
3.8k Total Citations

Top publications by citations

CRISPR-CasΦ from huge phages is a hypercompact genome editor
Cited by 551 · Open Access

CRISPR-Cas systems are found widely in prokaryotes, where they provide adaptive immunity against virus infection and plasmid transformation. We describe a minimal functional CRISPR-Cas system, comprising a single ~70-kilodalton protein, CasΦ, and a CRISPR array, encoded exclusively in the genomes of huge bacteriophages. CasΦ uses a single active site for both CRISPR RNA (crRNA) processing and crRNA-guided DNA cutting to target foreign nucleic acids. This hypercompact system is active in vitro and in human and plant cells with expanded target recognition capabilities relative to other CRISPR-Cas proteins. Useful for genome editing and DNA detection but with a molecular weight half that of Cas9 and Cas12a genome-editing enzymes, CasΦ offers advantages for cellular delivery that expand the genome editing toolbox.

Clades of huge phages from across Earth’s ecosystems
Cited by 540 · Open Access

Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.

Petabase-scale sequence alignment catalyses viral discovery
Cited by 522 · Open Access

Public databases contain a planetary collection of nucleic acid sequences, but their systematic exploration has been inhibited by a lack of efficient methods for searching this corpus, which (at the time of writing) exceeds 20 petabases and is growing exponentially. Here we developed a cloud computing infrastructure, Serratus, to enable ultra-high-throughput sequence alignment at the petabase scale. We searched 5.7 million biologically diverse samples (10.2 petabases) for the hallmark gene RNA-dependent RNA polymerase and identified well over 10^5 novel RNA viruses, thereby expanding the number of known species by roughly an order of magnitude. We characterized novel viruses related to coronaviruses, hepatitis delta virus and huge phages, respectively, and analysed their environmental reservoirs. To catalyse the ongoing revolution of viral discovery, we established a free and comprehensive database of these data and tools. Expanding the known sequence diversity of viruses can reveal the evolutionary origins of emerging pathogens and improve pathogen surveillance for the anticipation and mitigation of future pandemics. Serratus, an open-source cloud-computing infrastructure, can be used to screen millions of nucleic acid sequencing libraries at the petabase scale, and has enabled many new RNA viruses to be identified efficiently.

Genome Sequencing of Sewage Detects Regionally Prevalent SARS-CoV-2 Variants
Cited by 378 · Open Access

Viral genome sequencing has guided our understanding of the spread and extent of genetic diversity of SARS-CoV-2 during the COVID-19 pandemic. SARS-CoV-2 viral genomes are usually sequenced from nasopharyngeal swabs of individual patients to track viral spread. Recently, RT-qPCR of municipal wastewater has been used to quantify the abundance of SARS-CoV-2 in several regions globally. However, metatranscriptomic sequencing of wastewater can be used to profile the viral genetic diversity across infected communities. Here, we sequenced RNA directly from sewage collected by municipal utility districts in the San Francisco Bay Area to generate complete and nearly complete SARS-CoV-2 genomes. The major consensus SARS-CoV-2 genotypes detected in the sewage were identical to clinical genomes from the region. Using a pipeline for single nucleotide variant calling in a metagenomic context, we characterized minor SARS-CoV-2 alleles in the wastewater and detected viral genotypes which were also found within clinical genomes throughout California. Observed wastewater variants were more similar to local California patient-derived genotypes than they were to those from other regions within the United States or globally. Additional variants detected in wastewater have only been identified in genomes from patients sampled outside California, indicating that wastewater sequencing can provide evidence for recent introductions of viral lineages before they are detected by local clinical sequencing. These results demonstrate that epidemiological surveillance through wastewater sequencing can aid in tracking exact viral strains in an epidemic context.