Jessica Alföldi

Broad Institute

ORCID: 0000-0001-9713-6200

Publishes on Genomics and Phylogenetic Studies, Genomics and Rare Diseases, Virus-based gene therapy research. 98 papers and 24.3k citations.

98 Publications
24.3k Total Citations

Top publications by citations

The mutational constraint spectrum quantified from variation in 141,456 humans
Cited by 10k | Open Access

Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

The mutational constraint spectrum quantified from variation in 141,456 humans
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao et al. | bioRxiv (Cold Spring Harbor Laboratory) | 2019
Cited by 1.8k | Open Access

Summary: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

A high-resolution map of human evolutionary constraint using 29 mammals
Cited by 1.2k | Open Access

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

This comparative genomics study, comparing the complete human genome sequence with those of 29 placental mammals, including chimpanzees, mice and dogs, identifies 4.2% of the human genome as constrained by evolutionary selection, and ascribes a potential function to about 60% of these constrained bases. A series of evolutionary signatures emerges, providing insights into coding and non-coding functional genomic elements, candidate RNA structural families and aspects of genome organization and evolution. Overlap with disease-associated variants indicates that the findings will be relevant for studies of human disease.

The genomic substrate for adaptive radiation in African cichlid fish
Cited by 1k | Open Access

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.

Genomes and transcriptomes of five distinct lineages of African cichlids, a textbook example of adaptive radiation, have been sequenced and analysed to reveal that many types of molecular changes contributed to rapid evolution, and that standing variation accumulated during periods of relaxed selection may have primed subsequent diversification.

The 2,000 or so species of cichlid fish, to be found in the lakes and rivers of Africa's Rift Valley, provide the classic example of adaptive radiations. This large-scale international collaboration has sequenced and analysed the genomes and transcriptomes of five distinct lineages of African cichlids. The data reveal an excess of gene duplications in comparison to other fish species. There is an abundance of non-coding element divergence; accelerated coding sequence evolution; expression divergence associated with transposable element insertions in orthologous gene pairs; and regulation by novel miRNAs. Sequencing data from sixty individuals from six closely related Lake Victoria species point to rapid cichlid speciation associated with genome-wide diversifying selection on coding and regulatory variants, and imply that ancient periods of relaxed purifying selection enabled the accumulation of standing variation, which may have been important in facilitating diversification.