Clare Bycroft

Google DeepMind (United Kingdom)

ORCID: 0000-0002-1139-0732

Publishes on Genetic Associations and Epidemiology, Forensic and Genetic Research, and Genetic Diversity and Population Structure. 18 papers and 13k citations.

Top publications by citations

The UK Biobank resource with deep phenotyping and genomic data
Cited by 9.6k|Open Access

The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.

Accurate proteome-wide missense variant effect prediction with AlphaMissense
Jun Cheng, Guido Novati, Joshua Pan et al.|Science|2023
Cited by 2k

The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.

Genome-wide genetic data on ~500,000 UK Biobank participants
Clare Bycroft, Colin Freeman, Desislava Petkova et al.|bioRxiv (Cold Spring Harbor Laboratory)|2017
Cited by 710

The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offer novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data – such as population structure and relatedness – that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.

AlphaFold Protein Structure Database and 3D-Beacons: New Data and Capabilities
Jennifer Fleming, Paulyna Magaña, Sreenath Nair et al.|Journal of Molecular Biology|2025
Cited by 166|Open Access

• AlphaMissense scores now integrated, enabling large-scale pathogenicity analysis of protein missense variants.
• Foldseek added for rapid, accurate protein structure searches and comparisons.
• Bulk data downloads introduced for enhanced analysis and workflow integration.
• 3D-Beacons updates include AlphaMissense annotations and LevyLab’s homomeric models.
• Training modules in "AlphaFold: A Practical Guide" enhance PDBe resources.

The AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/) has made significant strides in enhancing its utility and accessibility for the life science research community. The recent integration of AlphaMissense predictions enables access to the pathogenicity of human protein missense variants, with an innovative and interactive heatmap and 3D visualisation that display variant data at the residue level. Users can now toggle between structure model quality (pLDDT) and average pathogenicity scores, providing insights into the implications of specific residue changes. The Foldseek integration offers a rapid and accurate method for protein structure searches and comparisons. Bulk data download options further facilitate comprehensive data analysis and integration with other computational tools. The 3D-Beacons framework (https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/) has also been enhanced with detailed annotation endpoints (such as AlphaMissense data) and integrates LevyLab’s dataset of homomeric AlphaFold 2 models. These advancements significantly improve the functionality and accessibility of these resources, enabling discoveries using structure data.

The Configuration of RPA, RAD51, and DMC1 Binding in Meiosis Reveals the Nature of Critical Recombination Intermediates
Anjali Gupta Hinch, Philipp Becker, Tao Li et al.|Molecular Cell|2020
Cited by 161|Open Access

Meiotic recombination proceeds via binding of RPA, RAD51, and DMC1 to single-stranded DNA (ssDNA) substrates created after formation of programmed DNA double-strand breaks. Here we report high-resolution in vivo maps of RPA and RAD51 in meiosis, mapping their binding locations and lifespans to individual homologous chromosomes using a genetically engineered hybrid mouse. Together with high-resolution microscopy and DMC1 binding maps, we show that DMC1 and RAD51 have distinct spatial localization on ssDNA: DMC1 binds near the break site, and RAD51 binds away from it. We characterize inter-homolog recombination intermediates bound by RPA in vivo, with properties expected for the critical displacement loop (D-loop) intermediates. These data support the hypothesis that DMC1, not RAD51, performs strand exchange in mammalian meiosis. RPA-bound D-loops can be resolved as crossovers or non-crossovers, but crossover-destined D-loops may have longer lifespans. D-loops resemble crossover gene conversions in size, but their extent is similar in both repair pathways.