Large scale genotype‐ and phenotype‐driven machine learning in Von Hippel‐Lindau disease

Andreea Chiorean (University Health Network), Kirsten M. Farncombe (University Health Network), Sean DeLong (University Health Network), Veronica Andric (University Health Network), Safa Ansar (University Health Network), Clarissa Chan (University Health Network), Kaitlin Clark (James S. McDonnell Foundation), Arpad Danos (James S. McDonnell Foundation), Yizhuo Gao (University Health Network), Rachel H. Giles (Netherlands Comprehensive Cancer Organisation), Anna Goldenberg (Hospital for Sick Children), Payal Jani (University Health Network), Kilannin Krysiak (James S. McDonnell Foundation), Lynzey Kujan (James S. McDonnell Foundation), Samantha Macpherson (University Health Network), Eamonn R. Maher (University of Cambridge), Liam G. McCoy (University Health Network), Yasser Salama (University Health Network), Jason Saliba (James S. McDonnell Foundation), Lana Sheta (James S. McDonnell Foundation), Malachi Griffith (James S. McDonnell Foundation), Obi L. Griffith (James S. McDonnell Foundation), Lauren Erdman (Hospital for Sick Children), Arun Ramani (Hospital for Sick Children), Raymond H. Kim (Ontario Institute for Cancer Research)
Human Mutation
April 27, 2022
Cited by 20; Open Access

Abstract

Von Hippel-Lindau (VHL) disease is a hereditary cancer syndrome in which individuals are predisposed to tumor development in the brain, adrenal gland, kidney, and other organs. It is caused by pathogenic variants in the VHL tumor suppressor gene. Standardized disease information has been difficult to collect because VHL is rare and its patients are clinically diverse. Over 4100 unique articles published up to October 2019 were screened for germline genotype-phenotype data. Patient data were translated into standardized descriptions using Human Genome Variation Society gene variant nomenclature and Human Phenotype Ontology terms, and were manually curated into an open-access knowledgebase called Clinical Interpretation of Variants in Cancer (CIViC). In total, 634 unique VHL variants, 2882 patients, and 1991 families from 427 papers were captured. We identified relationship trends between phenotype and genotype data using classic statistical methods and unsupervised learning by spectral clustering. Our analyses reveal earlier onset of pheochromocytoma/paraganglioma and retinal angiomas, phenotype co-occurrences, and genotype-phenotype correlations, including hotspots. This work confirms existing VHL associations and can be used to identify new patterns and associations in VHL disease. Our database serves as an aggregate knowledge translation tool to facilitate sharing information about the pathogenicity of VHL variants.
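As a rough illustration of what a phenotype co-occurrence analysis involves, the sketch below counts how often pairs of phenotypes appear in the same patient and computes a Jaccard index for one pair. It is not the authors' method or code. The patient records are invented toy data, and the names `patients`, `cooccurrence_counts`, and `jaccard` are hypothetical; real records would use Human Phenotype Ontology terms drawn from the curated knowledgebase.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each set lists phenotypes observed in one patient.
# These are invented for illustration and are not taken from the knowledgebase.
patients = [
    {"pheochromocytoma", "retinal angioma"},
    {"renal cell carcinoma", "retinal angioma"},
    {"pheochromocytoma"},
    {"pheochromocytoma", "retinal angioma", "renal cell carcinoma"},
]


def cooccurrence_counts(records):
    """Count how many patients carry each unordered pair of phenotypes."""
    counts = Counter()
    for phenotypes in records:
        # Sorting gives every pair a single canonical (alphabetical) order.
        for pair in combinations(sorted(phenotypes), 2):
            counts[pair] += 1
    return counts


def jaccard(records, a, b):
    """Jaccard index: patients with both phenotypes / patients with either."""
    both = sum(1 for p in records if a in p and b in p)
    either = sum(1 for p in records if a in p or b in p)
    return both / either if either else 0.0


pair_counts = cooccurrence_counts(patients)
for pair, n in pair_counts.most_common():
    print(pair, n)

print(jaccard(patients, "pheochromocytoma", "retinal angioma"))
```

A pairwise similarity of this kind is also one possible input to a clustering method, although the abstract does not say which similarity measure was used for the spectral clustering.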

