Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project

Roger W. Horton (Wellcome Sanger Institute), Richard Gibson (Wellcome Sanger Institute), Penny Coggill (Wellcome Sanger Institute), Marcos Miretti (Wellcome Sanger Institute), Richard Allcock (The University of Western Australia), J. P. Almeida (Wellcome Sanger Institute), Simon Forbes (Wellcome Sanger Institute), James Gilbert (Wellcome Sanger Institute), Karen Halls (University of Cambridge), Jennifer Harrow (Wellcome Sanger Institute), Elizabeth A. Hart (Wellcome Sanger Institute), Kevin Howe, David K. Jackson (Wellcome Sanger Institute), Sophie Palmer (Wellcome Sanger Institute), Anne N. Roberts (University of Cambridge), Sarah Sims (Wellcome Sanger Institute), Claudia Stewart (National Cancer Institute), James A. Traherne (University of Cambridge), Steve Trevanion (Wellcome Sanger Institute), Laurens Wilming (Wellcome Sanger Institute), Jane Rogers (Wellcome Sanger Institute), Pieter J. de Jong (Oaklands Hospital), John F. Elliott (University of Alberta), Stephen Sawcer (University of Cambridge), John A. Todd (University of Cambridge), John Trowsdale (University of Cambridge), Stephan Beck (Wellcome Sanger Institute)
Immunogenetics
January 1, 2008
Cited by 334
Open Access

Abstract

The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line), designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds) and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.

