The complete and fully-phased diploid genome of a male Han Chinese

Chentao Yang (BGI Group (China)), Yang Zhou (BGI Group (China)), Yanni Song (Agricultural Genomics Institute at Shenzhen), Dongya Wu (Women's Hospital, School of Medicine, Zhejiang University), Yan Zeng (BGI Group (China)), Lei Nie (BGI Group (China)), Panhong Liu (BGI Group (China)), Shilong Zhang (Shanghai Jiao Tong University), Guangji Chen (BGI Group (China)), Jinjin Xu (BGI Group (China)), Hongling Zhou (Agricultural Genomics Institute at Shenzhen), Long Zhou (Women's Hospital, School of Medicine, Zhejiang University), Qian Xiaobo (BGI Group (China)), Chenlu Liu (Zhejiang University), Shangjin Tan (BGI Group (China)), Chengran Zhou (BGI Group (China)), Wei Dai (BGI Group (China)), Mengyang Xu (BGI Group (China)), Yanwei Qi, Xiaobo Wang (Agricultural Genomics Institute at Shenzhen), Lidong Guo (University of Chinese Academy of Sciences), Guangyi Fan, Aijun Wang, Yuan Deng (BGI Group (China)), Yong Zhang (BGI Group (China)), Jiazheng Jin (BGI Group (China)), Yunqiu He (Women's Hospital, School of Medicine, Zhejiang University), Chunxue Guo (BGI Group (China)), Guoji Guo (Zhejiang University), Qing Zhou (Zhejiang University), Xun Xu (BGI Group (China)), Huanming Yang (BGI Group (China)), Jian Wang (BGI Group (China)), Shuhua Xu (Jiangsu Normal University), Yafei Mao (Shanghai Jiao Tong University), Xin Jin (BGI Group (China)), Jue Ruan (Agricultural Genomics Institute at Shenzhen), Guojie Zhang (Zhejiang International Studies University)
Cell Research
July 14, 2023
Cited by 90
Open Access

Abstract

Since the release of the complete human genome, the priority of human genomic studies has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haplotypes reach the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variation in the centromeres. Outside the centromeres, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population, owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes, using CN1 and CHM13 in turn as the reference, showed that reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled depending on the reference genome used. Finally, using CN1 as the reference, we discovered 5.80 Mb and 4.21 Mb of putative introgressed sequence from Neanderthals and Denisovans, respectively, including many East Asian-specific sequences that went undetected with CHM13 as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic and paleogenomic studies. This complete genome will serve as an alternative reference for future genomic studies of the East Asian population.

