Whole Genome Analyses of Chinese Population and De Novo Assembly of A Northern Han Genome

Zhenglin Du(Chinese Academy of Sciences), Liang Ma(Chinese Academy of Sciences), Hongzhu Qu(Chinese Academy of Sciences), Wei Chen(Chinese Academy of Sciences), Bing Zhang(Chinese Academy of Sciences), Xi Lu(Chinese Academy of Sciences), Weibo Zhai(Chinese Academy of Sciences), Xin Sheng(Chinese Academy of Sciences), Yongqiao Sun(Chinese Academy of Sciences), Wenjie Li(Chinese Academy of Sciences), Lei Meng(Chinese Academy of Sciences), Qiuhui Qi(Chinese Academy of Sciences), Na Yuan(Chinese Academy of Sciences), Shuo Shi(Chinese Academy of Sciences), Jingyao Zeng(Chinese Academy of Sciences), Jinyue Wang(Chinese Academy of Sciences), Yadong Yang(Chinese Academy of Sciences), Qi Liu(Chinese Academy of Sciences), Yaqiang Hong(Chinese Academy of Sciences), Lili Dong(Chinese Academy of Sciences), Zhewen Zhang(Chinese Academy of Sciences), Dong Zou(Chinese Academy of Sciences), Yanqing Wang(Chinese Academy of Sciences), Shuhui Song(Chinese Academy of Sciences), Fan Liu(Chinese Academy of Sciences), Xiangdong Fang(Chinese Academy of Sciences), Hua Chen(Chinese Academy of Sciences), Xin Liu(Chinese Academy of Sciences), Jingfa Xiao(Chinese Academy of Sciences), Changqing Zeng(Chinese Academy of Sciences)
Genomics Proteomics & Bioinformatics
June 1, 2019
Cited by 73 | Open Access

Abstract

Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy individuals from most regions of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from southern individuals, we constructed NH1.0, a new reference genome from a northern individual, by combining PacBio sequencing, 10× Genomics sequencing, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of the 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 was significantly correlated with waist circumference in northern Han males. Moreover, significant genetic diversity between northerners and southerners was observed in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there exists a "comfort" zone for a high frequency of 677T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.
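
For readers unfamiliar with the N50 scaffold size quoted above, the following minimal sketch shows how the statistic is conventionally defined: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy scaffold lengths are illustrative assumptions; they are not data or code from the CASPMI project, and the authors' actual assembly-evaluation tooling is not described in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not NH1.0 data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70: 80 + 70 = 150 reaches half of the 300 total
```

Under this definition, an N50 of 46.63 Mb means that at least half of the assembled bases lie in scaffolds of 46.63 Mb or longer.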
