The genome of flax (<i>Linum usitatissimum</i>) assembled <i>de novo</i> from short shotgun sequence reads

Yidong Wang (BGI Group (China)), Neil Hobson (University of Alberta), Leonardo Galindo‐González (University of Alberta), Shilin Zhu (BGI Group (China)), Daihu Shi (BGI Group (China)), Joshua McDill (University of Alberta), Linfeng Yang (BGI Group (China)), Simon Hawkins (Stress Abiotiques et Différenciation des Végétaux Cultivés), Godfrey Neutelings (Stress Abiotiques et Différenciation des Végétaux Cultivés), Raju Datla (Plant Biotechnology Institute), Georgina M. Lambert (University of Arizona), David W. Galbraith (University of Arizona), Christopher J. Grassa (University of British Columbia), Armando Geraldes (University of British Columbia), Quentin Cronk (University of British Columbia), Christopher A. Cullis (Case Western Reserve University), Prasanta K. Dash (Indian Agricultural Research Institute), Polumetla Ananda Kumar (Indian Agricultural Research Institute), Sylvie Cloutier (Agriculture and Agri-Food Canada), Andrew Sharpe (Plant Biotechnology Institute), Gane Ka‐Shu Wong (BGI Group (China)), Jun Wang (BGI Group (China)), Michael K. Deyholos (University of Alberta)
The Plant Journal
July 3, 2012
Cited by 440 | Open Access

Abstract

Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina Genome Analyzer. A de novo assembly, composed exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence, representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43,384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs and 86% of Arabidopsis thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 million years ago) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together, these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.
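The N50 statistic and the coverage estimate quoted in the abstract can be illustrated with a short sketch. This is not the authors' pipeline: the function names `n50` and `implied_genome_size` are hypothetical, the contig lengths are toy values, and N50 is assumed to follow the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly).

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches at
    least half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


def implied_genome_size(assembled_mb, coverage_fraction):
    """Genome size implied by an assembly span and its estimated coverage."""
    if not 0 < coverage_fraction <= 1:
        raise ValueError("coverage_fraction must be in (0, 1]")
    return assembled_mb / coverage_fraction


# Toy contig lengths (illustrative only, not flax data).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))

# Figures from the abstract: 302 Mb of contigs at an estimated 81% coverage.
print(round(implied_genome_size(302, 0.81), 1))
```

Under these assumptions, the 302 Mb of non-redundant sequence at 81% coverage implies a nuclear genome of roughly 373 Mb. N50 summarizes contiguity only; it says nothing about correctness, which is why the abstract separately reports comparisons against BACs and fosmids.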

