Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity

Patrick P. Edger (Michigan State University), Robert VanBuren (Michigan State University), Marivi Colle (Michigan State University), Thomas J. Poorten (University of California, Davis), Ching Man Wai (Michigan State University), Chad E. Niederhuth (University of Georgia), Elizabeth I. Alger (Michigan State University), Shujun Ou (Michigan State University), Charlotte B. Acharya (University of California, Davis), Jie Wang (Michigan State University), Pete Callow (Michigan State University), Michael R. McKain (Donald Danforth Plant Science Center), Jinghua Shi (BioNano Genomics (United States)), Chad C. Collier (BioNano Genomics (United States)), Zhiyong Xiong (Inner Mongolia University), Jeffrey P. Mower (University of Nebraska–Lincoln), Janet P. Slovin (U.S. Vegetable Laboratory), Timo Hytönen (University of Helsinki), Ning Jiang (Michigan State University), Kevin L. Childs (Michigan State University), Steven J. Knapp (University of California, Davis)
GigaScience
December 13, 2017
Cited by 323
Open Access

Abstract

Background: Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology.

Findings: Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome.

Conclusions: Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.
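The contig N50 quoted in the abstract is a length-weighted median: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. As a minimal sketch of that standard definition (not the authors' pipeline), the calculation could look like the following. The function name `n50` and the toy lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Standard definition: sort contigs from longest to shortest and
    return the length at which the running total first reaches half
    of the summed assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("need at least one contig length")


# Hypothetical toy lengths for illustration only (not from the paper).
# Total = 25, half = 12.5; the running sum 8 + 7 = 15 crosses it at 7.
toy_contigs = [8, 7, 5, 3, 2]
print(n50(toy_contigs))
```

Because the statistic depends only on the sorted lengths, a few very long contigs dominate it, which is why long-read assemblies can show large fold-changes in N50 relative to fragmented short-read assemblies.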

