A Draft Sequence of the Rice Genome (<i>Oryza sativa</i> L. ssp. <i>indica</i>)
The genome of the japonica subspecies of rice, an important cereal and model monocot, was sequenced and assembled by whole-genome shotgun sequencing. The assembled sequence covers 93% of the 420-megabase genome. Gene predictions on the assembled sequence suggest that the genome contains 32,000 to 50,000 genes. Homologs of 98% of the known maize, wheat, and barley proteins are found in rice. Synteny and gene homology between rice and the other cereal genomes are extensive, whereas synteny with Arabidopsis is limited. Assignment of candidate rice orthologs to Arabidopsis genes is possible in many cases. The rice genome sequence provides a foundation for the improvement of cereals, our most important crops.
A Draft Sequence for the Genome of the Domesticated Silkworm (<i>Bombyx mori</i>)
We report a draft sequence for the genome of the domesticated silkworm (Bombyx mori), covering 90.9% of all known silkworm genes. Our estimated gene count is 18,510, which exceeds the 13,379 genes reported for Drosophila melanogaster. Comparative analyses to fruitfly, mosquito, spider, and butterfly reveal both similarities and differences in gene content.
The Genomes of Oryza sativa: A History of Duplications
Jun Yu, Jun Wang, Wei Lin et al.|PLoS Biology|2005
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000-40,000. Only 2%-3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
The diploid genome sequence of an Asian individual
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual’s genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences. A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries.
Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.
The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions
Zhangjun Fei and colleagues report the draft genome of a Chinese elite watermelon inbred line 97103 and resequencing of 20 diverse accessions that represent the three subspecies of Citrullus lanatus. Comparative genome-wide analyses identify the extent of genetic diversity and population structure of watermelon germplasm. Watermelon, Citrullus lanatus, is an important cucurbit crop grown throughout the world. Here we report a high-quality draft genome sequence of the east Asia watermelon cultivar 97103 (2n = 2× = 22) containing 23,440 predicted protein-coding genes. Comparative genomics analysis provided an evolutionary scenario for the origin of the 11 watermelon chromosomes derived from a 7-chromosome paleohexaploid eudicot ancestor. Resequencing of 20 watermelon accessions representing three different C. lanatus subspecies produced numerous haplotypes and identified the extent of genetic diversity and population structure of watermelon germplasm. Genomic regions that were preferentially selected during domestication were identified. Many disease-resistance genes were also found to be lost during domestication. In addition, integrative genomic and transcriptomic analyses yielded important insights into aspects of phloem-based vascular signaling in common between watermelon and cucumber and identified genes crucial to valuable fruit-quality traits, including sugar accumulation and citrulline metabolism.