Gina L. Costa

Exact Sciences (United States)

Publishes on T-cell and B-cell Immunology, Cancer Genomics and Diagnostics, Virus-based gene therapy research. 28 papers and 8.3k citations.

28 Publications
8.3k Total Citations

Top publications by citations

A small-cell lung cancer genome with complex signatures of tobacco exposure
Cited by 1.1k | Open Access

Cancer is driven by mutation. Worldwide, tobacco smoking is the principal lifestyle exposure that causes cancer, exerting carcinogenicity through >60 chemicals that bind and mutate DNA. Using massively parallel sequencing technology, we sequenced a small-cell lung cancer cell line, NCI-H209, to explore the mutational burden associated with tobacco smoking. A total of 22,910 somatic substitutions were identified, including 134 in coding exons. Multiple mutation signatures testify to the cocktail of carcinogens in tobacco smoke and their proclivities for particular bases and surrounding sequence context. Effects of transcription-coupled repair and a second, more general, expression-linked repair pathway were evident. We identified a tandem duplication that duplicates exons 3–8 of CHD7 in frame, and another two lines carrying PVT1–CHD7 fusion genes, indicating that CHD7 may be recurrently rearranged in this disease. These findings illustrate the potential for next-generation sequencing to provide unprecedented insights into mutational processes, cellular repair pathways and gene networks associated with cancer.

The two cancer genome sequences presented in this issue demonstrate how next-generation sequencing technologies can inform us about mutational processes, repair pathways and gene networks associated with cancer development. First, the genome of a cell line derived from a bone marrow metastasis in a patient who had small-cell lung cancer. This cancer is typical of the type induced by smoking, and the sequence contains mutation signatures characteristic of some of the more than 60 carcinogens present in tobacco smoke. The second paper compares the whole genome sequence of a melanoma cell line to a lymphoblastoid cell line from the same individual. This, the first complete mutational analysis of a solid tumour, reveals a dominant mutational signature reflecting DNA damage due to exposure to ultraviolet light.

Tobacco smoke contains more than sixty carcinogens that bind and mutate DNA. Here, massively parallel sequencing technology is used to sequence a small-cell lung cancer cell line, exploring the mutational burden associated with tobacco smoking. Multiple mutation signatures from the cocktail of carcinogens in tobacco smoke are found, as well as evidence of transcription-coupled repair and another, more general, expression-linked repair pathway.

Demographic history and rare allele sharing among human populations
Simon Gravel, Brenna M. Henn, Ryan N. Gutenkunst et al. | Proceedings of the National Academy of Sciences | 2011
Cited by 731 | Open Access

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning
Anton Valouev, Jeffrey K. Ichikawa, Thaisan Tonthat et al. | Genome Research | 2008
Cited by 578 | Open Access

Using the massively parallel technique of sequencing by oligonucleotide ligation and detection (SOLiD; Applied Biosystems), we have assessed the in vivo positions of more than 44 million putative nucleosome cores in the multicellular genetic model organism Caenorhabditis elegans. These analyses provide a global view of the chromatin architecture of a multicellular animal at extremely high density and resolution. While we observe some degree of reproducible positioning throughout the genome in our mixed stage population of animals, we note that the major chromatin feature in the worm is a diversity of allowed nucleosome positions at the vast majority of individual loci. While absolute positioning of nucleosomes can vary substantially, relative positioning of nucleosomes (in a repeated array structure likely to be maintained at least in part by steric constraints) appears to be a significant property of chromatin structure. The high density of nucleosomal reads enabled a substantial extension of previous analysis describing the usage of individual oligonucleotide sequences along the span of the nucleosome core and linker. We release this data set, via the UCSC Genome Browser, as a resource for the high-resolution analysis of chromatin conformation and DNA accessibility at individual loci within the C. elegans genome.