Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments

Christina K. Yung(Ontario Institute for Cancer Research), Brian D. O’Connor(Ontario Institute for Cancer Research), Sergei Yakneen(Ontario Institute for Cancer Research), Junjun Zhang(Ontario Institute for Cancer Research), Kyle Ellrott(Oregon Health & Science University), Kortine Kleinheinz(German Cancer Research Center), Naoki Miyoshi(Tokyo Medical University), Keiran Raine(Wellcome Sanger Institute), Romina Royo(Barcelona Supercomputing Center), Gordon Saksena(Broad Institute), Matthias Schlesner(German Cancer Research Center), Solomon I. Shorser(Ontario Institute for Cancer Research), Miguel Vazquez(Centro Nacional de Epidemiología), Joachim Weischenfeldt(Rigshospitalet), Denis Yuen(Ontario Institute for Cancer Research), Adam P. Butler(Wellcome Sanger Institute), Brandi N. Davis‐Dusenbery(Seven Bridges Genomics (United States)), Roland Eils(German Cancer Research Center), Vincent Ferretti(Ontario Institute for Cancer Research), Robert L. Grossman(University of Chicago), Olivier Harismendy(University of San Diego), Young-Wook Kim(Sungkyunkwan University), Hidewaki Nakagawa(RIKEN Center for Integrative Medical Sciences), Steven Newhouse(European Bioinformatics Institute), David Torrents(Institució Catalana de Recerca i Estudis Avançats), Lincoln D. Stein(Ontario Institute for Cancer Research), on behalf of the PCAWG Technical Working Group(Barcelona Supercomputing Center), Javier Bartolomé Rodriguez(RIKEN Center for Integrative Medical Sciences), Keith A. Boroevich(European Bioinformatics Institute), Rich Boyce(University of California, Santa Cruz), Angela N. Brooks(Oregon Health & Science University), Alex Buchanan(German Cancer Research Center), Ivo Buchhalter(Ontario Institute for Cancer Research), Niall J. Byrne(European Bioinformatics Institute), Andy Cafferkey(Wellcome Sanger Institute), Peter J. Campbell(University of San Diego), Zhaohong Chen(Logos Biosystems (South Korea)), Sunghoon Cho(Electronics and Telecommunications Research Institute), Wan Choi(Wellcome Sanger Institute), Peter Clapham(Structural Analytics (United States)), Francisco M. De La Vega(The Francis Crick Institute), Jonas Demeulemeester(University of San Diego), Michelle T. Dow(Ontario Institute for Cancer Research), Lewis Jonathan Dursi(German Cancer Research Center), Juergen Eils(UC San Diego Health System), Claudiu Farcas(Rigshospitalet), Francesco Favero(Ontario Institute for Cancer Research), Nodirjon Fayzullaev(European Bioinformatics Institute), Paul Flicek(European Bioinformatics Institute), Nuno A. Fonseca(Barcelona Supercomputing Center), Josep L. L. Gelpi(Broad Institute), Gad Getz(Ontario Institute for Cancer Research), Bob Gibson(German Cancer Research Center), Michael C. Heinold(Broad Institute), Julian M. Hess(The University of Melbourne), Oliver Hofmann(Intelligent Synthetic Biology Center), Jongwhi H. Hong(Ontario Institute for Cancer Research), Thomas J. Hudson(German Cancer Research Center), Daniel Hüebschmann(German Cancer Research Center), Barbara Hutter(National Human Genome Research Institute), Carolyn M. Hutter(Tokyo Medical University), Seiya Imoto(Seven Bridges Genomics (United States)), Sinisa Ivkovic(Electronics and Telecommunications Research Institute), Seung-Hyup Jeon(Ontario Institute for Cancer Research), Wei Jiao(Intelligent Synthetic Biology Center), Jongsun Jung(German Cancer Research Center), Rolf Kabbe(Memorial Sloan Kettering Cancer Center), André Kahles(German Cancer Research Center), Jules N. A. Kerssemakers(Electronics and Telecommunications Research Institute), Hyunghwan Kim(Ewha Womans University), Hyung‐Lae Kim(University of San Diego), Jihoon Kim(European Bioinformatics Institute), Jan O. Korbel(German Cancer Research Center), Michael Koscher(University of San Diego), Antonios Koures(Seven Bridges Genomics (United States)), Milena Kovacevic(German Cancer Research Center), Chris Lawerenz(Broad Institute), Ignaty Leshchiner(Broad Institute), Dimitri Livitz(Ontario Institute for Cancer Research), George L. Mihaiescu(Seven Bridges Genomics (United States)), Sanja Mijalković(Seven Bridges Genomics (United States)), Ana Mijalkovic Lazic(Tokyo University of Science), Satoru Miyano(Ontario Institute for Cancer Research), Hardeep K. Nahal-Bose(Seven Bridges Genomics (United States)), Mia Nastic(Wellcome Sanger Institute), Jonathan Nicholson(European Bioinformatics Institute), David Ocaña(Tokyo University of Science), Kazuhiro Ohi(UC San Diego Health System), Lucila Ohno‐Machado(Sage Bionetworks), Larsson Omberg(Ontario Institute for Cancer Research), B. F. Francis Ouellette(German Cancer Research Center), Nagarajan Paramasivam(Ontario Institute for Cancer Research), Marc D. Perry(SRA International (United States)), Todd Pihl(German Cancer Research Center), Manuel Prinz(Barcelona Supercomputing Center), Montserrat Puiggròs(Seven Bridges Genomics (United States)), Petar Radovic(Broad Institute), Esther Rheinbay(Broad Institute), Mara Rosenberg(European Bioinformatics Institute), Charles Short(National Human Genome Research Institute), Heidi J. Sofia(University of Chicago), Jonathan Spring(Oregon Health & Science University), Adam J. Struck(Broad Institute), Grace Tiao(Seven Bridges Genomics (United States)), Nebojša Tijanić(The Francis Crick Institute), Peter Van Loo(Barcelona Supercomputing Center), David Vicente(Broad Institute), Jeremiah A. Wala(Office of the Director), Zhining Wang(German Cancer Research Center), Johannes Werner(University of San Diego), Ashley Williams(Electronics and Telecommunications Research Institute), Youngchoon Woo(Ontario Institute for Cancer Research), A. Jordan Wright(Ontario Institute for Cancer Research), Qian Xiang, the PCAWG Network
bioRxiv (Cold Spring Harbor Laboratory)
July 10, 2017

Abstract

The International Cancer Genome Consortium (ICGC)’s Pan-Cancer Analysis of Whole Genomes (PCAWG) project aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from geographically distributed locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; performed the analysis in a geographically and technologically disparate collection of compute environments; and disseminated high-quality, validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located using the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate our analysis on their own data.
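Merging the outputs of several variant-calling pipelines into a consensus set can be illustrated with a simple voting rule. The sketch below is an assumption for illustration only and does not reproduce the PCAWG consensus procedure: it keys each variant by (chromosome, position, reference allele, alternate allele), assumes variant representations are already normalized across callers, and keeps a variant when at least `min_support` distinct callers report it. The function name `consensus_variants`, the default of 2 supporting callers, and the caller names in the example are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def consensus_variants(
    calls_by_caller: Dict[str, Iterable[Variant]], min_support: int = 2
) -> Set[Variant]:
    """Return variants reported by at least `min_support` distinct callers.

    Assumes every caller already emits variants in the same normalized
    representation; real pipelines need left-alignment and allele
    normalization (especially for indels) before such a comparison.
    """
    support = Counter()
    for calls in calls_by_caller.values():
        # De-duplicate within each caller so one pipeline cannot vote twice.
        for variant in set(calls):
            support[variant] += 1
    return {variant for variant, votes in support.items() if votes >= min_support}


if __name__ == "__main__":
    # Hypothetical calls from four hypothetical pipelines.
    calls = {
        "caller_a": [("1", 1000, "C", "T"), ("2", 500, "G", "A")],
        "caller_b": [("1", 1000, "C", "T")],
        "caller_c": [("1", 1000, "C", "T"), ("3", 42, "A", "G")],
        "caller_d": [("2", 500, "G", "A")],
    }
    # Prints [('1', 1000, 'C', 'T'), ('2', 500, 'G', 'A')]
    print(sorted(consensus_variants(calls)))
```

Raising `min_support` trades sensitivity for precision: in the example, a threshold of 3 would keep only the variant at chromosome 1, position 1000.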

