Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments
Christina K. Yung, Brian D. O’Connor, Sergei Yakneen et al. | bioRxiv (Cold Spring Harbor Laboratory) | 2017
Abstract: The International Cancer Genome Consortium (ICGC)’s Pan-Cancer Analysis of Whole Genomes (PCAWG) project aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from distributed geographical locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; performed the analysis in a geographically and technologically disparate collection of compute environments; and disseminated high-quality validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located using the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate our analysis on their own data.
Author Correction: Butler enables rapid cloud-based analysis of thousands of human genomes
Correction to: Nature Biotechnology, published online 5 February 2020.
In the published version of this paper, the members of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium were listed in the Supplementary Information; however, these members should have been included in the main paper. The original Article has been corrected to include the members and affiliations of the PCAWG Consortium in the main paper; the corrections have been made to the HTML version of the Article but not the PDF version. Additional minor corrections to affiliations have been made to the PDF and HTML versions of the original Article for consistency of information between the PCAWG list and the main paper. In the PCAWG Technical Working Group, the two affiliations for Miguel Vazquez have been changed from (1) Massachusetts General Hospital, Boston, MA, USA and (2) Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain to (1) Barcelona Supercomputing Center, Barcelona, Spain and (2) Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway.