
Mark Fleharty

Broad Institute

ORCID: 0000-0002-8141-8091

Publishes on cancer genomics and diagnostics, BRCA gene mutations in cancer, and genetic factors in colorectal cancer. 63 papers and 786 citations.

63 Publications
786 Total Citations


Top publications by citations

Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms
Maura Costello, Mark Fleharty, Justin Abreu et al.|BMC Genomics|2018
Cited by 293|Open Access

BACKGROUND: Here we present an in-depth characterization of the mechanism of sequencer-induced sample contamination due to the phenomenon of index swapping that impacts Illumina sequencers employing patterned flow cells with Exclusion Amplification (ExAmp) chemistry (HiSeqX, HiSeq4000, and NovaSeq). We also present a remediation method that minimizes the impact of such swaps. RESULTS: Leveraging data collected over a two-year period, we demonstrate the widespread prevalence of index swapping in patterned flow cell data. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different library methods can have vastly different swapping rates and that even non-ExAmp chemistry instruments display trace levels of index swapping. We provide methods for eliminating sample data cross contamination by utilizing non-redundant dual indexing for complete filtering of index swapped reads, and share the sequences for 96 non-combinatorial dual indexes we have validated across various library preparation methods and sequencer models. Finally, using computational methods we provide a greater insight into the mechanism of index swapping. CONCLUSIONS: Index swapping in pooled libraries is a prevalent phenomenon that we observe at a rate of 0.2 to 6% in all sequencing runs on HiSeqX, HiSeq 4000/3000, and NovaSeq. Utilizing non-redundant dual indexing allows for the removal (flagging/filtering) of these swapped reads and eliminates swapping induced sample contamination, which is critical for sensitive applications such as RNA-seq, single cell, blood biopsy using circulating tumor DNA, or clinical sequencing.
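The filtering idea in the abstract above can be illustrated with a minimal sketch. It assumes each read carries an (i7, i5) index pair and that, under non-redundant dual indexing, every i7 and every i5 sequence is used by exactly one sample, so a read whose two indexes are both known but belong to different samples can be flagged as swapped. The function names, the sample sheet layout, and the index sequences are hypothetical placeholders, not the 96 validated indexes or the pipeline from the paper.

```python
# Illustrative sample sheet: each i7 and each i5 appears exactly once
# (non-redundant dual indexing). Sequences are made-up placeholders.
SAMPLE_SHEET = {
    ("ATCACGAT", "TTAGGCAT"): "sample_A",
    ("CGATGTTC", "GCCAATGG"): "sample_B",
    ("TGACCAGT", "ACTTGACC"): "sample_C",
}


def classify_reads(reads, sample_sheet):
    """Sort reads into per-sample bins, a swapped bin, and an unknown bin.

    `reads` is an iterable of (read_id, i7, i5) tuples.
    A read is "swapped" when both indexes are in the sheet but never
    together; it is "unknown" when at least one index is not in the sheet.
    """
    known_i7 = {i7 for i7, _ in sample_sheet}
    known_i5 = {i5 for _, i5 in sample_sheet}
    assigned, swapped, unknown = {}, [], []
    for read_id, i7, i5 in reads:
        sample = sample_sheet.get((i7, i5))
        if sample is not None:
            assigned.setdefault(sample, []).append(read_id)
        elif i7 in known_i7 and i5 in known_i5:
            swapped.append(read_id)  # valid indexes, invalid pairing
        else:
            unknown.append(read_id)  # e.g. sequencing error in an index
    return assigned, swapped, unknown


def swap_rate(assigned, swapped):
    """Fraction of reads with recognised indexes that show a swapped pair."""
    n_assigned = sum(len(ids) for ids in assigned.values())
    total = n_assigned + len(swapped)
    return len(swapped) / total if total else 0.0
```

With combinatorial indexing, where the same i5 is shared across samples, the swapped pairing in this sketch could coincide with another sample's valid pair and would be silently misassigned; the unique pairing is what makes the swap detectable.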

Sensitive Detection of Minimal Residual Disease in Patients Treated for Early-Stage Breast Cancer
Heather A. Parsons, Justin Rhoades, Sarah C. Reed et al.|Clinical Cancer Research|2020
Cited by 193|Open Access

Purpose: Existing cell-free DNA (cfDNA) methods lack the sensitivity needed for detecting minimal residual disease (MRD) following therapy. We developed a test for tracking hundreds of patient-specific mutations to detect MRD with a 1,000-fold lower error rate than conventional sequencing. Experimental Design: We compared the sensitivity of our approach to digital droplet PCR (ddPCR) in a dilution series, then retrospectively identified two cohorts of patients who had undergone prospective plasma sampling and clinical data collection: 16 patients with ER+/HER2− metastatic breast cancer (MBC) sampled within 6 months following metastatic diagnosis and 142 patients with stage 0 to III breast cancer who received curative-intent treatment with most sampled at surgery and 1 year postoperative. We performed whole-exome sequencing of tumors and designed individualized MRD tests, which we applied to serial cfDNA samples. Results: Our approach was 100-fold more sensitive than ddPCR when tracking 488 mutations, but most patients had fewer identifiable tumor mutations to track in cfDNA (median = 57; range = 2–346). Clinical sensitivity was 81% (n = 13/16) in newly diagnosed MBC, 23% (n = 7/30) at postoperative and 19% (n = 6/32) at 1 year in early-stage disease, and highest in patients with the most tumor mutations available to track. MRD detection at 1 year was strongly associated with distant recurrence [HR = 20.8; 95% confidence interval, 7.3–58.9]. Median lead time from first positive sample to recurrence was 18.9 months (range = 3.4–39.2 months). Conclusions: Tracking large numbers of individualized tumor mutations in cfDNA can improve MRD detection, but its sensitivity is driven by the number of tumor mutations available to track.
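As a quick arithmetic check, the clinical sensitivity percentages quoted in the abstract above can be recomputed from the reported counts. This is only a sketch: the helper name `percent` and the dictionary layout are illustrative, and the counts and percentages are copied from the abstract.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a whole-number percentage."""
    return round(100 * numerator / denominator)


# (detected, total, percentage reported in the abstract)
reported = {
    "newly diagnosed MBC": (13, 16, 81),
    "early-stage, postoperative": (7, 30, 23),
    "early-stage, 1 year": (6, 32, 19),
}

for setting, (detected, total, stated) in reported.items():
    computed = percent(detected, total)
    status = "matches" if computed == stated else "differs from"
    print(f"{setting}: {detected}/{total} = {computed}% ({status} the stated {stated}%)")
```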

Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples
Cited by 146|Open Access

We describe the MalariaGEN Pf7 data resource, the seventh release of Plasmodium falciparum genome variation data from the MalariaGEN network. It comprises over 20,000 samples from 82 partner studies in 33 countries, including several malaria endemic regions that were previously underrepresented. For the first time we include dried blood spot samples that were sequenced after selective whole genome amplification, necessitating new methods to genotype copy number variations. We identify a large number of newly emerging crt mutations in parts of Southeast Asia, and show examples of heterogeneities in patterns of drug resistance within Africa and within the Indian subcontinent. We describe the profile of variations in the C-terminal of the csp gene and relate this to the sequence used in the RTS,S and R21 malaria vaccines. Pf7 provides high-quality data on genotype calls for 6 million SNPs and short indels, analysis of large deletions that cause failure of rapid diagnostic tests, and systematic characterisation of six major drug resistance loci, all of which can be freely downloaded from the MalariaGEN website.

Liquid biopsy detection of genomic alterations in pediatric brain tumors from cell-free DNA in peripheral blood, CSF, and urine
Mélanie Pagès, Denisse Rotem, Gregory Gydush et al.|Neuro-Oncology|2022
Cited by 79|Open Access

BACKGROUND: The ability to identify genetic alterations in cancers is essential for precision medicine; however, surgical approaches to obtain brain tumor tissue are invasive. Profiling circulating tumor DNA (ctDNA) in liquid biopsies has emerged as a promising approach to avoid invasive procedures. Here, we systematically evaluated the feasibility of profiling pediatric brain tumors using ctDNA obtained from plasma, cerebrospinal fluid (CSF), and urine. METHODS: We prospectively collected 564 specimens (257 blood, 240 urine, and 67 CSF samples) from 258 patients across all histopathologies. We performed ultra-low-pass whole-genome sequencing (ULP-WGS) to assess copy number variations and estimate tumor fraction and developed a pediatric CNS tumor hybrid capture panel for deep sequencing of specific mutations and fusions. RESULTS: ULP-WGS detected copy number alterations in 9/46 (20%) CSF, 3/230 (1.3%) plasma, and 0/153 urine samples. Sequencing detected alterations in 3/10 (30%) CSF, 2/74 (2.7%) plasma, and 0/2 urine samples. The only positive results were in high-grade tumors. However, most samples had insufficient somatic mutations (median 1, range 0-39) discoverable by the sequencing panel to provide sufficient power to detect tumor fractions of greater than 0.1%. CONCLUSIONS: Children with brain tumors harbor very low levels of ctDNA in blood, CSF, and urine, with CSF having the most DNA detectable. Molecular profiling is feasible in a small subset of high-grade tumors. The level of clonal aberrations per genome is low in most of the tumors, posing a challenge for detection using whole-genome or even targeted sequencing methods. Substantial challenges therefore remain to genetically characterize pediatric brain tumors from liquid biopsies.

A complete diploid human genome benchmark for personalized genomics
Nancy F. Hansen, Nathan Dwarshuis, Hyun Joo Ji et al.|bioRxiv (Cold Spring Harbor Laboratory)|2025
Cited by 20|Open Access

Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and structurally polymorphic regions of the genome unmapped. Consequently, existing variant benchmarks, generated by the same methods, fail to assess these complex regions. To address this limitation, we present a telomere-to-telomere genome benchmark that achieves near-perfect accuracy (i.e. no detectable errors) across 99.4% of the complete, diploid HG002 genome. This benchmark adds 701.4 Mb of autosomal sequence and both sex chromosomes (216.8 Mb), totaling 15.3% of the genome that was absent from prior benchmarks. We also provide a diploid annotation of genes, transposable elements, segmental duplications, and satellite repeats, including 39,144 protein-coding genes across both haplotypes. To facilitate application of the benchmark, we developed tools for measuring the accuracy of sequencing reads, phased variant call sets, and genome assemblies against a diploid reference. Genome-wide analyses show that state-of-the-art de novo assembly methods resolve 2-7% more sequence and outperform variant calling accuracy by an order of magnitude, yielding just one error per 100 kb across 99.9% of the benchmark regions. Adoption of genome-based benchmarking is expected to accelerate the development of cost-effective methods for complete genome sequencing, expanding the reach of genomic medicine to the entire genome and enabling a new era of personalized genomics.
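The sequence totals quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below sums the two stated additions and derives the genome size implied by the 15.3% figure; that implied size is not stated in the abstract and is an inference from the quoted numbers only. Variable names are illustrative.

```python
# Figures quoted in the abstract (megabases and a fraction).
autosomal_added_mb = 701.4   # newly covered autosomal sequence
sex_chrom_added_mb = 216.8   # both sex chromosomes
fraction_of_genome = 0.153   # "15.3% of the genome"

# Total sequence that was absent from prior benchmarks.
total_added_mb = autosomal_added_mb + sex_chrom_added_mb

# Genome size implied by the quoted percentage (a derived value,
# not a number given in the abstract).
implied_genome_mb = total_added_mb / fraction_of_genome

print(f"Added sequence: {total_added_mb:.1f} Mb")
print(f"Implied genome size: {implied_genome_mb / 1000:.2f} Gb")
```

The implied size of roughly 6 Gb is consistent with the percentage being taken over both haplotypes of the diploid genome rather than over a single haploid reference.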