Ann He

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing

Gordon Robertson, Martin Hirst, Matthew N. Bainbridge et al.|Nature Methods|2007

Cited by 1.5k

Complete genomic landscape of a recurring sporadic parathyroid carcinoma

Katayoon Kasaian, Sam M. Wiseman, Nina Thiessen et al.|The Journal of Pathology|2013

Cited by 69

Parathyroid carcinoma is a rare endocrine malignancy with an estimated incidence of less than 1 per million population. Excessive secretion of parathyroid hormone, extremely high serum calcium level, and the deleterious effects of hypercalcaemia are the clinical manifestations of the disease. Up to 60% of patients develop multiple disease recurrences and although long-term survival is possible with palliative surgery, permanent remission is rarely achieved. Molecular drivers of sporadic parathyroid carcinoma have remained largely unknown. Previous studies, mostly based on familial cases of the disease, suggested potential roles for the tumour suppressor MEN1 and proto-oncogene RET in benign parathyroid tumourigenesis, while the tumour suppressor HRPT2 and proto-oncogene CCND1 may also act as drivers in parathyroid cancer. Here, we report the complete genomic analysis of a sporadic and recurring parathyroid carcinoma. Mutational landscapes of the primary and recurrent tumour specimens were analysed using high-throughput sequencing technologies. Such molecular profiling allowed for identification of somatic mutations never previously identified in this malignancy. These included single nucleotide point mutations in well-characterized cancer genes such as mTOR, MLL2, CDKN2C, and PIK3CA. Comparison of acquired mutations in patient-matched primary and recurrent tumours revealed loss of PIK3CA activating mutation during the evolution of the tumour from the primary to the recurrence. Structural variations leading to gene fusions and regions of copy loss and gain were identified at a single-base resolution. Loss of the short arm of chromosome 1, along with somatic missense and truncating mutations in CDKN2C and THRAP3, respectively, provides new evidence for the potential role of these genes as tumour suppressors in parathyroid cancer. The key somatic mutations identified in this study can serve as novel diagnostic markers as well as therapeutic targets.

Learning Dependency Structures for Weak Supervision Models

Paroma Varma, Frédéric Sala, Ann He et al.|International Conference on Machine Learning|2019

Cited by 26

Labeling training data is a key bottleneck in the modern machine learning pipeline. Recent weak supervision approaches combine labels from multiple noisy sources by estimating their accuracies without access to ground truth labels; however, estimating the dependencies among these sources is a critical challenge. We focus on a robust PCA-based algorithm for learning these dependency structures, establish improved theoretical recovery rates, and outperform existing methods on various real-world tasks. Under certain conditions, we show that the amount of unlabeled data needed can scale sublinearly or even logarithmically with the number of sources $m$, improving over previous efforts that ignore the sparsity pattern in the dependency structure and scale linearly in $m$. We provide an information-theoretic lower bound on the minimum sample complexity of the weak supervision setting. Our method outperforms weak supervision approaches that assume conditionally-independent sources by up to 4.64 F1 points and previous structure learning approaches by up to 4.41 F1 points on real-world relation extraction and image classification tasks.

Comparison of segmentation-free and segmentation-dependent computer-aided diagnosis of breast masses on a public mammography dataset

Rebecca Sawyer Lee, Jared Dunnmon, Ann He et al.|Journal of Biomedical Informatics|2020

Cited by 19|Open Access

Learning Dependency Structures for Weak Supervision Models

Paroma Varma, Frédéric Sala, Ann He et al.|arXiv (Cornell University)|2019

Cited by 14|Open Access

Top publications by citations