
Artem Sokolov

Harvard University

ORCID: 0000-0002-8056-0504

Publishes on Topic Modeling, Natural Language Processing Techniques, Bioinformatics and Genomic Networks. 150 papers and 9.7k citations.

150 Publications
9.7k Total Citations


Top publications by citations

A large-scale evaluation of computational protein function prediction
Predrag Radivojac, Wyatt T. Clark, Tal Oron et al.|Nature Methods|2013
Cited by 1.1k | Open Access

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Clinical and Genomic Characterization of Treatment-Emergent Small-Cell Neuroendocrine Prostate Cancer: A Multi-institutional Prospective Study
Rahul Aggarwal, Jiaoti Huang, Joshi J. Alumkal et al.|Journal of Clinical Oncology|2018
Cited by 803

Purpose: The prevalence and features of treatment-emergent small-cell neuroendocrine prostate cancer (t-SCNC) are not well characterized in the era of modern androgen receptor (AR)-targeting therapy. We sought to characterize the clinical and genomic features of t-SCNC in a multi-institutional prospective study.

Methods: Patients with progressive, metastatic castration-resistant prostate cancer (mCRPC) underwent metastatic tumor biopsy and were followed for survival. Metastatic biopsy specimens underwent independent, blinded pathology review along with RNA/DNA sequencing.

Results: A total of 202 consecutive patients were enrolled. One hundred forty-eight (73%) had prior disease progression on abiraterone and/or enzalutamide. The biopsy evaluable rate was 79%. The overall incidence of t-SCNC detection was 17%. AR amplification and protein expression were present in 67% and 75%, respectively, of t-SCNC biopsy specimens. t-SCNC was detected at similar proportions in bone, node, and visceral organ biopsy specimens. Genomic alterations in the DNA repair pathway were nearly mutually exclusive with t-SCNC differentiation (P = .035). Detection of t-SCNC was associated with shortened overall survival among patients with prior AR-targeting therapy for mCRPC (hazard ratio, 2.02; 95% CI, 1.07 to 3.82). Unsupervised hierarchical clustering of the transcriptome identified a small-cell-like cluster that further enriched for adverse survival outcomes (hazard ratio, 3.00; 95% CI, 1.25 to 7.19). A t-SCNC transcriptional signature was developed and validated in multiple external data sets with >90% accuracy. Multiple transcriptional regulators of t-SCNC were identified, including the pancreatic neuroendocrine marker PDX1.

Conclusion: t-SCNC is present in nearly one fifth of patients with mCRPC and is associated with shortened survival. The near-mutual exclusivity with DNA repair alterations suggests t-SCNC may be a distinct subset of mCRPC. Transcriptional profiling facilitates the identification of t-SCNC and novel therapeutic targets.

Assessing the clinical utility of cancer genomic and proteomic data across tumor types
Yuan Yuan, Eliezer M. Van Allen, Larsson Omberg et al.|Nature Biotechnology|2014
Cited by 296 | Open Access

Molecular profiling of tumors promises to advance the clinical management of cancer, but the benefits of integrating molecular data with traditional clinical variables have not been systematically studied. Here we retrospectively predict patient survival using diverse molecular data (somatic copy-number alteration, DNA methylation and mRNA, microRNA and protein expression) from 953 samples of four cancer types from The Cancer Genome Atlas project. We find that incorporating molecular data with clinical variables yields statistically significantly improved predictions (FDR < 0.05) for three cancers, but those quantitative gains were limited (2.2–23.9%). Additional analyses revealed little predictive power across tumor types except for one case. In clinically relevant genes, we identified 10,281 somatic alterations across 12 cancer types in 2,928 of 3,277 patients (89.4%), many of which would not be revealed in single-tumor analyses. Our study provides a starting point and resources, including an open-access model evaluation platform, for building reliable prognostic and therapeutic strategies that incorporate molecular data.