Assembling the Community-Scale Discoverable Human Proteome

Mingxun Wang (University of California San Diego), Jian Wang (University of California San Diego), Jeremy Carver (University of California San Diego), Benjamin Pullman (University of California San Diego), Seong Won (University of California San Diego), Nuno Bandeira (University of California San Diego)
Cell Systems
August 29, 2018
Cited by 222
Open Access

Abstract

The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, these discoveries are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in the Mass Spectrometry Interactive Virtual Environment (MassIVE), the MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB informs experimental design, helps identify and quantify new data, and provides tools for community construction of specialized spectral libraries.

