Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering

Chris Gaiteri (Rush University Medical Center), Mingming Chen (Rensselaer Polytechnic Institute), Bolesław K. Szymański (Rensselaer Polytechnic Institute), Konstantin Kuzmin (Rensselaer Polytechnic Institute), Jierui Xie (Rensselaer Polytechnic Institute), Changkyu Lee (Allen Institute for Brain Science), Timothy J. Blanche (Allen Institute for Brain Science), Elias Chaibub Neto (Sage Bionetworks), Su‐Chun Huang (University of Washington), Thomas J. Grabowski (University of Washington), Tara Madhyastha (University of Washington), Vitalina Komashko (Seattle Institute of Oriental Medicine)
Scientific Reports
November 9, 2015
Cited by 83
Open Access

Abstract

Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. These detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These aspects of traditional clustering methods limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy, which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections, as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.

