Yash Deshpande

Asymptotic mutual information for the balanced binary stochastic block model

Yash Deshpande, Emmanuel Abbé, Andrea Montanari|Information and Inference A Journal of the IMA|2016

Cited by 79

We develop an information-theoretic view of the stochastic block model, a popular statistical model for the large-scale structure of complex networks. A graph |$G$| from such a model is generated by first assigning vertex labels at random from a finite alphabet, and then connecting vertices with edge probabilities depending on the labels of the endpoints. In the case of the symmetric two-group model, we establish an explicit ‘single-letter’ characterization of the per-vertex mutual information between the vertex labels and the graph, when the mean vertex degree diverges. The explicit expression of the mutual information is intimately related to estimation-theoretic quantities, and —in particular— reveals a phase transition at the critical point for community detection. Below the critical point the per-vertex mutual information is asymptotically the same as if edges were independent. Correspondingly, no algorithm can estimate the partition better than random guessing. Conversely, above the threshold, the per-vertex mutual information is strictly smaller than the independent-edges upper bound. In this regime, there exists a procedure that estimates the vertex labels better than random guessing.

Finding Hidden Cliques of Size $$\sqrt{N/e}$$ N / e in Nearly Linear Time

Yash Deshpande, Andrea Montanari|Foundations of Computational Mathematics|2014

Cited by 78

Stratification of amyotrophic lateral sclerosis patients: a crowdsourcing approach

Robert Kueffner, Neta Zach, Maya Bronfeld et al.|Scientific Reports|2019

Cited by 55Open Access

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease where substantial heterogeneity in clinical presentation urgently requires a better stratification of patients for the development of drug trials and clinical care. In this study we explored stratification through a crowdsourcing approach, the DREAM Prize4Life ALS Stratification Challenge. Using data from >10,000 patients from ALS clinical trials and 1479 patients from community-based patient registers, more than 30 teams developed new approaches for machine learning and clustering, outperforming the best current predictions of disease outcome. We propose a new method to integrate and analyze patient clusters across methods, showing a clear pattern of consistent and clinically relevant sub-groups of patients that also enabled the reliable classification of new patients. Our analyses reveal novel insights in ALS and describe for the first time the potential of a crowdsourcing to uncover hidden patient sub-populations, and to accelerate disease understanding and therapeutic development.

Sparse PCA via Covariance Thresholding

Yash Deshpande, Andrea Montanari|arXiv (Cornell University)|2013

Cited by 45Open Access

In sparse principal component analysis we are given noisy observations of a low-rank matrix of dimension $n\times p$ and seek to reconstruct it under additional sparsity assumptions. In particular, we assume here each of the principal components $\mathbf{v}_1,\dots,\mathbf{v}_r$ has at most $s_0$ non-zero entries. We are particularly interested in the high dimensional regime wherein $p$ is comparable to, or even much larger than $n$. In an influential paper, \cite{johnstone2004sparse} introduced a simple algorithm that estimates the support of the principal vectors $\mathbf{v}_1,\dots,\mathbf{v}_r$ by the largest entries in the diagonal of the empirical covariance. This method can be shown to identify the correct support with high probability if $s_0\le K_1\sqrt{n/\log p}$, and to fail with high probability if $s_0\ge K_2 \sqrt{n/\log p}$ for two constants $0

Linear Bandits in High Dimension and Recommendation Systems

Yash Deshpande, Andrea Montanari|Unknown|2012

Cited by 44

A large number of online services provide automated recommendations to help users to navigate through a large collection of items. New items (products, videos, songs, advertisements) are suggested on the basis of the user’s past history and –when available – her demographic profile. Recommendations have to satisfy the dual goal of helping the user to explore the space of available items, while allowing the system to probe the user’s preferences. We model this trade-off using linearly parametrized multi-armed bandits, propose a policy and prove upper and lower bounds on the cumulative “reward ” that coincide up to constants in the data poor (high-dimensional) regime. Prior work on linear bandits has focused on the data rich (low-dimensional) regime and used cumulative “risk ” as the figure of merit. For this data rich regime, we provide a simple modification for our policy that achieves near-optimal risk performance under more restrictive assumptions on the geometry of the problem. We test (a variation of) the scheme used for establishing achievability on the Netflix and MovieLens datasets and obtain good agreement with the qualitative predictions of the theory we develop. 1

Is this you? Claim your profile.

Top publicationsby citations