Giulia Luise

Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

Giulia Luise, Alessandro Rudi, Massimiliano Pontil et al.|arXiv (Cornell University)|2018

Cited by 75Open Access

Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization. However, in most situations the Sinkhorn approximation of the Wasserstein distance is replaced by a regularized version that is less accurate but easy to differentiate. In this work we characterize the differential properties of the original Sinkhorn distance, proving that it enjoys the same smoothness as its regularized version and we explicitly provide an efficient algorithm to compute its gradient. We show that this result benefits both theory and applications: on one hand, high order smoothness confers statistical guarantees to learning with Wasserstein approximations. On the other hand, the gradient formula allows us to efficiently solve learning and optimization problems in practice. Promising preliminary experiments complement our analysis.

On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology

Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero et al.|arXiv (Cornell University)|2023

Cited by 31Open Access

Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes for over-squashing and of possible solutions are lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under graph rewiring.

Exploiting MMD and Sinkhorn Divergences for Fair and Transferable Representation Learning

Luca Oneto, Michele Donini, Giulia Luise et al.|Neural Information Processing Systems|2020

Cited by 25

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

Anna Korba, Adil Salim, Michael Arbel et al.|arXiv (Cornell University)|2020

Cited by 15Open Access

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $π\propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $π$, where the gradient is smoothed through a kernel integral operator. In this paper, we provide a novel finite time analysis for the SVGD algorithm. We provide a descent lemma establishing that the algorithm decreases the objective at each iteration, and rates of convergence for the average Stein Fisher divergence (also referred to as Kernel Stein Discrepancy). We also provide a convergence result of the finite particle system corresponding to the practical implementation of SVGD to its population version.

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

Anna Korba, Adil Salim, Michael Arbel et al.|King Abdullah University of Science and Technology Repository (King Abdullah University of Science and Technology)|2020

Cited by 15Open Access

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\\pi\\propto e^{-V}$ on $\\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $\\pi$, where the gradient is smoothed through a kernel integral operator. In this paper, we provide a novel finite time analysis for the SVGD algorithm. We obtain a descent lemma establishing that the algorithm decreases the objective at each iteration, and provably converges, with less restrictive assumptions on the step size than required in earlier analyses. We further provide a guarantee on the convergence rate in Kullback-Leibler divergence, assuming $\\pi$ satisfies a Stein log-Sobolev inequality as in Duncan et al. (2019), which takes into account the geometry induced by the smoothed KL gradient.

Is this you? Claim your profile.

Top publicationsby citations