Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Carlos Riquelme; George Tucker; Jasper Snoek

doi:10.48550/arxiv.1802.09127

arXiv (Cornell University)

February 25, 2018

Cited by 46; Open Access

Abstract

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posterior samples of the model. At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. Thus, it is attractive to consider approximate Bayesian neural networks in a Thompson Sampling framework. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems. We found that many approaches that have been successful in the supervised learning setting underperformed in the sequential decision-making scenario. In particular, we highlight the challenge of adapting slowly converging uncertainty estimates to the online setting.
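
The abstract's point that Thompson Sampling only needs posterior samples can be illustrated with a minimal sketch on a synthetic contextual bandit. This is not the paper's benchmark code. It assumes the simplest exact-posterior case, a Bayesian linear model per arm with a Gaussian prior and known noise variance, and every name in it (`LinearThompsonArm`, `run_thompson_sampling`, `true_weights`) is hypothetical. An approximate Bayesian neural network would take the place of `sample_weights`.

```python
import numpy as np


class LinearThompsonArm:
    """Bayesian linear regression for one arm (assumed: Gaussian prior, known noise variance)."""

    def __init__(self, dim, prior_precision=1.0, noise_var=1.0):
        self.precision = prior_precision * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)                          # accumulated context * reward / noise_var
        self.noise_var = noise_var

    def posterior_mean(self):
        return np.linalg.solve(self.precision, self.b)

    def sample_weights(self, rng):
        # Draw w ~ N(mean, precision^{-1}) using the Cholesky factor of the precision.
        chol = np.linalg.cholesky(self.precision)       # precision = chol @ chol.T
        z = rng.standard_normal(self.b.shape[0])
        return self.posterior_mean() + np.linalg.solve(chol.T, z)

    def update(self, context, reward):
        self.precision += np.outer(context, context) / self.noise_var
        self.b += context * reward / self.noise_var


def run_thompson_sampling(true_weights, n_rounds=500, noise_std=0.1, seed=0):
    """Run Thompson Sampling on a synthetic linear bandit and return cumulative regret."""
    rng = np.random.default_rng(seed)
    n_arms, dim = true_weights.shape
    arms = [LinearThompsonArm(dim, noise_var=noise_std ** 2) for _ in range(n_arms)]
    regret = 0.0
    for _ in range(n_rounds):
        context = rng.standard_normal(dim)
        # One posterior sample per arm; act greedily with respect to the samples.
        sampled_rewards = [arm.sample_weights(rng) @ context for arm in arms]
        chosen = int(np.argmax(sampled_rewards))
        expected = true_weights @ context
        reward = expected[chosen] + noise_std * rng.standard_normal()
        arms[chosen].update(context, reward)            # only the chosen arm gets feedback
        regret += expected.max() - expected[chosen]
    return regret


if __name__ == "__main__":
    weights = np.eye(3, 4)  # three arms, four context features (synthetic)
    print("cumulative regret:", run_thompson_sampling(weights))
```

The policy itself never inspects the posterior beyond calling `sample_weights`, which is why the quality of the posterior approximation becomes the quantity of interest when the linear model is replaced by a neural network.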
