Budgeted multi-armed bandits with multiple plays

Yingce Xia(University of Science and Technology of China), Tao Qin(Microsoft Research Asia (China)), Weidong Ma(Microsoft Research Asia (China)), Nenghai Yu(University of Science and Technology of China), Tie‐Yan Liu(Microsoft Research Asia (China))
Unknown
July 9, 2016
Cited by 51

Abstract

We study the multi-play budgeted multi-armed bandit (MP-BMAB) problem, in which pulling an arm receives both a random reward and a random cost, and a player pulls L(≥ 1) arms at each round. The player targets at maximizing her total expected reward under a budget constraint B for the pulling costs. We present a multiple ratio confidence bound policy: At each round, we first calculate a truncated upper (lower) confidence bound for the expected reward (cost) of each arm, and then pull the L arms with the maximum ratio of the sum of the upper confidence bounds of rewards to the sum of the lower confidence bounds of costs. We design a 0- 1 integer linear fractional programming oracle that can pick such the L arms within polynomial time. We prove that the regret of our policy is sublinear in general and is log-linear for certain parameter settings. We further consider two special cases of MP-BMABs: (1) We derive a lower bound for any consistent policy for MP-BMABs with Bernoulli reward and cost distributions. (2) We show that the proposed policy can also solve conventional budgeted MAB problem (a special case of MP-BMABs with L = 1) and provides better theoretical results than existing UCB-based pulling policies.


Related Papers

No related papers found

Powered by citation graph analysis