effSAMWMIX: An efficient Stochastic Multi-Armed Bandit Algorithm based on a Simulated Annealing with Multiplicative Weights

Villari, Boby Chaitanya; Abdulla, Mohammed Shahid

dc.contributor.author	Villari, Boby Chaitanya
dc.contributor.author	Abdulla, Mohammed Shahid
dc.date.accessioned	2017-05-18T10:39:48Z
dc.date.available	2017-05-18T10:39:48Z
dc.date.issued	2017-01
dc.identifier.uri	http://hdl.handle.net/2259/935
dc.description.abstract	SAMWMIX, a Stochastic Multi-Armed Bandit (SMAB) algorithm that obtains a regret of $O(\log T)$, where $T$ is the number of steps in the time horizon, has been proposed in the literature. A variant, blind-SAMWMIX, incorporates an input parameter and has better empirical performance, but obtains a regret of the order $O(\log^{1+2\alpha} T)$. The current work proposes an efficient version of SAMWMIX, effSAMWMIX, which not only obtains a regret of $O(\log T)$ but also exhibits better empirical performance; a proof of this regret bound is given. effSAMWMIX is compared with the KL-UCB and Thompson Sampling (TS) algorithms on rewards following Exponential, Poisson, Bernoulli, Triangular, and Truncated Normal distributions, as well as a synthetic distribution designed to stress-test SMAB algorithms with closely spaced reward means. effSAMWMIX is shown to outperform both KL-UCB and TS in regret and in execution time.	en_US
dc.language.iso	en	en_US
dc.publisher	Indian Institute of Management	en_US
dc.subject	stochastic multi-armed bandit	en_US
dc.subject	stochastic processes	en_US
dc.subject	reward distributions	en_US
dc.subject	optimization	en_US
dc.title	effSAMWMIX: An efficient Stochastic Multi-Armed Bandit Algorithm based on a Simulated Annealing with Multiplicative Weights	en_US
dc.type	Working Paper	en_US
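The abstract evaluates effSAMWMIX by its regret against KL-UCB and Thompson Sampling. As a rough sketch of that kind of evaluation, and not the authors' code or the effSAMWMIX algorithm itself (its update rules are not given in this record), the Python below measures the cumulative pseudo-regret $R_T = \sum_{t=1}^{T} (\mu^* - \mu_{a_t})$ of the two baselines on Bernoulli arms. The function names `bernoulli_kl`, `kl_ucb_index` and `run`, the arm means, the horizon, and the KL-UCB exploration level $\log t$ (omitting the optional $c \log\log t$ term) are all hypothetical choices made for illustration.

```python
import math
import random


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)); probabilities are clipped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, t: int, iters: int = 30) -> float:
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), via bisection.

    Assumption: exploration level log(t) only (the c * log log t term is omitted).
    """
    bound = math.log(max(t, 1)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) > bound:
            hi = mid  # mid violates the confidence constraint: shrink from above
        else:
            lo = mid  # mid is feasible: the index is at least mid
    return lo


def run(policy: str, means: list[float], horizon: int, seed: int = 0) -> float:
    """Play one Bernoulli bandit episode and return the cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    pulls = [0] * k
    wins = [0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        if policy == "ts":
            # Thompson Sampling: draw from each arm's Beta(1 + wins, 1 + losses) posterior.
            samples = [rng.betavariate(1 + wins[i], 1 + pulls[i] - wins[i]) for i in range(k)]
            arm = max(range(k), key=samples.__getitem__)
        elif policy == "klucb":
            if t <= k:
                arm = t - 1  # pull every arm once before using the index
            else:
                indices = [kl_ucb_index(wins[i] / pulls[i], pulls[i], t) for i in range(k)]
                arm = max(range(k), key=indices.__getitem__)
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses true means, not sampled rewards
    return regret


if __name__ == "__main__":
    # Hypothetical closely spaced means, loosely in the spirit of the stress test above.
    means = [0.50, 0.48, 0.46]
    for policy in ("ts", "klucb"):
        print(policy, run(policy, means, horizon=2000))
```

Pseudo-regret is used here because it depends only on which arms were pulled, which removes reward noise from the comparison; averaging `run` over many seeds would be needed before drawing any conclusion about either baseline.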