Chris Dance, Onno Zoeter
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, 2011
This paper studies optimal price learning for one or more items. We introduce the Schrödinger price experiment (SPE), which superimposes classical price experiments using lotteries and thereby extracts more information from each customer interaction. If buyers are perfectly rational, we show that there exist SPEs that, in the limit of infinite superposition, both learn optimally and exploit optimally. We call the resulting mechanism the hopeful mechanism (HM): although it is incentive compatible, buyers can deviate at very little cost to themselves, with extreme consequences for the seller. For real-world settings, we propose a robust version of the approach, which takes the form of a Markov decision process whose actions are functions. We provide approximate policies motivated by the best of sampled set (BOSS) algorithm coupled with approximate Bayesian inference. Numerical studies show that the proposed method significantly increases seller revenue compared to classical price experimentation, even for the single-item case.