Publications
Authors:
Christopher Dance, Tomi Silander
Abstract:
The trade-off between the cost of acquiring and processing data and the uncertainty that results from a lack of data is fundamental in machine learning.
A basic instance of this trade-off is the problem of deciding when to make noisy and costly observations of a discrete-time Gaussian random walk, so as to minimise the posterior variance plus observation costs. We present the first proof that a simple policy, which observes when the posterior variance exceeds a threshold, is optimal for this problem. The proof generalises to a wide range of cost functions other than the posterior variance.
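As a concrete illustration of the threshold policy (a minimal sketch, not code from the paper; the process-noise variance q, observation-noise variance r, observation cost c and the thresholds below are illustrative values): the posterior variance of a scalar Kalman filter evolves deterministically, so the long-run average cost of any threshold can be evaluated without sampling the walk itself.

```python
def average_cost(threshold, q=1.0, r=0.5, c=2.0, steps=10_000):
    """Long-run average cost of the threshold policy.

    q: process-noise variance of the random walk (illustrative value)
    r: observation-noise variance (illustrative value)
    c: cost per observation (illustrative value)

    The posterior variance v evolves deterministically, so no sampling
    of the walk is needed to evaluate the policy.
    """
    v = 0.0            # posterior variance of the state estimate
    total = 0.0
    for _ in range(steps):
        v += q                     # prediction step: variance grows by q
        if v > threshold:          # threshold rule: observe iff variance is large
            v = v * r / (v + r)    # Kalman variance update after a noisy observation
            total += c
        total += v                 # per-step penalty: the posterior variance
    return total / steps

# Example: compare a few thresholds
for t in (0.5, 1.5, 3.0):
    print(t, round(average_cost(t), 3))
```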

This result implies that optimal policies for \textit{linear-quadratic-Gaussian control with costly observations} have a threshold structure.
It also implies that the restless bandit problem of observing multiple such time series has a well-defined \textit{Whittle index}. We discuss the computation of this index, give closed-form formulae for it, and compare the performance of the associated index policy with that of heuristic policies.
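To illustrate how the associated index policy operates, here is a sketch under assumptions: whittle_index below is a hypothetical placeholder, not the paper's closed-form formula, and the noise parameters q and r are illustrative.

```python
def whittle_index(v):
    """Hypothetical stand-in for the paper's closed-form Whittle index.

    Any increasing function of the posterior variance v exhibits the
    structure of an index policy; the actual formula is in the paper.
    """
    return v

def index_policy_step(variances, q=1.0, r=0.5):
    """One round of the index policy over several time series.

    Every series accumulates process noise q; only the series with the
    largest index is observed, and its variance shrinks via the Kalman
    update with observation noise r (q and r are illustrative).
    """
    variances = [v + q for v in variances]                # prediction step
    k = max(range(len(variances)),
            key=lambda i: whittle_index(variances[i]))    # pick the largest index
    variances[k] = variances[k] * r / (variances[k] + r)  # observe series k
    return variances, k

# Example: three series starting from different posterior variances
v, chosen = [0.2, 1.0, 3.0], []
for _ in range(5):
    v, k = index_policy_step(v)
    chosen.append(k)
print(chosen)  # indices of the observed series
```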

The proof is based on a new verification theorem that establishes threshold structure for Markov decision processes, and on the relation between binary sequences known as \textit{mechanical words} and the dynamics of discontinuous nonlinear maps, which arise frequently in physics, control and biology.
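For context on mechanical words, the following sketch uses the standard definition from combinatorics on words, not the paper's notation: the lower mechanical word of slope alpha and intercept rho has n-th letter floor((n+1)*alpha + rho) - floor(n*alpha + rho).

```python
from math import floor

def mechanical_word(alpha, rho=0.0, length=20):
    """Lower mechanical word with slope alpha and intercept rho.

    The n-th letter is floor((n+1)*alpha + rho) - floor(n*alpha + rho);
    rational slopes give periodic words, irrational slopes give
    Sturmian words.
    """
    return [floor((n + 1) * alpha + rho) - floor(n * alpha + rho)
            for n in range(length)]

print(mechanical_word(2 / 5))  # slope 2/5: the period-5 word 0,0,1,0,1 repeated
```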
Year:
2017
Report number:
2017/180