Elias Jääsaari, Janne Leppä-aho, Tomi Silander, Teemu Roos
International Conference on Algorithmic Learning Theory, Lanzarote, Spain, 07-09 April 2018
The normalized maximum likelihood (NML) distribution achieves minimax log loss and coding regret for the multinomial model. In practice, other nearly minimax distributions are used instead, since computing the sequential predictions needed for coding and prediction under NML takes exponential time. The Bayes mixture obtained with the Dirichlet prior Dir(1/2,...,1/2) and asymptotically minimax modifications of it have been widely studied in the context of large sample sizes. Recently, however, there has been interest in minimax optimal coding distributions for large alphabets. In this paper we investigate Dirichlet priors that achieve minimax coding regret when the alphabet is finite but large in comparison to the sample size. We prove that a Bayes mixture with the Dirichlet prior Dir(1/3,...,1/3) is optimal in this regime (in particular, when $m > 5n/2 + 4/(n-2) + 3/2$). The regret of the resulting distribution approaches the NML regret as the alphabet size grows.
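The quantities compared in the abstract can be checked numerically on small instances. The sketch below is illustrative and not the paper's own code: it computes the exact NML regret $\log C(m, n)$ for the multinomial model via the well-known Kontkanen-Myllymäki linear-time recurrence, and the worst-case regret of a symmetric Dirichlet Bayes mixture by brute force over count vectors (all function names are mine).

```python
import itertools
from math import lgamma, log, comb

def nml_regret(m, n):
    """NML (minimax) regret log C(m, n) for an m-ary multinomial with
    sample size n, via the Kontkanen-Myllymaki recurrence
    C(m, n) = C(m-1, n) + (n / (m-2)) * C(m-2, n)."""
    c_prev = 1.0  # C(1, n)
    c_cur = sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                for k in range(n + 1))  # C(2, n); note 0**0 == 1 in Python
    if m == 1:
        return log(c_prev)
    for j in range(3, m + 1):
        c_prev, c_cur = c_cur, c_cur + (n / (j - 2)) * c_prev
    return log(c_cur)

def compositions(n, m):
    """All count vectors (n_1, ..., n_m) summing to n (stars and bars)."""
    for bars in itertools.combinations(range(n + m - 1), m - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + m - 1 - prev - 1)
        yield parts

def dirichlet_worst_case_regret(m, n, alpha):
    """Worst-case log-loss regret of the Bayes mixture under a symmetric
    Dir(alpha, ..., alpha) prior. Both the maximized likelihood and the
    Bayes marginal of a sequence depend only on its counts, so it suffices
    to maximize over count vectors."""
    best = float("-inf")
    for counts in compositions(n, m):
        ml = sum(c * log(c / n) for c in counts if c > 0)
        bayes = (lgamma(m * alpha) - lgamma(n + m * alpha)
                 + sum(lgamma(c + alpha) - lgamma(alpha) for c in counts))
        best = max(best, ml - bayes)
    return best

# Large-alphabet regime: m = 20 > 5n/2 + 4/(n-2) + 3/2 = 13.5 for n = 4.
m, n = 20, 4
r_nml = nml_regret(m, n)
r_third = dirichlet_worst_case_regret(m, n, 1 / 3)
r_half = dirichlet_worst_case_regret(m, n, 1 / 2)
print(f"NML regret:                 {r_nml:.4f}")
print(f"Dir(1/3) worst-case regret: {r_third:.4f}")
print(f"Dir(1/2) worst-case regret: {r_half:.4f}")
```

Since NML is the unique minimax distribution, every Bayes mixture has worst-case regret at least $\log C(m, n)$; in the regime above, the Dir(1/3) mixture should come closer to that bound than the classical Dir(1/2) mixture.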