Compressed computation of good policies in large MDPs
MetadataShow full item record
Markov decision processes (MDPs) is a minimalist framework to capture that many tasks require long-term plans and feedback due to noisy dynamics. Yet, as a result MDPs lack structure and as such planning and learning in MDPs with the typically enormous state and action spaces is strongly intractable; no algorithm can avoid Bellman's curse of dimensionality in the worst case. However, as recognized already by Bellman and his co-workers at the advent of our field, for many problem of practical interest, the optimal value function of an MDP is well approximated by just using a few basis functions, such as those that are standardly used in numerical calculations. As knowing the optimal value function is essentially equivalent to knowing how to act optimally, one hopes that this observation can be turned into efficient algorithms as there are only a few coefficients to compute. If this is possible, we can think of the resulting algorithms as performing computations with a compressed form of the value functions. While many algorithms have been proposed as early as in the 1960s, until recently not much has been known about whether these compressed computations are possible and when. In this talk, I will discuss a few recent results (some positive, some negative) that are concerned with these compressed computations and conclude with some open problems. As we shall see, still today, there are more open questions than questions that have been satisfactorily answered.