Information relaxation in stochastic optimal control
MetadataShow full item record
Dynamic programming is a principal method for analyzing stochastic optimal control problems. However, the exact computation of dynamic programming can be intractable in large-scale problems due to the "curse of dimensionality". Various approximate dynamic programming methods have been proposed to address this issue and they can often generate suboptimal policies, though it is generally difficult to tell how far these suboptimal policies are from optimal. To this end, this thesis concerns with studying the stochastic control problems from a duality perspective and generating upper bounds on maximal rewards, which complements lower bounds on maximal rewards that can be derived by simulation under heuristic policies. If the gap between the lower and upper bounds is small, it implies that the heuristic policy must be close to optimal. The approach of "information relaxation'' considered in this thesis, proposed by Brown et al. 2013, relaxes the non-anticipativity constraint that requires the decisions to depend only on the information available to the decision maker and impose a penalty that punishes such a violation. This thesis further explores theories of information relaxation and computational methods in several stochastic optimal control problems. We first study the interaction of Lagrangian relaxation and information relaxation in weakly coupled dynamic program. A commonly studied approach builds on the property that this high-dimensional problem can be decoupled by dualizing the resource constraints via Lagrangian relaxation. We generalize the information relaxation approach to improve upon the Lagrangian bound and develop a computational method to tackle large-scale problems. Second, we formulate the information relaxation-based duality in an important class of continuous-time decision-making models -- controlled Markov diffusion. We find that this continuous-time model admits an optimal penalty in compact expression -- an Ito stochastic integral, which enables us to construct approximate penalties in simple forms and achieve tight dual bounds, and to facilitate the computation of dual bounds significantly compared with that of the discrete-time model. Finally, we consider the problem of optimal stopping of discrete-time continuous-state partially observable Markov processes. We develop a filtering-based dual approach, which relies on the martingale duality formulation of the optimal stopping problem and the particle filtering technique.