Global Optimality Guarantees for Policy Gradient Methods
MetadataShow full item record
Policy gradients methods are perhaps the most widely used class of reinforcement learning algorithms. These methods apply to complex, poorly understood, control problems by performing stochastic gradient descent over a parameterized class of polices. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk with discus structural properties – shared by several canonical control problems – that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I’ll also discuss (1) convergence rates that follow as a consequence of this theory and (2) consequences of this theory for policy gradient performed with highly expressive policy classes. * This talk is based on ongoing joint work with Jalaj Bhandari.