Improving Model-Predictive Control with Value Function Approximation
Existing Model Predictive Control methods rely on finite-horizon trajectories sampled from the environment. Such methods are limited by the length of the sampled trajectories because the robot cannot plan for scenarios beyond this time horizon. Simply extending the time horizon of the sampled trajectories is not feasible, because a longer horizon requires more sampled trajectories to maintain controller performance. On robots such as the AutoRally platform, which operate in real time with limited computational power, increasing the number of sampled trajectories is computationally intractable. This work improves the long-term planning capabilities of autonomous systems by augmenting the cost estimates of trajectories with a learned value of the terminal state. This learned value approximates the expected cost, under the car's current control policy, from the terminal state over an arbitrary time horizon, without requiring an increase in the number of samples. We show that this improves the lap times of the AutoRally platform.
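The cost augmentation described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, array shapes, and the toy value function are hypothetical, chosen only to show how a terminal-value term can change which sampled trajectory looks cheapest without lengthening the horizon or adding samples.

```python
import numpy as np


def augmented_costs(running_costs, terminal_states, value_fn):
    """Return one cost per sampled trajectory: summed running cost plus terminal value.

    running_costs:   array of shape (K, T), per-step cost of K trajectories over T steps.
    terminal_states: array of shape (K, D), final state of each trajectory.
    value_fn:        hypothetical callable mapping (K, D) states to (K,) estimated
                     cost-to-go beyond the horizon (an assumption, not the thesis API).
    """
    running_costs = np.asarray(running_costs, dtype=float)
    terminal_states = np.asarray(terminal_states, dtype=float)
    return running_costs.sum(axis=1) + value_fn(terminal_states)


def toy_value_fn(states):
    # Stand-in for a learned approximator: squared distance from the origin.
    return np.sum(states ** 2, axis=1)


# Two sampled trajectories, three steps each (illustrative numbers only).
running = np.array([[1.0, 1.0, 1.0],
                    [0.5, 0.5, 0.5]])
terminal = np.array([[0.0, 0.0],
                     [2.0, 1.0]])

plain = running.sum(axis=1)                                  # finite-horizon cost only
augmented = augmented_costs(running, terminal, toy_value_fn)  # with terminal value

print("finite-horizon costs:", plain)
print("augmented costs:     ", augmented)
```

In this toy case the second trajectory is cheaper within the horizon but ends in a state with a high estimated cost-to-go, so the augmented costs prefer the first trajectory instead.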