Robot Learning from Heterogeneous Demonstration
Chen, Letian Zac
Learning from Demonstration (LfD) has become a ubiquitous and user-friendly technique for teaching a robot how to perform a task (e.g., playing ping pong) without the need to use a traditional programming language (e.g., C++). As these systems are increasingly placed in the hands of everyday users, researchers face the reality that end-users are a heterogeneous population with varying levels of skill and experience. This heterogeneity violates the nearly universal assumption in LfD algorithms that demonstrations given by users are near-optimal and uniform in how the task is accomplished. In this thesis, I present algorithms to tackle two specific types of heterogeneity: heterogeneous strategy and heterogeneous performance. First, I present Multi-Strategy Reward Distillation (MSRD), which tackles the problem of learning from users who have adopted heterogeneous strategies. MSRD extracts a task reward and a strategy reward, which represent the task specification and the demonstrator's strategic preference, respectively. The extracted task reward has correlations of 0.998 and 0.943 with the ground-truth reward on two simulated robotic tasks, and we successfully deploy it on a real-robot table-tennis task. Second, I develop two algorithms to address the problem of learning from suboptimal demonstrations: SSRR and OP-AIRL. SSRR is a novel mechanism that regresses over noisy demonstrations to infer an idealized reward function. OP-AIRL is a mechanism for learning a policy that more effectively teases out ambiguity from suboptimal demonstrations. By combining SSRR with OP-AIRL, we achieve 688% and 254% improvements over the state of the art on two simulated robot tasks.
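As a rough illustration of how a learned reward can be compared against a ground-truth reward, the sketch below computes a Pearson correlation between two sequences of reward values. It is a minimal sketch under stated assumptions: the abstract does not say which correlation measure the thesis uses, and the function name `pearson_correlation` and the sample values are hypothetical, chosen only for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    Assumption: the abstract does not name the correlation measure,
    so Pearson's r is used here purely as an example.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-trajectory rewards, not data from the thesis.
    ground_truth_reward = [0.0, 1.0, 2.0, 3.0, 4.0]
    learned_reward = [0.1, 0.9, 2.2, 2.8, 4.1]
    r = pearson_correlation(ground_truth_reward, learned_reward)
    print(f"correlation = {r:.3f}")
```

Pearson correlation is unchanged when one sequence is shifted by a constant or multiplied by a positive factor, so a learned reward can score highly even if its scale and offset differ from those of the ground-truth reward.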