Emulation and imitation via perceptual goal specifications
Abstract
This dissertation aims to demonstrate how perceptual goal specifications can serve as an alternative to domain-specific reward functions for reinforcement learning. The work presented here aims to validate the following thesis statement: Employing perceptual goal specifications for goal-directed tasks (1) is as straightforward as specifying domain-specific rewards, (2) is a more general representation for tasks, and (3) equally enables task completion. We describe several approaches for specifying goals visually and show how rewards can be computed, and policies learned, directly from these representations. Chapter 4 introduces Perceptual Reward Functions and describes how a hand-defined similarity metric enables learning from goals that are different from an agent’s. Chapter 5 introduces Cross-Domain Perceptual Reward Functions and describes how a reward function can be learned for cross-domain goal specifications. Chapter 6 introduces Perceptual Value Functions and describes how a value function can be learned from sequences of expert observations without access to ground-truth actions. Chapter 7 introduces Latent Policy Networks and describes how a policy can be learned from sequences of expert observations without access to ground-truth actions. The remaining chapters motivate the dissertation, provide background for it, and outline a plan for future research.