Parameter Free Policy Shaping
Policy Shaping is an algorithm that takes as input state-action-evaluation triples that the learner obtains through interaction with a teacher. These triples are combined with self-exploration and a traditional reinforcement learning algorithm to learn a task. Policy Shaping has been shown experimentally to work well with noisy input from non-expert human teachers who are unfamiliar with the algorithm. Interactions with human teachers generate state-action-evaluation triples and, so far, the meaning of the evaluation part of the triple has always been hard-coded into the algorithm. We present an algorithm that allows the learner to estimate the meaning of these labels automatically. The learner observes only unidentified evaluation labels and continuously re-estimates their meaning during learning and self-exploration. Experiments with 30 human teachers and several different types of simulated teachers show that the algorithm quickly learns the meaning of, and makes use of, demonstrations, explicit action advice, and critique. For each of these three information sources, the parameter-free algorithm strongly outperforms all static interpretations of labels when dealing with a set of teachers with large internal variation in behavior. That is, when no single interpretation fits the full set of teachers that the learner interacts with, autonomously building an individual model for each teacher outperforms any a priori interpretation applied to the entire group.
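To make the idea concrete, the following is a minimal Python sketch, not the algorithm evaluated in the experiments. It assumes a simple heuristic for guessing label meaning (a label counts as positive if it mostly accompanies actions the learner's own Q-values rank best) and combines the resulting feedback with a softmax over Q-values using the feedback term commonly associated with Policy Shaping, C^Δ / (C^Δ + (1 − C)^Δ). The function names, the heuristic, and the fixed `consistency` value are all illustrative assumptions; a parameter-free method would estimate such quantities rather than fix them.

```python
import math
from collections import defaultdict


def estimate_label_signs(triples, q_values):
    """Guess whether each opaque label means 'good' (+1) or 'bad' (-1).

    Assumed heuristic (hypothetical): a label that accompanies the learner's
    currently best-rated action at least half the time is read as positive.
    """
    agree, total = defaultdict(int), defaultdict(int)
    for state, action, label in triples:
        best = max(q_values[state], key=q_values[state].get)
        agree[label] += int(action == best)
        total[label] += 1
    return {lab: 1 if 2 * agree[lab] >= total[lab] else -1 for lab in total}


def shaped_policy(state, triples, q_values, signs, consistency=0.8):
    """Multiply a feedback distribution with a softmax RL policy.

    `consistency` (C) is fixed here only to keep the sketch short.
    """
    actions = list(q_values[state])
    # delta[a] = (#labels read as positive) - (#labels read as negative)
    delta = {a: 0 for a in actions}
    for s, a, label in triples:
        if s == state:
            delta[a] += signs[label]
    weights = {}
    for a in actions:
        c_pos = consistency ** delta[a]
        c_neg = (1.0 - consistency) ** delta[a]
        feedback = c_pos / (c_pos + c_neg)   # belief that `a` is optimal
        rl = math.exp(q_values[state][a])    # unnormalised softmax term
        weights[a] = feedback * rl
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}
```

In use, the learner would call `estimate_label_signs` again whenever its Q-values or the set of triples change, so that the interpretation of each label is re-estimated as learning proceeds, and would keep a separate set of estimates per teacher.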