Anticipatory Robot Control for a Partially Observable Environment Using Episodic Memories
This paper presents an episodic-memory-based approach for computing anticipatory robot behavior in a partially observable environment. Inspired by biological findings on the mammalian hippocampus, the episodic memories here retain sequences of experienced observations, behaviors, and rewards. Incorporating multiple machine-learning methods, this approach attempts to reduce the computational burden of the partially observable Markov decision process (POMDP). In particular, the proposed computational reduction techniques include: 1) abstraction of the state space via temporal-difference learning; 2) abstraction of the action space by utilizing motor schemata; 3) narrowing of the state space with respect to the goals by employing instance-based learning; 4) elimination of value iteration by assuming a unidirectional-linear-chaining formation of the state space; 5) reduction of the state-estimation computation by exploiting a property of the Poisson distribution; and 6) trimming of the history length by imposing a cap on the number of episodes that are computed. Furthermore, techniques 5) and 6) were empirically verified, confirming that the state estimation can in fact be computed in O(n) time (where n is the number of states), which is more efficient than the O(n²) of a conventional Kalman-filter-based approach.
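To illustrate why a unidirectional linear chain admits O(n) state estimation, consider a minimal Bayes-filter sketch in which each hidden state can only persist or advance to its immediate successor in the episode. This is an illustrative simplification, not the paper's actual algorithm: the fixed advance probability `p_advance` and the externally supplied `likelihood` vector are assumptions standing in for the paper's Poisson-based model. Because every state has at most one predecessor, the prediction step touches each state a constant number of times, so the whole update is linear in the number of states, in contrast to the O(n²) matrix-vector products of a general Kalman-filter-style update.

```python
def belief_update(belief, p_advance, likelihood):
    """One Bayes-filter step over a unidirectional linear chain of states.

    belief      -- prior probability over the n chain states (sums to 1)
    p_advance   -- assumed probability of moving to the next state (illustrative)
    likelihood  -- P(observation | state) for each state (illustrative)
    Runs in O(n): each state has at most one predecessor in the chain.
    """
    n = len(belief)
    predicted = [0.0] * n
    for i in range(n):
        # A state either persists or advances; the final state absorbs.
        stay = (1.0 - p_advance) if i + 1 < n else 1.0
        predicted[i] += belief[i] * stay
        if i + 1 < n:
            predicted[i + 1] += belief[i] * p_advance
    # Correct the prediction with the observation likelihood and normalize.
    posterior = [p * l for p, l in zip(predicted, likelihood)]
    z = sum(posterior)
    return [p / z for p in posterior]


# Example: fully certain of the first state, uninformative observation.
out = belief_update([1.0, 0.0, 0.0], 0.5, [1.0, 1.0, 1.0])
# Probability mass splits between staying at state 0 and advancing to state 1.
```

Under these assumptions the chain structure alone, independent of the Poisson machinery, is what yields the linear-time update.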