Leveraging attention focus for effective reinforcement learning in complex domains
Cobo Rus, Luis Carlos
One of the hardest challenges in the field of machine learning is to build agents, such as robotic assistants in homes and hospitals, that can autonomously learn new tasks they were not pre-programmed to tackle, without the intervention of an engineer. Reinforcement learning (RL) and learning from demonstration (LfD) are popular approaches for task learning, but they are often ineffective in high-dimensional domains unless provided with either a great deal of problem-specific domain information or a carefully crafted representation of the state and dynamics of the world. Unfortunately, autonomous agents trying to learn new tasks usually have access neither to such domain information nor to an appropriate representation. We demonstrate that algorithms that focus, at each moment, on the relevant features of the state space can achieve significant speed-ups over previous reinforcement learning algorithms in complex domains, measured with respect to the number of state features. To do so, we introduce and evaluate a family of attention focus algorithms. We show that these algorithms can reduce the dimensionality of complex domains, creating a compact representation of the state space with which satisficing policies can be learned efficiently. Our approach obtains exponential speed-ups with respect to the number of features considered when compared with table-based learning algorithms, and polynomial speed-ups when compared with state-of-the-art function approximation RL algorithms such as LSPI or fitted Q-learning. Our attention focus algorithms are divided into two classes, depending on the source of the focus information they require. Attention focus from human demonstrations infers the features to focus on from a set of demonstrations in which human teachers perform the task the agent must learn. We introduce two algorithms within this class.
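To make the exponential gap concrete: a table-based learner over n features, each taking k values, must fill |A| * k^n entries, so restricting attention to m relevant features shrinks the table by a factor of k^(n-m). The Python sketch below illustrates this together with one simple, hypothetical way of inferring which features to keep from demonstrations: greedily drop any feature whose removal never maps two demonstrated states with different actions to the same abstract state. This consistency test and the names `table_size`, `project`, `consistent`, and `select_features` are assumptions made for illustration; they are not the feature-selection procedure used in this work.

```python
from itertools import product


def table_size(num_features, values_per_feature, num_actions):
    """Entries in a Q-table covering every combination of feature values."""
    return num_actions * values_per_feature ** num_features


def project(state, keep):
    """Abstract a full state down to the features in `keep` (the attention focus)."""
    return tuple(state[i] for i in keep)


def consistent(demos, keep):
    """True if no two demonstrated states share an abstract state but differ in action."""
    seen = {}
    for state, action in demos:
        key = project(state, keep)
        # setdefault returns the action already recorded for this abstract state, if any.
        if seen.setdefault(key, action) != action:
            return False
    return True


def select_features(demos, num_features):
    """Hypothetical greedy backward elimination over features, guided by the demos."""
    keep = list(range(num_features))
    for i in range(num_features):
        candidate = [f for f in keep if f != i]
        if consistent(demos, candidate):
            keep = candidate  # feature i can be ignored without contradicting the teacher
    return keep


# Usage: 4 binary features, but only feature 0 determines the demonstrated action.
demos = [(s, "left" if s[0] == 0 else "right") for s in product((0, 1), repeat=4)]
focus = select_features(demos, num_features=4)
print(focus)  # [0]
print(table_size(4, 2, 2), "->", table_size(len(focus), 2, 2))  # 32 -> 4
```

Greedy elimination depends on the order in which features are tried and may keep more features than strictly necessary; the sketch only shows how a compact abstraction can fall out of demonstration data, not how to find an optimal one.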
The first, abstraction from demonstration (AfD), identifies features that can be safely ignored in the whole state space and builds a state-space abstraction in which a satisficing policy can be learned efficiently. The second, automatic decomposition and abstraction from demonstration, goes one step further, using the demonstrations to identify a set of subtasks and to find an appropriate abstraction for each subtask found. The other class of algorithms we present, attention focus with a world model, does not require a set of human demonstrations. Instead, it extracts the attention focus information from an object-based model of the world together with the agent's experience in performing the task. Within this class, we introduce object-focused Q-learning (OF-Q), first under an assumption of object independence that is later removed to support domains where objects interact with each other. Finally, we show that both sources of focus information can be combined for further speed-ups.
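Under an object-independence assumption, each object can be learned about separately: the agent keeps one Q-table per object class, indexed by that object's local state, so the tables grow with the number of object classes rather than exponentially with the number of objects in the scene. The Python sketch below is a minimal, hypothetical illustration of that idea, assuming each object supplies its own reward signal. The class `ObjectDecomposedQ`, the rule of choosing the action with the largest summed per-object Q-value, and the coin/enemy example are assumptions for illustration, not necessarily how OF-Q arbitrates between objects.

```python
from collections import defaultdict


class ObjectDecomposedQ:
    """Hypothetical sketch: one Q-table per object class, shared by all objects of that class."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        # q[obj_class][(local_state, action)] -> value, defaulting to 0.0
        self.q = defaultdict(lambda: defaultdict(float))

    def act(self, objects):
        """Greedy action maximizing the sum of per-object Q-values (assumed arbitration rule)."""
        def total(a):
            return sum(self.q[c][(s, a)] for c, s in objects)
        return max(self.actions, key=total)

    def update(self, objects, action, rewards, next_objects=None):
        """Independent Q-learning update for each object, using that object's own reward.

        `objects` and `next_objects` are lists of (class, local_state) in the same order;
        `next_objects=None` marks a terminal transition (no bootstrapping).
        """
        for i, ((c, s), r) in enumerate(zip(objects, rewards)):
            target = r
            if next_objects is not None:
                s_next = next_objects[i][1]
                target += self.gamma * max(self.q[c][(s_next, a)] for a in self.actions)
            self.q[c][(s, action)] += self.alpha * (target - self.q[c][(s, action)])


# Usage: a coin one step to the right and an enemy one step to the left.
agent = ObjectDecomposedQ(["left", "right"])
objs = [("coin", 1), ("enemy", -1)]  # (object class, position relative to the agent)
for _ in range(20):
    agent.update(objs, "right", rewards=[1.0, 0.0])  # terminal step: reach the coin
    agent.update(objs, "left", rewards=[0.0, -1.0])  # terminal step: hit the enemy
print(agent.act(objs))  # right
```

Because every coin shares the "coin" table, experience with one coin transfers to any other coin at the same relative position; this sharing is what breaks down once objects interact, which is why the independence assumption must later be removed.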