Show simple item record

dc.contributor.advisor: Isbell, Charles L.
dc.contributor.author: Irani, Arya John
dc.date.accessioned: 2015-06-08T18:20:16Z
dc.date.available: 2015-06-08T18:20:16Z
dc.date.created: 2015-05
dc.date.issued: 2015-04-08
dc.date.submitted: May 2015
dc.identifier.uri: http://hdl.handle.net/1853/53481
dc.description.abstract: A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break tasks down into both short-term subgoals with a defined end condition (such as "go to food") and long-term considerations and invariants with no end condition (such as "avoid predators"). In the context of Markov decision problems, behaviors with clear start and end conditions are well modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior. We propose two representations to fill this gap: the state constraint (a set or predicate identifying states the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken). State-action constraints can be used directly by an agent, which must choose an action in each state, whereas state constraints require an approximation of the MDP's state transition function; it is nonetheless important to support both representations, as certain constraints are more easily expressed in one form than the other, and users may conceive of rules in either form. Using domains inspired by classic video games, this dissertation demonstrates the thesis that explicitly modeling this negative policy information improves reinforcement learning performance by decreasing the amount of training needed to achieve a given level of performance. In particular, we show that even negative policy information captured from individuals with no background in artificial intelligence yields improved performance. We also demonstrate that options and constraints together form a powerful combination: an option and a constraint can be combined to construct a constrained option, which terminates in any situation where the original option would violate the constraint. In this way, a naive option defined to perform well in a best-case scenario may still accelerate learning in domains where the best-case scenario is not guaranteed.
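The constrained-option construction described in the abstract can be sketched in a few lines: a state-action constraint is a predicate over (state, action) pairs, and a constrained option terminates either when the original option would terminate or when its chosen action would violate the constraint. This is an illustrative sketch only, not code from the dissertation; the grid-world states, the `violates` predicate, and all names are hypothetical.

```python
# Hypothetical grid world: states are (x, y) tuples, actions are strings.

def make_constrained_option(option_policy, option_done, violates):
    """Wrap an option so that it also terminates whenever the action it
    would take violates a state-action constraint."""
    def policy(state):
        return option_policy(state)

    def done(state):
        # Terminate on the option's own end condition, or when the
        # option's chosen action is forbidden in this state.
        return option_done(state) or violates(state, option_policy(state))

    return policy, done

# State-action constraint: never step left off the grid's western edge.
def violates(state, action):
    x, y = state
    return action == "left" and x == 0

# Naive option: always move left; ends on reaching column 0.
policy, done = make_constrained_option(
    option_policy=lambda s: "left",
    option_done=lambda s: s[0] == 0,
    violates=violates,
)

print(policy((2, 0)))  # "left"
print(done((2, 0)))    # False: safe to continue executing the option
print(done((0, 0)))    # True: the option's next action would violate the constraint
```

Note that the wrapper only consults the state-action constraint; under this construction, a state constraint would first need an approximate transition model to predict which actions lead into forbidden states, matching the distinction drawn in the abstract.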
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Georgia Institute of Technology
dc.subject: Interactive machine learning
dc.subject: State constraints
dc.subject: State-action constraints
dc.subject: Reinforcement learning
dc.subject: Hierarchical reinforcement learning
dc.title: Utilizing negative policy information to accelerate reinforcement learning
dc.type: Dissertation
dc.description.degree: Ph.D.
dc.contributor.department: Interactive Computing
thesis.degree.level: Doctoral
dc.contributor.committeeMember: Feigh, Karen
dc.contributor.committeeMember: Precup, Doina
dc.contributor.committeeMember: Thomaz, Andrea L.
dc.contributor.committeeMember: Riedl, Mark O.
dc.date.updated: 2015-06-08T18:20:17Z

