
    Policy-based exploration for efficient reinforcement learning

    View/Open
    SUBRAMANIAN-DISSERTATION-2020.pdf (2.363 MB)
    Date
    2020-04-25
    Author
    Subramanian, Kaushik
    Abstract
    Reinforcement Learning (RL) is the field of research focused on solving sequential decision-making tasks modeled as Markov Decision Processes. Researchers have shown RL to be successful at solving a variety of problems, including system operations (logistics), robot tasks (soccer, helicopter control), and computer games (Go, backgammon); in general, however, standard RL approaches do not scale well with the size of the problem, because they rely on obtaining samples that are useful for learning the underlying structure of the task. In this work we tackle the problem of smart exploration in RL, both autonomously and through human interaction, proposing policy-based methods that effectively bias exploration towards important aspects of the domain.

    RL agents use function approximation to generalize over large and complex domains, and one of the most well-studied approaches is to model the value function of the decision-making problem with linear regression. We introduce a policy-based method that uses statistical criteria derived from linear regression analysis to bias the agent toward samples that are useful for learning. We show how exploration policies can be learned autonomously and from human demonstrations (using concepts from active learning) to facilitate fast convergence to the optimal policy.

    We then turn to human-guided exploration in RL. We present a probabilistic method that combines human evaluations, instantiated as policy signals, with Bayesian RL, and show that this approach yields performance speedups while remaining robust to noisy, suboptimal human signals. We also present an approach that exploits the inherent structure of exploratory human demonstrations to help Monte Carlo RL overcome its limitations and efficiently solve large-scale problems. We implement our methods on popular arcade games and highlight the improvements they achieve. This work shows that using humans to help agents efficiently explore sequential decision-making tasks is an important and necessary step in applying Reinforcement Learning to complex problems.
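    As a concrete illustration of the kind of statistical criterion the abstract alludes to, the Python sketch below biases exploration using the leverage score from least-squares regression: with a linear value function V(s) = phi(s)^T w, the quantity phi(s)^T (Phi^T Phi)^{-1} phi(s) is large for states unlike those the regression has already seen, so favoring such states steers the agent toward data that most improves the fit. This is a minimal, hypothetical sketch, not the dissertation's actual algorithm; the class name, the ridge term, and the bonus weight beta are illustrative assumptions.

    import numpy as np

    # Hypothetical sketch (not the dissertation's algorithm): bias exploration
    # toward states with high regression leverage, i.e. states where a linear
    # least-squares value model is least constrained by the samples seen so far.
    class LeverageBiasedExplorer:
        def __init__(self, n_features, ridge=1e-3):
            # A_inv tracks (Phi^T Phi + ridge * I)^{-1}, the inverse Gram
            # matrix underlying leverage and predictive-variance criteria.
            self.A_inv = np.eye(n_features) / ridge
            self.b = np.zeros(n_features)

        def update(self, phi, target):
            # Rank-1 Sherman-Morrison update after observing (phi, target).
            Av = self.A_inv @ phi
            self.A_inv -= np.outer(Av, Av) / (1.0 + phi @ Av)
            self.b += phi * target

        def value(self, phi):
            # Least-squares value estimate V(s) = phi(s)^T w.
            return float(phi @ (self.A_inv @ self.b))

        def bonus(self, phi):
            # Leverage-style score phi^T A^{-1} phi: high where few similar
            # samples exist, so exploring there is most informative.
            return float(phi @ (self.A_inv @ phi))

        def choose(self, candidate_phis, beta=1.0):
            # Pick the candidate next state maximizing value + beta * bonus.
            scores = [self.value(p) + beta * self.bonus(p) for p in candidate_phis]
            return int(np.argmax(scores))

    # The bonus for a direction shrinks as similar samples accumulate.
    rng = np.random.default_rng(0)
    ex = LeverageBiasedExplorer(n_features=4)
    phi = rng.normal(size=4)
    before = ex.bonus(phi)
    for _ in range(10):
        ex.update(phi, target=1.0)
    assert ex.bonus(phi) < before

    Replacing the greedy choose with a softmax over these scores would turn this into a stochastic exploration policy, closer in spirit to the policy-based framing described in the abstract.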
    URI
    http://hdl.handle.net/1853/62831
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23877]
    • School of Interactive Computing Theses and Dissertations [144]
