Georgia Tech Institutional Repository
https://smartech.gatech.edu:443/xmlui
The SMARTech digital repository system captures, stores, indexes, preserves, and distributes digital research material.
Sat, 28 Mar 2015 12:23:21 GMT

Policy Shaping: Integrating Human Feedback with Reinforcement Learning
http://hdl.handle.net/1853/53270
Griffith, Shane; Subramanian, Kaushik; Scholz, Jonathan; Isbell, Charles L.; Thomaz, Andrea L.
A long-term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
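A minimal sketch of the idea (assuming the binomial-style Bayesian update the abstract suggests; the function name, variable names, and numbers below are ours, not the authors' implementation):

```python
def advise_action_probability(delta, consistency):
    """Probability that an action is optimal given human feedback.

    delta: (# 'right' labels) - (# 'wrong' labels) the human gave this action.
    consistency: assumed probability C that any single human label is correct.
    """
    c = consistency
    return c ** delta / (c ** delta + (1 - c) ** delta)

# Mostly-positive feedback makes the action look likely optimal...
p_good = advise_action_probability(delta=3, consistency=0.8)
# ...while no feedback leaves the estimate uninformative (0.5).
p_none = advise_action_probability(delta=0, consistency=0.8)
```

Because the estimate depends only on the label difference and the assumed consistency, infrequent or occasionally wrong feedback degrades it gracefully rather than breaking it.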
Copyright © (2013) by Neural Information Processing Systems; Presented at the 27th Annual Conference on Neural Information Processing Systems (NIPS 2013), 5-10 December 2013, Lake Tahoe, Nevada.
Tue, 01 Jan 2013 00:00:00 GMT

Information-Theoretic Stochastic Optimal Control via Incremental Sampling-based Algorithms
http://hdl.handle.net/1853/53269
Arslan, Oktay; Theodorou, Evangelos A.; Tsiotras, Panagiotis
This paper considers optimal control of dynamical systems which are represented by nonlinear stochastic differential equations. It is well-known that the optimal control policy for this problem can be obtained as a function of a value function that satisfies a nonlinear partial differential equation, namely, the Hamilton-Jacobi-Bellman equation. This nonlinear PDE must be solved backwards in time, and this computation is intractable for large-scale systems. Under certain assumptions, and after applying a logarithmic transformation, an alternative characterization of the optimal policy can be given in terms of a path integral. Path Integral (PI) based control methods have recently been shown to provide elegant solutions to a broad class of stochastic optimal control problems. One of the implementation challenges with this formalism is the computation of the expectation of a cost functional over the trajectories of the unforced dynamics. Computing such an expectation over trajectories that are sampled uniformly may induce numerical instabilities due to the exponentiation of the cost. Therefore, sampling of low-cost trajectories is essential for the practical implementation of PI-based methods. In this paper, we use incremental sampling-based algorithms to sample useful trajectories from the unforced system dynamics, and make a novel connection between Rapidly-exploring Random Trees (RRTs) and information-theoretic stochastic optimal control. We show the results from the numerical implementation of the proposed approach to several examples.
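The numerical-instability point can be made concrete: exponentiating raw trajectory costs overflows or underflows in floating point, so implementations commonly shift by the minimum cost (a log-sum-exp-style trick) before normalizing. A minimal sketch with made-up costs (this illustrates only the weighting step, not the paper's RRT-based sampler):

```python
import numpy as np

def path_integral_weights(costs, lam):
    """Normalized exponential weights exp(-S/lambda), computed stably.

    Subtracting the minimum cost before exponentiating avoids the
    overflow/underflow that naive exponentiation of large costs causes;
    the shift cancels in the normalization.
    """
    costs = np.asarray(costs, dtype=float)
    shifted = -(costs - costs.min()) / lam
    w = np.exp(shifted)
    return w / w.sum()

# Hypothetical trajectory costs; exp(-1e4) underflows to 0 if done naively.
w = path_integral_weights([1e4, 1e4 + 1.0, 1e4 + 50.0], lam=1.0)
```

The lowest-cost trajectory dominates the weights, which is why sampling low-cost trajectories matters for a useful estimate of the expectation.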
Copyright © 2014 IEEE; Presented at IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Orlando, FL, Dec. 9-12, 2014; DOI: http://dx.doi.org/10.1109/ADPRL.2014.7010617
Mon, 01 Dec 2014 00:00:00 GMT

Spectral Analysis of Extended Consensus Algorithms for Multi-agent Systems
http://hdl.handle.net/1853/53268
van de Hoef, Sebastian; Dimarogonas, Dimos V.; Tsiotras, Panagiotis
We analyze an extension of the well-known linear consensus protocol for agents moving in two dimensions, where the standard consensus feedback is multiplied with a rotation matrix. This leads to a richer family of trajectories, and if only the new feedback term is applied, periodic solutions emerge. For special configurations of the controller gains, the form of the system trajectories is given in terms of the eigenvalues and eigenvectors of the closed-loop system matrix. We characterize the resulting closed-loop trajectories for specific choices of the controller gains and of the communication graph topology. Furthermore, the control strategy is extended to agents with double integrator dynamics. It is shown that stability is achieved with sufficiently large velocity feedback. The effect of this feedback on the overall system performance is further investigated. We finally provide simulations to illustrate the theoretical results.
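The rotated-feedback protocol is easy to sketch in discrete time. Below is a toy Euler simulation on a complete graph; for theta = 0 it reduces to the standard linear consensus protocol and the agents converge to their centroid. The graph, step size, and gains here are our illustrative choices, not the paper's:

```python
import numpy as np

def rotated_consensus_step(x, theta, dt=0.01):
    """One Euler step of u_i = R(theta) * sum_j (x_j - x_i), complete graph.

    x: (n, 2) array of agent positions in the plane. The rotation matrix
    R(theta) multiplies the usual consensus feedback; theta = 0 recovers
    plain linear consensus.
    """
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    n = len(x)
    u = x.sum(axis=0) - n * x          # sum of relative positions, shape (n, 2)
    return x + dt * u @ R.T            # apply R to each agent's feedback vector

x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for _ in range(2000):
    x = rotated_consensus_step(x, theta=0.0)
# With theta = 0 all agents end up at the centroid (0.5, 0.5).
```

Setting a nonzero theta in the same loop rotates each feedback vector and, for theta = pi/2, produces the purely periodic (circling) solutions the abstract mentions.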
Copyright © 2014 IEEE; Presented at 53rd IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 15-17, 2014; DOI: http://dx.doi.org/10.1109/CDC.2014.7039725
Mon, 01 Dec 2014 00:00:00 GMT

An Optimal Evader Strategy in a Two-Pursuer One-Evader Problem
http://hdl.handle.net/1853/53267
Sun, Wei; Tsiotras, Panagiotis
We consider a relay pursuit-evasion problem with two pursuers and one evader. We reduce the problem to a one-pursuer/one-evader problem subject to a state constraint. A suboptimal control strategy for the evader to prolong capture is proposed and is compared to the optimal evading strategy. Extensions to the multiple-pursuer/one-evader case are also presented and evaluated via numerical simulations.
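The relay structure can be illustrated with a toy heuristic in which the evader simply flees the instantaneously closest pursuer, so the "active" pursuer switches as distances change. This is only an illustration of the setting, not the optimal or suboptimal strategy derived in the paper:

```python
import numpy as np

def evader_step(evader, pursuers, speed, dt):
    """Toy relay-evasion heuristic: run directly away from the closest pursuer.

    evader: position as a length-2 array; pursuers: list of length-2 arrays.
    Only the nearest pursuer influences the evader's heading at each instant.
    """
    dists = [np.linalg.norm(evader - p) for p in pursuers]
    active = pursuers[int(np.argmin(dists))]   # the currently 'active' pursuer
    away = evader - active
    return evader + speed * dt * away / np.linalg.norm(away)

evader = np.array([0.0, 0.0])
pursuers = [np.array([-1.0, 0.0]), np.array([2.0, 0.0])]
evader = evader_step(evader, pursuers, speed=1.0, dt=0.1)
# The nearer pursuer is at (-1, 0), so the evader steps in the +x direction.
```

Against two coordinated pursuers this greedy rule is easily trapped, which is precisely why a strategy accounting for the state constraint of the relay hand-off is needed.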
Copyright © 2014 IEEE; Presented at the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, Dec. 15-17, 2014; DOI: http://dx.doi.org/10.1109/CDC.2014.7040054
Mon, 01 Dec 2014 00:00:00 GMT