So, thank you for hosting me; it is a great pleasure to give this talk in my city. I like the program and the people who are involved; I'm very happy to be part of it, and I think that it's going to be one of the strongest programs in the country. So I'm very happy to give this lecture. The title of my talk is "The Science of Autonomy: A Happy Symbiosis among Control, Learning and Physics."
Autonomy is a very active field, and it is characterized by the many disciplines that are involved. So when somebody talks about the science of autonomy, there are some logical questions that arise: is there any science, and can we think of autonomy as a discipline appearing by itself? And since I am always looking for a good word to put in my title, here the word is "symbiosis": control, learning and physics living together, and the "happy" is there, I think, for good reason.
As it turns out, control, learning and physics each have their own tradition, and so sometimes there are interesting interactions between people who are working on the control side of things, people who are working only on the learning side of things, and people who are working on the physics side of things. But what I will try to argue in this presentation, with the projects I have been doing since I came to Georgia Tech, is that as far as I'm concerned it's all one piece of mathematics, and whether you name it control, physics or learning is mostly a matter of convention: we like to put labels on different areas of science and engineering.
I would like to acknowledge, first of all, my students. I have been delighted to work with very smart students, and I have failed many times, together with my students, on projects that we have been doing; if you want to do interesting research, I feel that failing is a requirement. So I'm really delighted to have the opportunity to work with all of them. Then I have had some great collaborators here, and great students working together with other professors. I would like to acknowledge Jean-Marie and Biorn for the extensive collaborations that we had, and then Panos and Magnus, and my colleague from Emory University.
And the funding agencies: I would also like to acknowledge the agencies that are funding us. So, my interest is in the area of decision making, and if I had to describe some important tradeoffs in the area of decision making, I would draw this graph with two axes. On the y-axis, let's say that we have the uncertainty, in the representation of the system or of the environment; let's put the highest level of uncertainty here, and as we go up we have lower levels of uncertainty. Then on the x-axis I will place the timescale of optimization, or the timescale of interaction of the controller, the decision maker, with the actual system. This will become a little bit clearer if I give you some examples.
So let's take reinforcement learning, which is part of decision making. It has its model-based side, but let's pick the more extreme side of reinforcement learning, which is model-free. Where would you place reinforcement learning? Well, I would place it here, because essentially, if I have no knowledge of the environment and no model of the system, I have to interact with the system and the environment by playing a few trajectories on the system ("few" is something we can discuss, how many), and so the learning here happens on a long timescale, because you have to interact with real trajectories. Here is one example of a robot that has to learn how to rotate an object, a piece of tofu. There is no model of the physics of the interaction of the robot with the tofu or with the environment; the robot has to learn how to rotate the tofu by just trial and error. There is a reinforcement learning algorithm that essentially collects the rewards from these interactions and learns a policy. Obviously, learning by interacting with the actual system and making decisions is slow here, because of the fact that we don't have knowledge of the dynamics and the physics of the robot, the tofu, and the environment. And obviously here you have to be smart about initialization; that is another keyword, that you have to be able to start with some initial policy and then optimize it.
Then, as you move more into model-based things, once you have some understanding of the environment and of the way the system that you are considering interacts with the environment, optimization suddenly can become fast: you can make decisions on a much smaller timescale. Obviously you have to worry here about whether or not your optimizer can meet the real-time requirements, but given the fact that many of the algorithms that we have right now can be parallelized, we can use parallel computing in order to make these decisions, to apply model predictive control, and then we can be very predictive and reactive too. So there is this interesting tradeoff here that has to do with the timescale of optimization and the uncertainty. And there are three points that I would like to make. The first point is that there is a tradeoff between uncertainty and the timescale of optimization, so uncertainty plays a very important role, and the way we are going to represent it is also important; here is where learning can play a very important role, in the way you can represent uncertainty either in the environment or in the dynamics of the system that you are considering. And because we are looking for elegant solutions (I will explain what I mean by elegant), we would like to have a unified approach that can actually do both, that can go from reinforcement learning to MPC.
Now, typically you have only these two axes, but there is one more axis here which I recently added, and this is the axis of spatial scale. This slice here is a slice of robotic systems that operate on scales where they can be described by stochastic differential equations or ordinary differential equations, but the ideas that I am going to present here are actually scalable to systems that can be represented by stochastic partial differential equations. These are systems that operate on smaller spatial scales; if you want, for example, to represent nanoparticles, you will need a different representation than ordinary or stochastic differential equations. So we are interested in decision-making principles that can actually scale, that can carry over to systems and representations that go outside the typical regime of ordinary differential equations or stochastic differential equations.
So, I have one more slide and then I will dive into the algorithms; this one is more about motivation. I would like to be up front from the second slide and say that these are the two people that have really inspired the work in my lab, and I would like to explain why these two people have inspired us; I believe they have inspired many other people too.
So the first is Feynman, the physicist. In one of his physics lectures back at Cornell (I don't remember the exact date) he says: how do we go about finding a new law? And then he starts by saying, OK, we will first guess it, and everybody is laughing in the room, because it's impossible to just guess a law; but that is what you may have to do. Then you will compute the consequences of this law, and then you are going to compare it with an experiment. And then he says: if it disagrees with experiment, it is wrong. In that simple statement is the key to science. It doesn't make a difference how beautiful your guess is, it doesn't make a difference how smart the person who made the guess is, or what his name is; if it disagrees with experiment, it is wrong. That is all there is to it. That's a very bold statement coming from a physicist, and I could push that statement a little bit towards the people who have been doing control theory but have not really taken their theories all the way to confronting them with an experiment; not all of them, but a significant part.
Then there is this other fellow here, again a Nobel laureate (this is Gell-Mann), who positions himself in a different way. He says: what is especially striking and remarkable is that in fundamental physics, a beautiful and elegant theory is more likely to be right than a theory that is inelegant. And in fact he talks from experience. He says: in 1957 some of us put forward a theory of the weak force that was in disagreement with the results of seven experiments. Not one, not two: seven experiments. It was beautiful, and so we dared to publish it, believing that all of those experiments must be wrong; and in fact, they were all wrong. So you would naturally put these two opinions on the two sides of the spectrum.
Where do we stand? I think they are both right, and I think the only difference, at least what characterizes us, is that on the short timescale of things we are looking for the most elegant theory, the most beautiful mathematics; but at the end of the day we would always like to take this mathematics all the way down to an experiment. So they are both right, and that is something I just wanted to share with you.
So the outline of my talk is going to be as follows. I am going to start by explaining some interesting connections between information theory and stochastic optimal control, and then you will hear how one can actually derive useful algorithms for stochastic systems. Then we will take this idea all the way to different cases: we will try to apply it to the case where we have multi-agent systems, and then we will consider the case of partial observability and perceptual control. I could give an entire lecture on each of these, but I just want to give you a tour of the projects that we have here, and then some future ideas on what is coming next in my lab. OK, so let's start with the following two quantities.
So, for the first one, I apologize to the computer scientists in the room, because I am going to be writing expectations using measure-theoretic notation. This integral here is an expectation of whatever you have inside the exponential, and you can think about what is inside as a cost. There is nothing in these two equations that indicates control; this is just an expectation under a general probability measure. We are going to call this quantity the free energy, because it turns out that if you open a book on statistical physics and you see the form of the free energy, it has the same form. And then we are going to work with another quantity here, the generalized entropy. If you keep track of the minus signs and make the cancellation, you see that this is nothing else than the Kullback-Leibler divergence between the two probability measures. The reason why I am writing it in this form is that if you pick a constant measure for the second argument, you are going to go back to the Shannon entropy; that is the reason why this is called the generalized entropy.
So you can prove, in very few steps, that there is a relationship between the free energy and the relative entropy, and this relationship is given by an inequality: the free energy is bounded by the expectation of J(x), the same J(x) but now computed under a different probability measure, plus this entropy term. So there is a duality here, and what it says, in the language of statistical physics, is that the free energy is smaller than or equal to the expected cost plus a temperature times the generalized entropy; as I said, you can think of this 1/rho as a temperature. And then what you can do is ask: what is the optimal measure Q such that, if I substitute it back into this expression, the inequality becomes an equality? This optimal measure is the Gibbs measure of thermodynamics, which means that this is the case where the entropy is maximized; the second law of thermodynamics says that entropy is maximized. So in that case the right-hand side of this inequality is minimized and becomes equal to the free energy. There is nothing about control up to this point, and this is not something that I have discovered; this is very simple, very standard.
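To make this duality concrete, here is a small numeric sketch (a toy discrete example of my own, not from the talk) that checks the inequality and the equality at the Gibbs measure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative numbers): a base measure p, a state cost J,
# and an inverse temperature rho.
n = 5
p = rng.dirichlet(np.ones(n))          # base probability measure P
J = rng.uniform(0.0, 2.0, size=n)      # cost of each state
rho = 1.5                              # inverse temperature (1/rho = temperature)

# Free energy: F = -(1/rho) * log E_P[exp(-rho * J)]
free_energy = -np.log(np.sum(p * np.exp(-rho * J))) / rho

def dual_objective(q):
    """Expected cost under Q plus temperature-weighted KL(Q || P)."""
    kl = np.sum(q * np.log(q / p))
    return np.sum(q * J) + kl / rho

# The duality: F <= E_Q[J] + (1/rho) * KL(Q || P) for every measure Q ...
for _ in range(100):
    q = rng.dirichlet(np.ones(n))
    assert free_energy <= dual_objective(q) + 1e-12

# ... with equality at the Gibbs measure q* proportional to p * exp(-rho * J).
q_star = p * np.exp(-rho * J)
q_star /= q_star.sum()
assert abs(free_energy - dual_objective(q_star)) < 1e-10
print("duality holds; gap at the Gibbs measure:",
      abs(free_energy - dual_objective(q_star)))
```

Minimizing the right-hand side over Q recovers the free energy exactly, which is the "equality at the optimal measure" being described.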
Now, if we want to talk about stochastic optimal control, you can start taking the measures P and Q and associating them with paths generated by stochastic differential equations. In particular, let's go back to the same fundamental relationship; now we are going to take this P to be a path measure, a measure that essentially represents the trajectories generated by this stochastic differential equation. Then we are going to take this other measure Q, and we are going to associate this measure Q with another stochastic differential equation. The difference between the two is that here there is a control u, and here there is no control; this is an uncontrolled diffusion, an uncontrolled stochastic differential equation. So now these probability measures are probability measures over path space: an entire path is an event that you sample. And so the question is, if I associate these probability measures with these stochastic differential equations, then what is the meaning of the Kullback-Leibler divergence, and what is the meaning of the free energy?
Now that I have made the connection between these measures and the stochastic differential equations, there is one more step before I answer these questions: the step of associating this J with a trajectory, with a cost that is going to be evaluated along trajectories. The cost has a terminal state cost, and it is going to have a running cost that is also state-dependent; you have a state trajectory, you plug it in here, and you get the value. It then turns out that this relative entropy here is nothing else than a quadratic control cost, integrated over the time horizon and averaged over the sampled trajectories; and this is a cost function that we typically use, because in stochastic optimal control we like to optimize quadratic control costs.
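Written out, the objects being discussed here have, in the standard path-integral-control notation, roughly this form (my transcription of the standard relations, with rho the inverse temperature):

```latex
% P: path measure of the uncontrolled SDE  dx = f(x)\,dt + B(x)\,dw
% Q: path measure of the controlled SDE    dx = f(x)\,dt + B(x)\,(u\,dt + dw)
\begin{aligned}
J(x) &= \phi\big(x(T)\big) + \int_0^T q\big(x(t)\big)\,dt
  &&\text{(terminal cost plus running state cost)}\\
\mathrm{KL}\!\left(Q \,\middle\|\, P\right)
  &= \mathbb{E}_{Q}\!\left[\tfrac{1}{2}\int_0^T u(t)^{\top}u(t)\,dt\right]
  &&\text{(relative entropy = quadratic control cost)}\\
-\tfrac{1}{\rho}\log \mathbb{E}_{P}\!\left[e^{-\rho\, J(x)}\right]
  &\le \mathbb{E}_{Q}\!\left[J(x)\right]
      + \tfrac{1}{\rho}\,\mathrm{KL}\!\left(Q \,\middle\|\, P\right)
  &&\text{(free energy bounds the expected cost)}
\end{aligned}
```

The middle line is the point of this step: on path space, the KL divergence between the controlled and uncontrolled measures is exactly the quadratic control effort.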
And then the question is: OK, now this looks like a stochastic optimal control optimization problem; the question then is, what is this bound here? We know that in stochastic optimal control, dynamic programming will give you the global solution. So what kind of bound is this? How is it different from dynamic programming, and in which cases? You would agree with me that if I take this J(x), I have a cost that I can evaluate by sampling the stochastic differential equations. So what you can do now is take only this part, this expectation, and associate it with the letter Phi. And now there is this interesting result from statistical physics that is called the Feynman-Kac lemma.
What the Feynman-Kac lemma does is create connections between stochastic differential equations and expectations on one side, and partial differential equations on the other. In fact, what happens is that for any expectation, and the SDE that you are using to evaluate that expectation, you can find a PDE that represents this expectation; and vice versa for the PDE, as long as it is backward, you can always find an expectation and an SDE such that, if you sample forward, you are going to solve the PDE by sampling. The statement of the Feynman-Kac lemma is an if-and-only-if; it goes both ways. So then, what you can show is that this Phi here will satisfy the backward Chapman-Kolmogorov PDE. Remember that I am sampling trajectories to generate the x, plugging the x inside the J, and the sampling is based on the uncontrolled dynamics, because this measure P is associated with paths generated from the uncontrolled dynamics. So what we have is a backward Chapman-Kolmogorov equation whose drift is the drift of the uncontrolled dynamics, and whose diffusion is the B part that multiplies the noise. So it is a specific PDE that we get, not an arbitrary one.
But I am not interested in this Phi, actually; I am interested in the entire free energy, I am interested in this bound. This is the lower bound, and I know that the globally optimal control solution is given by dynamic programming. So then I am asking again the question: what is the connection between this expression and dynamic programming? All I have to do right now is essentially take the logarithm: I have the PDE for Phi, and I know that there is a logarithmic transformation between Phi and the free energy. If you do that, then you end up finding something that is very popular in stochastic control: the Hamilton-Jacobi-Bellman partial differential equation. So this quantity will essentially satisfy the Hamilton-Jacobi-Bellman equation; this quantity will be a value function. We derived dynamic programming, we showed that this quantity satisfies the dynamic programming principle, without using dynamic programming arguments, just by using measure theory and some properties of the Feynman-Kac lemma.
So that's very interesting. Why? Because now you have a way to approximate the value function just by sampling, and sampling from the uncontrolled dynamics. That is what the math says; but unfortunately, if you go and try to approximate this directly, it's a nightmare. It's a free energy, and people have been doing research on how you can evaluate free energies; this is a hard quantity to compute. But nevertheless there is beauty here, because that's an outcome of this point of view of looking into stochastic optimal control. So what we have done is essentially this: we start with the free-energy and relative-entropy duality inequality; then we follow the steps, we apply the Feynman-Kac lemma, going from expectations to PDEs; then we derive the backward Chapman-Kolmogorov PDE; then we take the logarithmic transformation to go to what really interests us; and then we show that this satisfies a Hamilton-Jacobi-Bellman partial differential equation and therefore it is a value function. OK.
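Here is a small sketch of that statement on a linear-Gaussian toy case of my own (chosen because the expectation also has a closed form to check against): the value function is obtained by sampling the uncontrolled dynamics, exponentiating the terminal cost, averaging, and taking the log transform.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: uncontrolled scalar dynamics dx = sigma * dw,
# terminal cost phi(x) = x^2 / 2, no running cost, temperature lam.
sigma, T, lam, x0 = 0.8, 1.0, 0.5, 1.2
n_steps, n_samples = 200, 200_000
dt = T / n_steps

# Sample trajectories of the *uncontrolled* dynamics (Euler-Maruyama).
x = np.full(n_samples, x0)
for _ in range(n_steps):
    x += sigma * np.sqrt(dt) * rng.standard_normal(n_samples)

# Feynman-Kac: Phi(x0) = E[exp(-phi(x_T)/lam)] solves a backward linear PDE;
# the value function is the log transform V = -lam * log(Phi).
phi_mc = np.mean(np.exp(-0.5 * x**2 / lam))
V_mc = -lam * np.log(phi_mc)

# For this linear-Gaussian case Phi is a Gaussian integral with a closed form,
# so the sampling estimate can be verified.
s2 = sigma**2 * T                      # variance of x_T
phi_exact = np.sqrt(lam / (lam + s2)) * np.exp(-x0**2 / (2 * (lam + s2)))
V_exact = -lam * np.log(phi_exact)

assert abs(phi_mc - phi_exact) < 5e-3
print(f"V via sampling: {V_mc:.4f}   V exact: {V_exact:.4f}")
```

This is exactly the "approximate the value function by sampling the uncontrolled dynamics" route; the difficulty mentioned in the talk is that in interesting problems the exponential average is dominated by rare low-cost paths, which is why naive sampling becomes a nightmare.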
So it turns out that there is an alternative way of looking into this problem, which starts completely from control theory. You can start with the standard stochastic optimal control formulation, where you have a cost function and you want to minimize it with respect to the controls; there is no measure theory here. Then what you do is derive the Hamilton-Jacobi-Bellman partial differential equation, and then you take the exponential transformation; now you go in the opposite direction. You are going to get the backward Chapman-Kolmogorov PDE, you are going to apply the Feynman-Kac lemma, again in the opposite direction, from PDEs to expectations, and you can show that the same bound is going to have the form of a free energy. So these are two different ways of looking into the same problem; but now the question is, where do these two different ways of looking into this issue overlap? And the answer, I think, is that this overlap holds for the class of diffusions that are affine in control and noise.
And obviously the big question is: is this connection general? Can I generalize it, can I take it all the way to systems that have different stochasticity? That is a very valid question once you understand this picture. But there is also an algorithmic benefit that comes out of this, and the algorithmic benefit is that this information-theoretic representation gives you the optimal control implicitly. It doesn't give you the optimal control as a function of the gradient of the value function; it gives you the optimal control implicitly. What I mean by that is that this optimal measure here is what the trajectories would look like if you had the optimal control and you applied this optimal control back to the stochastic differential equation: if you were to sample, these are exactly the trajectories that would satisfy this measure. And this optimal measure is actually very general. This measure-theoretic representation holds for general classes of stochastic systems: it can be applied to jump-diffusion processes, it can be applied to much more general stochastic processes, and it can go all the way to infinite-dimensional stochastic processes, because it is all measure-theoretic.
So now we are going to be using this to do inference, and this optimal measure will be important not only for the representation of the actual stochastic process but also for the algorithm. The way we are going to use it is as follows. I start again from the fundamental relationship, and I have these two SDEs; I am going to stick with stochastic processes with Wiener noise, and I am going to show you at the end of my lecture that you can do the same thing for a much bigger class of stochastic processes. The caveat is that you are then going to lose the connection with dynamic programming: in that case, what you are doing is not stochastic optimal control in the sense of dynamic programming, but you still get a control.
So now we have the uncontrolled dynamics and the controlled dynamics, and since we have this optimal measure, what I am going to do is parameterize my policy: I am going to take the control and parameterize it. So now I am going to be looking into parameterized policies, and I am going to try to force the measure induced by the parameterized controller to be as close as possible to the optimal one, the one that is provided by this inequality. There is a fundamental difference between looking at the stochastic problem like this and looking at the stochastic problem through dynamic programming. In dynamic programming, the form of the control emerges from the optimization, but you need to find the value function. Here, the form of the optimal control is pre-specified a priori, because you have to parameterize it; but then I am trying to push this parameterized control law to be as close as possible to the optimal measure, which relates to dynamic programming for specific classes of stochastic systems.
And here is a very simple parameterization that we have been using: essentially we have a controlled trajectory, and we optimize with respect to the values of this control. And this is going to require a lot of the importance sampling that we will have to use, because sampling in high-dimensional state spaces from these SDEs is hard. In particular, to evaluate the KL divergence, what I need is the ratio between dQ-star and dQ, based on the definition of the Kullback-Leibler divergence; this is a ratio that I need. I will have to plug this ratio in here and take the gradient of this expression with respect to u. I don't have that ratio explicitly, but what I do have is the optimal measure, which is given by the fact that I have minimized this expression, and I also have the Radon-Nikodym derivative between the controlled and the uncontrolled dynamics, which is something that I can open a book on stochastic calculus and get the expression for.
This derivative is such that I can evaluate it on samples of trajectories, so this is something I can actually compute. If you multiply these ratios, you are going to get back to dQ-star over dQ; this is the chain rule for the case of measures. So then you plug this into the KL divergence, you take the gradient with respect to u, and you are going to end up finding an expression for u that is nothing else than an averaging of the noise that you used to generate the sampled trajectories, weighted by an exponential of the cost, and this cost will be this J. There is one more extra step of importance sampling that we have to do, which I am not showing here, but essentially, schematically, this is what happens. You sample trajectories; I store all of the noise profiles of these trajectories, and I average them to get the optimal control sequence, where the weights in the averaging will be the exponential of the cost. First you find the optimal control sequence, which is going to have the same time horizon; you apply just the first part; and then you sample again, but you can carry over the control sequence that you found from before. You will use this idea, with the correction given by this importance-sampling weight, or likelihood ratio, depending on which community you are coming from; and then you can essentially repeat this process. So the process becomes something like an MPC process, a sampling-based trajectory optimization method.
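Schematically, the loop just described (sample noisy rollouts around the plan, weight them by the exponential of the cost, average the noise, execute the first control, shift the plan) looks roughly like this; the task, the cost, and all constants here are my own toy choices, not the actual system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: 1D double integrator driven to the origin; illustrative constants.
dt, horizon, n_rollouts, lam, noise_std = 0.05, 30, 512, 1.0, 2.0

def trajectory_cost(state, controls):
    """State cost accumulated along one rollout plus a terminal cost."""
    x, v = state
    c = 0.0
    for u in controls:
        v += u * dt
        x += v * dt
        c += (x * x + 0.1 * v * v) * dt
    return c + 10.0 * (x * x + v * v)

def sampling_update(state, u_seq):
    """Sample noise around the plan, weight rollouts by exp(-cost/lam),
    and return the noise-weighted average as the new control sequence."""
    eps = noise_std * rng.standard_normal((n_rollouts, horizon))
    costs = np.array([trajectory_cost(state, u_seq + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)   # subtract min for stability
    w /= w.sum()
    return u_seq + w @ eps

# Receding horizon: optimize, execute the first control, shift the plan.
state, u_seq = (2.0, 0.0), np.zeros(horizon)
for _ in range(120):
    u_seq = sampling_update(state, u_seq)
    x, v = state
    v += u_seq[0] * dt
    x += v * dt
    state = (x, v)
    u_seq = np.roll(u_seq, -1)
    u_seq[-1] = 0.0                            # warm-start the next step

print("final state:", state)
```

Note that the update never differentiates the dynamics or the cost; it only needs forward rollouts and their costs, which is what makes it a sampling-based trajectory optimizer.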
Now, this process can also be used in a model-free setting: if I do not move the system physically, but I am sitting always at the same state and I am sampling always from the same state, I can be model-free. I do not have to know the dynamics of the system, because the dynamics of the system do not appear in the actual control-update equation.
And so here is what we have done with this system, with this idea of parameterized policies. We worked here with Jim Rehg's group on this project, and it was very good for us to demonstrate the capabilities of our algorithms. All the computation is onboard; there is a GPU; the localization of the vehicle is with GPS, so we have a fully observable state. And the cost that we are using for this track is essentially to stay on the track at a desired velocity; of course we have a cost in terms of keeping within the boundaries, but we do not have an explicit state trajectory that we have to track.
And so the vehicle will go around the track, and it will do very aggressive drift maneuvers. Of course we have done some system identification; we have identified the dynamics of the vehicle in order to be able to generate sampled trajectories. And I think we were able to go very far in terms of how much we could push the performance of the vehicle with these stochastic sampling-based optimization techniques. Then, when you get something out in the scientific community, people sometimes get upset, or they try to find ways to criticize your work; and so one criticism of our work was that we had overfit to the small truck.
But because here we are at Georgia Tech, and we have good colleagues like Panos and Magnus, we were able to get a new truck, a bigger truck, and this was a very educational experience for us, because now we can test our algorithms for a long time and we can actually see behaviors that we were not able to see on the smaller truck. So then we moved to the new truck, and now here I have two videos, and I want to show you what the impact of the model is: what can really happen with this method if you do not have a good model. You need to have a good model; you need to do your homework, you need to do some system identification. But obviously this model cannot capture everything, so there is some level of robustness that comes from the fact that we are sampling trajectories. So let's see what happens here. Here is a situation in which we have a good model of the vehicle, and this is one where the model is not that great, and you will see that there are some interesting behaviors here; it's actually a funny video.
So there are two disturbances here, and the vehicle is getting pushed around by these disturbances. Do we model all of these disturbances? No. But again, the fact that we can sample allows us to be a little bit robust, where other approaches would go in the opposite direction. So it is important to do your homework in terms of system identification, but at some point the model error is going to grow. OK.
So we still have a long way to go here in terms of robustness, and since I am in aerospace engineering, where we have many safety-critical systems, it is important to be able to robustify the performance of this algorithm. There are three things that we have been doing. One is essentially working with robust sampling-based techniques; I will talk about this in the next slide. There is more work on the learning side, essentially incorporating online adaptation: the first bullet will give us robustness, the second bullet will give us more performance, and right now we do not really do any online adaptation. And the third bullet is really going outside the paradigm of Wiener noise and Gaussian noise, and working with representations that allow us to capture stochasticity that is beyond the case of stochastic diffusions: jump processes and other processes that we are really interested in looking at, identifying them based on data, and using them in the existing framework. And here are some preliminary results on tube-based MPC; this is a paper that we have submitted, under review in one of the biggest robotics conferences.
So the idea is as follows. You are here, you sample trajectories, and you have found a control sequence; and now suddenly there is a disturbance, the vehicle is getting pushed, and the state changes in such a way that all the trajectories go outside of the track. So you have to choose which importance-sampling distribution to use. One importance-sampling distribution would be the nominal one, because you are going to carry this control sequence over to the next round of sampling trajectories. So the question is which control sequence you want to use: do you want to use this control sequence, which is the nominal one, or do you want to use this other control sequence? This is the ideal one, and this is the actual one. So we have two versions of this MPPI algorithm that run, one for the nominal state and one for the actual state, and we will accept the control solution from the actual state for as long as its cost is good. But obviously we need one more controller to push this vehicle back to its nominal trajectory, so there are two levels of MPC optimization here: we have two MPC controllers that are running, for the nominal and the actual state, and then there is another trajectory-based optimization technique that tries to push the vehicle back to the nominal trajectory. So this is called tube-based MPC; here we have to combine sampling-based techniques with more traditional methods in robotics.
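A schematic of the two-planner architecture just described, in toy form (the dynamics, the cost, the acceptance test and the ancillary tracking controller below are all illustrative stand-ins of mine, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.05
ancillary_gain = np.array([4.0, 3.0])   # hypothetical tracking feedback gains

def dynamics(state, u, disturbance=0.0):
    x, v = state
    v += (u + disturbance) * dt
    x += v * dt
    return np.array([x, v])

def sampling_plan(state, u_seq, horizon=20, n_rollouts=128, lam=1.0, std=2.0):
    """Stand-in sampling-based planner: returns (updated plan, its cost)."""
    def cost(ctrl):
        s, c = np.array(state, dtype=float), 0.0
        for u in ctrl:
            s = dynamics(s, u)
            c += (s[0] ** 2 + 0.1 * s[1] ** 2) * dt
        return c + 10.0 * (s @ s)
    eps = std * rng.standard_normal((n_rollouts, horizon))
    costs = np.array([cost(u_seq + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    new_seq = u_seq + w @ eps
    return new_seq, cost(new_seq)

nominal = np.array([2.0, 0.0])
actual = nominal.copy()
u_nom, u_act = np.zeros(20), np.zeros(20)
for t in range(120):
    u_nom, cost_nom = sampling_plan(nominal, u_nom)   # planner on nominal state
    u_act, cost_act = sampling_plan(actual, u_act)    # planner on actual state
    # Accept the actual-state plan while its cost stays reasonable; otherwise
    # fall back to the nominal plan plus an ancillary controller that pushes
    # the vehicle back toward the nominal trajectory (the "tube").
    if cost_act < 2.0 * cost_nom:
        u = u_act[0]
    else:
        u = u_nom[0] + ancillary_gain @ (nominal - actual)
    disturbance = 3.0 if 20 <= t < 25 else 0.0        # an unmodeled push
    nominal = dynamics(nominal, u_nom[0])
    actual = dynamics(actual, u, disturbance)
    u_nom = np.roll(u_nom, -1); u_nom[-1] = 0.0
    u_act = np.roll(u_act, -1); u_act[-1] = 0.0

print("tracking error after the disturbance:", np.linalg.norm(nominal - actual))
```

The point of the structure is the separation of duties: the nominal planner keeps an undisturbed reference plan alive, while the actual-state planner and the ancillary controller handle the disturbed vehicle and pull it back into the tube around the nominal trajectory.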
And here is one example: you have a system, and you want to go around a circle without really deviating. So here is what happens if you apply the plain MPPI, the old version; and now you will see what happens if you have the tube-based MPPI controller: the actual vehicle will always stay within these bounds. We have tried that on the actual vehicle in some very recent experiments, and we were able to go around this track, which again is a long track; we were able to do, I think, twelve laps. And then this is research that is also funded by the vertical lift rotorcraft center.
Here are two cases in which we want to be able to land this helicopter. This is the case where there is no noise, with the old version of our algorithm; this is the old version of the algorithm with noise; and this is again with noise, but with the tube-based robust control technique. What you will see here is that as you go in for a landing, the vehicle is getting pushed a lot, so here you are going to crash, while here you are actually going to be able to land. This is ongoing research, and I think that a lot of interesting research questions will come from it. I have done a lot of work before I joined Georgia Tech on the reinforcement learning side, but I'm not going to be talking about that.
So, going to the multi-vehicle case: even in the multi-vehicle case, what you want to do is control vehicles in a cooperative or a noncooperative fashion, but you want to be able to minimize the exchange of information between the two vehicles. One way to do that is by using this idea of best-response dynamics, or iterated best response, which works as follows, and it is very simple. If I have to cooperate with another vehicle, I will create a copy of the other vehicle, and I will assume that the way the other vehicle makes its decisions is by using the same optimality principle. So I essentially double the state representation for every vehicle, and in this way I have an opportunity to actually predict how the other vehicle is about to move. The only thing I need to know is its position; I don't have to exchange objectives.
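To make the best-response idea concrete, here is a small, entirely hypothetical Python example: two vehicles on a one-dimensional lane pick speeds by grid search, and each one predicts the other by running the same planner on its copy of the other's state. The cost weights, speed grid, and gap constraint are invented for illustration.

```python
def plan_speed(pos, v_des, other_pos, other_v, horizon=10, dt=0.1, gap=1.0):
    """Grid-search a speed that tracks v_des while keeping a gap to the
    predicted position of the other vehicle (hypothetical cost weights)."""
    best_v, best_cost = 0.0, float("inf")
    for k in range(21):
        v = k * 0.2                        # candidate speeds 0.0 .. 4.0
        cost = (v - v_des) ** 2            # speed-tracking term
        for t in range(1, horizon + 1):
            d = abs((pos + v * t * dt) - (other_pos + other_v * t * dt))
            if d < gap:
                cost += 100.0 * (gap - d)  # collision penalty
        if cost < best_cost:
            best_v, best_cost = v, cost
    return best_v

def iterated_best_response(p1, p2, v_des1, v_des2, rounds=5):
    """Each vehicle predicts the other by assuming it plans with the
    same optimality principle, then best-responds. Only positions and
    current speed guesses are exchanged, never objectives."""
    v1, v2 = v_des1, v_des2                # initial guesses for each plan
    for _ in range(rounds):
        v1 = plan_speed(p1, v_des1, p2, v2)
        v2 = plan_speed(p2, v_des2, p1, v1)
    return v1, v2
```

With one vehicle starting just behind the other and both wanting the same speed, the iteration settles on the follower yielding while the leader speeds up to open the gap, without the two planners ever exchanging objectives.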
We have done some work here too, and this is all very recent; this work will appear in 2018. What we do is have the vehicles go around this track at high speeds, and the only information that they have to share is just the position of the other vehicle, but all the computation is actually decentralized. OK, so for the purposes of time I will have to skip this video.
Let me show you something that is more fun. This is the case where we want to do tasks in a noncooperative fashion, and we want to actually race against a human, so we apply the same principle here; it's just that one vehicle, in this case the first one, will be controlled by the human, and the other one is actually our controller. These are some preliminary results, not experimental results that we have right now, but we are very much looking forward to improving what we have here in terms of algorithms, theory, and experiments. OK, so the human will pass again, and obviously there's a lot of crashing; this is actually robotics, so you will see next that we do crash quite a lot, especially when you have two vehicles on the same track. So for example, in this case this is our vehicle and this is the human. With this crashing, it's very hard to get cost functions here to optimize, because there are a lot of criteria that you have to meet, so we believe that these ideas of tube-based MPC will actually fit very nicely into these racing scenarios, because we have ways to abstract objectives.
Yeah, so then there is some more work on the simulation side; I'm going to skip this. People ask us if the framework scales, so we have this video here, with one hundred forty-four states; we use the same sampling-based techniques and we want to control all the vehicles. This is all centralized: we think about these vehicles as one big system.
There is some other work here too, but unfortunately I don't have time to talk about it, so let's talk about the perceptual control case. We do a lot of MPC in my lab, and this is the classical picture that we have in terms of the autonomy stack: you have some sort of state estimation and then you do MPC. But since the task is actually repetitive, it is in some sense silly to have to re-solve a problem for which you have already generated all of these data; you can use these data in order to learn a policy. So that's one thing. The other thing is that we want to be able to do these tasks in a GPS-denied environment, when there is no GPS. We have a lot of data from all of these experiments, a lot of data that were created: in fact we have not only state estimates but also observations, cameras and wheel speeds, and then we have what the controller spits out in terms of throttle and steering. So what you can do then, and this is in collaboration with Byron, is use these data to learn a policy, and because we are dealing with visual input we are going to use a convolutional neural network that has a couple of million neurons, in order to be able to learn, from just raw images and wheel speeds, the throttle and steering. The benefit is that if you can successfully learn this policy, you don't need to have access to GPS any more, and you also don't need to use the GPU in order to do the optimization, so you can essentially use the GPU for other purposes.
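The supervised step here is just behavioral cloning: fit a policy to logged (observation, MPC control) pairs. The real pipeline trains a multi-million-parameter convolutional network on raw images; purely as an illustration of the idea, here is a toy Python version that distills a made-up "expert" controller into a linear policy by least squares. The expert law, features, and gains are all invented.

```python
import random

# Toy behavioral-cloning sketch (all names and gains hypothetical):
# distill a pretend "expert" MPC law u = -1.5*e - 0.8*psi into a
# linear policy u = w0*e + w1*psi, fitted by least squares on
# logged (state, control) pairs.

def expert(e, psi):
    """Stand-in for the MPC expert: steer toward the centerline."""
    return -1.5 * e - 0.8 * psi

random.seed(0)
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
targets = [expert(e, psi) for e, psi in data]

# Least squares via the 2x2 normal equations.
s_ee = sum(e * e for e, _ in data)
s_pp = sum(p * p for _, p in data)
s_ep = sum(e * p for e, p in data)
b_e = sum(e * u for (e, _), u in zip(data, targets))
b_p = sum(p * u for (_, p), u in zip(data, targets))
det = s_ee * s_pp - s_ep * s_ep
w0 = (b_e * s_pp - b_p * s_ep) / det
w1 = (b_p * s_ee - b_e * s_ep) / det

def policy(e, psi):
    """Learned policy: replaces the online optimizer at run time."""
    return w0 * e + w1 * psi
```

On this synthetic data the least-squares fit recovers the expert's gains essentially exactly, so the learned policy can replace the optimizer at run time; with images, the same role is played by the convolutional network.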
This is done without any mapping. Now, what these videos show is that, yes, you can do this: the vehicle can go around the track, and this is essentially the setup that we have been using. The experiment to learn this policy takes about a day out there on the track, so it's not easy to get these experiments done on an actual vehicle and train the neural network
outdoors. But people ask the question: does it generalize? Because it may just have memorized the scene. So here are some plots, in an effort to address this question. Essentially we have a way to map all the high-dimensional data from the cameras down to a low-dimensional space, and we have the train set and the test set; this is the case of imitation learning. What you can see is that for wheel speed, the test set and the train set are on top of each other; that is no surprise, because that signal is actually very low-dimensional. But when it comes down to the actual visual input, you see that the training set is not the same as the test set. There is a way to see that also in demonstrations: the vehicle can actually go around the track even if you change the lighting conditions, even if it is close to night. So there is some level of generalization.
So the other thing that we have been doing, instead of going completely end to end, is to put some structure into the neural network architectures, motivated by the fact that we have structure in decision making: any decision-making strategy has to have a cost function, some optimizer, and the dynamics. This is another project, in which essentially we learn the cost map just based on raw data, on images, and we were able to perform the same task.
So now one of the questions that we are trying to address is which architecture is better, or more optimal: what does it mean to compare these two neural network architectures? We think that the answer to that is in terms of three things. One is performance: how well each of these methods is going to work. Another is robustness and generalization. And then the next one, which is the most important for us, is the efficiency of the underlying training algorithm; there, we believe, there are a lot of tools from thermodynamics, from stochastic thermodynamics, that you can use to actually compare the efficiency of the training of these two architectures. So that's ongoing research.
So I think I'm out of time, but I just want to show this plot here. This plot is motivated by the fact that all of my background is actually in CS and ECE, but I'm in an aerospace engineering department, and people talk a lot about uncertainty; there are different ways to represent uncertainty that come from different communities. In the aerospace engineering community, the set-based representation was, and is, very popular: it is a way to represent uncertainty, especially as sets of disturbances. But then on the machine learning side, and more on the computer science side of things, you have these data-driven methods: you have nonparametric methods, semiparametric methods, and neural networks. Pretty much in these four and a half years at Georgia Tech we have done work on each one of these blocks here. I just want to show you one example that is related to the neural network that I showed you before. When we learn end-to-end policies, in aerospace systems you want to have backup systems; you want backup systems that keep the overall system safe.
The question is how we can detect that these policies will be robust, how we can detect that something is wrong with these policies; that's a very fundamental question for us. So let me show you a video. This is a video, in simulation, of the vehicle going around the track, and this is the variance that comes out as an output from the neural network. So this is the control that the neural network spits out: the control, and the variance of that control estimate. What happens is that if you present to the network some new data that it has not seen before, you get a spike in the variance of the control policy. That is then a way to say: OK, the neural network has seen something that it hasn't seen before, so I should be able to fall back to an expert, and this expert, for us, is going to be an MPC controller that is fully observable. So the question is, how big should this variance be; how do you find that threshold? There is a control-theoretic question here, and there is a reinforcement learning question here: you can address this through reinforcement learning, by doing learning in order to find this threshold.
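The switching logic itself is simple; here is a minimal Python sketch. The "policy" below is a stand-in that fakes the variance spike by growing its reported variance outside a pretend training region, and the gains and threshold are arbitrary, but the fallback structure is the point.

```python
def make_policy(train_lo, train_hi):
    """Stand-in for a learned policy that also reports a variance:
    the variance grows once the input leaves the training region."""
    def policy(x):
        u = -0.5 * x                          # the learned control
        dist = max(train_lo - x, x - train_hi, 0.0)
        return u, 0.1 + 5.0 * dist            # spike off-distribution
    return policy

def mpc_expert(x):
    """Stand-in for the fully observable MPC expert."""
    return -1.0 * x

def safe_control(x, policy, expert, var_threshold=1.0):
    """Use the learned policy while its reported variance stays below
    the threshold; on a variance spike, fall back to the MPC expert."""
    u, var = policy(x)
    if var > var_threshold:
        return expert(x), "expert"
    return u, "policy"
```

Inside the training region the learned control is used; off-distribution, the reported variance crosses the threshold and the expert takes over.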
But there is also the control-theoretic point of view of looking into these things, although I doubt that it is going to be able to come up with a general theory that would solve this problem for any system and for any task. We have done something like that here; this is a paper that is under review. So let me skip ahead and go toward the end; there is a lot of work that we have done on uncertainty estimation. I think I have another five minutes or so, so: what is next? I know that some people in the audience are really eager to see this slide of what is next.
Once you understand this connection, you can ask many questions. One question is: are there any information-theoretic problems that have no dynamic programming representation? And vice versa: is there any dynamic programming representation of stochastic control for which there is no information-theoretic equivalent? Those are two different questions. The third question is, and there's another axis here, can I take this area of overlap between the two disciplines and see how far I can push it? Are there any other systems for which this connection is true? And the answer is yes: there are cases in which you can go beyond SDEs, beyond stochastic differential equations, and you can still show that classical stochastic optimal control theory and information theory collapse to the same solution for a more general class of stochastic systems.
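For reference, the overlap between the two views that this discussion builds on can be written as the standard free-energy / relative-entropy duality (stated here in a generic form; the talk's own notation may differ):

```latex
% Free-energy / relative-entropy duality: P is the uncontrolled path
% measure, J the trajectory cost, \lambda > 0 the temperature.
-\lambda \log \mathbb{E}_{P}\!\left[ e^{-J(\tau)/\lambda} \right]
  \;=\; \min_{Q \ll P} \Big\{ \mathbb{E}_{Q}\!\left[ J(\tau) \right]
  \;+\; \lambda \, D_{\mathrm{KL}}\!\left( Q \,\|\, P \right) \Big\},
\qquad
\frac{dQ^{*}}{dP} \;=\;
  \frac{e^{-J(\tau)/\lambda}}{\mathbb{E}_{P}\!\left[ e^{-J(\tau)/\lambda} \right]} .
```

The left-hand side is the free energy that dynamic programming reaches through the exponential transformation of the value function; the right-hand side is the information-theoretic (KL-control) formulation, and extending the duality beyond SDEs amounts to making sense of the density ratio dQ/dP for a richer class of processes.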
So, just an example: here you have a jump diffusion, and you want to control the jump diffusion. Now I don't have only the Wiener noise; I also have the jump noise, and the noise is going to be very general, in the sense that there is not only a spike, but the amplitude of that spike, how big that spike is going to be, may be a random variable too. These are called doubly stochastic processes. You can actually do the same thing here; the only thing that you have to worry about is how the Radon-Nikodym derivative is defined between the uncontrolled and the controlled dynamics, but once you have that expression you can plug it back into the KL-divergence minimization and recover the control. This is the case where we treat the jumps as something bad: you just want to reject them as disturbances. But there is another point of view on this problem: what if you want to control the jumps? Say the rate of the process is one of your control parameters; say you are interested in neuromorphic forms of doing stochastic control. In those cases you talk about spikes, and the thing that you can control there is the firing rate.
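To show what kind of process this is, here is a small Python simulation of a simple jump diffusion, Wiener noise plus Poisson-timed spikes whose amplitudes are themselves random, in the spirit of the doubly stochastic case described here. The drift, rates, and amplitude distribution are arbitrary illustrative choices, and this only simulates the uncontrolled process; it does not derive the control.

```python
import math
import random

def simulate_jump_diffusion(x0, T=1.0, dt=0.001, a=-1.0, sigma=0.2,
                            rate=5.0, jump_std=0.5, seed=0):
    """Euler-Maruyama simulation of a simple jump diffusion
        dx = a*x dt + sigma dW + J dN,
    where N is a Poisson process with the given rate and each jump
    amplitude J is itself a random draw (the 'spike with random
    amplitude' case). All coefficients are illustrative."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(round(T / dt)):
        dw = rng.gauss(0.0, math.sqrt(dt))      # Wiener increment
        x += a * x * dt + sigma * dw            # diffusion part
        if rng.random() < rate * dt:            # a Poisson spike fires
            x += rng.gauss(0.0, jump_std)       # with random amplitude
        path.append(x)
    return path
```

In the disturbance-rejection view the controller tries to suppress the effect of these spikes; in the neuromorphic view, `rate` itself (the firing rate) would become the control input.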
The same principle can apply here: you can use this information-theoretic representation again, and you can derive controls that you can actually apply to a real system. Then, more on the dynamic programming side, we have done some work over the last few years with respect to this idea of a nonlinear Feynman-Kac lemma. This is an idea that goes in the direction of dynamic programming without the exponential transformation, and that matters because there are interesting problems in this area that cannot easily be captured by the information-theoretic formulation, for example the kind of control cost that is typically used in aerospace applications of stochastic control, just because of the fact that you don't want to use a lot of thrust. And so with that I will thank you, and I'm happy to take any questions. You can ask me any question. Yes?
So there are two forms of uncertainty in neural networks, as far as I know: one is aleatoric and the other is epistemic. If I recall, one is about lack of knowledge of the environment, and the other is about the data. Can you capture both of these?
So, in theory I think you can get both, but the question is whether that computation can be performed in real time; that is something we are still working on. I have no answer to that yet; the work that we have been doing on these ideas is only from the last six or seven months. It would be great if you could actually combine them, and you may be able to decouple them if you have a loss function that looks like a maximum-likelihood loss function, in which you also have a sigma that you try to come up with, a prediction of this variance. But I don't have something where I can say: here is an algorithm, and it can scale, and it can work for many, many cases. People ask me about all of the above; we are new in this business. Great, thank you.