So when you get an invitation to come to a department that's this good, you don't say no — and then you think: oh, wait a minute, I'm not even in a chemical engineering department anymore. In my case, I know that when I used to be in a chemical engineering department and people in my area would come and talk, all they would do is talk in math, and it was really, really annoying. So, apologies to Radiohead, but all I do is buzz like a fridge — and so do my collaborators. You have your option now: you can walk out and I won't get my feelings hurt. What I'm going to try to do is mostly talk in pictures, because I know that when you're looking at a mathematical area that's slightly different from something you've seen before, it's a little bit challenging to parse the symbols really fast. This is work that is primarily done by Simon Olofsson, who is getting his PhD in computer science, and my collaborator in the machine learning discipline is Dr. Marc Deisenroth; Marc is also at PROWLER.io, that's where he is right now. And then hopefully — depending on how many questions I get asked; watch how quickly I'm going — I hope I get to mention the PhD work of Johannes Wiebe, who is working in collaboration with Inês Cecílio at Schlumberger Research. I really like this kind of research where basically all the students I work with also work with industrialists, and we try to make sure that our problems are quite relevant. In general, you all were really very kind to still see me as a chemical engineer. I started in chemical engineering, I am trained in chemical engineering, and the way you can think of it is: there are applications that my lab has in manufacturing systems, biomedical systems, etc. We also make limited contributions in mathematical optimization, but mostly we're just in between, and then we're happy to be whatever anybody's going to call us. Um, one of the things about working
with computer scientists, and about training computer scientists, is that we had better make everything that we do available. So what I'm going to be presenting today is three different papers, but already on GitHub is all of the material that is necessary to reproduce the papers. Hopefully, from a chemical engineering perspective, this is useful for specific applications; hopefully, from a computer science perspective too — the computer scientists just want to beat us, right? They honestly don't have enough problems, they don't have enough new and interesting things to look at, and so they'd like to possibly use our test sets. So what I'd like to do, just to get us all on the same footing, is start with a very brief introduction to Gaussian processes. Now, anybody who works with Fani Boukouvala or with AJ can check their phones for the next ten minutes, because I know you all already know this, but just as a brief introduction, here it goes. In Gaussian process regression — this is also called kriging in other literature; I call it Gaussian process regression basically because I work with this machine learner, and Marc doesn't mind — what you're doing is you have a set of observations that is corrupted by noise in some way. So what you assume is you have these inputs x, which can be multi-dimensional, and then you have an output prediction, which is f(x); this could be our mechanistic models or something like this. Then you have a sort of unexpected error, and we are assuming that this error is going to be normally distributed — in this picture, with zero mean and some variance.
OK, but what you might be able to see from this picture, first off, is that when you're close to data points where you have already taken a measurement, you're more certain of what your function is like, and when you're farther away, you're less certain. This just makes some amount of intuitive sense, and indeed what Gaussian process regression does is say: we have some predictions, we have the data points that we have measured, and then in grey what I'm showing, for every point x, is two standard deviations away from the mean. So what this basically tells us is that the closer we are to the data points, the more we trust what we've seen before. Just as a quick illustration of how this might happen in practice: I have my prior belief about the function. Now, this is one of the ways that I think chemical engineers can really contribute. What a computer scientist will often do is start with a prior mean of zero, or normalize their functions so that the prior mean is zero. What we might like to do instead is use information that we already have about the function, and use that as our prior belief. Then we also have this special covariance matrix K. Now, when we're building the model — when we're training the model, when we're learning how the function looks — we start off with the functions that are possible within our set of prior beliefs. So these are three functions pulled at random from the prior, and, as expected, they're mostly within the grey lines; but I told you it was only two standard deviations away from the bold line, and so, as expected, sometimes some of the functions go outside of the grey lines.
This makes a lot of sense. Now what we're going to do is take measurements, and when you have a measurement — a realization of what happens at that input — we end up with our prior conditioned on the data that's just come in, and this is our posterior belief about the function. So as we go along, we can develop a better and better approximation of the actual function. Everything's going great. Now, there are a couple of flies in the ointment at this point — I mean, this is a computational problem, so if I don't talk about trade-offs then I'm probably lying about something or other, right? In particular, the thing that limits these Gaussian processes is hidden in this (K + σ²I)⁻¹ term: the covariance matrix, plus the noise variance times the identity, inverted. Basically, taking the inverse of a matrix is the thing that limits this particular sort of method. What happens is that if we take lots and lots of data points, then we are inverting larger and larger matrices, and so what you might imagine is that many of my colleagues in this area are mostly studying matrix inversion — which is a little bit funny to be studying in 2018, but there you go. OK, so then I can also pull representative functions from the posterior distribution.
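The conditioning step just described — a prior plus noisy observations giving a posterior mean and a two-standard-deviation band — can be sketched in a few lines. This is a generic illustration with a squared-exponential kernel and made-up data, not the specific models from the talk:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input sets A and B."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=0.1):
    """Posterior mean and variance at test points Xs, given noisy data (X, y)."""
    K = rbf_kernel(X, X) + noise**2 * np.eye(len(X))   # K + sigma^2 I
    L = np.linalg.cholesky(K)                          # the O(n^3) step that limits scaling
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf_kernel(Xs, X)
    mean = Ks @ alpha                                  # posterior mean
    v = np.linalg.solve(L, Ks.T)
    var = rbf_kernel(Xs, Xs).diagonal() - np.sum(v**2, axis=0)  # posterior variance
    return mean, var

X = np.array([-2.0, 0.0, 1.5])      # inputs where we already measured
y = np.sin(X)                       # the corresponding observations
Xs = np.linspace(-3.0, 3.0, 7)      # where we want predictions
mean, var = gp_posterior(X, y, Xs)
# var is small near the measurements and grows away from them
```

Note that the Cholesky factorisation replaces the explicit inverse; it is the same cubic-cost bottleneck mentioned above, just computed in the numerically sensible way.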
So why do I care about this particular class of machine learning method? Well, it's actually been shown several times over the last ten years or so that this is extremely valuable for chemical engineering. About ten years ago now, Professor Grossmann at Carnegie Mellon was showing how to use this kind of stuff, and honestly Fani Boukouvala, who's working here in this department, made a lot of really big contributions in this area — I mean, there's a reason that she gets to be here; she's sort of a star in this area. What all of these authors have shown is how to hybridize the sort of functions that we know and love as chemical engineers — things that we can explain — with data-driven models. So this is the first set of contributions that my group makes in this area, and we're really excited about it. The only other thing I should mention, just in passing, that a lot of people work with is Bayesian optimization: in addition to predicting a function, I can also optimize it while I'm predicting it. This becomes important.
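To give a flavour of what Bayesian optimization does with a GP posterior: one common choice (among several acquisition functions; this is an illustration, not necessarily the one used in the talk) is the closed-form expected improvement, which trades a low posterior mean off against high posterior uncertainty. A minimal sketch for minimisation, with made-up posterior values:

```python
import numpy as np
from math import erf, sqrt, pi

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_pdf(z):
    return np.exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_improvement(mean, std, best):
    """Closed-form EI for minimisation, given GP posterior mean/std per candidate."""
    mean, std = np.asarray(mean, float), np.asarray(std, float)
    z = (best - mean) / np.maximum(std, 1e-12)
    return (best - mean) * np.vectorize(normal_cdf)(z) + std * np.vectorize(normal_pdf)(z)

# two candidates with the same posterior mean as the incumbent:
# the one with high uncertainty has far larger expected improvement
ei = expected_improvement(mean=[0.0, 0.0], std=[0.01, 1.0], best=0.0)
```

This is exactly the "optimize while predicting" idea: the GP's uncertainty, not just its mean, decides where to evaluate next.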
So what have we done, and what are we excited about? The first thing that I want to mention is design of experiments for model discrimination. We all work with biologists or chemists sometimes, and I think when you get ten chemists in a room and ask them what the mechanism for a reaction is, you get at least fifteen answers, right? So I might have people in pharmacokinetics who disagree with one another; I might have people developing metabolic pathways, or reaction mechanisms, etc. Now, the mathematical setting of this problem is that we have an expensive-to-evaluate system: we have the design space, that's the input; we have the output space that we can measure; and there may be many latent variables in between — we don't really know. What we assume is that we have some collected data, and the most dangerous assumption that we're going to make here is that the data we have collected, y, is normally distributed — that's the dangerous bit — with respect to some underlying function, some underlying mechanism, plus this covariance term. The danger, of course, is that we may not have a normal distribution here; the machine learners do have ways of rescaling these sorts of data. So we have competing models — these are competing mechanisms — and in this particular setting we assume that we do not have enough information to distinguish between the models. So there is uncertainty in the model parameters. These are all parametric models, because we're assuming that we're working with chemists or biologists who want to write down reaction rates and these sorts of things. We have these parameters, but we don't know what the parameters are, and we cannot distinguish which of the models is true. What we're saying is that if you have M models, each has a probability of being true of about 1/M. So what are we going to do? In particular, we want to know the next experiment to take. Now, this is an extremely well-studied area, and what people in this area do is develop these so-called design utility functions: they take in all of the previous information that you have about the function, and then — let's say that I've done five experiments, or have five data points — where am I going to take my next experiment? Well, I would hope it would be where the red and the blue are the most different, and, you know, I would hope that my design utility function would tell me to take an experiment on the far right, I guess. So all we have to do, if we have a well-designed design utility function, is maximize over that function, and that's where we take our next experiment. That might be useful. OK.
So, these utility functions have been developed over the years; I won't be citing people constantly — just know that this is a well-established area of inquiry. But what I would argue is that, even though there's been so much work over the years, there are basically only two approaches. There's the approach that is analytical in nature, and there's the approach that is data-driven in nature. The analytical approach says: let's linearize around the current parameter estimates. This assumes a sort of linear approximation — there are some people who do different approximations, or make no approximations, but whatever, it's basically the same type of approach, according to me. Then we assume that the parameters are normally distributed with respect to our general belief about the parameters, and we develop a closed-form expression for the utility function, basically following the classical literature. This approach stretches back to Box and Hill — it is sixty years old. It's a really nice approach, and when you can use it, it's basically the best thing you can do: you get closed-form expressions that you can evaluate really quickly. The downside is that I said we needed linearized models, and that means we need derivatives. So if this is an analytical function, we're doing well; if this is, say, a PDE that we have to evaluate many times, and it's expensive for whatever reason, that might not work. So we need an alternative. The competing approach in this area is the one followed more by the systems biologists. What the systems biologists say is: well, we don't know what these models look like, so we're going to use Monte Carlo methods to estimate the different functions. They estimate the competing mechanistic models, and then, using sampling, they determine the expected value of the utility function. This is great because maybe one person wants to work with a model based on a PDE, for whatever reason, and one person wants to develop a model in Excel — I don't know what — but it's very computationally costly: if you know what k-nearest neighbors is, it's an expensive technique in computer science, and you have to do that at every time step, so it's not going to be practical. So what Simon thought about, and the contribution that he made, is this: why don't we hybridize these two approaches? We want the best of both worlds. We'd like to be able to work with black-box models, like the Monte Carlo approach can, and we'd like to be able to use all of the really nice, easy-to-compute functions from the analytical approach. So what he does is evaluate the models a number of times — hopefully fewer times than you would need with the Monte Carlo method, because one of the advantages of these GPs is that they do tend to approximate a function well early on — he trains the GPs, and then, once he's trained the GPs, he can just apply the analytical approach. It's kind of a neat way of doing it: basically saying, well, we would like to use all of the techniques from the analytical approach, but we would like to use black-box functions. The downside, of course — we get all sorts of nice things, but now we're dealing with GP scaling issues, and I showed you right at the very beginning that inverting that matrix is not going to be the easiest thing in the world.
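The core of the hybrid idea — spend a handful of expensive model evaluations training a GP, then run the cheap analytical machinery on the surrogate — looks roughly like this. The "expensive model" and the noise-free GP-mean interpolant here are illustrative assumptions, not Simon's actual implementation:

```python
import numpy as np

def expensive_model(x):
    """Stand-in for a costly black-box simulation (e.g. a PDE solve)."""
    return np.sin(x) + 0.5 * x

def fit_gp_mean(X, y, lengthscale=0.5, jitter=1e-10):
    """Noise-free GP posterior mean (an RBF interpolant): a cheap,
    analytically differentiable stand-in for the expensive model."""
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / lengthscale**2)
    w = np.linalg.solve(K + jitter * np.eye(len(X)), y)
    return lambda xs: np.exp(-0.5 * (xs[:, None] - X[None, :]) ** 2 / lengthscale**2) @ w

X_train = np.linspace(0.0, 4.0, 9)           # a few expensive evaluations...
surrogate = fit_gp_mean(X_train, expensive_model(X_train))
xs = np.linspace(0.0, 4.0, 401)              # ...then hundreds of cheap ones
err = np.max(np.abs(surrogate(xs) - expensive_model(xs)))
# the surrogate tracks the expensive model closely across the design space
```

Once the surrogate is trained, the closed-form utility expressions from the analytical approach can be evaluated on it as often as needed, which is the point of the hybridization.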
OK, so what Simon is comparing against is, first, a number of analytical case studies — basically, how much worse are we than the analytical approach, because that's the one we like when it applies — and then, can we also look at a non-analytical case. He's considering a number of these utility functions; if you go into the design-of-experiments-for-model-discrimination literature, there is a fun, decade-plus debate about what the best design utility function is, which of course I don't want to get involved in. So these are several, but what you'll notice is that, since you want to maximize the design utility function, each of the three design utility functions will suggest different places for your next experiment. So we have four problems with competing models, and three discrimination criteria, giving twelve case studies. What Simon does is take a hundred random trials for each of the case studies, and this is the number of extra experiments needed by the analytical approach versus the number needed by the hybrid approach; across his experiments it looks like this — the mean and the standard error of the extra experiments he needs to do. Basically, at least for the problems we're testing, he is able to do roughly as well as the analytical approach. OK, that's great. Now we need to compare against the non-analytical case study — this is where the systems biologists are making their contributions. We are looking at four competing models and twenty initial observations.
And what happened is that we emailed this guy and said: you didn't even put your case study online, can you put your case study up? We got it, and we ran it, and we required 20 to 40 extra experiments to be able to distinguish the models, which was a little bit concerning for us — and our success rate was terrible: we were almost never predicting the correct model. Now, a low success rate isn't horrible; what is horrible is when the failure rate is high, because that means we predicted the wrong model. In particular, this method of Buzzi-Ferraris, which is based on a chi-squared distribution — if you know it, you know it's a little more conservative: it's not failing, but it's not succeeding either. And then we have all of these cases where the method just doesn't work. So we were a little bit discouraged, and then what Simon was able to do is take a close look at this particular model that shows up in the literature, and it turns out that the model is indiscriminable. What happened is that, in the systems biology literature, they say they're doing design of experiments for model discrimination, but the Monte Carlo method takes so long that they are not doing multiple runs of their own work. They do a few runs and then say: look, we think we know which model it is. They are not actually proving which model is correct, and they were posing a problem where their f1 and their f2 are actually indiscriminable with respect to the error in their parameters. So actually this was just us not lying: at least we're not claiming that we can discriminate models when they are indiscriminable. Just to check ourselves, we threw away one of the two models that we felt was quite similar, and all of a sudden we were able to predict at a much higher level. OK, cool. So that's one thing we can do: we can explain what the best model is and why. When you're in these low-dimensional spaces of the Gaussian processes, explaining why is fairly straightforward, so that's something we like about this particular method — and we're much, much faster than any Monte Carlo method. We have made our code available online; it's been forked a few times, and so I guess we'll see what happens next.
With respect to that, there are some case studies that Simon has been doing with our industrial partner Bayer that seem to be working well, but I don't think I have their permission at this point to present what they're doing. OK, the next thing I want to look at is multi-objective optimization. I work with tissue engineers, and when I work with tissue engineers, they have to tell me exactly what to do, because I am NOT a tissue engineer. What they have is a bioreactor with only three degrees of freedom: they can pump material through the bioreactor at some flow rate; their medium is going to get eaten up after a small amount of time, so they're going to change a percentage of the medium every so many hours. So there are three things I can change: the flow rate, the percentage of the medium that I'm going to switch out, and how often I switch out the medium. What they're looking at is flow past the scaffold — they have this neat setup where they're growing bone neo-tissue, and they have a partial differential equation model that they think models well what is happening in the scaffold; it's a combination of creeping flow where you're in tightly constricted areas, and just normal low-Reynolds-number flow. Now, my collaborators want everything: they want to put as much material into that scaffold as possible — they want to grow as much as possible — but they also don't want to pay for it, so they want this to be as cheap as possible. The interesting problem that we have here, and I think this turns up frequently in other places, is that we have one thing that's extremely expensive to evaluate — in our case, doing an experiment means solving a PDE — and then we have something that's really cheap to evaluate, which is cost. Cost is easy: all I have to do is count up how much medium I used, and that's how expensive the run was. So I have an input space — these x1 and x2 stand in for my three variables — I can put values in, and I get out the result with respect to my objective functions. In my particular case I want to minimize both objective functions: I want to not be paying anything, and I want to have as little void space as possible by the end. So I've tried this combination of x1 and x2, I've tried that combination, I try a lot of combinations, and what I want in multi-objective optimization is called the Pareto frontier. What's happening at that black line is: to get better in f2 you would have to get worse in f1, and to get better in f1 you would have to get worse in f2 — that is the approximated efficient frontier. I want both; I'm not going to get both. When we asked the tissue engineers which they prefer, they said: one hundred percent, we want to know the whole thing — we want to know what the trade-offs are. OK.
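For concreteness, here is what "keep only the trade-off points" means computationally. This is a generic non-dominated filter for two minimised objectives, with toy numbers rather than the actual bioreactor outputs:

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset of (f1, f2) pairs, both minimised."""
    pts = np.asarray(points, float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some other point is <= in both objectives
        # and strictly better in at least one
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]

samples = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (2.5, 2.5)]
front = pareto_front(samples)
# only the trade-off points survive: at each of them, improving f1 must worsen f2
```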
Now, there is prior work — one piece came out just before we got this out the door — but we still did something new. Basically, what her contribution does is look at weights on these objectives: she takes all of the objective functions f, combines them with a weighting function, and then you can try different values of the weights and get some convex combination. That's a really good idea, and it works really well in some cases; there exist functions for which it doesn't work as well — but then there are also going to exist functions for which the idea I'm going to present isn't going to work as well either. They are different from one another and have different advantages.
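The weighting idea from that prior work can be sketched generically: collapse the objectives into one with a weight, minimise, and sweep the weight. The quadratic objectives here are toy assumptions; with non-convex frontiers this sweep can miss points, which is exactly the caveat just mentioned:

```python
import numpy as np

# Two toy objectives to minimise over a 1-D design variable
def f1(x): return (x - 1.0) ** 2
def f2(x): return (x + 1.0) ** 2

def scalarized_minimum(weight, xs):
    """Weighted-sum scalarisation: minimise w*f1 + (1-w)*f2 over candidates."""
    combined = weight * f1(xs) + (1.0 - weight) * f2(xs)
    return xs[np.argmin(combined)]

xs = np.linspace(-2.0, 2.0, 401)
# sweeping the weight traces out points on the (convex part of the) Pareto frontier
front = [scalarized_minimum(w, xs) for w in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

For these quadratics the minimiser moves smoothly from x = -1 (all weight on f2) to x = +1 (all weight on f1), which is the whole frontier; that smoothness is what fails on non-convex problems.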
But what we're going to do, instead of looking at the scalarization method, is look at the Pareto method. Good — now, this is a lot of math, and I promised not to buzz too much, so I will not buzz too much, but basically what you want to do in this Pareto method is evaluate your next combination of x1 and x2 so that you are driving both objective functions — in our case, in the good direction — somehow. You could do that either by improving a volume, or by improving this maximin thing. With respect to improving the volume: we want to find this new point, marked with the cross in the upper right-hand corner and labeled y, such that the shaded region has as big an area as possible — we want to find an x1 and x2 that push down as much as possible into the frontier. Now, the way this works — and this much, so far, is well known — is that you basically tile the space with rectangles and count up the number of rectangles that you're now going to shade; the reason this looks so ugly is that there's this probability-of-y-given-x term, which is my probabilistic understanding of whether or not there's actually going to be an improvement here. The contribution that Simon made here is this: it's well known how to do the expected hypervolume improvement if you have two black-box functions, or if you have two functions that you can write down; he's doing it now with one function that you can write down and one function that you can't — the black box. So that is his new contribution here.
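The rectangle-tiling picture for the deterministic part of that computation can be sketched as follows; the full expected hypervolume improvement would additionally weight each rectangle by the probability p(y|x) of landing there, which this toy version omits:

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-D Pareto front (both objectives minimised),
    measured against a reference point: sweep along f1 and sum rectangles."""
    pts = sorted(front)                     # ascending in f1
    hv, f2_best = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < f2_best:                    # each frontier point adds one rectangle
            hv += (ref[0] - f1) * (f2_best - f2)
            f2_best = f2
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
ref = (5.0, 5.0)
base = hypervolume_2d(front, ref)

# the (hypervolume) improvement from a candidate observation y is simply
# how much dominated area adding it to the front would gain
gain = hypervolume_2d(front + [(1.5, 1.5)], ref) - base
```

Picking the x1, x2 whose predicted y maximises this gain is what "pushing down into the frontier as much as possible" means.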
The other of the two methods I don't want to talk about quite as much, basically because it's a little harder to explain, but essentially: if you're looking at that point labeled f(x1), you take the smaller of the two dimensions marked with the dotted lines, and you say that's how much my improvement is. So it's just a different way of measuring improvement. They are different — there exist a lot of different metrics in this particular area; the multi-objective optimization people have developed these over the years — but in our case they're not so very different at all. What we're working with right now is a reduced-order model of what our experimental collaborators have developed; it's basically a low-dimensional ODE that's quick to evaluate. And what you can see is that the plots here are more or less indistinguishable: in our particular case, it's very nice that Simon has developed both this expected hypervolume improvement and the expected maximin improvement, but they're not so different. What he's done is start with these ten initial samples, and then his methods find 25 extra samples, predicting what to do next. If you really squint, you can see that the expected hypervolume improvement happened to sample more points on the top, and the expected maximin happened to sample more points on the side, between that 60 and 70 — not a big deal. However — and this is what I think is exciting — he is doing significantly better than genetic algorithms, which are commonly used in this sort of area. And in machine learning, what you often have to ask is: am I doing better than random? Because, you know, you go around and develop all these fancy things, but then — am I better than just a monkey throwing darts? He is better than a monkey throwing
darts. So, these are three metrics of whether or not your multi-objective optimization method is working well. I don't know which one I believe, except that I believe whichever one says my method is the best, right? In the first one I want to be as low as possible, in the second one I want to be as low as possible, and in the third one I want to be as high as possible; there's no consensus in the literature about which metric is best, so I just use all three — what else am I going to do? At least we can show the trade-offs that way. What's happening is that Simon starts with ten function evaluations — this is for the random method, for his specialized methods, which are the triangles in red and blue, and for the genetic algorithms as well — and then, over time, you can see that his methods are performing fairly well. So that's nice, but it's still only one problem, so probably not interesting enough yet. He went back to the same plots and took basically every test function we know of in the literature, and the plots look basically the same. What is of advantage here is that he's exploiting the fact that he has one black-box function and one function that he can get derivatives out of really quickly.
OK, now, one of the things that I mentioned earlier is that Gaussian processes have an annoying tendency to degrade when we go to many samples; they also have an annoying tendency to degrade when we go to many dimensions, and so we did try to test this a bit. There's a lot of stuff that's normalized here, but basically the GD, the MPFE, and the VR are the same metrics as before, except that we are normalizing the results with respect to random. So when you are looking at an increasing number of dimensions — and here what I mean by dimension is input dimension, because in the problem that my collaborators have, the input dimension is 3 — what if I have more design variables that I'm able to put into my optimization problem? What am I going to do then? I still only have two objectives; I still only have a black-box objective and an objective that I can evaluate; but I am increasing the input dimension. As I increase the input dimension, I absolutely expect that my performance is going to degrade: as the input space gets bigger, my predictive power is going to get smaller. So what we are doing is taking the predictive power that we get out of our methods and dividing it by the predictive power that we get out of random. Basically, random is also degrading very fast, and that's why you see in this first plot that, as you're increasing in dimension, the two methods that Simon has developed are staying there — they're performing relatively well compared to random — whereas the genetic algorithms are degrading and are actually not very different from random. The second measure we're showing basically because we mentioned that there are trade-offs: clearly, in the second measure, we are not improving over random. And then in the third measure, again — so what Simon has here is a method where he is able to predict what the next point is that I ought to take as my experimental measurement, which is a really, really valuable thing to be able to do. Not only is Simon able to say where I should do my next experiment to try to figure out where this entire Pareto curve is, but remember, early on with these Gaussian processes, we also have a probabilistic understanding of how good our approximation is. So in red we are showing the true Pareto frontier — this is the dotted line; this is what we want to get — and you can see that the probabilistic approximation with only ten data points is not great, and yet the red mostly falls within the uncertainty regions, as we would expect. So basically, the model isn't very accurate, but the model also knows that it's not very accurate — that's kind of fun. We increase the number of data points that we're measuring, the noise goes significantly down, and we get a better and better approximation. So that's great.
OK, so then what we can start thinking about is the expensive-to-evaluate problem. I think one of the really interesting things that all of us have to deal with in this area is legacy code: we tried and we tried and we tried to get this PDE model to run on any cluster but the one they wrote it on, and we basically roped in every person in our computer support team in the Department of Computing until they all hated me — and nothing. So we can only run on this one cluster in Belgium, and we were only given time to run an extra ten points or something like that. What Simon's result says, with fairly high confidence, is that actually they already had a wonderful point: they already have that red point in the upper left-hand corner anyway. You can try out this work — it's been accepted to IEEE Transactions on Biomedical Engineering; it's pretty cool. OK — I want to be
respectful of your time of course you
know I get an invitation like this no
what I tell you everything that my group
is doing right and so what I will only
mention rather briefly is work that
johannes vive who is another PhD student
in the group has been doing with
collaborators at Schlumberger so this is
much if you've seen other process
systems engineering talks this is much
more traditional process systems
engineering little less machine learning
but basically degradation matters right
so what we have is the famous state task
network where we are trying to produce
assorted products using these sort of
unit operations and we have all sorts of
choices to make right we can schedule
processes on the machines we can operate
them that sort of slow normal or fast
sort of operation speeds and then we had
better choose when the maintenance was
going to be now here what I mean is I
mean preventative maintenance right
because if you choose not to do
maintenance well then you're going to
have to do maintenance later it's going
to be much more expensive and it's going
to be corrective maintenance right so we
don't
actually hair just about making
preventative maintenance as cheap as
possible we actually care about the
trade-off between preventative and
corrective okay so this is as
mathematical as we'll get in this
particular section but basically what we
have is we have process variables right
these are basically balance equations
etc etc we have maintenance models and
then and this is fairly similar to work
that was even happening in the late 90s
we're gonna have a health model so the
health model is going to tell us about
how our particular process is is is
going just to mention that previous
people have fought about this this much
at least for since since the late
nineties okay so this is this is great
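The three coupled pieces just listed, process variables, maintenance decisions, and a health model, can be caricatured in a few lines. Every name and number below is a hypothetical illustration, not the actual formulation from the talk.

```python
# Degradation per period and production per period, by operating mode.
# These numbers are invented for illustration only.
RATE = {"off": 0.0, "slow": 1.0, "normal": 2.0, "fast": 4.0}
PRODUCTION = {"off": 0.0, "slow": 5.0, "normal": 9.0, "fast": 12.0}

def simulate(schedule, maintenance_periods, s_max=10.0):
    """Run a schedule of operating modes; preventative maintenance resets the
    health signal, and crossing s_max counts as a failure (corrective case)."""
    signal, produced, failed = 0.0, 0.0, False
    for t, mode in enumerate(schedule):
        if t in maintenance_periods:
            signal = 0.0                 # preventative maintenance resets the signal
        signal += RATE[mode]             # health model: degradation accumulates
        produced += PRODUCTION[mode]     # process model: production accumulates
        if signal > s_max:
            failed = True                # corrective maintenance would be needed
    return produced, failed

# Running flat-out with no maintenance fails; one well-placed maintenance saves it.
out_no_maint = simulate(["fast"] * 4, set())
out_with_maint = simulate(["fast"] * 4, {2})
```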
But what we want to do now is to combine this process-level scheduling and planning with more sophisticated degradation modelling. I don't know if you know this, but you all, in addition to having an amazing chemical engineering department, also have an extraordinary industrial and systems engineering department just across campus, and the people there have been doing a lot of really great work in degradation modelling for quite a while now. It's amazing stuff, but for whatever reason these fields don't really come together all that often; you don't often get the sort of entire-systems view of saying, let's look at scheduling, planning, and degradation modelling all together.
Okay, so what is degradation modelling? This was less clear to me, since I come from the operations community, so I'm less knowledgeable about what degradation modelling is. But basically, what you assume is that you have this degradation signal, and over time, as you're using the machine, the machine will degrade until you decide to do maintenance. If you pass over this sort of S_max, well, you're in trouble, and in various amounts of trouble: either the unit is down, or something worse happens, or whatever.
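The picture being described, a stochastic degradation signal drifting toward an S_max threshold, can be sketched as a drifting random walk (a discrete cousin of the Wiener-type models used in this community). The drift, noise, and threshold numbers here are made up.

```python
import random

random.seed(1)
S_MAX = 10.0

def crossing_time(drift=0.5, sigma=1.0, horizon=100):
    """First period at which the degradation signal passes S_MAX, or None
    if it survives the whole horizon. Parameters are illustrative only."""
    s = 0.0
    for t in range(1, horizon + 1):
        s += drift + random.gauss(0.0, sigma)
        if s > S_MAX:
            return t
    return None

# Monte Carlo estimate of the crossing-time distribution, which the
# degradation-modelling community can often characterise analytically.
times = [crossing_time() for _ in range(2000)]
p_cross_by_30 = sum(1 for t in times if t is not None and t <= 30) / len(times)
```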
But what these degradation modelling people have done over the years is develop this really nice technology that uses stochastic processes to predict when you're going to pass over this S_max, with some probability distribution. They have a lot of really neat mathematics that they've developed: if you can posit a stochastic process for them, they are able to tell you when you are going to pass over this S_max. Okay, so that's really cool, but I've got a bit of a problem here: I've made this look awfully simple, with this flat line, but this is a scheduling problem, and I might be using my unit in different ways over time. Sometimes the unit is off, sometimes it's operating in a way where maybe the unit won't degrade very quickly, sometimes it's operating in a way where the unit will degrade very quickly indeed. And so, although if I already know the history of the machine I will maybe know what the degradation signal is, probabilistically, or at least that's the assumption in the degradation modelling community, I might not know it from a process operations point of view. So we are only using what is the simplest thing in the process operations literature, so-called robust optimization, where we give ourselves some probability of failure, and the way we do that is we introduce this new tuning parameter alpha: if alpha is, say, one half, then we are only looking at one very particular point, and basically changing alpha is going to change how conservative my estimate is. So we're looking at the state-task network, and what has been done already is that people thought about how to integrate unit health and maintenance scheduling and then look at multiple operating units. What we're going to do is use robust optimization so that we can sort of guarantee, probabilistically, whether or not everything is going to be okay.
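One simple reading of that alpha parameter (a sketch, not necessarily the paper's exact formulation): treat alpha as selecting which quantile of the simulated end-of-schedule degradation the plan must keep below S_max, with alpha = 0.5 checking the median and smaller alpha checking a higher, more conservative quantile.

```python
import random

random.seed(2)

def end_signal(n_jobs, rate=1.0, sigma=0.5):
    # Degradation accumulated over a fixed job sequence (hypothetical numbers).
    return sum(rate + random.gauss(0.0, sigma) for _ in range(n_jobs))

# Many simulated degradation outcomes for the same schedule.
samples = sorted(end_signal(8) for _ in range(1000))

def quantile(sorted_xs, q):
    i = min(len(sorted_xs) - 1, max(0, int(q * len(sorted_xs))))
    return sorted_xs[i]

def safe_under_alpha(alpha, s_max=10.0):
    # alpha = 0.5 checks the median signal; smaller alpha checks a higher
    # quantile, i.e. a more conservative estimate of the degradation.
    return quantile(samples, 1.0 - alpha) <= s_max
```

Under this reading, the same schedule can look safe at the median but get rejected by a very conservative (small-alpha) check, which is exactly the conservatism being tuned.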
We're also going to parametrize just how expensive it is to robustify ourselves against uncertainty. So what's happening here is, as I mentioned, if I already know the sequence of operations, then, at least in the degradation modelling community, they do assume I can predict what's going to happen over time. However, what we're going to end up with is a bunch of different jobs in a row, and then we're going to evaluate, over and over again, for lots of degradation signals, how many of those signals end up crossing the threshold. This we can do many, many times, either with historical data or by just generating lots of possible signals. And then what Johannes does is ask: well, what is the price of being conservative? The more robust I am, the more conservative I am: let's just do maintenance ASAP, because maybe something will fail and I'm afraid of that. So as the alpha parameter increases, we get closer and closer to the deterministic answer. The deterministic answer is going to be extremely brave, right, to think that we can just push things to the very limit, so the probability of failure is going to shoot up; that's that peak. So as I increase alpha, I'm going to be modulating my probability of failure, and then I can do what's considered, in this field, to be talking about how expensive it was for me to robustify my process.
So as my probability of failure increases, the cost of doing the preventative maintenance decreases. I mean, this makes a lot of sense, right? Basically, if I'm going to have an extremely expensive process where I'm doing maintenance constantly, I'm probably not going to fail. That's really useful. So what Johannes actually ends up doing: we don't know what this value of alpha should be, how we should tune this parameter, except that we know, maybe, the cost of preventative maintenance and the cost of corrective maintenance. So what we can do is just solve an optimization problem that weighs the cost of preventative maintenance against the cost of corrective maintenance. Now, this is honestly a very long-winded way of saying that we get right back to the same sort of Bayesian optimization, the same Gaussian processes, as we were using before, because what we have on the right-hand side is the cost of failure, and on the left-hand side we have the cost with respect to doing the preventative maintenance as we're planning to do it. These are two expensive-to-evaluate things, and basically what we do is go right back to using Bayesian optimization again: Bayesian optimization is doing the Gaussian processes and, at the same time, predicting the next best place to sample from an optimization point of view. What we show here in blue is that we do many, many runs, and consistently the Bayesian optimization method knows how to balance between the corrective and the preventative maintenance. This is also available; I'm brave enough to put the name of the journal before it's accepted, which I don't normally do before getting the first round of comments back, and his code is also online. So basically what he's doing, and I flew through his project and apologise for that, is taking historical data, developing a sort of typical process operations model, also developing a sort of typical degradation model, and then melding them together using robust optimization, which is quite cool.
So thanks so much for having me. I want to end now in case there are questions, and I hope there are questions. You can try anything that we've done; everything that we've done should be reproducible. If you run that code and can't get exactly what we get in our paper, we have something wrong, and please let us know.
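Since the slides aren't reproduced here, a toy version of that Bayesian optimization loop may help: a tiny pure-Python Gaussian process with an RBF kernel, minimizing a made-up noisy maintenance-cost curve over alpha by repeatedly sampling where a lower confidence bound is smallest. The cost function, kernel length-scale, and noise levels are all hypothetical stand-ins for the expensive evaluations in the talk.

```python
import math
import random

random.seed(4)

def tradeoff_cost(alpha):
    # Stand-in for the expensive evaluation: total maintenance cost as a
    # function of the robustness parameter alpha (hypothetical shape).
    return (alpha - 0.3) ** 2 + 0.05 * random.gauss(0.0, 1.0)

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for the small GP systems.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_posterior(xs, ys, xq, ell=0.2, noise=5e-3):
    # GP regression with an RBF kernel: posterior mean and variance at xq.
    k = lambda a, b: math.exp(-0.5 * ((a - b) / ell) ** 2)
    K = [[k(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    w = solve(K, ys)
    kq = [k(x, xq) for x in xs]
    mean = sum(wi * ki for wi, ki in zip(w, kq))
    v = solve(K, kq)
    var = max(1e-9, k(xq, xq) - sum(a * b for a, b in zip(kq, v)))
    return mean, var

xs = [0.0, 0.5, 1.0]
ys = [tradeoff_cost(x) for x in xs]
grid = [i / 100 for i in range(101)]
for _ in range(10):
    # Sample next where the lower confidence bound (mean - 2*std) is smallest:
    # the GP surrogate balances exploiting low predicted cost and exploring
    # uncertain regions.
    def lcb(x):
        m, v = gp_posterior(xs, ys, x)
        return m - 2.0 * math.sqrt(v)
    x_next = min(grid, key=lcb)
    xs.append(x_next)
    ys.append(tradeoff_cost(x_next))

best_alpha = xs[ys.index(min(ys))]
```

The lower confidence bound is just one standard acquisition choice; the talk doesn't specify which acquisition function Johannes actually uses.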
[Applause]
thank you. Yes, yes, right: there's nothing bad about f2; they are both models. This is, I mean, the classic indiscriminability problem, right? f1 and f2, within the noise that is within the system, are identical models. From an engineering point of view, what would I do? Well, from an engineering point of view, either I would find new input or design variables that I could tweak, or I would find new measurements that I could take, or hopefully I would try to lower the noise in some way or another, or I would develop the model in different domains. But I am in a computer science department, and so those are not the conversations we have; what my students are experts at is developing methodologies. So the reason we cheat and just drop f2 is to check that we are not going insane and that our method isn't just performing wildly, right? So, I mean, yes: if I have a system that is indiscriminable, I'd better do some really clever engineering right about then; here we are only trying to design the computational system that's going to tell the models apart.
Yep. "How is it at all reasonable?" Yeah, so it's a good catch, because there's two answers to that. The first is: we're not actually doing all that great, we're just doing better than random. What I am doing here, in the line under the GP and the line under the PFE, is sweeping a lot under the table, and I am comparing to randomly deciding where the next point I want to evaluate is. So even though I scale really badly in terms of dimensions, I don't scale as badly as random scales. That's one answer. The other answer is that there's two types of dimensionality here: one is the input dimension and one is the output dimension. What we are using here is a very particular test-case function that's well known in the multi-objective optimization literature, and the reason we're using it is that it's the only function we know of that scales really nicely with respect to dimension, where its dimension is just a tunable parameter and then the problem gets bigger. So there's probably some sort of special structure in there: as big as you make this function, there are always going to be two outputs, but there might be more and more inputs. So is the GP finding some sort of pattern in the data? Possibly, right? So I like this, but I do not claim that this happens all of the time.
Please, yep. Okay, that's a really excellent question, because right at the beginning I said that with the GP we are going to make this normal assumption. With heteroscedastic data you can do some amount of normalization, but it's very, very hard. Whether it's a reasonable assumption or not is an excellent question from an engineering point of view. The results show that for at least some of the test cases we have tried, it's a reasonable assumption, but that's a dangerous thing, right? Because machine learners regularly get themselves in trouble this way: you know, like, "oh, this seems to mostly work," and then I'm predicting really weird stuff. So it would have to be very problem-specific. It's a good point; I mean, it's something that we have to be looking at, right? Yeah.
aha
This is, okay, that's a really good question: could I use it to find new models? At least at this point, the best I can think of with respect to that is the kind of work that happens often in chemical engineering: we have one model that seems to work, and then we want to move to a very slightly different scenario, where we want to change a few of the inputs, and then we want to see what is different. The best I can think of, where Gaussian processes might be able to contribute something new, is if we use that previous model as our prior knowledge, and then, with each new sample, we figure out how we deviate from things. Once we know how we're deviating from the old model, using our now-posterior distribution, we would have some knowledge about the new function. Perhaps, because GPs are nice and you can do all sorts of nice analytical things with them, you could understand that in some sort of sense. But I agree that there are an awful lot of big questions; there's a lot of work to do.
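The idea in that answer, using the old model as the GP's prior mean and learning only the deviation, can be sketched like this. Both "models" below are invented one-liners, and the GP is a bare-bones noiseless RBF interpolator fit to the residuals.

```python
import math

def old_model(x):
    # Previous, trusted model (hypothetical): becomes the GP prior mean.
    return 2.0 * x

def new_process(x):
    # The slightly different new scenario that we can only sample.
    return 2.0 * x + 0.5 * math.sin(3.0 * x)

def k(a, b, ell=0.5):
    # RBF kernel.
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for the small kernel system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Fit the GP to the *residuals* new_process - old_model, so the prior
# knowledge enters through the mean and the posterior describes deviation.
xs = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]
resid = [new_process(x) - old_model(x) for x in xs]
K = [[k(a, b) + (1e-8 if i == j else 0.0) for j, b in enumerate(xs)]
     for i, a in enumerate(xs)]
w = solve(K, resid)

def posterior_mean(xq):
    # Prediction = old model + GP posterior mean of the learned deviation.
    return old_model(xq) + sum(wi * k(xi, xq) for wi, xi in zip(w, xs))
```

With only six samples, the combined prediction tracks the new process much more closely than the old model alone, which is the point of putting the prior knowledge into the mean.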