Thank you.
So this is my first visit to Georgia and
really excited to be here and
give this story.
OK so.
Today we're talking world language
it distills mean product and
fiance language it bestows
the same problem which is studied
since nine hundred seventy
two it's a generalization of
string added distance where the world
is more in the dark means less product.
Meant less Mattocks multiplication this
is a valley and if you can solve mean
plus mathematics multiplication fast and
even have a faster the Guardians for
computing albeit shortest but even graphs
So look at this fundamental questions and
look at their connections and
many implications but
before I get into the main part of the
talk let me do a little bit of backstory
as sunfish mentioned divers
are between two labs in Maryland and
they're like many labs went through some
transitions and similarly for the D.N.D.
and I was made to do a lot of
data crunching for to look at.
And there were my Some of my colleagues
from data was grooves they also mentioned
this issue that they were facing So
the issue was they had large X.M.L.
files and for some strange reason those
experimental phase will not come by.
So we started looking at what is happening
and this is what exactly we found so in X.
Some of documents and also in H.T.M.L.
documents are the home pages that you.
Do so in H.T.M.L. documents.
You get to see these open specs OK So
this is an open barrack this is a closed.
And if an exam and document has
to be valid These are open and
close tags they must match our
we know we are formed to be.
Their problem that we've
been facing it in T.
was that many of the six The MILF I use
they had been gathered from different
sources they have been integrated and some
areas have happened that were bad were.
Going to.
Muting to mismatch in the open and
close spec sequence for
example this is a document which is
well balanced with value is a valid X.M.L.
file you open a title tag.
You open artist that closes
artist you open the publication.
Back So this is well formed but
this one this is not well formed so
here it should have been an open
tag whereas there is a spec and
all external compilers there are very
draconian in nature whenever there
is this kind of error is happening they
just don't process stuff I there is a has
had the first time when they get good to
see and something that is not matching So
this was happening and it didn't even
then we found out that this is nothing
unique to a dainty this has been reported
quite a beat there are lots of X.M.L.
letters and even more H.T.M.L.
for example Google reported that out
of the exam of the file is there for
the second largest cost.
Because of X.M.L.
it is about fifteen percent of the time is
a mismatch between opening and ending X.
so I wanted to formalize this problem so
let's look at this and
example so this is again the next summer.
Which is not valid but how can you
make it valid if you look at this auto
start if you did this Arthur stack just
making one added We can make this X.M.L.
document valid so we suggested
let's do this minimum added distance
Mr Pierce So if you have an X.M.L.
file that if you look at the stack
sequence if it's if it is not well
formed let's try to make it well formed
from a making minimum number of edits on
it what kind of edits we can insert tags
we can delete so you can substitute X.
We also did lots of experiments and
we found out well this minimum I did this
and this repeals this part from the SO
in this plot if you go you have
better relations after that.
Correct it looks more similar
to the golden standard.
Five And what of this though these
are some of the here are sticks in that
different browsers like Internet
Explorer Google Chrome they all have
mechanisms to do this kind of editors
because otherwise many of your H.T.M.L.
files you go to the waves.
Are not open.
They contained Yes Yes And
they do a lot of greedy had to stick so
they had to combine
the grid here to stakes but
we found no minimum at a distant
district is really to perform with.
So formalizing a little bit more so
example data formats or
maybe you just want to correct the X.M.L.
files and look at the content and
the structure together the data
format it can be present and
by some context free in language so it's
a formal language want it's free language
and if it does want to
look at the back cyclones
that back sequence basically its presence
and what is known as Dike language.
So back language is nothing but
the language of real balance parent this
is a particular subclass of context.
And when you're talking about X.M.L.
leafleting we are trying to say well you
were given an X.M.L. file which may not be
valid member of the language the context
free language of the day language so
let's make minimum number
of edits on aid so
that after the edits added that file
will be a member of the language OK
six some later do all the many
kind of data but if I have
all this period academia I would probably
have not encountered this kind of so
I encounter this kind of it by peculiarly
because I spend some time in industry so
let me I would have thought about
some computer biological actions and
computation of a given
someone in a sequences D.N.A.
sequences often you want to
find no underlying structure.
As one search problem which
is very basic to computer
do you want to find the second to
restructure it in Issaquah answers.
Me We are all only very interested in
graphical gardens when many of you
are interested in gravel got of them's.
Shortest but is one of the most
fundamental question that
we would like to answer and we'd like to
have a very fast regarded him for it.
Or maybe just metrics multiplication.
So what I really want to propose is
that all this problems the X.M.L.
in a collection might look very
different from the others but
they're really not this connected
they're very well connected OK so
that's with that goal in
mind let me define the.
First problem language at a distance
that appeared in my data.
So what is language added distance
language added distance it's
a generalization of two basic
problems in computer science
one is going to extreme Grammont by
saying there is string at a distance so
going to three grammar passing nothing but
you have a context of grammar so here
is the graph this is the grammar for date
language well balance parent is this and
you are given a string and you just want
to know whether the string can be positive
guarding to the roots of the grammar this
string can be parsed the string cannot be.
And the string at a distance of
the game here you are given two strings
one reference being one input
string you want to make minimum and
it's on the impulse string insertion
deletion and substitution making this
minimum anything want to make the impulse
string look like very different string So
for example here if you
delete these capital T.
if you substitute C.
and then you insert the here with
three Idiots you can make the input
string look like that if the string.
String at a distance computation and has
been studied a lot in theoretical computer
science but also in Applied areas like
in natural language processing and
computation about the D.N.C.
language it distances generalizes board
this problem by seeing and string and
in this sense how so far string at
a distance we are just comparing with
a single reference ring when input string
comes in language at a distance we do have
a lot of strings to compete against So
women input string we say well.
What is the minimum it is you want
to do on your own input string so
that it looks like at least one
of the string in your sit so
instead of one we have many differences
for the so that's what we mean and
generalize a string at a distance and
how does it generalized by C.
so in passing we are really trying
to answer as you know one question
of whether the string buses are not and
here we are interested in finding the
exact distance so how many of its early
career in your string such that they did
that distance and if the string with us.
OK instead of zero zero and run when Sir
you want to find the minimum distance
from the strings that are generated
by the language with string.
So that with and without it distances
generalize a string it spins.
And this problem this language
that it dispenses tries to
answer questions of the following form
given a moral model which is specified by
a formal language does he or
did does feed the Morton.
And so the our model is
the Lang the language stream so
do you want to know whether
your data fits the model or
what changes are required in your
data to make it fit the model and
this type of question so we were
interested to study it from the data
quality point of view external correction
but it originated from the study of it I
got it in parsers from compiler
optimization it has many applications and
computational linguistics as
well where you want to find and
correct mistakes grammatical mistakes
in computer Chanelle biology where
we can find distance between a gene
sequence and a mallet men disease which
is modeled by a gram only an hour and
if willing and so on.
So has it done so this language your
distance problem it has a very old history
so it was studied as early as in one
nine hundred seventy two by Peterson
who studied it for
the purpose of designing a parser and
they gave a dynamic programming
algorithm that ran scene in Q Where N.
is the streamlet with a quadratic
dependency on the grammar side.
So this time there is dependency of
quadratic dependency was later improved
by Meyers in one thousand nine
hundred five but the Q.B.
dependency on the stream enjoyment since
this problem has lots of applications
there have been many fast you listings for
it to those who listings did not come
with any kind of performance count so
what we are interested in we are
interested in to design a faster faster
than in Q It should return either
an exact answer to the minimum distance
on an approximation to lead for a lag with
it distance problem so that's our goal.
And this is the first set of results
that I will briefly mention will
show you the garden and
how to analyze it very briefly for
this set of results so the so
we gave in the first sub cubicle got of
the foreign language at a distance problem
this is from two thousand and fifteen
which fire which computes the distance
approximately near optimum so
we can give an M one plus minus abseiling
approximation and the running time
is indeed the omega are divided by
their sum of silent dependency So
what is Omega only guys the exponent of
us metrics multiplication which is about
two point three seven three also so
there is another parallel result in
seem to cause and fifteen which shows
that there cannot be any non-trivial
multiplicative approximation guardian for
a language at a distance problem.
In time lives than interview.
So in that sense this interview my god
the god of them that we have is up to.
Yes So Herc So this is a million
metrics multiplication so
it says that if you can have faster and
faster language at a distance problem with
non-trivial metrics multiplication you
will have a faster guarding four
billion matrix multiplication and
the same to me could give someone plus
minus Absalom approximation algorithm for
computing all shortest but
density graphs and as a.
Result of many other distant best
optimization problem one graph so
that with a connection from language
there distance to grap problems
also we get the fast software we
go guardian for maximum likelihood
pursing us took us to context free grammar
so what is maximum likelihood parsing of
the request of context to grammar will
still gusty context free grammar it's
a generalization of context where we each
production rules so we have some rules for
every grammar so here we do every every
rule you have some conditional probability
how lately you have to apply this rule and
then given a string
you want to know what is the most likely
parts of article listing according to
this given distribution so
this can be solved in N.Q.
dying by modifying the willow and see
where you go going to them for parsing.
And for our dignity see if the person but
there has been had been no progress
enough to make us up to legal guardian
I don't exactly approximate for
us because the context place.
Struck us you're going to me it tries to
answer a question of the falling foam
here given a problem generating model is
took us to conduct three Grammont can
we view it as a generated model so
given a probabilistic generative model
what is the most likely explanation.
Because we are trying to find a past
that is maximum likelihood pursing So
what is the most likely explanation for
your observation.
So these kind of lies of the foundation
of natural language processing so
struck us the context remarried
generalize us here in my government to
generalize us different types
of adopting modeling Lake.
Has been used a lot in biology and
in other communities.
So we give the first give we go to
gardening for it giving me an optimal
performance approximation OK so
any question.
So let's look at Q We go gardening for
language a distance that was given
by discern more than forty years
back in nineteen seventy two.
Service over there is really simple so
you have an N.
square sized to make programming table
every instance in the dynamic programming
the presence of particles so
you have the string length N..
And every entry in the dynamic programming
tell it basically the presence of
particulates substring you want to know
whether this stuff string from I DO J.
Gagnon we parsed are what is
the minimum distances minimum and
if we want to need to do in
that particular substring.
So if this sale presents X.
that means well OK so
if this I do this the prison see Z.
Z.
That means starting from
particular entries see
who can parse the substring I
do GA With an eighty deg C.
with a minimum number of eight its of C..
So how can you compute that so
you will look at all possible
break points like I do K.
minus one and K.
two G.
of the string A to G.
and then you look at OK so
I took a minus one I can see it
starting from A with the editor of X.
I can parse K.
to j starting from B.
with the editor of Y..
Then of course like I can parse I DO J.
starting from C.
because it says I took a minus one B.
buses K.
to J.
so I can I do J.
starting from C.
with a deed of how much.
Explicitly.
And if there are many scores
here then I will be taking
the minimum of expressway because I
want to find the minimum added distance.
In that's a very here that's a very
good question in any context for
example you can convert it into
the stream scheme normal farm and
that is your right hand side we have to
non-terminals and that is essential for
the standing program.
OK so far X.M.L.
It's a special type of grammar move
with this a diagram on the day.
Showed a few slides before.
The X.M.L. back sequence and when so
this was the grammar so you can convert
it into a Chomsky normal form and then
you can do that and I think programming.
Then we have to read that gives you
the same that generates the same language.
City directly yes.
Yes So we deal with it
directly on the X.M.L. Texas.
I mean you know.
What I mean holiday and
it's a happening yeah.
And.
Yes I will look at all speeds and
then you can then I can manage it yes.
With us because.
So.
Some of the tags this is this father
follows this grammar right so
you have different open parent this is and
OK So
let's look let's look at this
grammar I want is this grammar say.
OK what is grammar saying so next time I
fail you have opened and closed so each of
these open back you can think about it as
open parent this and close stacks are you
spelling this is so this grammar is
basically saying you have two types of X.
two takes up and that this is this one.
And this square one but
you can have many others so
what is the saying forty yard external
file to be well balanced You must have one
something well balanced then followed by
two matching balances open and close.
Here again something you need
something to be well balanced
here Richard we have an open open parent
this is followed by a close Panther says
well what you can have you can have
something that is well balanced and
something that is well balanced.
OK so this can be shown that this grammar
basically generates all stree all balances
and strings that are well balanced So
this is a very typical question in
undergraduate I'll gardens where you are
given a sequence of parent hisses and you
ask can you tell me where the decisions
of balance is as well balanced or
not so as a very typical answer that you
just use a stack you insert open parent
this is in the stagger anywhere there is
a close parent this is take the top of
the stack match it if you can match it all
the way down there in the strict sequences
balanced and this is not in John skin on
my farm because these are kind of the CMOS
that is generating in the string and
these are the entire mills but
you can convert it into Chomsky form for
the Johnson on the phone we learn to look
so in duty which will be more complex but
then you can use the dynamic programming
to find the exact sequence of
changes exact sequence of edits that
I declared on the old X.M.L. tags.
To remind you.
This was one place.
For this part so there are a lot of us
would say is that if you want to use
dynamic programming even for these grammar
you cannot do anything better than in.
A century.
Well.
Yeah so if I want to recognize
computer aided distance based on this
grammar welcome to Day to disinvest on
the John Johnson form of the grammar
I didn't read the same language.
So there's a density question
OK so this was the dynamic programming so
for computing for
every idea I'm looking at all break
points I took a minus one and K.
to J.
just combining using the minimum scores
you were just adding up this course X.
and Y.
and then here taking the minimum and
you get a combining this non-terminals
the envy because you have this production
see you saying I took a minus
one can be generated from E.
and j can be emitted from B..
So why is it and kill because you
have in square size table for
computing every every entry you have
to look at possibly order of and
break points so the total complexities
are in Q And here I am not showing you
the dependency on the grammar where do you
do have to go through all the grammar so
there is a dependency on
the grammar errors OK So
that was a dynamic programming of guarding
OK I forgot to mention one critical
part of the standard programming.
Do you see this pink I learned.
Being colored productions So this being
colored productions are some air producing
rooms whatever they are producing
rules this way or the normal rules for
the grammar if you would have
just used this rules that means
you that means you haven't gone daunting
to do any edits but maybe you have to
do some edits so these rules are allows
you to handle these edits for
example in the original grammar this
non-terminal Capitola that was generating
the scene will be now I'm lifting
this non same non-terminal Capitola
to generate the scene will be as
well what does that mean I mean I'm
allowing substitution so
I can substitute if need be but whenever
I'm doing substitution I have to be added
cost so that's why the school is one.
OK so here this is a this is for
substituting a with the similarly We'll
have it was originally generating B.
but now I allow will to generate is well
and because they allow that I have to
pay for the substitution cost so these
are production rules for substitutions
Similarly I can have admission
reproductions for insertion and deletion.
And then at the end we
are really trying to find
the finder we are seeing such that
the total score is minimized.
So I originally productions
all have school zero and
this admission and productions have the
cost which reflects the added cause that
you will be paying for
substitution or insertion or deletion
Well even even for normal parsing there
will be dependency on the grammar Ses
because you have to go
through this productions but
this admission rules were
causing the dependency from
Lynyrd required reading in the grammar so
for that.
Improved by my own.
OK so
now this is an every programming clear OK
so now we'll view this dynamic programming
in alt and it we still view those than
I mean programming computational as
a way of computing transitive closure or
special paper metrics multiplication.
So basically I'm going to say the same
thing as a fit for the dynamic programming
you'll have a metrics Initially you'll
just have the diagonal entries they will
keep multiplying and you will compute the
transitive closure and at the end whatever
your transitive closure metrics looks like
that will be exactly the same as if you
have computed the dynamic very program
to the end and have looked at that
at every programming table so
what does those metrics multiplication so
every entry is the spear one non-terminal
and one score so saying this is X.
means basically a front string substring
from I took a minus one it can
we part starting from here
with the school of X.
kid to this one is sinking into J.
can we starting from B.
with a score of Y.
when you do metrics multiplication
from N B U General C.
because there is this production and
from X.
and Y.
you generate expressway and instead of X.
and if there are multiple
search Lake Effects one excuse
me while you were on the way to you
will be taking these last weight.
OK so it's all donated we're viewing the
same dynamic programming a beautician So
why does this alternate view is
going to help solve goal is to be.
Alone.
Yes.
Yeah.
So I.
Saw I had through some of the same or
similar saying so this this is America
are dismal numbers mean after Of the six
months the annual compute the transitive
closure so why does this viewpoint
really helps in this viewpoint has
to help us in designing faster algorithms
then we have to answer two questions so
fast we have to show you how to compute
this matter explored at first and
then I have to show you have to complete
the transitive closure first OK So
let's look at how to compute each metrics
product first so far that first of all
it can be shown to have already stressed
weight of the that we are computing mean
our class so this particular type of
magics multiplication that we have
it can be computed in same time as
computing mean class metrics computations.
So remember there was
a mean plus in the title.
So what is mean classes easier product
metric so then go to compute digest entry
of the product metrics you look at
all the breakpoints and then look at.
In your normal logical multiplication
you would have a multiplication here and
you will be summing up so
now Sam is replaced by mean and
multiplication replaced by less.
I'll be assured this but it's basically
you want to go from I to move to the node.
You are adding up and
you're taking minimum that's why the mean
plus product is basically can valid if you
know how to complement less product
first you will have a faster.
Yes but so what do you know about it
we know that we can even can cause
we can compute meaningless Magic's
multiplication in time which is in Q..
Or to discredit or logon and
they're missing results and we Liam's but
this does not give us any sort of
cubicle going at them so you get.
Savings over and kill but
this does not give you any truly
self cubical going to them and
if you're wondering why you cannot have
Fosamax multiplication Because mean plus
brought up it's not a world when it's over
say meanings to all the techniques for
Magic's multiplication does not apply here
so means less productive basically you so
far what we know is order of it and
you will call it basically order in Cuba.
And this one now standing up and question
whether me call computing means less
product suppose all the entries are indeed
does and they are between one and
in when is the dimension
if it can be solved.
Because if you can solve it you can solve
all the shortest but in the love in Q.
and A bunch of other problems OK So
we haven't really improved turn our
director and COO Well then maybe if we was
approximation it can help and indeed
there are some of the Guardians from one
thousand nine hundred seven which gives
a good approximation of gardening farming
class product and that garden
proxy missional garden ran scene
into the Omega time when we made eyes
exponent of metrics multiplication so
maybe we can use approximation for
computing each of the product first.
OK So let's look at
the second question computing
transitive closure just generally if you
can compute each metrics product for
us computing transitive
closure is not a big B.
because it will just grab
the metrics in your dick and
then it's great again Square
again logon times you repeated
if you can compute each of the product for
another factor of log and
often you can even remove that you
can compute the transitive closure.
But that's not the case here very strange
because here the metrics month that
we have is not even associate the.
So because of this non diagnosis so
when you
are multiplying these two matrices
because you have this production C.
goes to A and B.
you can multiply these and we get C.
and then maybe there is
an other metrics with as D.
and now C.N.D.
can be combined and you get some E.
But if you try to directly combine B.
even that would be there may not be
an introduction which combines B.
and D..
Which means that this matter it's
multiplication is not associated and
that's a big problem because not
associated with the means there is lack of
flexibility you do not have
the flexibility of just grabbing for
example if you have a very long history
that will imply you have to say you have
to multiply the matrices in a very
particular odd you might have to multiply
the matrices in Bain's your simple logon
time squaring is not going to work.
So what does that mean that means
transitive closure computations
mean it's a long time.
It's not easy to say that
the transitive closure time is same
as metrics multiplication time and
the second thing is that.
Even if there is more literally
an approximation we are trying to use
that proximately approximation algorithm
for each product even if there
is more there only an approximation
that Iran will propagate maybe N.
times and that will blow up OK So
what do we do.
So back in nine hundred seventy four and
is the valiant in his Ph D.
thesis develop these I missing algorithm
for context free grammar parsing where
the improved for POV scene and the running
time from N.Q. to into the Omega and
there he showed basically he
considered this kind of non associate
the products and he showed how to
compound a computer transitive closure.
For metrics multiplication and that this
kind of non when you don't have the CD
will be in time same as computing
single metrics product so
it's a very nice guy there are a simple
very nice garden just based on divide and
conquer but now in divide and conquer
you generally divide the problem into
these joint parts here you see divides
into some overlapping birds and
then overlapping mattresses is very
important because all the savings
are coming from this overlap so if you're
interested I can talk about it later.
He does that so that this one hole so
here we have more complex
operations because we are taking
mean it's not simple passing but
we can use a valiant garden we can extend
it and we can show that time to completely
compute transitive closure is basically
same as dying for a single product.
So that was good but
then what about the error propagation
of an error propagation is cannot
be handled by balancing it so
if you want to plug in some known
approximation I'll go to the information
class product and use violence I'll go
out of their legs tensions of L.A. and
sell garden you can construct examples and
sure that world that approximation alien
propagate maybe not in times of out into
the like order into the silent times and
so on so it's not going to give you
in one place if silent approximation.
So here are some of the key ideas so
first of all you can.
Compute single metrics multiplication
of the type that we have
fussed if the scores that the edits
are the scores that we have and
integers which is the case but
the integers which are bounded by C.
capital are in that case you can
use these old guard of them by.
That and also a subtle where you
can compute the product in our
brains into the Omega.
OK but this does not help us
because the number of it is
the score is not bounded it
could be as high as order of N.
So this does not apply but
then we also have served a single metric
multiplication you can do it fast your
interest does not need to be bounded but
as long as you do not have too
many distinct scores distinct Now
if that made it a distinct
schools if there are no not
too many distinct course you can
still do the products faster so
far that if there are distinct
scores then we can create.
Numbers natural numbers in the range
of one to Our Square which is known at
noon a C.
then sequences This is known sequence or
has a property that if you take beers from
that created numbers in that range one
square they have distinct sound and
that property that they have distinct some
basically help us to get an Al Gore or
them.
That has a running to model our
square into the Omega OK so
we can still get faster
algorithms when they're
not when the entries are not bounded by
there are not too many distinct entries.
So you want to know where we so we're
going to stick you use that property so
that this will be our final garden so
we've tried to follow valiance method for
computing transitive closure not
directly the valiance method but
the extension of it right after doing
each of the metrics multiplication
that is suggested by Valiant.
We do some depend on
rounding on this course.
So we'll do the dependent learning
to make the number of distinct
schools that apiary in any of
the metrics to be not too many so
that we can imply that in
the construction of C.
Don't sequences and we're going to
multiply those matrices first and
how do we keep these chords to be
not too many distinct values so
so if we have if they fail to mention B.
is and why is so this a non-terminal
in the school is why and the why is C.
Suppose one plus Delta to the our class K.
this is a score then I will round it I
will now need either down to one president
or to the are all I will rounded up
one class builder to the last one and
I will do this mounding by choosing
appropriate probabilities so
this probably destroys
a problem it is will ensure.
That even if you have a long bust like
long stream long past his originally done
one which is more difficult to handle we
can sure the expected cost of the street
is same as the actual cost because of the
choice of the probability distribution but
more importantly
the variance lumens bounded.
So once a variance remains one of then
eventually you can employ a leg is
fine and then we can employ want to show
that this timid cost estimate cost is very
close to the optimal cost and
that also gives approximation
a garden not really for language at
a distance but we did all this but.
Also similarly force took us to contacts
with them and pricing and so on.
Switches Yes the variance is one
that lets them in you have no.
Difficulty showing Putin you
would smile but it just works.
OK so now you can ask really not doing
very well with it and so in other then ask
why approximation Well there are some
reasons for using approximations for
example for strong gusty took us to
context to grammar but I think we could
show conditional Hardman's that if
you can have a faster exact I'll go
in first because we can't experiment but
I think it will directly have a faster.
Spot OK So
that kind of gives you some hardness
implications first across a context.
But here comes an interesting part I was
trying to prove simply lurk on dish in
the lower bounds for a language
at a distance computerization so
I had a high hurdle about which shows
that if you only allow insertion
as you already did operation.
Then yeah you can sure hardness you cannot
compare language to dispense Exactly.
But when I was trying to modify my
proof using three all types of it is in
session deletion and
substitution that Rufus failing so
obviously that raises the question
is language and a distance Q.B.
card when all insertion deletions and
substitutions are allowed.
What is happening and
why is the lower bound failing.
And then we are served basically it's not.
So when you have all the street types
of edits insertion deletion and
substitution if you look at the metrics
multiplication that we read to be.
This quarter's have a nice property and
the property is that it's has
a property a bonded difference what does
bounded difference means is basically say
do if the elite number eight
it's too far substring I DO J.
is excellent for substring I DO J.
plus one it's been known to
be much different from X.
it will be either explicit one or next leg
the difference could be at most one or
Similarly if you look at
the column like I do G.
I do G.
and I plus one two G.B. Then again
the scores will not change much.
So this means less productive years we
are reducing it to computing the spatial
matrices by the special matrices if you
look at the scores it has this bounded
difference property if and you have is
bounded difference property only if
we allow all this free tips of it if you
just follow insertion this matrix is will
not have the bounded difference property
and that's why we are getting this Kiwi
curtness So if we are all the street a for
it it's those metrics as we have bounded
difference property and then that
this was our main contribution we so
we should that mean close metrics product
with bounded difference it can be computed
in less than cubic time soon going to
have a truly stop here we go guardian for
computing means last product when
the matrices have bombed a difference.
Into the two point two four four.
It has where I and that was the complexity
we reported in our paper but
then there was also the.
Why is a strange number with me because
it's coming from you have to use
rectangular metrics multiplication and
it's coming from there but
then since then there has been a recent
paper which improves the rectangular
metrics multiplication so that will have
some implication for our complexity as
well a bit from two point eight two four
four it will go down to two point eight.
So what do you have we have
stopped you we've got a garden for
boundary difference means less
product it's under different
properties a significant generalisation
a bounded integer properties so
you do not mean the are metrics to have
bonded indeed there can be as high as N.
you doesn't need to have bounded
difference so if you move along
there are also three if you move along the
columns their values will not change much
and how much it can do those values
change basically we saw we show
that that's the difference in any
one dimension for any one metric so
one of the metrics can be completely
arbitrary as long as form and
other metrics and it's also not required
that it has mounted difference property
in what the rows and
columns does in one dimension.
If it is that most end of the three minus
will make on mine if it silently get us
up cubicle garden so if you substitute or
may not be true if you are optimistic
that matter is multiplication
going to solve the main square this means
you can handle bounded difference of two
into the one minus Epsilon and if you can
get rid of the sign and then basically
have solved the open question like getting
a meaningless product at a garden for
one to anyone to do about it and
that would be fantastic so the same thing.
Gotten in far out any folding because I
don't if well you can view it as a spatial
case of computing language I did
distance and we also get soft quarter
to go girl names for what is known as
mean plus convolution monotone means less
convolution this problem a study even
start with thousand and fifteen by Jan and
Lou and Stein and they gave the order to
go to God the man using on Me Tell You can
also get us a order to get them for it we
don't do using so they use this nice here
they mumble examine A.D.N. dollars from
additive Combinator expert reviews we can
give a very elementary proof for getting
some idea what radical gardening for
me Ask them Lucian I have ten minutes.
So let me try to give you some
glimpse of the proof OK so
what is well no difference again if you
look at consecutive elements in the rule
for cons if you do elements in the column
The difference should not be much we can
handle much higher difference but just for
two this sick cause either the difference
is one in fact for language added distance
or out and if folding these differences or
not and if willing is differences to for
language at a distance as differences one.
Similarly let's assume that what the
matrices have been a difference though we
can handle just one metric to be bounded
difference and just in one dimension but
you may again for to this it lets consider
the two matrices about a difference and
just one of a difference.
So here are the stakes for
the past year in sub give it time
we will compute a match
Ricks see price that really
be an added People approximation of
garbage in the production metrics.
You want to compute see the product
metrics Instead I will compute see play
what does seem from what
property does the frame house
if we look at any idea back to entry and
the entry in the sea plane.
They are bounded by their absolute
differences bounded by some form
into the upselling epsilon is apparent So
you get an additive approximation fast so
once you are labor approximation.
We will call a triple I K J to be bad.
If.
So wonder why do you need trying to
compute we are trying to compare C.H. And
so which triples contribute to see
ID all the CIA King and we get so I
get it triples contribute to C.A.G.
it so we'll call these I.Q.
to triple to be bad if you look at I.Q.
players be good if this is very close to
see playing my age I wonder what does that
mean it means the I.Q. players big it is
also very close to see ID we can
see play my just very close to CA.
For that means maybe this year I
could be good this is the lawn
that is giving you the minimum value
that is contributing to compute CIE
so bad means we cannot ignore it we have
to consider this I get a triple because it
might be upon Tendo for computing CA did.
I get is not good it will not be
bad if I get is not bad then is
good because we don't have to
look at it it's far away from C.
play my idea so far away from C.
idea we can ignore it and
idea is if the number
of sides back triples us up to week and
we know which are these bad triples
then we can vary and will quickly go to
only this back triples we look at it and
will compute the ID first and
if not if there are none if there are many
better both then what we do into some
non-trivial partition and we by doing
this non-trivial partition will
reduce is bounded difference mattresses to
compute mean plus product on matrices with
bounded entries we know matrices with
bounded entries and can compare them fast.
So maybe just one non-trivial
partition will not help so we'll do it
i United Nations some of the entries
will become bounded and after a few
idealizations we showed that the number
of bad entries are really stopped Kilby
we can find there is quickly and then we
can just look at them and compute the age.
That's the main stage.
Let's look at it in some little
more detail five minutes OK So
the first of us to compute additive
approximations since we are considering
bounded difference computing
additive approximation is not hide.
So just look at these great
points these big points
are separated by into the absurdum.
So if you just compute the multiplication
based on this great performance
all those entries here are very
close to this great point and so
just compute the metrics multiplication
based on these great points
you know and once you have those values
this value of truly approximates all
the others in this book so
you do good by doing just this man's magic
multiplication of the good points and
get the property that we are looking for
additive approximation.
So we get a good approximation for C.A.J.
might look strange that all we're using
There's been a difference property for
getting this additive approximation but
let me tell you one thing
that if you have an arbitrary matrices
you can do it of approximation.
It's not as simple as this but
we do have ways to handle it.
OK so once you have this by additive
approximation that we don't know how to
find back triples we have to find bad
triples because if you can find bad
triples we'll just look at them and
we're going to consider this contribution.
Yes.
Yes.
I wanted exact.
Yes and that is additive approximation.
Yeah so now I want to find the back
triples How can you find a back triples so
suppose I know OK when he is
I can say is about triple is
a plus because he is very close to
see three major then it's a battle so
if I know it's a bad trouble then says
these are very close to I care this is
very close to good did I basically
know this entire part is bad is bad.
And conversely if this AI can play
as big a deal where not about triple
then I know indeed this block is not bad.
Some boundary difference property helps
us to find this about troubles by
just looking at discrete points and
there are sub cubic number of good points.
So if we have only our dive into the three
minus a little stop giving number about
troubles and great exhaustively a new
method the back triples are numbered them
in place product appropriately and
additive approximation is of giving
completing over the battle of the stuff
you will get us up cubicle guarding
Otherwise you there are many
battles that Riemann and
many battles means there
are values very close to the mean.
And that's a property we're going to use.
OK so now what we're doing to do we're so
far now let let's assume that all triples
are bad OK It's all in good triples
are bad so we'll do our non-trivial
partition on the matter X.
and our goal is to transform the matter
eccentrics to small indigenous because if
we can make it small and it does then we
lose that Alan go in and model it'll cause
a lot of them and we'll compute it fast so
far that left speaker law X.
on column wide randomly
from the Great points
you have bounded difference how you
make some in interest to be small.
For example if I subtract the consecutive
rows it will become small.
Because the difference is not that much
let's try to do something similar so
you have the idea let's take this go X.
subtract Xscape.
One seriously since your substracting
you want to maintain the same E.I.
cable as the gauge is so
you add extra here Sam part I mean
I'm just creating this new entries I'll
be back a day and doing the partition.
New mattresses and.
Yes I picked X.
and Y.
randomly so I subtracted X.
case I want to Minton Samal fi K.
plus B.
decade it was same as a I get as big
a J So I've added excrete to it.
OK.
So now what is this a I came minus
a skid Samus saying a cable as big a Y.
A minus biggie right minus six K.
but then what is a I get less became why I
came out is about triple everything is bad
so that means that I could lose
became a is very close to C.
C.
prime I want which means it's very close
to CIA
because the prime I mean is ridiculous to
see anyway here what is this this is a X.
can we can write and why do we know X.
K.Y. is about triple So
that means this is very close to C X Y.
C prime explosive close to see X.
Y.
So this is within this similarly here
what is this extension as big a j
we know Xscape j is about triple.
So this is very close to see extra.
So then now subtract CIA write subtract C.
prime I write our science
abstracts the premix way
where do you get you get an entry of
rallied in class minus order of into
the silence because this
is very close CIA and Z.
frame I.E. are very close to each other
see X.Y.Z. premix where you are very close
to each other and compute them in class
product and once you've completed the mean
plus product How's your product looking
like like I'll take a day is the I.K.
plus begin to minus the prime
I you write less replying X.
Y.
minus the pre-mixed So
it just depends on the stakes and
why that you have chosen an i and j so
you can then convert it and
get back to element list.
So this is what happens when you have all
triples are bad and obviously you may not
have all triples to be a brat in that case
will transform many of the metrics entry
is too small and dangerous and we do we
do night editions so for example with X.
and Y.
randomly we compute the decades it is not
smaller in city to infinity because there
might be some other fake it which is not
small the decade here again if it's not
small instead of doing then we'll compute
the mean plus progress for the entries
are smaller than in Sept give it time and
after that we'll check if the number
of bad triples that only meaning to be
called Bodies have morphed
into the three minus ten or.
So after every step we are taking
all are there somewhere
in the back triples that remain
to be called Bodies have to be
if in fact given the new stuff we
look at the material that's it and
we can show that the number of these
I did ations is really small and
very interesting in that taking
time is into the ticking time for
about triples as I mentioned it's cubic so
very interestingly.
There's a number of repeat Asians
suppose A N B R I return to matrices and
you are following the steps you
can show after this few weighted
ations you are left with a bad
number of bad triples that is sub Q.
weak.
Even funny I return the matrices and
you do not need to have
bounded difference property.
So the boundary difference
property does and
then also to find those back triples
fast and that's a part of and
we're stuck now for our if we traded in
matrices we just don't know how to find.
These back triples though there are a few
bad triples there among the same Q.
triples we need a mechanism to find them
even though there are a few battles left
we just don't know how to find them far
beyond a difference when it is exactly
where you think about a difference
Robert in North Wales
adjective approximation you can do it and
these I did ations everything works.
OK So this is how close we are to us
getting a truly Sub Q We go to go to
the informant less so this kind of
the looking and mostly in these days and
we have some extensions of it where
we can get mean we go gunning for
our generalisations I mean less
product using this framework and.
We are hoping maybe this is not to
really kick this will of this week
so I haven't really talked
about a lot of problems.
For the.
One Above four hours of so
gusty context we got our buzzing language
at a distance with the insertion these
are all of a P.S.P. hard which means.
This we get this breakthrough and
last product they really meant to be card
here X.M.L. repairing a distance out and
if we can show it.
Lower bounds of its shares and we'd say
is there as hard as computing Woolley and
metrics multiplication.
For approximations well for language
a distance we do not know is going to hit
that hardness yet we can solve it is up to
reclaim but is it into the two point five.
One we don't know really don't know for
approximation of Guardians
here we have a fast metrics my using
fundamentals multiplication we have
one plus main is absurd an approximation
for a bag at a distance of a B.
Indeed they wouldn't have allowed me to
use for as much as multiplication X.M.L.
appearing so far that I do when we do
have our linear payment darling giving
close to optimal levels so
that proclamation when in the early entry.
So there was this question which.
Obviously was in fast metrics
multiplication is not the most practical
thing to do if you are looking for
practical applications and language at
a distance does have a lot of practical
applications and we also have a lot of
bones would say is that you if you have to
get any non-trivial math multiplicative
approximation you have to use fast metrics
multiplication so do we do any of what
Magic's multiplication metrics
multiplication and maybe get additive
approximation since the multiplicative
approximation is not working so
we do have some recent progress there so
we do get a quadratic time
deterministic Gogarty them that is gone
when it really means it does not do as far
as metrics multiplication for language at
a distance are in a folding approximately.
The ship ticking and languages and so on.
We'd gives you an additive approximation.
And we also have some space subs
Linnaeus Bizzle guardians for
language at a distance but for other
problems as well like long as increasing
subsequence string at a distance and so
on and their Vista and general technique
which you call amnesty dynamic programming
basically looks at a dynamic programming.
Relies on again this bounded difference
property in relying on this bounded
difference property it gives you
an additive approximation which directly
improves the space and
then complexity of Benami programming so
what all of this works feasts into the
area that I'm really excited to work on
which we call fine grained approximation
algorithms so fine then complexity is
about understanding the complexity
of polynomial game problems and
here we are trying to understand their
proximity how fast that understanding
the tradeoff between quality and speed for
volume problems and also understanding
the hardness of approximation and
that's a year that we are calling find
an approximation and go in there as
organizing a workshop in this area.
So if you're excited if you
are interested to know.
If you're interested Well this is
a good doctor me of life OK And
there are many future directions so
thanks.