This is joint work with Tsz Chiu Kwok, Yin Tat Lee, and Akshay Ramachandran. Thanks for the invitation. I want to talk to you about the Paulsen problem. I will introduce the problem and state some motivations, and then the proof has two parts. The first part is about continuous operator scaling, a continuous version of the alternating scaling algorithm; I will explain what the algorithm is and give a somewhat detailed analysis of this first part. The second part is the smoothed analysis, where I will just give some high-level ideas of the proof. Then I will conclude with some discussions. OK.
So let me first introduce what a frame is. A frame is just a collection of vectors u_1, ..., u_n in d-dimensional space that span R^d. It is called an equal norm frame if all the vectors have the same length, and it is called a Parseval frame if the sum of the outer products is equal to the identity: sum_i u_i u_i^T = I. You should think of this condition as a generalization of an orthonormal basis: when n is equal to d, the condition that the sum of u_i u_i^T is the identity just says it is an orthonormal basis. In this talk you should think of n as much bigger than d, so it is an overcomplete basis. People use these in communication theory: you still have the property that if you have a vector, you can project it onto those basis vectors and then recover the vector from the coefficients. People send those coefficients to the other parties, and because the frame is overcomplete, you can lose some coefficients and still be able to recover the original signal. So there were motivations for studying frames, and they have applications in communication theory and quantum information theory; I don't know much about those.
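As an aside (my own illustration, not from the talk), here is a minimal numerical sketch of this recovery property: we encode a signal with a random frame, drop one coefficient, and reconstruct by least squares, since the remaining vectors still span the space.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 12

# Random unit vectors, rescaled to squared norm d/n: an approximately
# equal norm, approximately Parseval frame (columns of U).
U = rng.standard_normal((d, n))
U *= np.sqrt(d / n) / np.linalg.norm(U, axis=0)

x = rng.standard_normal(d)        # signal to encode
coeffs = U.T @ x                  # frame coefficients <x, u_i>

# Drop one coefficient; the remaining n-1 vectors still span R^d,
# so least squares recovers x exactly (up to numerical error).
keep = np.arange(n) != 3
x_rec, *_ = np.linalg.lstsq(U[:, keep].T, coeffs[keep], rcond=None)
print(np.allclose(x, x_rec))      # True
```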
So what is the motivation for the Paulsen problem? Suppose we want to construct a frame which satisfies both the equal norm condition and the Parseval condition. It is not so easy to construct them: there are only a few algebraic constructions of those frames, and for given n and d there are not many known constructions; I think it is harder when, say, n is smaller. Paulsen asked this problem a while ago. At that time he even wanted to construct frames called Grassmannian frames: not only equal norm Parseval frames, but frames that also satisfy the constraint that for every pair of vectors the inner product is small, that is, equal norm Parseval frames which minimize the maximal pairwise inner product. Those frames are even more difficult to construct. On the other hand, it is easy to construct a frame which approximately satisfies both conditions, approximately equal norm and approximately Parseval. For example, we can generate random unit vectors, and then by matrix concentration you can show that the frame is almost Parseval, and you also have good angles between the vectors if you generate a random frame. So he asked the question: if we have an approximately equal norm, approximately Parseval frame, can we just move the vectors a little bit so that it becomes exactly equal norm and Parseval, hopefully without changing the angle of any pair of vectors by much? Then you would be able to construct good frames, and those angles correspond to the efficiency of the signal processing applications they have in mind. OK, so that was the motivation of the Paulsen problem.
So formally, what is the Paulsen problem? We are given a frame U that approximately satisfies the equal norm constraint, meaning each vector's squared norm is between (1-ε) d/n and (1+ε) d/n, and approximately satisfies the Parseval constraint, meaning the eigenvalues of sum_i u_i u_i^T are between 1-ε and 1+ε. The question is: what is the best function, in terms of n, d, and ε, such that you can always move this U to a frame V that satisfies the two constraints exactly? So given such a U, can you find a V satisfying the two constraints exactly, and bound the movement, where the movement is defined as the sum over i of ||u_i - v_i||^2? You want to find the best function to bound it. OK, so that is his formulation: what is the best function that you can prove?
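To make the setup concrete, here is a small sketch (my own illustration, not from the talk) that checks the two approximate conditions and computes the squared movement between two frames; the helper names are hypothetical.

```python
import numpy as np

def eps_equal_norm(U):
    """Smallest eps with (1-eps) d/n <= ||u_i||^2 <= (1+eps) d/n for all i."""
    d, n = U.shape
    norms2 = np.sum(U * U, axis=0)
    return np.max(np.abs(norms2 * n / d - 1.0))

def eps_parseval(U):
    """Smallest eps with (1-eps) I <= sum_i u_i u_i^T <= (1+eps) I."""
    eigs = np.linalg.eigvalsh(U @ U.T)
    return np.max(np.abs(eigs - 1.0))

def movement(U, V):
    """dist^2(U, V) = sum_i ||u_i - v_i||^2."""
    return np.sum((U - V) ** 2)
```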
So what are some previous results? There were two previous works before our result. The first one has an interesting assumption and also an interesting conclusion: they need to assume that d and n are relatively prime, but then they prove a polynomial bound in d and n, and the interesting conclusion is that the dependency on ε is ε², which is a good dependency. The idea was to define a dynamical system which always keeps the equal norm constraint while improving on the Parseval constraint, getting closer and closer to Parseval but always satisfying the equal norm constraint. Then there is another result which doesn't need the assumption that d and n are relatively prime; it also has a polynomial dependency on d and n, but the dependency on ε is much worse. I think this result is worse than the first one, maybe the whole thing to the power one over seven, something like that. They use a gradient descent which, oh, I think I mixed them up: the first one keeps the Parseval constraint and improves on the equal norm condition, and this one keeps the equal norm and improves on the Parseval. And then there are some simple examples showing that the movement you need is at least on the order of d times ε. The open question was: can the function be independent of n? Given the worst known examples, maybe you can hope that d ε is the right bound. So that was the open question.
We answer this question positively: we show that you can always move the vectors by at most d^6.5 times ε, independent of n. Our proof has two parts. In the first part, I will talk about the operator scaling algorithm, the alternating algorithm, and we define a continuous version of it; with this we can show an upper bound of d² n ε, that is, the input can be moved to an exactly equal norm Parseval frame within that distance. In the second part we do a smoothed analysis: we take the input, randomly perturb it a little bit, and then use the dynamical system to move the vectors. So in the second part we perturb the input a little bit before applying the continuous operator scaling algorithm, and we show that the convergence is then much better. Note that we don't analyze their discrete algorithm; I think our analysis is algorithm-specific, not a generic analysis. I should also mention a recent result: Linus Hamilton and Ankur Moitra improved our result; they prove a bound of d² ε, and their proof is much simpler and much shorter than ours. So you can relax; you don't need to take this talk very seriously. I will still show our proof anyway, and at the end I will discuss their proof and compare the two approaches; there are still some advantages of our proof, and I want to briefly discuss that.
OK, so let's see the continuous operator scaling algorithm. If you are given this question, how to move an almost equal norm Parseval frame to satisfy the two conditions exactly, the problem is only difficult because you have to satisfy both conditions simultaneously. If I just want to satisfy one of them, it is very easy. If I just want to satisfy the equal norm condition, I can simply rescale the vectors. If I want to satisfy the Parseval condition, that is also easy: I just apply the linear transformation (sum_j u_j u_j^T)^(-1/2) to the vectors, and then you can check that the sum of the outer products becomes the identity; this is a standard linear transformation. So a natural algorithm is just to alternate between these two steps: given some input, first rescale so that the vectors satisfy the equal norm condition, and after that apply the linear transformation to satisfy the Parseval condition; now the equal norm constraint may be violated, so you fix it again, and you keep fixing the two constraints alternately, and we want to analyze when this converges.
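Here is a minimal sketch of this alternating algorithm for frames, assuming the target squared norm is d/n as in the Paulsen setup; this is my own illustration of the two steps just described, not code from the paper.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

def alternate_scaling(U, iters=100):
    """Alternately enforce the equal norm and Parseval conditions."""
    d, n = U.shape
    V = U.copy()
    for _ in range(iters):
        # Fix equal norm: rescale each column to squared norm d/n.
        V *= np.sqrt(d / n) / np.linalg.norm(V, axis=0)
        # Fix Parseval: left-multiply by (sum_i v_i v_i^T)^(-1/2).
        V = inv_sqrt(V @ V.T) @ V
    return V
```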
Our first observation is that this is actually a special case of the alternating scaling algorithm for operator scaling, a more general problem, and this algorithm has been analyzed, so we can hope to apply those results to understand the Paulsen problem. How do we apply those results? The first idea was: I keep track of my input, I apply this alternating algorithm, and the vectors move; I am interested in the distance between the input and the final scaled frame, but instead of measuring the distance between them directly, I can bound the total movement. I keep track of how much the vectors move during the algorithm, I sum that up, and I use it as an upper bound on the distance between the input and the output. So that was the idea.
But before I explain further, let me introduce to you the operator scaling problem, which is a generalization of the frame scaling problem. The operator scaling problem is the following. An operator is just a collection of matrices U_1, ..., U_k (there could be many matrices), each of them m by n. This problem was introduced by Gurvits. Now, given those U_i as input, I want to find a left scaling matrix (m by m) and a right scaling matrix (n by n), and if I scale each matrix U_i by multiplying the same scaling matrices on the left and on the right, I want the resulting matrices V_i to satisfy this doubly stochastic condition: sum_i V_i V_i^T is a scaled version of the identity, and sum_i V_i^T V_i is a scaled version of the identity. This is the operator scaling problem, for some appropriate constants. Sometimes such scaling matrices don't exist; if they exist, then you want an algorithm to find them. Maybe this is a bit abstract; I will explain shortly how to reduce frame scaling to operator scaling, and I will say that an operator satisfying these two conditions is doubly balanced. OK.
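In symbols, and fixing the scalings to the normalization used later in the talk (the size divided by the dimensions; this particular choice of constants is my reading, so treat it as an assumption):

```latex
\text{Given } U_1,\dots,U_k \in \mathbb{R}^{m \times n},
\text{ find } L \in \mathbb{R}^{m \times m},\ R \in \mathbb{R}^{n \times n}
\text{ such that } V_i = L\, U_i\, R \text{ satisfies}
\sum_{i=1}^{k} V_i V_i^\top = \frac{s}{m}\, I_m
\quad\text{and}\quad
\sum_{i=1}^{k} V_i^\top V_i = \frac{s}{n}\, I_n,
\qquad s = \sum_{i=1}^{k} \|V_i\|_F^2 .
```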
Maybe it would also be good to motivate this problem by showing you some applications. A simple special case of operator scaling is matrix scaling. Matrix scaling is useful in preconditioning for linear solvers and in solving the optimal transportation problem. You can also use matrix scaling to find a perfect matching in a bipartite graph, although it doesn't give you an algorithm faster than the best known. It is also useful in designing deterministic approximation algorithms for the permanent: an exponential approximation factor, but deterministic.

Frame scaling is another special case, the case that we are interested in: finding an equal norm Parseval frame is frame scaling. It was used earlier in proving a communication complexity lower bound, giving a lower bound on the sign rank of the Hadamard matrix, and it was used in machine learning for robust subspace recovery; now we use it for the Paulsen problem. There is a common generalization of matrix scaling and frame scaling; they don't give it this name, but I just call it PSD scaling, and it is useful in approximating the mixed discriminant of matrices. The most general form is operator scaling. Operator scaling was studied again in 2016, where the motivation was computing the non-commutative rank of a symbolic matrix. Before that operator scaling algorithm, we only had exponential time algorithms for this problem, but now we have polynomial time algorithms for computing the non-commutative rank and for computing Brascamp-Lieb constants, and recently it was also used in an orbit intersection problem in invariant theory. Actually, in that paper they use our result on the more general version of the Paulsen problem; we have a version of the Paulsen problem defined in the operator setting.
That was just a brief introduction: the scaling problem is a simple problem, but it turns out to have many applications. If we want to design an algorithm for operator scaling, we can generalize the simple alternating algorithm by just repeating the two steps. If we want to satisfy the first condition, that sum_i U_i U_i^T is a scaled identity, we can rescale on the left as in the frame case, and you can easily check that after this rescaling the first condition holds. If we want to satisfy the second condition, we rescale on the right, and you can check that the second constraint is then satisfied. So a natural algorithm for operator scaling is to alternate between these two steps and hope that it converges. In the recent papers they show (it is not obvious) that the operator scaling problem has a formulation which is geodesically convex, not convex in the standard way; so a natural algorithm is just to alternate between these two steps and hope that it converges.
So let me show you why the frame scaling problem that we are interested in is a special case of operator scaling. It is a very simple reduction. I am given a bunch of vectors u_i in d dimensions; for each vector u_i, I just create a d-by-n matrix U_i whose i-th column is u_i and all of whose other columns are 0. Then the first condition, where the zero columns don't matter, just becomes the Parseval condition, since sum_i U_i U_i^T = sum_i u_i u_i^T. For the second condition, in U_i^T U_i all entries are zero except the (i, i) diagonal entry, which is the norm squared of u_i; so if you sum them up, sum_i U_i^T U_i is the diagonal matrix with the norms squared on the diagonal, and requiring it to be a scaled identity is just requiring the frame to be equal norm. So the frame scaling problem can be reduced to the operator scaling problem.
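A small sketch of this reduction and a check of the two identities just stated (again my own illustration):

```python
import numpy as np

def frame_to_operator(U):
    """Map a frame (columns of U, shape d x n) to an operator (U_1,...,U_n),
    where U_i is d x n with i-th column u_i and zeros elsewhere."""
    d, n = U.shape
    ops = []
    for i in range(n):
        Ui = np.zeros((d, n))
        Ui[:, i] = U[:, i]
        ops.append(Ui)
    return ops

rng = np.random.default_rng(1)
U = rng.standard_normal((3, 7))
ops = frame_to_operator(U)

# sum_i U_i U_i^T equals the Parseval matrix sum_i u_i u_i^T ...
print(np.allclose(sum(Ui @ Ui.T for Ui in ops), U @ U.T))
# ... and sum_i U_i^T U_i is diagonal with the squared norms.
print(np.allclose(sum(Ui.T @ Ui for Ui in ops),
                  np.diag(np.sum(U * U, axis=0))))
```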
So in the first part of the talk we will actually solve the problem in this more general setting: we will move an input operator which is close to satisfying the two conditions to one exactly satisfying the two conditions. We consider this operator Paulsen problem, which was also used in a recent work. We are given the input U_1, ..., U_k, and they almost satisfy the two conditions: the two marginals are almost scaled identities, with the scalings depending on the size. We want to move the input to V_1, ..., V_k satisfying the two conditions exactly, and we want to bound the total movement, which is defined as the sum over i of ||U_i - V_i||² in the Frobenius norm instead of the 2-norm; it is a natural generalization. What is the best function of m, n, k, and ε that you can upper bound it by? In the first part of the talk we will prove that it is always upper bounded by m² n ε for this more general problem. OK.
So let's come back to this idea of bounding the distance between the input and output by the total movement. This idea doesn't work directly, because, for example, there are inputs on which the alternating algorithm does not converge: when you rescale for the equal norm condition you get one configuration of vectors, when you scale back for the Parseval condition you get back to the previous one, and the iterates just oscillate and never converge to a solution. Those examples are rare, but even if you exclude them, you can easily imagine an input that is scalable but where, if I measure the total movement, the path zig-zags a lot, and that would give a very bad upper bound on the final distance. So this idea doesn't work as is, because the path may zig-zag a lot and we would not be able to give any meaningful upper bound. To fix this problem, our idea was to define a continuous version: we don't make a full step of one constraint and then the other alternately; instead we do both steps simultaneously, and we also do both continuously instead of making a large step.
OK, so let me first state the algorithm and then explain how we came up with it. The algorithm is defined by a system of differential equations: how much we change the i-th matrix depends on this equation, where one term is the left scaling matrix applied to U_i and the other is the right scaling matrix, and we do both at the same time by just adding them up. There is a new parameter s here: s is defined as the size of the operator, which is just the sum of the Frobenius norms squared of the matrices of the current operator. To come up with this, we just do the following. In one step of the alternating scaling algorithm, we apply the linear transformation (sum_j U_j U_j^T)^(-1/2); in the discrete version we take this full step, but in the continuous version we just want to move a little bit. So what we do is first scale this matrix: we multiply it by the factor m/s so that it is close to the identity matrix, and once it is close to the identity we can use the Taylor expansion of the matrix inverse square root, so instead of writing it as an inverse we can write it in a more convenient linear form. Then we do the same for the right scaling, and we just add the two terms up and get this dynamical system. That is how we define the continuous version of operator scaling.

[Question from the audience.] Yes, that is a very good question. We didn't realize this connection at the time, but now we understand that it is actually a gradient flow of a natural potential function that I will introduce in a couple of slides. [Question: what if it is not scalable?] I will also mention that in a couple of slides: if it is not scalable, the flow will just shrink it to size zero.
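To make the flow concrete, here is a hedged sketch: an Euler discretization of a flow of the shape just described, with the left and right error terms normalized by m/s and n/s. The step size and the exact constants are my own choices for illustration, not taken from the paper.

```python
import numpy as np

def flow_step(ops, dt=1e-3):
    """One Euler step of (a version of) the continuous scaling flow:
    dU_i/dt = (I - (m/s) sum_j U_j U_j^T) U_i
              + U_i (I - (n/s) sum_j U_j^T U_j)."""
    m, n = ops[0].shape
    s = sum(np.sum(Ui * Ui) for Ui in ops)   # size of the operator
    left = np.eye(m) - (m / s) * sum(Ui @ Ui.T for Ui in ops)
    right = np.eye(n) - (n / s) * sum(Ui.T @ Ui for Ui in ops)
    return [Ui + dt * (left @ Ui + Ui @ right) for Ui in ops]
```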
So now the high-level idea is: instead of bounding the total movement of the discrete algorithm, we bound the total movement of this continuous algorithm. We want to keep track of the path length and use it as an upper bound on the final distance. This is how we define the distance: we have a system where the matrices change over time, the input is at time 0, the output is at time infinity, and we want to measure the distance between them. We can write this distance in integral form and bound it by the triangle inequality; this is just the path length. If you don't want to look at the equations, I am just saying in formulas what I said in the picture: we can bound the final distance by the total movement at each time. So I want to bound what I call the local movement: at each time, what is the movement of the
matrices? Now, this is an important definition, which is going to answer your question about gradient flow. The way we keep track of the progress of the algorithm is by a potential function, and the potential function is the error measure of the current solution. The first term: we want it to be zero exactly when sum_i U_i U_i^T is the right scalar multiple of the identity, so the first matrix is the error of the first condition; the second term is the error of the second condition; and then we take the Frobenius norms squared of these two matrices and add them up. This quantity is called Delta. Before, for the Paulsen problem, we measured the error by this ε; now, to analyze the algorithm, we use this Delta. First let me give you some intuition. In the Paulsen problem we say that the input is close to the identity in the measure ε, and you can think of that as an l-infinity norm bound on the eigenvalues: we want the eigenvalues of the matrix to be between 1-ε and 1+ε. This Delta you should think of as an l2 norm bound on the eigenvalues: Delta at most delta roughly means that the sum over the eigenvalues of (eigenvalue minus 1) squared is at most delta. So instead of working with the l-infinity norm bound, which is more difficult to deal with, we work with this Delta bound, the l2 norm error.
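Concretely, with the scaled identities (s/m) I and (s/n) I as the targets (this normalization is my reading of the definition, so treat the exact constants as an assumption), Delta can be computed as follows.

```python
import numpy as np

def delta(ops):
    """Error measure: squared Frobenius distance of the two marginals
    from the scaled identities (s/m) I_m and (s/n) I_n."""
    m, n = ops[0].shape
    s = sum(np.sum(Ui * Ui) for Ui in ops)
    left = sum(Ui @ Ui.T for Ui in ops) - (s / m) * np.eye(m)
    right = sum(Ui.T @ Ui for Ui in ops) - (s / n) * np.eye(n)
    return np.sum(left * left) + np.sum(right * right)
```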
By the way, this is a definition due to Gurvits, and it was also used in previous analyses of the operator scaling algorithm. So we look at this Delta instead of ε; from now on we won't talk about ε too much and will just focus on Delta. There are some simple properties. Delta is 0 if and only if the operator is doubly balanced, that is, the two constraints are satisfied exactly. And you can easily show, by a standard l2-versus-l-infinity argument, that Delta is at most m² ε²: you can upper bound Delta by ε, losing a factor of m. From now on we focus on bounding the total movement and we move away from ε; Delta is much easier to work with. We will prove that the total movement is at most mn times the square root of Delta, and applying the bound Delta ≤ m² ε², this is just m² n ε, which is what we want to get in the first part. And now, in hindsight, we can understand the continuous operator scaling algorithm as doing gradient flow on Delta: at any time, I look at Delta and I move in the direction that minimizes Delta, and that is exactly our algorithm. We didn't know that at the time, but now in hindsight we know, and knowing this connection also simplifies some proofs a little bit. So it is a natural gradient descent algorithm minimizing Delta.
minimizing Delta so now we want to
analyze these these continuous
algorithms and we have found some nice
identities so the first one is what is
the change of the size is just equal to
minus Delta so if the error is bigger
than the size would decrease would be
creased more so the size would always be
not increasing and then the second one
is how much the total changes this is
the quality that we care the most we
want the error to decrease a lot the
change of the Delta is equal to the
minus of the total movement the local
movement so if as some time we move a
lot we would also decrease Delta a lot
so now with the connection to gradient
flow this lemma you can understand it
using gradient flow and we don't need to
provide a proof so first I want to claim
that the dynamical system would always
come
to adopt a balanced operator and then
the proof is simple we just show that by
level one the size is decreasing by
lemma two we would say that the second
derivative is positive it's non negative
so it is compacts so it would eventually
converge to a point when Delta which is
equal to the change of the size and then
change of the size would become zero at
some point and then Delta at that time
would also be zero so in the case when
the input is not scalable we would we
would show that the size would become
zero and then we can argue that for
those input Delta has to be very large
and then you can do whatever you want
you can just output any econom Parsifal
frame and then and then the bound will
be within M square n times Epsilon okay
So this is the important lemma, because it ties the change of the error to the movement that we make. Remember we want to bound this term, the distance (you don't need to look at the formula; it is just the distance), and we bound it by the triangle inequality using the local movement. By this identity, instead of bounding the movement we just need to bound the integral of the square root of the change of Delta. Now I take squares of both sides and I want to bound this term. The next idea is: instead of bounding the integral from time 0 to infinity of the square root of the change of Delta, we define the half time: the half time is the first time T when Delta becomes Delta over two. Then we can use a geometric sum argument to say that four times the movement from time zero to T bounds the total movement from time zero to infinity; it is a simple geometric sum argument. So we can focus on bounding the movement up to the half time, which is easier. To bound the term on the right-hand side, we just use a simple Cauchy-Schwarz inequality: the squared movement up to time T is bounded by T times the total decrease of Delta, which is at most T times Delta. So we have a pretty simple plan: first reduce to the half time, then for the half time use a simple Cauchy-Schwarz, and upper bound the squared total movement by T times Delta. Now what is important is to bound this half time: we want to show that the time for Delta to become Delta over two is small. If we can show that, we can bound the total movement of the algorithm.
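In formulas, this is my reconstruction of the chain of inequalities from the talk, with constants dropped; Delta'(t) denotes the time derivative of Delta.

```latex
% Triangle inequality: the final distance is at most the path length,
% and the identity  Delta'(t) = -(local movement)  rewrites it:
\mathrm{dist}\big(U(0),\,U(\infty)\big)
  \;\le\; \int_0^{\infty} \sqrt{-\Delta'(t)}\,\mathrm{d}t .
% Cauchy-Schwarz up to the half time T (a geometric sum handles the rest):
\Big( \int_0^{T} \sqrt{-\Delta'(t)}\,\mathrm{d}t \Big)^{2}
  \;\le\; T \int_0^{T} \big(-\Delta'(t)\big)\,\mathrm{d}t
  \;=\; T\,\big(\Delta(0)-\Delta(T)\big)
  \;\le\; T\,\Delta(0) .
```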
OK, so how do we bound it? It is not obvious, and we have to use a potential function again, defined by Gurvits. Gurvits came up with the right definitions, and I think this one is the most non-trivial one: he defines the capacity of the current solution as a minimization problem, for which I don't have an easy-to-explain intuition; it is related to some definitions from quantum information theory based on divergences. It is an interesting question what it corresponds to, maybe some KL divergence or quantum entropy. For our purposes (well, for their purposes), this function is nice: it satisfies some nice properties, we know exactly how it changes if we scale by left and right matrices, and you can keep track of the change. For the purpose of this talk, we just show that the capacity is unchanged over time: this potential function is invariant under the flow. [Question.] Yes, our input keeps changing, but the capacity is unchanged. In their application they use it as a potential function for the discrete algorithm: they prove an upper bound on the capacity and a lower bound on the capacity, they show that in each step the capacity increases, and then they use it to bound the number of iterations of the algorithm.
Let me skip the details; the key point is that these two matrices have trace 0, and then you can show the capacity is unchanged. We can also prove an upper bound and a lower bound on the capacity: the upper bound is the size, and the lower bound is the size minus mn times the square root of Delta. This mn √Delta term basically corresponds to the total movement in our problem. This proof is also not new; we just adapt the proof in their paper to our setting. (In the second part we will have some new methods to prove improved bounds on the capacity.) One implication, for example: as the time T goes to infinity, Delta(T) goes to zero, so the size at infinity is equal to the capacity at infinity; and since the capacity doesn't change, the capacity of the original operator is just equal to the size of the output. We will come back to this fact and talk more about capacity later, but for now we just use these two facts, that the capacity doesn't change and that we have upper and lower bounds on the capacity, to finish the proof and bound the half time. Note that the discrete algorithm is a little different: in the discrete algorithm the size doesn't change and the capacity increases, while in the continuous algorithm the size can decrease but the capacity is unchanged; it is not exactly the same
algorithm. So now we can come back and use the capacity to bound the half time. The capacity is unchanged over time, and we have a lower bound and an upper bound on it. Apply the upper bound at time T: the size at time T upper bounds the capacity at time T, which is the capacity at time zero, since the capacity doesn't change; then we apply the lower bound, which says this is at least the size at time zero minus mn √Delta. So what does it mean if I combine the two? It means the size cannot decrease too much: the total decrease is upper bounded by mn √Delta. If the initial error is not too big, I know that the size of the operator won't decrease too much. Now I can use this to bound the half time, because on the other hand I also have the identity that the change of the size is equal to minus Delta. Suppose that for a long time T, Delta has not yet decreased below Delta over two; then the total decrease of the size is lower bounded by T times Delta over two, so the size must have decreased by at least that much. Combining the upper and lower bounds, we can say that by time T = 2mn/√Delta, Delta must have become Delta over two, otherwise we would have a contradiction. So we have an upper bound on the half time, and then the squared total movement is just T times Delta, which is mn √Delta, ignoring some constants here.
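Putting this argument into formulas (again my reconstruction, with constants dropped):

```latex
% Capacity sandwich: s(T) >= cap(T) = cap(0) >= s(0) - mn*sqrt(Delta(0)),
% so the total decrease of the size satisfies
s(0) - s(T) \;\le\; mn\,\sqrt{\Delta(0)} .
% If Delta(t) > Delta(0)/2 for all t < T, then since s' = -Delta,
s(0) - s(T) \;=\; \int_0^{T} \Delta(t)\,\mathrm{d}t \;>\; T\,\frac{\Delta(0)}{2} .
% Combining the two bounds gives the half time and the movement bound:
T \;\le\; \frac{2\,mn}{\sqrt{\Delta(0)}},
\qquad
\mathrm{dist}^2 \;\le\; T\,\Delta(0) \;=\; 2\,mn\,\sqrt{\Delta(0)} .
```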
So that is the proof of the first part. Let me just summarize the proof. We want to bound the final distance; we use the triangle inequality to bound it by the path length; to bound the path length, we have a nice identity relating it to the change of Delta; then we use the half time, the geometric sum, and Cauchy-Schwarz to say it is at most T times Delta; to bound the half time we use the results on capacity; and the final step is just the l2-versus-l-infinity comparison of the two error measures. So we have this result for operator scaling. That is the first part; maybe it was too technical, so the second part will be at a higher level.

First I want to say that the smoothed analysis only works in the frame setting. We would also like to make it work in the general operator scaling setting, but we don't know how to do that yet. If I want to summarize the result of the first part in one line: if you have a good lower bound on the capacity, of the form at least s minus some function f(d, n) times √Delta, the proof in the first part shows that this function is an upper bound on the total movement. So if you can improve this term, you can improve the total movement. In the second part we focus on improving this term: before, we proved it was dn √Delta, and now we want to prove that if we slightly perturb the input, then we can prove a much better lower bound on the capacity.
The proof is a little complicated, but let me try to explain the high-level ideas. Our intuition was that it is very difficult to find examples with small capacity, so the idea was: if we just randomly perturb the input, then the worst cases would be gone, and then the dynamical system would converge much faster. So this is what we are going to do: someone gives us an input operator, we perturb it a little bit, and then we apply the continuous algorithm from the first part, and we bound the final distance by the sum of the two parts of the movement: adding the noise, and after that moving to an equal norm Parseval frame. OK, so for this analysis to work we need to prove three things. First, after I perturb the input, the movement of the dynamical system from the first part is just the function f(d, n) times the square root of the new Delta from the capacity lower bound. So first I want to show that I don't need to add too much noise, because if I add too much noise I have already moved too much in the first step: I want to upper bound the perturbation movement, and this is easy to bound. The second part is a little tricky: we want to prove that if we add the noise, Delta won't increase by a lot, because if Delta increased by a lot, then the second step would move a lot. The third part is: we want to prove that after we perturb, Delta has not increased too much, and at the same time the capacity has increased a lot; then, using the analysis in the first part, the capacity has a good lower bound, and then the total movement would be much smaller. So that is the plan.
The first part is easy. For the second part, let me just quickly explain the perturbation process and how we deal with the issue about Delta. Basically, the way we add noise is by generating random Gaussian vectors: I have the input u_i, I want to add some noise to u_i, so I generate a vector g_i in R^d where each entry is independent Gaussian noise with variance σ². Then, to deal with the Delta issue (this is the technical part), we first generate the random noise and then project the noise onto the subspace where the inner product with u_i is 0 and the analogous conditions are satisfied, and we just set the new vector to be u_i plus the Gaussian vector projected onto the subspace satisfying these constraints. Why do we need to do this projection? Because we want the new Delta to be roughly the old Delta: if you expand the new Delta, there are some cross terms and some quadratic terms; the quadratic terms are small, and these projections are there to make sure the cross terms are exactly 0. That is what we are trying to do. We have d times n random variables, and those constraints only span an (n + d²)-dimensional subspace, so we still have enough freedom; we just project onto it, and let's just say that after that we pretend the noise is still independent Gaussian. So this is how we deal with the second step.
deal with the second step the third step
is it's the most important one how do we
prove an improve capacity lower bound so
how do we work with capacity there is a
reduction from operator capacity to
matrix simple matrix scaling case
there's a reduction from this capacity
if you look at the eigen value
decomposition of X and then this is the
output you can reduce to the matrix
capacity case the capacity of this
matrix is equal to the capacity of the
operator and then when we were working
on this problem a lot of the time we get
intuition from from the matrix scaling
problem so now I look at the matrix
scaling problem and then I want to say
that after I add noise to the matrix
then the capacity would improve a lot so
we want to use the techniques in our
dynamical systems to pound the matrix
capacity so in the first part you can
think of it as we have a capacity lower
bound the capacity lower bound would
indirectly argue about the convergence
of data would pound a half time Delta
would becomes Delta over two we cannot
take too much time in a second part we
would directly argue about the
convergence of data and then use it to
say that we have a good capacity log
bound. So how do we prove it? We want to say that if the instance has some pseudo-random property, then we can directly argue that Delta is linearly converging, that is, geometrically decreasing: we can directly say that the rate of decrease of Delta is at least some factor times Delta. (That factor won't be an absolute constant in the end.) If we can do that, we can use it to argue about the capacity. Because what is the capacity? As I said earlier, the capacity is equal to the size at time infinity, and we can show that if Delta is decreasing quickly, then the size doesn't decrease too much, and we can use that to prove a capacity lower bound. The capacity doesn't change, and the capacity equals the size at infinity, as I mentioned before, so I can directly bound the change of the size. What is the change of the size? The change of the size at each time t is just minus Delta(t), and I know that Delta is geometrically decreasing, so Delta(t) is at most Delta(0) times e^(-μt), and then the total decrease of the size is at most Delta(0) over μ. So if we can show that our system is converging linearly, we can use this to bound the initial capacity, and we use it to show an improved capacity lower bound.
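In formulas (my reconstruction): if Delta'(t) ≤ -μ Delta(t) for some rate μ, then

```latex
\Delta(t) \;\le\; \Delta(0)\, e^{-\mu t}
\quad\Longrightarrow\quad
s(0) - s(\infty) \;=\; \int_0^{\infty} \Delta(t)\,\mathrm{d}t
  \;\le\; \frac{\Delta(0)}{\mu},
% and since the capacity is invariant and equals the size at infinity,
\mathrm{cap} \;=\; s(\infty) \;\ge\; s(0) - \frac{\Delta(0)}{\mu} .
```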
So what is the condition that lets us guarantee this fast convergence of the continuous algorithm? The condition we use in the matrix case is a combinatorial definition, a pseudo-random property: most entries of the matrix are at least σ², that is, most entries are nonzero and cannot be too small. If we can show that, then we can show that matrix scaling actually converges faster: the rate of decrease is σ² times n times Delta. Roughly speaking, this factor of n is where we gain, and it is what removes the dependency on n from the total movement. This part is based on a combinatorial argument: if the matrix satisfies this property and you run the matrix scaling dynamics on such an input, Delta decreases quickly; and Delta decreasing quickly implies that the capacity of the matrix is high, which in turn implies that the frame corresponding to that matrix also has
large capacity. So let me just summarize the proof. I know that if I have a capacity lower bound for the frame, then by the previous part I have a total movement bound; but I don't know how to directly argue about a capacity lower bound for a frame. The idea is that we do this reduction from frames to matrices: if I can argue about a capacity lower bound for the matrix, then the same lower bound applies to the frame. What we do is: we take the input frame, we follow the reduction from the capacity of the frame to the capacity of the matrix, and then we say that after we follow this reduction, the resulting matrix satisfies the pseudo-random property, most of the entries being at least σ², provided we added variance-σ² noise to the frame. Once we have this pseudo-random property, we can directly argue about the convergence of Delta; if we have the convergence of Delta, we have the capacity lower bound for the matrix; then we have the capacity lower bound for the frame; and then it implies the total movement is small. So that is a little messy, mainly because we don't know how to directly argue about the capacity of the frame; we only know how to argue about the capacity of the matrix.
If we put in more concrete bounds, this is what we can prove: after we add variance-σ² noise to the frame, we have the pseudo-random property for the matrix; we have this convergence of Delta in the matrix case; it implies the matrix capacity lower bound; the same lower bound holds for the frame; and then the distance would be on the order of Delta over σ². Now we just need to choose σ²: we choose σ² to balance the two terms, so that the movement from adding the noise in the first step is not the significant part and the second part dominates. If you put in this bound, you get d^1.5 times √Delta; but for the analysis to work we have to make some additional assumptions, ε has to be small enough and n has to be large enough, and putting it all together we get d^6.5 ε. So that is the proof of the second part. Now I just want to wrap up the talk.
Our analysis of the continuous algorithm is not only useful for solving the Paulsen problem; it can actually be used to bound some mathematical quantities related to scaling problems. What are some quantities people care about? One example is bounding the condition number of the scaling solutions: if an instance is scalable, you want to know the condition number of the left and right scaling matrices. In some cases you need this to analyze algorithms; for example, there is a new algorithm based on geodesically convex optimization to solve this problem, and its running time depends on the condition number. We have some techniques for bounding the condition number of scaling solutions. Also, as you saw in the second part, our result gives bounds on the operator capacity and the capacity of a frame, and the capacity of a frame is closely related to the Brascamp-Lieb constant of the frame. So our techniques also give some ways to bound these mathematical quantities. In our case we showed that if the pseudo-random property is satisfied, then we have better bounds; so if we can identify other nice properties that give better bounds, there would also be implications for those problems. Recently we have been thinking about replacing the pseudo-random condition by spectral conditions: for example, if the given matrix has a spectral gap, then we can show that matrix scaling converges faster, and there would be implications for those problems; it would imply that the condition number is very small and that we have very fast algorithms to find the scaling solutions.
Some open problems; I also want to talk about the recent result. One open problem is proving the optimal bound in terms of ε; we are already very close, since we have d² ε. Another is whether there is an efficient algorithm to solve the Paulsen problem, which is also answered by the recent work: the d² ε paper also gives an efficient algorithm to solve the problem. What is their proof? Instead of tracking the total movement, they just directly bound the total distance between the input and the output, and they have a very clever idea of using a nice distance function that makes everything work very nicely. In general, we also want to do the smoothed analysis of operator scaling, since our approach also works in the more general operator setting, and I think it would be interesting to see whether the short proof works in the operator setting as well. So thank you for your time.