We spend a lot of time in our research group thinking about a very particular problem, but it's really a very general problem: how can we meaningfully solve systems of equations when we have only a few observations — more unknowns than equations? There's an entire body of research that has built up around this problem, and it has the clever moniker "compressed sensing" attached to it. The motivation for that name is that a lot of the work was developed in mathematical signal processing, which I'm a part of, and there we tend to think of it this way: we have a long vector — some very high-resolution or very high-bandwidth signal or image that we want to acquire — we think of the matrix as some type of data acquisition device, and we think of what gets spit out as the data that we measure. So there's a cost to each row of this matrix: it's another sample I have to take, another sensor I have to put into my system. Now, meaningfully solving the system of equations over the space of all vectors is hopeless — that's the first day of linear algebra.
But what we can say is: if we assign x a type of structure, then we can meaningfully invert systems like this. So I'm going to talk about two types of structure, and in doing so I'm going to point out two nifty mathematical applications for which we have very strong theoretical results.
The first type of structure I'll talk about — I'll just mention it really quickly — is sparsity. Basically this means that x has many entries in it, but only a few of them are important. This is a problem model we use all the time in signal processing, and here is how we use it. Here I have a photograph, which is a vector of length four million. There is something called the wavelet transform — it's not too important exactly what that is; basically it's a linear map which takes four million pixels to four million expansion coefficients, and in doing so it tends to take things that look like photographs and compact their energy onto just a few coefficients. This energy tends to be located in correspondence with where there are edges in the image. So not only does this map the image to another vector which has only a few significant entries — I also don't really know in advance where those nonzero entries are going to be, because the edges in the images are going to be in different places at different times. That's what we mean by sparse: there's some transform I can apply to my vector that compacts the energy. And here's the upshot: there are four million pixels here; I take the thirty thousand biggest expansion coefficients, zero the rest out, and reconstruct. That's the projection onto a thirty-thousand-dimensional space, and you don't lose much.
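This keep-the-biggest-coefficients idea can be sketched numerically. As a toy stand-in for the wavelet coefficients of a photograph, here is a synthetic vector whose coefficients decay — all sizes and the decay model are my own illustrative choices:

```python
import numpy as np

def best_k_term(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest."""
    approx = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the k biggest entries
    approx[idx] = x[idx]
    return approx

# A synthetic "compressible" vector: coefficients decaying like 1/j,
# mimicking how wavelet coefficients of a photograph concentrate energy.
rng = np.random.default_rng(0)
n = 4000
x = rng.standard_normal(n) / np.arange(1, n + 1)

xk = best_k_term(x, 30)
rel_err = np.linalg.norm(x - xk) / np.linalg.norm(x)
```

Keeping fewer than one percent of the entries leaves only a small relative error, which is exactly the "you don't lose much" phenomenon.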
So that's what I mean by sparse. Then we can ask: when can we recover signals that have a certain level of sparsity — say they are s-sparse? It turns out the answer is very simple. What we need is kind of what you would expect: we need this matrix to keep sparse signals separated from one another. If I look at the distance between two sparse signals, I want that distance to be approximately preserved when I look at them through the matrix. This is something called the restricted isometry property, but really all it's saying is that two different sparse vectors will still look different after I pass them through this matrix. So what happens if I have a matrix with this property?
Well, if this lower-dimensional projection has that property, then almost by definition, given the observations y, you could recover a sparse signal simply by saying: give me the sparsest possible vector that could explain these measurements that I've observed. You could set up a program which looks like this, and you can have confidence that this program will find the right answer — because if there were a sparser vector which explained the measurements you had, there would be a way to move from the real solution to that vector along a direction in the null space of the matrix, and this property disallows two sparse vectors from differing by something in that null space.
Now, one problem is that searching for the sparsest vector directly is intractable, but we can replace it with a program we can actually solve. There's a nice convex relaxation, based on linear programming: instead of looking for the sparsest vector that explains a set of measurements, we look for the one which minimizes the sum of the magnitudes of the entries. It turns out this is not only convex, it's just a linear program. The reason — just really quickly — why ℓ1 finds sparse things: think of the constraints as a hyperplane, and solving this program as blowing up some type of geometric ball. That ball is pointy along the directions where we have sparse vectors, so it tends to bump into the hyperplane along those axes.
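The ℓ1 recovery step can be sketched in code. The talk describes a linear program; as a stand-in, here is a small iterative soft-thresholding (ISTA) solver for the closely related ℓ1-penalized least-squares problem — my own illustrative choice of algorithm and sizes, not the solver used in the work:

```python
import numpy as np

def ista(A, y, lam=0.01, iters=5000):
    """Iterative soft-thresholding: minimize 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L    # gradient step on the quadratic term
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(1)
n, m, s = 200, 80, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random measurement matrix
x0 = np.zeros(n)
x0[rng.choice(n, s, replace=False)] = rng.standard_normal(s)  # s-sparse truth
y = A @ x0                                     # underdetermined measurements

x_hat = ista(A, y)
```

With 80 measurements of a 5-sparse vector of length 200, the minimizer lands close to the true sparse vector, as the theory predicts.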
So that's the first part of this compressed sensing theory: if the matrix has a certain property — it keeps sparse vectors separated — then I can recover sparse signals from underdetermined measurements using linear programming. The second part simply asks: what types of matrices obey this property? It turns out they're hard to construct deterministically, but they exist everywhere: basically, if I take a matrix filled with random entries, then the number of rows I need to recover an s-sparse vector scales linearly with s.
So formally, it just means this: if I look at every pair of s-sparse vectors, the sum of squared differences before and after applying the matrix is almost the same. That's all I mean — that's what it means to be a well-conditioned map on sparse vectors, and it turns out that's really enough. Random matrices do this very nicely. What do I mean by very nicely? If I have s nonzero entries in my vector, I'm going to need at least s observations to figure out what those s numbers are. What this says is that even if I don't know where those entries are, I can both find them and reconstruct their values from a number of rows that is almost s, up to a log factor. So random matrices give you this nice compression over the space of sparse vectors.
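This norm-preservation behavior is easy to spot-check numerically. Note that this only samples random sparse vectors; a true restricted isometry statement covers all sparse vectors simultaneously, which is much stronger. Dimensions here are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, s = 1000, 200, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaled so E||Ax||^2 = ||x||^2

ratios = []
for _ in range(100):
    x = np.zeros(n)
    x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)  # random s-sparse
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
ratios = np.array(ratios)
```

Even though the matrix maps a thousand dimensions down to two hundred, the norm of every sampled sparse vector is preserved to within a few percent.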
OK, so let's look at an interesting application of this — a little different from what we commonly see in talks of this nature. Here's a stylized channel coding problem. I'm communicating something from point A to point B, and as my message travels through space, somebody is going to come along and change some of the entries. It might not be somebody; it might just be due to some interference I encounter when I transmit my message. It's a stylized version of the communications problem. So to protect against errors being made, what we're going to do is take the message I want to send and pass it through an encoding matrix — all this is is a linear embedding. I transmit the encoded message, a couple of the entries get changed, and now I want a way to decode it on the other side.
This is in some sense the simplest communications problem. Here's the structure we're going to use to solve it: the code — in this case an L by N codebook C — we'll say is random, and the message we're sending is arbitrary, just an arbitrary vector in R^N. And we'll say the error vector is sparse, in the sense that the person — or the thing — that changes my entries only gets to change some percentage of them, not all of them. So: when can we recover the message from what we receive? Of course, everybody knows C, but nobody knows the error.
All right, so here's what you do. Basically you say: let me try to find a message such that, when I encode that candidate message, the difference between what I observed and the encoded version has small ℓ1 norm. We've already roughly gone over why ℓ1 is a good surrogate for sparsity, and I know not too many of the entries have been changed, so this looks for the sparsest error. It turns out to be exactly the same problem as reconstruction from underdetermined measurements, and here's why.
What I can do is this: since C is tall, it has an annihilator — by annihilator I mean some matrix A I can multiply on the left such that A times C is zero. So I can take the received message, apply A to it, and the encoded-message part goes away; I'm left with just A applied to some sparse vector. Now I have a set of linear measurements of a sparse vector, and I can apply this compressed sensing technology to recover the error. So I come up with a codebook whose corresponding annihilator is random — which really just means the codebook itself is random — and then once I receive a message, I compute what's called the syndrome, plug it into my linear programming software, and I'm done.
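The annihilator-and-syndrome step can be sketched directly. The sizes and the SVD-based construction of the annihilator here are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 20, 60                       # message length, code length (L > N, so C is tall)
C = rng.standard_normal((L, N))     # random codebook

# Annihilator: rows spanning the left null space of C, via the full SVD.
U, _, _ = np.linalg.svd(C)
A = U[:, N:].T                      # (L - N) x L matrix with A @ C = 0

m = rng.standard_normal(N)          # message
e = np.zeros(L)                     # sparse corruption: 5 entries changed
e[rng.choice(L, 5, replace=False)] = rng.standard_normal(5)
y = C @ m + e                       # received word

syndrome = A @ y                    # = A @ e, since the C @ m term is annihilated
```

The syndrome is a set of linear measurements of the sparse error alone; feeding it to an ℓ1 solver recovers e, and subtracting e from y then reveals the message.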
Well, let's see — here's what we know. You mean, how many errors can I correct? Right — no, there are no restrictions here on the magnitude of the errors, and that's for geometric reasons. Of course, magnitudes do come into play once I start talking about the stability of this; but for the question of just when I can do this, period, it doesn't matter.
OK, so how does this fit in? We have an (L − N) by L matrix applied to a sparse vector, so I know when I'll be able to do this decoding successfully just by applying the results from before: it's basically when the number of rows of the annihilator is on the order of the number of errors S, times a log factor. So if you work out what the code length has to be, what this comes down to is: the length I need my code to be to protect against S errors is just the message length plus S, up to log factors. In some sense this is the right result. I can make my code short — essentially optimally short: I need to make it at least as long as the message, otherwise I don't preserve the message as it goes through the coding matrix, and beyond that I just need to add a number of degrees of freedom proportional to the number of errors that occur.
In practice this works beautifully: you can do things like corrupt thirty percent of the entries, and it will recover the message nicely — and it's also stable. So that's a nice stylized communications problem, and that's how we apply one kind of result from this field of solving underdetermined systems. But we can actually think of another type of structure and solve a different problem with it — one that might be a little more practical.
There's another way things get perturbed as I communicate wirelessly from point A to point B. When I talk on my cell phone, the signal goes directly to the base station, but it also bounces off buildings — it can bounce off many different things. So what the receiver sees is not necessarily the message that you sent, nor the message with a couple of entries changed; really, it's the message overlapped with itself at a couple of different delays — which might be known or might be unknown — and those delayed copies are multiplied by numbers which we certainly don't know. So that's another model we can use for how things get from point A to point B.
So how can we set that up? I have a message; I do the same thing as before — I put it through an encoding matrix — and now what happens is not that some entries get changed, but rather that the encoded vector gets convolved with a signal, what's called the channel response. This is an unknown vector, and my decoder wants to untangle both the message and what this channel is from these observations.
What we're going to say here is that the thing we get convolved with is either sparse — maybe with known support — or short; either way, it has some structure.
So what is this problem, mathematically? It turns out to be a nonlinear inverse problem. If I don't know h (the channel) or m (the message), and I write down what it is that I observe, each entry of the observed vector y is really a nonlinear combination of h and m put together — they're quadratic measurements, in some sense. So if the channel is not known, these are essentially nonlinear measurements, which makes the problem hard. But there are two sources of structure we can exploit here. First, the channel: the message gets sent in as a vector in R^L, we get spit out a vector in R^L, and what we can come forward with is that the channel lives in some known subspace — say, the first K elements of a length-L vector. Second, we know there's a lot of structure on the other vector involved in the convolution: it lives in the column space of the coding matrix, the same structure we had before.
So: a short channel, and a coded message that lives in a known subspace — those are the two things to exploit. But we still have this problem of nonlinear measurements. How can we apply what we know about solving regularized underdetermined linear systems to this case? It turns out that, just by a clever rewriting of the problem, we can recognize that what we observe, even though it's nonlinear in m and h — again, m is the message and h is the channel response — is actually linear in their outer product, and by outer product I mean the rank-one matrix formed by h times m transpose. That's just a matter of writing down what it means to observe delayed shifts.
In a sense, what you get is this: I have a vector over here which is just my channel response repeated, multiplied by different entries of the message. If I were to take these pieces and stack them next to each other, this would be a rank-one matrix — so what I have is a rank-one matrix which has been vectorized, column by column. And the system that's acting on this vectorized matrix: this block is the first column of my codebook put into matrix form, and there's one such block for every column of my codebook. So if I have N entries in my message and K taps — possible delays — in my channel, this is an L by NK matrix, and it's operating on a vector which has known structure: rank-one structure.
So there's another kind of structure here. Just to be clear: what we have now, instead of nonlinear measurements of h and m, is linear measurements of a matrix formed by their outer product — and we know that matrix has special structure.
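This rewriting can be checked numerically: circular convolution of the codeword with the channel is linear in the rank-one matrix h mᵀ, because each observed sample is a weighted sum of the entries of that matrix. A small sketch, with illustrative dimensions of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, L = 6, 3, 12
C = rng.standard_normal((L, N))     # codebook
m = rng.standard_normal(N)          # message
h = rng.standard_normal(K)          # short channel (K taps)

# Direct model: circular convolution of the codeword with the channel.
codeword = C @ m
y = np.zeros(L)
for i in range(L):
    for k in range(K):
        y[i] += h[k] * codeword[(i - k) % L]

# Lifted model: each y[i] is a linear functional of the rank-one matrix X = h m^T,
# with weights B_i built purely from shifted rows of the known codebook.
X = np.outer(h, m)
y_lifted = np.zeros(L)
for i in range(L):
    B_i = np.array([[C[(i - k) % L, j] for j in range(N)] for k in range(K)])
    y_lifted[i] = np.sum(B_i * X)
```

The two computations agree exactly: the nonlinear problem in (h, m) has become a linear problem in the K by N matrix X.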
OK — and there's been other work, very recently, in this field of solving underdetermined linear systems, for almost exactly this problem: instead of making linear observations of a vector, you make linear observations of a matrix. That's all it means: you have a matrix, and I observe different weighted sums of its entries. So again I can set that up as y equals a linear operator applied to X.
And now, instead of looking for the sparsest vector, what we want is that the matrix is low rank: we can try to recover it by looking for the matrix of lowest rank which explains the measurements that I have. And just as there's a nice relaxation for sparsity, there's also a nice relaxation for things being low rank, and it's almost exactly what you'd think: instead of looking for a certain number of singular values to be nonzero, just minimize the sum of the singular values. That turns out to be a convex program — in fact, a semidefinite program.
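The key computational primitive behind this relaxation can be sketched: singular value thresholding, the proximal operator of the nuclear norm, which iterative solvers for this program apply repeatedly. This is a sketch of the primitive only, not the full semidefinite program; sizes and noise level are my own illustrative choices:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(5)
# A rank-2 matrix plus a little noise.
X0 = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 10))
X = X0 + 0.01 * rng.standard_normal((8, 10))

X_hat = svt(X, tau=0.5)
```

Shrinking the singular values kills the noise directions and returns a nearby low-rank matrix — the matrix analogue of soft-thresholding for sparse vectors.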
You can solve that — you can even solve it at large scale, if the rank of the thing you're looking for isn't too large. And then you can ask the same type of question as before. When I had a sparse vector, I asked how many samples an optimal sampler needs, and the answer was s times some log factor. So how many degrees of freedom does a rank-r matrix have? If we're making observations of it, we're going to need at least r times the maximum of the number of rows and columns — r times (N plus K), if you will — to within some constant. And it turns out that, just as random mappings give us an efficient way to sense sparse vectors, random mappings also give us an efficient way to sense low-rank matrices: if you observe a number of random linear combinations of a rank-r matrix, the number of samples for which this program is effective does scale the right way. It's really the rank times the maximum of the number of rows and the number of columns — the essential number of degrees of freedom of the problem. So that's for random projections; there's another story as well.
There's a set of results which says: suppose I don't observe random linear combinations — I just observe entries of the matrix. It turns out that, under some incoherence assumptions — which basically mean the singular vectors are spread out — I can get essentially the same result.
That's nice, but that's not quite our problem. Our problem involves a matrix which looks like this: it's random, but not fully random. It's random because the first column of each block is random, but everything after that in the matrix is determined. So we have this matrix with structured randomness that we have to deal with. Nevertheless, we know empirically that for our rank-one system we can get away with a code length that is something like the number of bounces we wish to protect against, plus the length of the message, times log factors — numerically, we see that happens.
And in theory, this is also what happens. Let me explain the theoretical result and then the key parameters. What we can say is this: suppose we have a message of length N and a channel of length K. That means I'm trying to send N numbers, embed them into an L-dimensional space — L is my code length — and then all of that gets overlapped K different times, weighted by different amounts and added together. This simulates the process of me sending the message out, it bouncing off different places, and the receiver picking it up. So the question is: how long do I need to make this code to protect against K of these bounces? And here's the answer: it basically looks like the maximum of N and K.
Times some log factors — working out the exact constants, you can think of this as K plus N, times some logs. Now, we do have one fudge factor here: a parameter which encodes the one assumption we need, and that's essentially that the channel we're sending through, when I look at its Fourier transform, is not too concentrated. The only reason for that: when I send the message through the channel, this process where I receive overlapping copies is convolution, so I can think of it as multiplication in the frequency domain. If the channel pulled out one frequency and zeroed all the others out, that would kill most of what I sent. So we need this assumption to make it work for generic channels.
So the upshot at the end of the day: even though we have a nonlinear problem, we can turn it into a linear one with structure, and then we can prove that the efficiency of this coding scheme is optimal — at least to within log factors — in that it really scales as the number of degrees of freedom of the things we're figuring out. And that's really all we can ask for.
OK, so the key technical issue when you're solving a problem like this: we talked about how random matrices are good for sparse vectors because they keep them separated. That's what we want to show here — that random matrices with this type of structured randomness keep rank-one matrices separated (actually it needs to be rank-two matrices, for a technical reason, to keep low-rank matrices separated). That requires novel applications of various concentration inequalities about sums of independent random matrices — that's the type of mathematics we use here.
So here is a summary of what we know. We've talked about two different types of structure: observing a sparse vector through a random linear map, and observing a low-rank matrix — in this channel-protection case, a rank-one matrix — through a random linear map. The idea is that even though these maps are underdetermined, in the sense that the things I observe are many fewer than the degrees of freedom I have, I know that the unknowns are structured, and I can use that structure to set up a convex optimization program which not only works numerically but is provably effective. So with that, I'll be happy to take any questions you might have.

Thank you.
For this particular set of numerical results, it was small enough scale that we just used an off-the-shelf SDP solver with the nuclear norm. Now, since what you're after is not just low rank but really rank one, there's actually a whole other class of algorithms you can use, which let you scale these numbers up by a factor of a hundred or more and still get your computations finished in a reasonable amount of time. But that's a good point: SDP basically squares the number of variables we have to reconstruct. If you know that what you're after is low rank, though, you don't have to do that in practice.
It's possible, right. I mean, there are lots of ways to estimate the channel if you send a pilot sequence. The problem is that the fading coefficients — the weights on these bounces — can change pretty rapidly, so it's possible this kind of approach will help. We really did this in a "linear inverse problems, sampling theory, let's prove something" framework, but it actually works beautifully, so I don't know what the possible impact is. I will say that this idea that the codebook is random is not a terrible one for wireless communications: basically, channel codes look random — they're almost designed that way.
For other types of networks... Right, yes — that's a great question too. The type of results here are really different from what you'd see in information theory, and the reason is that the problem we're attacking is different too. You might very fairly ask me: what kind of capacity could you achieve with this code versus another? And I'd have to say I don't know — because we have this other aspect, where you really are estimating the channel as well. In communications, this problem of estimating the channel is not usually talked about in terms of information theory; it's talked about in terms of least squares, basically: I can estimate the channel to a certain accuracy with a certain pilot. Here, both things — the message being decoded and the channel being estimated — happen at the same time, and that's hard to map onto what we already know from information theory.
Right. Well, that's a good question. Take a completely practical perspective: every time you communicate with a cell phone or a modem, it does this channel estimation periodically, because it can really use that information not only to help keep from making errors, but to help it in doing the decoding. It turns out you can spend a decent amount of your time not communicating but estimating the channel — something like thirty percent is not unreasonable. So of all the energy you spend trying to communicate from point A to point B, there's a lot of overhead in trying to correct for this type of error.
It might be. There's been a lot of trouble reconciling what compressed sensing tells us with what traditional coding theory — slash information theory — does. There have been a couple of connections made, but nothing that gives us this kind of result. People are really interested in how many errors you can correct for a certain code length, and when you think about it in signal processing terms, it says almost the same thing, but over the real numbers — and connecting those particular results hasn't been done yet; it's not clear how to do it exactly.