[00:00:10]
>> OK, thanks a lot for the introduction. So this is my last talk. This is actually something I talked about a few months ago in this department, so I hope I'm not going to bore you too much. This is joint work with
[00:00:30]
two of my students, Cong Ma and Yuling Yan, my collaborator from Carnegie Mellon, and Jianqing Fan from Princeton. OK, so the problem is the following. I think many of you know what matrix completion is; I have already mentioned it multiple times. Basically, imagine that you have a low-rank matrix you want to recover. You observe some of its entries, and you want to estimate all of the missing entries. This has found a lot of applications, from sensor network localization to recommendation systems to computer vision. If you're interested in any of them, I'm happy to talk about them offline. Now, one of the most popular algorithms used in practice is perhaps convex relaxation. Basically, you minimize some empirical loss function, except that you add some extra
[00:01:39]
nuclear norm penalization in order to encourage the right (low-rank) structure. This is already very well known and very widely used in practice. What I would like to talk about today goes one step beyond estimation. In the estimation task, I give you the set of observed entries, and you use some algorithm that returns an estimate of all the missing entries.
[00:02:09]
Sometimes, if you want to do more informed decision making, you want to know a lot more than that. For example, I would like you to tell me how confident you are about the estimate you return.
[00:02:31]
In particular, I might want to ask how likely the estimate is to be close to the ground truth, and how far it might be from the ground truth. Probabilistically, since we have noisy entries, this can be modeled by reporting a distribution for each of the missing entries. This brings us to what is called uncertainty quantification: how to report a confidence interval that is likely to cover the missing entry. This turns out to be very important if you want to make informed decisions in many of the tasks I just mentioned, such as recommendation systems and computer vision; usually you need to tell the users how uncertain your estimates are.
[00:03:25]
The uncertainty can be caused by many different things. For example, since the data are noisy, there is uncertainty coming from the noise distribution. It may also arise because we sample only a small fraction of the entries of the matrix.
[00:03:49]
Incomplete measurements also create uncertainty. Maybe there is real noise, maybe there are other unexpected effects, but basically the task is how to assess, and how to quantify, the uncertainty of the estimates we have.
[00:04:08]
This problem was actually posed a few years ago at a high-dimensional statistics workshop; I think it was mostly Andrea Montanari who posed it as an open problem for the community. We now know a lot about how to produce confidence intervals for sparse recovery problems. The question is how to do it if what you care about is matrix completion, which is significantly harder than sparse recovery.
[00:04:42]
So what are the challenges? Imagine that I want to assess the uncertainty based on the convex estimator. The first thing you encounter is that the convex estimate is usually highly biased. It is biased because you are adding a regularizer, which pulls you toward solutions with smaller nuclear norm.
[00:05:12]
Because the regularizer encourages a small nuclear norm, you get some bias, and you need to handle that bias carefully. Next, and probably the most challenging part, it is very difficult to track the distribution of the convex estimate. If I give you an estimator that is the solution to a complicated convex program, how are you going to
[00:05:42]
identify its distribution? There is almost no distributional theory for this particular case. In fact, forget about distributional theory: even if you focus only on the estimation error, what we know is already highly suboptimal. Even if I ask you to characterize not the distribution of a single entry but just a
[00:06:14]
reasonable bound on the ℓ2 estimation error, we know very little, or what we know is very conservative. If you use such conservative error bounds to build confidence intervals, the intervals are valid but usually overly wide. A very wide confidence interval is not very useful in practice: we want to convey as much information as possible to the user, and a very wide interval says much less.
[00:06:53]
All of these pose challenges for this particular problem. So in this talk I'm going to split the problem into two parts. First, I'll try to improve upon the estimation guarantees; at least, we try to tell you something much better than what is known in the prior literature.
[00:07:14]
After that, I'll tell you how we reinforce this theory to get something better. So this is my plan today. First, I'll show how to improve the estimation guarantee in terms of the ℓ2 error. This may not be very precise, but we first need something much better than the prior art. After that, I'll boost or reinforce it so that we can say something very precise about every single entry, missing or observed. And when I say precise, I mean the precise rates and also the precise pre-constants, and hopefully all of these match the information-theoretic limits.
[00:08:00]
OK, so I'm going to start with some stability analysis, which is basically about how to improve the estimation guarantees. Here is my notation. I have a matrix M*, which has rank r, and I observe some of its entries; Ω is my sampling set. This is standard notation for matrix completion. Each observed entry is corrupted by some noise; in this talk I'm mostly focusing on i.i.d. noise, although this can be significantly generalized. The goal is to estimate M*.
[00:08:47]
The most natural way to do convex relaxation is this problem: minimize the squared loss over all of the observed entries, properly penalized by the nuclear norm. This is one of the most common ways of doing it.
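The convex program just described can be made concrete with a short sketch. It assumes the common parameterization (1/2) Σ_{(i,j)∈Ω} (Z_ij - M_ij)² + λ||Z||_* and solves it by proximal gradient with unit step (a Soft-Impute-style iteration). The talk does not say which solver was used, and `svt`, `nuclear_norm_ls`, and `objective` are hypothetical names.

```python
import numpy as np


def svt(A, tau):
    """Singular value soft-thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def nuclear_norm_ls(M_obs, mask, lam, n_iter=500, tol=1e-9):
    """Minimize 0.5 * sum_{(i,j) in Omega} (Z_ij - M_ij)^2 + lam * ||Z||_*.

    `mask` is a boolean array marking the sampling set Omega. Each step fills the
    unobserved entries with the current guess, then applies SVT (proximal gradient
    with step size 1, valid because the loss gradient is 1-Lipschitz).
    """
    Z = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        Z_new = svt(np.where(mask, M_obs, Z), lam)   # P_Omega(M) + P_Omega^perp(Z), then prox
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z


def objective(Z, M_obs, mask, lam):
    """Value of the penalized squared loss on the observed entries."""
    resid = np.where(mask, Z - M_obs, 0.0)
    return 0.5 * np.sum(resid ** 2) + lam * np.linalg.svd(Z, compute_uv=False).sum()
```

When every entry is observed, the iteration stops after a single SVT, which is a handy sanity check; with partial observations each step cannot increase the objective.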
[00:09:09]
Now I'd like to say a little about what is known in the prior literature about this very simple but powerful paradigm. In general this is a hard problem, but with a nice statistical model things are not that difficult. So let me make the following extremely simple assumptions, in order to have something concrete to say. First, I assume a random sampling model: each entry is observed independently with probability p, so p is the sampling rate.
[00:09:47]
n is the dimension of the matrix. To make sure the problem is tractable, I need p to be larger than (log n)/n; otherwise the observation pattern forms a disconnected graph, and you will never be able to estimate the matrix. I also assume the rank r is O(1), independent of n, and that the matrix is incoherent. If you don't know what incoherence means, it basically says the matrix is delocalized: no single entry is very spiky, and the energy is spread out across all of the entries. So this is the very simple model I have, plus i.i.d. Gaussian noise.
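A minimal sketch of the random model just described, assuming Gaussian low-rank factors (one common way to obtain an incoherent matrix with high probability; the talk does not specify how M* is generated). `make_problem` and `incoherence` are hypothetical helper names, and μ below is the usual (n/r)·max_i ||U_i||² over the singular subspaces.

```python
import numpy as np


def make_problem(n, r, p, sigma, seed=0):
    """Rank-r M_star, Bernoulli(p) sampling mask, noisy observations on the mask only.

    Gaussian factors are an assumption of this sketch, not part of the talk.
    """
    rng = np.random.default_rng(seed)
    M_star = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    mask = rng.random((n, n)) < p                  # each entry observed independently w.p. p
    noise = sigma * rng.standard_normal((n, n))    # i.i.d. N(0, sigma^2)
    M_obs = np.where(mask, M_star + noise, 0.0)    # unobserved entries stored as 0
    return M_star, mask, M_obs


def incoherence(M_star, r):
    """mu = (n / r) * max row norm^2 of the top-r left/right singular vectors."""
    U, _, Vt = np.linalg.svd(M_star, full_matrices=False)
    n1, n2 = M_star.shape
    mu_left = n1 / r * np.max(np.sum(U[:, :r] ** 2, axis=1))
    mu_right = n2 / r * np.max(np.sum(Vt[:r].T ** 2, axis=1))
    return max(mu_left, mu_right)
```

Checking that every row and column of `mask` contains at least one observation is a quick empirical proxy for the p > (log n)/n requirement.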
[00:11:00]
The noise has mean zero and variance σ². OK, so this is what is known in the prior literature. I think this was first studied by Candès and Plan about 10 years ago, and they have this estimation guarantee: if you look at the Frobenius norm error, their bound is
[00:11:27]
σ times n^1.5. Now we'd like to understand whether this is the right rate or whether it is far off. Here is the minimax lower bound that people have identified, which ignores any computational constraints: suppose I'm allowed to do anything; this is the minimax lower bound.
[00:11:51]
Under my assumption that p is larger than (log n)/n, you can compare the two and see roughly a √n gap between the lower bound and the bound achieved by this result. So this is a large gap. Motivated by this gap, subsequent work
[00:12:18]
proposed modifying the algorithm a bit by adding an extra constraint that controls the size of each entry. They get the following bound, and if you compare it with the minimax limit, the part that depends on n and p
[00:12:45]
is the same. The only difference is an extra max here, which says that if the size of the noise is much larger than the size of the corresponding entries, then the bound is tight. So there is a regime where it is tight; otherwise the bound flattens out and does not go down as σ goes to 0. So even if there is no noise,
[00:13:19]
this bound does not give you exact recovery. That is the problem: even if σ is 0, meaning there is no noise, the bound does not go to 0. So in this regime we only have suboptimal results. And then Koltchinskii and coauthors
[00:13:40]
have a different algorithm, more like a spectral algorithm, for the same problem, and it achieves roughly the same performance guarantee. The difference between the two is that, I believe, for convex relaxation the suboptimal result comes from an analysis that is not tight, so it can be improved.
[00:14:02]
For the spectral algorithm, this is the tight result for that algorithm, not an artifact of the analysis. That's my understanding. So, putting all of this together, we have some understanding in the high-noise, small-SNR regime, but a lack of understanding in the high-SNR regime, which is probably the most important one in practice.
[00:14:34]
But it turns out that the extremely simple algorithm proposed by Candès and Plan, even though their theory doesn't seem to match the minimax limit, does extremely well empirically. In their paper they ran numerical simulations of the algorithm I mentioned and compared it with some kind of oracle bound.
[00:15:02]
For all of the parameters they tried, the ℓ2 error is within a factor of at most 2 of the oracle bound. So in practice this works extremely well. Put another way, the theory in Candès and Plan's paper seems too loose: it does not match practice and does not explain the effectiveness of the algorithm. In fact, in the paper they also mention that the analysis seems to lose a factor of √n compared to the oracle bound. In this talk we are saying that this √n factor can actually be removed. In fact,
[00:15:59]
the performance of this simple convex relaxation paradigm is almost optimal, up to some constant. That's what I'm trying to illustrate in this talk. So what are the challenges? If you know a little about convex relaxation theory: to say something concrete, we usually start with the KKT conditions. Since this is a convex program, you start with the first-order optimality conditions, and if you are able to find a dual certificate, you can guarantee that the solution is the right one.
[00:16:49]
In the noiseless case this is not so difficult, because if you expect exact recovery, your primal solution has to be the ground truth. So the only difficulty is how to find a reasonable dual certificate, and David Gross, a physicist, played a crucial role by developing an extremely smart
[00:17:18]
procedure called the golfing scheme to take care of this part. It's a very widely applicable construction that allows us to find a dual certificate justifying the optimality of the solution when there is no noise. But once there is noise, everything suddenly becomes difficult. The main reason is that the primal solution depends on the noise in a very complicated way, so we don't know the primal solution, and without the primal solution you cannot use the golfing scheme to build a dual certificate. So you know neither the primal nor the dual, and if you don't know anything at all, how are you even going to start showing that the solution is optimal?
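To make the dual-certificate idea concrete, here is a sketch for the easiest special case we could pick: every entry observed, so the program is min_Z (1/2)||Z - Y||_F² + λ||Z||_* (our simplification, not the setting of the talk). There the primal solution is known in closed form (singular value soft-thresholding), so the candidate certificate (Y - Z)/λ can be checked directly against the nuclear-norm subdifferential. With missing entries and noise no such closed form is available, which is the difficulty just described. `kkt_report` is a hypothetical name.

```python
import numpy as np


def kkt_report(Y, lam):
    """Check first-order optimality for min_Z 0.5*||Z - Y||_F^2 + lam*||Z||_*.

    The minimizer is Z = SVT_lam(Y). KKT requires (Y - Z)/lam to be a subgradient of
    the nuclear norm at Z: U_r V_r^T + W with U_r^T W = 0, W V_r = 0, ||W||_op <= 1.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    shrunk = np.maximum(s - lam, 0.0)
    Z = (U * shrunk) @ Vt
    r = int(np.count_nonzero(shrunk))         # rank of the solution
    Ur, Vr = U[:, :r], Vt[:r].T
    G = (Y - Z) / lam                         # candidate subgradient (the certificate)
    W = G - Ur @ Vr.T                         # component outside the tangent space
    return {
        "rank": r,
        "left_orth": float(np.linalg.norm(Ur.T @ W)),
        "right_orth": float(np.linalg.norm(W @ Vr)),
        "W_op_norm": float(np.linalg.norm(W, 2)),
    }
```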
[00:18:10]
So this slide summarizes the difficulties. In summary, analysis techniques based on dual certification seem very difficult here, and this has bothered the community for almost 10 years. In this talk I'm going to take a different route: a detour through a completely different algorithmic paradigm, nonconvex optimization, and then I'll come back and show that it can actually help us understand convex relaxation.
[00:18:54]
What is the nonconvex paradigm? It's the following. We are doing low-rank recovery, and we already know that we are looking for a low-rank solution, so why not start by reparametrizing the matrix variable with two factors? So I'm going to represent my matrix as
[00:19:18]
the product of two factor matrices with the corresponding rank, Z = X Y^T. This gives me a low-dimensional representation of the problem. Then I again solve an empirical risk minimization problem, except that I now replace the matrix by the product of its factors.
[00:19:50]
This is a standard way to set up a nonconvex formulation. Sometimes you also add a regularizer, mostly to help the algorithm behave better. This is a typical paradigm used in practice.
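A minimal sketch of the factored objective, assuming the loss (1/2)||P_Ω(X Y^T - M^obs)||_F² plus the regularizer (λ/2)(||X||_F² + ||Y||_F²) that appears later in the talk. The helper names are hypothetical; the finite-difference check is only there to confirm the hand-derived gradients.

```python
import numpy as np


def loss_and_grads(X, Y, M_obs, mask, lam):
    """f(X, Y) = 0.5*||P_Omega(X Y^T - M_obs)||_F^2 + (lam/2)*(||X||_F^2 + ||Y||_F^2)."""
    R = np.where(mask, X @ Y.T - M_obs, 0.0)     # residual on observed entries only
    loss = 0.5 * np.sum(R ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    grad_X = R @ Y + lam * X
    grad_Y = R.T @ X + lam * Y
    return loss, grad_X, grad_Y


def directional_check(X, Y, M_obs, mask, lam, eps=1e-6, seed=0):
    """Relative gap between a central finite difference and <grad, direction>."""
    rng = np.random.default_rng(seed)
    DX, DY = rng.standard_normal(X.shape), rng.standard_normal(Y.shape)
    _, gX, gY = loss_and_grads(X, Y, M_obs, mask, lam)
    f_plus = loss_and_grads(X + eps * DX, Y + eps * DY, M_obs, mask, lam)[0]
    f_minus = loss_and_grads(X - eps * DX, Y - eps * DY, M_obs, mask, lam)[0]
    numeric = (f_plus - f_minus) / (2 * eps)
    analytic = np.sum(gX * DX) + np.sum(gY * DY)
    return abs(numeric - analytic) / max(1.0, abs(analytic))
```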
[00:20:13]
This has attracted a lot of attention from the community. About a decade ago there were some heuristic works, and Andrea Montanari's group basically pioneered this direction, although convex relaxation was so popular at that time that they didn't advertise the nonconvex part much. There has been a recent revival, mostly because nonconvex optimization for this particular problem is much faster than convex relaxation.
[00:20:52]
A lot has happened in recent years. I'll give you just one example of what we can do. There are many nonconvex algorithms that can solve this problem; probably the simplest is gradient descent. First, you find a good initialization, for example with a spectral method, and then you run gradient descent. I think it cannot be simpler.
[00:21:25]
You choose a proper step size and run until convergence, and we can guarantee that an algorithm as simple as gradient descent already gives great performance guarantees. So let's look at the performance of this simple algorithm. Again, focus on the random model: random sampling and i.i.d. Gaussian noise,
[00:21:56]
and, to keep things simple, assume the rank, the condition number, and the incoherence are constants. Then the theory tells us that this very simple gradient descent on the nonconvex function achieves the minimax limit throughout the entire regime. This closes the gap: when σ goes to 0 we get exact recovery, and when σ is large the error always matches the minimax limit up to some constant.
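The two-stage recipe above (spectral initialization, then plain gradient descent) can be sketched as follows. Assumptions for this sketch: the loss is rescaled by 1/p, the step size is a fixed fraction of 1/σ_max of the initial guess, and the iteration count is fixed rather than tuned. These are illustrative choices, not the constants from the talk's theory, and the function names are hypothetical.

```python
import numpy as np


def spectral_init(M_obs, mask, r):
    """Top-r SVD of (1/p_hat) * P_Omega(M_obs), split evenly between the two factors."""
    p_hat = mask.mean()
    U, s, Vt = np.linalg.svd(M_obs / p_hat, full_matrices=False)
    root = np.sqrt(s[:r])
    return U[:, :r] * root, Vt[:r].T * root


def gradient_descent(M_obs, mask, r, lam=0.0, eta_scale=0.2, n_iter=1000):
    """Vanilla GD on (1/(2p))||P_Omega(X Y^T - M_obs)||_F^2 + (lam/(2p))(||X||_F^2 + ||Y||_F^2)."""
    p_hat = mask.mean()
    X, Y = spectral_init(M_obs, mask, r)
    eta = eta_scale / np.linalg.norm(X, 2) ** 2     # ||X||_op^2 = top singular value of the init
    for _ in range(n_iter):
        R = np.where(mask, X @ Y.T - M_obs, 0.0) / p_hat
        grad_X = R @ Y + (lam / p_hat) * X
        grad_Y = R.T @ X + (lam / p_hat) * Y
        X, Y = X - eta * grad_X, Y - eta * grad_Y   # simultaneous update of both factors
    return X, Y
```

In a noiseless instance with a comfortable sampling rate, the final error should be far below the error of the spectral initialization.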
[00:22:36]
This seems very attractive in practice, and, somewhat surprisingly, the nonconvex algorithm turns out to be easier to analyze than the convex one. All right, so let me give you a road map. Convex relaxation was the dominant paradigm until about 10 years ago, and a lot of people tried to make progress toward understanding it, except for Andrea Montanari's group, who focused on nonconvex optimization at that time. More recently, most people have focused on the nonconvex approach for computational reasons, and so we now understand a lot more about nonconvex optimization. The question is whether we can return to convex relaxation and use this understanding to say something useful about that seemingly distinct algorithmic paradigm. I'd like to make this connection in the next few minutes.
My starting point is a motivating experiment. This was not first observed by us; other people observed it empirically a few years ago too. I'm going to run two different algorithms, the convex relaxation and the nonconvex one. I pick the regularizer of the nonconvex formulation in a very specific way, to make it look close to the nuclear norm: there is a formula saying that the nuclear norm of Z equals the minimum of (1/2)(||X||_F² + ||Y||_F²) over all factorizations Z = X Y^T. This motivates the regularizer in the nonconvex formulation. Here I assume the rank is known a priori. In practice you can estimate it with spectral methods: compute the singular value decomposition of the data matrix, look for the point where there is a gap in the spectrum, and read off the rank from this spectral gap.
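Two facts from this slide can be checked numerically in a few lines: the variational formula ||Z||_* = min over Z = X Y^T of (1/2)(||X||_F² + ||Y||_F²), attained at the balanced factors X = U Σ^{1/2}, Y = V Σ^{1/2}, and a largest-gap heuristic for picking the rank. This is a sketch with hypothetical function names; the largest-gap rule is one simple version of the spectral-gap idea, not necessarily the procedure used in the talk.

```python
import numpy as np


def balanced_factors(Z, r):
    """X = U_r sqrt(S_r), Y = V_r sqrt(S_r): the factorization attaining the minimum."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    root = np.sqrt(s[:r])
    return U[:, :r] * root, Vt[:r].T * root


def factor_penalty(X, Y):
    """(1/2)(||X||_F^2 + ||Y||_F^2), the nonconvex surrogate for ||X Y^T||_*."""
    return 0.5 * (np.sum(X ** 2) + np.sum(Y ** 2))


def estimate_rank_by_gap(A):
    """Heuristic: choose r at the largest drop between consecutive singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    gaps = s[:-1] - s[1:]
    return int(np.argmax(gaps)) + 1
```

Any unbalanced factorization of the same Z (for example X D, Y D^{-1} with D diagonal and not the identity) gives a strictly larger penalty, by the AM-GM inequality.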
[00:25:07]
All right. And for the purpose of this talk, as I'm going to illustrate, the nonconvex algorithm is a fake: it will never actually be run in practice; it is only a device for the analysis. So I can simply assume that you know the rank, because I just tell you what it is.
[00:25:28]
All right, so I have these two paradigms, I run some simulations, and I compare them. I'm plotting the ℓ2 error as a function of the standard deviation of the noise. This is the estimation error for the convex
[00:25:59]
relaxation, and this is the estimation error for the nonconvex optimization. You see that they match; at least from this plot you can't see any difference. Now I'm also going to plot the distance between the two solutions, the convex one and the nonconvex one, in this curve. Even though it is not 0, the distance between them is several orders of magnitude smaller than the estimation error.
[00:26:31]
So the convex and nonconvex solutions match empirically in a very precise manner. This is my key message: they are extremely close. And if they are really that close, then, since I already know the nonconvex optimization algorithm is minimax optimal,
[00:27:07]
maybe I can just propagate this to the convex algorithm and say that it is also optimal. This is the plan. So here is the main result, under the assumptions I have: random sampling, random noise,
[00:27:27]
constant rank, well-conditioned, incoherent; the standard conditions. For this convex program, the theorem carries two messages about convex relaxation. First, the convex solution is almost rank r, even though the algorithm does not use any rank information. What the theorem says is: if I take my solution and project it onto the set of rank-r matrices, the difference (the approximation error) is much, much smaller than
[00:28:13]
my estimation error. That's the first message: the solution is really nearly rank r. And if that is true, we can propagate what we know from the nonconvex theory to convex relaxation, and eventually conclude that the convex solution has an estimation error that matches the minimax lower bound.
[00:28:40]
So, from this picture, we are able to show that even simple convex relaxation achieves minimax-optimal performance. Just one more thing, in case you know the algorithm with the spikiness constraint I mentioned earlier: in addition to standard convex relaxation, it imposes an extra
[00:29:09]
condition that controls the spikiness, that is, the maximum size of the entries. With this constraint imposed, they can say something nice. What we can say with our theory is that you never really need to enforce it, because the convex program automatically controls the spikiness of the solution even when the constraint is not imposed explicitly.
[00:29:46]
This is what we call implicit regularization: you don't need to add this regularizer, because the algorithm automatically biases you toward a solution that satisfies it. So far I have mostly talked about the case when the rank is O(1). When the rank becomes larger you get something similar, but the dependency on r becomes suboptimal. First, I need the sample size to be larger than this quantity, which is already not optimal: information-theoretically, a single factor of r should suffice, but I need an extra factor of r. If that holds, I can still justify that the solution is nearly low rank and that its performance is minimax optimal.
[00:30:43]
But again, the sample complexity is already suboptimal. This happens because almost all of the nonconvex theory is suboptimal in terms of r, and improving upon it probably requires something smarter. All right, so this is the estimation part.
[00:31:10]
Now I'm going to come back to my goal: how to quantify the uncertainty, or how to build confidence intervals for all of the missing entries. This is the second part of the talk, where I'll
[00:31:30]
show you how to do it. The way we do it is the following. First, I start with an estimate, as people typically do; let's say the convex optimizer. And as I mentioned, this
[00:31:47]
optimizer is biased because we enforce the nuclear norm penalty. We want the best possible confidence intervals, so we first remove the bias, that is, we debias the estimate. Hopefully, after debiasing, the distribution becomes approximately Gaussian. If that is true, and if we can characterize precisely what this distribution is, then we can use it to perform uncertainty quantification.
[00:32:25]
So this is the paradigm: first debias, and then derive a distributional theory for the debiased estimate. How am I going to debias? This is the first thing that came to my mind.
[00:32:48]
Starting from the estimate Z, I add (1/p) P_Ω(M^obs - Z), where P_Ω is the projection onto the sampling set. You don't need to know much about the random operator (1/p) P_Ω; the one thing you need to know is that on average it is the identity. So treat it as the identity, and you can basically remove it.
[00:33:07]
Once you remove it, some of the terms cancel each other, and eventually, on average, M* is the only remaining part. So this says that if I do this, I arrive at an unbiased estimate of the ground-truth matrix.
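A small Monte Carlo sketch of the "identity on average" argument. It holds a deliberately shrunk estimate Z fixed and independent of the sampling mask, which is a simplification (in the talk Z depends on the same data), and averages Z + (1/p) P_Ω(M* + E - Z) over fresh masks and noise; the average should land on M*. Function names are hypothetical.

```python
import numpy as np


def scaled_projection(A, mask, p):
    """(1/p) * P_Omega(A): keep the observed entries and rescale them by 1/p."""
    return np.where(mask, A, 0.0) / p


def average_debiased(M_star, Z, p, sigma, n_trials=4000, seed=0):
    """Monte Carlo average of Z + (1/p) P_Omega(M_star + E - Z) over fresh masks and noise.

    Z is held fixed (independent of the mask), a simplification for illustration.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros_like(M_star, dtype=float)
    for _ in range(n_trials):
        mask = rng.random(M_star.shape) < p            # fresh Bernoulli(p) sampling set
        noise = sigma * rng.standard_normal(M_star.shape)
        total += Z + scaled_projection(M_star + noise - Z, mask, p)
    return total / n_trials
```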
[00:33:34]
Now, what is the issue here? The issue is that this becomes a very high-rank matrix. Why? Because, roughly speaking, the term I'm adding looks like noise, and random noise surely has very high rank. So I start with a low-rank solution, add a high-rank term, and end up with a high-rank matrix, which is really not what we want. Statistically, I'm boosting the variability of the solution too much: even though this removes the bias, it might be
[00:34:16]
significantly increasing the variance, which is not good for us, because eventually we want the confidence interval to be as short as possible. So what I'm going to do is something very simple: since this becomes a high-rank matrix, why don't we just project it back onto the set of low-rank matrices?
[00:34:38]
This looks like one iteration of the so-called singular value projection algorithm, if you are familiar with that literature. It is extremely simple: one step of gradient descent followed by a projection.
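The debiasing step followed by the rank-r projection can be sketched as below: Z_d = P_rank-r(Z + (1/p) P_Ω(M^obs - Z)), i.e. one gradient step of size 1/p on the squared loss, then truncation to the top r singular values. This is a sketch under stated assumptions: p is estimated by the observed fraction, and the names are hypothetical.

```python
import numpy as np


def rank_r_projection(A, r):
    """Best rank-r approximation in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def debias_and_project(Z, M_obs, mask, r, p=None):
    """Return (Z_d, Z_plus) with Z_plus = Z + (1/p) P_Omega(M_obs - Z) and Z_d its rank-r projection."""
    if p is None:
        p = mask.mean()                                   # estimate the sampling rate
    Z_plus = Z + np.where(mask, M_obs - Z, 0.0) / p       # unbiased on average, but high rank
    return rank_r_projection(Z_plus, r), Z_plus
```

The intermediate `Z_plus` is typically full rank, which illustrates the variance issue above; the projection brings the estimate back to rank r.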
[00:34:57]
And this is going to be my debiased estimate. Next, I'm going to show that it really is almost unbiased; then I'll tell you what its distribution is, and then I can use it to build
[00:35:21]
confidence intervals. Again, we work under the typical assumptions we already have. Let me introduce one more piece of notation. My rank-r estimate can be factorized as X Y^T, and the ground truth can also be factorized as X* Y*^T; this is standard notation. I pick the factorizations to be balanced, in the sense that X*^T X* equals Y*^T Y*; this is just a particular choice of factorization, made to remove some kind of
[00:36:11]
ambiguity. With this convention, the theorem says that for either the X factor or the Y factor, up to a global rotation matrix, that is, after properly rotating the factors, the error looks like a Gaussian.
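The balanced factorization and the global rotation can be sketched as follows. Assumptions: the balanced factors come from the SVD (X = U Σ^{1/2}, Y = V Σ^{1/2}), and the rotation H is chosen by orthogonal Procrustes, i.e. minimizing ||X H - X*||_F over orthogonal H. That is one natural way to align factors; the helper names are hypothetical.

```python
import numpy as np


def balanced_factorization(Z, r):
    """Z ~ X Y^T with X = U sqrt(S), Y = V sqrt(S), so that X^T X = Y^T Y = S."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    root = np.sqrt(s[:r])
    return U[:, :r] * root, Vt[:r].T * root


def best_rotation(X, X_star):
    """Orthogonal Procrustes: H = argmin over orthogonal H of ||X H - X_star||_F."""
    A, _, Bt = np.linalg.svd(X.T @ X_star)   # X^T X_star = A diag(s) Bt
    return A @ Bt
```

If X differs from X* only by an orthogonal change of basis, `best_rotation` recovers that change exactly, so X H reproduces X*.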
[00:36:34]
It looks like a Gaussian whose distribution we know explicitly. There are several messages here. One is that every single row of the error is Gaussian, and different rows are nearly independent. Basically this says that
[00:36:57]
the errors are nearly uncorrelated: the error in the first row and the error in the second row are nearly independent of each other. And if you look at a particular row, this is the distribution we have.
[00:37:17]
This is still not directly usable in practice, because I do not know X* and Y*, but I can plug in my estimates instead, and the resulting distribution is roughly the same.
[00:37:34]
So for the low-rank factors, the estimation error looks like a Gaussian whose distribution can be characterized using your estimate. It turns out that this is asymptotically optimal in a very precise sense. On top of this, we can say something about every single entry of the matrix:
[00:38:02]
the estimation error is Gaussian with a variance that looks like this, and this exactly matches the Cramér-Rao lower bound. So this is optimal, including the pre-constant. In practice, you just replace X* and Y* by your estimates. So we can show that this is the distribution, and you can use this Gaussian distribution to build confidence intervals: the estimate is unbiased and Gaussian with known variance, so you can build a 95 percent confidence interval, a 90 percent confidence interval, anything you like.
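The transcript does not reproduce the variance formula shown on the slide. For this sketch we assume a per-entry variance of the form v_ij = (σ²/p)(||U_i||² + ||V_j||²), where U and V hold the top-r singular vectors of the debiased estimate; treat this form as an assumption to be checked against the slide, with σ and p taken as known. The interval is then the usual Gaussian one, and `entry_confidence_interval` is a hypothetical name.

```python
import numpy as np
from statistics import NormalDist


def entry_confidence_interval(Z_d, U, V, i, j, sigma, p, level=0.95):
    """Plug-in Gaussian CI for M_star[i, j] centered at the debiased entry Z_d[i, j].

    ASSUMED variance: v_ij = (sigma^2 / p) * (||U[i]||^2 + ||V[j]||^2).
    """
    v = sigma ** 2 / p * (np.sum(U[i] ** 2) + np.sum(V[j] ** 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for level = 0.95
    half_width = z * np.sqrt(v)
    return Z_d[i, j] - half_width, Z_d[i, j] + half_width
```

In use, U and V would come from the SVD of the debiased estimate, and σ would be estimated separately (for example from residuals on the observed entries).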
[00:38:55]
Numerically this seems to work quite well. I'm plotting the distribution of a missing entry against a standard Gaussian distribution. This is a Q-Q plot: if the points lie on a straight line, the two distributions match.
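A sketch of the two checks behind such a plot, assuming you already have standardized errors z = (estimate - truth)/sqrt(variance) collected over repeated trials: the Q-Q pairs (theoretical N(0,1) quantiles against sorted samples) and the empirical coverage of the nominal interval. The helper names and the plotting positions (k - 0.5)/n are our choices.

```python
import numpy as np
from statistics import NormalDist


def qq_points(z_scores):
    """Pairs (theoretical N(0,1) quantile, sorted sample) for a Q-Q plot."""
    z = np.sort(np.asarray(z_scores, dtype=float))
    n = len(z)
    probs = (np.arange(1, n + 1) - 0.5) / n              # plotting positions
    theo = np.array([NormalDist().inv_cdf(q) for q in probs])
    return theo, z


def empirical_coverage(z_scores, level=0.95):
    """Fraction of standardized errors inside the nominal two-sided Gaussian interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return float(np.mean(np.abs(np.asarray(z_scores)) <= z_crit))
```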
[00:39:19]
So the approximation looks reasonably tight, and the distribution really looks Gaussian. And as I mentioned, the convex and nonconvex solutions are roughly the same, so you might wonder whether you can start with the nonconvex optimizer instead and then perform all the
[00:39:39]
inferential procedures. Because the two are almost the same, you can apply basically the same inferential procedure to either the convex or the nonconvex estimate, and the results are almost identical. So the same procedure works simultaneously for two completely different algorithmic paradigms.
[00:40:01]
OK, so let me give you a little bit of intuition, and then I'll conclude. Let me start from the simple case when the rank is one and there is no missing data, so every single entry is observed. This becomes a matrix denoising problem rather than a matrix completion problem. I use this nonconvex program to show you how to analyze the distribution, and why you can even hope for a Gaussian distribution: I want to analyze the distribution of its solution.
[00:40:56]
What I'm going to do is first write down the optimality condition: this is the gradient, and I set the gradient to zero, which is easy to compute. Now look at this part, the λ part. It results in some kind of bias, because, as I mentioned, I am adding a regularizer here, which introduces some bias, and this is a bias that we don't like.
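For concreteness, here is the rank-one calculation written out, assuming the objective has the form f(x, y) = ½‖xyᵀ − M‖_F² + (λ/2)‖x‖₂² + (λ/2)‖y‖₂² with M = x*y*ᵀ + E (our reconstruction, not copied from the slide):

```latex
\begin{aligned}
\nabla_x f(x,y) &= (xy^\top - M)\,y + \lambda x = 0
  &&\Longrightarrow\quad x\,(\|y\|_2^2 + \lambda) = M y,\\
\nabla_y f(x,y) &= (xy^\top - M)^\top x + \lambda y = 0
  &&\Longrightarrow\quad y\,(\|x\|_2^2 + \lambda) = M^\top x,\\
x^\top \nabla_x f - y^\top \nabla_y f &= \lambda\,(\|x\|_2^2 - \|y\|_2^2) = 0
  &&\Longrightarrow\quad \|x\|_2 = \|y\|_2 .
\end{aligned}
```

Compared with the unregularized condition x‖y‖₂² = My, the extra λ in the first line shrinks x toward zero; that shrinkage is the bias. The last line shows the two factors are balanced at any stationary point when λ > 0.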
[00:41:32]
But it turns out that this bias term can be absorbed into the other terms if we properly debias. The way we do it is as follows: I'm going to replace x by a debiased version of x. This is a very simple calculation, you can do it in ten minutes, but basically you replace x by a properly rescaled version of itself.
[00:41:56]
And you plug it in and compare with the original condition. Basically I'm removing the bias. So this is what the debiasing procedure is doing: I'm trying to remove the extra factor due to penalization, and then this becomes something that looks more like a regular least-squares estimator.
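A sketch of the rescaling, under the same assumed rank-one objective: replace x by x·√(1 + λ/‖x‖₂²) and y likewise. Because ‖x‖₂ = ‖y‖₂ at stationary points, this turns x(‖y‖₂² + λ) = My into xᵈ‖yᵈ‖₂² = Myᵈ, the first-order condition of the unregularized least-squares loss. The rescaling formula and the helper names are our assumptions, not necessarily what was on the slide.

```python
import numpy as np


def solve_regularized_rank1(M, lam, iters=2000):
    """Gradient descent on f(x, y) = 0.5*||x y^T - M||_F^2 + lam/2*(||x||^2 + ||y||^2)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    step = 0.5 / s[0]
    x, y = np.sqrt(s[0]) * U[:, 0], np.sqrt(s[0]) * Vt[0]   # spectral initialization
    for _ in range(iters):
        R = np.outer(x, y) - M
        x, y = x - step * (R @ y + lam * x), y - step * (R.T @ x + lam * y)
    return x, y


def debias_rank1(x, y, lam):
    """Hypothetical helper: rescale each factor by sqrt(1 + lam / ||factor||^2)."""
    return x * np.sqrt(1.0 + lam / (x @ x)), y * np.sqrt(1.0 + lam / (y @ y))


rng = np.random.default_rng(2)
n, lam = 50, 5.0
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
v_star = rng.standard_normal(n)
v_star /= np.linalg.norm(v_star)
M = 100.0 * np.outer(u_star, v_star) + rng.standard_normal((n, n))

x, y = solve_regularized_rank1(M, lam)
xd, yd = debias_rank1(x, y, lam)

# Gradient of the unregularized loss 0.5*||x y^T - M||_F^2 with respect to x.
grad_before = (np.outer(x, y) - M) @ y      # equals -lam * x: the shrinkage bias
grad_after = (np.outer(xd, yd) - M) @ yd    # vanishes after debiasing
print(np.linalg.norm(grad_before), np.linalg.norm(grad_after))
```

In this toy case the debiased product xᵈyᵈᵀ works out to σ₁u₁v₁ᵀ, the best rank-one approximation of M, i.e. the unregularized least-squares fit.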
[00:42:23]
OK, so if I have this, then after a little bit of algebraic manipulation I just move some of the terms to the right-hand side, and you can check that the debiased x minus the ground truth is given by a term that involves the noise, namely the noise matrix E times something, and also a term that looks like this.
[00:42:50]
And it turns out that this last term looks more like a second-order term, since the estimate is close to the ground truth, so we drop it. Eventually you only have the first-order term, which looks Gaussian, because it is the noise matrix times something close to a fixed vector. This is the leading term, and it becomes the distributional approximation we are going to use.
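A small Monte Carlo check of this first-order Gaussian approximation in the fully observed rank-one case. Assumptions (ours): the debiased estimate is the best rank-one approximation σ₁u₁v₁ᵀ of M, and its first-order error E v vᵀ + u uᵀE − u uᵀE v vᵀ has entrywise variance σ²(u_i² + v_j² − u_i²v_j²) ≈ σ²(u_i² + v_j²). We standardize each entry's error by that plug-in variance and look at the 95 percent coverage.

```python
import numpy as np
from statistics import NormalDist


def standardized_entry_errors(n=50, signal=1000.0, sigma=1.0, trials=200, seed=3):
    """Simulate M = signal*u v^T + noise and standardize every entry's error by a plug-in std."""
    rng = np.random.default_rng(seed)
    u_star = rng.standard_normal(n)
    u_star /= np.linalg.norm(u_star)
    v_star = rng.standard_normal(n)
    v_star /= np.linalg.norm(v_star)
    M_star = signal * np.outer(u_star, v_star)
    z = []
    for _ in range(trials):
        M = M_star + sigma * rng.standard_normal((n, n))
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        u, v = U[:, 0], Vt[0]
        M_hat = s[0] * np.outer(u, v)                          # assumed debiased rank-one estimate
        var = sigma**2 * (u[:, None] ** 2 + v[None, :] ** 2)   # plug-in first-order variance
        z.append(((M_hat - M_star) / np.sqrt(var)).ravel())
    return np.concatenate(z)


z = standardized_entry_errors()
crit = NormalDist().inv_cdf(0.975)                             # about 1.96
coverage = float(np.mean(np.abs(z) <= crit))
print(float(np.mean(z)), float(np.std(z)), coverage)           # compare with 0, 1 and 0.95
```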
[00:43:19]
OK, so this is a little bit of the intuition, which can be made significantly more general. Finally, I'd like to mention that if you are familiar with the problem of constructing confidence intervals for the Lasso, [names unclear in the recording] proposed, several years ago, a way to do debiasing that is very similar to this.
[00:43:45]
Basically it looks like this: I start with the convex solution and subtract a linear map of the gradient with respect to the squared-loss part. We can also do something very similar here, as long as we choose the linear map to be this particular function, which is the projection onto the tangent space
[00:44:11]
with respect to the estimate, that is, the tangent space with respect to my low-rank convex solution. We can do this as well, and it turns out that it also matches. So basically you can find a lot of different ways to do the debiasing, and all of them seem to coincide for this problem.
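The tangent-space version can be checked numerically in the same fully observed toy setting. Assumed here: the convex solution Z comes from singular value soft-thresholding, the gradient of the squared loss ½‖Z − M‖_F² is Z − M, and the debiased matrix is Z − P_T(Z − M), where P_T projects onto the tangent space at Z. With these choices it lands on the best rank-one approximation of M, the same point the rescaled nonconvex solution reaches.

```python
import numpy as np


def svt(M, lam):
    """Singular value soft-thresholding: the convex (nuclear-norm) solution."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt


def tangent_projection(A, U, V):
    """P_T(A) = U U^T A + A V V^T - U U^T A V V^T for the tangent space at U S V^T."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ A + A @ PV - PU @ A @ PV


rng = np.random.default_rng(4)
n, lam = 50, 25.0
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
v_star = rng.standard_normal(n)
v_star /= np.linalg.norm(v_star)
M = 200.0 * np.outer(u_star, v_star) + rng.standard_normal((n, n))

Z = svt(M, lam)                                   # convex solution (rank one for this lam)
U, s, Vt = np.linalg.svd(Z)
r = int(np.sum(s > 1e-8 * s[0]))                  # numerical rank of Z
U_r, V_r = U[:, :r], Vt[:r].T
Z_debiased = Z - tangent_projection(Z - M, U_r, V_r)

Um, sm, Vmt = np.linalg.svd(M)
best_rank_r = (Um[:, :r] * sm[:r]) @ Vmt[:r]      # unregularized least-squares rank-r fit
print(r, np.linalg.norm(Z_debiased - best_rank_r))
```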
[00:44:32]
OK, so, and this is probably the final message, let me come back to estimation. I have said a lot of things about confidence intervals; let me come back to estimation and tell you a perhaps somewhat surprising result. The distributional theory allows us to understand the performance of every single entry in a very precise way, and now I come back to say something very precise about estimation.
[00:45:01]
So now I come back to my debiased estimator, the convex estimator after proper debiasing. We show that, even though we are not imposing any distribution on the model there, if you look at the squared ℓ2 error, it is about 2 times n times r times σ² over p, that is, 2nrσ²/p, with high probability: the error concentrates around this quantity, and this quantity is precisely what you can show as an oracle lower bound. So this is saying that after proper debiasing you are able to achieve the lower bound in a very precise way, including the pre-constant and everything. So we get a precise characterization of the estimation accuracy, which in statistical language just means that you achieve
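The 2nrσ²/p benchmark can be sanity-checked by simulation in the simplest case p = 1, r = 1 (fully observed, rank one), again taking the debiased estimate to be the best rank-one approximation of M. This is an illustrative sketch with synthetic data, not the experiment shown in the talk.

```python
import numpy as np


def mean_sq_frobenius_error(n=50, signal=1000.0, sigma=1.0, trials=200, seed=5):
    """Average ||M_hat - M_star||_F^2 over trials for a fully observed rank-one model."""
    rng = np.random.default_rng(seed)
    u_star = rng.standard_normal(n)
    u_star /= np.linalg.norm(u_star)
    v_star = rng.standard_normal(n)
    v_star /= np.linalg.norm(v_star)
    M_star = signal * np.outer(u_star, v_star)
    errs = []
    for _ in range(trials):
        M = M_star + sigma * rng.standard_normal((n, n))
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M_hat = s[0] * np.outer(U[:, 0], Vt[0])   # assumed debiased estimate (best rank-one fit)
        errs.append(np.sum((M_hat - M_star) ** 2))
    return float(np.mean(errs))


n, r, p, sigma = 50, 1, 1.0, 1.0
benchmark = 2 * n * r * sigma**2 / p              # the 2nr sigma^2 / p quantity
ratio = mean_sq_frobenius_error(n=n, sigma=sigma) / benchmark
print(ratio)                                       # close to 1 in this strong-signal setting
```

The exact first-order count here is (2n − 1)σ², because the tangent space of rank-one n × n matrices has dimension 2n − 1, so the ratio sits slightly below 1.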
[00:46:03]
efficiency, including both the rate and the pre-constant. Numerically, debiasing does seem to help: this is the estimator without debiasing, and this is the version where you properly debias the convex estimator, and it does seem that you get quite a bit of improvement. OK, so let me just conclude. The entire talk is trying to bridge the convex and nonconvex paradigms, and if you can do this, it turns out that we are able to establish
[00:46:40]
stability guarantees for the convex estimator, and this also allows us to demonstrate how to perform asymptotically optimal uncertainty quantification. There are a lot of open questions: I don't know how to achieve the optimal dependency in terms of the rank; I don't know how to deal with matrices that are not well conditioned; what happens when the matrices are only approximately low rank; and what about more general sampling patterns rather than random ones. All of these we are going to leave for future work. Thank you very much.
[00:47:44]
[Audience question, largely inaudible.] Yeah, sure. For unbiasedness? Yes. It's OK. Yeah, yeah. That's logical. In every iteration of... every iteration of what? OK, so for the nonconvex approach, there is actually no bias there if you just focus on a nonconvex formulation without the regularizer.
[00:48:40]
Then it depends on how you solve it. Before convergence, I do not really know how to do the debiasing. The theory only works once you have already reached the optimum of the convex program, so when you are operating in the middle of the iterations, I do not really know how to debias in the proper way.
[00:49:08]
It might be possible, but it would need more work. And I'm pretty sure that for the nonconvex one, and I'm looking at this program here, you usually would not need to add these regularization terms at all; if you just solve the unregularized version, the algorithm automatically debiases, actually.
[00:49:33]
I don't even think the regularizer is needed. I cannot prove that, but practically I don't think you ever need it: if you initialize properly, you can converge just fine. I don't think it's needed. Yeah. [Audience question about estimating the unknown parameters.] Can it be perfectly estimated? For this case it can be.
[00:50:09]
You can just plug it in: you use another estimator, plug it in, and that's fine. There are multiple very easy ways to estimate it, and all of them guarantee that, with high probability, the estimate is accurate up to a factor of 1 plus something on the order of one over the square root of n, so it's very accurate. Whenever you need it, you just plug it in, and
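One simple way to get the noise level σ that the interval needs is a residual-based plug-in, sketched below for a fully observed matrix: fit the rank-r approximation and divide the residual energy by the residual degrees of freedom (n₁ − r)(n₂ − r). This particular estimator is our assumption; the talk only says that several easy estimators are accurate enough.

```python
import numpy as np


def estimate_sigma(M, r):
    """Hypothetical residual-based noise estimate: sqrt(||M - M_r||_F^2 / ((n1 - r)(n2 - r)))."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_r = (U[:, :r] * s[:r]) @ Vt[:r]       # best rank-r approximation of M
    n1, n2 = M.shape
    return float(np.sqrt(np.sum((M - M_r) ** 2) / ((n1 - r) * (n2 - r))))


rng = np.random.default_rng(6)
n, sigma_true = 100, 0.5
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
v_star = rng.standard_normal(n)
v_star /= np.linalg.norm(v_star)
M = 50.0 * np.outer(u_star, v_star) + sigma_true * rng.standard_normal((n, n))

sigma_hat = estimate_sigma(M, r=1)
print(sigma_hat / sigma_true)               # close to 1 for this synthetic example
```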
[00:50:34]
So, yeah. Yeah, exactly. So my key point is that if you want a confidence interval, the interval depends on a few parameters; there are only a few of them, and each of them can be estimated very precisely. So if you only care about a 90 percent confidence interval, that suffices,
[00:50:59]
unless you need an extremely, extremely accurate confidence interval. Thanks.