[00:00:05]
>> We're fortunate to have Galen Reeves visiting from Duke University. So Galen and I have known each other for a long time; we met when we were both doing postdocs at Stanford. Before that, Galen did his undergrad at Cornell and his Ph.D. at Berkeley.
[00:00:27]
And I didn't know at that time that he'd done his undergrad at Cornell, or really anything about his background. I remember one of the first things I said to Galen — you remember this — was that I was applying for jobs, and I told him I was going to apply really broadly. I was sending out something like 60 applications, to everywhere that was hiring except for Cornell and UIUC, because I just couldn't imagine living in either of those places. And he said, yeah, that's where I've lived my whole life: Cornell and Illinois.
[00:01:05]
Fortunately he didn't hold it against me. So today Galen is going to talk about community detection. I'll just go ahead — all right, does this work here? All right. >> Thanks very much for inviting me here. I'm happy to be here, and I'm happy to see that Mark is at neither Cornell nor UIUC.
[00:01:34]
So this is the machine learning seminar. Since I've been here I've talked to a lot of people who are interested in high-dimensional inference problems involving matrix factorization and also random networks, so what we're going to talk about is something in that vein.
[00:02:20]
All right, let me start by motivating this talk. I think there's a spectrum of levels at which we do research, and this is true broadly, whatever the topic is. Not to caricature areas too much, but the idea is that network inference has a massive community of people working on it in the social sciences, trying to find interesting structures in networks — biological networks, social networks — and there's a long history of people using a lot of different algorithms. At the same time, the more theoretical areas, say theoretical computer science —
[00:03:02]
People there like to study specific models, and they look at these models as exemplars of problems where there may be large gaps between what you can do computationally efficiently and what you could do with unbounded computational power. So people are very interested in basic, fundamental questions about computational hardness, and they study specific problems. Some of these problems are motivated by network inference: community detection — the stochastic block model is one of them, planted clique is another. As I said, what people look at are thresholds —
[00:03:35]
specific types of scaling limits where you see different behaviors for different algorithms. Now, somewhere in the middle of this is maybe more statistical modeling, where people try to use mathematically motivated techniques to work on real-world problems — so I'd put that somewhere in the middle. These methods are a bit more practically motivated than the hardcore theory, but they lack theoretical guarantees. The goal of today's talk is to take some of the insights from the far right of this picture and move a little bit to the left, and in doing so help understand some of what makes this community detection problem interesting and difficult.
[00:04:21]
So to recap the goals before I get into it: we'll look at a broader class of network models than has been studied previously, and instead of summarizing performance by a single number, we're going to consider a multivariate performance measure that lets us study asymptotic performance and think about some of these computational tradeoffs.
[00:04:43]
So the setting we're going to look at is what we'll call the Bayes-optimal setting: we're given a joint distribution on the objects of interest. One of these objects is the adjacency matrix of a simple graph — a large, symmetric binary matrix with a one in an entry if the corresponding nodes have an edge between them. The other object of interest is the label of each vertex. Given this joint distribution, the basic question is: how much can we learn about the labels upon observing the graph? Of course, if the graph is independent of the labels — the edges are independent of the labels — nothing can be learned, but in general the graph might contain some useful information. Now, if you're an information theorist, you say: of course, we look at mutual information. But if you're not an information theorist, you might ask: why the heck do you care about mutual information? That's just some quantity measuring dependence between distributions. What I really want to know is: if I had some estimator, how well would it perform, and can I do accurate estimation efficiently?
[00:05:46]
So what we're going to study is the optimal performance in this setting, emphasizing that all the distributions are known. The first big example of this — I imagine many people have seen it before — is the symmetric stochastic block model. What I'm showing you here is the adjacency matrix of a network that has three similar-sized communities.
[00:06:14]
OK, so the idea is that people in the same community are more likely to have an edge, and people in different communities are less likely. The only thing is, I haven't told you who's in which community, and your job is to permute the rows and columns so as to find a clustering, or partition, of this network into its community structure. I've given this talk a few times and nobody has yet successfully done it from the slide in the time I've allowed — there are about 1000 nodes here, so the number of possible permutations, 1000 factorial, is quite massive. Since I generated this, I know the ground truth, and we can actually sort the rows and columns to reveal the community structure. This is called the block model because we see these blocks.

The main takeaway is that this model of community structure is basically three numbers: how many communities are there (three here), what is the probability of a connection for people in the same community, and what is the probability of a connection for people in different communities — and that summarizes everything. In this one you're more likely to have connections within the same community. As Mark mentioned — and this is an incomplete list — there's been a huge amount of work in the last 10 years studying the theoretical limits of this inference problem, a lot of it motivated by people using tools from statistical physics to get exact, if heuristic, conjectures for these performance limits; ensuing work has developed this in a number of ways.

What I'll describe now is how to parameterize the symmetric stochastic block model so that I can state these results. The parameterization I'll use is slightly different from what's been used previously, but it's useful for my results. The first ingredient is the vertex labels — which community are you in — and in principle these could be anything: you could be in community apples or community oranges.
[00:08:18]
Without loss of generality, it's going to be useful to represent these as vectors. In particular, if there are k communities, I'm going to embed them in (k-1)-dimensional space in such a way that they're all equidistant from each other. So if I have a two-community model, it's just the points plus one and minus one; for three communities, it's an equilateral triangle in two dimensions. Also for convenience — after all, these are just labels — I'm going to normalize them so they have zero mean and unit covariance. Now, the useful property of this representation is that if I look at the inner product between two labels, I get one number (a large one) if they're in the same community and another number if they're in different communities. So I'm leveraging this vector representation of the labels and the inner product to tell you whether two nodes are in the same or different communities. That's how I'll represent the communities; now let me say how I generate the graph conditional on the labels. The idea is that each entry is drawn independently —
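Concretely, this equidistant embedding can be built from the eigenvectors of the Gram matrix of centered one-hot labels. Here is a minimal sketch of that construction (an illustration, not the speaker's actual code):

```python
import numpy as np

def simplex_labels(k):
    """Embed k community labels as the vertices of a regular simplex in
    R^(k-1), normalized so that under the uniform label distribution the
    embedded labels have zero mean and identity covariance."""
    # Gram matrix of centered one-hot labels: I - J/k, eigenvalues {0, 1}
    gram = np.eye(k) - 1.0 / k
    w, v = np.linalg.eigh(gram)        # ascending order: zero eigenvalue first
    coords = np.sqrt(k) * v[:, 1:]     # k points in (k-1) dimensions
    return coords
```

For k = 3 this gives an equilateral triangle in the plane; the same-community inner product is k - 1 and the different-community inner product is -1, which is exactly the "one number if same, another if different" property described above.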
[00:09:25]
— according to a Bernoulli distribution, and as I said there are only two numbers we have to control: the probability of an edge if you're in different communities, and the probability of an edge if you're in the same community. We've normalized — parameterized — this so that there are two numbers: d, the average degree per node, and r.
[00:09:46]
So d is the overall density of the network, and the other number, r, quantifies the difference in connection probabilities within and between communities. If r is 0 there's no structure — it's just an independent random graph. If r is positive, it's called assortative: you're more likely to have connections with people in your own community. If r is negative, it's called disassortative: you're more likely to have friends outside your community. So here is a rather remarkable result, I think, due to Deshpande et al.: they characterized the asymptotic mutual information in the two-community stochastic block model.
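To make this parameterization concrete, here is a minimal sketch of sampling such a network. It assumes one simple convention — a single offset `delta` (a stand-in for the talk's signal parameter r, up to rescaling) splitting the within- and between-community edge probabilities around average degree d — which is an illustrative choice, not necessarily the talk's exact normalization:

```python
import numpy as np

def sample_sbm(n, k, d, delta, rng=None):
    """Sample a symmetric SBM adjacency matrix with k equal-probability
    communities, average degree ~d, and signal strength delta (assumed
    illustrative parameterization)."""
    rng = np.random.default_rng(rng)
    labels = rng.integers(0, k, size=n)
    p_in = (d + (k - 1) * delta) / n     # within-community edge probability
    p_out = (d - delta) / n              # between-community edge probability
    same = labels[:, None] == labels[None, :]
    p = np.where(same, p_in, p_out)
    upper = np.triu(rng.random((n, n)) < p, 1)   # independent edges, no self-loops
    A = (upper | upper.T).astype(int)            # symmetrize
    return A, labels
```

With equal-probability communities the expected degree of every node is d regardless of delta, which is the degree-balance property discussed later in the talk.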
[00:10:28]
So here the mutual information is divided by the number of nodes — that's the per-node mutual information — and the x-axis is the signal strength. Remember, at r equals 0 there's no information, no dependence. There are two trivial upper bounds (you can get both of these bounds easily), and what they showed is that this asymptotic limit is actually equal to the first upper bound until a magic point — signal strength equal to one in this normalization — after which it deviates from the upper bound and eventually asymptotes to log 2. Log 2 is exactly the entropy of each node's label, because there are two communities here, each equally likely. So mutual information is great, but what's the significance? The significance is that this exact characterization of the limiting mutual information also implies optimal performance measures for estimating the unknown labels. Here what I'm plotting is what I call the MMSE of pairwise interactions — sometimes also referred to as the matrix MMSE. You're estimating the outer product of the labels, and this MMSE refers to a Bayes-optimal estimator: under squared loss, the conditional expectation is the optimal estimate. It may not be computationally tractable in practice; nonetheless, the point where the mutual information deviates from the upper bound is precisely the point where this performance measure deviates from the trivial value of one — one being what you would get with no information whatsoever about the graph. To do strictly better than random guessing, you need the signal strength to exceed one, so this point here, where you start doing strictly better, is called the weak recovery threshold, and that's what a lot of the theory has developed around.
[00:12:14]
The setting here, I should say, is asymptotic: n is going to infinity, and the average degree is also going to infinity. In the case where the average degree does not go to infinity, there's also been a lot of work focusing exactly on the weak recovery threshold, but there the exact performance curve is not known.
[00:12:32]
Myself, I'm very pleased when I see such nice results, but then I also wonder: where does this come from? Where did these blue curves come from — how do I know how to draw them? The answer is that in order to draw those blue curves, you have to study a different estimation problem. I'll call that estimation problem the signal-plus-noise model, and it is much easier. It simply says: imagine you observe a noisy version of each node's label under additive Gaussian noise, and parameterize the Gaussian noise by the inverse noise variance s, which I call the signal-to-noise ratio parameter. Keep in mind that the significance of this model depends on my representation of the labels — I chose the labels ahead of time in a way that makes this meaningful. So this is a picture for, say, three communities, and I'm emphasizing here that the signal-plus-noise model is a Gaussian mixture model where the label vectors are the possible means, and on top of that you have some —
[00:13:33]
— isotropic Gaussian noise, and the SNR s says whether it is easy to separate out the clusters or not. From this signal-plus-noise problem we're going to define two functions, which I call the I and M functions: they are simply the mutual information and the MMSE corresponding to this model. Because the entries of X are i.i.d. and the observations are conditionally independent, the mutual information between all the labels and all the noisy observations is characterized by just a single pair. So the idea is that I only need to look at a single low-dimensional estimation problem — (k-1)-dimensional, or in this case one-dimensional.
[00:14:18]
And the other thing I want to emphasize is that these are not mystical quantities: they are low-dimensional, and I can approximate them numerically using Monte Carlo integration techniques. Think of this as an easy problem that I can get a good handle on — I can analyze it, I can take limits; anything you want to do, you can do with this problem. So let's go to the next step.
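For the two-community case, the M function can indeed be estimated by straightforward Monte Carlo. A minimal sketch, assuming the plus/minus-one label representation, where the Bayes estimate for a binary symmetric input under Gaussian noise is the well-known tanh rule:

```python
import numpy as np

def mmse_two_community(s, n_samples=200_000, rng=0):
    """Monte Carlo estimate of the scalar M function for two communities:
    labels X = +/-1 observed as Y = sqrt(s)*X + Z, Z ~ N(0,1).
    The Bayes estimate is E[X|Y] = tanh(sqrt(s)*Y). A sketch, not the
    speaker's code."""
    rng = np.random.default_rng(rng)
    x = rng.choice([-1.0, 1.0], size=n_samples)
    y = np.sqrt(s) * x + rng.standard_normal(n_samples)
    xhat = np.tanh(np.sqrt(s) * y)              # posterior mean estimator
    return np.mean((x - xhat) ** 2)
```

At s = 0 the estimator is constant and the MMSE equals the prior variance (one); as s grows, the clusters separate and the MMSE decays toward zero.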
[00:14:39]
So now I can state more formally the result of Deshpande et al., which says: consider the signal-plus-noise problem, and here is a particular potential function; find its global minimizer — the value of s that minimizes this function, which depends on the signal-plus-noise model as well as the parameter r. Then the MMSE of the pairwise interactions is given by the quantity on the right, which depends only on that minimizer. That's where the blue curve came from: you solve an optimization problem numerically to find the minimizer, plug it in, and that tells you exactly what the matrix MMSE will be.
[00:15:19]
Now, using this you can also obtain bounds on other performance metrics, and one of these is the ability to estimate the labels themselves — a slightly different metric. In this metric there is a minimization, a somewhat nonstandard one, over the possible relabelings of the labels. In the symmetric problem there is an inherent ambiguity: even if you could correctly divide the nodes into two subsets, you would never know which one was which — which was apples and which was oranges. That's what this minimization resolves.
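The relabeling ambiguity described above is resolved in practice by scoring an estimated labeling against the truth maximized over all k! permutations of community names. A small sketch (a hypothetical helper for illustration; brute force, so only sensible for small k):

```python
import itertools
import numpy as np

def best_overlap(true_labels, est_labels, k):
    """Fraction of correctly labeled nodes, maximized over all k!
    relabelings of the estimate, since community identities are only
    defined up to permutation."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array(perm)[est_labels]   # rename estimated communities
        best = max(best, float(np.mean(mapped == true_labels)))
    return best
```

For example, an estimate that splits the nodes perfectly but swaps the two community names still scores a perfect overlap of 1.0.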
[00:15:52]
And here again we see this is characterized by that little M function, which again was the MMSE in the signal-plus-noise model. So the takeaway is that studying the easy signal-plus-noise problem gives you asymptotic limits for this much more complicated stochastic block model recovery problem. That was, I think, 2015, when their first paper came out. Now I'm going to talk about —
[00:16:20]
— my results. The starting point is to say: well, real-world networks don't always look like the symmetric stochastic block model. We don't always have the same size of communities, and we don't always have the same interactions within and between them. So here's a little degree-balanced network — again, try your best to find the communities.
[00:16:42]
Here they are, and you can see that now we have different sizes and different densities, so this is a more complicated model. The key property here, degree balance, simply says that for any given node, the expected number of edges provides no information about its community membership. What that means is that if you look at this adjacency matrix, on average every row has exactly the same density, the same number of edges. Degree balance is "hard" in a sense: if you're not degree balanced, it turns out to be very easy to detect communities just by looking at degrees — you have a lot of friends if you're in the friendly community. With degree balance you have to look at other properties; the degree statistics themselves, the edge counts themselves, are unlikely to lead to good recovery.

So how do we parameterize this degree-balanced setting? I'm going to follow a similar approach as before: I embed the community labels into (k-1)-dimensional Euclidean space, but I normalize them so they have mean 0 and unit covariance. What that means — the labels on the left are for a symmetric model, the ones on the right for an asymmetric one — is that the labels that occur more rarely are simply further from the origin. So again, it's a straightforward mapping from the probabilities, the sizes of the different communities, to a representation of the labels. The next thing to do is quantify the interactions between different communities, and this can be parameterized as follows. Similar to before, d is just the average degree — it's a degree-balanced network, so everyone has the same expected degree. The difference is that instead of a single parameter r quantifying the strength, there is a symmetric matrix R, and this matrix quantifies the strength of interaction between each pair of communities. So for example, if R is 0 there's no dependence.
[00:18:45]
There's nothing you can do. If R is positive definite, that's assortative structure: every community is more likely to have friends within itself. If R is negative definite, it's disassortative — the other way around. But in general you could have both assortative and disassortative structure together: an indefinite R.
[00:19:08]
And I think one of the main points is that in studying these problems, once you go beyond the highly symmetric case, if you want to assess performance you can't just use a single number — you can't just say "here's my average accuracy" — because a single number is not going to characterize the problem. So we introduce something we call the MMSE matrix. This is the —
[00:19:31]
— (k-1)-by-(k-1) positive semidefinite matrix that is the expected conditional covariance of the labels given the graph: the conditional covariance matrix on the right, averaged over the graph G, and then averaged over each of the nodes. To give a feel for this:
[00:19:54]
recall that I chose to normalize these labels so that they have zero mean and identity covariance. So just by the data processing inequality, this matrix lives somewhere between 0 and the identity matrix. Moreover, if I take the trace of this matrix, that corresponds to the more usual quantity, which is just the mean squared error in estimating the labels. The advantage of working with the matrix, though, is that its different components tell you about the ability to estimate in different directions: it may be that some community memberships are easy to differentiate while others are not.
[00:20:36]
So to state our formulas, we return to this signal-plus-noise problem, but unlike before, we can't just have a single SNR parameter. Now you should think of the signal-plus-noise model with what we'll call a matrix SNR: a positive semidefinite matrix S that effectively controls the covariance of our point clouds.
[00:20:59]
So again, as before, I define these mutual information and MMSE functions. The difference now is that their argument is not a scalar but a positive semidefinite matrix, and in the case of the M function the output is also a matrix, not a scalar. But again, these can be computed numerically as long as the number of communities is not too large. So now for the more interesting part. With these definitions in hand, I can state a result, which simply says: let S be a global minimizer of this potential function. The potential function has two ingredients: the first is the information from the signal-plus-noise model, and the second depends only on that R matrix, which as I said quantifies the interaction strengths. Then we have an upper bound on this MMSE matrix.
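The matrix-valued M function described here can likewise be approximated by Monte Carlo when the number of communities is small. A sketch under the Gaussian-mixture signal-plus-noise model above (the function name and normalization are illustrative assumptions, not the speaker's code):

```python
import numpy as np

def mmse_matrix(coords, probs, S, n_samples=100_000, rng=0):
    """Monte Carlo estimate of the matrix M function: a label x_j in
    R^(k-1) is drawn with prior probs and observed as y = sqrt(S) x + z,
    z ~ N(0, I). Returns the (k-1) x (k-1) MMSE matrix."""
    rng = np.random.default_rng(rng)
    probs = np.asarray(probs, dtype=float)
    k, dim = coords.shape
    w, V = np.linalg.eigh(S)                     # symmetric square root of S
    sqrtS = V @ np.diag(np.sqrt(w)) @ V.T
    idx = rng.choice(k, size=n_samples, p=probs)
    x = coords[idx]
    y = x @ sqrtS + rng.standard_normal((n_samples, dim))
    # posterior over labels: log p(j|y) = log p_j - ||y - sqrt(S) x_j||^2 / 2 + const
    diff = y[:, None, :] - (coords @ sqrtS)[None, :, :]
    logpost = np.log(probs)[None, :] - 0.5 * (diff ** 2).sum(axis=-1)
    logpost -= logpost.max(axis=1, keepdims=True)
    post = np.exp(logpost)
    post /= post.sum(axis=1, keepdims=True)
    xhat = post @ coords                         # Bayes estimate E[x | y]
    err = x - xhat
    return err.T @ err / n_samples
```

With the normalized labels, the output sits between 0 and the identity in the positive semidefinite order: at S = 0 it equals the prior covariance (the identity), and it shrinks as the matrix SNR grows.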
[00:21:50]
The upper bound is in the positive semidefinite order, and it holds in the limit where n becomes large and d grows — n goes to infinity and d grows at some arbitrarily slow rate. OK, so what can we do? Once you have formulas, you can make pretty pictures and compare them to what actually happens when you run an algorithm. I'm talking here about the optimal performance — I have an upper bound on the fundamental performance — in a problem where it's not clear it's tractable to compute the optimal estimate. There's tons of work on spectral methods, semidefinite programming —
[00:22:33]
— and sampling methods, but for comparison here what we used was belief propagation. Belief propagation is an algorithm that, when applied to these stochastic block models, works well in the low-degree setting. Keep in mind our theory is for the average degree diverging, while BP works when the average degree isn't too big, so we picked an average degree of 30 as a sweet spot. The red heat-map values are what we get from running BP, so you should think of BP as an upper bound on the optimal performance.
[00:23:06]
In some cases one might conjecture that BP is optimal, but not always. The black lines are the contour lines predicted by our theoretical asymptotic upper bound, and qualitatively they match up pretty well, which I think is what we expect; the quality of the agreement should improve as we scale d up and get a better feel for what the optimal performance is. On the left is the three-community stochastic block model — actually not only the symmetric one, interestingly enough. These x- and y-axes are the eigenvalues of this R matrix, which quantify the strength in different directions. On the diagonal of the left plot, where both eigenvalues are equal, that is the three-community symmetric stochastic block model; off the diagonal, in the asymmetric cases, you have more juice in one direction than another. On the right we have a similar plot but with —
[00:24:06]
— non-uniform community assignments. One of the things I want to point out — and this has been observed before — is that when you move from a symmetric to a non-symmetric case, you can see differences in behavior. One thing that happens in the symmetric stochastic block model is that the weak recovery threshold is achievable by an efficient algorithm; in other words, as far as detection is concerned, there is no gap between the fundamental and the computational limits. However, as you go into the asymmetric cases, you find regions where there is a large gap. In particular, that's the difference between these lines: the dashed blue line corresponds to the computational limits — anywhere outside the dashed blue square, weak recovery is possible efficiently; inside it, weak recovery is not possible, at least using any currently known efficient method. By contrast, the dark blue line says that weak recovery is possible using an optimal method. So in the asymmetric setting, this is to say, we have a computational-statistical gap.
[00:25:20]
Any questions at this point? So those were the pretty pictures motivating some of the insight. In the rest of the talk I'm going to discuss the proofs and some of the key techniques used in finding the mappings between these problems. As I said, I'm not going to give you the full proof of everything; I want to give you the key steps, in particular the pieces that I think are interesting or could be useful more broadly for other problems. Even though I've only been here one day, I've talked to many people working on related problems, and my hope is that this might be useful.
[00:26:04]
Perhaps one of the key ideas used in thinking about this problem is the connection between mutual information and MMSE. So let's go back to that signal-plus-noise model, but represented here in matrix notation. Before, I was saying: think of each row of X as observed under Gaussian noise. Here I represent it with X — remember, my matrix of labels with n rows, a very tall, skinny matrix — multiplied on the right by the square root of my SNR matrix, plus i.i.d. Gaussian noise. It's just a matrix representation of the exact same problem. A basic fact, if you will, is that you can represent the MMSE matrix as a gradient of the mutual information: here is the mutual information between X on one side and your graph together with this additional side information from the signal-plus-noise model on the other, and the gradient is taken with respect to the SNR matrix. In the special case where everything is scalar, this is the result known as the I-MMSE relation, which has been well developed; it's also been developed in the context of —
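For reference, the scalar identity being generalized here is the classical I-MMSE relation of Guo, Shamai, and Verdú: for an observation $Y = \sqrt{s}\,X + Z$ with $Z \sim \mathcal{N}(0,1)$ independent of $X$,

```latex
\frac{\mathrm{d}}{\mathrm{d}s}\, I\big(X;\ \sqrt{s}\,X+Z\big)
\;=\; \tfrac{1}{2}\,\mathrm{mmse}(s),
\qquad
\mathrm{mmse}(s) \;=\; \mathbb{E}\!\left[\big(X-\mathbb{E}[X\mid \sqrt{s}\,X+Z]\big)^{2}\right].
```

The matrix version the speaker describes replaces the scalar derivative by a gradient with respect to the SNR matrix S, so that (up to the same normalization) the gradient of the mutual information recovers the MMSE matrix.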
[00:27:19]
— various gradients with respect to different parameters, in a number of different ways. This is a bit of a generalization that appeared in some work I did a year ago, where we applied it to these matrix SNRs. One thing to emphasize is that this relationship is very general: it actually holds for any joint distribution on X and G. There's nothing special about the fact that G happens to be a graph; it's really just a property of gradients under additive Gaussian noise. So what does that tell us? Given this basic property, all you really need is to compute the limiting mutual information between your graph and your labels under the addition of the signal-plus-noise observations, parameterized by the matrix SNR. If I can compute this limit, then I can use its gradients to get the estimation-theoretic information. And that's exactly our approach: we've reduced the problem to computing the limiting mutual information under an augmented model. I should mention this is different from the approaches taken before: the approach by Deshpande et al. goes through multiple routes — first obtaining bounds in the matrix estimation problem, then another interpolation with an erasure channel to get back to performance metrics. So this is already, I'd say, a key difference at the beginning that I think simplifies the analysis.

The next step is a comparison from the stochastic block model to a matrix estimation problem. This type of comparison is not new, in the sense that it's been a key step in some of the earlier works. Think about the graph this way: we're taking an n-by-n matrix of edge probabilities and drawing entries conditionally independently from Bernoulli distributions to generate our matrix observation. A different model would be: what if we just observed the Bernoulli means themselves under additive Gaussian noise? The statement of channel universality is that under the proper scaling regimes — which in this case means the average degree is growing at some rate — these two models have the same mutual information to leading order.
[00:29:35]
So what this tells us is that instead of worrying about the details of the stochastic block model, it's sufficient to focus only on the matrix estimation problem — or, if you just like matrix estimation problems, you're also studying the stochastic block model. And by the way, this normalization with the funny one-minus-square-root term is just chosen so that everything matches up with the parameterization.
[00:30:03]
As I mentioned, this particular result is a bit more general than some of the previous ones, which only worked for rank-one matrices. OK, so now we've simplified the problem: the goal is to find the mutual information in estimating a matrix from Gaussian noise observations. We have our two observations: one directly on the matrix of labels X, and one that comes from this matrix-squared version, X times R times X transpose.
[00:30:42]
To make the connection between these, I introduce an interpolation function. This interpolating function is simply the mutual information between X and this pair of observations, parameterized by two different SNR terms: the first is a positive semidefinite matrix — that matrix SNR — and the second is just a scalar, t. What does this interpolating function do? If I evaluate it at t equals 0, there is none of the matrix observation and I just have my signal-plus-noise problem; this problem is easy, and this is the one I can analyze. Whereas if I evaluate it at t equals 1, that's exactly the problem I'm trying to understand — the one whose asymptotic mutual information I need. So I need some way of comparing this function evaluated at t equals 0 to t equals 1.
[00:31:39]
Now, I should note that introducing interpolating functions is a standard idea; it's used a lot in the statistical physics literature. One thing you might then do is pick a path for how to interpolate from t equals 0 to t equals 1 — you might also change S at the same time — and then a lot of the challenge is to look at the gradients along that interpolating path and either cancel things out or quantify the difference. As I said, there's a huge body of work dedicated to that. What I'm going to show is a relatively simple trick, or technique, that gives us a one-sided inequality and bypasses a lot of the nastiness.
[00:32:23]
The technique is to look at the gradients of this function with respect to these SNR parameters. By the I-MMSE relation, the gradient with respect to S is our MMSE matrix; and it turns out the gradient with respect to t is exactly the MMSE of the matrix version, X R X transpose. Looking at these, it's not too hard to show that there is a basic inequality between the gradients. Keep in mind that one of them is a scalar and the other is a matrix, but you can say that the second gradient is upper bounded by a function of the first — which is to say, somehow, that estimating the square of a matrix is easier than estimating the original matrix. One thing to emphasize is that this function is not too important except that it depends only on second moments: it's a very simple, fixed function G that doesn't depend at all on the other particulars of the problem. All right, so the goal now is to turn this basic inequality between estimation quantities — between the gradients — into an information inequality, and the way we do that is via duality.
[00:33:43]
The move is to construct a conjugate function to this mutual information. And if you're worried about this: the mutual information is concave and has lots and lots of nice properties, so the function I've defined here is convex, and the dual variable U corresponds to the MMSE. You can all think of it as a Legendre transform. So basically, if you ask what value of S achieves the supremum, it's exactly given by a fixed point involving the gradient with respect to S. The significance of this is that it's easier to work with this conjugate function; that is, I'm going to take the gradient with respect to T of my conjugate function.
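A toy scalar stand-in may help; the function `f` below is illustrative, not the actual mutual information of the model. Take a concave curve and form its concave conjugate; the maximizer is characterized by a gradient fixed point, mirroring the statement in the talk:

```python
import numpy as np

# Concave conjugate of a toy "mutual information" curve f(t) = 0.5*log(1 + t):
#   F(s) = sup_t [ f(t) - s*t ].
# The maximizer t* satisfies the fixed-point equation f'(t*) = s, which is the
# scalar analogue of the supremum being achieved at a gradient fixed point.

def f(t):
    return 0.5 * np.log1p(t)

s = 0.2
ts = np.linspace(0.0, 20.0, 200001)            # grid search for the supremum
t_star_numeric = ts[np.argmax(f(ts) - s * ts)]
t_star_exact = 0.5 / s - 1.0                   # solves f'(t) = 0.5/(1+t) = s
print(abs(t_star_numeric - t_star_exact) < 1e-3)  # prints True
```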
[00:34:24]
These three lines: the first one is simply the envelope theorem, which says what happens when you take a gradient of a supremum. The second one is the estimation inequality: this is the gradient with respect to T, which I said is upper bounded by the first gradient stuck inside the G function. And the third one is simply recognizing that the gradient with respect to S at the maximizer is, by definition, U. So the estimation inequality has turned into an information inequality in the conjugate space, and I integrate this to get a simple bound on the conjugate function. The left-hand side here is the complicated one: it has a one here, which means it involves the graph. On the right-hand side we replace the graph by zero, so it involves only the signal-plus-noise model, with no graph, plus my simple function G.
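A schematic version of the three lines, in my notation rather than the speaker's slides:

```latex
% Define the conjugate  F(t, u) = \sup_{s} \{ I(t, s) - \langle u, s \rangle \},
% with maximizer s^{*}(t, u). Then:
%   \partial_t F(t, u) = \partial_t I(t, s^{*})              % envelope theorem
%                     \le G\bigl(\nabla_s I(t, s^{*})\bigr)  % estimation inequality
%                     = G(u)                                 % first-order optimality:
%                                                            % \nabla_s I(t, s^{*}) = u
% Integrating in t over [0, 1]:
%   F(1, u) \le F(0, u) + G(u),
% i.e. the graph-model conjugate is bounded by the signal-plus-noise conjugate
% plus the simple fixed function G.
```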
[00:35:18]
So now all the hard stuff is done; the rest is just putting things back together. I rewrite my original function of interest in terms of its conjugate, stick in the inequality that I just showed on the previous page, use a few more tricks to rewrite G in terms of its conjugate, and swap the order of a minimum. Nothing here is conceptually very deep, and you arrive at a nice, simple variational expression.
[00:35:50]
The first steps of the derivation I did here actually hold generally, for any choice of X and G. If I specialize to the case where the X's are i.i.d., with the normalization I described before, then we see that the first term here is the mutual information of the signal-plus-noise model, which I said is easy to compute, and the second one is a nice function that depends only on the parameters delta and R. So the point is, we've gone from something complicated at the beginning, with all its dependence structure, to something computable. To put it all together, what have we done? Just these steps: we've now obtained an upper bound on the asymptotic mutual information of interest in terms of this kind of variational expression, involving a potential function of things we can compute.
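Evaluating such a bound is a one-dimensional minimization of a computable potential. The specific potential `Phi` below is hypothetical, as are the parameter names `delta` and `r`; it only illustrates the shape of the computation, a channel term plus a coupling penalty:

```python
import numpy as np

# Schematic evaluation of a bound of the form  I_limit <= min_q Phi(q).
# Phi here is a made-up stand-in: the scalar signal-plus-noise mutual
# information plus a quadratic coupling penalty.

def gaussian_mi(snr):
    # mutual information of a scalar Gaussian channel with Gaussian input
    return 0.5 * np.log1p(snr)

def potential(q, delta=2.0, r=0.5):
    # hypothetical potential: channel term plus quadratic coupling penalty
    return gaussian_mi(delta * q) + 0.25 * delta * (q - r) ** 2

qs = np.linspace(0.0, 1.0, 10001)
upper_bound = min(potential(q) for q in qs)
print(upper_bound >= 0.0)  # prints True
```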
[00:36:42]
Upper bounds are great, but it would be really nice if this were an equality, so you might say, well, can you make this tight? And the truth is that showing the other side of this inequality is a lot, lot harder; it's where all the real hard work is done. Everything I just showed you was a simple trick; proving the lower bound requires work. So what we currently do is leverage a result by Lelarge and Miolane, who looked at the special case where there is no side information and showed that there is equality there; that corresponds to the case S = 0.
[00:37:18]
And so, having exact values at S = 0, where we can evaluate exactly, and an upper bound for general S, what that allows us to do is get an upper bound on the gradient with respect to S. If this inequality were tight and we had a lower bound everywhere, then we would have the corresponding lower bound on the gradient as well, and that is how we then get the bound on the MMSE. So that's more or less what I wanted to show. Maybe just some concluding points: note that we started here with a broad class of network models, and I think many of the techniques we've developed, especially the channel universality, could be applied more generally to other problems and other observation models, such as partially observed matrix factorization.
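The reason equality at a single point controls the gradient is elementary calculus; schematically, in scalar notation:

```latex
% Suppose  I(s) \le U(s)  for all  s \ge 0,  with  I(0) = U(0)
% (equality at s = 0, from the Lelarge--Miolane result). Then for every s > 0,
%   \frac{I(s) - I(0)}{s} \le \frac{U(s) - U(0)}{s},
% and letting  s \downarrow 0  gives  I'(0^{+}) \le U'(0^{+}).
% By the I-MMSE relation the gradient in s is (up to a constant) the MMSE,
% so this one-sided bound on the gradient is exactly a bound on the MMSE;
% a matching lower bound everywhere would give the reverse inequality by
% the same argument.
```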
[00:38:05]
There are also other extensions: what if you observe multiple networks, or what if you observe covariate information along with the nodes? One of the strengths of these types of interpolation techniques is that they work very well because they're very general: they handle lots of different kinds of information, multiple sources of information, and you can also work with compositions of models. So one problem I didn't talk about uses similar techniques for multilayer networks, and there the composition property is very important: by studying the individual components of the network, you study the full
[00:38:36]
behavior of the entire network together. This would then also let us understand where these computational gaps occur, which is something people are very interested in. And again, if you're interested in this, there are our papers on it. So with that, I think I will conclude.
[00:39:00]
Thank you. Thanks.