[00:00:05] >> So Garrett was a post-doc with Yang Dan at Berkeley. Yang's lab and my lab had very similar perspectives at that time, and Yang had also done a lot of natural images stuff — kind of coming at it from an independent perspective, but we were both doing a lot of natural images. And actually I think his very first neuroscience paper was the decoding paper, so if you want to see one of the first awesome vision decoding papers that was ever published, it was Garrett's 1999 paper. That's a really cool paper, and — since he's doing somatosensory stuff now — he probably won't actually say anything about that paper. If you teach the course, I don't know, do you ever mention that? [00:00:46] So look at the paper; it's a really good paper, and basically all of the visual decoding stuff that anybody does kind of builds on that paper — a foundational paper. Thanks very much for having me. I think it's wonderful coming to engineering neuroscience groups; I think neuroscience desperately needs engineers, and I'm really happy that Georgia Tech is building up in this area. Before I get to the meat of the issue: you may have noticed, if you happened to read the abstract, that it seemed kind of schizophrenic, and that's because I thought, there are a bunch of neurophysiology people here and a bunch of MRI people here, and I was trying to figure out how to make both groups happy. So I decided I'm just going to give two talks: the first half of the talk is going to be about neurophysiology, and the second half will be about MRI, but hopefully you will see that although these are two very different approaches in two different primate species, there's a common theme. I'll begin the talk with that common theme, which is really about reverse engineering the brain. So I'm going to bore you a bit with my personal philosophy, and it builds on this observation: you can think about neuroscience not so much as science but as a reverse engineering problem. In engineering, usually we have a model — some mathematical, theoretical model with well-characterized limits — and we try to build a machine that matches that model as closely as possible. [00:02:07] In reverse engineering, we have a machine that somebody gave us — we have no control over it — and we try to build a model that describes that machine as closely as possible. That's reverse engineering, and both directions have the same kinds of components: you have some sort of device; you have a measurement apparatus that's going to measure the device, to make sure it's in spec or to figure out how it works; and you've got some sort of modeling framework. But the ways these three factors get mixed together in these two approaches are different, and the constraints in the two directions are very different. In particular, in reverse engineering our limitation is usually in the amount of data we can gather to characterize the system, and in neuroscience that's definitely true: the real problem we face today in neuroscience is measurement — it's all measurement limited. Now, if we could record from every neuron in the brain for days and days and days, we would have a different problem, which is that we would not know how to model that data. But that would be a really good problem to have, and we're not even at that problem yet, because we can't even get the data we need to model this complicated system that we have in the brain.
[00:03:14] So I tend to think of all of this — the reverse engineering problem — as an optimization problem. I've got a complicated system I'm trying to model; I want my model to predict how the system will respond under new conditions; I want it to generalize; I want my models to predict as accurately as possible. What data do I have to collect, given the constraints of the methods I'm allowed to use, and how much data do I have to collect, in order to build the best possible model? And that's not the usual perspective you go in with when you're designing experiments. [00:03:50] With this in mind, it's not a point-null hypothesis testing perspective; it's an optimization perspective. So everything in my lab I tend to think of as an optimization problem, and in particular I tend to think of this as trying to maximize the mutual information between [00:04:06] the source, which is my brain, and the destination, which is my computer. I want to get as much entropy from this brain over here into this computer, so that I can basically build a model of the brain on my computer. But the problem is that to go from the source to the destination I have to go through some channel, and this channel — just like in any information-theoretic system — is lossy and limited and noisy. In fact, all the measurement methods we have in neuroscience are basically that: they're all really bad, just bad in different ways. [00:04:39] So you guys have probably all seen this graph before; this is a description of the different methods we have available for neuroscience, with the spatial scale of the measurements and the temporal scale of the measurements on the axes. All of these measurement methods at the bottom are methods we're allowed to use in animals, and a lot of these are really good: they have really good spatial resolution, they have really good temporal resolution. Of course, the wonderful thing about the rodent models over the last ten years, and the revolution in measurement, is that we keep developing new methods for measuring more and more neurons from rodents over time. And then up here at the top we have all of the methods that are available for humans, and all of these methods have problems: some of them are fairly fast but have really low spatial resolution; some of them are fairly slow but have excellent spatial resolution for humans — but excellent spatial resolution for humans means 2 millimeters, [00:05:36] not, you know, a membrane. So all the measurement methods we have for humans are going to be bad, and we want to pick the measurement method that will essentially optimize our information flow through our experiment.
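To put a number on the channel metaphor — this is an aside using the textbook Gaussian-channel result, not a formula from the talk — the capacity of a noisy channel bounds the entropy per second any measurement method can move from brain to computer:

```latex
% Capacity of a band-limited Gaussian channel, in bits per second,
% where W is the measurement bandwidth and S/N the signal-to-noise ratio:
C \;=\; W \log_2\!\left(1 + \frac{S}{N}\right)
```

Every method on that spatial/temporal-scale graph trades bandwidth against signal-to-noise and coverage, which is why "pick the method that optimizes information flow" is a real optimization problem and not just a slogan.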
Then the other thing you have to think about, for solving this optimization problem, is what kind of experiment we're going to do and what kind of modeling framework we're going to use. As Garrett mentioned, I've always thought about this whole reverse engineering problem as a black box problem. If you remember — those of you who are engineers — you took some variation of a three-course sequence. The first course was linear systems: they told you, here's a black box, but it's linear; how do you characterize it? You get the impulse response, it does everything, great. Then they go, OK, next course, nonlinear systems: how do we deal with that? Well, you can kind of project into a space where, even though the system is nonlinear, as long as it stays well behaved you can still treat it with linear systems theory. And then you take the nonlinear dynamical systems course and they go: boom, we have no idea how to do this; let's pretend it's linear — yay, and everything works. So system identification is always about somehow linearizing your system so that you have some hope of getting something out, and that's a perfectly valid thing to do. So this is a way to think about this black box problem, the system identification problem, from a statistical point of view: I've got some X variables, which are my stimulus or my task — the task is also an X variable, right — and then I've got some Y variables, which are the data I measured, and I've got my black box, and I'm just going to model it using some regression model. This is a regression problem — a giant regression problem. I can either treat it from an inference perspective or an optimization perspective; in practice, optimization ends up being the best thing to do, because linear regression — or linearized regression — is a really great method that works really, really well. And so we collect some data for a fit set, and then we collect some data for a test set, and we just look to see how well our [00:07:36] model predicts on the test set, hopefully using new stimuli or a new task that we didn't use to fit the model. Eventually we're going to hit the noise ceiling, which is the best possible model we could ever build given the data we have, and then we're done — then we need to go get more [00:07:49] data. So, thinking about all of this from an optimization perspective means that we're going to design the experiment using a very, very different method than is normally used in biology and psychology. Normally, all of biology and psychology is built on point-null hypothesis testing. Point-null hypothesis testing you can think of as binary gradient descent, but it assumes that the local hillside you're on, in this state space of possible models, is the right hill. And my complaint is: given that even if we had every neuron in the brain we would not know how to build the model of the brain, it's unlikely that any hypothesis we have about the brain is even close to the optimal hypothesis. The problem is that if you push a point-null hypothesis paradigm really far, eventually it breaks — your null hypothesis cannot be rejected — and you have no idea what to do, because you've only mapped this local part of the gradient of the entire model space, the entire hypothesis space.
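The whole recipe — fit set, test set, cross-validated prediction — fits in a few lines. A minimal sketch with simulated data (ridge regression standing in for whatever linearized model you prefer; nothing here is the lab's actual code):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical system-identification setup: X is the stimulus (one row per
# time point), y is the black box's measured response plus noise.
n_fit, n_test, n_feat = 1000, 200, 50
X = rng.standard_normal((n_fit + n_test, n_feat))
true_w = rng.standard_normal(n_feat)
y = X @ true_w + rng.standard_normal(n_fit + n_test)  # lossy, noisy channel

# Fit on the fit set, evaluate on a completely separate test set.
model = Ridge(alpha=1.0).fit(X[:n_fit], y[:n_fit])
pred = model.predict(X[n_fit:])

# Prediction accuracy: correlation between predicted and observed responses.
r = np.corrcoef(pred, y[n_fit:])[0, 1]
print(f"cross-validated prediction correlation: {r:.2f}")
```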
[00:08:50] So I prefer a kind of data-driven method, where we sample as broadly as possible across the space, in order to sample as many hypotheses, and as many possible model forms, as we can, as rapidly as possible. And to do that we're going to select the method that maximizes the mutual information through the system per unit time, money, and graduate students — those really are the factors that limit science, right: time, money, and graduate students. So per unit time, money, and graduate students, we just want to get as much data — as much entropy — through the system as possible. We want to optimize our experiment to maximize information over the hypothesis space: not just one hypothesis, but all possible hypotheses that might be relevant — we want to gather data that spans all of them. We want to fit an explicit, quantitative, computational model that actually makes quantitative predictions of the dependent variable, whatever we measured. We want to make sure this is cross-validated, so that we have a fit set and a test set that are two separate things, so we're sure we're not overfitting. And we [00:09:49] always want to measure our prediction accuracy relative to the noise ceiling — the noise ceiling is set by the intrinsic noisiness of our measurement and how many data samples we have. And we want to test generalization of the model under naturalistic conditions. There was a giant war in the vision community, fought in the 1990s, about whether we should understand vision like television engineers and just use sine wave gratings for everything — that was the classic view — and then you had this crazy community of people who said, no, this is a nonlinear system; you have to measure it under the conditions in which it operates, because we're going to build a model that's not perfect, and you want at least the nonlinearities in that model to be the valid nonlinearities the system operates with every day. And after ten years of fighting, the community kind of concluded: OK, you can build your model however you want — you can build it with gratings, you can build it with natural images, you can build it with dots, it doesn't matter — but when you test it, you have to test it under naturalistic conditions, because the only way to know whether your model predicts a nonlinear system well is to measure that system, and predict it, in the situation where that system normally operates. [00:10:57] So I would encourage you guys to do that if you're not doing it already — certainly number 5; no matter what you do, I think number 5 is really, really important. I should mention that in biology and psychology almost nobody ever tests predictions — I would say 95 percent of people don't test predictions, and 99 percent of people don't test generalization — and that's because in this field [00:11:18] everybody does point-null hypothesis testing with a focus on statistical significance. OK, this is my little rant about how much I despise statistical significance. I taught a course for 15 years at Berkeley where I kept trying to discourage students from even thinking about statistical significance. Statistical significance is necessary but not sufficient for doing science. Statistical significance is just a really bad 1930s method for trying to get some sense of whether your data are random or not, and trying to move science forward by sorting non-random from random data sets is not a very good way to make progress. So I would encourage you: build predictive models.
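Since the noise ceiling keeps coming up: one common way to estimate it is from repeated presentations of the same stimulus. A hedged sketch — the exact convention differs across labs, so treat this as illustrative:

```python
import numpy as np

def split_half_ceiling(repeats, n_splits=100, seed=0):
    """Estimate a noise ceiling from repeated presentations of the same
    stimulus. `repeats`: array (n_repeats, n_timepoints). Correlate the
    mean response of one random half of the repeats with the mean of the
    other half; the average over random splits estimates response
    reliability, an upper bound on any model's prediction correlation.
    (One common convention; definitions vary across labs.)"""
    rng = np.random.default_rng(seed)
    n = repeats.shape[0]
    rs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        a = repeats[idx[: n // 2]].mean(axis=0)   # mean over one half
        b = repeats[idx[n // 2:]].mean(axis=0)    # mean over the other half
        rs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(rs))
```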
[00:11:55] All right, OK, so now I'm going to talk about physiology. How long do I have for that? ... 15 minutes, but I can talk over 15? Let me go until 12:15. [00:12:16] OK, I'll talk fast. So let's talk about neurophysiology — the rest of my talk will take 5 minutes — now, let's talk about neurophysiology. This is a brain; we've flattened it out here just so you can see all the little brain areas. Of course, as you all know, the macaque has a hierarchical visual system consisting of something probably on the order of 40 visual areas, connected in this very rich, complicated network. For many years — probably the first 20 or 30 years of my career — I focused on this little blue area here called area V4, which is an intermediate visual area: it's not primary visual cortex, V1, and it's not inferotemporal cortex, where you already have neurons selective for objects; it's somewhere in the middle. The reason I was focusing on that area was that I thought, well, it's not as boring as primary visual cortex, but it's not hopelessly complicated; maybe by the time I'm dead I'll have figured out what this visual area actually does. And the first part of this talk is: I think maybe I've kind of figured it out, and I'm not dead yet, so that's a success. All right, so what does V4 code for? Well, here are some classic experiments on V4. This is all data that I published, showing one V4 neuron on the left and one V4 neuron on the right, and this is just the firing rate to these different little stimuli called Cartesian gratings. You can see that this neuron really cares about these hyperbolic gratings — maybe it's interested in intersections, maybe it represents intersections — and this neuron cares about curvature and corners. My colleague from David Van Essen's lab went on later to show that these neurons can be selective for little corner elements of shapes — [00:13:44] in particular these little extruded points and pieces of curvature. So little local contour elements of shapes seem to be represented in this area, but after 30 years, I would say, of work on V4, we really have no V4 model. Wait — sorry, one more thing I should mention. There's not only neurophysiology data indicating that V4 represents intermediate shape features, whatever the heck they are; there's also some [00:14:12] indirect evidence from modeling using deep networks. So on the top here is a standard network — you've probably all seen this; this is, I think, the original Caffe network — which is just a hierarchical network trained on natural images, so it somehow embodies the statistical regularities of natural images too, otherwise it wouldn't be able to do much classification. And on the bottom is human data from an fMRI experiment.
We're looking at the back of the head, where the visual system is in humans, and we've colored each of the little locations in human occipital cortex according to the level of this Caffe network that best describes that location. You can see that the early areas in primary visual cortex are all described by very early layers of the network — conv1 and conv2 — and the highest levels of visual processing in the human, which are these areas you may have heard about if you go to psychology talks — the occipital place area, the extrastriate body area, the fusiform face area, things like that — are all best modeled by the very highest levels of the network, and middle areas like V4 are represented by the intermediate levels. So there's a lot of indirect evidence that this V4 area, just as you'd think based on the visual hierarchy, is representing features of intermediate complexity — but the question is: what are those? And the bottom line is, we do not have a good model for V4. This is the best model for V4 that's out there — it was published back in 2006 — and it basically says V4 [00:15:45] takes the image, calculates the power spectrum of that image, and has a tuning curve based on the power spectrum. That is an egregiously stupid and brain-dead model. There is no possible way this model could be right, because obviously the visual system has to encode phase information; otherwise everything with an identical power spectrum would look the same, and you wouldn't be able to tell the difference between a ball and a cloud, which would seem kind of bad. So this model has to be wrong — can we do better than this? What we decided, after many years of hitting our heads against this, is that we just did not have enough data. These are awake physiology experiments, and we realized we could not possibly get enough data in a day: to even make a stab at characterizing a receptive field we would need millions of frames of video from a neuron, and that would be hours of data, and we would need to record over multiple days and weeks. So we started implanting these Utah arrays in area V4. Utah arrays are awesome because they stay in there for weeks and weeks and weeks — or months, or years if you're lucky — and you can record continuously from the same neurons over many days. Now, if you look at the action potentials of the neurons across days from one of these electrodes, they will actually change, and that's because the physical location of the brain is moving slightly relative to the electrode. But my lab — and independently Jim DiCarlo's lab at the same time — came up with this cool method that we call the fingerprint method.
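The mechanics get spelled out in a moment; as a preview, the matching step boils down to correlating each unit's response histogram across days. A toy sketch, with hypothetical shapes and variable names:

```python
import numpy as np

def match_units_across_days(psth_day1, psth_day2):
    """Match units across two recording days by functional fingerprint.
    psth_dayN: (n_units, n_timebins) peristimulus time histograms to the
    same fixed movie. Each day-1 unit is assigned the day-2 unit whose
    PSTH correlates with it best. A toy of the idea only; the real method
    involves more careful statistics."""
    z1 = (psth_day1 - psth_day1.mean(1, keepdims=True)) / (psth_day1.std(1, keepdims=True) + 1e-9)
    z2 = (psth_day2 - psth_day2.mean(1, keepdims=True)) / (psth_day2.std(1, keepdims=True) + 1e-9)
    corr = z1 @ z2.T / psth_day1.shape[1]      # all pairwise correlations at once
    best = corr.argmax(axis=1)                 # best-matching day-2 unit per day-1 unit
    return best, corr[np.arange(len(best)), best]
```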
[00:17:16] The idea is you bring the animal in every day and you show him a movie — in this case a movie that's 450 frames long — and you just map the peristimulus time histogram during this movie, across the days. Even though the waveform of the action potential projected on the electrode changes slightly from day to day, the neuron's selectivity for these complicated natural movies remains invariant, so you can match up neurons across days on one electrode: every single neuron has a unique functional fingerprint that reflects its stimulus selectivity, and it doesn't look like any other neuron in the brain. So you can track these neurons over days — you can see here we have one neuron for 37 days and it's exactly the same neuron. That meant we could collect millions of frames of video from each neuron. For a typical neuron in this data set we have something like 100,000 frames of video; for some neurons we have 3 to 4 million frames. And my general rule of thumb for life is: if I can't figure out what a neuron does after a million frames of video, I'm going to give up and go get another job. So we have 800 of these neurons and a lot of data from each neuron; we should finally be able to solve this problem — now we have a big-data problem. I should mention the stimuli: we used simple stimuli and we used natural movies, [00:18:31] just so we have both. OK, so now we have to build a model. We recorded in three areas: V1 and V2 — where we accidentally ended up exactly in the fovea, so as far as I know these are the only primate foveal recordings made in an awake animal; the reason we could do that is that even though there are microsaccades that are fairly large, because we have infinite data we can basically aggregate over those microsaccades and model them out — [00:18:55] and we also recorded from V4. So let's start with V1 and V2. As you probably all know, V1 and V2 neurons can be described by these spatiotemporal Gabor functions, and that's because any natural image can be decomposed into a linear superposition of spatiotemporal Gabors — that's just a feature of natural images. [00:19:16] So we decided we would basically start with this sort of model: we have a simple cell model, which is just a Gabor followed by half-wave rectification, and a complex cell model, which is two Gabors that are squared and then summed. These are completely standard models, and they're going to be elements of our network model, but we're going to incorporate other things too: we want to deal with color, so we have to deal with cone opponency; we think these receptive fields are taking spatiotemporal derivatives, so we want to incorporate that — we want to deal with all of these things somehow in the model. So rather than building the model explicitly, in the classic engineering way of putting little elements together — which we've done before — we decided in this particular experiment to take advantage of the fact that we have GPUs now, and things like Torch and Keras, that let us do gradient descent on deep neural networks really, really quickly. But rather than a generic deep neural network, this is an architected neural network: it's very specifically designed to incorporate all the assumptions we know from previous V1 physiology into the model, and it's designed to be able to learn on a fairly small data set — because even though we have a million frames of video, that's small relative to the data sets you would normally train a deep neural network on.
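The simple- and complex-cell elements just described — a Gabor followed by half-wave rectification, and a quadrature pair of Gabors squared and summed — look roughly like this in code (generic textbook versions, not the model's learned filters):

```python
import numpy as np

def gabor(size, freq, theta, phase, sigma):
    """2D Gabor filter: a sinusoid windowed by a Gaussian envelope."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the grating
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * xr + phase)

def simple_cell(image, g):
    """Simple cell: linear filtering followed by half-wave rectification."""
    return max(np.sum(image * g), 0.0)

def complex_cell(image, g0, g90):
    """Complex (energy) cell: squared responses of a quadrature Gabor pair,
    summed, which makes the response phase-invariant."""
    return np.sum(image * g0) ** 2 + np.sum(image * g90) ** 2

# Usage on a random patch (filter and patch are the same size for simplicity)
img = np.random.default_rng(1).standard_normal((32, 32))
g0 = gabor(32, freq=0.1, theta=0.0, phase=0.0, sigma=6.0)
g90 = gabor(32, freq=0.1, theta=0.0, phase=np.pi / 2, sigma=6.0)
print(simple_cell(img, g0), complex_cell(img, g0, g90))
```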
[00:20:38] So let me just describe it graphically. We've got our stimulus; the stimulus gets fed into a bank of complex cells and simple cells. I should mention that for these complex cells and simple cells, the basis functions don't have to be Gabors — they're actually learned by the network; the basis functions are coupled in pairs and they have to be orthogonal, but the network could learn to fit anything here. These get sent, essentially, into a pooling stage with a hidden layer, and then that predicts the output. So basically we're going to model every neuron that we record as a linear sum of simple and complex cells that are rectified and summed together. [00:21:21] And that kind of model works pretty well in all of early visual cortex, and in MT, as we've seen in previous work from our lab. This is a high-parameter model, but we have a lot of data and we were very careful not to overfit it. These are the results of the model: on the horizontal axis here is the correlation coefficient, [00:21:42] the blue histogram is our predictions on a cross-validation data set using a standard network, and the green is the cross-validated predictions using this hierarchical compositional energy model, and you can see that this model works really well. You might say, well, wait, it's only predicting at 0.8 — but remember, we're predicting here at the frame rate of the video, which means we're predicting 16-millisecond bins. In 16 milliseconds a neuron can only produce between 0 and about 4 spikes, and those data are not Gaussian, they're Poisson — so actually, if correlation coefficient is your metric, 0.8 is basically the noise ceiling. With this model we're predicting essentially all of the predictable variance in these neurons in V1 and V2. So my view is: finally, after 30 years of my life, the V1 model is basically just done. All you want is a model that's biologically plausible — and all the elements in here are biologically realizable — and a model that predicts perfectly; this is that model. [00:22:41] And on the bottom here I'm just showing that this model predicts better than all of the other models that have been proposed by our evil competitors — you know, they're our friends, just from different labs. All right, so — oops, I went the wrong way — OK. So this shows you some receptive fields. Here are the V1 receptive fields; they look like V1 receptive fields. These are the V2 receptive fields; they look a little wonky. The size here doesn't really mean anything, because the sizes of the frames have been normalized relative to the receptive field size, so don't worry about the size. You can see all these V1 cells look like spatiotemporal Gabors, which is great, because that's what V1 neurons are. There are non-classical receptive field effects here, but they're subtle — you can maybe see them a little bit on the edges; this one's a little curvy, you notice, on the end, and this one also has a slightly different feel — but these are relatively subtle effects. These V2 neurons look like spatiotemporal Gabors too, but they can be a little more complicated — especially, notice, this one, which seems to be pretty strange.
[00:23:44] But this talk is about V4, so what do you do about V4? Well, Mike Oliver, the person who did this work, had this brilliant idea. You've got a V1; to build a V2, you stick another V1 on top of the V1, and it takes the output of the V1 and does V1-ish things to it; and then to make a V4, you take another V1 and stick it on top of the V2, and do the V1-ish things to the output of the V2. So basically this whole model is just a hierarchical stack of Gabor wavelet [00:24:14] energy filters, stacked on top of one another in a hierarchical network, with complicated pooling that leverages statistical relationships over increasingly larger spatial and temporal distances to create more complicated receptive fields. In this case — because V4 receptive fields are fairly large, and V1 has a log-polar map — we take a log-polar transform on the front end; don't worry about that. Then we have essentially our core V1 model, our V2 model, and our V4 model. Now, some of you know Mike likes to call this a scattering model — V2 as a scattering transform. You may know about the scattering transform: it's basically a way to take a bunch of disparate outputs, from a lot of different basis functions scattered over a very wide range, and pull together the information that's relevant in order to create an invariant representation that partials out all the variable stuff — and of course one of the big goals of the visual system is to make invariant representations. So we essentially have a V1 that feeds into another V1, then there's a mixing module that feeds into a pooling module, and all of this stuff is, again, just learned from the data. And now we can look at the predictions. These predictions in V4 are not as good as the ones in V1 — but I should mention that if you take any standard classical model of V1, like a difference-of-Gaussians model or any Gabor-filtering spatiotemporal model that doesn't have all the complicated non-classical receptive field stuff in it, and you run it over natural images at a 16-millisecond frame rate, you will get a distribution that looks almost exactly like this. So my take-home message is: our current V4 model is as good as the classic V1 model that we all walk around with all the time thinking V1 is solved — even though the correlation between the classic model and the responses at a 16-millisecond timeframe is only about 0.5. [00:26:14] So since we all think that glass is half full, I am now ready to say the V4 glass is half full.
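The "stick another V1 on top" composition can be sketched directly: the same energy-plus-pooling stage applied to its own output. This toy uses random stand-in filters where the real model learns quadrature pairs (and omits the log-polar front end entirely):

```python
import numpy as np

def energy_stage(x, g0, g90, step):
    """One V1-like stage on a 2D map: quadrature-pair energy computed on a
    coarse grid of patches (the grid stride doubles as pooling)."""
    k = g0.shape[0]
    out = [[(x[i:i+k, j:j+k] * g0).sum() ** 2 + (x[i:i+k, j:j+k] * g90).sum() ** 2
            for j in range(0, x.shape[1] - k + 1, step)]
           for i in range(0, x.shape[0] - k + 1, step)]
    return np.asarray(out)

# V2-ish = the same operation applied to the V1 energy map; V4-ish = once more.
rng = np.random.default_rng(2)
g0, g90 = rng.standard_normal((2, 8, 8))   # stand-ins for learned quadrature Gabors
image = rng.standard_normal((128, 128))
v1 = energy_stage(image, g0, g90, step=4)  # 31 x 31 map
v2 = energy_stage(v1, g0, g90, step=2)     # 12 x 12 map
v4 = energy_stage(v2, g0, g90, step=2)     # 3 x 3 map
```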
The problem is — remember, whenever you're fitting a model, accuracy improves only as the square root of the amount of data — and N here is already a million frames, so we're going to need a huge number of frames to build a better V4 model than this, and I doubt that anybody's going to collect that data anytime soon. OK, so what do these V4 receptive fields look like? They look hideous — hideously complicated. They all look, if you notice, kind of like weird AM spatiotemporal Gabors modulating other spatiotemporal Gabors, right? Here you can see this is an AM pattern at a low frequency, and this is another one at a higher frequency; all of these look like some weird combination of AM — except this one; this one is very strange. I should mention that previous results from my lab and other people's labs showed that if you wanted to cluster V4 cells into two clusters, you would end up with a texture cluster and a contour cluster, and these are probably more contour neurons, and this is more of a texture neuron. [00:27:24] OK, these are all hideously complicated. One way you can try to interpret these things — yes, question? [Question about migraine auras.] Migraine auras actually are usually V1-based, and we know that based on the retinotopic organization of the migraine aura. But I can see that — or they could look like a bad acid trip or something like that, yeah. [00:27:54] All right, so. In this particular video, Mike is fading between the best one-second piece of space-time for a neuron — in other words, the piece of space-time that evokes the highest neural response — and the receptive field, and this is to give you a sense of the relationship between the receptive fields and the stuff in the images that triggers the neuron. What you will quickly see — or slowly see, when you watch this — is that you cannot look at the receptive field of these neurons and have any idea whatsoever what is going to trigger them in the world, other than: if there's a receptive field here, there has to be stuff here. But you won't know what that stuff has to be. There's no obvious, intuitive relationship between the two, and that is frustrating, but I think it's just inevitable. These neurons live at an intermediate level where we don't have any language to describe the pieces of shape they represent. We as humans can talk about really simple things — we can talk about color, we can talk about an edge, we can talk about curvature — and we can talk about complicated things, like animals and people and phones, but we don't have a language to talk about, say, the section of space that encompasses half my face and the stuff behind me and my shoulder here. That doesn't exist in our language, but these neurons have to code [00:29:27] for that kind of stuff. And in fact, remember, in V4 the attention effect — the top-down effect — is only about 20 percent of the neuron's response; so 80 percent of the neuron's response doesn't have to do with image segmentation, it just has to do with the stimulus statistics. These neurons are really largely driven by the stimulus, and they're coding the entire visual field in a really complicated, hideous way.
[00:29:50] So we have 800 neurons, and one thing you would like to do is look through all these neurons and figure out what the clusters are. After all, neurophysiology always boils down to butterfly collecting: I have 800 neurons; I want to know, do the neurons form groups, do they form a continuum, what is the manifold they live on, right? That's a butterfly-cataloguing problem. So we have 800 butterflies here, but each one lives in a model space of thousands of parameters, and there's no clustering algorithm that can cluster 800 points in, like, a 10,000-dimensional space. So we used brain-based clustering, which is: we stared at the data really hard for quite a while. I'm not proud of this; it's just what we did. And after a long time, Mike Oliver, who spent years looking at these data, decided there were contour cells, curvature cells, temporal contrast cells, and these AM modulation cells. That's our provisional cataloguing of the kinds of phenomena, but I hope I've made it clear you should not believe this is any kind of valid clustering — it just gives you a sense of what we see. [00:30:56] Garrett, you're looking particularly suspicious — do you just have resting skeptic face, or are you with us? [Inaudible response.] OK. So, one question that often comes up when I give this talk is: this seems like a huge amount of work — why not just use a deep neural network like any computer scientist would? You could so easily grab the Caffe network: it gives you features; you just take your stimuli, project them through the network, it gives you activations; you take a linear sum of those activations to predict your V4 neuron; it works great; it would be way easier. Well, OK, so let's try that. On the horizontal axis is VGG, and on the vertical axis is this convolutional energy model, and you can see: if the signal-to-noise is low — down in the lower left corner — yeah, you can use a deep neural network and it will work just as well as this complicated, biologically informed deep neural network. But once you get up to a decent signal-to-noise — in other words, my data is not random, I have actual real signal here — then our network always works better than the Caffe network. And our network has far fewer parameters than the deep network: ours has tens of thousands of parameters; the deep network has millions. [00:32:12] OK. And I should give a shout-out to Mike Lewicki and Yan Karklin: in the early 2000s they came up with this interesting model for V2. This model is basically Bruno Olshausen's sparse coding model with another sparse coding model on top of it — is that me? It's my pacemaker... all right, [00:32:38] I'll just try to ignore it, it's entertaining — so, a sparse coding model with another sparse coding model on top of it gives you complicated receptive fields like this. That's again the idea: you essentially have a V1-like thing that does sparse coding of natural scenes, you take that output and send it into another bank of filters that does sparse coding of those outputs, and you get these higher-order — if you notice, amplitude-modulating — kinds of patterns. And that's essentially what we're seeing with our model of the neurophysiological data — not physically instantiated in the same way, but the same kind of thing. So I'm pretty happy with that, and I think it's a good inspiration.
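The generic deep-network baseline just described — project stimuli through a pretrained network, take a linear readout of the activations — looks like this in outline. Layer choice, image size, and the data are all stand-ins, not what the lab actually used:

```python
import torch
import numpy as np
from torchvision.models import vgg16, VGG16_Weights
from sklearn.linear_model import RidgeCV

# Pretrained VGG truncated at an intermediate conv layer (arbitrary choice
# here; in practice you'd sweep layers and cross-validate the choice).
vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:10].eval()

def cnn_features(images):
    """images: float tensor (N, 3, H, W). Returns flattened activations
    as an (N, n_features) numpy array."""
    with torch.no_grad():
        return vgg(images).flatten(1).numpy()

# Stand-in stimuli and one neuron's responses (real use: preprocessed frames).
stims = torch.randn(64, 3, 64, 64)
resp = np.random.randn(64)

X = cnn_features(stims)
readout = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X[:48], resp[:48])
r = np.corrcoef(readout.predict(X[48:]), resp[48:])[0, 1]
print(f"held-out prediction correlation: {r:.2f}")
```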
[00:33:16] OK, so to make a long story short, in the interest of time: if the classical V1 model is correct, then we have a V4 model that's correct, and I'm not going to do this anymore — I've already collected a million frames of video per cell, and I'm not going to collect 10 million frames of video per cell; that would be crazy. So I'm going to declare victory and move on. Let's move on to something that has its own problems, which are different from neurophysiology's. The problem in neurophysiology is always: how can I get more neurons? The problem with humans is: how can I get any data at all? [00:33:54] So in my lab, the human method we use is fMRI. As mentioned, I'm not a big fan of fMRI, and that's because I like to think of myself as a neurophysiologist, and fMRI does not measure neurons — it doesn't even measure anything stored in neurons; it measures the plumbing. fMRI is a metabolic signal that measures oxygen use over time. [00:34:14] No matter how many times you hear an fMRI talk where people say they're recording from neurons: they are not recording from neurons, and in fact the link between fMRI signals and neurons is obscure, complicated, and something you really don't want to think about if you don't have to. So here we have a brain. Brains are convoluted, folded up inside the head, so as you saw, we computationally inflate the brain and flatten it out. In the middle here is the visual system; this is prefrontal cortex, left and right; this is the motor and somatosensory strip; the auditory system is down here. And now somebody is watching this movie while fixating on a small green dot that's not shown here, and we're simply painting the metabolic brain activity on the surface of cortex while they watch the movie. [00:34:55] Now, there is fading here, but it's only linear interpolation between the samples taken every 2 seconds — there's no model imposed on this; these are the actual data we have. And you can see that as people watch this movie, we get fairly complicated patterns of brain activity, right? Sometimes some things are activated, at other times other things are activated, and the patterns of brain activity are constantly changing and evolving over time. Of course that makes sense, because anything that you can see in this video must be represented in brain activity, and anything you are reminded of by this video must also be represented in brain activity. I just saw coffee, and I would really like more coffee right now, so something in my brain said, hey, maybe I could get a coffee — like, reach down here to my backpack and my coffee — and that must also be represented in brain activity, right? So some of this brain activity is going to be deterministic, driven directly by the video, and other brain activity is going to be non-deterministic, because something in the brain is driving it but we don't have access to those variables; they're latent variables. [00:35:58] So you can think about fMRI as a multiple regression problem: I've got stuff in this video, plus latent variables I can't measure; I've got 100,000 points on cortex that I measured; and I just have to solve this giant multiple regression problem for every point on cortex, figuring out what it liked in the movie or what latent variable it responded to. That's, again, just a system identification problem — except I now have 100,000 black boxes to deal with. [00:36:25] So, for those of you who don't do MRI:
this is the only slide you ever need — all of MRI is one slide. MRI is a chemistry experiment, except now, instead of having a liquid chemical, we have you. We stick you in the MRI machine, and the MRI machine is a big static magnet: it aligns some vanishingly small fraction of your protons along the main magnetic field. Then we introduce an electromagnet; the electromagnet puts in an orthogonal signal and spins the protons off axis, according to whatever gradient we put in. Now the protons are spinning off axis; we turn the electromagnet off, and the protons spin back down to align again with the main magnetic field. We can measure the protons resetting back to the main magnetic field in terms of the spin rate changing in the transverse plane and the relaxation along the B0 direction — the main magnetic field direction — and that gives us two parameters, called T1 and T2. All of MRI is basically just looking at these T1 and T2 parameters, these relaxation parameters. And, as in chemistry, MRI people are incredibly ingenious at coming up with gradients — frequency- and phase-modulated gradients — that they can impose on this bulk sample, which essentially tag spatial location and allow us to pull out the dephasing T1 and T2 components from different locations inside the volume, instead of just getting the whole volume at once. [00:37:48] And so we can get various kinds of weightings from MRI: here's a T1, here's a T2, we can get proton density, or we can get functional MRI. Functional MRI takes advantage of a bug — although in this case it's a feature — which is that deoxyhemoglobin has a low BOLD signal (blood oxygen level dependent signal) and oxyhemoglobin has a high BOLD signal, so as blood flow and oxygen levels increase, the BOLD signal changes. [00:38:15] Neurons are little metabolic engines that make ATP, and when they make ATP they have to extract oxygen and sugar from the bloodstream; the bloodstream has this Rube Goldberg mechanism that, when it detects that oxygen has been extracted, increases blood flow — and that's what we measure in fMRI. [00:38:31] So now we've got these fMRI signals — these BOLD signals that are kind of vaguely related to the net synaptic activity in a local area — and we want to model the data. Our input data is complicated stuff, and our output data is this BOLD signal; how are we going to model it? Well, it's a regression problem, and it turns out all this data is Gaussian by the time we're done preprocessing it, so we're going to use ridge regression. [00:39:01] We have some features and some BOLD signals, we're basically just going to fit these using ridge regression, and we're going to get, essentially, a voxels-by-features weight matrix out of this in the end.
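That voxelwise ridge fit, in outline: the only fMRI-specific wrinkle is adding time-delayed copies of the features so the regression can absorb the slow hemodynamic lag. Shapes, the delay set, and the regularization are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import Ridge

def make_delayed(X, delays=(1, 2, 3, 4)):
    """Concatenate time-shifted copies of the stimulus features so the
    regression can soak up the hemodynamic lag (a common FIR trick;
    delays in TRs, assuming roughly a 2 s TR)."""
    out = []
    for d in delays:
        Xd = np.zeros_like(X)
        Xd[d:] = X[:-d]
        out.append(Xd)
    return np.hstack(out)

# Hypothetical shapes: 300 TRs of a 100-dim feature space, 5000 voxels.
rng = np.random.default_rng(3)
feats = rng.standard_normal((300, 100))
bold = rng.standard_normal((300, 5000))

Xd = make_delayed(feats)
fit, test = slice(0, 250), slice(250, 300)
model = Ridge(alpha=100.0).fit(Xd[fit], bold[fit])  # one weight map per voxel
pred = model.predict(Xd[test])

# Per-voxel prediction correlation on held-out data
pc = [np.corrcoef(pred[:, v], bold[test, v])[0, 1] for v in range(bold.shape[1])]
```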
And the cool thing about this method is that we can basically test multiple hypotheses simultaneously, and we do this by a process called linearizing, or linearized regression: basically, we take our original stimulus — in this case it's a story — and we project it into different feature spaces that we hypothesize might be related to the BOLD responses we measured. For example, we can take the story and extract the spectral signature of the story; we can extract the phonemes; we can extract the syntax; we can extract the semantics. Now, each of these feature spaces might be larger or smaller: the sound spectrogram of a story might require 3,000 features; for phonemes, there are 65 or so phonemes in the English language; for syntax, maybe we use an HMM with maybe 30 syntactic states; for semantics, maybe we model 2,000 semantic concepts. So we break the stimulus up into all these feature spaces simultaneously; then, for each of 100,000 voxels, we fit beta weights for all of these feature spaces simultaneously. And now we collect a new data set, with different stories, project those stories into these feature spaces, multiply them by the weights we got in the first stage, and then just compare the predicted responses to the observed responses. Some of these beta weights will be large, and we [00:40:31] think that's maybe what that voxel represents; some of them will be small — the voxels will tell us which feature spaces they care about. So this is a way of taking a complicated naturalistic stimulus and simultaneously testing many, many different hypotheses about stuff in the stimulus that could be represented by the system. [00:40:51] And if you notice, this is very similar to what we did in the last experiment, except there I suppressed all the stuff about feature spaces. So now we can look at multiple feature spaces simultaneously. These are data from people listening to a story in the fMRI machine, with three feature spaces, and because these feature spaces are correlated we can do various partitioning to look at the partial correlations, and we can find, for each individual voxel, how much of that voxel's activity can be predicted from the individual feature spaces or from the joint feature spaces between them. The sad thing about fMRI is that it's very slow, because it's measuring this metabolic brain activity, so you can't pull out data about fast-changing things like the sound spectrogram: the signal-to-noise for finding sound spectrogram representations in the auditory belt in humans is very poor, so we can't really do a very good job of that. But what fMRI can do a good job on is things that are slower, where slow means on the order of a second or two. [00:41:53] Basically, there's very little information in fMRI data at timescales faster than about half a hertz. So it turns out that semantics — the meaning of the stories — has a lot of signal-to-noise in fMRI, because the meaning of a story evolves at a timescale that is commensurate with the thing we're measuring, this metabolic signal in the brain.
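The partitioning just mentioned has simple set-theoretic arithmetic once you have cross-validated R² for each feature space alone and jointly. A sketch with stand-in data (real analyses typically use banded ridge with a separate regularization per feature space):

```python
import numpy as np
from sklearn.linear_model import Ridge

def cv_r2(X, y, n_fit=250):
    """Held-out R^2 of a ridge model (single split, for brevity)."""
    m = Ridge(alpha=10.0).fit(X[:n_fit], y[:n_fit])
    resid = y[n_fit:] - m.predict(X[n_fit:])
    return 1 - resid.var() / y[n_fit:].var()

# Hypothetical feature spaces for one voxel's time course
rng = np.random.default_rng(4)
phonemes = rng.standard_normal((300, 40))
semantics = rng.standard_normal((300, 200))
y = rng.standard_normal(300)

r2_p = cv_r2(phonemes, y)
r2_s = cv_r2(semantics, y)
r2_joint = cv_r2(np.hstack([phonemes, semantics]), y)

# Variance unique to each space, and variance shared between them:
unique_p = r2_joint - r2_s
unique_s = r2_joint - r2_p
shared = r2_p + r2_s - r2_joint
```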
So let's look at semantics for a minute. [00:42:22] In this particular experiment, our semantic feature space — which we got by semantic analysis, for those of you who care — had about 2,000 semantic basis vectors, so we have a 2,000-dimensional semantic space which spans all of the English language. And now we can analyze the brain data. The brain data is just a set of voxels with beta weights in the semantic space, so that means we've got 100,000 voxels, each represented by a 2,000-dimensional vector. That's a lot of data, and no human can possibly think about that data. As mentioned earlier, the easiest dimensionality-reduction method to talk about when giving a talk is principal components analysis, so let's take this data and do PCA on it. If you do a principal components analysis on fMRI data like this — [00:43:09] a typical fMRI experiment where you take a feature space like this semantic feature space — you'll need about 100 principal components to account for the data, and about 10 of those will be shared across subjects and account for about 30 percent of the variance. This just shows the first three principal components, [00:43:27] just because then I can conveniently show them on a screen, where I color each principal component R, G, and B. If that did not make sense to you, just think about it this way: there's a principal component space into which we can project all the semantic concepts, and nearby concepts in that space are concepts that are represented similarly in the brains of all the subjects. It turns out that social things are represented similarly to emotional things, for example, and visual things — like bumpy and round — are represented similarly to tactile things, like rough and smooth. [00:44:01] And then we can also project these principal components onto the surface of the brain, and similar colors on the surface of the brain are locations that represent similar semantic concepts. So all of the red locations on this brain are representing the same stuff, and to see what that stuff is, we can go up here and see what's red — and you can see that the red things are social and mental things.
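The PCA-and-RGB trick in outline, with a stand-in weight matrix — each voxel's first three principal-component coordinates become its color on the cortical map:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in semantic weight matrix: voxels x semantic features
# (100,000 voxels in the talk; smaller here to keep the sketch light).
rng = np.random.default_rng(5)
W = rng.standard_normal((10_000, 2_000))

pca = PCA(n_components=3).fit(W)
pcs = pca.transform(W)                                # (n_voxels, 3)

# Map the first three PCs to R, G, B for painting the cortical surface:
rgb = (pcs - pcs.min(0)) / (pcs.max(0) - pcs.min(0))  # rescale to [0, 1]

# Semantic concepts can be projected into the same space, so nearby concepts
# get similar colors to the voxels that represent them:
c = rng.standard_normal(2_000)          # a hypothetical concept vector
concept_xyz = pca.components_ @ c       # its coordinates in PC space
```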
[00:44:22] OK, so this data set is really complicated, and that's not what this talk is about, so I'm not going to go over it in detail. If you're curious, you can go to our website and play with this online brain viewer — a really cool JavaScript application that lets you interact with the data. You can click on an individual voxel, and you can find out, in this cloud at the lower right, what semantic concepts the experiment predicts that voxel would respond to the most. You can see that all these little red locations respond to social things — mother, father, brother, sister — and things that happen to social groups: marriage, death, divorce, murder, which comes up a lot. [00:45:00] And there's a bunch of these red spots — now, in a second, you'll see I'll go over here to this green spot. This green spot is actually a known brain area: a number-related area in posterior parietal cortex. In fact, there's a whole network of number-related areas, and it turns out these number patches tend to represent numbers, time, money, weights and measures, and dates — all of those things are represented in the same network, and there are, again, 20 or 30 different areas that all represent numbers. So there are these multiple networks in the brain for different semantic categories. You can also play with another fun tool we have online that allows you to go in and look at a subspace. So if you want to find out how the concept of dog is represented, it will show you: these red areas are locations that represent dog, and blue areas are locations that don't represent dog. To make a long story short, in this semantic experiment what we find is that every semantic concept, like dog, is represented across multiple brain areas, and every brain area that represents a semantic concept like dog will represent a constellation of related semantic concepts. [00:46:07] We think these multiple representations are due to the fact that the brain needs to represent different attributes of each semantic concept. So, for example, you might have an area down here near the olfactory bulb that represents how a dog smells; this area over here, near auditory cortex, might represent how a dog sounds; an area over here on the medial wall might represent where you see dogs in the environment — in the doghouse, for example. And somehow or other, when you think about the concept of dog, all this information is aggregated by the brain and merges into this magical process we call consciousness. [00:46:46] OK — this semantic network interacts with all of the sensory networks. The semantic network I just showed you is pretty much amodal: it doesn't really care whether the information is coming in by sight or sound or smell. But once you collect a large amount of data across individuals, you can look for the correspondence between the semantic maps of vision and this amodal network. To make a long story short, I'm just going to show you this slide. On the lower right here, this little line is the border between occipital cortex, which is driven by movies and not stories, and polymodal cortex, which is driven by stories (in blue), and you can see that on both edges of this line the semantic concepts align almost perfectly. So there's a whole constellation of many, many different higher-order visual areas that respond to certain categories of semantic things in movies — dogs and cats and buildings and planes and everything you can imagine — and those are arrayed at the anterior border of occipital cortex, and they feed into another network, this amodal network, that is activated by stories about dogs and cats and planes, or movies about dogs and cats and planes, or whatever. You had a question? [00:48:04] [Question.] It's just back here, and here. So, one thing to remember about the ventral stream: MRI has its own issues, and one of the problems it has is dropout, and the places you get the worst dropout are anything near the ears — which is a big chunk
[00:48:22] of the temporal lobe — there you have really low signal-to-noise. Stuff on the medial wall tends to be very far from the antennas that are receiving the signal, so you have low signal-to-noise there too; and stuff at the bottom of the brain next to the cerebellum, next to the ventricles, stuff in orbitofrontal cortex — all of those are areas of dropout in MRI. If you want to target those, you need very specialized pulse sequences that actually compensate for the dropout. So for things near the ears — which is probably somewhere around here — we don't have signal, and MRI is like every other measurement method: if you don't see something, just don't think about it; you're not allowed to say anything about it. You see it or you don't see it; if you can't see it, you can't say anything. OK, so that's true for this too. [00:49:05] People oftentimes want to say things about things they can't see — don't make that mistake. All right. So, about this method of system identification for fMRI, you might think: well, is this good, is it bad? I don't know — you didn't test a hypothesis; what should I think about this? The thing I want you to remember is that we were optimizing for information flow through the system, and the easiest way to see how much information a method recovered is to build a decoder. Building a decoder is a bad way to do science, but it's a really good way to visualize how much information you recovered, and this method recovers way more information than all the other methods in fMRI — by log units. So these are data from one movie experiment: we put people in the fMRI machine and we showed them movies. This decoder on the left — these four panels — is a decoder that was built using essentially a motion-energy pyramid, kind of like the one I showed you before, where we're decoding information only from primary visual cortex. On the upper left here is the image we showed; on the upper right is our reconstruction; and just to make it clear how good the reconstruction is, we have the edge maps here on the bottom — you can see the edge maps align pretty well. [00:50:17] On the right is a decoder we built using the same data set from a couple of other, higher-order visual areas that represent the semantic information in the movies, and you can see that this decoder also works really well. We're decoding about 300 concepts simultaneously; in each 2-second period we can decode about eight and a half bits of semantic information using this one model. And, you figure, there are 400 or so brain areas in the human brain; you could build an optimal decoder for each one of those brain areas and decode all that information simultaneously if you wanted to. There's a lot of information in fMRI; it just has to be used correctly. So in the last couple of minutes, I just want to mention one of the other cool things about this method: it allows you to get maps in each individual brain. This is one semantic map, this is another one — these are four semantic maps from four people — and you can look at the semantic maps and decide: are they similar or different? Now, I won't waste time taking a poll, because I've done this many times before: a lot of you are saying these maps don't look the same.
[00:51:20] And you're 70 percent right: in these data, about 30 percent of the variance is common group variance, and about 70 percent of the variance represents individual differences. But if you look, everybody has two red spots at the temporoparietal junction — two red spots, two red spots, two red spots, two red spots; everybody's got some green stuff above that red spot — green, green, green, green; everybody's got a big red patch with a blue patch up here on the medial wall — red patch blue patch, red patch blue patch, red patch blue patch. So you can see there are commonalities, but there are also differences, especially in prefrontal cortex. You can see this particularly nice brain down here: there's an orange stripe, a pink stripe, a blue stripe, an orangey stripe, and then some spots — and everybody kind of has that [00:52:02] pinky-bluey-greeny arrangement. So you can see these are kind of similar, right? Now, the way we deal with this kind of data in fMRI, sadly, is we average. This is an example — just in case you don't do fMRI — to give you a sense of what happens when you average. Imagine you have a bunch of presidents and you decide you want to map them all into some generic president space; the way to do that is to just average the heck out of them. This is actually intelligent averaging: they take key points from the presidents' faces and align them all together — and people do that in fMRI; they take key points and align them; these people aren't idiots, they do the best they can with this method. But still, you see that this average president not only doesn't look like any real president, there are also these weird artifacts, and it just seems too blurry. And the reason it seems blurry is that you can't perfectly average presidents together, because it's a non-convex optimization — you don't really know what the right answer is, so you're going to do some hacks, and anywhere you did a hack there's going to be suboptimal smoothing. Optimal smoothing is great — I love optimal smoothing. Suboptimal smoothing is the problem, because it means you lost information, right? So this is going to do some suboptimal smoothing, and every place it suboptimally smooths you're going to lose information, and that's what happens in fMRI. And here's an example of that: [00:53:20] here are eight individual brains — we're just looking at the occipital lobe — and you can see the data from the individual brains, and here is a cross-subject average we got by using the best possible method for cross-subject averaging available in the standard fMRI
software packages. You can see that a whole bunch of the information from these individual brains is wiped out when you do that. So that's bad; we don't want to do that. We want a model where we have individuals, and we have a group, and they're related to each other — but when we go from the individual data to the group, we don't just blur the heck out of everything and lose all the information, because, as I told you at the beginning, I don't like losing information. So after many years of working on this problem, we came up with a new generative model for dealing with it, which I really like. It has the awesome acronym PrAGMATiC: a probabilistic and generative model of areas tiling cortex. I should mention up front that my contribution to this whole project was that I came up with the acronym; everything else was my awesome grad student — now at Texas — Alex Huth, who developed this model. He said: let's think about this problem — cortex is kind of like a pizza, where each of the functional areas is a little pepperoni on the pizza. Now, if the Domino's manager tells the new trainee, OK, I want you to make all the pizzas identical, and everybody gets 16 pepperonis — that's not going to make all the pizzas exactly the same: [00:54:43] the pepperonis are going to be slightly shifted around on every single pizza. And that's kind of the problem we have with these functional areas. So we said: what we can do is learn a division of the brain into functional areas from this continuous data; we can fit a ball-and-spring model to allow for the individual differences, and learn those ball-and-spring parameters; and then we can build a generative model that, given a new piece of anatomy, will generate the functional patterns for that brain. And this is really cool — nobody had done this before. So here are three subjects, going down this way. I should mention that in each of these rows, the subject was left [00:55:20] out of the PrAGMATiC algorithm: we took the functional data and the anatomy for, say, seven subjects, we trained PrAGMATiC on them, and then we predicted where the functional data should be for the new subject. Here is the subject's true functional map, here is the PrAGMATiC parcellation, and these are the likelihood ratios. You can see that in all three cases PrAGMATiC does a really good job: red indicates locations that are semantically selective where the PrAGMATiC model is correct; white means a location is not semantically selective, so it's irrelevant to the model; and blue — there are some blue areas — marks areas where the model predictions are wrong. And if you inspect the data closely, those are always areas where the model knows there's a functional area that's supposed to be at that location, but in the real person it's slightly shifted off from where it's supposed to be. So this is a really cool method that we're developing. It's not ready for general use right now — Alex is the only one who can use the software, and he's a mutant — so I'm pushing really hard for his new lab to make this into software that everybody can use.
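This is not the actual PrAGMATiC algorithm — just the ball-and-spring intuition as a toy: springs pull area centers toward the group's typical arrangement, while a data term pulls some centers toward the individual's own evidence:

```python
import numpy as np

def relax(centers, edges, rest, anchors, k=1.0, lr=0.05, steps=500):
    """Toy ball-and-spring relaxation. centers: (n, 2) area positions;
    edges: list of (i, j) index pairs connected by springs; rest: the
    spring rest lengths (learned from the group in the real model);
    anchors: {index: position} subject-specific evidence."""
    pos = centers.copy()
    for _ in range(steps):
        grad = np.zeros_like(pos)
        for (i, j), r0 in zip(edges, rest):
            d = pos[i] - pos[j]
            L = np.linalg.norm(d) + 1e-9
            f = k * (L - r0) * d / L      # spring force along the edge
            grad[i] += f
            grad[j] -= f
        for i, p in anchors.items():      # data term: pull toward evidence
            grad[i] += pos[i] - p
        pos -= lr * grad
    return pos

# Usage: three areas, three springs, one area anchored by subject data.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_pos = relax(centers, edges=[(0, 1), (1, 2), (0, 2)],
                rest=[1.0, 1.0, 1.4], anchors={0: np.array([0.2, 0.1])})
```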
[00:56:22] You can now pull out a functional network; this is a functional parcellation for semantic information that pulls out 150 brain areas from one experiment. You would have had to do 150 localizer experiments to get all the brain areas that you get here from one experiment, and you can now use this to build optimal localizers to identify whichever one of these areas you want. OK, do I have 2 minutes? Yes? No? I can quit, I can quit. [00:56:49] 2 minutes, OK, this will be quick; it's the last thing, because I know there are some clinical people here. What you would like to do is use this method in the clinic. For those of you who don't know, functional M.R.I. is essentially never used in clinical applications except for pre-surgical mapping for epilepsy: if somebody has epilepsy and a doctor is going to cut out a part of the brain, they want to make sure it's not a part of the brain that you will notice is missing, so they map eloquent cortex, motor cortex and so on, before they do the surgery, to make sure they're not damaging any more than they have to. What we would like is to move this method into the clinic so that we can map individual people who have certain neurological disorders, somebody who is maybe pre-clinical for Alzheimer's or some other neurodegenerative disease, or schizophrenia, or a wide variety of other mental disorders. If we can use these high-dimensional mapping methods, we will get way more information from each individual person, and we might be able to actually make M.R.I. clinically useful. The problem is that the experiments in our lab put individual subjects in the M.R.I. machine over a period of weeks, for somewhere between 3 and 6 hours, and that's not clinically realistic. In the clinic you have to be able to get your data in 20 minutes, because if it's not 20 minutes long it doesn't fit the M.D.'s workflow and they won't do it. So we have to try to get the same semantic maps we got from a 6-hour data set out of a 20-minute data set, and that means we have to optimize the experiment design and optimize the model. We started optimizing the model by doing the following: we built an autoencoder, a particular kind called a multi-view autoencoder. In this case our views are different brains and different feature spaces. We basically take all the data we have from a bunch of subjects, with different brains and different feature spaces, and we shove it through an autoencoder. If you remember, an autoencoder is the best unsupervised deep network learning method we have: it goes from a big feature space through a little tiny feature space back out to the big feature space, and because the output layer is compared to the input layer, the cost function is basically that this output has to match this input. The thing learns to reconstruct its input as closely as possible given the constraint of that little hidden layer, so it's a really good method for dimensionality reduction.
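Here is a minimal sketch of a multi-view autoencoder of the kind described (my own toy implementation, not the lab's model; the subject names, dimensions, and training details are all invented): each view gets its own encoder and decoder, every view passes through one small shared latent layer, and the cross-view reconstruction terms are what let you predict one brain's data from another's.

```python
# Minimal multi-view autoencoder sketch in PyTorch (illustration only).
import torch
import torch.nn as nn

latent_dim = 16
view_dims = {"subj_a": 200, "subj_b": 300}          # hypothetical voxel counts

encoders = nn.ModuleDict({k: nn.Linear(d, latent_dim) for k, d in view_dims.items()})
decoders = nn.ModuleDict({k: nn.Linear(latent_dim, d) for k, d in view_dims.items()})
params = list(encoders.parameters()) + list(decoders.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# fake paired responses to the same stimuli (rows = time points)
x = {k: torch.randn(500, d) for k, d in view_dims.items()}

for step in range(200):
    opt.zero_grad()
    loss = 0.0
    for src in view_dims:                           # encode each view...
        z = torch.tanh(encoders[src](x[src]))
        for dst in view_dims:                       # ...decode into every view
            loss = loss + nn.functional.mse_loss(decoders[dst](z), x[dst])
    loss.backward()
    opt.step()

# predict subject B's responses from subject A's data alone
with torch.no_grad():
    pred_b = decoders["subj_b"](torch.tanh(encoders["subj_a"](x["subj_a"])))
```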
[00:58:48] So we put all these brains into this autoencoder and we train it, and the cool thing about a multi-view autoencoder is that you can now take a new brain, one the system has no information about, and predict at the output end what this new brain should look like. And just to give you the last slide of this talk: here is data from one subject from a narrative comprehension experiment, and here is the data predicted for this subject by the multi-view autoencoder. The data set at the top was collected from 3 hours of this subject listening to stories; the data set at the bottom was collected from 20 minutes of the same subject watching movies, and none of the story data was used to build this model. But you can see that the model does a really good job of predicting these data, so we're very optimistic that this might work in the future. [00:59:51] And I think that's about it. Thanks very much for your time. [Applause] We're going to questions. [Audience question, partially inaudible, about why functional M.R.I. is mostly confined to cortex.] There are many people who use functional M.R.I. for subcortical mapping, and I always feel sorry for those people. M.R.I. is just radio physics, right? There's an antenna and there's a sample, and the sample is going to emit some energy. In this case I put energy into the sample and then turn it off, the spins precess back down and emit energy, and I measure that energy on the antenna. Like any such system, this follows the inverse square law: if a piece of your sample is really near the antenna, the world is great, you get high signal-to-noise; the farther your sample is from the antenna, the lower the signal-to-noise. The antennas are on the outside of the head, and the bowels of the brain, the giblets of the brain, are really far from the antennas, so the signal-to-noise for cortex is awesome and the signal-to-noise for anything far from the cortex is bad. [01:01:21] And on top of that you end up with a resolution problem. In functional M.R.I. nowadays the typical resolution you're going to get is 2 or 3 millimeters, and if you take the whole hippocampus in a human, that's going to be maybe 5 voxels at best. So you're hitting up against multiple problems. Now, Berkeley is actually building the next generation of human M.R.I. for cortical mapping. It's going to be a 7 Tesla stationary magnet, but all the guts of the magnet are being completely re-engineered from scratch, and that should take us, for cortical imaging, from 2 to 3 millimeters down to 400 microns. And that's a good number, because cortical columns are about 500 microns across, so if this very expensive physics project works we'll be able to record column-level noninvasive data from humans. But only from cortex, because [01:02:17] the antennas are going to be very small and the gradients very strong, so you'll have really good resolution right next to the antennas, but once you get down to the giblets it's not going to be better resolution than any other M.R.I. machine.
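To put rough numbers on those two points, here is a back-of-the-envelope sketch (round figures of my own, following the inverse-square description given above; real coil-sensitivity profiles and anatomical sizes vary):

```python
# Back-of-the-envelope arithmetic (illustrative round numbers of my own,
# not the speaker's exact figures).

# Resolution: a roughly 10 mm wide hippocampus sampled at 2 mm voxels
voxel_mm = 2.0
hippocampus_width_mm = 10.0
print("voxels across hippocampus:", hippocampus_width_mm / voxel_mm)  # ~5

# Sensitivity: if signal falls off with the inverse square of the distance
# from the antenna, deep structures fare badly relative to nearby cortex.
cortex_depth_mm = 15.0   # assumed distance from coil to cortex
deep_depth_mm = 70.0     # assumed distance from coil to deep structures
print("relative SNR, deep vs cortex:", (cortex_depth_mm / deep_depth_mm) ** 2)
```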
[01:02:36] [Audience question:] You talked about the representation of different categories; I'm just curious whether that representation changes over time with the context. What happens? Let me see if I have a slide. [01:03:03] I don't think I have one in this talk, OK, so I'll describe it. Attention: the traditional way to think about attention is that it changes the gain of neurons, OK? But the thing to remember is that if you have a deep network, say just the Caffe network, and you change the gain of a small number of units at the peripheral edge of the network, that will actually have the effect of changing the tuning at the higher levels; it will change the features that are represented at the higher levels. In fact, that's how you train a deep neural network: it starts with random weights, you change the weights by very small amounts, and that changes the actual representation at all subsequent levels. It's almost impossible to build a deep neural network in which small gain changes at one level don't change tuning at a higher level. So the language we use about [01:03:54] what an area represents, as if the representation were fixed, is not the right kind of language to use here, because the representation changes with gain. It's a linear approximation we all make because humans aren't that smart and the brain is, and the thing to remember is that it's always a fiction, right? We first showed tuning shifts with V4 neurophysiology back in 2006, where we had animals attending to streams of stimuli that were coming in very, very fast. You tell the animal, pay attention to the dog, pay attention to the cat, and you see that the tuning curve of an individual V4 neuron shifts toward the dog or shifts toward the cat depending on what the animal is attending to. But the amount of shift was small, only about 15 or 20 percent, because that's all of the attention effect you get in V4. So we did an experiment in 2013 which I'm still surprised worked. It was an experiment I tried to get people in my lab to do for 2 years, and no one would, because no one thought it would work. Then an M.R.I. physicist came to my lab who didn't know anything about neuroscience, so I could convince him to do experiments that no one else would do, and I was very happy about this because it worked. We basically had [01:05:01] a person in the M.R.I. machine watching movies, and in condition A we said, look for vehicles, and every time you see a vehicle hit the button, and in condition B, look for people, and every time you see people hit the button. So now we have two attentional states, and we can model the visual semantic selectivity of the entire brain under condition A, searching for vehicles, and condition B, searching for people. The reason we picked those categories is that they're about the most dissimilar categories you can pick in terms of brain representation.
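A quick numerical illustration of the gain-changes-tuning point (a toy two-layer network of my own, not a model of V4 or of any real data): boost the gain of a few first-layer units by 20 percent and the tuning of a downstream unit over a fixed stimulus set changes measurably.

```python
# Toy demonstration: a small gain change at a low level changes tuning at
# a higher level (invented network and stimuli, for illustration only).
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(50, 10))          # layer-1 weights (10-d stimuli)
W2 = rng.normal(size=(1, 50))           # downstream readout unit

stimuli = rng.normal(size=(100, 10))    # a fixed bank of random stimuli

def response(gain):
    h = np.maximum(0, (gain[:, None] * W1) @ stimuli.T)   # gain-scaled layer 1
    return (W2 @ h).ravel()                               # downstream responses

base = np.ones(50)
attend = base.copy()
attend[:10] *= 1.2                      # +20% gain on a few peripheral units

r0, r1 = response(base), response(attend)
print("preferred stimulus, baseline :", np.argmax(r0))
print("preferred stimulus, attended :", np.argmax(r1))
print("tuning correlation:", np.corrcoef(r0, r1)[0, 1])   # below 1: tuning moved
```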
[01:05:29] And what you see in that case is that voxels all over the brain shift toward the attended category; the tuning of the voxels shifts toward whatever the subject is attending to. But there are two interesting aspects of this. First, the amount that a voxel shifts toward the attended category is proportional to how far it is from the sensory areas: V1 voxels shift almost not at all, while voxels in prefrontal cortex shift completely. That's also consistent with the monkey data we have from Miller's lab showing that prefrontal cortex neurons are basically completely dominated by the task: they'll either respond or not respond depending on the task. The other interesting thing we found from that experiment was that while most of the brain shifted toward the attended category, there was one network that shifted away from the attended category, and that's the default network. For those of you who don't know the default network: if you put somebody in an M.R.I. machine and tell them, don't do anything, this network of areas in the brain becomes active. It's called the default network because the [01:06:37] people who first discovered it weren't psychologists, and they thought that when you told somebody to not do anything, they were actually not doing anything. It turns out that the default network is actually the rumination network: if you put people in the M.R.I. machine and tell them not to do anything, they start saying to themselves, when the hell do I get out of this M.R.I. machine, this is really uncomfortable, I really want to have a beer, where's my $50? They talk to themselves. And it turns out that if you ask somebody to talk to themselves in the M.R.I. machine, you light up the default network. So it's kind of the inner-directed network, the network that's directed at the little person inside of you instead of the outside world. [01:07:13] And the default network, in these attention tasks, actually shifts away from the attended category. I have a just-so story about this, and I have no idea if it's true. My theory is: imagine you go home tonight and your cat Fluffy is missing, and all of a sudden you start searching, worried.
[01:07:31] You're thinking about Fluffy, remembering what Fluffy looks like, generating all these images of Fluffy and all these hypotheses about where Fluffy could have hidden based on her past behavior. But you don't want to respond to those internal representations of Fluffy, because then you would say, there she is, she's in my head, everything's great, right? That would not work well. So you essentially detune the network that is generating this internal speech about the thing you're searching for, because you're concentrating on the thing in the outside world, and you tune all of the resources you can toward the thing in the outside world, to optimize your allocation of hardware. It's essentially a matched filter: the brain tries to build as good a matched filter as it can, but it can only build that filter within the constraints of its connections. Neurons in primary sensory cortex have only very local connections, so they can't shift very far toward the attended category; but in prefrontal cortex, because of all the indirect connections through the network, the neurons have access to essentially the entire state space of the brain, and they can do whatever they want. That was a way longer answer than you wanted, but I think attention shifts are really important, and nobody studies them, and they're really, really important, so anyway.
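For what it's worth, the matched-filter analogy has a standard signal-processing form (a textbook toy, not anything from the talk itself): to detect a known pattern in noise, you correlate the incoming signal with the pattern itself, and the correlation peaks where the pattern actually occurs.

```python
# Textbook matched-filter toy: detect a known template buried in noise.
import numpy as np

rng = np.random.default_rng(2)
template = np.sin(np.linspace(0, 3 * np.pi, 64))     # the thing you search for

signal = rng.normal(0, 1.0, size=2048)               # background noise
signal[1000:1064] += template                        # hide the target in it

matched = np.correlate(signal, template, mode="valid")  # slide the template
print("true location: 1000, detected:", int(np.argmax(matched)))
```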