Thanks. Audiovisual OK? Good, good, good. So thanks, Maria, for the introduction, and thanks for letting me be here to kick off the lecture series for the new season. I'll give you a little bit of background about myself first. One minor correction: the title is a little bit misleading. We're going to touch on malware, but there are other groups who are much more familiar with malware than I am. My team is the brains of my operation; I just get to get up here and talk and make myself look like I know what I'm talking about. But we specialize in software security. So, to get your bearings, I work for GTRI. This is the Georgia Tech Research Institute. We're a part of Georgia Tech. Most of you are probably familiar with it, but for those who aren't, we're the applied research arm of Georgia Tech. I'm considered a faculty member at Georgia Tech, but I'm not a professor and I'm not on a tenure track. I have taught before; I've taught the introduction to information security course here. We collaborate commonly with our academic colleagues, but we're just a separate side of the house at Georgia Tech. My particular lab, the CIPHER lab, is really the cybersecurity lab of GTRI. We do all things cybersecurity. There's very little that you will find in cybersecurity that is not somewhere in someone's bailiwick within our lab. So we have a group that does malware analysis, indeed. We have a group, my group, because I'm within a larger group, that does the security of embedded systems and cyber-physical systems. My particular expertise is software assurance. We also have a group that does secure networking, we have a penetration testing team, and we have a group that does multilevel security systems. So there's a lot going on there at GTRI. We also teach professional education courses in cybersecurity. Like I said, just about everything that has to do with cybersecurity. I want to talk about software assurance today. Now,
I have a challenge, and that is that there are a lot of people in this room who are already well versed in software security and will understand a lot of the techniques I'm going to talk about, but there are also people in the room who have less of a background in computer security. I need to be able to meet in the middle. So here's the assumption I'm going to make; here's where I'm going to land. I want to assume that everyone in the room pretty much has some strong computing background and understands the basics of programming, at least understands the basics of computer architectures, and has worked with some assembly code. I'll talk about that a little bit. I'm also going to assume that the techniques I'm going to present here are ones you maybe haven't had a great chance to delve into. So maybe that's going to be the happy middle, so that we can reach both sides of the audience: those who are very technically adept at computer security and those who are just beginning to think about computer security. Maria likes it in this lecture series when we can talk about the kinds of projects that we're doing within our organizations, and I'm going to talk about that a little bit. Unfortunately, because of the nature of our work, there's a lot that I can't talk about in a public lecture. I always hate it when people say that; it sounds like a cop-out, or like I'm hiding something. But frankly, the fact of the matter is I have to be a little bit generic in what I say. So if I seem a little bit guarded at times, that's why.
It's Friday afternoon, we have pizza, and we're supposed to have a little bit of fun with this. So let me set the stage by going back in time to just shortly past World War II, to the Tech Model Railroad Club of MIT. At the Tech Model Railroad Club there were really, and there still are, two kinds of students. There are the students who enjoy the modeling that happens above the layout, that is, working with the details of the trains, painting, making the layout just absolutely beautiful, a nice realistic model railroad layout. And then there are the students who enjoy working below the layout, on the signals and the power, that is, making the train move and do what it's supposed to do. When you have a big model railroad layout, you have lots of moving pieces; there's complexity involved. And 1946 (there's probably a few in the room who can remember back that far) was the dawn of computing, right? There were not readily available computers to control things like the railroad, as we would expect to have and use to control them today. So when the students at the Tech Model Railroad Club came up with a clever way to do something, they called that a hack. For example, one of the hacks that they had was this: you could stand at the control station of the railroad layout, you could select the portion of the track that you wanted to control, and their system would disambiguate your control inputs from control inputs that came from other stations controlling the large railroad layout. This was done with mechanical relays, mechanical switches, and equipment largely donated from the phone company. As time went on, as computers started to appear on the MIT campus during the 1950s and into the 1960s,
this same group of students took their culture, their hacking culture, their ethos, and the terminology forward and brought it into the computer labs of MIT. As the years went on, the terms hack and hacker became popularized. Personal computing started to appear in the seventies for businesses and homes, especially into the eighties with the rise of personal computing, and the term hacker took on more of a negative connotation. It took on a connotation of people who tend to break into computer systems and do malicious things that they're not supposed to do, and we still use it that way today, rightly. Though it's interesting to note that recently the term has actually picked up a little bit more of its original meaning when we talk about things like life hacks, or hackathons, or hack of the day, this kind of terminology. I'm glad it's taken on this less malicious tone these days. Software assurance is really about avoiding software hacks. Most of the people in this room have probably done some kind of software hacking before. We all enjoy it, we love it, we like to take computers and make them do things that they're not supposed to do, special tricks. There's nothing wrong with that; it's a lot of fun. But we also live in a world where we all carry smartphones, for example, and on our smartphones we want to be able to access our mobile banking site, and we want to be able to look at pictures of cats. We want to be able to install puzzle apps from possibly dubious sources, maybe from different kinds of app stores, but we also want to be able to do our secure business logons, our two-factor authentication, and other applications that require security. So the
operating system on a modern mobile device is supposed to give us certain guarantees. That is, the different applications are not supposed to interact with each other unless you explicitly allow them to do so, and the operating system itself is supposed to be closely guarded, right? This stands in contrast, by the way, to the state of computing in the eighties and even into the nineties to some extent, when there was an assumption with personal computing that if you were running a piece of software, you knew what it was, you trusted it, and all the software could talk to all the other software. The address space had a flat memory layout; there was not a lot of protection. On mainframe computers and large systems in that era there were protections. Businesses were realizing the need to have different compartments of security, different user logons, and the ability to share systems, but it wasn't so in the personal computing field. We're still to this day paying that debt, right? So in the nineties, with Windows and Apple, we saw business computer systems deployed that ran on personal hardware. They're supposed to give us access controls, but we still know that in OS X, in Windows, and even in Linux, there are vulnerabilities that exist in those operating systems and in the software that runs on those operating systems. This is really what we focus on when we talk about software security: one, how can we write software that is secure against these kinds of attacks that we see, the ones that spread malware and spread viruses; and then, two, on the flip side of this, how may we go about exploiting software? To understand how to secure something, I think it's important to understand how to exploit it, and that's a lot of fun. As a matter of fact, a lot of the middle, technical portion of this talk will be more about exploitation, because that's what we need to be able to do to
ensure that we can write secure software. One thing I'd like to call out is that this doesn't just pertain these days to mobile devices or to desktop computers. It also very much pertains to embedded devices, or even the Internet of Things, which I tend to think of as the extreme end of embedded: now we have Linux on a light bulb. Consider the recent rise of the botnets, the Mirai botnet and these other botnets. These relied on security weaknesses in Internet of Things devices to spread. It's a new security landscape. These pictures here show what's going on. This is a Delta in-flight entertainment system, and you can see it runs Linux and runs applications on Linux. So we're entering a world where security across the board is more and more difficult to guarantee. So what is software assurance? I distinguish software assurance from enterprise assurance overall, or network operations. I think of software assurance as ensuring the security of an individual piece of software or an individual operating system, in contrast to network operations, for example, where we maybe do pen testing of an entire enterprise, an organization, and try to find the overall weaknesses in its network. Software is complicated, right? The Linux kernel (I did this slide months ago, so it's probably grown by another order of magnitude by now) contains over nineteen million lines of code from fourteen thousand different contributors. It's easy for just one system to fail, and this, by the way, doesn't even include statistics for applications that run on the kernel, like Apache or the other applications, some of which may run with root privileges. There's a lot that can go wrong. I love this graph. This is a graph I put together. There's a database that is maintained by an organization called MITRE. MITRE maintains a database of Common Vulnerabilities and Exposures, CVEs.
It's just a database of problems in computer security, and it gives the security community a common terminology and a common way to think about particular exploits and particular problems with computer security. I categorized the raw CVE count by year. The statistics lie a little bit: some of the CVEs are reserved or have not been fully developed. But I think that the trend is telling. In 1999 it was down there, and you see this linear growth, and it's not stopping, even though in the last ten to twenty years I think computer security has become a lot more in the public eye. Microsoft has gotten much, much better than they were twenty or even fifteen years ago about the security of Windows. Yet the CVE count still continues to rise. We still continue to see exploits like EternalBlue pop up that spread malware and make headlines. So I told you I like this graph. The reason I love this graph is because it shows that I will never be out of a job. As long as I can keep my ethics clean, there's always going to be security work to do. Those of you who are on a computer security track should find this either encouraging or discouraging, just depending on the way you want to look at it. There's a pattern, though, that I want to recognize, one that we teach in our cyber systems course: bugs in software lead to vulnerabilities, which lead to exploits. A bug could be something like a crash, it could be an unexpected state in the computer program, or it could be a logic error that ultimately opens up a vulnerability. Ultimately, in order to understand what kinds of bugs lead to vulnerabilities, we have to just learn by doing and by example. So the best hackers are the ones who just get in,
roll up their sleeves, and do the hack of the day, the capture the flag. There's a couple of capture the flags going on even this weekend that some of you may be taking part in. That's because bugs lead to vulnerabilities, which ultimately lead to exploits, that is, when you use a vulnerability against the software. This is all about finding exceptions, finding out where things go wrong. While I was actually working on this slide deck, I did indeed see Microsoft PowerPoint crash. PowerPoint crashed on me. There are a lot of slides, a lot of megabytes, and I was flipping through the slides quickly. What I thought as a security researcher was: aha, I have a crash. I didn't have the time to do it, but I said, we could open this up in a debugger, find the cause of the crash, and perhaps it's exploitable. Now, PowerPoint is used to preview files that come over via email; it's used to open files, often from unknown sources or downloaded off the Internet, even previewing files in a web browser. So it's not impossible that the crash I saw while I was working on this very slide deck was a vulnerability that could lead to some exploitable condition. Without further research, we don't know. So I told you that to understand software vulnerabilities we need to understand the types of things that can go wrong. Let me just get you thinking in what we call the security mindset. Let's talk about a few types of things that can go wrong. This is by no means an exhaustive list. There's something out there called the National Vulnerability Database, the NVD. It's good; it tries to taxonomize vulnerabilities and put them into something understandable, and that's hard to do right, because all these vulnerabilities, again, are weird exceptions. Memory
mismanagement is at the top of everyone's list, right? This is kind of exploitation 101, and that's what we're going to talk about more in the next few slides. This is when software does not properly manage its memory. I'll go into that in more detail. I put this list together just based on some of my pet peeves, by the way. The misuse of cryptography or security protocols: if you've not taken a cryptography class, you should. What you'll learn in a cryptography class is that cryptography is really, really hard to do right. Computer security in general is hard to do right, but especially cryptography. When we look at the big actors in the cryptography world, the NSA, the English, the Chinese, the ones who study this kind of science in great detail, no one really, we think, breaks the underlying cryptographic primitives. Even the schemes, when they're done right, are very hard to break; we can make schemes that are provably correct. Everyone goes after the implementation. Yet it's still easy to misuse cryptography, especially for developers who decide that they can do it better than the libraries that come with the operating system or with the programming runtime. It's easy to mess up the way that cryptography is done, and with protocols it's easy to mess up the way the protocols are done. Next, failure to validate a certificate is a common source of weaknesses in software and devices. Misconfiguration happens all the time in software: people leave on the default credentials, people configure their web server in ways that are insecure. Again, hard to do right. Leaving on debugging features: this is why I put this Easter egg here. I've been in the software development world for twenty-some years now,
and software developers love to expose debugging features and leave debugging features on, because hey, we may use them someday, or it makes it easier to develop. Well, it also increases your attack surface, right? Easter eggs are sometimes what we call undocumented features in software. Undocumented features are bad in secure software. They can be fun, but secure software should not only do what it's supposed to do, it should also not do anything that it's not supposed to do. Otherwise we open ourselves to vulnerabilities. Race conditions: this is a particular condition where multiple processes or multiple threads, or in the case of the cloud maybe even multiple systems, are trying to share the same object and they don't quite get it right, so something happens in the sharing of the object that puts it into a state that's exploitable.

I want to talk more about memory mismanagement, and to do this we need to dig a little bit into the way that process memory is managed. Again, a process is like an application running on a computer. This diagram is notional, so those of you who really understand the loaders and architectures of computers will know that the layout here is not exactly what process memory looks like. But this gives you an idea. When you start a process (on Windows it's an executable file; on Linux it's also an executable file, just of a different format), the operating system loads that executable into memory and gives it, on modern systems, a virtual address space. So it's not real physical memory; it's virtual memory that the process can then access in order to do things like store process data. But notice also that the program instructions are mixed in, typically, within the same virtual address space as the data that the process manipulates. We can also have other program data, string tables, and kernel data. As a matter of fact, space is reserved for the kernel to work in, and all of this address space is intermixed. Now, modern operating systems give us some protection on the way that different parts of this address space can be used, but there are still tricks that we can use to subvert some of those protections. The takeaway is that the program instructions, the program data, and the data that comes in from the user are all mixed in the program's virtual address space. In particular, for buffer overflows 101, we'll concentrate on buffer overflows that happen on the stack. The stack is a portion of memory that's reserved for local variables. It's done in a way where you can think of stacking books. As function calls happen, the function puts on the stack the data that it needs locally, so the local parameters and variables for that particular function. If it calls another function, it stacks upon that the next set of variables for the next function that's been called. It also stacks, and this is very important, control flow data, specifically things like the return address. This means that when a function pushes data onto the stack, it mixes the user data that you see up here, the variables, with the return address data. The traditional buffer overflow on the stack works like this: if we can write data on the stack that somehow goes beyond the memory allocated for that particular data, we (I say we, but we're talking about the attacker in this case) may be able to write data that overwrites program control flow data, thus giving the attacker control of the process in question.
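The mixing of local data and control data described above can be sketched in C without relying on any real stack layout. This is a toy model under stated assumptions, not how a compiler actually arranges a frame: `struct toy_frame`, `unchecked_copy`, and `return_address_clobbered` are hypothetical names, and the "return address" is just an integer field placed after the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Hypothetical stand-in for one stack frame: a local buffer followed by
 * control data. Real frames are laid out by the compiler and ABI. */
struct toy_frame {
    unsigned char buf[BUF_SIZE];  /* local variable filled from user input */
    uintptr_t saved_return;       /* stand-in for the saved return address */
};

/* Copy n input bytes into the frame, starting at buf, with no check
 * against BUF_SIZE. n is clamped to the size of the whole toy frame so the
 * demo itself never writes outside the struct. */
void unchecked_copy(struct toy_frame *f, const unsigned char *input, size_t n)
{
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy((unsigned char *)f, input, n);  /* buf is the first member */
}

/* Returns 1 if writing n filler bytes changed the saved return address. */
int return_address_clobbered(size_t n)
{
    unsigned char input[sizeof(struct toy_frame)];
    struct toy_frame f;

    memset(&f, 0, sizeof f);
    f.saved_return = (uintptr_t)0x1000;   /* pretend legitimate address */
    memset(input, 0x41, sizeof input);    /* 0x41 is ASCII 'A' */

    unchecked_copy(&f, input, n);
    return f.saved_return != (uintptr_t)0x1000;
}
```

Calling `return_address_clobbered(BUF_SIZE)` leaves the control field alone, while any larger length starts to overwrite it, which is the essence of the stack overflow the talk goes on to demonstrate.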
So in 1988 I was in Mrs. Allbyrne's computer science class. This was in middle school, and Mrs. Allbyrne posted on the board a newspaper clipping that talked about an attack that had happened on the Internet. Now, I had never heard of the Internet before. In 1988, for me, computers were my TI-99 at home and the Apple IIs that we had in the computer labs at school. I didn't realize that there were so many computers being connected across the world. But that newspaper clipping talked about a worm that spread between computers across the United States, across the world really, and that had propagated itself. I don't know if they used the term worm at the time, but that's what we call it now; we call it the Morris worm. The Morris worm is a great example of how software security can go wrong. This is code from fingerd. This is the finger daemon, which in 1988 typically ran on Unix systems with root privilege. There was very little protection, and it was one of the vectors by which the Morris worm spread. Finger, for those who haven't used Unix in that capacity before, allows one Unix system to get information about the users on another Unix system. So it's listening on a network socket, running with root privileges, at least in this time era, and here's the implementation of it. You'll notice a few things. The program fingerd allocates the variable line on the stack with a fixed length of 512 bytes. But here, in the gets call, the function reads from standard input into line. gets has no way to check the length of the data that it's receiving. It just reads; it will populate the buffer and
possibly write past the end of the allocated buffer. This allows us to do what we talked about on the previous slide: it allows us to overflow the buffer, possibly overwrite control flow data like the return address, and then take control of the program. So in the course of preparing to present this, I decided that I was going to go back in time and write an exploit for fingerd. I wanted to see how it may have worked, and I want to give it as a demonstration of how an actual exploit works and how an actual exploit looks. So this is a binary editor; it just shows where I manually created and edited the binary data that I was going to send to fingerd to exploit the system. I want to call out a few things in this exploit. First of all, most of it is just capital A's, right? There's a bunch of 41s. Most of it is just unused space filler, stuff that I don't need to be concerned about. But the important parts are here. At the beginning of the exploit we see something we call shellcode. Shellcode is data that, when disassembled, corresponds to code that actually executes on a CPU. It's machine language. So here's the corresponding disassembly of that shellcode. What you'll see here, for those who are familiar with computer architectures, is that this is setting up the stack for a function call, and then it's calling this exec function. Down here, after enough padding, we actually have the address of the shellcode, and that's what's going to overwrite the return address on the stack. That's what allows us to exploit the finger daemon. I wrote this, and it took me probably a few hours to get it right. There's a little bit of back and forth, a little bit of trial and error, when you're writing this.
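The fingerd source shown on the slide is not reproduced in this transcript, so the following is only a sketch of the pattern described: a fixed 512-byte buffer that was filled with `gets`, and the bounded alternative using the standard `fgets`. The names `read_request` and `demo_oversized_request` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define LINE_CAP 512  /* the fixed buffer size mentioned in the talk */

/* Hypothetical helper (not taken from fingerd): read one request line into
 * `line`, never storing more than `cap` bytes including the terminator.
 * Returns the number of characters kept, or -1 on EOF or error.
 * The unsafe pattern described in the talk was, in effect:
 *     char line[512];
 *     gets(line);    // no length argument, so no way to stop at 512
 */
long read_request(FILE *in, char *line, size_t cap)
{
    if (cap == 0 || cap > INT_MAX)
        return -1;
    if (fgets(line, (int)cap, in) == NULL)
        return -1;
    line[strcspn(line, "\r\n")] = '\0';  /* drop the line ending, if any */
    return (long)strlen(line);
}

/* Feed a request of request_len 'A' characters through the bounded reader
 * and report how many characters were actually stored. */
long demo_oversized_request(size_t request_len)
{
    char line[LINE_CAP];
    FILE *in = tmpfile();
    long stored;
    size_t i;

    if (in == NULL)
        return -1;
    for (i = 0; i < request_len; i++)
        fputc('A', in);
    fputc('\n', in);
    rewind(in);

    stored = read_request(in, line, sizeof line);
    fclose(in);
    return stored;
}
```

With the bounded read, a 2000-byte request is truncated to what fits in the buffer instead of running over the data that follows it on the stack. `gets` itself was removed from the C standard in C11 for exactly this reason.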
So we'll talk about performing the exploit. This disassembly is the disassembly from the finger daemon. Now, I compiled it on Linux, not on the systems it was originally written on, and I turned off a few things in the compiler, some optimizations. I made my job a little bit easier by changing the way that the compiler and the operating system treat programs, turning off things that are supposed to prevent an attacker from being able to write an exploit in just a few hours. That doesn't mean the attacker can't write an exploit; it just means that there are some advanced techniques involved. So here's the corresponding disassembly. Here's the return. This instruction sets the instruction pointer based on the return address that is on the stack, so that the function can clean up its part of the stack and go back to the function that called it. But my exploit has written over it and put its own values on the stack. So you can see here I've run the exploit, and if you do everything carefully enough, you can run the exploit without anyone ever being any the wiser. Notice, first of all, that I ran the exploit in GDB. That's because that's where I developed the exploit; it would have taken a little extra work to write it so that it would be stable in a production environment. But the point is that the finger daemon exited normally. I could have pulled this off in the blink of an eye: over the network the traffic would have come in, it would have performed the exploit, and the process would have exited normally. Yet what my particular exploit did was install a backdoor on the system. It allowed me then to log into the system using a passwordless account that I created. That's buffer overflows 101.
That's going back to 1988, and it's still largely done that way. I talked about some of the protections in modern operating systems. Let me go a little bit deeper than the first time I presented this, because I do know that we have a lot of people in the audience who are technically savvy. How do we prevent an attacker from doing this kind of thing? Well, nowadays we have what's called ASLR, address space layout randomization. This makes it so that when the loader loads a program into memory, it doesn't necessarily put all of the code at the same spot every time it loads it. It randomizes the layout to some extent, so that it's harder for an attacker to figure out where the code he may need for the exploit is. Another thing we do: notice that my shellcode was on the stack; I just pushed it straight in with the exploit. Modern stacks have a bit set in the memory pages called the no-execute bit. This prevents the shellcode from running on the stack, and it means the attacker has to find another place to put the code before the CPU will run it, or has to find another technique, like return-oriented programming, to exploit the stack. And then the last thing that we sometimes do is put something called a canary on the stack. A stack canary is a particular number such that, if it gets overwritten, you know that the stack has been corrupted. Then the runtime or the operating system can detect the stack corruption and shut down the program before something bad can happen.

So a few years ago, in 2014, the College of Computing needed a sucker to teach the undergraduate introduction to information security class. Naturally they called me, and naturally I accepted. I was in the unit where I was talking about TLS, Transport Layer Security. This is the protocol that pretty much secures the Internet, in a sense. At the end of my TLS unit I venerated OpenSSL.
I said, use OpenSSL if you're going to use TLS. I said it's been well inspected by the security community, it uses good programming techniques, I think it's the way to go, the secure way to go. And then, not two weeks later, Heartbleed hit. Heartbleed was probably the most impactful, or potentially impactful, vulnerability that we've ever seen in modern times, right? Bruce Schneier, the security researcher, said that on a scale of one to ten, Heartbleed merits an eleven. Security researchers were, no joke, saying don't use the Internet for anything that you care about, no banking, nothing but cat pictures, until such time as the dust settles and Heartbleed is patched. And then, once you do start using the Internet again, change all of your passwords; just assume that everything was compromised. So this XKCD cartoon did a good job explaining the Heartbleed bug. I know that it's too small for you to read, so let me just explain it to you. Heartbleed works like this. The client, typically your web browser, connects to a server. During the course of that connection, the TLS session, your client can send to the server something called a heartbeat request, hence the name Heartbleed. The client says: server, I just want to check if you're still there. If you are, send me back three bytes, and in those three bytes put the word foo. Or you can say send me back five bytes and put some other contents in them, whichever. And so the server should reply to your heartbeat request and let you know that it's still there, and you can check that the integrity of the connection still looks good from your standpoint. Heartbleed worked like this: you could send to the server a heartbeat
request where you say: hey server, are you still there? If so, respond with the word foo, and by the way, foo is sixty-five thousand bytes long. And the server would then send back a buffer of sixty-five thousand bytes that started with the word foo, but after that could be any data that the server happened to be processing on the heap at the time. The way that Heartbleed works, we can understand it pretty easily: it was simply a failure to mediate and correctly check the data that the user was sending. So we can see here the heart of the OpenSSL code before Heartbleed was patched. The length of the payload of the heartbeat request, that is, the data that it's supposed to send back, is stored in this variable payload. That's the length of the payload. OpenSSL then allocates a buffer using payload plus the necessary padding for the heartbeat response. It allocates it beautifully, just as much as it needs. There's no problem with allocation here; everything is coded correctly. And then it copies from the buffer where the user's original data was stored: it copies payload bytes of data into the response buffer and sends that response back to the client. It's obvious when you think about it, though, that if payload is larger than the request actually was, it's going to send sixty-five thousand bytes of data off the heap. And here's the neat thing about Heartbleed: not only can you do it once, you can do it over and over and over again with the same TLS connection, with very, very little chance of crashing the server. It's possible that the copy could read into a page of memory that would give you a segmentation fault, but it's very unlikely. Whatever the server happened to be thinking about on the heap, maybe from different threads of the server servicing different users, it could be private keys, it could be passwords.
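The missing check described above can be sketched as follows. This is not the OpenSSL code and not the real TLS record layout: the two-byte length prefix, the function name `heartbeat_reply`, and its parameters are all assumptions made to keep the example small. The point is only that the length the client claims has to be compared against the number of bytes that actually arrived before anything is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified heartbeat record (an assumption for illustration only):
 *   bytes 0..1  claimed payload length, big-endian
 *   bytes 2..   payload
 *
 * Writes the echoed payload to `out` and returns its length, or returns 0
 * if the request is rejected. */
size_t heartbeat_reply(const uint8_t *record, size_t record_len,
                       uint8_t *out, size_t out_cap)
{
    size_t claimed;

    if (record_len < 2)
        return 0;
    claimed = ((size_t)record[0] << 8) | record[1];

    /* The check whose absence the talk describes: the claimed length must
     * fit inside the bytes that were really received. Without it, the copy
     * below would read whatever sits after the request in memory. */
    if (claimed > record_len - 2)
        return 0;
    if (claimed > out_cap)
        return 0;

    memcpy(out, record + 2, claimed);
    return claimed;
}
```

An honest three-byte request is echoed back, while a request that carries three bytes but claims 65,535 is dropped instead of being answered with neighbouring heap contents.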
It could be anything, and it would be sent back to you, and you could do it 64K at a time. In a code review (and for those who are students, when you start doing development you're going to start reviewing one another's code, right, and looking it over for problems) this is very unlikely to be caught, unless you just happen to notice that, hey, this payload is tainted: it's a length specified by the user.

One more example of a vulnerability that I want to talk about. Some of you may know one of the researchers on my team; he crafted this vulnerability, and it's really clever. It's very, very subtle. Again, it's another vulnerability that is very unlikely to be caught in a code review. And because we have so many technical people in the room, I want to spend a minute here. Is it too faint to read the code? OK. I want to put it out there and see who in the room may be able to see the vulnerability. And so while the world comes crashing down, I'll give you some hints. First of all, we're going to read some data and process it. We read in a number, and here we ensure that we won't overflow the buffer. Looks good. If I were doing this in a code review, I would look over this and say, yeah, everything is being checked just right. We're even checking to make sure that the fopen worked correctly, and we check to make sure that we're not going to overwrite the buffer. This is well written. But where's the problem? Anyone? That's OK. The buffer that we're reading into is an unsigned char, so look at the declaration of num bytes. Here's an exploit I wrote. This is just a denial of service; I didn't do anything clever. I just wrote this exploit so that it would crash the program. Why does this crash the program? Well, in little-endian format, on the architecture I used, the first four bytes in this case are going to be read into num bytes.
And in little-endian, when you have a signed integer, what does this give you? The high byte starts with an 8: you have a negative number. So in this case num_bytes is going to be interpreted as a negative number. It's certainly not going to be greater than max_bytes, so this check here on the buffer size will pass, and the program will go right on reading data that the user specifies into the buffer. But here num_bytes will be implicitly converted to an unsigned type, and it's just going to keep on reading, overwriting the buffer, writing as much as you care to write past the end of the buffer. The code is absolutely, completely broken. You see how hard it is to catch these kinds of software vulnerabilities. That's why that CVE database keeps growing.
So I want to transition at about this point to the portion of the talk where I start to talk about what we can do about it. We'll talk a little bit about the projects that I have going on and what we've done about it, and I calculate I have, what, ten minutes left.
By the way, here's where I ran it. Like I said, I just got a seg fault. I could have exploited it, but I just have so much time in the day, right? Notice I compiled it with -Wall, all warnings turned on, and still the compiler didn't report any warnings, didn't give me any trouble. A code review wouldn't have caught it, and it's completely broken.
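The arithmetic behind the signed-length bug can be checked directly. The sketch below is in Python rather than the C from the slide, and several details are assumptions: the exploit's actual bytes, the value of MAX_BYTES, and a 64-bit size_t are all hypothetical stand-ins.

```python
import struct

MAX_BYTES = 256    # hypothetical buffer size; the slide's value is not known
SIZE_T_BITS = 64   # assumption: size_t is 64 bits on the target


def parse_length(header: bytes) -> int:
    """Read the first four bytes as a little-endian signed 32-bit integer."""
    (num_bytes,) = struct.unpack("<i", header[:4])
    return num_bytes


def passes_bounds_check(num_bytes: int) -> bool:
    """Mirror of a check that rejects only when num_bytes > MAX_BYTES."""
    return not (num_bytes > MAX_BYTES)


def as_size_t(num_bytes: int) -> int:
    """What the read call sees after implicit conversion to unsigned."""
    return num_bytes % (1 << SIZE_T_BITS)


# Hypothetical exploit header: the most significant byte (last, in
# little-endian order) starts with an 8, so the sign bit is set.
header = bytes([0x00, 0x00, 0x00, 0x80])

num_bytes = parse_length(header)                # a large negative number
check_passed = passes_bounds_check(num_bytes)   # negative is never > MAX_BYTES
read_count = as_size_t(num_bytes)               # enormous once unsigned
```

The same value is "small" for the comparison and "huge" for the read, which is why the check looks right in review and still protects nothing.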
But there is a technique that likely would have caught it. It's called fuzz testing, and that's the first technique I want to talk about. This is something my team has made a name for ourselves in. Fuzz testing is well understood; it's been happening for two or three decades, and in the last ten years or so we've been seeing more and more intelligent approaches to fuzzing. My team's expertise, by the way, just to give you a preview of coming attractions: we've made a name for ourselves fuzzing embedded systems, which is a little more difficult. Fuzz testing is this. We take a program that we want to analyze, or an operating system, or sometimes it could be a protocol. Let's just think of it in terms of a program; let's think of that particular program we looked at a minute ago, which reads user data from a file. We start with a set of seed inputs. The seed inputs could be inputs that we know are supposed to work with the program, they could be seeds that have crashed the program in the past and we want to do some regression testing, or they could be seeds that we suspect may be problematic. We just have to generate a set of seed inputs. The idea behind fuzz testing is then to run the program repeatedly, mutating those seed inputs according to a set of rules that are well understood.
We flip bits, we change bytes, and we continue mutating until we find an input that causes the program to do something different. Now, I've got a technical audience here, so I can tell you a little bit about what we mean by "do something different": that is, follow a different execution path. It could be something that causes the program to crash. We really want to search and explore all of the paths that could happen in the program. Notice that to do this, while the program is being fuzz tested we have to somehow be able to examine the state of the program. If we have the source code, we could instrument the source code, but sometimes it's easier to instrument the binary, that is, to put instructions in the binary that allow us to hook into it almost like a debugger. Or we can hook it up to a debugger. Or, in the case of embedded systems, where we might be running in emulation, sometimes the hypervisor will let you inspect the process state or the operating system state and know what's going on. That's a little more technical than I've given when I've given this lecture in the past. So we have to understand what's going on with the program, and that can be the tricky part.
Once we find an input that does something different, we've explored a new path of the program. This is good. We add that input to the pool of seed inputs, and we continue to explore and mutate that input, looking to dig further into deeper paths. If we find an input that causes a crash, then that's good. We like that, because that means here's something we can report and remediate. We'll also add that to the set of seed inputs and start mutating that input. We've taken this input space, which could be infinitely large depending on the program, and we've figured out a way to search that input space in a way that is meaningful, so that we can explore the paths of the program and look for inputs that cause unwanted program behaviors, that is, bugs that lead to vulnerabilities that lead to exploits. That's the basic idea behind fuzz testing.
For an example, I wrote this vulnerable program. It's pathological in some sense. It has unmediated user input. It has a stack buffer overflow; a code review hopefully would catch this, right? I mean, it's pretty obvious what's going on. It has an unhandled floating-point exception. Those usually are not exploitable, but it could crash the program, which could be bad enough, or you might be able to use the unwanted state to, say, crash a thread and then exploit a race condition; it could be part of a chain of attacks. And we have a user-controlled format string.
So, to show it off, I ran Word Hog, our fuzz tester, on it. It found that this long input crashed the program. It found that %n crashes the program. For those of you who speak C, you'll remember that %n writes to a place in memory how many characters have been printed so far, right? So this format string allows the user to control process memory to some extent. And it found this floating-point exception, not with the input I had in mind but with another one that it just somehow got to sooner. So here are the crashes that we found.
Would fuzz testing have caught the subtle vulnerability? Absolutely. That's a great case for fuzz testing, because certainly it would have flipped that bit and found that a negative number crashed the program; that would have happened soon. The first one, fingerd, it certainly would have caught; that's just a simple read from standard I/O. If you're sending it data, a fuzzer would have quickly found that long inputs in general crash the program. Heartbleed is a little bit different. With Heartbleed it depends on how you instrument the program. If you had done protocol fuzzing on Heartbleed, you likely would have caught it that way, but not necessarily with just the kind of fuzzing that I've talked about so far.
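The loop described above (mutate, watch which path the program takes, keep anything that reaches a new path or a crash) can be sketched as follows. This is a toy in Python, not the team's tool: the target program, its path labels, the seed, and MAX_BYTES are hypothetical, and the "instrumentation" is simply the target returning the path it took. It uses deterministic walking bit flips so the run is repeatable.

```python
import struct

MAX_BYTES = 16  # hypothetical buffer size for the toy target


def target(data: bytes):
    """Toy parser: returns the path it took, raises to simulate a crash."""
    if len(data) < 4:
        return ("short",)
    (num_bytes,) = struct.unpack("<i", data[:4])  # signed length field
    if num_bytes > MAX_BYTES:
        return ("header", "rejected")
    if num_bytes < 0:
        raise MemoryError("negative length passed the bounds check")
    return ("header", "accepted")


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def run(program, data):
    """Observe one execution: the path taken, or a crash signature."""
    try:
        return program(data)
    except Exception as exc:
        return ("crash", type(exc).__name__)


def fuzz(program, seeds, max_execs=10_000):
    """Walk every single-bit mutation of each queued input."""
    queue = list(seeds)
    seen = {run(program, seed) for seed in queue}
    crashes = {}
    execs = 0
    position = 0
    while position < len(queue) and execs < max_execs:
        parent = queue[position]
        position += 1
        for bit in range(len(parent) * 8):
            child = flip_bit(parent, bit)
            observed = run(program, child)
            execs += 1
            if observed in seen:
                continue
            # New behavior: remember it and mutate this input later too.
            seen.add(observed)
            queue.append(child)
            if observed[0] == "crash":
                crashes[observed] = child
    return crashes, seen


crashes, paths = fuzz(target, [b"\x04\x00\x00\x00abcd"])
```

Starting from one well-formed seed, the sign-bit flip is reached within the first few dozen executions, which matches the claim that a negative length would be found quickly.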
With plain fuzzing like that, it's less likely you would have found Heartbleed. But we can get smarter than just regular fuzz testing. I've talked about how we're searching the input space of the program for inputs. Well, for those who have had theory, we know that exploring the number of paths a program can follow is not necessarily a tractable problem. If you want the pathological case, think of a program where, to get to a certain path, there's a hash function, and the input data has to match a certain hash in order to get to that path of the program. Malware sometimes does this kind of thing to make sure that it's running in the environment it's interested in running in. Obviously, with a hash function, if it's done right, we don't know what preimages will get us to a certain point in the program, so that's not tractable. But sometimes we can do analysis of a program and get smarter than just blind fuzzing. If you look here, I've done this strange set of input checks where I've chained together a password. We can take a program and do what we call symbolic analysis of the program. That is where we take the binary, we lift the binary into an intermediate representation that we can then solve using mathematical techniques, like just the simple set of predicates that you see here, and we can solve for which inputs are going to lead us down a certain program path.
What the program analysis community has found (there was a study done, so forgive me, these numbers are not exact) is that for efforts on the order of the Cyber Grand Challenge binaries, where there are hundreds of vulnerabilities, code reviews will typically find five to ten percent of the vulnerabilities; they're that hard to find. Random fuzzing and symbolic execution will find a much greater percentage of the vulnerabilities.
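The idea of solving path predicates for a chained password check can be illustrated with a very small toy. This is a sketch only: real tools lift a binary to an intermediate representation and hand the predicates to a constraint solver, whereas this Python stand-in handles nothing but byte-equality checks. The password "open", the function names, and the SymByte class are all hypothetical.

```python
class SymByte:
    """Symbolic stand-in for one unknown input byte."""

    def __init__(self, index, predicates):
        self.index = index
        self.predicates = predicates

    def __eq__(self, value):
        # Record the predicate "input[index] == value" and follow the
        # branch where it holds, so execution continues down the deep path.
        self.predicates.append((self.index, value))
        return True


def check_password(data):
    """Chained comparisons: each byte must match to go one level deeper."""
    if data[0] == ord("o"):
        if data[1] == ord("p"):
            if data[2] == ord("e"):
                if data[3] == ord("n"):
                    return "deep path"
    return "rejected"


def solve_for_deep_path(program, length):
    """Run the program on symbolic bytes, then satisfy what it asked for."""
    predicates = []
    symbolic_input = [SymByte(i, predicates) for i in range(length)]
    program(symbolic_input)
    solution = bytearray(length)
    for index, value in predicates:
        solution[index] = value
    return bytes(solution)


solved = solve_for_deep_path(check_password, 4)
result = check_password(solved)  # concrete run with the solved input
```

Guessing four bytes blindly means searching up to 256**4 inputs, while collecting the four predicates and satisfying them takes one symbolic run. A hash comparison in place of the byte checks would leave the solver with the preimage problem described above.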
Fuzzing, not random fuzzing but intelligent fuzzing with inspection of the program state, can find even more vulnerabilities. The best thing, though, that we've got so far for finding vulnerabilities in a program are techniques that are typically called concolic execution. We combine dynamic fuzzing with symbolic execution: when you have a hard-to-solve code path, we stop, we analyze that particular code path, we find how we can break through it, and then we send that to the fuzzer so the fuzzer can continue on its way. In some of the Cyber Grand Challenge binaries, this can find as many as half of the vulnerabilities. Right, half. We're still not there; they're still very hard to find.
So I'll talk a little bit about what we do at GTRI, this fuzzing technique that I have expertise in. Enterprise fuzzing is pretty well understood. Some of you have probably used AFL before; Microsoft has a fuzzing platform; Google has fuzzers out there. It's a well-understood technique. So Windows, Linux, even iOS: we've done these, and Windows and Linux again are fuzzed commonly. My team in particular has done VxWorks and embedded systems. You may wonder why there's a picture of a cat there. If I had been giving this presentation to a sponsor of GTRI, you'd see an image of the particular kinds of systems that we fuzz. Again, it's a cop-out: I can't tell you exactly what we're fuzzing, so I hope you can enjoy this picture of my daughter's cat in lieu of the image. But use your imagination: think about things that drive critical systems, especially ones the government may be interested in. That's where they want to find vulnerabilities, and that's where GTRI's expertise in particular lies.
So I'm just about on time. I'm not going through this entire top ten list, which, by the way, I just scraped off of Carnegie Mellon's Software Engineering Institute website. Let me call out a few things that are important if you're going to write secure software. Use effective QA techniques. Don't just check that your software works. This is for the students, and we love you students, but student software is software that is made to do what it's supposed to do, and then you're done, right? You submit it, you get the grade. Pound on your software. Use fuzzing. Get past just the regular tests: validate assumptions, test for failure conditions, expect failures to happen. Minimize the attack surface. Don't be clever; keep it minimal. Separate your modules into testable functionality. And lastly, this is the biggest one, number one: don't trust anyone. Internally in your program, examine your program state, especially when you receive data from outside. Validate that data. Make sure that the chain of data is sanitized completely.
[inaudible] So I think that we're at the end of the hour. This is good, because I don't have to answer any hard questions, but I'd be glad to stick around and say hello. Is that OK? I know that you have other classes, so if you've got to go, please go; I won't be offended. Any questions?
[Audience question, inaudible.]
So let me change your question a little bit. His question was about code coverage. We're not so much interested in code coverage here as we are in path coverage. Even if we're able to test one hundred percent of the code, we're really more interested in how we get to the paths that we reach. On Linux, my understanding is that we do it through binary instrumentation. When a piece of malware comes into our analysis platform and we want to do some analysis on that malware, we inject commands into it; I think they're something like int 3 instructions, or something like that, so that we can inspect path coverage and inspect the state. We can break, we can look at the path the code has taken, and we can find out if we've hit that path before.
[Audience question, inaudible.]
We do basic blocks, so it's basic block coverage: the path analysis uses basic blocks. And like I said, my understanding is that we instrument the program binary with instructions so that when a new basic block is found, we get information back to our analysis tool that tells us we've hit a new path. OK. Thank you.