    Automatic Feature Engineering: Learning to Detect Malware by Mining the Scientific Literature

    View/Open
    dumitras.mp4 (523.3 MB)
    dumitras_videostream.html (985 bytes)
    transcription.txt (48.52 KB)
    Date
    2017-09-29
    Author
    Dumitras, Tudor
    Abstract
    The detection of malware and network attacks increasingly relies on machine learning techniques, which use multiple features to separate malicious from benign behaviors. The effectiveness of these techniques depends primarily on the feature engineering process, which is based on human knowledge and intuition. However, given adversaries’ efforts to evade detection and the growing volume of security reports and publications, human-driven feature engineering likely draws on only a fraction of the relevant knowledge. In this talk, I will present an approach to engineering such features automatically by mining natural language documents such as research papers, industry reports, and hacker forums. We use techniques inspired by IBM’s Watson question answering system, and we address challenges and opportunities specific to the security domain. As a proof of concept, we train a classifier with automatically engineered features for detecting Android malware and achieve performance comparable to that of a state-of-the-art malware detector that uses manually engineered features. In addition, our techniques can suggest informative features that are absent from the manually engineered set, and they can link the generated features to human-understandable concepts that describe malware behaviors. Finally, I will discuss the remaining challenges in automatically extracting semantic security insights from natural language and the opportunities that this direction opens for understanding and predicting adversary behaviors.
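
    The idea of mining text for candidate features can be illustrated with a toy sketch. This is not the system presented in the talk: the abstract does not describe the pipeline in detail, so the co-occurrence heuristic, the function names (`split_sentences`, `rank_candidate_features`, `to_feature_vector`), and the sample sentences are all assumptions made up for illustration. The sketch scores known Android permission names by how often they appear in the same sentence as a malware-related term, then turns the top-ranked names into a binary feature vector for an app.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def rank_candidate_features(documents, vocabulary, behavior_terms):
    """Score each vocabulary term by sentence-level co-occurrence
    with any behavior term (a hypothetical heuristic, not the talk's method)."""
    vocab = {v.lower(): v for v in vocabulary}
    behaviors = {b.lower() for b in behavior_terms}
    scores = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            tokens = set(re.findall(r"[a-z_]+", sentence.lower()))
            if tokens & behaviors:
                for tok in tokens.intersection(vocab):
                    scores[vocab[tok]] += 1
    # Sort by descending score, then by name, for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def to_feature_vector(app_permissions, features):
    """Binary vector: 1 if the app requests the permission, else 0."""
    return [1 if f in app_permissions else 0 for f in features]


# Invented sample text standing in for papers, reports, or forum posts.
documents = [
    "The trojan abuses SEND_SMS to subscribe victims to premium services. "
    "It also requests READ_CONTACTS.",
    "This malware family uses SEND_SMS and READ_PHONE_STATE to leak the device identifier.",
    "Benign apps often request INTERNET for advertising.",
]
vocabulary = ["SEND_SMS", "READ_CONTACTS", "READ_PHONE_STATE", "INTERNET"]
behavior_terms = ["trojan", "malware", "leak"]

ranked = rank_candidate_features(documents, vocabulary, behavior_terms)
features = [name for name, _ in ranked]
vector = to_feature_vector({"SEND_SMS", "INTERNET"}, features)
print(ranked)
print(vector)
```

    Vectors built this way could be fed to any standard classifier. A realistic system would need far more than keyword matching, for example handling paraphrase, negation, and terms that never appear in a fixed vocabulary.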
    URI
    http://hdl.handle.net/1853/58827
    Collections
    • Institute for Information Security & Privacy Cybersecurity Lecture Series [118]

    Georgia Tech

    © Georgia Institute of Technology
