    Structured visual understanding, generation and reasoning

    File
    YANG-DISSERTATION-2020.pdf (42.87Mb)
    Date
    2020-01-13
    Author
    Yang, Jianwei
    Abstract
    The world around us is highly structured. A single object usually consists of multiple components organized in some structure (e.g., a person has different body parts), and multiple objects usually coexist in a scene and interact with each other in predictable ways (e.g., a man playing basketball). This structure manifests itself in the visual data that captures the world and in the text describing it, and thus can provide a strong inductive bias for various vision tasks. In this thesis, we focus on exploiting the structure in visual data to improve visual understanding, generation and reasoning. Specifically, for visual understanding, we model structure at different levels to improve image classification, scene graph generation and representation learning. For visual generation, we exploit the foreground-background structure of images and generate them layer by layer to reduce blending artifacts between foreground and background. Finally, we use structured visual representations as an intermediate interface that bridges visual perception and reasoning to address different vision and language tasks, including image captioning and visual question generation. Through extensive experiments, we demonstrate that leveraging structure in visual data not only improves model performance but also makes vision and language models more grounded and interpretable.
    URI
    http://hdl.handle.net/1853/62744
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23878]
    • School of Interactive Computing Theses and Dissertations [144]

    © 2020 Georgia Institute of Technology