    Urban 3D scene understanding from images

    KUNDU-DISSERTATION-2018.pdf (27.35Mb)
    Date
    2018-01-22
    Author
    Kundu, Abhijit
    Abstract
    Human vision is remarkably good at obtaining a structured representation of complex dynamic scenes, including the spatial scene layout, the re-organization of the scene into its constituent objects, and the support of each object. We also perceive the complete extent of the scene, even parts that are occluded. For example, even when the part of the scene directly below a car is not visible, we infer that it is part of the road. This kind of structured and complete 3D scene understanding is very useful for applications such as autonomous driving. Our objective is to build a 3D scene representation of complex, real-world urban scenes from images alone, much like the capabilities of human vision. The classic top-down "analysis-by-synthesis" approach offers an elegant account of such richness in human vision, but it is computationally expensive, and the resulting energy landscape is highly multi-modal and thus difficult to optimize. Combining top-down analysis with fast, discriminatively trained bottom-up predictors promises to solve this problem. However, even recent versions of this hybrid approach are still restricted to toy problems. We revisit the analysis-by-synthesis approach for complex real-world 3D scene understanding in light of advances in deep-learning methods and the availability of large-scale training data in the form of annotated images and 3D CAD models. In this thesis, we explore three different scene understanding frameworks with increasing richness in representation. The presented frameworks reason jointly about the scene structure and its semantic labels, along with the 3D orientation and position of object instances over time. We also demonstrate seamless integration of different constraints and prior knowledge into our model, and an effective fusion of measurements from multiple images in a video into a final representation of the scene. We evaluate these scene understanding frameworks on challenging real-world datasets of complex urban scenes.
    URI
    http://hdl.handle.net/1853/61114
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23877]
    • School of Interactive Computing Theses and Dissertations [144]
