
    Learning embodied models of actions from first person video

    File: LI-DISSERTATION-2017.pdf (23.27 MB)
    Date: 2017-08-28
    Author: Li, Yin
    Abstract
    Advances in sensor miniaturization, low-power computing, and battery life have enabled the first generation of mainstream wearable cameras. Millions of hours of video are captured by these devices every year, creating a record of our daily visual experiences at an unprecedented scale. This has created a major opportunity to develop new capabilities and products based on computer vision. Meanwhile, computer vision is at a tipping point. Major progress has been made over the last few years in both visual recognition and 3D reconstruction. The stage is set for a grand challenge that can break our field away from narrowly focused benchmarks in favor of “in the wild”, long-term, open-world problems in visual analytics and embedded sensing. My dissertation focuses on the automatic analysis of visual data captured from wearable cameras, known as First Person Vision (FPV). My goal is to develop novel embodied representations for first person activity recognition. More specifically, I propose to leverage first person visual cues, including body motion, hand locations, and egocentric gaze, for understanding the camera wearer's attention and actions. These cues are naturally "embodied," as they derive from the purposive body movements of the person and capture the concept of action within its context. To this end, I have investigated three important aspects of first person actions. First, I led the effort to develop a new FPV dataset of meal preparation tasks. This dataset establishes by far the largest benchmark for FPV action recognition, gaze estimation, and hand segmentation. Second, I present a method to estimate egocentric gaze in the context of actions. My work demonstrates for the first time that egocentric gaze can be reliably estimated using only head motion and hand locations, without the need for object or action cues. Finally, I develop methods that incorporate first person visual cues for recognizing actions in FPV. My work shows that this embodied representation can significantly improve the accuracy of FPV action recognition.
    URI: http://hdl.handle.net/1853/59207
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23877]
    • School of Interactive Computing Theses and Dissertations [144]
