    Deriving Sensor-based Complex Human Activity Recognition Models Using Videos

    View/Open
    KWON-DISSERTATION-2021.pdf (25.54 MB)
    Date
    2021-10-14
    Author
    Kwon, Hyeokhyen
    Abstract
    With the ever-increasing number of ubiquitous and mobile devices, Human Activity Recognition (HAR) using wearables has become a central pillar of ubiquitous and mobile computing. HAR systems commonly adopt machine learning approaches that rely on supervised training with labeled datasets. Recent success in HAR has come with advances in supervised training techniques, namely deep learning models, which have also made dramatic breakthroughs in domains such as computer vision, natural language processing, and speech recognition. Across these domains, the keys to deriving robust recognition models that generalize well across application boundaries have been highly complex analysis models and large-scale labeled datasets that serve the data-hungry nature of deep learning. Although the field of HAR has seen its first substantial successes with deep learning models, the complexity of HAR models remains constrained, mainly because the available datasets are typically small. Conventionally, sensor datasets are collected in user studies in a laboratory environment. The process is labor-intensive, recruiting participants is expensive, and annotation is time-consuming. As a consequence, sensor data collection often yields only a small labeled dataset, and a model derived from such a dataset is unlikely to generalize well.

    My research develops a framework, IMUTube, that can alleviate the limitations of large-scale labeled data collection in sensor-based HAR, the most pressing issue limiting model performance in HAR systems. The aim is to harvest existing video data from large-scale repositories, such as YouTube. IMUTube is a system that bridges the modality gap between videos and wearable sensors by tracking human motion captured in videos. Once the motion information is extracted from the videos, it is transformed into virtual Inertial Measurement Unit (IMU) sensor signals for various on-body locations. The virtual IMU data collected from a large number of videos is then used to derive HAR systems that can be deployed in real-world settings. The overarching idea is appealing because of the sheer size of readily accessible video repositories and the availability of weak labels in the form of video titles and descriptions. By integrating techniques from computer vision, computer graphics, and signal processing, IMUTube automatically extracts motion information from arbitrary human activity videos and is thereby not limited to specific scenes or viewpoints. Tracking 3D motion in unrestricted online video poses multiple challenges, such as fast camera motion, noise, lighting changes, and occlusions. IMUTube automatically identifies artifacts in the video that challenge robust motion tracking, and it generates high-quality virtual IMU data only from those video segments that exhibit the least noise.

    Using IMUTube, I show that complex models, which could not have been derived from the typical small-scale datasets of real IMU sensor readings, become trainable with the weakly labeled virtual IMU data collected from many videos. The availability of more complex HAR models represents a first step towards designing sophisticated deep learning models that capture sensor data more effectively than the state of the art. Overall, my work opens up research opportunities for the human activity recognition community to generate large-scale labeled datasets in an automated, cost-effective manner. Access to larger-scale datasets, in turn, enables more robust and more complex activity recognition models that can be employed in entirely new application scenarios.
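    To make the core signal transformation concrete, the minimal sketch below derives virtual accelerometer readings from a tracked 3D position trajectory by double differentiation, adding gravity along the world vertical axis. This is only an illustrative assumption of the video-to-IMU idea, not the dissertation's actual pipeline, which additionally handles sensor orientation, rotation into the device's local frame, and calibration against real IMU data. The function name and parameters here are hypothetical.

    import numpy as np

    def virtual_accelerometer(positions, fs, gravity=9.81):
        # positions: (T, 3) array of world-frame x/y/z coordinates in meters,
        #            e.g. the tracked trajectory of one on-body location
        # fs: sampling rate of the trajectory in Hz
        dt = 1.0 / fs
        velocity = np.gradient(positions, dt, axis=0)  # first derivative of position
        accel = np.gradient(velocity, dt, axis=0)      # second derivative: linear acceleration
        accel[:, 2] += gravity  # a real accelerometer also senses gravity (world z-axis up)
        return accel            # (T, 3) virtual accelerometer signal, world frame

    # Usage example: a point oscillating vertically at 2 Hz, sampled at 50 Hz
    t = np.arange(0, 5, 1 / 50)
    positions = np.stack(
        [np.zeros_like(t), np.zeros_like(t), 0.1 * np.sin(2 * np.pi * 2 * t)], axis=1
    )
    virtual_imu = virtual_accelerometer(positions, fs=50)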
    URI
    http://hdl.handle.net/1853/66388
    Collections
    • College of Computing Theses and Dissertations [1191]
    • Georgia Tech Theses and Dissertations [23877]
    • School of Interactive Computing Theses and Dissertations [144]
