Deep Segments: Comparisons between Scenes and their Constituent Fragments using Deep Learning
We examine the problem of visual scene understanding and abstraction from first-person video. This is an important problem: successful approaches would enable complex scene characterization tasks that go beyond classification, for example characterizing novel scenes in terms of previously encountered visual experiences. Our approach uses the final layer of a convolutional neural network as a high-level, scene-specific representation that is robust enough to noise to be used with wearable cameras. Researchers have demonstrated the use of convolutional neural networks for object recognition. Inspired by results from cognitive science and neuroscience, we use the output maps created by a convolutional neural network as a sparse, abstract representation of visual images. Our approach abstracts scenes into constituent segments that can be characterized by the spatial and temporal distribution of objects. We demonstrate the viability of the system on video taken from Google Glass. Experiments examining the ability of the system to determine scene similarity indicate a correlation of ρ(384) = ±0.498 with human evaluations and 90% accuracy on a category match problem. Finally, we demonstrate high-level scene prediction by showing that the system matches two scenes using only a few initial segments and predicts objects that will appear in subsequent segments.
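The scene-similarity evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes each segment is already summarized as a plain feature vector (for instance, the network's final-layer outputs), that two scenes are compared by the mean best-match cosine similarity between their segments, and that agreement with human ratings is measured with Spearman's rank correlation. The function names `cosine_similarity`, `scene_similarity`, and `spearman_rho` are hypothetical, and the abstract does not state which similarity measure or correlation statistic was actually used.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for all-zero vectors (assumption)
    return dot / (norm_a * norm_b)


def scene_similarity(scene_a, scene_b):
    """Mean, over segments of scene_a, of the best cosine match in scene_b.

    Each scene is a non-empty list of segment feature vectors.
    This matching rule is an illustrative assumption.
    """
    best = [max(cosine_similarity(s, t) for t in scene_b) for s in scene_a]
    return sum(best) / len(best)


def _ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def _pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(system_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return _pearson(_ranks(system_scores), _ranks(human_scores))
```

In use, `scene_similarity` would be computed for each scene pair that human raters also scored, and `spearman_rho` would then compare the two lists of scores. Note that this best-match rule is asymmetric in its arguments; a symmetric variant could average the two directions.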
Showing items related by title, author, creator and subject.
Azmat, Shoaib (Georgia Institute of Technology, 2014-06-19) This dissertation presents an efficient multilayer background modeling approach to distinguish among midground objects, the objects whose existence occurs over varying time scales between the extremes of short-term ephemeral ...
Dedhia, Vaibhav (Georgia Institute of Technology, 2018-04-30) Today, there are various paradigms for vision-based autonomous navigation: mediated perception approaches that parse an entire scene to make a driving decision, a direct perception approach that estimates the ...
Schindler, Grant; Dellaert, Frank (Georgia Institute of Technology; Institute of Electrical and Electronics Engineers, 2010) Modern structure from motion techniques are capable of building city-scale 3D reconstructions from large image collections, but have mostly ignored the problem of large-scale structural changes over time. We present a ...