Deep Segments: Comparisons between Scenes and their Constituent Fragments using Deep Learning
MetadataShow full item record
We examine the problem of visual scene understanding and abstraction from first person video. This is an important problem and successful approaches would enable complex scene characterization tasks that go beyond classification, for example characterization of novel scenes in terms of previously encountered visual experiences. Our approach utilizes the final layer of a convolutional neural network as a high-level, scene specific, representation which is robust enough to noise to be used with wearable cameras. Researchers have demonstrated the use of convolutional neural networks for object recognition. Inspired by results from cognitive and neuroscience, we use output maps created by a convolutional neural network as a sparse, abstract representation of visual images. Our approach abstracts scenes into constituent segments that can be characterized by the spatial and temporal distribution of objects. We demonstrate the viability of the system on video taken from Google Glass. Experiments examining the ability of the system to determine scene similarity indicate ρ (384) = ±0:498 correlation to human evaluations and 90% accuracy on a category match problem. Finally, we demonstrate high-level scene prediction by showing that the system matches two scenes using only a few initial segments and predicts objects that will appear in subsequent segments.
Showing items related by title, author, creator and subject.
Patterson, Michael J. (Georgia Institute of Technology, 1985-08)
Wu, Jianxin; Rehg, James M. (Georgia Institute of Technology, 2009-07-23)CENTRIST (CENsus TRansform hISTogram), a new visual descriptor for recognizing topological places or scene categories, is introduced in this paper. We show that place and scene recognition, especially for indoor environments, ...
Kim, Kihwan; Summet, Jay; Starner, Thad; Ashbrook, Daniel; Kapade, Mrunal; Essa, Irfan A. (Georgia Institute of Technology, 2008)Using off-the-shelf Global Positioning System (GPS) units, we reconstruct buildings in 3D by exploiting the reduction in signal to noise ratio (SNR) that occurs when the buildings obstruct the line-of-sight between the ...