Encoding 3D contextual information for dynamic scene understanding
Abstract
This thesis aims to demonstrate how 3D cues improve semantic labeling and object classification. Specifically, we consider depth, surface normals, object classification, and pixel-wise semantic labeling. The works outlined in this document aim to validate the following thesis statement: shape, used as additional context, improves segmentation, unsupervised clustering, object classification, and semantic labeling with little computational overhead. The thesis will show that combining shape and object labels (1) improves results while requiring few extra parameters, (2) yields better results with surface normals than with depth, and (3) improves accuracy on each task. We describe various methods for combining shape and object classification, and then discuss our extensions of this work, which focus specifically on surface normal prediction, depth prediction, and semantic labeling.