
dc.contributor.author: Kundu, Abhijit
dc.contributor.author: Li, Yin
dc.contributor.author: Dellaert, Frank
dc.contributor.author: Li, Fuxin
dc.contributor.author: Rehg, James M.
dc.date.accessioned: 2015-07-10T17:14:32Z
dc.date.available: 2015-07-10T17:14:32Z
dc.date.issued: 2014-09
dc.identifier.citation: Kundu, A., Li, Y., Dellaert, F., Li, F., & Rehg, J. M. (2014). "Joint Semantic Segmentation and 3D Reconstruction from Monocular Video". In Fleet, D., Pajdla, T., Schiele, B., & Tuytelaars, T. (Eds.), Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, 6-12 September 2014, Proceedings, Part VI. Lecture Notes in Computer Science, Vol. 8694, pp. 703-718.
dc.identifier.isbn: 978-3-319-10598-7
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/1853/53675
dc.description: © Springer International Publishing Switzerland 2014. The original publication is available at www.springerlink.com
dc.description: DOI: 10.1007/978-3-319-10599-4_45
dc.description.abstract: We present an approach for the joint inference of 3D scene structure and semantic labeling from monocular video. Starting with a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than the series of 2D semantic label images or the sparse point cloud produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines, respectively. We derive a Conditional Random Field (CRF) model defined in 3D space that jointly infers the semantic category and occupancy of each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which is not possible if the two problems are solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multi-view constraints are weak. Our model comprises higher-order factors, which help when depth is unobservable. We also use class-specific semantic cues to reduce the degree of such higher-order factors or to approximate them with unaries where possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation on difficult, large-scale, forward-moving monocular image sequences.
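
The abstract's central object is a voxel-level CRF over a joint semantic + occupancy label space. As a minimal sketch of what such an energy could look like, assuming each voxel takes either a free-space label or one of the semantic classes, with unary, pairwise, and higher-order potentials (the specific form below is illustrative, not the authors' exact formulation):

E(\mathbf{x}) = \sum_{i} \psi_i(x_i) + \sum_{(i,j) \in \mathcal{N}} \psi_{ij}(x_i, x_j) + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c), \qquad x_i \in \{\mathrm{free}\} \cup \mathcal{L}

In such a sketch, the unaries \psi_i would carry image evidence and class-specific cues, the pairwise terms \psi_{ij} would encourage spatial smoothness between neighbouring voxels, and the higher-order factors \psi_c could, for example, couple voxels along a camera ray where depth is unobserved; the abstract notes that semantic cues are used to reduce the degree of these higher-order factors or to approximate them with unaries.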
dc.language.iso: en_US
dc.publisher: Georgia Institute of Technology
dc.subject: Conditional random field
dc.subject: Constraints
dc.subject: Monocular video
dc.subject: Semantic cues
dc.subject: Semantic segmentation
dc.subject: 3D
dc.title: Joint Semantic Segmentation and 3D Reconstruction from Monocular Video
dc.type: Book Chapter
dc.type: Proceedings
dc.contributor.corporatename: Georgia Institute of Technology. Institute for Robotics and Intelligent Machines
dc.contributor.corporatename: Georgia Institute of Technology. College of Computing
dc.contributor.corporatename: Georgia Institute of Technology. School of Interactive Computing
dc.publisher.original: Springer International
dc.identifier.doi: 10.1007/978-3-319-10599-4_45
dc.embargo.terms: null

