dc.contributor.author: Hartmann, Glenn
dc.contributor.author: Grundmann, Matthias
dc.contributor.author: Hoffman, Judy
dc.contributor.author: Tsai, David
dc.contributor.author: Kwatra, Vivek
dc.contributor.author: Madani, Omid
dc.contributor.author: Vijayanarasimhan, Sudheendra
dc.contributor.author: Essa, Irfan A.
dc.contributor.author: Rehg, James M.
dc.contributor.author: Sukthankar, Rahul
dc.date.accessioned: 2013-08-28T16:10:23Z
dc.date.available: 2013-08-28T16:10:23Z
dc.date.issued: 2012-10
dc.identifier.citation: Hartmann, G.; Grundmann, M.; Hoffman, J.; Tsai, D.; Kwatra, V.; Madani, O.; Vijayanarasimhan, S.; Essa, I.A.; Rehg, J.M.; & Sukthankar, R. (2012). “Weakly Supervised Learning of Object Segmentations from Web-Scale Video”. Computer Vision – ECCV 2012. Workshops and Demonstrations 7-13 October 2012. Proceedings, Part I. In Lecture Notes in Computer Science, 2012, Vol. 7583, pp. 198-208. [en_US]
dc.identifier.isbn: 978-3-642-33862-5 (Print)
dc.identifier.isbn: 978-3-642-33863-2 (Online)
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/1853/48736
dc.description: ©2012 Springer-Verlag Berlin Heidelberg. The original publication is available at www.springerlink.com [en_US]
dc.description: DOI: 10.1007/978-3-642-33863-2_20
dc.description.abstract: We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as "dog", without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatio-temporal segments. The object seeds obtained using segment-level classifiers are further refined using graphcuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube. [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Georgia Institute of Technology [en_US]
dc.subject: Object masks [en_US]
dc.subject: Spatiotemporal segmentation [en_US]
dc.subject: Video segmentation [en_US]
dc.subject: Video stabilization [en_US]
dc.title: Weakly Supervised Learning of Object Segmentations from Web-Scale Video [en_US]
dc.type: Book Chapter [en_US]
dc.type: Proceedings
dc.contributor.corporatename: Georgia Institute of Technology. College of Computing [en_US]
dc.contributor.corporatename: Georgia Institute of Technology. School of Interactive Computing [en_US]
dc.contributor.corporatename: Georgia Institute of Technology. Center for Robotics and Intelligent Machines [en_US]
dc.contributor.corporatename: University of California, Berkeley [en_US]
dc.contributor.corporatename: Google Research [en_US]
dc.publisher.original: Springer-Verlag Berlin / Heidelberg
dc.identifier.doi: 10.1007/978-3-642-33863-2_20
dc.embargo.terms: null [en_US]

