Capture, analysis and synthesis of photorealistic crowds
This thesis explores techniques for synthesizing crowds from imagery. Synthetic photorealistic crowds are desirable for cinematic gaming, special effects and architectural visualization. While motion capture-based techniques for the animation and control of crowds have been well studied in computer graphics, the resulting control rig sequences require a laborious model-based graphics pipeline to render photorealistic videos of crowds. Over the past ten years, data-driven techniques for rendering imagery of complex phenomena have become a popular alternative to model-based graphics. This popularity is due in large part to the difficulty of constructing the sufficiently detailed models that are required to achieve photorealism. A dynamic crowd of humans is an extremely challenging example of such a phenomenon. Example-based synthesis methods such as video textures are an appealing alternative, but current techniques are unable to handle the new challenges posed by crowds. This thesis describes how to synthesize video-based crowds by explicitly segmenting pedestrians from input videos of natural crowds and optimally placing them into an output video while satisfying environmental constraints imposed by the scene. There are three key challenges. First, the crowd layout of segmented videos must satisfy constraints imposed by environmental and crowd obstacles. This thesis addresses four types of environmental constraints: (a) ground planes in the scene which are valid for crowd traversal, such as sidewalks, (b) spatial regions of these planes where crowds may enter and exit the scene, (c) static obstacles, such as mailboxes and walls of a building, and (d) dynamic obstacles such as individuals and groups of individuals. Second, pedestrians and groups of pedestrians should be segmented from the input video with no artifacts and minimal interaction time. This is challenging in real-world scenes because pedestrians change appearance significantly as they travel through the scene.
Third, segmented pedestrian videos may not have enough frames or the right shape to compose a path from an artist-defined entrance to an exit. Plausible temporal transitions between segmented pedestrians are therefore needed, but they are difficult to identify and synthesize due to complex self-occlusions. We present a novel algorithm for composing video billboards, represented by crowd tubes, to form a crowd while avoiding collisions with static and dynamic obstacles. Crowd tubes are represented in the scene using a temporal sequence of circles planted in the calibrated ground plane. The approach consists of representing crowd tube samples and constraint violations with a conflict graph. A maximal independent set of this graph yields a dense crowd composition. We present a prototype system for the capture, analysis, synthesis and control of video-based crowds. Several results demonstrate the system's ability to generate videos of crowds which exhibit a variety of natural behaviors.
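The conflict-graph formulation can be illustrated with a minimal sketch. It is not the thesis implementation: the names (`Circle`, `CrowdTube`, `compose_crowd`) are hypothetical, candidate tubes that touch a static obstacle are simply discarded before the graph is built, and the independent set is found with a greedy lowest-degree-first heuristic, which is an assumption because the abstract does not say how the set is computed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Circle:
    """A disc on the calibrated ground plane (hypothetical representation)."""
    x: float
    y: float
    r: float


@dataclass
class CrowdTube:
    """One candidate pedestrian placement: one circle per output frame."""
    start: int      # first output frame the tube occupies
    circles: list   # list of Circle, one per consecutive frame


def circles_overlap(a, b):
    # Two discs collide when their centres are closer than the sum of radii.
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 < (a.r + b.r) ** 2


def tubes_conflict(a, b):
    # Dynamic-obstacle test: compare circles only on frames both tubes occupy.
    lo = max(a.start, b.start)
    hi = min(a.start + len(a.circles), b.start + len(b.circles))
    return any(
        circles_overlap(a.circles[f - a.start], b.circles[f - b.start])
        for f in range(lo, hi)
    )


def hits_static_obstacle(tube, obstacles):
    # Static obstacles are modelled as discs too (an assumption of this sketch).
    return any(circles_overlap(c, o) for c in tube.circles for o in obstacles)


def build_conflict_graph(tubes, candidates):
    # Vertices are candidate tube indices; an edge marks a pairwise collision.
    adj = {i: set() for i in candidates}
    for i, j in combinations(candidates, 2):
        if tubes_conflict(tubes[i], tubes[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_maximal_independent_set(adj):
    # Visit low-degree vertices first; keep a vertex if no neighbour was kept.
    chosen, blocked = set(), set()
    for v in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if v not in blocked:
            chosen.add(v)
            blocked |= adj[v]
    return chosen


def compose_crowd(tubes, obstacles):
    # Returns the sorted indices of tubes that can coexist without collisions.
    candidates = [i for i, t in enumerate(tubes)
                  if not hits_static_obstacle(t, obstacles)]
    adj = build_conflict_graph(tubes, candidates)
    return sorted(greedy_maximal_independent_set(adj))
```

The greedy pass always returns a maximal set, meaning no further tube can be added without a collision, but not necessarily the largest possible one. A denser composition might call for a different ordering or an exact solver.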