Bayesian 3D multiple people tracking using multiple indoor cameras and microphones
This thesis presents Bayesian joint audio-visual tracking of the 3D locations of multiple people, and of the current speaker, in a real conference environment. To achieve this objective, it draws on several research areas: acoustic-feature detection, visual-feature detection, non-linear Bayesian filtering, data association, and sensor fusion. For acoustic-feature detection, time-delay-of-arrival (TDOA) estimation is used to detect multiple sources, and localization performance using TDOAs is analyzed for different microphone configurations. For visual-feature detection, Viola-Jones face detection initializes the locations of an unknown number of objects; corner features derived from the face-detection results are then used for robust motion detection. Simple point-to-line correspondences between multiple cameras, computed from fundamental matrices, determine which features are most reliable. For data association and sensor fusion, a Monte Carlo joint probabilistic data association filter (JPDAF) and data association with IPPF (DA-IPPF) are implemented within a particle-filtering framework. Three tracking scenarios are demonstrated with the proposed algorithms: acoustic source tracking, visual source tracking, and joint acoustic-visual source tracking. Finally, a real-time implementation of the joint acoustic-visual tracking system using a PC, four cameras, and six microphones is described in two parts: system implementation and real-time processing.
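The abstract does not specify how the TDOAs are estimated. A minimal sketch of one common approach, generalized cross-correlation with phase transform (GCC-PHAT) between a microphone pair (an assumption for illustration, not necessarily the thesis's exact method), might look like:

```python
import numpy as np

def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
    """Estimate the TDOA of `sig` relative to `ref` via GCC-PHAT.

    This is an illustrative sketch; the thesis's actual estimator
    is not detailed in the abstract.
    """
    n = len(sig) + len(ref)            # zero-pad to avoid circular overlap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15             # phase transform: whiten the spectrum
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:            # optionally limit the search window
        max_shift = min(int(fs * max_tau), max_shift)
    # rearrange so that zero lag sits at the center of the window
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs                  # delay in seconds

# Usage: a copy of the reference delayed by 25 samples at 16 kHz
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(25), ref))[:4096]
tau = gcc_phat_tdoa(sig, ref, fs)      # expect about 25 / 16000 s
```

In a multi-microphone setup such as the six-microphone array described above, TDOAs from several pairs would then be intersected to localize sources in 3D.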