VISUAL DENSE THREE-DIMENSIONAL MOTION ESTIMATION IN THE WILD
Abstract
One of the most fundamental abilities of the human perception system is to seamlessly
sense the changing 3D world from egocentric visual observations. Driven by
modern applications in robotics, autonomous driving, and mixed reality, machine perception
requires a precise, dense representation of 3D motion at low latency. This thesis
focuses on estimating absolute 3D motion in world coordinates in unconstrained
environments, using egocentric visual information only. The goal is
a fast algorithm that produces an accurate, dense representation of 3D
motion.
To achieve this goal, I propose to investigate the problem from four perspectives, with
the following contributions:
1) Present a fast and accurate continuous optimization approach that solves for the
motion of a scene represented as planar segments fixed a priori.
2) Present a learning-based approach that recovers the dense scene flow from egocentric
motion and optical flow, decomposed by a novel data-driven rigidity prediction.
3) Present a modern synthesis of the classic inverse compositional method for 3D rigid
motion estimation using dense image alignment.
4) Present a two-view monocular scene flow approach that recovers depth, camera motion,
and 3D scene motion of rigidly moving scenes.