This dissertation presents a novel formulation of the problem of visual motion analysis and interpretation: the inference of motion layers from a noisy and possibly sparse point set in a 4-D space. The method rests on a layered 4-D representation of the data and a voting scheme for affinity propagation. Within the 4-D space of image positions and potential velocities, moving regions are conceptually represented as smooth surface layers, which are extracted through a voting process that enforces the motion smoothness constraint. An additional 2-D voting step that incorporates intensity information (edges) from the original images then yields accurate boundaries and regions. Finally, the 3-D scene structure and motion are estimated by enforcing the rigidity constraint for each moving object.
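To make the 4-D representation concrete, the following is a minimal sketch, not the voting scheme of this work. It assumes each candidate match is a token (x, y, vx, vy) and replaces the actual voting process with a plain isotropic Gaussian affinity in the 4-D space. The names `smoothness_votes` and `select_velocities`, and the parameter `sigma` (standing in for the scale of analysis), are hypothetical and chosen for illustration only.

```python
import numpy as np

def smoothness_votes(tokens, sigma):
    """Accumulate support for each 4-D token (x, y, vx, vy).

    Assumption: support is a sum of isotropic Gaussian weights over the
    4-D distance to every other token. This is a simplified stand-in for
    a smoothness-enforcing voting process.
    """
    tokens = np.asarray(tokens, dtype=float)
    diff = tokens[:, None, :] - tokens[None, :, :]   # pairwise differences, (N, N, 4)
    d2 = np.sum(diff ** 2, axis=-1)                  # squared 4-D distances, (N, N)
    weights = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(weights, 0.0)                   # a token does not vote for itself
    return weights.sum(axis=1)

def select_velocities(tokens, sigma):
    """Keep, at each image position, the candidate velocity with most support."""
    tokens = np.asarray(tokens, dtype=float)
    support = smoothness_votes(tokens, sigma)
    best = {}
    for tok, s in zip(tokens, support):
        key = (float(tok[0]), float(tok[1]))
        if key not in best or s > best[key][1]:
            best[key] = (tok[2:].copy(), float(s))
    return {key: vel for key, (vel, _) in best.items()}
```

In this sketch, candidates that lie on a common smooth layer in the 4-D space reinforce one another, while isolated wrong matches receive little support and are discarded. The pairwise computation is quadratic in the number of tokens, so it is suitable only for small illustrative inputs.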
The proposed framework consistently handles both smooth moving regions and motion discontinuities, without using any a priori knowledge of the motion model. The method is also computationally robust, being non-iterative, and does not depend on critical thresholds, the only free parameter being the scale of analysis.
The contributions of this work are demonstrated by analyzing a wide variety of difficult cases: opaque and transparent motion, rigid and non-rigid motion, and curves and surfaces in motion, in both sparse and dense input configurations.