A Voting-Based Computational Framework for Visual Motion Analysis and Interpretation
Image motion is a rich source of information for the visual perception system, providing a multitude of cues to identify distinct objects in the scene and infer their 3-D structure and motion. Most approaches rely on parametric models, which restrict the types of motion that can be analyzed, and involve iterative methods, which depend heavily on initial conditions and are subject to instability. Further difficulties are encountered in image regions where motion is not smooth, typically around motion boundaries. This dissertation addresses the problem of visual motion analysis and interpretation by formulating it as an inference of motion layers from a noisy and possibly sparse point set in a 4-D space. The core of the method is a layered 4-D representation of data and a voting scheme for affinity propagation. Within the 4-D space of image positions and velocities, moving regions are conceptually represented as smooth surface layers, and are extracted through a voting process that enforces the motion smoothness constraint. An additional 2-D voting step that incorporates intensity information (edges) from the original images then infers accurate boundaries and regions. The inherent ambiguity of 2-D to 3-D interpretation is usually handled by adding constraints, such as rigidity. However, providing a successful approach that enforces a global constraint has been problematic in the combined presence of noise, multiple independent motions, or non-rigid motion. By decoupling the processes of matching, outlier rejection, segmentation and interpretation, we extract accurate motion layers based on the smoothness of image motion, then locally enforce rigidity for each layer in order to infer its 3-D structure and motion. The proposed framework consistently handles both smooth moving regions and motion discontinuities, without using any prior knowledge of the motion model.
The method is also computationally robust: it is non-iterative and does not depend on critical thresholds, the only free parameter being the scale of analysis. The contributions of this work are demonstrated by analyzing a wide variety of difficult cases: opaque and transparent motion, rigid and non-rigid motion, and curves and surfaces in motion, from both sparse and dense input configurations.
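To make the 4-D representation concrete, the following is a minimal sketch, not the voting scheme of the dissertation itself. It assumes that each candidate match is a token (x, y, vx, vy), that affinity between tokens decays as a Gaussian of their 4-D distance, and that positions and velocities are scaled to comparable units. The names `vote_support` and `demo_tokens` are hypothetical, and the scalar support computed here is a deliberate simplification of a voting process that enforces smoothness. The parameter `sigma` plays the role of the scale of analysis.

```python
import numpy as np


def vote_support(tokens, sigma):
    """Return the scalar support each 4-D token collects from all others.

    tokens: array of shape (n, 4) holding (x, y, vx, vy) per candidate match.
    sigma:  scale of analysis (assumed here to be the Gaussian decay width).
    """
    # Pairwise squared distances in the joint position-velocity space.
    diff = tokens[:, None, :] - tokens[None, :, :]
    dist_sq = np.sum(diff ** 2, axis=-1)
    # Nearby tokens with similar velocity cast strong votes for each other.
    weights = np.exp(-dist_sq / sigma ** 2)
    np.fill_diagonal(weights, 0.0)  # a token does not vote for itself
    return weights.sum(axis=1)


def demo_tokens():
    """Build a toy input: one smooth layer plus scattered wrong matches."""
    # Layer: a 10 x 10 grid of points all translating with velocity (1, 0).
    xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
    layer = np.column_stack(
        [xs.ravel(), ys.ravel(), np.ones(100), np.zeros(100)]
    )
    # Outliers: 12 tokens inside the same image area whose velocities are
    # spread on a circle of radius 6, so none agrees with the layer.
    angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
    outliers = np.column_stack(
        [
            np.linspace(0.0, 9.0, 12),
            np.full(12, 4.5),
            6.0 * np.cos(angles),
            6.0 * np.sin(angles),
        ]
    )
    return np.vstack([layer, outliers])


if __name__ == "__main__":
    tokens = demo_tokens()
    support = vote_support(tokens, sigma=2.0)
    print("lowest support on the layer:  ", support[:100].min())
    print("highest support among outliers:", support[100:].max())
```

In this toy configuration the tokens on the smooth layer reinforce one another, while the wrong matches receive almost no votes even though they overlap the layer in the image plane, because they are far from it along the velocity axes. This illustrates why outliers can be rejected without fitting a parametric motion model.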