Given two sparse sets of identical point tokens, we encode the image position and potential velocity of each token into a 4-D tensor. The moving regions are conceptually represented as smooth 2-D layers in the 4-D space of image coordinates and pixel velocities. Token affinities are expressed by their preference for being incorporated into smooth 2-D layers, as statistically salient features. Communication between sites is performed by convolution-like tensor voting, which enforces motion smoothness while preserving motion discontinuities, thus selecting the appropriate velocity for each input point as the most salient token. By performing an additional dense voting step, we simultaneously infer velocities and layer orientations at every pixel location. We then extract motion boundaries and regions based on both pixel velocities and estimated local layer orientations.
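To make the 4-D encoding concrete, the following minimal Python sketch builds candidate tokens (x, y, vx, vy) from two sparse point sets. It is an illustration under stated assumptions, not the implementation used in this work: the function name `candidate_tokens` and the search-window parameter `max_disp` are hypothetical, and the voting step itself is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Token4D = Tuple[float, float, float, float]  # (x, y, vx, vy)


def candidate_tokens(frame1: List[Point], frame2: List[Point],
                     max_disp: float) -> List[Token4D]:
    """Pair each frame-1 point with every frame-2 point inside a window.

    Each pairing is one potential velocity for the frame-1 point and becomes
    a token (x, y, vx, vy) in the 4-D space. max_disp is a hypothetical
    search-window size, not a value taken from the text.
    """
    tokens: List[Token4D] = []
    for (x1, y1) in frame1:
        for (x2, y2) in frame2:
            vx, vy = x2 - x1, y2 - y1
            # Keep only displacements that fall inside the square window.
            if abs(vx) <= max_disp and abs(vy) <= max_disp:
                tokens.append((x1, y1, vx, vy))
    return tokens


# Two points that both move one pixel to the right between frames.
frame1 = [(0.0, 0.0), (5.0, 0.0)]
frame2 = [(1.0, 0.0), (6.0, 0.0)]
tokens = candidate_tokens(frame1, frame2, max_disp=4.0)
for t in tokens:
    print(t)
```

In this toy input the point at (5, 0) receives two candidate velocities, (-4, 0) and (1, 0). Only the second lies on the same smooth layer as the token generated at (0, 0), which is the kind of ambiguity the voting process is meant to resolve.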
Using a 4-D space for this Tensor Voting approach is essential, since it allows for a spatial separation of the points according to both their velocities and image coordinates. Consequently, the proposed framework allows tokens from the same velocity layer to strongly support each other, while ignoring the influence from other layers. Since the computational complexity of Tensor Voting depends only on the input size, our methodology can be efficiently implemented despite the higher-dimensional space. Unlike other methods that optimize a certain objective function, our approach does not involve initialization or search in a parametric space, and therefore does not suffer from local optima or poor convergence. We demonstrate the contribution of this work by analyzing several cases (opaque and transparent motion, rigid and non-rigid motion, curves and surfaces in motion) and show that our experimental results are consistent with human perception.
As a continuation of this research effort, we will extend our framework to incorporate monocular information for handling real image sequences. Other research directions include studying the effects of velocity propagation into overlapping layers, and the issue of multiple scales. Finally, we plan to investigate the use of information from multiple frames, both to handle occlusion and to make the methodology more robust.