Inference of 2D Layers from Uncalibrated Images

Elaine Kang


In video processing, it is desirable to have a structured representation that reduces data redundancy and facilitates operations for video analysis and manipulation. A conventional frame-based representation treats a video as a set of independent frames, which leaves redundancies in place, and provides VCR-like operations. In contrast, a layer-based representation provides structures called layers that characterize the spatial and temporal correlation between frames. This reduces redundancies and allows layer-based operations, which are consistent across frames and therefore more efficient. In this representation, a layer is a 2D image region with time-varying information generated by camera motion and object motion. Each layer is associated with a corresponding motion descriptor and a compact form of the image regions from the original images. Layers can efficiently represent non-rigid objects, the motion of transparent objects, and shallow 3D surfaces. Each layer also facilitates encoding additional information such as texture and blurring masks. To obtain a layer-based video representation, grouping based on motion and spatial support is both essential and challenging. In this dissertation, a robust layer inference method is presented.

There are two types of layers: 3D layers and 2D layers. A 3D layer consists of a 3D plane equation, the texture of the plane, and a depth ordering per pixel. A 2D layer consists of a 2D transformation equation and a 2D sub-image texture that includes the pixels undergoing the same motion. Extracting 3D layers requires the analysis of camera motion, which is a non-trivial task and is unnecessary for some applications such as video coding and analysis. This dissertation focuses on extracting 2D layers from uncalibrated images. The targeted motion models are affine and homography transformations.
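To make the two targeted motion models concrete, the following is a minimal sketch of how a 2D layer's motion descriptor might be stored and applied. The `Layer2D` container and the helper names `warp` and `is_affine` are hypothetical and chosen for illustration only; they are not taken from the dissertation. Both models are written as 3x3 matrices acting on homogeneous image coordinates, and an affine transformation is the special case whose last row is (0, 0, k) with k nonzero.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


@dataclass
class Layer2D:
    """Hypothetical container: one motion matrix plus the pixels that follow it."""
    motion: Matrix3        # 3x3 affine or homography matrix
    support: List[Point]   # pixel coordinates assigned to this layer


def warp(H: Matrix3, p: Point) -> Point:
    """Apply a 3x3 transformation to a 2D point via homogeneous coordinates."""
    x, y = p
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)


def is_affine(H: Matrix3, tol: float = 1e-12) -> bool:
    """An affine map has last row (0, 0, k), so the divisor w is the same for every point."""
    return abs(H[2][0]) <= tol and abs(H[2][1]) <= tol and abs(H[2][2]) > tol


# A pure translation (affine) and a transformation with a projective term.
translation = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
projective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]

layer = Layer2D(motion=translation, support=[(1.0, 1.0), (4.0, 2.0)])
moved = [warp(layer.motion, p) for p in layer.support]
```

For the homography, the divisor `w` varies with the point position, which is what lets it describe the image motion of a plane seen in perspective; for the affine case it is constant.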

My approach reliably extracts multiple 2D motion layers, affine or homography, from noisy initial matches. The approach is based on: (1) a parametric method to detect and extract 2D affine or homography layers; (2) the representation of the matching points in decoupled joint image spaces; (3) the characterization of the property associated with affine transformations in the defined spaces; (4) a process to extract multiple 2D motions simultaneously based on tensor voting; (5) local affine to global homography estimation; and (6) layer refinement based on a hybrid property, namely motion and color homogeneity. The robustness of my approach is demonstrated with many results in diverse applications.
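Items (2) and (3) can be illustrated with a small sketch. It rests on an assumption that the abstract does not spell out: that the decoupled joint image spaces are the two 3D spaces (x, y, x') and (x, y, y') formed from each match (x, y) -> (x', y'). Under an affine motion, x' = a*x + b*y + c and y' = d*x + e*y + f, so the matches of one layer lie on a plane in each space. The code below fits those two planes by ordinary least squares. It is not the tensor voting procedure of item (4), it handles a single motion only, and it has no outlier rejection; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Match = Tuple[Tuple[float, float], Tuple[float, float]]


def solve3(M: List[List[float]], v: Sequence[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("degenerate input: need at least 3 non-collinear points")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, 4):
                    A[r][c] -= factor * A[col][c]
    return [A[i][3] / A[i][i] for i in range(3)]


def fit_plane(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Least-squares fit of t = a*x + b*y + c to (x, y, t) samples via normal equations."""
    M = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for x, y, t in samples:
        row = (x, y, 1.0)
        for i in range(3):
            v[i] += row[i] * t
            for j in range(3):
                M[i][j] += row[i] * row[j]
    return solve3(M, v)


def fit_affine(matches: Sequence[Match]) -> List[List[float]]:
    """Fit one plane per decoupled space and stack the results into a 3x3 affine matrix."""
    row_x = fit_plane([(x, y, xp) for (x, y), (xp, _) in matches])
    row_y = fit_plane([(x, y, yp) for (x, y), (_, yp) in matches])
    return [row_x, row_y, [0.0, 0.0, 1.0]]


# Synthetic, noise-free matches generated from a known affine motion.
true_rows = [[1.5, -0.5, 2.0], [0.25, 0.75, -1.0]]
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
matches = [
    ((x, y), (true_rows[0][0] * x + true_rows[0][1] * y + true_rows[0][2],
              true_rows[1][0] * x + true_rows[1][1] * y + true_rows[1][2]))
    for x, y in points
]
estimated = fit_affine(matches)
```

Because the two planes are fitted independently, the x' and y' components of the motion are estimated separately, which is the sense in which the spaces are "decoupled" in this sketch. With noisy matches or several motions present, a plain least-squares fit like this one would be biased, which is the situation the robust grouping step in the abstract is meant to address.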
