Spatio-Temporal Analysis and Manipulation of Visual Information

Michal Irani


Abstract

Video provides a continuous visual window into the space-time world. It captures the evolution of dynamic scenes in both space and time, which makes video much more than a collection of images of a scene taken from different viewpoints. In this talk I will show that by treating video as a space-time data volume, one can perform tasks that are very difficult (and often impossible) to perform when only "slices" of this information, such as individual image frames, are used. In particular, I will demonstrate the power of this approach through two example problems:

(i) I will describe a new approach to the alignment and integration of information across multiple video sequences, which utilizes all available spatio-temporal information within the sequences. By combining the spatial and dynamic visual scene information within a single alignment framework, situations that are inherently ambiguous for traditional image-to-image alignment methods are uniquely resolved by sequence-to-sequence alignment. Moreover, coherent dynamic information can sometimes be used to align video sequences even in extreme cases where there is no common spatial information across the sequences (e.g., when the fields of view of the video cameras do not overlap, or when the cameras are of different sensing modalities, such as infrared (IR) and visible-light cameras).

(ii) I will show how extended spatio-temporal scene representations can be used to view, browse, index into, edit, and enhance video data very efficiently. In raw video data, the spatio-temporal scene information is implicitly and redundantly distributed across many video frames, which makes access to and manipulation of video data very difficult. However, by analyzing the redundancy of visual information within the space-time data volume, the distributed scene information can be integrated into coherent and compact scene-based visual representations. These lead to very efficient methods for accessing and manipulating visual information in video data.
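
As a toy illustration of the space-time volume idea (not the method presented in the talk), the sketch below stores a synthetic static-camera video as a NumPy array indexed by (t, y, x), extracts one space-time slice, and integrates the redundant frames into a single background image with a temporal median. The function names, the synthetic scene, and the choice of a median are all assumptions made for this example; real sequences would first need to be aligned.

```python
import numpy as np

def make_volume(num_frames=9, height=4, width=12):
    """Hypothetical static-camera video: fixed background plus one moving bright pixel."""
    background = np.tile(np.arange(width, dtype=float), (height, 1))
    volume = np.repeat(background[None, :, :], num_frames, axis=0)  # shape (T, H, W)
    for t in range(num_frames):
        volume[t, 1, t] = 255.0  # the bright pixel moves one column right per frame
    return background, volume

def xt_slice(volume, row):
    """Space-time slice: a single image row followed through all frames, shape (T, W)."""
    return volume[:, row, :]

def temporal_median(volume):
    """Collapse the time axis; values seen in most frames at a pixel survive."""
    return np.median(volume, axis=0)

background, volume = make_volume()
trace = xt_slice(volume, row=1)        # the moving pixel appears as a diagonal line
recovered = temporal_median(volume)    # the moving pixel is filtered out
```

In the slice, the moving pixel traces a diagonal whose slope encodes its speed, which is information no single frame contains. The median image is a very simple example of a compact scene-based representation built from many redundant frames.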

