Welcome - VACE

Video Analysis and Content Extraction

This project builds on our earlier Video Analysis and Content Extraction work (the MOVER project) and shares many of its goals and techniques. Our goal is to develop methods for automatic, robust and scalable recognition of objects and events in a scene using video. Events, objects and the relations between them define the key components of video content. Inferring them automatically will let analysts save substantial time and effort when browsing large sets of data, and will allow alarms to be raised automatically in real-time surveillance.

Object appearance in video changes with viewpoint, illumination, surface texture and occlusion. Instances of the same action differ in the person performing it, style, trajectory, instruments and duration. The mapping between data streams and the desired entities is therefore not one-to-one, so invariant intermediate representations are needed.

Using models of the scene, imaging, objects and activity reduces ambiguity in data analysis, and world knowledge helps distinguish between normal and abnormal events.

We focus on:

  • Detection and Tracking: We detect and track mobile objects such as pedestrians and vehicles. With static cameras, these can be detected by looking for changes in the pixel colors of the image. However, multiple objects may merge into a single blob, or a single object may split into multiple blobs. We have developed techniques that use shape models to detect objects from the multiple moving blobs.
  • Event Recognition: It is useful to distinguish between primitive and composite events. Composite events, such as the abandonment of luggage, are inferred from a chain of primitive events detected by observing human body trajectories. Primitive events, such as walking, running and jumping, require inference of body joint trajectories. The direct inference of 3-D trajectories from video is a difficult task. To analyze video data more completely, we have recently developed methods that use stored action templates.
  • Action Models: Events involving multiple agents acting simultaneously require more complex inference mechanisms. We have devised hierarchical models that help reason about such events efficiently and accurately.
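As an illustration of the change-detection step described under Detection and Tracking, the sketch below compares a frame against a fixed background image and groups the changed pixels into connected blobs. It is a minimal sketch under stated assumptions, not the project's method: the function name `detect_blobs`, the grayscale intensities (rather than colors), the fixed `threshold` and the 4-connectivity rule are all hypothetical choices made for the example, and no shape models are applied.

```python
from collections import deque


def detect_blobs(background, frame, threshold=30):
    """Return a list of blobs; each blob is a list of (row, col) foreground pixels.

    Assumption: images are 2-D lists of grayscale intensities of equal size.
    """
    rows, cols = len(frame), len(frame[0])
    # Foreground mask: pixels whose intensity differs enough from the background.
    mask = [
        [abs(frame[r][c] - background[r][c]) > threshold for c in range(cols)]
        for r in range(rows)
    ]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Toy scene: a uniform background and a frame with two changed regions.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200   # first moving object (two adjacent pixels)
frame[3][5] = 180                 # second moving object (one pixel)
blobs = detect_blobs(background, frame)
```

A blob found this way need not correspond to exactly one object: two people walking side by side would form one connected region, and a partly occluded vehicle could produce several. That is the ambiguity the shape models mentioned above are meant to resolve.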

Highlights and Important Ideas

  • Use of object, event, scene and camera models to enhance performance.
  • Use of multiple cues (shape and motion) and probabilistic graphical models for object detection, tracking and recognition.
  • Function-based object recognition.
  • Design and use of novel variations of hidden Markov models for multi-agent event recognition in a structured Event Description Framework.
  • Reformulation of common video processing tasks as unsupervised inference in these models.
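For readers unfamiliar with hidden Markov models, the sketch below shows the standard forward algorithm for a plain, single-agent, discrete HMM. It is only a textbook baseline under stated assumptions: the novel multi-agent variations and the Event Description Framework mentioned above are not reproduced here, and the state names, observation symbols and all probabilities are hypothetical values chosen for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under a discrete HMM."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: hidden primitive events, observed coarse speed.
states = ("walking", "running")
start_p = {"walking": 0.8, "running": 0.2}
trans_p = {
    "walking": {"walking": 0.9, "running": 0.1},
    "running": {"walking": 0.3, "running": 0.7},
}
emit_p = {
    "walking": {"slow": 0.9, "fast": 0.1},
    "running": {"slow": 0.2, "fast": 0.8},
}

p_steady = forward_likelihood(["slow", "slow", "slow"], states, start_p, trans_p, emit_p)
p_erratic = forward_likelihood(["slow", "fast", "slow"], states, start_p, trans_p, emit_p)
```

Under these made-up parameters the steady sequence scores higher than the erratic one, which is the kind of comparison a recognizer could use to rank candidate event models against observed data.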

Model Based Object and Video Event Recognition

For information on the earlier work, see the MOVER project.