Video Analysis and Content Extraction
This project builds on our earlier Video Analysis and Content Extraction
work (the MOVER project).
The new project shares many of the earlier project's goals and techniques.
Our goal is to develop methods for
automatic, robust and scalable recognition of
objects and events in a scene using video. Events,
objects and the relations between them define the
key components of video content. Inferring them
automatically will save analysts substantial time
and effort when browsing through large sets of data,
and will allow alarms to be sounded automatically
in real-time surveillance.
Object appearance in video changes based on the
viewpoint, illumination, surface texture and
occlusions. Differences in an action include the
person, style, trajectory, instruments and time
duration. Thus, the mapping between data streams
and desired entities is not one-to-one, and
invariant intermediate representations are needed.
Using models of the scene, imaging, objects and
activity reduces ambiguity in data analysis, and
world knowledge helps distinguish between normal and
abnormal events.
We focus on:
- Detection and Tracking:
We detect and track mobile objects such as pedestrians and vehicles.
With static cameras, these can be
detected by looking for changes in the pixel
colors of the image. Multiple objects may merge
into a single blob or a single object may split
into multiple blobs.
We have developed techniques that use shape models
to detect objects from the multiple moving
blobs.
- Event Recognition:
For event recognition, it is useful to distinguish
between primitive and composite events. Composite
events, such as the abandonment of luggage, are
inferred from a chain of primitive events detected
by observing human body trajectories.
These primitive events, such as walking, running,
and jumping, require inference of body joint
trajectories. The direct inference of 3-D
trajectories from videos is a difficult task. To
analyze video data more completely, we have
recently developed methods that use stored
action templates.
- Action Models:
Events involving multiple agents acting
simultaneously require more complex inference
mechanisms. We have devised hierarchical models
that help reason about such events efficiently and
accurately.
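The change-detection step described under Detection and Tracking can be illustrated with a minimal sketch. It is not the project's implementation: it assumes grayscale frames stored as nested lists, a fixed background image, and an arbitrary threshold, and the names `foreground_mask` and `find_blobs` are hypothetical. It only groups changed pixels into connected blobs; the shape models used to resolve merged or split blobs are not shown.

```python
from collections import deque

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

def find_blobs(mask):
    """Group foreground pixels into 4-connected blobs (lists of (row, col) pixels)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

# Toy data (assumed): a uniform background and a frame with two changed regions.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = 200   # first moving region
frame[2][4] = frame[3][4] = 180   # second moving region

blobs = find_blobs(foreground_mask(frame, background))
```

In this toy case each moving region yields its own blob. In real footage, two nearby objects can fall into one blob or one object can break into several, which is why blob grouping alone is not sufficient.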
Highlights and Important Ideas
- Use of object, event, scene and camera models to enhance performance.
- Use of multiple cues (shape and motion) and probabilistic graphical
models for object detection, tracking and recognition.
- Function-based object recognition.
- Design and use of novel variations of hidden Markov models for
multi-agent event recognition in a structured Event Description
Framework.
- Reformulation of common video processing tasks as
unsupervised inference in these models.
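For readers unfamiliar with hidden Markov models, the sketch below shows how a standard discrete HMM scores a sequence of observations with the forward algorithm. It is the textbook single-chain model, not the novel multi-agent variations mentioned above, and the states, observation symbols and probabilities are invented for illustration.

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit_p[s][o] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical model: hidden primitive events and coarse speed observations.
states = ["walking", "running"]
symbols = ["slow", "fast"]
start_p = {"walking": 0.6, "running": 0.4}
trans_p = {
    "walking": {"walking": 0.7, "running": 0.3},
    "running": {"walking": 0.4, "running": 0.6},
}
emit_p = {
    "walking": {"slow": 0.9, "fast": 0.1},
    "running": {"slow": 0.2, "fast": 0.8},
}

likelihood = forward_likelihood(["slow", "slow", "fast"],
                                states, start_p, trans_p, emit_p)
```

One common use, under these assumptions, is to train one such model per event class and assign an observed sequence to the class whose model gives the highest likelihood.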
For information on the earlier work in the MOVER project, see
Model Based Object and Video Event Recognition.