University of Southern California


Object Tracking and Event Understanding



  Video Analysis and Content Extraction  




Project Description

We propose to develop tools for automated analysis of video sequences. This will include methods for detecting and tracking moving objects, resulting in a structured description of the video. This representation will then be used to extract content that supports understanding of the events occurring in the scene. Such automated tools will be essential for analysts in the intelligence community to cope with and make effective use of the vast quantities of video data that are becoming increasingly available.

A key issue in content extraction from videos is the use and availability of the spatial and mission context. Such context is more likely to be available for observations at a specific site (such as monitoring for security applications) but will be more difficult to obtain for videos where little is known about the scene of activity, or where only generic context is available (for example, that the activity takes place in some hotel lobby). Our approach has several innovative and unique characteristics:

  • Detection and tracking of moving objects in video streams through the characterization of each pixel’s path in the 2D + t space. Our approach, based on the characterization of beams of paths, provides robust layered segmentation of moving objects at various scales and tracks them by studying the properties of the beams of trajectories. It generalizes current methods to deformable or articulated objects such as humans in motion.
  • Structured representation of the videos in order to capture the spatial and temporal constructs produced by detection and tracking. A structured video eliminates the disadvantages of the frame-based representation by providing a description based on moving objects. The atomic elements of the representation are the moving objects, which therefore provide adequate information to support both query processing and reuse of information.
  • Understanding of events characterized by the interactions among humans and objects in the environment. We propose a generic, hierarchical representation for understanding events in a scene. Both “single-thread” and “multiple-thread” events are considered. The technique bridges the gap between image-oriented structured representations and higher-level semantic inferences. Our method will handle uncertainties in the computations rigorously by using Bayesian networks and stochastic finite automata. The representation will also allow new activity descriptions to be entered easily through an Event Representation Language.
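
As a rough illustration of the object-based structured representation described in the second item above, the following Python sketch stores a video as a set of object tracks rather than as frames, and answers a simple spatio-temporal query over them. It is only a minimal sketch under stated assumptions: the names `Observation`, `Track`, `StructuredVideo` and `objects_in_region`, the bounding-box format, and the sample data are all hypothetical and are not taken from the project's software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names below are hypothetical and chosen for illustration only.
# A box is (x_min, y_min, x_max, y_max) in pixel coordinates (an assumption).
Box = Tuple[float, float, float, float]


@dataclass
class Observation:
    """One detection of an object in one frame."""
    frame: int
    box: Box


@dataclass
class Track:
    """A moving object: the atomic element of the structured video."""
    object_id: int
    label: str
    observations: List[Observation] = field(default_factory=list)

    def span(self) -> Tuple[int, int]:
        """First and last frame in which the object was observed."""
        frames = [o.frame for o in self.observations]
        return min(frames), max(frames)


def center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0


@dataclass
class StructuredVideo:
    """Object-based description of a video, indexed by object id."""
    tracks: Dict[int, Track] = field(default_factory=dict)

    def add(self, object_id: int, label: str, frame: int, box: Box) -> None:
        track = self.tracks.setdefault(object_id, Track(object_id, label))
        track.observations.append(Observation(frame, box))

    def objects_in_region(self, region: Box, first: int, last: int) -> List[int]:
        """Ids of objects whose box center falls inside `region`
        in at least one frame of the interval [first, last]."""
        rx0, ry0, rx1, ry1 = region
        hits = set()
        for track in self.tracks.values():
            for obs in track.observations:
                if not first <= obs.frame <= last:
                    continue
                cx, cy = center(obs.box)
                if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
                    hits.add(track.object_id)
                    break
        return sorted(hits)


# Made-up sample data: one person near a doorway, one car far from it.
video = StructuredVideo()
video.add(1, "person", 0, (10, 10, 30, 60))
video.add(1, "person", 1, (40, 10, 60, 60))
video.add(2, "car", 0, (200, 100, 300, 160))
video.add(2, "car", 1, (210, 100, 310, 160))

doorway = (0, 0, 100, 100)
result = video.objects_in_region(doorway, 0, 1)
print(result)
```

Because the query runs over tracks instead of raw frames, the same stored description can be reused for other questions (for example, how long an object stayed in view, via `span`) without reprocessing the pixels.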

We expect that our detection and tracking methods will work on videos with a wide variety of content. Our event understanding methods will initially be limited to situations where spatial and task context are readily available. Extensions to cases where the context is available only in very generic form, or not at all, will be addressed in future phases of this research.


Research Topics


  • S. Hongeng and R. Nevatia. Multi-agent event recognition. In IEEE Proceedings of the International Conference on Computer Vision, 2001.
  • T. Zhao, R. Nevatia and F. Lv. Segmentation and Tracking of Multiple Humans in Complex Situations. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Kauai, December 2001.
  • Fengjun Lv, Tao Zhao and Ram Nevatia. Self-Calibration of a camera from video of a walking human. In International Conference on Pattern Recognition, 2002.
  • Tao Zhao and Ram Nevatia. 3D Tracking of Human Locomotion: A Tracking as Recognition Approach. In International Conference on Pattern Recognition, 2002.
  • Elaine Kang, Isaac Cohen and Gerard Medioni. Robust Affine Motion Estimation in Joint Image Space using Tensor Voting. In International Conference on Pattern Recognition, 2002.
  • Jinman Kang, Isaac Cohen and Gerard Medioni. Continuous multi-view tracking using tensor voting. In IEEE Workshop on Motion and Video Computing, Orlando, Florida, December 2002.
  • Tao Zhao and Ram Nevatia. Stochastic Human Segmentation from a static camera. In IEEE Workshop on Motion and Video Computing, Orlando, Florida, December 2002.

Data and Formats