Visual monitoring of scenes has received substantial attention in recent years, due to the availability of low-cost video cameras and the increased demand for security in daily life. The resulting volume of video data has become a concern and has heightened the need for automatic methods of processing the video output.

At IRIS-USC, our goal is to develop a formalism for automatic human activity recognition from video sequences, to augment the efficiency of human monitors. Our focus is on understanding a variety of human motions, including simple movements of body parts, human gestures, and large-scale activities.

  • We propose to model activities using a transparent, hierarchical activity representation, whereby a complex activity is defined as a combination of simpler activities conditioned on various temporal constraints.
  • Video data to be processed is often noisy and unsegmented, making it challenging to robustly infer an event. Our approach is to recognize activities by verifying temporal constraints in a Bayesian framework.
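The hierarchical representation above can be sketched as a simple data structure: a composite event holds sub-events plus temporal constraints between them. The class and event names below are illustrative assumptions, not the actual formalism, and the check shown is a hard (deterministic) constraint test; the Bayesian framework described above would instead score constraint satisfaction probabilistically to cope with noisy detections.

```python
from dataclasses import dataclass

@dataclass
class PrimitiveEvent:
    """A detected simple activity with (possibly noisy) start/end times."""
    name: str
    start: float
    end: float

@dataclass
class CompositeEvent:
    """A complex activity defined from simpler ones plus temporal constraints."""
    name: str
    sub_events: list   # list of PrimitiveEvent
    constraints: list  # list of (name_a, relation, name_b) triples

    def satisfied(self) -> bool:
        """Verify every temporal constraint (hard, non-probabilistic check)."""
        by_name = {e.name: e for e in self.sub_events}
        for a, rel, b in self.constraints:
            ea, eb = by_name[a], by_name[b]
            if rel == "before" and not ea.end <= eb.start:
                return False
            if rel == "during" and not (eb.start <= ea.start and ea.end <= eb.end):
                return False
        return True

# Hypothetical example: a two-level event built from simpler activities.
approach = CompositeEvent(
    "approach_and_open_door",
    sub_events=[PrimitiveEvent("walk_to_door", 0.0, 3.0),
                PrimitiveEvent("open_door", 3.2, 5.0)],
    constraints=[("walk_to_door", "before", "open_door")],
)
print(approach.satisfied())  # the sub-events respect the "before" constraint
```

Because the representation is transparent, new complex activities can be defined declaratively by listing sub-events and constraints, without retraining a monolithic model.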
We consider an event (or activity) to be composed of action threads. Based on the nature of the composite action threads, two types of events are defined:
  • Single-Thread Events correspond to events whose relevant actions occur along a linear time scale, as in the case when only one actor is present.
  • Multi-Thread Events correspond to two or more single-thread events related by logical and temporal constraints. Each thread in a multi-thread event may be performed by a different actor.
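The distinction between the two event types can be made concrete with a minimal sketch, assuming each action is a `(start, end)` interval: a single-thread event requires its actions to lie on one linear time scale (each ends before the next begins), while a multi-thread event relates whole threads, possibly from different actors, through temporal relations such as overlap. The function names and timings are hypothetical.

```python
def is_single_thread(actions):
    """True if the (start, end) actions form a linear, non-overlapping
    sequence, i.e. each action ends before the next one starts."""
    return all(a_end <= b_start
               for (_, a_end), (b_start, _) in zip(actions, actions[1:]))

def threads_overlap(t1, t2):
    """One possible temporal relation between two threads of a
    multi-thread event: their overall time spans intersect."""
    s1, e1 = t1[0][0], t1[-1][1]
    s2, e2 = t2[0][0], t2[-1][1]
    return s1 < e2 and s2 < e1

# Hypothetical detections for two actors.
actor_a = [(0.0, 2.0), (2.5, 4.0)]  # e.g. walk, then stop
actor_b = [(1.0, 3.0)]              # e.g. a second actor approaches

print(is_single_thread(actor_a))         # True: one linear time scale
print(threads_overlap(actor_a, actor_b)) # True: the two threads co-occur
```

A multi-thread event recognizer would combine such per-thread checks with the logical relations linking the threads, again within the Bayesian verification framework described above.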