Motion regions (or layers) do not always correspond to objects. A single object may split into multiple blobs, or multiple objects may merge into a single blob. Shape (and motion) models can help detect and track individual objects within motion blobs.
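The split and merge cases can be illustrated with a toy example. The sketch below is not the system described on this page; it assumes a binary motion mask stored as nested lists, and the helper name `label_blobs` and the masks themselves are made up for illustration. It labels 4-connected foreground regions and shows that the blob count need not match the object count.

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected foreground regions in a binary motion mask.

    Returns (number_of_blobs, label_image). Hypothetical helper for illustration.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                # Unlabeled foreground pixel: start a new blob and flood-fill it.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return count, labels


# Two objects standing apart: two blobs, as hoped.
apart = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# The same two objects touching: their motion regions merge into one blob.
touching = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# One object whose middle was missed by motion detection: it splits into two blobs.
split = [
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

for name, mask in (("apart", apart), ("touching", touching), ("split", split)):
    print(name, label_blobs(mask)[0])
```

In the `touching` and `split` cases the blob count alone gives the wrong number of objects, which is the gap that shape and motion models are meant to close.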
Motion blob detection is not always reliable, especially with camera motion or illumination changes. Shape models can be used for direct detection.
We use shape-based methods to enable extraction under widely varying viewpoints and occlusions. The part-based representation is robust to articulation and view changes and is necessary for recognition under partial occlusion. The Hierarchical Tracking page presents results for model-based recognition and tracking under widely varying viewpoints. See the Hierarchical Tracking and Viewpoint Variations pages and the 2008 CVPR paper "Segmentation of multiple, partially occluded objects by grouping, merging, assigning part detection responses."
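One way to see why a part-based representation helps under partial occlusion is the toy scoring rule below. It is only a sketch under stated assumptions, not the method of the cited paper: the part names, the confidence values, and the function `combined_score` are all hypothetical, and the rule simply averages detector confidences over the parts believed to be visible.

```python
def combined_score(part_scores, visible, min_visible=1):
    """Average part-detector confidence over the parts marked visible.

    Occluded parts are ignored rather than counted as evidence against
    the object. Hypothetical scoring rule for illustration only.
    """
    used = [score for name, score in part_scores.items() if visible.get(name, False)]
    if len(used) < min_visible:
        return 0.0
    return sum(used) / len(used)


# Assumed detector confidences: the legs are hidden behind another object,
# so the leg detector responds weakly.
part_scores = {"head_shoulder": 0.9, "torso": 0.8, "legs": 0.1}

full_view = {"head_shoulder": True, "torso": True, "legs": True}
legs_occluded = {"head_shoulder": True, "torso": True, "legs": False}

print(combined_score(part_scores, full_view))      # treats weak legs as counter-evidence
print(combined_score(part_scores, legs_occluded))  # scores only the visible parts
```

If the occlusion is known or inferred, the weak response from the hidden part no longer drags the hypothesis down, whereas a whole-object score would be penalized for it.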
When humans interact, they must be considered jointly. This requires defining a joint state over multiple human hypotheses and a joint image likelihood for those humans. Detection and tracking are then performed by searching the hypothesis space for the best interpretation of the image. For more details, see the Finding Objects in Spatio-Temporal Context page or the 2008 papers "Human detection by searching in 3d space using camera and scene knowledge" and "Key Object Driven Multi-category Object Recognition, Localization and Tracking Using Spatio-temporal Context" (ECCV).
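The idea of scoring a joint state against the image, rather than scoring each human independently, can be sketched in one dimension. Everything here is an assumption made for illustration: a hypothesis is a hypothetical `(x, width)` column interval, the "image" is a row of foreground flags, the likelihood is a simple agreement count with a per-human penalty, and the search is a plain greedy loop rather than the search procedure used in the cited work.

```python
def render(hypotheses, width):
    """Union of the image columns covered by all hypothesized humans."""
    covered = [0] * width
    for x, w in hypotheses:
        for i in range(max(0, x), min(width, x + w)):
            covered[i] = 1
    return covered


def joint_score(hypotheses, observed, penalty=1.0):
    """Toy joint log-likelihood: columns where the rendered joint state agrees
    with the observed foreground, minus a penalty per hypothesized human."""
    rendered = render(hypotheses, len(observed))
    agree = sum(1 for r, o in zip(rendered, observed) if r == o)
    return agree - penalty * len(hypotheses)


def greedy_search(candidates, observed):
    """Grow the joint state one hypothesis at a time while the score improves."""
    state = []
    best = joint_score(state, observed)
    improved = True
    while improved:
        improved = False
        best_cand = None
        for cand in candidates:
            if cand in state:
                continue
            score = joint_score(state + [cand], observed)
            if score > best:
                best, best_cand, improved = score, cand, True
        if improved:
            state.append(best_cand)
    return state, best


# One merged foreground blob spanning columns 1..6 of a 10-column image.
observed = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
# Candidate humans, each 4 columns wide (assumed known from scene geometry).
candidates = [(1, 4), (3, 4), (6, 4)]

state, score = greedy_search(candidates, observed)
print(state, score)
```

Neither `(1, 4)` nor `(3, 4)` explains the blob alone, but together they account for every column, so the joint state with two overlapping humans scores highest even though only one blob was observed.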
A variety of general results are shown on the Surveillance Tracking Results page. The tracking system alone is not the final result; it is important for its use in Event Recognition. More examples and results are given on the Event Recognition Results page.