Moving Object Detection

Overview: Object Detection and Tracking

Background modeling approaches yield a set of regions in each frame representing the moving objects in the scene. Inferring object trajectories requires matching the detected regions and an efficient representation that supports multi-object tracking. We use an attributed graph structure in which nodes represent the detected moving regions and edges represent the relationship between two moving regions detected in two separate frames. Each newly processed frame generates a set of regions corresponding to the detected moving objects. We search for similarities between the newly and previously detected objects to define connections between nodes; such connections can be established through different approaches, such as template matching or correlation. The proposed tracker handles partial occlusions and stop-and-go motion in very challenging situations.
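As a rough illustration, the graph construction described above could be sketched as follows. The region attributes (a bounding box and a gray-level histogram), the normalized-correlation similarity, and the 0.8 threshold are assumptions chosen for the example, not the project's actual parameters:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RegionNode:
    """A detected moving region: one node of the attributed graph."""
    frame: int
    region_id: int
    bbox: tuple       # (x, y, w, h), illustrative attribute
    histogram: tuple  # appearance feature, e.g. a gray-level histogram

def correlation(h1, h2):
    """Normalized correlation between two histograms, in [0, 1]."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    return dot / (n1 * n2) if n1 and n2 else 0.0

class TrackGraph:
    """Attributed graph: nodes are detected regions, edges connect
    similar regions detected in two separate frames."""
    def __init__(self, threshold=0.8):
        self.nodes = []
        self.edges = []  # (older_node, newer_node, similarity)
        self.threshold = threshold

    def add_frame(self, frame_idx, regions):
        """Add the regions of a newly processed frame and connect each
        to similar previously detected regions."""
        new_nodes = [RegionNode(frame_idx, i, bbox, tuple(hist))
                     for i, (bbox, hist) in enumerate(regions)]
        for new in new_nodes:
            for old in self.nodes:
                if old.frame < new.frame:
                    s = correlation(old.histogram, new.histogram)
                    if s >= self.threshold:
                        self.edges.append((old, new, s))
        self.nodes.extend(new_nodes)
        return new_nodes
```

A region that matches nothing simply starts a new connected component, which is how a newly appearing object enters the graph.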

As new frames are acquired and processed, we incrementally construct the graph representation of moving objects. Deriving the object trajectories from the graph and from the newly detected regions amounts to extracting a path along each connected component of the graph. More details are found under the Motion Detection and Motion Grouping pages and in the 2008 PAMI paper: Inferring Segmented Dense Motion Layers Using 5D Tensor Voting.
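The path extraction can be sketched minimally as follows; nodes are simplified to (frame, region_id) tuples, and ordering a component by frame index stands in for the actual path extraction:

```python
from collections import defaultdict

def trajectories(nodes, edges):
    """Group nodes into connected components and return each component's
    nodes in temporal order, one trajectory per component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, paths = set(), []
    for start in nodes:
        if start in seen:
            continue
        # traverse one connected component
        component, stack = [], [start]
        seen.add(start)
        while stack:
            n = stack.pop()
            component.append(n)
            for m in adjacency[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        # a trajectory is the component's nodes in frame order
        paths.append(sorted(component))
    return paths
```

An isolated node yields a single-frame trajectory, i.e. an object that was detected once and never matched again.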

Detection of moving blobs from moving camera

Segmentation and tracking of multiple humans in crowded situations is made difficult by inter-object occlusion. We propose a model-based approach that interprets the image observations as multiple, partially occluded human hypotheses in a Bayesian framework. More details are found under the Moving Blob from Moving Camera page and in the 2008 PAMI paper: Segmentation and Tracking of Multiple Humans in Crowded Environments.

Single Frame Detection

Humans are represented by parts: head-shoulder, torso, legs, and full body. Each part detector is based on a combination of learned local features, trained on multiple (4) views. The posture is assumed to be upright (sitting, standing, or walking). Learning uses an AdaBoost technique on edgelet features. Occlusion analysis and a joint likelihood function are used to combine the part responses. More details are found under the Single Frame Detection page and in the 2005 ICCV paper: Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors.
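To illustrate why a joint likelihood over parts tolerates partial occlusion, consider the sketch below. The part names follow the text; the per-part scores and the rule of simply skipping occluded parts are simplifying assumptions for the example, not the paper's exact formulation:

```python
import math

# The four part detectors named in the text.
PARTS = ("head_shoulder", "torso", "legs", "full_body")

def joint_log_likelihood(scores, visible):
    """Combine part-detector responses for one human hypothesis.
    Occluded parts (visible[p] is False) are treated as unobserved and
    skipped rather than penalized, so a partially occluded human whose
    visible parts respond strongly still scores well."""
    total = 0.0
    for part in PARTS:
        if visible[part]:
            # clamp to avoid log(0) on a hard negative response
            total += math.log(max(scores[part], 1e-6))
    return total
```

With this rule, a hypothesis whose legs are hidden behind another person is not punished for the weak legs response, which is the intuition behind combining part detectors with occlusion analysis.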

Human Detection by Searching in 3D Space

We developed a 3D search method that accounts for perspective distortion and the tilted orientations of humans in the image. More details are found under the Human Detection in 3D page and in the 2008 ICPR paper: Human detection by searching in 3d space using camera and scene knowledge.