Model-based Segmentation and Tracking of Multiple Humans in Complex Situations

Tao Zhao


Automatically detecting and tracking people from a stationary video camera is important for many applications. The problem is difficult mainly because of temporary and persistent occlusion among multiple people and because of noise from various sources (e.g., shadows). We propose to tackle these challenges by applying general, broadly applicable constraints in the form of models. In particular, we use a background appearance model to direct attention to image regions that differ from the background. Unlike most previous work, we use an explicit human shape model as the unit of analysis in segmentation and tracking, which counters the ambiguities of low-level processing. The camera model and known site geometry (e.g., a ground plane) provide geometric constraints. We exploit the strong regularity of human locomotion to assist in estimating articulated body postures.
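The abstract does not say what form the background appearance model takes. As a minimal sketch, assuming a grayscale camera and a single Gaussian per pixel (a common generic choice, not necessarily the one used in this work), a pixel is flagged as foreground when it deviates from the running background mean by more than k standard deviations. The class name `GaussianBackgroundModel` and the parameters `alpha`, `k`, and `min_var` are hypothetical.

```python
import numpy as np


class GaussianBackgroundModel:
    """Hypothetical per-pixel Gaussian background model for grayscale frames."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, min_var=4.0):
        frame = np.asarray(first_frame, dtype=np.float64)
        self.mean = frame.copy()                 # running background mean per pixel
        self.var = np.full_like(frame, min_var)  # running background variance per pixel
        self.alpha = alpha      # learning rate for background updates
        self.k = k              # foreground threshold, in standard deviations
        self.min_var = min_var  # floor that keeps the variance from collapsing to zero

    def foreground_mask(self, frame):
        """True where the pixel differs from the background by more than k sigma."""
        diff = np.asarray(frame, dtype=np.float64) - self.mean
        return diff ** 2 > (self.k ** 2) * self.var

    def update(self, frame):
        """Classify the frame, then adapt the model using background pixels only."""
        frame = np.asarray(frame, dtype=np.float64)
        mask = self.foreground_mask(frame)
        bg = ~mask
        diff = frame - self.mean  # computed before the mean is changed
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = np.maximum(
            (1.0 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2,
            self.min_var,
        )
        return mask
```

Only pixels classified as background update the model, so a person who stands still is not immediately absorbed into the background; the flagged regions are where a shape model would then be fitted.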

First, we describe a system for model-based segmentation and tracking that uses a simple shape model. Segmentation is performed using direct image features. Multi-human tracking is factored into matching people one at a time according to their depth order, which is inferred from the scene geometry. The result is a real-time system that handles temporary severe occlusion as well as persistent occlusion within small groups of people.
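The depth-ordering step can be illustrated with a small sketch. It assumes a flat ground plane and a camera tilted down toward it, so that a person whose feet appear lower in the image is closer to the camera. People are processed nearest first, and each one is credited only with the pixels not already claimed by someone closer. The `Person` box representation and the functions `depth_order` and `visible_pixels` are hypothetical, and axis-aligned boxes stand in for the shape model.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical axis-aligned image box standing in for a shape model."""
    pid: int
    x0: int  # left column (inclusive)
    y0: int  # top row (inclusive)
    x1: int  # right column (exclusive)
    y1: int  # bottom row (exclusive); approximates where the feet touch the ground


def depth_order(people):
    """Sort nearest first.

    Assumption: flat ground plane and a camera tilted down toward it, so a
    larger foot row (lower in the image) means closer to the camera.
    """
    return sorted(people, key=lambda p: p.y1, reverse=True)


def visible_pixels(people):
    """Assign pixels nearest first; a person keeps only pixels that no
    closer person has already claimed. Returns {pid: visible pixel count}."""
    claimed = set()
    visible = {}
    for p in depth_order(people):
        box = {(x, y) for x in range(p.x0, p.x1) for y in range(p.y0, p.y1)}
        own = box - claimed
        visible[p.pid] = len(own)
        claimed |= box
    return visible
```

In a complete tracker the visible region itself, not just its pixel count, would be handed to the matcher for the next person in the order; the counts here simply show how the evidence available for a farther person shrinks when a nearer one occludes it.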

This simple approach may become ineffective as the number of people and the amount of occlusion increase. We therefore formulate the model-based segmentation and tracking problem in a Bayesian framework. The optimal solution is defined explicitly in terms of the Bayesian posterior probability over a joint-object space. The solution in this complex, high-dimensional space is computed with a Markov chain Monte Carlo (MCMC)-based method. The computational approach also takes advantage of domain knowledge, used as importance proposal probabilities, to direct the Markov chain intelligently and obtain significantly faster convergence. The new formulation is more general and also applies when a larger group of people moves together.
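The role of domain knowledge as an importance proposal can be illustrated with a one-dimensional toy Metropolis-Hastings sampler. The proposal mixes a generic random walk with a data-driven component that jumps near hypothetical detector "hints"; because this proposal is not symmetric, the acceptance ratio uses the full mixture density in both directions. The target, the hint locations, and the names `mh_data_driven`, `w_rw`, `rw_sigma`, and `hint_sigma` are assumptions for illustration. The work itself operates in a joint multi-person state space, which this sketch does not model.

```python
import math
import random


def log_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mh_data_driven(log_post, hints, x0, n_iter=5000, w_rw=0.5,
                   rw_sigma=0.5, hint_sigma=0.3, seed=0):
    """Metropolis-Hastings with a random-walk + data-driven mixture proposal.

    Returns the highest-posterior state visited and the full chain.
    Requires 0 < w_rw < 1 so that both proposal components are used.
    """
    rng = random.Random(seed)
    log_w_rw = math.log(w_rw)
    log_w_hint = math.log(1.0 - w_rw) - math.log(len(hints))

    def log_q(x_new, x_old):
        # Full proposal density q(x_new | x_old): random-walk term plus an
        # equal-weight Gaussian mixture centred on the hints.
        terms = [log_w_rw + log_normal(x_new, x_old, rw_sigma)]
        terms += [log_w_hint + log_normal(x_new, h, hint_sigma) for h in hints]
        return log_sum_exp(terms)

    x, lp = x0, log_post(x0)
    best_x, best_lp = x, lp
    chain = []
    for _ in range(n_iter):
        if rng.random() < w_rw:
            x_new = rng.gauss(x, rw_sigma)  # local random-walk move
        else:
            x_new = rng.gauss(rng.choice(hints), hint_sigma)  # data-driven jump
        lp_new = log_post(x_new)
        log_alpha = lp_new - lp + log_q(x, x_new) - log_q(x_new, x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x, lp = x_new, lp_new
            if lp > best_lp:
                best_x, best_lp = x, lp
        chain.append(x)
    return best_x, chain


def toy_log_post(x):
    """Toy unnormalized log-posterior with a single sharp mode at 3.0."""
    return -0.5 * ((x - 3.0) / 0.2) ** 2


best, chain = mh_data_driven(toy_log_post, hints=[3.1, -2.0], x0=0.0)
```

With one useful hint (3.1) and one spurious hint (-2.0), the chain reaches the mode near 3 quickly, while proposals drawn from the spurious hint are simply rejected. Because the proposal density is accounted for in the acceptance ratio, a poor hint costs time but does not bias the stationary distribution.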

We also propose a "tracking as recognition" approach in which body postures are estimated by recognizing the observed motion with respect to a locomotion model. This approach performs robustly on very challenging data.
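The "tracking as recognition" idea can be conveyed with a deliberately simplified sketch: a locomotion model is stored as one gait cycle of feature vectors with an associated posture at each phase sample, the recent observations are matched against every cyclic phase offset, and the posture at the best-matching phase is returned. The representation, the names `recognize_phase` and `estimate_posture`, and the assumption that the observed frame rate matches the model's phase sampling (no speed variation) are hypothetical simplifications, not the method of this work.

```python
import math


def recognize_phase(model_features, observed):
    """Find the gait phase that best explains the observed feature window.

    model_features: one feature vector per phase sample of a single gait cycle.
    observed: feature vectors of the most recent frames, oldest first.
    Returns (phase index of the newest frame, sum-of-squares matching cost).
    """
    n = len(model_features)
    best_start, best_cost = 0, math.inf
    for start in range(n):
        cost = 0.0
        for t, obs in enumerate(observed):
            ref = model_features[(start + t) % n]  # the cycle wraps around
            cost += sum((o - r) ** 2 for o, r in zip(obs, ref))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return (best_start + len(observed) - 1) % n, best_cost


def estimate_posture(model_features, model_postures, observed):
    """Read the posture off the model at the recognized phase."""
    phase, _ = recognize_phase(model_features, observed)
    return model_postures[phase]


# Toy cycle of four phases with 1-D features and labelled postures.
features = [[0.0], [1.0], [2.0], [3.0]]
postures = ["contact", "mid-stance", "push-off", "swing"]
print(estimate_posture(features, postures, [[2.0], [3.0], [0.0]]))  # contact
```

Because the whole observation window is matched at once, a single noisy frame shifts the cost of every candidate phase only slightly, which gives some intuition for why recognition against a strong motion model can be more robust than estimating each frame's posture independently.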
