# Model-based Segmentation and Tracking of Multiple Humans in Complex Situations

### Abstract

Automatically detecting and tracking people from a stationary video camera is
important for many applications. The problem is difficult mainly because of
temporary or persistent occlusion among multiple people and noise from
various sources (e.g., shadows). We propose to tackle the challenges using
applicable and general constraints in the form of models. In particular, we
make use of a background appearance model to direct the attention to the
image regions that differ from the background. Unlike most previous work,
we use an explicit human shape model as the entity of analysis in
segmentation and tracking, which counters the ambiguities of low-level
processing. The camera model and known site geometry (e.g., a ground plane)
provide geometric constraints. We use the strong regularity of human
locomotion to assist the estimation of articulated body postures.

First, we describe a system for model-based segmentation and tracking in
which a simple shape model is used. Segmentation is done by using direct image
features. Multi-human tracking is factored into matching people one by one
according to their depth order inferred from the geometry. This results in a
real-time system effective for temporary severe occlusion and persistent
occlusion of small groups of people.
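
The depth-ordering step can be illustrated with a minimal sketch. It assumes each person's foot position in the image is known, that a 3x3 image-to-ground homography `H` is available from the camera model, and that people are processed nearest to the camera first. The function names, the homography, the camera position, and the nearest-first order are all assumptions made for illustration, not details given in this abstract.

```python
from math import hypot

def image_to_ground(H, u, v):
    """Map image point (u, v) to ground-plane coordinates using a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

def depth_order(feet, H, camera_xy):
    """Return indices of people sorted by ground distance to the camera, nearest first."""
    def distance(i):
        gx, gy = image_to_ground(H, *feet[i])
        return hypot(gx - camera_xy[0], gy - camera_xy[1])
    return sorted(range(len(feet)), key=distance)

# Toy data (hypothetical): identity homography, camera at the ground-plane origin.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
feet = [(4.0, 3.0), (1.0, 0.0), (0.0, 2.0)]
order = depth_order(feet, H, (0.0, 0.0))
```

A tracker built this way would match the person at `order[0]` first, since that person is the least likely to be occluded, and then handle the remaining people in turn.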

The simple approach may not be effective when the number of people and the
amount of occlusion increase. We formulate the model-based segmentation and
tracking problem under the Bayesian framework. The optimal solutions are
defined explicitly as the Bayesian posterior probability in a joint-object
space. The solution in this complex high-dimensional space is computed by a
Markov chain Monte Carlo (MCMC)-based method. The computational approach
also takes advantage of domain knowledge as importance proposal
probabilities to direct the Markov chain intelligently to obtain
significantly faster convergence. The new formulation is more general and
also applies to larger groups of people moving together.
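
The role of informed proposals in MCMC can be sketched with a toy Metropolis-Hastings sampler. The one-dimensional Gaussian posterior below is a stand-in for the joint-object posterior, and `hint` stands in for a hypothetical cue derived from domain knowledge. The proposal mixes a local random walk with draws centered on the hint. All names and numeric values are assumptions for illustration and do not reproduce the method described here.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def log_posterior(x):
    # Toy unnormalized log posterior: Gaussian with mean 3.0 and std 0.5.
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def proposal_pdf(x_new, x_old, hint):
    # Mixture density: half local random walk, half hint-centered proposal.
    return 0.5 * normal_pdf(x_new, x_old, 0.3) + 0.5 * normal_pdf(x_new, hint, 1.0)

def propose(x_old, hint, rng):
    if rng.random() < 0.5:
        return rng.gauss(x_old, 0.3)   # local refinement
    return rng.gauss(hint, 1.0)        # jump guided by the hint

def metropolis_hastings(x0, hint, n_steps, seed=0):
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x_new = propose(x, hint, rng)
        # Hastings ratio includes the proposal densities in both directions.
        log_ratio = (log_posterior(x_new) - log_posterior(x)
                     + math.log(proposal_pdf(x, x_new, hint))
                     - math.log(proposal_pdf(x_new, x, hint)))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_new
        samples.append(x)
    return samples

# Start far from the mode; the hint-centered proposals pull the chain in quickly.
samples = metropolis_hastings(x0=-10.0, hint=2.5, n_steps=5000)
kept = samples[1000:]                  # discard burn-in
mean_estimate = sum(kept) / len(kept)
```

Because the acceptance test uses the full mixture density in both directions, the chain still targets the stated posterior. The hint only changes how quickly the chain reaches the high-probability region.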

We also propose a "tracking as recognition" approach where the estimation of
body postures is accomplished by recognizing the motion in a locomotion
model. This approach yields robust performance on very challenging data.
