
Learning A Highly Structured Motion Model for 3D Human Tracking

Tao Zhao, Tianshu Wang and Harry Shum
taozhao (at) iris (o) usc (o) edu
ACCV 2002 (Asian Conference on Computer Vision), Melbourne, Australia, Jan., 2002.


This paper presents our work on learning high-level structure from human motion sequences and its application to human figure tracking. We use a structured representation of complex motion ("primitives" and their transitions) and propose a two-step unsupervised learning approach to recover the natural "primitives" from unsegmented 3D motion-capture sequences of complex human motion. The structure recovery is done under the MDL (minimum description length) paradigm. The learnt dynamic model of human motion is then used in the CONDENSATION framework to track human motion in a video sequence. Experimental results on ballet dancing sequences demonstrate that our approach works well. The learnt structure is also used to synthesize new video sequences.

Learning a Highly Structured Motion Model

Many kinds of complex motion are made up of "primitives". We try to recover this hidden structure from continuous motion-capture sequences with unsupervised learning. We assume neither homogeneity within a primitive nor a low-velocity point between two adjacent primitive instances.

The problem is analogous to the following one: you are given an article with all white space and punctuation marks removed, and you are asked to recover the vocabulary.
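The text analogy can be made concrete with a small sketch. It is not the method of the paper: it only compares two hand-picked candidate vocabularies on an unspaced string using a simplified two-part code (bits to spell the vocabulary plus bits to encode the text as a sequence of vocabulary entries). The names `segment`, `description_length` and `BITS_PER_CHAR`, the flat per-character cost, and the fewest-tokens segmentation rule are all assumptions made for illustration.

```python
import math
from collections import Counter

BITS_PER_CHAR = 5.0  # assumed flat cost of spelling one character in the lexicon


def segment(text, lexicon):
    """Split text into the fewest lexicon entries using dynamic programming."""
    n = len(text)
    max_len = max(len(w) for w in lexicon)
    best = [math.inf] * (n + 1)   # best[i]: fewest tokens covering text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last token
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if text[start:end] in lexicon and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start
    if math.isinf(best[n]):
        raise ValueError("text cannot be covered by this lexicon")
    tokens, end = [], n
    while end > 0:
        tokens.append(text[back[end]:end])
        end = back[end]
    return tokens[::-1]


def description_length(text, lexicon):
    """Two-part code length in bits: cost of the lexicon plus cost of the data."""
    tokens = segment(text, lexicon)
    counts = Counter(tokens)
    total = len(tokens)
    # Data cost: each token costs -log2 of its empirical frequency.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    # Lexicon cost: spell out each used entry plus one terminator symbol.
    lexicon_bits = sum((len(w) + 1) * BITS_PER_CHAR for w in counts)
    return lexicon_bits + data_bits, tokens


text = "thecatsat" * 4  # an "article" with the white space removed
dl_letters, _ = description_length(text, set(text))            # single letters only
dl_words, tokens = description_length(text, {"the", "cat", "sat"})
print(tokens[:3], round(dl_letters, 1), round(dl_words, 1))
```

On this toy string the word vocabulary yields a shorter total description than the single-letter vocabulary, which is the sense in which a minimum description length criterion prefers the "natural" units.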

The structure discovery/recovery problem is solved by finding the minimum description length (MDL) of the article/motion sequences as follows:

We apply the method to the arm motion of ballet dancing and obtain the following structure. Each ellipse is a "word", or motion primitive. The recovered structure corresponds very well to human knowledge of ballet (4 standard poses and the transitions among them).

Final segmentation and labeling of a motion sequence:

Applying the Learnt Motion Model for Condensation Tracking

The skeletons from both the original viewpoint and an overhead viewpoint are shown. In 3D tracking, it is important to show the results from a different viewpoint so that errors due to projection (e.g., reflective ambiguity) can be seen.
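For readers unfamiliar with Condensation tracking, its select / predict / measure loop can be sketched on a toy problem. This is a generic illustration, not the tracker used in the paper: the state here is a single scalar "joint angle", and the `predict` and `likelihood` functions are hypothetical stand-ins (constant drift with Gaussian noise, and a Gaussian observation model). In the setting described on this page, the prediction step is where a learnt motion model would be plugged in.

```python
import math
import random


def condensation_step(particles, weights, predict, likelihood, observation, rng):
    """One select / predict / measure iteration over a weighted particle set."""
    # Select: resample with replacement in proportion to the current weights.
    selected = rng.choices(particles, weights=weights, k=len(particles))
    # Predict: push every sample through the stochastic dynamic model.
    predicted = [predict(p, rng) for p in selected]
    # Measure: reweight each sample by how well it explains the observation.
    new_weights = [likelihood(observation, p) for p in predicted]
    total = sum(new_weights)
    if total == 0.0:
        # Fall back to uniform weights if every likelihood underflowed.
        new_weights = [1.0 / len(predicted)] * len(predicted)
    else:
        new_weights = [w / total for w in new_weights]
    return predicted, new_weights


def predict(angle, rng):
    # Hypothetical dynamics: the angle advances by 1 unit per frame plus noise.
    return angle + 1.0 + rng.gauss(0.0, 0.2)


def likelihood(observed, angle):
    # Hypothetical Gaussian observation model with standard deviation 0.5.
    return math.exp(-0.5 * ((observed - angle) / 0.5) ** 2)


rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for frame in range(1, 21):
    observation = float(frame)  # synthetic, noise-free measurement
    particles, weights = condensation_step(
        particles, weights, predict, likelihood, observation, rng)

# Weighted mean of the particle set as the state estimate for the last frame.
estimate = sum(p * w for p, w in zip(particles, weights))
print(round(estimate, 2))
```

The better the prediction step matches the true motion, the fewer particles are wasted on implausible states, which is the motivation for using a learnt dynamic model in this step.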

Frontal sequence: AVI file is available here (386K).

45 degree side sequence: AVI file is available here (297K).

The results from the frontal sequence rendered from different viewpoints:

Synthesis result is here (410K).

Download PDF file of the paper (6-page conference version).