Abstract
We address the problem of learning view-invariant 3D models of human motion from motion capture data in order to recognize human actions in monocular video sequences captured from arbitrary viewpoints. We propose a Spatio-Temporal Manifold (STM) model to analyze nonlinear multivariate time series with latent spatial structure, and apply it to recognize actions in the joint-trajectory space. Based on STM, we propose a novel alignment algorithm, Dynamic Manifold Warping (DMW), and a robust motion similarity metric for human action sequences, in both 2D and 3D. DMW extends previous work on spatio-temporal alignment by incorporating manifold learning. We evaluate the approach and compare it with state-of-the-art methods on motion capture data and realistic videos. Experimental results demonstrate its effectiveness: it yields visually appealing alignment results, achieves higher action recognition accuracy, and recognizes actions from arbitrary views under partial occlusion.
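The abstract does not spell out the DMW algorithm itself. As a point of reference only, the sketch below shows classic dynamic time warping (DTW), the temporal alignment that Canonical Time Warping (listed in the references) extends and that DMW's name suggests it builds on, used for 1-nearest-neighbour action recognition. It is not the authors' DMW: it has no manifold learning step and no view invariance. The sequences are assumed to be NumPy arrays of shape (frames, features), e.g. flattened joint coordinates per frame; `dtw`, `classify`, and the length-normalised cost used as a similarity score are hypothetical illustrations, not the paper's metric.

```python
import numpy as np


def dtw(x: np.ndarray, y: np.ndarray) -> tuple[float, list[tuple[int, int]]]:
    """Classic DTW between sequences x (n, d) and y (m, d).

    Returns the accumulated alignment cost and the warping path as a
    list of (frame_in_x, frame_in_y) pairs.
    """
    n, m = len(x), len(y)
    # Euclidean distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated cost; the infinite border forces the path to start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last frame pair to recover the warping path
    # (ties prefer the diagonal step because it is listed first).
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: acc[c])
    path.reverse()
    return float(acc[n, m]), path


def classify(query: np.ndarray, templates: list[tuple[str, np.ndarray]]) -> str:
    """Hypothetical 1-NN classifier: score = DTW cost / warping-path length."""
    best_label, best_score = "", np.inf
    for label, seq in templates:
        cost, path = dtw(query, seq)
        score = cost / len(path)
        if score < best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy 2-D "trajectories": a circular wave and a straight kick.
    t = np.linspace(0, 2 * np.pi, 40)
    wave = np.stack([np.sin(t), np.cos(t)], axis=1)
    kick = np.stack([t / t.max(), np.zeros_like(t)], axis=1)
    # The same wave performed more slowly (60 frames instead of 40).
    t_slow = np.linspace(0, 2 * np.pi, 60)
    query = np.stack([np.sin(t_slow), np.cos(t_slow)], axis=1)
    print(classify(query, [("wave", wave), ("kick", kick)]))  # prints "wave"
```

Plain DTW only warps time: it compares raw feature vectors frame by frame, so by itself it gives no robustness to viewpoint changes. Per the abstract, DMW goes further by incorporating manifold learning into the alignment.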
Publications
Dian Gong and Gerard Medioni, "Dynamic Manifold Warping for View Invariant Action Recognition", Proc. of the IEEE 13th International Conference on Computer Vision (ICCV 2011), Barcelona, Spain, November 2011.
Selected References
Feng Zhou and Fernando De la Torre, "Canonical Time Warping for Alignment of Human Behavior", Advances in Neural Information Processing Systems (NIPS 2009), Vancouver, B.C., Canada, December 2009.
Eugene Hsu, Kari Pulli and Jovan Popovic, "Style Translation for Human Motion", Computer Graphics and Interactive Techniques Conference (SIGGRAPH 2005), Los Angeles, USA, August 2005.
Jennifer Listgarten, Radford M. Neal, Sam T. Roweis and Andrew Emili, "Multiple Alignment of Continuous Time Series", Advances in Neural Information Processing Systems (NIPS 2004), Vancouver, B.C., Canada, December 2004.
Imran N. Junejo, Emilie Dexter, Ivan Laptev and Patrick Perez, "View-Independent Action Recognition from Temporal Self-Similarities", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 33(1), pp. 172-185, 2011.
Weikai Liao and Gerard Medioni, "3D Face Tracking and Expression Inference from a 2D Sequence Using Manifold Learning", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), Anchorage, Alaska, USA, June 2008.
Philippos Mordohai and Gerard Medioni, "Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting", Journal of Machine Learning Research (JMLR), 11, pp. 1-40, January 2010.
Fengjun Lv and Ramakant Nevatia, "Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching", IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, Minnesota, USA, June 2007.
Acknowledgements
This work is supported in part by NIH Grant EY016093. We would like to thank Yan Liu, Vivek Kumar Singh, Yann Dumortier and Sikai Zhu for their helpful discussions. We would also like to thank Fernando De la Torre, Feng Zhou, David Ross, Michael J. Black and Leonid Sigal for sharing their code and datasets.

