I am interested in computer vision and related areas such as computer graphics, robotics, and machine learning. More specifically, my doctoral research focuses on facial gesture analysis: tracking, modeling, quantifying, and analyzing facial deformations for gesture understanding. It can be further divided into nonrigid facial deformation modeling and tracking, 3D head tracking, and automatic facial expression recognition. My publications are listed below.

Tracking and Modeling 3D Facial Expressions by Manifold Learning

We propose a person-dependent, manifold-based approach for modeling and tracking rigid and nonrigid 3D facial deformations from monocular video sequences. The rigid and nonrigid motions are analyzed simultaneously in 3D by automatically fitting and tracking a set of landmarks. Rather than representing all nonrigid facial deformations as a single complex manifold, we decompose them onto a basis of eight 1D manifolds. Each 1D manifold is learned offline from sequences of labeled basic/primitive expressions, such as smile, surprise, etc. Any expression is then a linear combination of values along these 8 axes, with the coefficients representing the levels of activation. We experimentally verify that expressions can indeed be represented this way, and that the individual manifolds are indeed 1D.
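To make the decomposition concrete, below is a minimal sketch of composing a face shape from activation levels along the eight axes. All names and dimensions are hypothetical, and each 1D manifold is linearized to a single peak-deformation vector for brevity; in the actual model, the mapping from activation level to deformation follows the learned manifold.

```python
import numpy as np

N_LANDMARKS = 50    # hypothetical number of tracked 3D landmarks
N_AXES = 8          # one axis per primitive expression (smile, surprise, ...)

# Placeholder basis: row k is the peak deformation (landmark displacements)
# of primitive expression k. In practice these come from the learned manifolds.
basis = np.random.randn(N_AXES, N_LANDMARKS * 3)
neutral_shape = np.random.randn(N_LANDMARKS * 3)  # placeholder neutral face

def compose_expression(activations):
    """Compose a 3D face shape as a linear combination of primitive
    deformations, weighted by the activation level on each axis."""
    deformation = activations @ basis   # (8,) @ (8, 3N) -> (3N,)
    return neutral_shape + deformation

# Mostly a smile, with a slight surprise component:
shape = compose_expression(np.array([0.8, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```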

The manifold dimensionality estimation, manifold learning, and manifold traversal operations are all implemented in the N-D Tensor Voting framework, which estimates the local tangent and normal spaces of each sample nonparametrically. The figures below show an example of Tensor Voting in 2D:

Red points are sampled from a curve in the 2D plane and perturbed by noise. The left figure shows the estimated tangent directions; the right one shows the saliency values.
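As an illustration, here is a minimal sketch of a simplified, first-pass (ball-voting) version of Tensor Voting in 2D. It omits the orientation-aware stick votes of the full framework: each point accumulates Gaussian-weighted outer products of the unit directions to its neighbors, the dominant eigenvector of the accumulated tensor approximates the local tangent, and the eigenvalue gap serves as the curve saliency.

```python
import numpy as np

def tensor_vote_2d(points, sigma=0.5):
    """Simplified first-pass tensor voting on an (n, 2) point set.
    Returns an estimated unit tangent and a curve-saliency value
    (lambda_1 - lambda_2) for every point."""
    n = len(points)
    tangents = np.zeros((n, 2))
    saliency = np.zeros(n)
    for i in range(n):
        T = np.zeros((2, 2))
        for j in range(n):
            if i == j:
                continue
            d = points[j] - points[i]
            r = np.linalg.norm(d)
            if r < 1e-12:
                continue
            # Gaussian-attenuated second-order vote along the direction d.
            T += np.exp(-(r / sigma) ** 2) * np.outer(d / r, d / r)
        w, v = np.linalg.eigh(T)     # eigenvalues in ascending order
        tangents[i] = v[:, 1]        # dominant direction ~ local tangent
        saliency[i] = w[1] - w[0]    # high on curves, low for isolated noise
    return tangents, saliency

# Noisy samples from a curve, as in the figures above:
xs = np.linspace(0.0, 2.0 * np.pi, 200)
pts = np.stack([xs, np.sin(xs)], axis=1) + 0.02 * np.random.randn(200, 2)
tangents, saliency = tensor_vote_2d(pts)
```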

This framework enables us to navigate directly on the manifold along the estimated tangent hyperplane. It serves as the building block of our inference methods and provides excellent robustness to noise and outliers.
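The traversal itself can be pictured as repeatedly projecting the desired displacement onto the locally estimated tangent. The sketch below assumes a hypothetical helper tangent_at that returns the unit tangent estimated by Tensor Voting at a given point (the 1D-manifold case):

```python
import numpy as np

def traverse(start, target, tangent_at, step=0.05, n_steps=100):
    """Walk on a 1D manifold toward a target by moving only along
    the locally estimated tangent direction at each step.
    `tangent_at` is a hypothetical helper backed by Tensor Voting."""
    x = np.array(start, dtype=float)
    for _ in range(n_steps):
        t = tangent_at(x)                          # local unit tangent
        x = x + step * np.dot(target - x, t) * t   # step along the tangent only
    return x
```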

Building on this manifold-based deformation model, we propose an iterative face-tracking algorithm. It tracks a rich representation of the face: the 3D head pose, the 3D nonrigid face shape, a labeled annotation of the expression with its probability, and the activation level.
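One plausible reading of such an iterative scheme is an alternation between rigid and nonrigid estimation. A minimal sketch, with every helper a hypothetical placeholder (the exact update rules in the actual algorithm differ):

```python
MAX_ITERS = 10  # hypothetical per-frame iteration budget

def track_frame(frame, pose, activations):
    """Alternate between rigid and nonrigid estimation for one frame:
    fix the expression and solve for the head pose, then fix the pose
    and update the activations on the learned 1D manifolds.
    All helpers are hypothetical placeholders."""
    for _ in range(MAX_ITERS):
        shape = compose_expression(activations)          # nonrigid 3D shape
        pose = estimate_rigid_pose(frame, shape, pose)   # e.g., landmark fitting
        residual = register_to_reference(frame, pose)    # cancel rigid motion
        activations = project_onto_manifolds(residual)   # manifold traversal
    return pose, activations
```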

Some tracking results (tracked face, probability, and activation level):

3D Head Tracking

We develop a hybrid 3D head tracker based on the integration of intensity- and feature-based constraints. To compare its performance with other state-of-the-art trackers, we conduct a series of evaluations using synthetic and real videos. The hybrid tracker is shown to be more robust than many existing trackers, especially under varying illumination and expression changes. It has been applied in a real-world HCI application with a near-infrared (IR) camera: an interactive virtual training environment in which the head pose is used to identify the subject's attention. IR lights illuminate the dimly lit environment to preserve immersion. In our experiments, several existing head trackers perform poorly in this setting due to the high noise level, low contrast, and poor image quality, whereas our hybrid tracker successfully tracks the head pose in real time.
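Conceptually, the hybrid tracker balances a photometric term against a feature-reprojection term. Below is a minimal sketch of such a combined objective; warp and project are hypothetical placeholders for the image-warping function and the camera model, and lam weights the two cues:

```python
import numpy as np

def hybrid_cost(pose, frame, template, pts_3d, pts_2d, lam=0.5):
    """Combined tracking objective (sketch): intensity-based term
    (squared difference between the pose-warped template and the
    observed frame) plus feature-based term (squared reprojection
    error of tracked feature points)."""
    photometric = np.sum((warp(template, pose) - frame) ** 2)
    feature = np.sum((project(pts_3d, pose) - pts_2d) ** 2)
    return photometric + lam * feature
```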

Theater environment for head tracking application (image courtesy of USC's ICT):

Some tracking results:


Automatic Facial Expression Recognition

Once the face is tracked in 3D, it can be registered and warped back to the reference frame. This cancels the coupled global head motion, and the residual motion on the face is the local nonrigid component. We divide the face into 9 regions, each characterizing a part of the face. The observed motion inside each region is more homogeneous, and we model it with an affine motion model. However, an expression is a holistic behavior of the face, and considering only local descriptions is not sufficient, since there exist interdependencies between face regions. Hence, we construct a latent-variable graphical model to capture these inter-region dynamics. The region-based face model characterizes the human face and has a natural connection to the graphical model. To infer the gestures, we learn a Bayesian classifier empirically based on this graph and use the maximum a posteriori (MAP) estimator for classification.
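As a simplified illustration of the MAP step, the sketch below scores each expression label from the per-region affine motion parameters. models[e][r] stands in for the learned likelihood of region r's motion under expression e (for instance, a scipy.stats frozen distribution); treating the regions as conditionally independent here flattens the latent-variable graph, which the full model does not do.

```python
import numpy as np

N_REGIONS = 9  # one affine motion model per face region

def classify_map(region_params, models, priors):
    """MAP expression classification (sketch).

    region_params: list of 9 affine parameter vectors, one per region.
    models[e][r]:  hypothetical likelihood model with a logpdf() method
                   for region r's motion under expression e.
    priors:        prior probability of each expression label.
    """
    log_post = np.log(np.asarray(priors, dtype=float))      # log p(e)
    for e in range(len(priors)):
        for r in range(N_REGIONS):
            log_post[e] += models[e][r].logpdf(region_params[r])
    return int(np.argmax(log_post))   # argmax_e log p(e | observed motion)
```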

An implementation of this work is currently part of a national traveling exhibition at the California Science Center. See http://www.fearexhibit.org/ for an introduction to the exhibition.

The setup of the automatic facial expression recognition system in the exhibition (image courtesy of the California Science Center):

Some screen shots of the implemented system:

The software module is also used for human-robot interaction in the Visual Sensing for Natural Human-Robot Interaction project.


Publications

Wei-Kai Liao and Gerard Medioni. 3D Face Tracking and Expression Inference from a 2D Sequence Using Manifold Learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008. (pdf and poster)

Wei-Kai Liao, Douglas Fidaleo, and Gerard Medioni. Integrating Multiple Visual Cues for Robust Real-Time 3D Face Tracking. In IEEE Workshop on Analysis and Modeling of Faces and Gestures, 2007. (pdf)

Wei-Kai Liao and Isaac Cohen. Belief Propagation Driven Method for Facial Gestures Recognition in Presence of Occlusions. In IEEE Workshop on Vision for Human Computer Interaction, 2006. (pdf)

Wei-Kai Liao and Isaac Cohen. Classifying Facial Gestures in Presence of Head Motion. In IEEE Workshop on Vision for Human Computer Interaction, 2005. (pdf)