Tracking and reconstruction of lip contour from a stereo pair of cameras
For my undergraduate thesis project at Tsinghua University, I proposed a method to track and reconstruct the lip contour of a human face from a pair of image sequences captured by stereo cameras. Both the 2D and the 3D shape of the lip contour are important for face synthesis and visual speech recognition.
![]() |
![]() |
| MPEG-4 face model | Standardized control points on the lip contour |
Lip contour tracking. The lip contour is difficult to extract and track because of its elastic shape and non-rigid motion. In the first frame of each image sequence, a color segmentation method based on Fisher linear discriminant classification is applied to the original image in the YUV color space to extract the bounding box of the lips. An Active Appearance Model (AAM) fitting algorithm is then applied to obtain the initial lip contour. In subsequent frames, the lip contour is updated by the active contour method (Snake), starting from its position in the previous frame. In this way, a set of contour points is extracted and tracked through the image sequence of each camera.
![]() |
| The stereo pair of lip contours (left: upper camera; right: lower camera) |
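The first-frame color segmentation can be illustrated with a small sketch. It is not the thesis code: it assumes a two-class Fisher discriminant trained on hand-labeled lip and skin pixels in YUV, a midpoint threshold between the projected class means, and a small ridge term added to the scatter matrix for numerical stability. All function names (`fit_fisher`, `is_lip`, `bounding_box`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (Y, U, V) values of one pixel


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def class_mean(samples: Sequence[Pixel]) -> List[float]:
    return [sum(s[i] for s in samples) / len(samples) for i in range(3)]


def scatter(samples: Sequence[Pixel], mu: Sequence[float]) -> List[List[float]]:
    """Scatter matrix: sum of outer products of the centered samples."""
    s = [[0.0] * 3 for _ in range(3)]
    for p in samples:
        d = [p[i] - mu[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                s[i][j] += d[i] * d[j]
    return s


def solve3(a: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve the 3x3 system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_fisher(lip: Sequence[Pixel], skin: Sequence[Pixel]) -> Tuple[List[float], float]:
    """Return the Fisher direction w = Sw^-1 (mu_lip - mu_skin) and a threshold."""
    mu_l, mu_s = class_mean(lip), class_mean(skin)
    s_l, s_s = scatter(lip, mu_l), scatter(skin, mu_s)
    # Within-class scatter plus a small ridge (assumption) to keep it invertible.
    s_w = [[s_l[i][j] + s_s[i][j] + (1e-6 if i == j else 0.0) for j in range(3)]
           for i in range(3)]
    w = solve3(s_w, [mu_l[i] - mu_s[i] for i in range(3)])
    # Midpoint between the projected class means (assumed decision rule).
    threshold = 0.5 * (dot(w, mu_l) + dot(w, mu_s))
    return w, threshold


def is_lip(pixel: Pixel, w: Sequence[float], threshold: float) -> bool:
    return dot(w, pixel) > threshold


def bounding_box(mask: Sequence[Sequence[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) of the True pixels, or None if empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))
```

In use, `is_lip` would be evaluated for every pixel of the first frame to build a binary mask, and `bounding_box` of that mask would give the region passed on to the AAM fitting step.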
Stereo reconstruction. The calibrated pair of stereo cameras is placed on a vertical line for better stereo matching performance. With this arrangement the epipolar lines are vertical and intersect the lip contour in both images. Corresponding points on the two contours are matched along the epipolar lines and triangulated into 3D points. This stereo matching process is repeated for every frame of both image sequences.
![]() |
| The lip contour intersected with epipolar lines. |
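The matching and triangulation step can be sketched as follows. This is a simplified illustration rather than the method as implemented in the thesis: it assumes the two images are already rectified so that epipolar lines are image columns, that both cameras share the same focal length `f` (in pixels) and principal point `(cx, cy)`, that image y grows downward, and that intersections are paired in top-to-bottom order. The function names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def intersect_vertical(contour: Sequence[Point2], x: float) -> List[float]:
    """Sorted y-coordinates where a closed polygonal contour crosses column x."""
    ys = []
    n = len(contour)
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        # Half-open test so a vertex lying exactly on the line is counted once.
        if (x0 <= x < x1) or (x1 <= x < x0):
            t = (x - x0) / (x1 - x0)
            ys.append(y0 + t * (y1 - y0))
    return sorted(ys)


def triangulate(x: float, y_upper: float, y_lower: float,
                f: float, baseline: float, cx: float, cy: float) -> Point3:
    """Depth from vertical disparity in the upper camera's frame: Z = f * B / d."""
    d = y_upper - y_lower
    if d <= 0:
        raise ValueError("non-positive disparity: point is not in front of both cameras")
    z = f * baseline / d
    return ((x - cx) * z / f, (y_upper - cy) * z / f, z)


def reconstruct(contour_upper: Sequence[Point2], contour_lower: Sequence[Point2],
                columns: Sequence[float], f: float, baseline: float,
                cx: float, cy: float) -> List[Point3]:
    """Match contour crossings column by column and triangulate each pair."""
    points = []
    for x in columns:
        ys_u = intersect_vertical(contour_upper, x)
        ys_l = intersect_vertical(contour_lower, x)
        if len(ys_u) != len(ys_l):
            continue  # ambiguous column, skipped in this sketch
        for y_u, y_l in zip(ys_u, ys_l):
            points.append(triangulate(x, y_u, y_l, f, baseline, cx, cy))
    return points
```

Calling `reconstruct` once per frame with the two tracked contours would give one 3D contour per time instant. With unrectified cameras, the same idea applies, but the epipolar lines would have to be computed from the calibration and the points triangulated from the full projection matrices.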
Finally, the 3D lip contour at each time instant is reconstructed and rendered with OpenGL:
![]() |
![]() |
| Static lip contour | Moving lip contour |