Project: Context Tracker: Exploring Supporters and Distracters in Unconstrained Environments (Software Library Released)


Visual tracking in unconstrained environments is very challenging due to several sources of variation, such as appearance changes, varying lighting conditions, cluttered backgrounds, and frame cuts. A major cause of tracking failure is the emergence of regions with an appearance similar to the target's. The problem is even harder when the target leaves the field of view (FoV), which can lead the tracker to follow another, similar object and fail to reacquire the right target when it reappears. This paper presents a method that addresses this problem by exploiting context on the fly through two notions: distracters and supporters. Both are explored automatically using a sequential randomized forest, an online template-based appearance model, and local features. Distracters are regions that have an appearance similar to the target and consistently co-occur with it with high confidence scores; the tracker must keep tracking these distracters to avoid drifting. Supporters, on the other hand, are local key-points around the target with consistent co-occurrence and motion correlation over a short time; they play an important role in verifying the genuine target. Extensive experiments on challenging real-world video sequences show the tracking improvement obtained by using this context information. Comparisons with multiple state-of-the-art approaches are also provided.
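The distracter/supporter bookkeeping described above can be sketched roughly as follows. All names, data layouts, and thresholds here are illustrative assumptions, not the paper's actual API:

```python
# Minimal sketch: classify candidate regions as distracters (similar
# appearance + consistent co-occurrence) and keypoints as supporters
# (motion correlated with the target). Thresholds are made up.
SIM_THRESHOLD = 0.8      # appearance similarity for a distracter candidate
COOCCUR_FRAMES = 5       # frames of consistent co-occurrence required
MOTION_CORR = 0.9        # motion correlation required for a supporter

def update_context(target, candidates, keypoints, history):
    """Return (distracters, supporters) for the current frame."""
    distracters, supporters = [], []
    for region in candidates:
        if region is target:
            continue
        # Distracters: regions that look like the target and keep appearing.
        if region["similarity"] >= SIM_THRESHOLD:
            history[region["id"]] = history.get(region["id"], 0) + 1
            if history[region["id"]] >= COOCCUR_FRAMES:
                distracters.append(region)
    for kp in keypoints:
        # Supporters: keypoints whose motion correlates with the target's.
        if kp["motion_corr"] >= MOTION_CORR:
            supporters.append(kp)
    return distracters, supporters
```

In the spirit of the paper, the distracters would themselves be tracked so the main tracker does not jump onto them, while the supporters vote on which candidate is the genuine target.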


T. B. Dinh, N. Vo, and G. Medioni. Context Tracker: Exploring Supporters and Distracters in Unconstrained Environments. To appear in CVPR 2011. [pdf]

We are pleased to announce that the software library (DLL) of our Context Tracker was released today (09/07/2011). Instructions are included in the package in the readme.txt file. Please send us your feedback and comments.

To download the software library click here.

Project: High Resolution Face Sequences from a PTZ Network Camera



We propose to acquire high-resolution sequences of a person’s face using a pan-tilt-zoom (PTZ) network camera. This capability should prove helpful in forensic analysis of video sequences, as frames containing faces are tagged and, within a frame, windows containing faces can be retrieved. The system starts in pedestrian detection mode, where the lens angle is set widest and people are detected using a pedestrian detector module. The camera then changes to region-of-interest (ROI) focusing mode, where the parameters are automatically tuned to put the upper body of the detected person, where the face should appear, in the field of view (FOV). Then, in face detection mode, the face is detected using a face detector module, and the system switches to an active tracking mode consisting of a control loop that actively follows the detected face with two modules: a tracker to track the face in the image, and a camera control module to adjust the camera parameters. During this loop, our tracker learns the face appearance online in multiple views under all condition changes. It runs robustly at 15 fps and is able to reacquire the face of interest after total occlusion or after it leaves the FOV. We compare our tracker with various state-of-the-art tracking methods in terms of precision and running time. Extensive experiments in challenging indoor and outdoor conditions are also presented to validate the complete system.
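The four operating modes above form a simple state machine. The sketch below is only an illustration of that control flow; the detector and tracker callables are placeholders for the actual modules:

```python
# Illustrative state machine for the system's control flow:
# pedestrian detection -> ROI focusing -> face detection -> active tracking.
def run_system(frame_stream, detect_person, focus_roi, detect_face, track_face):
    mode = "PEDESTRIAN"
    person = face = None
    for frame in frame_stream:
        if mode == "PEDESTRIAN":          # widest lens angle, find a person
            person = detect_person(frame)
            if person:
                mode = "ROI_FOCUS"
        elif mode == "ROI_FOCUS":         # zoom onto the upper body
            focus_roi(person)
            mode = "FACE_DETECT"
        elif mode == "FACE_DETECT":       # run the face detector in the ROI
            face = detect_face(frame)
            if face:
                mode = "ACTIVE_TRACK"
        elif mode == "ACTIVE_TRACK":      # tracker + camera control loop
            face = track_face(frame, face)
            if face is None:              # lost: restart from detection
                mode = "PEDESTRIAN"
    return mode
```

The fallback to pedestrian detection when the track is lost is what gives the system its reacquisition behavior after total occlusion or the face leaving the FOV.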

T. B. Dinh, N. Vo, and G. Medioni. High Resolution Face Sequences from a PTZ Network Camera. FG 2011. [pdf]

Project: Co-training Framework of Generative and Discriminative Trackers with Partial Occlusion Handling

Partial occlusion is a challenging problem in object tracking; in online visual tracking it is a critical factor causing drift. To address this problem, we propose a novel approach using a co-training framework of generative and discriminative trackers. Our approach is able to detect the occluding region and continuously update both the generative and discriminative models using information from the non-occluded part. The generative model encodes all of the appearance variations in a low-dimensional subspace, which provides a strong reacquisition ability. Meanwhile, the discriminative classifier, an online support vector machine, focuses on separating the object from the background using a Histograms of Oriented Gradients (HOG) feature set. For each search window, an occlusion likelihood map is generated by the two trackers through a co-decision process. If the two trackers disagree, the movement vote of KLT local features is used as a referee. Precise occlusion segmentation is then performed using MeanShift. Finally, each tracker recovers the occluded part and updates its own model using the new non-occluded information. Experimental results on challenging sequences with different types of objects are presented. We also compare with other state-of-the-art methods to demonstrate the superiority and robustness of our tracking framework.
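The co-decision step can be pictured as follows. This is a rough sketch under stated assumptions: each tracker scores every cell of the search window as occluded or not, and where they disagree a third score (standing in for the KLT movement vote) acts as referee. Names and the threshold are illustrative:

```python
# Per-cell co-decision for the occlusion likelihood map: agreement is
# kept, disagreement is settled by the referee vote.
def occlusion_map(gen_scores, disc_scores, klt_vote, thresh=0.5):
    """Return a per-cell boolean occlusion map for the search window."""
    occluded = []
    for g, d, k in zip(gen_scores, disc_scores, klt_vote):
        g_occ, d_occ = g > thresh, d > thresh
        if g_occ == d_occ:
            occluded.append(g_occ)        # trackers agree
        else:
            occluded.append(k > thresh)   # disagreement: referee decides
    return occluded
```

In the full method this binary map would then be refined into a precise occlusion segmentation before each model updates from the non-occluded cells.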

T. B. Dinh and G. Medioni. Co-training Framework of Generative and Discriminative Trackers with Partial Occlusion Handling. WMVC 2011. [pdf]
Project: Real Time Tracking using an Active Pan-Tilt-Zoom Network Camera (05/2008-Present)

We present a real-time active vision system on a PTZ network camera for tracking an object of interest. We address two critical issues: controlling the camera through network communication so that it follows a selected object, and tracking an arbitrary type of object in real time under pose, viewpoint, and illumination changes. We analyze the difficulties of control over the network and propose a practical solution for tracking with a PTZ network camera. Moreover, we propose a robust real-time tracking approach that combines complementary features in a two-stage particle filtering framework with a multi-scale mechanism. To improve runtime performance, the tracking algorithm is implemented as a multi-threaded process using OpenMP. Comparative experiments with state-of-the-art methods demonstrate the efficiency and robustness of our system in applications such as pedestrian tracking, face tracking, and vehicle tracking.
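A common way to close the camera-control loop described above is a proportional controller mapping the target's image-plane offset to pan/tilt speeds. The sketch below is an assumption-laden illustration, not the paper's controller; the gain, dead zone, and command format are made up, and a real network camera would receive such commands over its HTTP/CGI control interface with latency in mind:

```python
# Proportional control sketch: offset from image center -> pan/tilt speeds.
def ptz_command(target_cx, target_cy, frame_w, frame_h,
                gain=0.5, dead_zone=0.05):
    # Normalized offsets in [-1, 1]; positive x = target right of center.
    ex = (target_cx - frame_w / 2) / (frame_w / 2)
    ey = (target_cy - frame_h / 2) / (frame_h / 2)
    # A dead zone avoids jitter from small tracking errors and network delay.
    pan = gain * ex if abs(ex) > dead_zone else 0.0
    tilt = gain * ey if abs(ey) > dead_zone else 0.0
    return pan, tilt
```

The dead zone matters precisely because of the networked setting: command round trips are slow and noisy, so reacting to every pixel of error makes the camera oscillate.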

T. Dinh, Q. Yu, and G. Medioni. Real Time Tracking using an Active Pan-Tilt-Zoom Network Camera. IROS 2009. [pdf]
Project: Online Tracking and Reacquisition Using Co-Trained Generative and Discriminative Trackers (11/2007-05/2008)

Visual tracking is a challenging problem, as an object may change its appearance due to viewpoint variations, illumination changes, and occlusion. An object may also leave the field of view and then reappear. In order to track and reacquire an unknown object with limited labeled data, we propose to learn these changes online and build a model that describes all appearances seen while tracking. To address this semi-supervised learning problem, we propose a co-training based approach that continuously labels incoming data and updates a hybrid discriminative-generative model online. The generative model uses a number of low-dimensional linear subspaces to describe the appearance of the object; in order to reacquire the object, it encodes all the appearance variations that have been seen. The discriminative classifier is an online support vector machine trained to focus on recent appearance variations. The online co-training of this hybrid approach accounts for appearance changes and allows reacquisition of an object after total occlusion. We demonstrate that under challenging conditions, this method has strong reacquisition ability and robustness to distracters in the background.
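The core co-training idea, stripped of the tracking machinery, can be sketched as below. The model objects and their `predict`/`update` interface are placeholders for illustration; in the paper, the two models are a subspace-based generative model and an online SVM:

```python
# One co-training step: each model labels the samples it is confident
# about, and those labels are used to update the *other* model.
def co_train_step(samples, gen_model, disc_model, conf_thresh=0.9):
    for s in samples:
        g_label, g_conf = gen_model.predict(s)
        d_label, d_conf = disc_model.predict(s)
        if g_conf >= conf_thresh:
            disc_model.update(s, g_label)   # generative teaches discriminative
        if d_conf >= conf_thresh:
            gen_model.update(s, d_label)    # discriminative teaches generative
```

The exchange works because the two models have complementary failure modes: the generative model remembers long-term appearance for reacquisition, while the discriminative model adapts to recent variations, so each can label data the other is unsure about.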

Q. Yu, T. B. Dinh, and G. Medioni. Online Tracking and Reacquisition Using Co-trained Generative and Discriminative Trackers. ECCV 2008. [pdf]

Project: Two-Frames Accurate Motion Segmentation Using Tensor Voting and Graph-Cuts (08/2006-10/2007)

Motion segmentation and motion estimation are important topics in computer vision. Tensor Voting is a process that addresses both issues simultaneously, but its running time is a challenge. We propose a novel approach that yields both motion segmentation and motion estimation in the presence of discontinuities. The method combines a non-iterative, accelerated voting process in a sparse space in the first stage with a Graph-Cuts framework for boundary refinement in the second stage. Here, we concentrate on the motion segmentation problem. After choosing a sparse space by sampling the original image, we represent each of these pixels as a 4-D tensor point and apply the voting framework to enforce local smoothness of motion. Afterwards, boundary refinement is obtained using Graph-Cuts image segmentation. Our results on different types of motion show that the method outperforms other Tensor Voting approaches in speed, with segmentation results comparable to other motion segmentation methodologies.
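To make the 4-D representation concrete: each sampled pixel becomes a point (x, y, vx, vy), and neighbors vote for a locally smooth motion. The toy sketch below uses a plain Gaussian-weighted vote as a stand-in for the actual tensor voting machinery, which additionally encodes orientation uncertainty in the tensors:

```python
import math

# Toy stand-in for the first-stage voting: each sampled pixel (x, y, vx, vy)
# has its motion replaced by a Gaussian-weighted vote of spatial neighbors,
# which enforces local smoothness of motion.
def smooth_motion(points, sigma=2.0):
    smoothed = []
    for x, y, vx, vy in points:
        wsum = sx = sy = 0.0
        for x2, y2, vx2, vy2 in points:
            w = math.exp(-((x - x2) ** 2 + (y - y2) ** 2) / (2 * sigma ** 2))
            wsum += w
            sx += w * vx2
            sy += w * vy2
        smoothed.append((x, y, sx / wsum, sy / wsum))
    return smoothed
```

After such smoothing, points whose voted motion still differs sharply from their neighbors' mark motion discontinuities, which the second-stage Graph-Cuts refinement turns into precise segment boundaries.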

T. Dinh and G. Medioni. Two-Frames Accurate Motion Segmentation Using Tensor Voting and Graph-Cuts. WMVC 2008. [pdf]

Project: Hand Gesture Classification Using Boosted Cascade of Classifiers (12/2004-07/2005)

Hand gesture recognition is an important component in applications such as human-computer interaction, robot control, and assistance systems for disabled people, in which performance and robustness are the primary requirements. In this paper, we propose a hand gesture classification system able to efficiently recognize 24 basic signs of American Sign Language. Computational performance is achieved through the use of a boosted cascade of classifiers trained by AdaBoost on informative Haar wavelet features. A new type of feature adapted to the complex representation of hand gestures is also proposed. Experimental results show that the proposed approach is promising.
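The speed of a boosted cascade comes from early rejection, sketched below. The weak classifiers here are placeholder callables; in the paper they threshold Haar wavelet features selected by AdaBoost:

```python
# Minimal boosted-cascade sketch: each stage is a weighted sum of weak
# classifiers compared against a stage threshold, and a window is
# rejected as soon as any stage fails.
def cascade_classify(window, stages):
    """stages: list of (weak_classifiers, weights, stage_threshold)."""
    for weaks, weights, threshold in stages:
        score = sum(w * h(window) for h, w in zip(weaks, weights))
        if score < threshold:
            return False      # early rejection keeps the cascade fast
    return True               # passed every stage
```

Because most windows in an image contain no hand, almost all of them are discarded by the cheap first stages, and only rare promising windows pay the cost of the full cascade.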

Thang B. Dinh, Van B. Dang, Duc A. Duong, Tuan T. Nguyen, Duy-Dinh Le. Hand Gesture Classification Using Boosted Cascade of Classifiers. 4th IEEE International Conference on Computer Sciences, Research, Innovation & Vision for the Future (RIVF 2006).

Disclaimer: The IRIS group, the Computer Science Department, the USC Viterbi School of Engineering, and the University of Southern California do not screen or control the content on this website and thus do not guarantee the accuracy, integrity, or quality of such content. All content on this personal website is provided by, and is the sole responsibility of, the person from whom it originated.