Project: Context Tracker: Exploring Supporters and Distracters in Unconstrained Environments (Software Library Released)
Visual tracking in unconstrained environments is very challenging due to several sources of variation, such as changes in appearance, varying lighting conditions, cluttered backgrounds, and frame cuts. A major cause of tracking failure is the emergence of regions that look similar to the target. The problem is even harder when the target leaves the field of view (FoV): the tracker may then follow another similar object and fail to reacquire the right target when it reappears. This paper presents a method that addresses this problem by exploiting context on the fly in two forms: Distracters and Supporters. Both are discovered automatically using a sequential randomized forest, an online template-based appearance model, and local features. Distracters are regions that have an appearance similar to the target and consistently co-occur with high confidence scores. The tracker must keep tracking these distracters to avoid drifting. Supporters, on the other hand, are local key-points around the target that show consistent co-occurrence and motion correlation with it over a short time. They play an important role in verifying the genuine target. Extensive experiments on challenging real-world video sequences show that this context information improves tracking. Comparisons with multiple state-of-the-art approaches are also provided.
T. B. Dinh, N. Vo, and G. Medioni.
Context Tracker: Exploring Supporters and Distracters in
Unconstrained Environments.
To appear in CVPR 2011. [pdf]
We are pleased to announce that the software library (DLL) of our Context Tracker has been released today (09/07/2011). Instructions are included in the package in the readme.txt file. Please send us your feedback and comments.
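The supporter criterion described in the abstract (key-points that co-occur with the target and move with it over a short time) can be illustrated with a minimal Python sketch. It assumes key-point and target positions have already been tracked over a short window of frames; the function name `select_supporters`, the co-occurrence ratio, the mean-displacement test, and both thresholds are hypothetical choices for illustration, not the formulation used in the paper or in the released DLL.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def select_supporters(
    target_track: List[Point],
    keypoint_tracks: Dict[int, List[Optional[Point]]],
    min_cooccurrence: float = 0.8,  # hypothetical: fraction of frames the key-point must be seen
    max_motion_gap: float = 2.0,    # hypothetical: mean displacement mismatch allowed, in pixels
) -> List[int]:
    """Return ids of key-points that co-occur with the target and move with it.

    target_track[t] is the target centre in frame t. keypoint_tracks[k][t] is the
    position of key-point k in frame t, or None if it was not matched in that
    frame. Every key-point track is assumed to have the same length as target_track.
    """
    n = len(target_track)
    supporters = []
    for kid, track in keypoint_tracks.items():
        seen = sum(p is not None for p in track)
        if n == 0 or seen / n < min_cooccurrence:
            continue  # does not co-occur with the target consistently enough
        gaps = []
        for t in range(1, n):
            if track[t] is None or track[t - 1] is None:
                continue
            # Frame-to-frame displacement of the key-point vs. that of the target.
            dkx = track[t][0] - track[t - 1][0]
            dky = track[t][1] - track[t - 1][1]
            dtx = target_track[t][0] - target_track[t - 1][0]
            dty = target_track[t][1] - target_track[t - 1][1]
            gaps.append(math.hypot(dkx - dtx, dky - dty))
        # Correlated motion: on average the key-point moves like the target.
        if gaps and sum(gaps) / len(gaps) <= max_motion_gap:
            supporters.append(kid)
    return supporters
```

A static background point fails the motion test when the target moves, and a point that is only occasionally matched fails the co-occurrence test, which is the intuition the abstract gives for why supporters help verify the genuine target.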
Project: High Resolution Face Sequences from a PTZ Network Camera
We propose to acquire high-resolution sequences of a person's face using a pan-tilt-zoom (PTZ) network camera. This capability should prove helpful in forensic analysis of video sequences, since frames containing faces are tagged and, within a frame, windows containing faces can be retrieved. The system starts in pedestrian detector mode, where the lens angle is set to its widest, and detects people using a pedestrian detector module. The camera then switches to the region of interest (ROI) focusing mode, where its parameters are automatically tuned to bring the upper body of the detected person, where the face should appear, into the field of view (FOV). Next, in face detection mode, the face is detected by a face detector module, and the system switches to an active tracking mode consisting of a control loop that actively follows the detected face with two modules: a tracker that tracks the face in the image, and a camera control module that adjusts the camera parameters. During this loop, our tracker learns the face appearance online, in multiple views and under changing conditions. It runs robustly at 15 fps and is able to reacquire the face of interest after total occlusion or after it leaves the FOV. We compare our tracker with various state-of-the-art tracking methods in terms of precision and running time. Extensive experiments in challenging indoor and outdoor conditions are also presented to validate the complete system.
T. B. Dinh, N. Vo, and G. Medioni.
High Resolution Face Sequences from a PTZ Network Camera.
FG 2011. [pdf]
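The sequence of operating modes in this system can be sketched as a small state machine. This is a hypothetical Python illustration: the `Mode` names mirror the abstract, but the boolean inputs to `next_mode` and the fallback from active tracking to pedestrian detection when the face is lost for good are assumptions, not details taken from the paper.

```python
from enum import Enum, auto


class Mode(Enum):
    PEDESTRIAN_DETECTION = auto()  # widest lens angle, look for people
    ROI_FOCUSING = auto()          # tune PTZ parameters onto the upper body
    FACE_DETECTION = auto()        # run the face detector inside the ROI
    ACTIVE_TRACKING = auto()       # tracker + camera control loop


def next_mode(mode: Mode, person_found: bool = False, roi_ready: bool = False,
              face_found: bool = False, face_lost: bool = False) -> Mode:
    """Advance the acquisition pipeline by one step (hypothetical transition rules)."""
    if mode is Mode.PEDESTRIAN_DETECTION:
        return Mode.ROI_FOCUSING if person_found else mode
    if mode is Mode.ROI_FOCUSING:
        return Mode.FACE_DETECTION if roi_ready else mode
    if mode is Mode.FACE_DETECTION:
        return Mode.ACTIVE_TRACKING if face_found else mode
    # ACTIVE_TRACKING: the tracker itself handles occlusion and reacquisition.
    # Restarting from pedestrian detection once the face is declared lost for
    # good is an assumption made for this sketch.
    return Mode.PEDESTRIAN_DETECTION if face_lost else mode
```

Keeping the transitions in one pure function makes each mode switch easy to log and test independently of the camera hardware.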
Project: Co-training Framework of Generative and Discriminative Trackers with Partial Occlusion Handling
Partial occlusion is a challenging problem in object tracking; in online visual tracking, it is a critical cause of drift. To address this problem, we propose a novel approach using a co-training framework of generative and discriminative trackers. Our approach is able to detect the occluding region and continuously update both the generative and discriminative models using information from the non-occluded part. The generative model encodes all of the appearance variations using a low-dimensional subspace, which provides a strong reacquisition ability. Meanwhile, the discriminative classifier, an online support vector machine, focuses on separating the object from the background using a Histogram of Oriented Gradients (HOG) feature set. For each search window, an occlusion likelihood map is generated by the two trackers through a co-decision process. If the two trackers disagree, the movement vote of KLT local features is used as a referee. Precise occlusion segmentation is then performed using MeanShift. Finally, each tracker recovers the occluded part and updates its own model using the new non-occluded information. Experimental results on challenging sequences with different types of objects are presented. We also compare with other state-of-the-art methods to demonstrate the superiority and robustness of our tracking framework.
T. B. Dinh and G. Medioni.
Co-training Framework of Generative and Discriminative Trackers with Partial Occlusion Handling.
WMVC 2011. [pdf]
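The co-decision rule described in this abstract (trust the two trackers where they agree, and let the KLT motion vote arbitrate where they disagree) might look like the following sketch. It assumes each tracker has already produced a binary per-cell occlusion map for the search window and that the KLT vote has been summarized per cell as a boolean; the function `co_decide` and these data layouts are hypothetical, and the MeanShift segmentation step is not shown.

```python
from typing import List

OCCLUDED, VISIBLE = 1, 0


def co_decide(generative: List[List[int]], discriminative: List[List[int]],
              klt_consistent: List[List[bool]]) -> List[List[int]]:
    """Fuse two per-cell occlusion maps, using KLT motion to break ties.

    generative[i][j] and discriminative[i][j] are 1 if that tracker marks cell
    (i, j) as occluded, else 0. klt_consistent[i][j] is True if the KLT features
    in the cell move with the object (hypothetical per-cell summary of the vote).
    """
    fused = []
    for g_row, d_row, k_row in zip(generative, discriminative, klt_consistent):
        row = []
        for g, d, k in zip(g_row, d_row, k_row):
            if g == d:
                row.append(g)  # both trackers agree: keep their decision
            else:
                # Disagreement: features moving with the object suggest the
                # cell still shows the object, so it is treated as visible.
                row.append(VISIBLE if k else OCCLUDED)
        fused.append(row)
    return fused
```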
Project: Real Time Tracking using an Active Pan-Tilt-Zoom Network Camera (05/2008-Present)
We present a real-time active vision system on a PTZ network camera that tracks an object of interest. We address two critical issues in this paper. One is controlling the camera through network communication to follow a selected object. The other is tracking an arbitrary type of object in real time under changes in pose, viewpoint, and illumination. We analyze the difficulties of control through the network and propose a practical solution for tracking with a PTZ network camera. Moreover, we propose a robust real-time tracking approach that gains effectiveness from complementary features combined in a two-stage particle filtering framework with a multi-scale mechanism. To reduce running time, the tracking algorithm is implemented as a multi-threaded process using OpenMP. Comparative experiments with state-of-the-art methods demonstrate the efficiency and robustness of our system in various applications such as pedestrian tracking, face tracking, and vehicle tracking.
T. Dinh, Q. Yu, and G. Medioni.
Real Time Tracking using an Active Pan-Tilt-Zoom Network Camera.
IROS 2009. [pdf]
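One plausible reading of a two-stage particle filter is sketched below: a cheap cue first prunes the particle set by resampling, and a more discriminative cue then re-weights the survivors to produce the estimate. This is an assumption-laden Python illustration, not the paper's implementation: the random-walk motion model, the toy Gaussian likelihoods built by `gaussian_score`, and all parameter values are hypothetical, and the multi-scale mechanism and OpenMP parallelism are omitted.

```python
import math
import random
from typing import Callable, List, Tuple

State = Tuple[float, float]            # (x, y) centre of the tracking window
Likelihood = Callable[[State], float]  # non-negative score for a hypothesis


def two_stage_step(particles: List[State], coarse: Likelihood, fine: Likelihood,
                   sigma: float, rng: random.Random) -> Tuple[List[State], State]:
    """One frame of a hypothetical two-stage particle filter."""
    n = len(particles)
    # Predict with a random-walk motion model.
    moved = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in particles]
    # Stage 1: a cheap cue prunes unlikely hypotheses via resampling.
    w1 = [coarse(p) + 1e-12 for p in moved]  # epsilon keeps weights strictly positive
    survivors = rng.choices(moved, weights=w1, k=n)
    # Stage 2: a more discriminative cue re-weights the survivors.
    w2 = [fine(p) + 1e-12 for p in survivors]
    total = sum(w2)
    estimate = (sum(w * p[0] for w, p in zip(w2, survivors)) / total,
                sum(w * p[1] for w, p in zip(w2, survivors)) / total)
    # Resample again so the next frame starts from the refined distribution.
    return rng.choices(survivors, weights=w2, k=n), estimate


def gaussian_score(center: State, spread: float) -> Likelihood:
    """Toy likelihood peaked at `center`; stands in for real image features."""
    cx, cy = center
    return lambda p: math.exp(-((p[0] - cx) ** 2 + (p[1] - cy) ** 2) / (2 * spread ** 2))
```

For example, starting 200 particles at (40, 25) with both toy likelihoods centred on (50, 30), repeated calls to `two_stage_step` move the estimate toward (50, 30). Splitting the weighting this way lets the expensive cue be evaluated only on hypotheses that already passed the cheap one.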
Project: Online Tracking and Reacquisition Using Co-Trained Generative and Discriminative Trackers (11/2007-05/2008)
Visual tracking is a challenging problem, as an object may change its appearance due to viewpoint variations, illumination changes, and occlusion. An object may also leave the field of view and then reappear. In order to track and reacquire an unknown object with limited labeled data, we propose to learn these changes online and build a model that describes all appearances seen while tracking. To address this semi-supervised learning problem, we propose a co-training-based approach that continuously labels incoming data and updates a hybrid discriminative-generative model online. The generative model uses a number of low-dimensional linear subspaces to describe the appearance of the object; to allow reacquisition, it encodes all the appearance variations that have been seen. A discriminative classifier is implemented as an online support vector machine, which is trained to focus on recent appearance variations. The online co-training of this hybrid approach accounts for appearance changes and allows reacquisition of an object after total occlusion. We demonstrate that under challenging situations, this method has strong reacquisition ability and is robust to distracters in the background.
Q. Yu, T. B. Dinh and G. Medioni. Online Tracking and Reacquisition Using Co-trained Generative and Discriminative Trackers. ECCV 2008. [pdf]
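The co-training idea, where each model labels new samples for the other, can be sketched as follows. The sketch assumes a single orthonormal linear subspace for the generative model (scored by reconstruction error) and a linear SVM for the discriminative model (scored by its decision value); the names, thresholds, and the choice to pass on only confident positives are hypothetical simplifications, since the abstract describes several subspaces and the paper defines its own labeling rules.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def reconstruction_error(x: Vector, basis: List[Vector], mean: Vector) -> float:
    """Squared distance from x to the affine subspace mean + span(basis).

    Assumes the basis vectors are orthonormal.
    """
    centred = [xi - mi for xi, mi in zip(x, mean)]
    residual = list(centred)
    for b in basis:
        coeff = sum(ci * bi for ci, bi in zip(centred, b))  # projection coefficient
        residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return sum(r * r for r in residual)


def svm_score(x: Vector, w: Vector, bias: float) -> float:
    """Decision value of a linear SVM: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def co_train_label(x: Vector, basis: List[Vector], mean: Vector, w: Vector, bias: float,
                   gen_tau: float = 0.1, svm_margin: float = 1.0
                   ) -> Tuple[Optional[int], Optional[int]]:
    """Return (label for the SVM, label for the subspace model); None = no label."""
    err = reconstruction_error(x, basis, mean)
    score = svm_score(x, w, bias)
    # Generative teacher: a well-reconstructed sample becomes a positive
    # training example for the discriminative model.
    for_svm = 1 if err < gen_tau else None
    # Discriminative teacher: only confident positives beyond the margin
    # are added to the generative subspace model.
    for_gen = 1 if score >= svm_margin else None
    return for_svm, for_gen
```

Because each teacher only labels samples it is confident about, uncertain frames leave both models unchanged, which is one way to limit the drift that self-training alone can cause.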
Project: Two-Frames Accurate Motion Segmentation Using Tensor Voting and Graph-Cuts (08/2006-10/2007)
Motion segmentation and motion estimation are important topics in computer vision. Tensor Voting is a process that addresses both issues simultaneously, but its running time is a challenge. We propose a novel approach that yields both the motion segmentation and the motion estimation in the presence of discontinuities. The method combines a non-iterative, speed-boosted voting process in a sparse space in a first stage with a Graph-Cuts framework for boundary refinement in a second stage. Here, we concentrate on the motion segmentation problem. After choosing a sparse space by sampling the original image, we represent each sampled pixel as a 4-D tensor point and apply the voting framework to enforce local smoothness of motion. The boundary is then refined using Graph-Cuts image segmentation. Our results on different types of motion show that the method outperforms other Tensor Voting approaches in speed, and its results are comparable with those of other motion segmentation methods.
T. Dinh and G. Medioni. Two-Frames Accurate Motion Segmentation Using Tensor Voting and Graph-Cuts.
WMVC 2008. [pdf]
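The Graph-Cuts refinement stage minimizes a labeling energy with a minimum s-t cut. The sketch below shows that technique on a single row of pixels with two labels, using Edmonds-Karp max-flow for readability. The per-pixel costs `cost0`/`cost1` are hypothetical inputs (in this setting they would come from the first-stage voting, which is not reproduced here), and practical implementations typically use the faster Boykov-Kolmogorov algorithm on 2-D pixel grids.

```python
from collections import deque
from typing import List


def binary_graph_cut(cost0: List[float], cost1: List[float], smoothness: float) -> List[int]:
    """Minimise sum_i D_i(l_i) + smoothness * #{i : l_i != l_(i+1)} over l in {0, 1}^n."""
    n = len(cost0)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for i in range(n):
        cap[s][i] = cost1[i]  # cut when pixel i ends on the sink side (label 1)
        cap[i][t] = cost0[i]  # cut when pixel i stays on the source side (label 0)
        if i + 1 < n:
            cap[i][i + 1] = cap[i + 1][i] = smoothness  # penalty for a label change
    # Edmonds-Karp: push flow along shortest augmenting paths until none remain.
    while True:
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        flow, v = float("inf"), t
        while v != s:  # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u
    # Pixels still reachable from the source form the label-0 side of the min cut.
    reachable = [False] * (n + 2)
    reachable[s] = True
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n + 2):
            if not reachable[v] and cap[u][v] > 1e-12:
                reachable[v] = True
                queue.append(v)
    return [0 if reachable[i] else 1 for i in range(n)]
```

With costs `[0, 0, 5, 0, 0]` / `[5, 5, 0, 5, 5]`, a smoothness weight of 3 relabels the isolated middle pixel to 0, while a weight of 1 leaves it at label 1; the smoothness weight thus controls how strongly boundaries are regularized.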
Project: Hand Gesture Classification Using Boosted Cascade of Classifiers (12/2004-07/2005)
Hand gesture recognition is an important component in applications such as human-computer interaction, robot control, and assistance systems for people with disabilities, in which performance and robustness are the primary requirements. In this paper, we propose a hand gesture classification system able to efficiently recognize 24 basic signs of American Sign Language. In this system, computational performance is achieved through the use of a boosted cascade of classifiers trained with AdaBoost on informative Haar wavelet features. A new type of feature adapted to the complex appearance of hand gestures is also proposed. Experimental results show that the proposed approach is promising.
Thang B. Dinh, Van B. Dang, Duc A. Duong, Tuan T. Nguyen, Duy-Dinh Le.
Hand Gesture Classification Using Boosted Cascade
of Classifiers. 4th IEEE International Conference on Computer Sciences,
Research, Innovation & Vision for the Future (RIVF 2006).
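The speed of a boosted cascade comes from two standard ingredients: Haar-like features evaluated in constant time on an integral image, and early rejection of windows by successive stages. The sketch below shows both in plain Python. The stage and stump structure follows the usual Viola-Jones formulation and is an assumption here; the function names are hypothetical, and the new feature type proposed in the paper is not reproduced.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Integral = List[List[float]]
# (rectangle (x, y, w, h), threshold, polarity +1/-1, weight alpha)
Stump = Tuple[Tuple[int, int, int, int], float, int, float]


def integral_image(img: Image) -> Integral:
    """ii[y][x] = sum of img[0..y-1][0..x-1], padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Integral, x: int, y: int, w: int, h: int) -> float:
    """Sum of the w x h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Integral, x: int, y: int, w: int, h: int) -> float:
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_accepts(ii: Integral, stages: List[Tuple[List[Stump], float]]) -> bool:
    """Evaluate a cascade; each stage is (weighted stumps, stage threshold).

    A window is rejected as soon as one stage's boosted score falls below its
    threshold, so most background windows exit after the first few stages.
    """
    for stumps, stage_threshold in stages:
        score = 0.0
        for (x, y, w, h), thr, polarity, alpha in stumps:
            f = two_rect_feature(ii, x, y, w, h)
            # Viola-Jones weak classifier: fires when polarity * f < polarity * thr.
            score += alpha if polarity * f < polarity * thr else 0.0
        if score < stage_threshold:
            return False
    return True
```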