ECCV 2008 Review 3
Pramod Kumar Sharma
Abstract
1. Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.
Lampert, C. H., M. B. Blaschko and T. Hofmann, CVPR 2008
Abstract: Most successful object recognition systems rely on binary classification, deciding only if an object is present or not,
but not providing information on the actual object location. To perform localization, one can take a sliding window approach,
but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate
subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large
class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time.
We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of
classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest
neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars
dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.
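To make the branch-and-bound idea concrete, here is a minimal sketch of a best-first search over sets of boxes. It is an illustration under stated assumptions, not the authors' implementation: the quality function is assumed to be a sum of per-pixel weights inside the box (as a linear bag-of-features classifier would give), and the names `ess`, `integral`, `box_sum` and `weights` are hypothetical. A set of boxes is described by an interval for each of the four coordinates (top, bottom, left, right); the upper bound adds the positive weights inside the largest box of the set to the negative weights inside the smallest one.

```python
import heapq
import numpy as np

def integral(img):
    # Integral image with a zero row/column prepended, so box sums need no edge cases.
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii, t, b, l, r):
    # Sum over rows t..b and columns l..r (inclusive); an empty box sums to 0.
    if t > b or l > r:
        return 0.0
    return float(ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l])

def ess(weights):
    """Return ((top, bottom, left, right), score) of the box with maximal weight sum."""
    h, w = weights.shape
    pos = integral(np.maximum(weights, 0.0))  # positive part of the weights
    neg = integral(np.minimum(weights, 0.0))  # negative part of the weights

    def bound(state):
        (t0, t1), (b0, b1), (l0, l1), (r0, r1) = state
        # Largest box in the set: rows t0..b1, cols l0..r1.
        # Smallest box in the set: rows t1..b0, cols l1..r0 (possibly empty).
        return box_sum(pos, t0, b1, l0, r1) + box_sum(neg, t1, b0, l1, r0)

    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    heap = [(-bound(start), start)]  # min-heap on the negated bound = best-first
    while heap:
        neg_ub, state = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in state]
        k = int(np.argmax(widths))
        if widths[k] == 0:
            # A single box: the bound is exact, so this is a global optimum.
            t, b, l, r = (lo for lo, _ in state)
            return (t, b, l, r), -neg_ub
        lo, hi = state[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = state[:k] + (part,) + state[k + 1:]
            # Skip sets that contain no valid box (top below bottom, left past right).
            if child[0][0] > child[1][1] or child[2][0] > child[3][1]:
                continue
            heapq.heappush(heap, (-bound(child), child))
```

Calling `ess(weights)` on a 2-D array of per-pixel scores returns the best box and its score. Because the bound never underestimates any box in a set and is exact for a single box, the first single box popped from the queue is globally optimal; how many sets are examined before that happens depends on the data.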
2. Learning to Localize Objects with Structured Output Regression.
Blaschko, M. B. and C. H. Lampert, ECCV 2008
Abstract: Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training
is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative
examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in
a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction
of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a
generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy
for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured
training procedure improves performance over binary training as well as the best previously published scores.
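The abstract does not spell out the optimization problem. As a reading aid, the following writes down a standard margin-rescaling structured SVM, which is one common way to generalize an SVM to structured outputs; that the paper uses exactly this form is an assumption. The symbols are illustrative: x_i is a training image, y_i its ground-truth bounding box, \mathcal{Y} the set of all candidate boxes, \phi(x, y) a joint feature map (implicit when a joint kernel is used), and \Delta(y_i, y) a loss that penalizes boxes differing from the ground truth.

```latex
% Prediction: choose the box with the highest joint score.
f(x) = \operatorname*{argmax}_{y \in \mathcal{Y}} \; \langle w, \phi(x, y) \rangle

% Training (margin-rescaling structured SVM, assumed form):
\min_{w,\, \xi} \quad \tfrac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\langle w, \phi(x_i, y_i) \rangle - \langle w, \phi(x_i, y) \rangle
  \ge \Delta(y_i, y) - \xi_i,
\quad \xi_i \ge 0,
\quad \forall i, \ \forall y \in \mathcal{Y} \setminus \{ y_i \}
```

Both the argmax at test time and the search for the most violated constraint during training are maximizations over all boxes in an image, which is where a branch-and-bound search can replace exhaustive enumeration.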