Toward True 3D Object Recognition
This talk addresses the problem of recognizing three-dimensional (3D)
objects in photographs and image sequences, revisiting viewpoint
invariants as a local representation of shape and appearance. The
key insight is that, although smooth surfaces are almost never planar
in the large, and thus do not (in general) admit global invariants,
they are always planar in the small (that is, sufficiently small
surface patches can always be thought of as being composed of
coplanar points) and thus can be represented locally by planar
invariants. This is the basis for a new, unified approach to object
recognition where object models consist of a collection of small
(planar) patches, their invariants, and a description of their 3D
spatial relationship. Specifically, the local invariants used in this
proposal are the affine-invariant descriptions of the image brightness
pattern in the neighborhood of salient image features ("interest
points") recently developed by Lindeberg and Garding and by
Mikolajczyk and Schmid. These affine-invariant patches provide a
normalized representation of the local object appearance, invariant
under viewpoint and illumination changes, that can be used as a local
measure of image, part, or object similarity. The spatial
relationship between local invariants is used to represent the global
object structure and drive the recognition process. I will illustrate
our approach with two fundamental instances of the 3D object
recognition problem: (1) modeling rigid 3D objects from a small set of
unregistered pictures and recognizing them in cluttered photographs
taken from unconstrained viewpoints; and (2) representing, learning,
and recognizing non-uniform texture patterns under non-rigid
transformations. If time permits, I will conclude with a brief
discussion of our current work in 3D photography using shape, texture,
and motion cues.
Joint work with Svetlana Lazebnik, Frederick Rothganger, Cordelia
Schmid, Yasutaka Furukawa, and Kenton McHenry.