The design and implementation of a complete artificial vision system represents a daunting challenge. The Computer Vision research community has been working on this problem for over twenty-five years, and we can point to significant contributions in a number of areas. Yet the gap between the state of the art and the goal is still wide.
The main reason we are not progressing any faster is that, quite simply, we do not know how to proceed, because we are not able to cleanly decompose and express the sub-problems to be addressed. We do not have a road map with clear landmarks to refer to. The classification of approaches into low, medium, and high level vision has many drawbacks.
The one and only complete computational theory of Computer Vision can be found in the pioneering work of David Marr (1982). It has served as a guiding light for many students and researchers, defining terms, identifying issues, and suggesting solutions. It is now showing its limitations, and current research results are rarely presented in the context of the Marr theory.
This book represents a summary of the research we have been conducting since the early 1990s, and describes a conceptual framework which addresses some current shortcomings, and proposes a unified approach for a broad class of problems. While the framework is defined, our research continues, and some of the elements presented here will no doubt evolve in the coming years. Why, then, choose to write it now?
In part because the results are encouraging enough to be presented today, but also because a book is the proper way to convey a unified picture, an aspect which often gets lost in individual papers.

This book is not intended as a textbook, although it could be used as a complement to existing textbooks. It is organized in eight chapters. In Chapter 1, the Introduction, we define the problems and give an overview of the proposed approach and its implementation. In particular, we illustrate the limitations of the 2.5D sketch, and motivate the use of a representation in terms of layers instead. In Chapter 2, we review some of the relevant research in the literature. The discussion focuses on general computational approaches for early vision, and individual methods are only cited as references. Chapter 3 is the fundamental chapter, as it presents the elements of our salient feature inference engine and their interaction. It introduces tensors as a way to represent information, tensor fields as a way to encode both constraints and results, and tensor voting as the communication scheme. Chapter 4 describes the feature extraction steps, given the computations performed by the engine described earlier. In Chapter 5, we apply the generic framework to the inference of regions, curves, and junctions in 2-D. The input may take the form of 2-D points, with or without orientation. We illustrate the approach on a number of examples, both basic and advanced. In Chapter 6, we apply the framework to the inference of surfaces, curves, and junctions in 3-D. Here, the input consists of a set of 3-D points, with or without an associated normal or tangent direction. We show a number of illustrative examples, and also point to some applications of the approach. In Chapter 7, we use our framework to tackle three early vision problems: shape from shading, stereo matching, and optical flow computation. In Chapter 8, we conclude the book with a few remarks and discuss future research directions.

We include three appendices: one on Tensor Calculus, one dealing with proofs and details of the Feature Extraction process, and one dealing with the companion software packages.

In addition to reading the text and figures, we encourage the reader to download two software packages from the World Wide Web to experiment with the system.

The systems are available at


The software should run on any PC running Windows 95, 98, or NT with adequate memory.