Semantic, Interactive Manipulation of Visual Data

Alexandre R.J. Francois

Digital production of visual content is the object of Computer Graphics (CG). A large number of powerful systems and tools are commercially available. They are often specific to a particular level of representation (e.g. still images, video sequences, 3-D models), or are targeted, either by design or de facto, at one aspect of the production process (e.g. painting, modeling, animation, rendering). In these systems, most operations are performed manually. While the CG artist should keep total control of the creative process, tedious tasks should be performed by the computer. Ideally, a CG artist involved in any step of visual content production should be able to manipulate semantic objects, not only pixels, frames or vertices. Extracting objects from pixels is the goal of Computer Vision (CV). The emphasis in this field is on the development of automatic techniques, resulting in systems that are specific to a given task, or even to a given set of inputs, and that require the setting of numerous, non-intuitive parameters. This approach is difficult to integrate into any human creative process. Meanwhile, the field has produced algorithms that could perform, robustly and efficiently, many of the tedious tasks that the CG artist performs manually today. While CV and CG historically evolved as two separate fields, we believe they must play complementary roles in a new approach to visual content production: tedious manual tasks in CG applications can be automated or facilitated, and ambiguities in CV algorithms can be resolved with high-level, intuitive human input.

We present an open, interactive system that allows the user to create, experience and manipulate visual data in a consistent, natural and intuitive way. Visual data is defined as any representation of real or virtual entities primarily experienced visually.

The design of such a novel system requires a thorough examination of the new constraints introduced by the goal of bringing more power to the user in an intuitive way, so as not to interfere with the creative process, while preserving freedom of access to, and control over, information at any level of abstraction. Furthermore, the greatest challenge faced when integrating such a wide variety of functionalities into a useful and usable software system may lie beyond the conceptual and technical specificities of each field or subfield concerned. A central issue in the design of our system is providing universal and extensible data representation and processing, with consistent mechanisms for scheduling, planning and synchronization. Based on these considerations, we have designed an open, modular software architecture that supports concurrent on-line processing of data streams. We have developed the data structures and processing modules necessary to build a visual environment in which a user can interactively manipulate visual data at various levels of semantic representation, either directly or through programs, and interactively build and exploit applications for automatic, on-line processing of data streams.
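
As a rough illustration of what concurrent on-line processing of a data stream by independent modules can look like, the following minimal Python sketch chains processing stages, each running in its own thread and connected by queues. It is not the architecture described here: the names run_module and run_pipeline, the (timestamp, data) sample layout, and the use of threads and FIFO queues are all assumptions made for the example.

```python
import queue
import threading

_END = object()  # sentinel marking the end of a stream (hypothetical convention)


def run_module(process, inbox, outbox):
    """Apply `process` to each sample read from inbox and forward the result."""
    while True:
        sample = inbox.get()
        if sample is _END:
            outbox.put(_END)  # propagate end-of-stream to the next module
            return
        outbox.put(process(sample))


def run_pipeline(source, stages):
    """Run each stage in its own thread; return the outputs in stream order."""
    # One queue in front of each stage, plus one collecting the final output.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_module, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for sample in source:
        queues[0].put(sample)
    queues[0].put(_END)
    results = []
    while True:
        out = queues[-1].get()
        if out is _END:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Samples are (timestamp, data) pairs; each stage keeps the timestamp.
    stream = [(0, 1), (1, 2), (2, 3)]
    stages = [lambda s: (s[0], s[1] + 1), lambda s: (s[0], s[1] * 2)]
    print(run_pipeline(stream, stages))
```

Because every stage reads from and writes to a FIFO queue, samples leave the pipeline in the order they entered it, while different stages can work on different samples at the same time. Scheduling, planning and synchronization across several streams are outside the scope of this sketch.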

We have developed a number of modules whose purpose is to produce, automatically or interactively, representations of visual information that allow the user to access its various semantic levels. At the pixel level, we demonstrate user-driven image feature extraction using an energy map produced by a tensor voting technique, automatic real-time segmentation of video streams produced by a fixed camera using a statistical color background model, and mosaicing using the data provided by an instrumented pan-tilt-zoom camera. For image-centered representations, we demonstrate the advantages of Non-Uniform Rational B-Splines (NURBS), and present an automatic NURBS curve-fitting algorithm and a NURBS surface-based deformable image region model. At the object/scene-centered level, we emphasize the importance of symmetries in the recovery of intrinsic 3-D structures from image-centered level data in monocular views, and show how bilateral symmetry can be used to recover (textured) 3-D surfaces. We also show how, in some cases, more constrained volumetric primitives can be recovered directly.
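
To make the idea of a statistical color background model concrete, the following minimal Python sketch learns a per-pixel, per-channel mean and standard deviation from frames of the empty scene, then labels as foreground any pixel that deviates by more than k standard deviations in some channel. The Gaussian-per-channel model, the class name BackgroundModel, and the parameters k and min_std are assumptions made for the example; the model actually used in our system is not specified here and may differ.

```python
import math


class BackgroundModel:
    """Per-pixel, per-channel running mean and variance (Welford's method)."""

    def __init__(self, height, width, channels=3):
        self.count = 0
        self.mean = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
        self.m2 = [[[0.0] * channels for _ in range(width)] for _ in range(height)]

    def learn(self, frame):
        """Update the statistics with a frame assumed to show only background."""
        self.count += 1
        for y, row in enumerate(frame):
            for x, pixel in enumerate(row):
                for c, value in enumerate(pixel):
                    delta = value - self.mean[y][x][c]
                    self.mean[y][x][c] += delta / self.count
                    self.m2[y][x][c] += delta * (value - self.mean[y][x][c])

    def segment(self, frame, k=3.0, min_std=2.0):
        """Return a mask that is True where a pixel departs from the background."""
        n = max(self.count - 1, 1)  # sample-variance denominator
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, pixel in enumerate(row):
                foreground = False
                for c, value in enumerate(pixel):
                    # min_std keeps very stable pixels from flagging sensor noise.
                    std = max(math.sqrt(self.m2[y][x][c] / n), min_std)
                    if abs(value - self.mean[y][x][c]) > k * std:
                        foreground = True
                        break
                mask_row.append(foreground)
            mask.append(mask_row)
        return mask


if __name__ == "__main__":
    model = BackgroundModel(height=1, width=2)
    for frame in ([[(100, 100, 100), (50, 50, 50)]],
                  [[(102, 98, 100), (52, 50, 48)]],
                  [[(98, 102, 100), (48, 50, 52)]]):
        model.learn(frame)
    print(model.segment([[(101, 100, 99), (200, 50, 50)]]))
```

The fixed camera matters because the statistics are stored per pixel location: they remain valid only as long as each pixel keeps observing the same part of the scene.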

Our most valuable contribution is the design and implementation of an open, generic interactive system, into which relevant existing CV algorithms will be incorporated to complement the ones we have contributed. We also intend to use this system as a development platform for future research. Furthermore, we believe that our core architecture design is well suited to naturally extending the scope of our system from visual to multimedia data.