The complexity of vision systems can be characterized along many dimensions,
one of them being the amount of data that is processed. At one end of this
spectrum is a single image, while at the other end is a large camera network. In
this talk, I will focus on these two ends of the spectrum and analyze their
unique requirements and inter-relationships.
In the first part, we will discuss mathematical models of image appearance. In
my research, I have tried to address the question of how valid some of the
commonly used models are, such as linear, bilinear, multilinear, and locally
linear models. Given
the physical laws of object motion, surface properties and image formation, can
we derive some of these models from first principles? We will see that, under
certain mathematical assumptions, we can indeed derive some of these models and
that this analysis provides new insights into problems of tracking and
recognition.
In the second part of the talk, I will discuss our current work on scene
analysis in camera networks. I will first describe a multi-objective
optimization framework that is able to maintain tracks of multiple targets over
space and time by adaptively balancing delay and accuracy requirements.
Then, I will describe our recent work on cooperative control of a camera
network using game theory. The presentation will end with a discussion
on the joint use of audio and video sensors for control and inference.