The complexity of vision systems can be characterized along many dimensions, one of them being the amount of data that is processed. At one end of this spectrum is a single image, while at the other end is a large camera network. In this talk, I will focus on these two ends of the spectrum and analyze their unique requirements and interrelationships.
In the first part, we will discuss mathematical models of image appearance. In my research, I have tried to address the question of how valid some of the commonly used models are, such as linear, bilinear, multilinear, and locally linear models. Given the physical laws of object motion, surface properties, and image formation, can we derive some of these models from first principles? We will see that, under certain mathematical assumptions, we can indeed derive some of these models, and that this analysis provides new insights into problems of tracking and recognition.
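For readers unfamiliar with these model families, the following sketch (in my own notation, not necessarily the speaker's) shows the typical forms they take, with an image $I$ expressed in terms of basis elements and factor coefficients:

```latex
% Linear: the image is a weighted sum of basis images B_i
I = \sum_{i} a_i B_i

% Bilinear: two interacting factors, e.g. a_i for illumination and
% b_j for pose, combined through interaction weights W_{ij}
I = \sum_{i,j} a_i \, b_j \, W_{ij}

% Multilinear: a core tensor \mathcal{W} contracted along each mode
% with a factor vector (mode-n products), generalizing the bilinear case
I = \mathcal{W} \times_1 a \times_2 b \times_3 c
```

Locally linear models apply the first form only within a neighborhood of the appearance manifold, stitching together many such linear patches.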
In the second part of the talk, I will discuss our current work on scene analysis in camera networks. I will first describe a multi-objective optimization framework that is able to maintain tracks of multiple targets over space and time by trading off delay and accuracy requirements. Then, I will describe our recent work on cooperative control of a camera network using game theory. The presentation will end with a discussion of the joint use of audio and video sensors for control and inference.
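As a rough illustration of the game-theoretic flavor of cooperative camera control (this is a generic sketch, not the speaker's actual method; the coverage matrix and utility are hypothetical), consider cameras repeatedly best-responding to each other's target assignments until no camera wants to switch, i.e., a pure Nash equilibrium:

```python
def best_response_assignment(coverage, n_iters=20):
    """Assign cameras to targets by best-response dynamics.

    coverage[c][t] = how well camera c views target t (0 if unseen).
    Each camera repeatedly picks the target that adds the most marginal
    coverage given the other cameras' current choices.
    """
    n_cams = len(coverage)
    n_targets = len(coverage[0])
    choice = [0] * n_cams  # current target chosen by each camera

    for _ in range(n_iters):
        changed = False
        for c in range(n_cams):
            def marginal(t):
                # Coverage this camera adds beyond the best coverage
                # already provided by the other cameras watching t.
                others = max(
                    (coverage[o][t] for o in range(n_cams)
                     if o != c and choice[o] == t),
                    default=0.0,
                )
                return max(coverage[c][t] - others, 0.0)

            best = max(range(n_targets), key=marginal)
            if best != choice[c]:
                choice[c] = best
                changed = True
        if not changed:  # no camera wants to deviate: equilibrium
            break
    return choice
```

With marginal-coverage utilities of this kind, each unilateral switch raises total coverage, so the dynamics converge to a stable assignment; the real research question, of course, is designing utilities whose equilibria correspond to good global sensing behavior.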