In the first part of this talk, we will present our approach for tracking people across multiple cameras. Its novel element is finding the limits of each camera's field of view (FOV) as they appear in the other cameras. Using this information, when a person is seen in one camera, we can predict all the other cameras in which that person will be visible. Moreover, we apply the FOV constraint to disambiguate between possible candidates for correspondence. Tracking in each individual camera must be resolved before such an analysis can be applied. We perform single-camera tracking using background subtraction followed by region correspondence. The correspondence step takes into account the velocities and sizes of the bounding boxes obtained through connected component labeling, as well as the distances between them.
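To make the visibility prediction concrete, here is a minimal sketch. It assumes that each FOV limit of another camera has already been recovered as a straight line in the current camera's image, and that the sign of the line equation tells which side is visible. The names `FovLine` and `predict_visible_cameras`, the sign convention, and the example layout are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FovLine:
    """One FOV limit of another camera, seen as the line a*x + b*y + c = 0
    in the current camera's image. Assumption: points with
    a*x + b*y + c >= 0 lie on the side visible to the other camera."""
    a: float
    b: float
    c: float

    def visible_side(self, x, y):
        return self.a * x + self.b * y + self.c >= 0.0


def predict_visible_cameras(foot_point, fov_lines_by_camera):
    """Return ids of the other cameras expected to see a person whose
    feet are at foot_point = (x, y) in the current camera's image."""
    x, y = foot_point
    return sorted(
        cam_id
        for cam_id, lines in fov_lines_by_camera.items()
        # The person must be on the visible side of every limit of that camera.
        if all(line.visible_side(x, y) for line in lines)
    )


# Hypothetical layout: camera "B" covers the region x >= 100,
# camera "C" covers the region y <= 50.
fov_lines = {
    "B": [FovLine(1.0, 0.0, -100.0)],
    "C": [FovLine(0.0, -1.0, 50.0)],
}
print(predict_visible_cameras((150.0, 80.0), fov_lines))
```

The same side test can serve as a filter on correspondence candidates: a candidate in another camera is only plausible if the person's position in the current camera falls inside that camera's predicted FOV.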
In the second part of the talk, we will discuss automatically understanding human actions using motion trajectories derived from video sequences. Since an action takes place in 3-D and is projected onto a 2-D image, the projected 2-D trajectory may vary with the viewpoint of the camera. This can create problems when trajectories are interpreted at a higher level. However, if the representation of actions captures only characteristics that are view-invariant, the higher-level interpretation can proceed without ambiguity. We will discuss a computational representation of human action that captures dramatic changes in a motion trajectory using the spatio-temporal curvature of the 2-D trajectory. This representation is compact and view-invariant, and it can explain an action in terms of meaningful atomic units.
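As an illustration of the curvature idea, the sketch below treats a tracked 2-D trajectory as the space curve (x(t), y(t), t) and evaluates the standard curvature formula |r' × r''| / |r'|³ with central finite differences. The abstract does not state the exact formula or discretization used, so this formulation, the function name `spatiotemporal_curvature`, and the sample trajectory are assumptions.

```python
import math


def spatiotemporal_curvature(xs, ys, dt=1.0):
    """Curvature of the curve (x(t), y(t), t) at each interior sample,
    estimated with central finite differences (an assumed discretization)."""
    curvatures = []
    for i in range(1, len(xs) - 1):
        # First derivatives of x and y with respect to time.
        dx = (xs[i + 1] - xs[i - 1]) / (2.0 * dt)
        dy = (ys[i + 1] - ys[i - 1]) / (2.0 * dt)
        # Second derivatives of x and y with respect to time.
        ddx = (xs[i + 1] - 2.0 * xs[i] + xs[i - 1]) / dt ** 2
        ddy = (ys[i + 1] - 2.0 * ys[i] + ys[i - 1]) / dt ** 2
        # With r' = (dx, dy, 1) and r'' = (ddx, ddy, 0), the cross
        # product r' x r'' has components (-ddy, ddx, dx*ddy - dy*ddx).
        a = -ddy
        b = ddx
        c = dx * ddy - dy * ddx
        speed_sq = dx * dx + dy * dy + 1.0
        curvatures.append(math.sqrt(a * a + b * b + c * c) / speed_sq ** 1.5)
    return curvatures


# Hypothetical trajectory: steady motion along x, then an abrupt turn along y.
xs = [0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
kappa = spatiotemporal_curvature(xs, ys)
turn_index = kappa.index(max(kappa)) + 1  # +1 maps back to the sample index
print(turn_index)
```

Constant-velocity segments give zero curvature, while the abrupt turn produces a single peak; such peaks are natural candidates for the boundaries between atomic units of an action.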