Temporal Reasoning from Video to Temporal Synthesis of Video

Irfan Essa


In this talk, I will present some ongoing work on extracting spatio-temporal cues from video for both the synthesis of novel video sequences and the recognition of complex activities. I will start with some of our earlier work on Video Textures, where repeating information is extracted from a clip to generate extended video sequences. I will then describe some of our extensions to this approach that allow for the controlled generation of video-sprite animations. We have developed various learning and optimization techniques that allow for video-based animation of photo-realistic characters. Then I will describe our new approach to image and video synthesis, which builds on optimal patch-based copying of samples. I will show how our method allows for iterative refinement and extends to the synthesis of both images and video from very limited samples.

In the next part of my talk, I will describe how a similar analysis of video can be used to recognize what a person is doing in a scene. Such an analysis, aimed at recognition, requires more contextual information about the environment. I will show how we leverage contextual information shared between actions and objects to recognize what is happening in complex environments. I will also show that by adding some form of grammar (we use a Stochastic Context-Free Grammar) we can recognize very complex, multi-tasked activities.

If time permits, I will describe (very briefly) the Aware Home project at Georgia Tech, which is one primary area of ongoing and future research for me and my group. Further information on my work with videos is available from my webpage at http://www.cc.gatech.edu/~irfan


Irfan Essa is an Associate Professor in the College of Computing and an Adjunct Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology, Atlanta, Georgia. At Georgia Tech, he is affiliated with the Future Computing Environments effort, the Graphics, Visualization and Usability Center, and the Intelligent Systems Group in the College of Computing. He founded the Computational Perception Laboratory (CPL) at Georgia Tech, which aims to explore and develop the next generation of intelligent machines, interfaces, and environments that can perceive, recognize, anticipate, and interact with humans. Since 1996, CPL has grown to include four other vision faculty and over 30 undergraduate and graduate students. He is also a founding member of the Aware Home Research Initiative at Georgia Tech. Irfan earned his SM (1990) and PhD (1994) from the MIT Media Laboratory, where he also worked as a Research Scholar (1994-1996) before joining the Georgia Tech faculty. His honors include the NSF CAREER Investigator award, the Imlay Fellowship, the Edenfield Fellowship, and the College of Computing Research, Teaching, and Dean's Awards.
