Three tales of reconstruction: real-time, accurate and temporally consistent

Philippos Mordohai


Abstract

In this talk I will describe my 3D reconstruction work at UNC. Specifically, I will present three projects that have been accepted to the upcoming ICCV in Brazil. The first part will be on real-time video-based reconstruction of urban environments. Our system collects video from eight cameras, along with GPS and INS data. The data are processed off-line, but at real-time rates, to produce geo-registered, detailed 3D models. I will focus on the aspects of the system that I have worked on: a novel plane-sweeping stereo algorithm that analyzes sparse scene information to optimize the selection of sweeping directions and planes; a depth map fusion approach that merges the stereo depth maps according to visibility constraints; and a scheme that generates the final model as a multi-resolution triangular mesh. I will show reconstructions obtained at speeds faster than real time by leveraging the processing power of the GPU, while maintaining an accuracy of a few centimeters.

In the second part, I will describe an approach to multi-view 3D shape reconstruction formulated as the computation of a minimum cut on the dual graph of a semi-regular, multi-resolution, tetrahedral mesh. Our method uses photo-consistency to guide the adaptive subdivision of a coarse mesh of the bounding volume. This generates a multi-resolution volumetric mesh that is densely tessellated in the parts likely to contain the unknown surface. A graph cut on the dual graph of this tetrahedral mesh then yields a minimum cut corresponding to a triangulated surface that minimizes a global surface cost functional. Our method makes no assumptions about topology and can recover deep concavities when enough cameras observe them. Our formulation also allows silhouette constraints to be enforced during the graph-cut step to counter its inherent bias toward minimal surfaces.
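To make the surface-extraction step concrete, the following is a minimal, self-contained sketch of an s-t minimum cut computed with Edmonds-Karp max-flow, a simple stand-in for the specialized graph-cut solvers typically used in practice. The node indices, capacities, and the tiny "dual graph" below are illustrative assumptions only; in the actual system the graph nodes would be tetrahedra and the edge weights would come from photo-consistency.

```python
from collections import deque

def min_cut(num_nodes, edges, source, sink):
    """s-t minimum cut via Edmonds-Karp max-flow.

    edges: list of (u, v, capacity); each edge is made bidirectional,
    mirroring the undirected dual graph of a tetrahedral mesh.
    Returns (cut_value, set_of_nodes_on_source_side).
    """
    cap = [[0] * num_nodes for _ in range(num_nodes)]
    adj = [[] for _ in range(num_nodes)]
    for u, v, c in edges:
        cap[u][v] += c
        cap[v][u] += c  # undirected: same capacity in both directions
        adj[u].append(v)
        adj[v].append(u)

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = [-1] * num_nodes
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in adj[u]:
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            break
        # Find the bottleneck capacity along the path, then augment
        path_flow = float("inf")
        v = sink
        while v != source:
            path_flow = min(path_flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while v != source:
            cap[parent[v]][v] -= path_flow
            cap[v][parent[v]] += path_flow
            v = parent[v]
        flow += path_flow

    # Nodes still reachable from the source in the residual graph
    # form the source side of the minimum cut.
    seen = {source}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen and cap[u][v] > 0:
                seen.add(v)
                q.append(v)
    return flow, seen

# Toy example: nodes 0-3 stand in for tetrahedra, node 4 for the
# "inside" terminal and node 5 for the "outside" terminal; weights
# stand in for photo-consistency costs. The cheapest edge (0-1) is cut.
cut_value, inside = min_cut(
    6, [(4, 0, 10), (0, 1, 2), (1, 2, 5), (2, 3, 2), (3, 5, 10)], 4, 5)
```

In the mesh setting, each cut edge of the dual graph corresponds to a triangular face shared by an "inside" and an "outside" tetrahedron; the union of those faces is the extracted surface.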
In the third part, I will present an approach to 3D reconstruction from multiple video streams that enforces temporal consistency on the reconstructions of successive frames. Our main goal is to improve the quality of the reconstruction by finding corresponding pixels in subsequent frames of the same camera using optical flow, while at least maintaining the quality of the single-time-step reconstruction when these correspondences are wrong or cannot be found. This allows us to process scenes with fast motion, occlusions and self-occlusions, in which optical flow fails for very large numbers of pixels. To this end, we modify the belief propagation algorithm to operate on a 3D graph that includes temporal neighbors in addition to spatial ones, and to discard messages from outlying neighbors. We also propose methods for introducing a bias term and for suppressing the noise typically observed in uniform regions. The bias term encapsulates information about the background and aids in achieving a temporally consistent reconstruction and in mitigating occlusion-related errors.
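The idea of belief propagation over spatial and temporal edges, with outlying messages discarded, can be sketched as follows. This is a toy min-sum formulation under illustrative assumptions: the label set, costs, and the simple "label-gap" outlier rule are mine for exposition, not the paper's exact formulation (which also includes a background bias term and noise suppression).

```python
LABELS = list(range(4))  # discrete depth hypotheses (toy size)
LAMBDA = 1.0             # smoothness weight
TRUNC = 2.0              # truncation of the linear smoothness cost
OUTLIER_GAP = 2          # label gap beyond which a neighbor is an outlier

def smoothness(a, b):
    """Truncated linear pairwise cost, a common robust choice."""
    return LAMBDA * min(abs(a - b), TRUNC)

def best_label(costs):
    return min(LABELS, key=lambda l: costs[l])

def is_outlier(data_costs, u, n):
    """Toy outlier test: the neighbor's preferred depth under its own
    data cost disagrees strongly with this node's preferred depth."""
    return abs(best_label(data_costs[n]) - best_label(data_costs[u])) > OUTLIER_GAP

def run_bp(data_costs, edges, iters=10):
    """Loopy min-sum BP on a graph whose edges may be spatial or
    temporal; messages from outlying neighbors are simply discarded."""
    nodes = list(data_costs)
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    msgs = {(u, v): [0.0] * len(LABELS)
            for u in nodes for v in neighbors[u]}
    for _ in range(iters):
        new_msgs = {}
        for u in nodes:
            for v in neighbors[u]:
                # Aggregate data cost and messages from all neighbors
                # except v, skipping outliers.
                def h(l):
                    return data_costs[u][l] + sum(
                        msgs[(n, u)][l] for n in neighbors[u]
                        if n != v and not is_outlier(data_costs, u, n))
                m = [min(h(a) + smoothness(a, b) for a in LABELS)
                     for b in LABELS]
                lo = min(m)
                new_msgs[(u, v)] = [x - lo for x in m]  # normalize
        msgs = new_msgs
    # Final beliefs and per-node depth labels
    labels = {}
    for u in nodes:
        belief = [data_costs[u][l] + sum(
                      msgs[(n, u)][l] for n in neighbors[u]
                      if not is_outlier(data_costs, u, n))
                  for l in LABELS]
        labels[u] = best_label(belief)
    return labels

# Toy graph: a spatial chain a-b-c at time t, plus a temporal edge from
# a to its (wrong) optical-flow correspondence a_prev at time t-1.
labels = run_bp(
    {"a": [0, 1, 2, 3], "b": [1, 0, 2, 3],
     "c": [0, 1, 2, 3], "a_prev": [3, 2, 1, 0]},
    [("a", "b"), ("b", "c"), ("a", "a_prev")])
```

Because `a_prev` prefers depth 3 while `a` prefers depth 0, the temporal message is treated as an outlier and discarded, so `a` keeps the label supported by its spatial neighbors instead of being dragged toward the bad correspondence.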

