A Distributed System for Real-time Volume Reconstruction


A Distributed System for Real-time Volume Reconstruction
We present a distributed system for constructing volumetric image sequences in real time.
Eugene Borovikov and Larry Davis, University of Maryland

Motivation

Pursuing distributed vision systems for 3D volumetric reconstruction:
- shape analysis
- motion capture
- tracking
- gesture recognition

This work is part of ongoing research on the problem of distributed motion capture and gesture recognition.

Introduction

Volume reconstruction system:
- distributed
- real-time
- visual cone intersection
- octree representation

We describe a distributed system for real-time volume reconstruction and volumetric image sequence production. The system reconstructs the volume occupied by a moving object (e.g. a person) in a scene viewed from multiple perspectives, produces a volumetric image represented by an octree, and then streams the volumetric images into a sequence.

Glossary:
- visual cone: the volume viewable through the object's silhouette
- octree: a tree with a branching factor of 8; used here as an occupancy map

Lab design

The multi-perspective imaging lab:
- dimensions: 8 × 8 × 3 m
- 64 digital, progressive-scan cameras organized into 16 quadranocular stereo rigs
- each stereo rig consists of four cameras: three grayscale and one color
- each rig is connected to a dedicated PC equipped with four identical digital frame grabbers
- the 16 networked PCs form the cluster, controlled by a dedicated computer, the cluster controller

Equipped with low-noise, high-shutter-speed cameras and a high-bandwidth network, the Keck cluster provides an ideal environment for high-performance multi-perspective imaging.

System overview

The volume reconstruction procedure utilizes a multi-perspective view of the scene and consists of the following steps:
- camera calibration (Roger Tsai's method)
- silhouette extraction (Thanarat Horprasert's background subtraction)
- silhouette visual cone intersection (Richard Szeliski's method)
- volumetric data streaming (storing to a disk file or visualization via OpenGL)

Not all of these steps have to be done in real time; preliminary steps such as camera calibration can be done off-line.

Camera calibration

Done off-line:
- calibration frame with 25 non-coplanar calibration points
- Tsai's method
- estimated error in object space is about 3 mm
- accuracy can be improved somewhat by increasing the image resolution and/or computing the projections of the feature points with sub-pixel precision

Background subtraction

The volume reconstruction procedure relies on accurate foreground object silhouette extraction, which uses background subtraction:
- color (Horprasert, Harwood, Davis)
- grayscale (Haritaoglu, Harwood, Davis)
- statistical pixel-wise background models (built off-line)

Advantages:
- works well for general backgrounds
- tolerant to noise
- eliminates shadows
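The statistical pixel-wise background model can be sketched as follows. This is a simplified stand-in for the Horprasert et al. model, not the paper's implementation: a per-pixel mean and deviation threshold, with illustrative function and parameter names.

```python
import numpy as np

def build_background_model(background_frames):
    """Per-pixel mean and standard deviation over a stack of
    background-only frames (a simplified sketch of a statistical
    pixel-wise background model)."""
    stack = np.stack(background_frames).astype(np.float64)
    # small epsilon avoids zero deviation for perfectly static pixels
    return stack.mean(axis=0), stack.std(axis=0) + 1e-6

def extract_silhouette(frame, mean, std, k=3.0):
    """Mark a pixel as foreground when it deviates from the
    background model by more than k standard deviations."""
    return np.abs(frame.astype(np.float64) - mean) > k * std
```

The real color model also separates brightness from chromaticity distortion, which is what makes shadow elimination possible; this sketch shows only the thresholding idea.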

Multi-perspective silhouette extraction

Estimate the foreground object's volume:
- multi-perspective silhouette segmentation
- silhouette visual cone intersection
- volumetric data streaming
- graphical rendering and analysis, often done off-line (time consuming)

Volume reconstruction

Snapshots of the volumetric sequence "Conductor":
- eight-level-deep octree using six views
- resistant to background noise due to its multi-perspective nature

The volumetric reconstruction is not entirely accurate:
- holes and discontinuities due to imperfect silhouette extraction
- uncarved parts of the volume (e.g. "wings") due to the finite number of viewpoints
- concavities cannot be recovered by volume intersection

Visual cone construction

Starting with the image of the 3D object, for each view:
- compute the silhouette in the image plane
- build the object's visual cone as a 3D occupancy map

3D occupancy map

Build the visual cone's occupancy map:
- represent the visual cone as an octree
- intersect each octant's projection bounding square with the object's silhouette in the image plane to decide the octant's occupancy
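The occupancy decision for a single octant can be sketched as a direct test of its projection's bounding square against a binary silhouette mask. The names and the three-way labels are illustrative; the paper replaces this direct patch test with lookup tables and distance transforms for speed.

```python
import numpy as np

EMPTY, FULL, PARTIAL = 0, 1, 2

def classify_square(silhouette, x0, y0, x1, y1):
    """Classify an octant's projection bounding square against a
    binary silhouette mask: EMPTY if it misses the silhouette
    entirely, FULL if it lies completely inside it, PARTIAL
    otherwise (a direct, unoptimized test for illustration)."""
    patch = silhouette[y0:y1, x0:x1]
    if not patch.any():
        return EMPTY
    if patch.all():
        return FULL
    return PARTIAL
```

A PARTIAL square is exactly the case that triggers the recursive subdivision described on the later slides.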

Volume element occupancy

Volume element occupancy is determined by the inclusion/exclusion of its projection's bounding square.

Two shortcomings:
- projections are expensive and redundant
- naive hexagon intersections with the silhouette are inefficient

Solution:
- a voxel's projection bounding square lookup table (VPBSLT) to get the octant's bounding square
- half checker board distance transforms to instantly determine inclusion/exclusion

Voxel's projection bounding square lookup table (VPBSLT)

- pre-computed for each view
- stores the octree nodes' projection bounding boxes
- a full table is large: O(8^depth) = O(2^(3·depth)) entries
- suits the computation well

Fast: a projection is one lookup. Large: a lookup table of depth 8 takes about 28 MB of space.
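The O(8^depth) growth is easy to verify: the number of octree nodes down to a given depth is a geometric sum. A minimal check (the 28 MB figure also depends on the per-entry size, which is not derived here):

```python
def octree_nodes(depth):
    """Total number of octree nodes down to the given depth:
    1 + 8 + 8^2 + ... + 8^depth = (8^(depth+1) - 1) / 7."""
    return (8 ** (depth + 1) - 1) // 7
```

At depth 8 this is already about 19.2 million nodes, which is why the table must be pre-computed and why its size dominates the memory budget.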

Half checker board distance transforms

- two per image: positive and negative
- tell the largest square fitting entirely in the foreground (or background)

An efficient way of estimating a projection's inclusion in, or exclusion from, the silhouette.
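A distance transform of this kind can be sketched with a chessboard (Chebyshev) metric and the classic two-pass chamfer sweep. This is an assumption-laden, slow pure-Python illustration of the idea, not the paper's half checker board construction: running it on the silhouette gives a "positive" map and on the inverted silhouette a "negative" one.

```python
import numpy as np

def chessboard_dt(mask):
    """Chebyshev (chessboard) distance to the nearest 0-pixel,
    via the two-pass chamfer sweep. For a foreground mask, the
    value at a pixel bounds the size of the square that fits
    entirely inside the foreground around that pixel."""
    h, w = mask.shape
    INF = h + w  # larger than any achievable in-image distance
    d = np.where(mask, INF, 0).astype(np.int32)
    # forward pass: propagate from top-left neighbors
    for y in range(h):
        for x in range(w):
            if d[y, x]:
                best = d[y, x]
                for dy, dx in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        best = min(best, d[ny, nx] + 1)
                d[y, x] = best
    # backward pass: propagate from bottom-right neighbors
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            best = d[y, x]
            for dy, dx in ((1, 1), (1, 0), (1, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    best = min(best, d[ny, nx] + 1)
            d[y, x] = best
    return d
```

With such a map, a square of Chebyshev radius r around pixel p lies entirely in the foreground whenever d[p] > r, so the inclusion/exclusion test for a bounding square reduces to a single lookup and comparison.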

Final volume reconstruction

Intersecting the visual cones' occupancy maps. A voxel is:
- empty, if any view claims it to be
- occupied, if all views agree
- half-empty, otherwise

If a voxel is half-empty, it is split into 8 children, and each child's occupancy is decided recursively.
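The intersection rule and the recursive split can be sketched as follows. The cell representation (an axis-aligned cube) and the per-view classifier interface are assumptions for illustration.

```python
EMPTY, OCCUPIED, HALF = 0, 1, 2

def combine(labels):
    """A voxel is empty if any view claims it to be, occupied if
    all views agree, and half-empty otherwise."""
    labels = list(labels)
    if EMPTY in labels:
        return EMPTY
    if all(l == OCCUPIED for l in labels):
        return OCCUPIED
    return HALF

def split8(cell):
    """Split an axis-aligned cube (x, y, z, size) into 8 octants."""
    x, y, z, s = cell
    h = s / 2
    return [(x + i * h, y + j * h, z + k * h, h)
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]

def intersect(views, cell, max_depth, depth=0):
    """views: one classifier per camera mapping a cell to a label.
    A half-empty cell is split into 8 children and each child is
    decided recursively, down to max_depth."""
    label = combine(v(cell) for v in views)
    if label != HALF or depth == max_depth:
        return label
    return [intersect(views, c, max_depth, depth + 1) for c in split8(cell)]
```

The result is exactly the octree of the earlier slides: leaves where views agree, subdivision where they do not.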

Distributed visual cone intersection

These features make the visual cone intersection procedure parallel and highly scalable:
- works for any number of processors, not necessarily a power of 2
- allows for pipelining of the visual cones
- efficient in local memory and network bandwidth

Octree encoding

The system encodes each octree as a byte buffer in depth-first search (DFS) order:
- memory efficient (memory is allocated/deallocated once for the whole tree)
- compact (one byte per node and no need for pointers)
- network transmission friendly (no need for serialization)
- disk file streaming friendly (same reason)
- convenient for DFS-order octree processing procedures (nodes are stored in DFS order)
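The node-a-byte, DFS-order idea can be sketched like this. The tag values and tree representation are illustrative; the paper's exact byte layout is not specified here.

```python
EMPTY, OCCUPIED, INTERNAL = 0, 1, 2

def encode(node, out):
    """Serialize an octree to a byte buffer in DFS order, one byte
    per node: leaves store their label, internal nodes store a
    marker followed immediately by their 8 children."""
    if isinstance(node, list):
        out.append(INTERNAL)
        for child in node:
            encode(child, out)
    else:
        out.append(node)
    return out

def decode(buf, pos=0):
    """Inverse of encode: rebuild the tree from the DFS byte stream.
    Returns the subtree and the position just past it."""
    tag = buf[pos]
    pos += 1
    if tag != INTERNAL:
        return tag, pos
    children = []
    for _ in range(8):
        child, pos = decode(buf, pos)
        children.append(child)
    return children, pos
```

Because children follow their parent contiguously, the buffer can be written to a socket or disk file as-is and consumed by any DFS-order procedure without pointer fix-ups.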

Experimental results

Setup:
- cluster of 16 Pentium II 450 MHz PCs
- 100 Mbit/s TCP/IP network
- 14 color cameras
- 2 × 2 × 2 m scene space

On 400-frame sequences of a moving person we observed the following average performance:

Octree depth | Voxel size (cm) | Frame rate (Hz)
------------ | --------------- | ---------------
6            | 3.125           | 10
7            | 1.563           | 8
8            | 0.781           | 4
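The voxel sizes in the table follow directly from the 2 m scene edge: each octree level halves the edge, so the leaf size is 200 cm / 2^depth (the table rounds 1.5625 and 0.78125 to three decimals).

```python
def voxel_size_cm(scene_edge_cm, depth):
    """Leaf voxel edge length for a cubic scene subdivided
    'depth' times (each octree level halves the edge)."""
    return scene_edge_cm / 2 ** depth
```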

Factors affecting performance

The system's frame rate depends on many factors:
- octree depth (visual cone construction/intersection and data transmission time): for an octree of depth 8, visual cone construction takes about 78 ms, cone intersection < 30 ms, and octree transmission < 32 ms
- static factors, independent of the reconstruction depth but dependent on the source image resolution and quality:
  - frame acquisition (from board/disk, color filtering): typically about 35 ms per frame when reading from a disk file
  - image preprocessing (silhouette extraction and HDT map construction): varies between 45 and 65 ms depending on the size of the silhouette and the noise
- latency due to the pipelining of the volume intersection procedure: for a reconstruction depth of 8, typically about 188 ms

Résumé

Built a real-time system for volume reconstruction featuring:
- distributed visual cone intersection
- fast occupancy analysis
- a special octree encoding

Demo of live 2D and 3D video.