A Distributed System for Real-time Volume Reconstruction
We present a distributed system for constructing volumetric image sequences in real time.
Eugene Borovikov and Larry Davis, University of Maryland
Motivation
Pursuing distributed vision systems for:
3D volumetric reconstruction
shape analysis
motion capture
tracking
gesture recognition
This work is part of ongoing research on distributed motion capture and gesture recognition.
Introduction
Volume reconstruction system:
distributed
real-time
visual cone intersection
octree representation
We describe a distributed system for real-time volume reconstruction and volumetric image sequence production. The system reconstructs the volume occupied by a moving object (e.g. a person) in a scene viewed from multiple perspectives, produces a volumetric image represented by an octree, and then streams the volumetric images into a sequence.
Glossary:
visual cone – the volume viewable through the object's silhouette
octree – a tree with a branching factor of 8; used here as an occupancy map
Lab design
The multi-perspective imaging lab:
dimensions: 8 × 8 × 3 m
64 digital, progressive-scan cameras organized into 16 quadranocular stereo rigs
each stereo rig consists of four cameras: three grayscale and one color
each rig is connected to a dedicated PC equipped with four identical digital frame grabbers
the 16 networked PCs form a cluster controlled by a dedicated computer, the cluster controller
Equipped with low-noise, high-shutter-speed cameras and a high-bandwidth network, the Keck cluster provides an ideal environment for high-performance multi-perspective imaging.
System overview
The volume reconstruction procedure uses a multi-perspective view of the scene and consists of the following steps:
camera calibration (Roger Tsai's method)
silhouette extraction (Thanarat Horprasert's background subtraction)
silhouette visual cone intersection (Richard Szeliski's method)
volumetric data streaming (storing to a disk file or visualization via OpenGL)
Note that not all of these steps have to be done in real time; some preliminary steps (e.g. camera calibration) can be done off-line.
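The silhouette-to-volume chain can be sketched on a toy voxel grid with orthographic "cameras" (all names and the scene are illustrative; the real system uses calibrated perspective cameras and octrees):

```python
# Toy end-to-end sketch: silhouettes -> visual cones -> intersected volume,
# on a 4x4x4 voxel grid with two orthographic views (illustrative only).

N = 4  # voxels per axis

def cone(silhouette, axis):
    """Visual cone of a silhouette under orthographic projection along `axis`."""
    voxels = set()
    for x in range(N):
        for y in range(N):
            for z in range(N):
                u, v = [(y, z), (x, z), (x, y)][axis]  # drop the viewing axis
                if (u, v) in silhouette:
                    voxels.add((x, y, z))
    return voxels

# Two views of a single unit cube at (1, 2, 3): each silhouette is one pixel.
sil_x = {(2, 3)}   # seen along the x axis -> pixel coords (y, z)
sil_y = {(1, 3)}   # seen along the y axis -> pixel coords (x, z)

volume = cone(sil_x, 0) & cone(sil_y, 1)  # visual cone intersection
print(sorted(volume))  # [(1, 2, 3)]
```

The x-view cone fixes (y, z) and the y-view cone fixes (x, z), so their intersection pins down the single occupied voxel; with fewer or less informative views the intersection stays a superset of the true shape.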
Camera calibration
Done off-line:
calibration frame with 25 non-coplanar calibration points
Tsai's method
estimated error in object space is about 3 mm
accuracy can be improved somewhat by increasing the image resolution and/or computing the projections of the feature points with sub-pixel precision
Background subtraction
The volume reconstruction procedure relies on accurate foreground object silhouette extraction, which uses background subtraction:
color (Horprasert, Harwood, Davis)
grayscale (Haritaoglu, Harwood, Davis)
statistical pixel-wise background models (built off-line)
Advantages:
works well for general backgrounds
tolerant to noise
eliminates shadows
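A minimal pixel-wise statistical model can illustrate the idea (a generic sketch, far simpler than the cited Horprasert color model; the `k` and `floor` parameters are assumptions for the demo):

```python
# Pixel-wise statistical background subtraction sketch: per-pixel mean and
# standard deviation from background frames, then thresholding new frames.

def train(frames):
    """frames: list of same-size 2D grayscale images (lists of lists)."""
    h, w, n = len(frames[0]), len(frames[0][0]), len(frames)
    mean = [[sum(f[i][j] for f in frames) / n for j in range(w)] for i in range(h)]
    std = [[(sum((f[i][j] - mean[i][j]) ** 2 for f in frames) / n) ** 0.5
            for j in range(w)] for i in range(h)]
    return mean, std

def silhouette(image, mean, std, k=3.0, floor=2.0):
    """Foreground mask: pixels deviating more than k sigmas from the model."""
    return [[1 if abs(p - m) > k * max(s, floor) else 0
             for p, m, s in zip(row, mrow, srow)]
            for row, mrow, srow in zip(image, mean, std)]

bg = [[[100, 101], [99, 100]], [[102, 99], [101, 100]]]  # two 2x2 background frames
mean, std = train(bg)
mask = silhouette([[100, 180], [99, 100]], mean, std)
print(mask)  # [[0, 1], [0, 0]] -- only the bright pixel is foreground
```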
Multi-perspective silhouette extraction
Estimate the foreground object's volume:
multi-perspective silhouette segmentation
silhouette visual cone intersection
volumetric data streaming
graphical rendering and analysis are often done off-line (time consuming)
Volume reconstruction
Snapshots of the volumetric sequence "Conductor":
eight-level-deep octree using six views
resistant to background noise due to its multi-perspective nature
The volumetric reconstruction is not entirely accurate:
holes and discontinuities due to imperfect silhouette extraction
uncarved parts of the volume (e.g. "wings") due to the finite number of viewpoints
concavities can't be recovered by volume intersection
Visual cone construction
Starting with the image of the 3D object, for each view:
compute the silhouette
build the object's visual cone as a 3D occupancy map
3D occupancy map
Build the visual cone's occupancy map:
represent the visual cone as an octree
intersect each octant's projection bounding square with the object's silhouette to decide the octant's occupancy
Volume element occupancy
Volume element occupancy is determined by the inclusion/exclusion of its projection's bounding square.
Two shortcomings:
projections are expensive and redundant
naïve hexagon intersections with the silhouette are inefficient
Solution:
a voxel's projection bounding square lookup table (VPBSLT) to get the octant's bounding square
half checkerboard distance transforms to instantly determine inclusion/exclusion
Voxel's projection bounding square lookup table (VPBSLT)
pre-computed for each view
stores the projection bounding boxes of all octree nodes
fast: a projection is one table lookup, which suits the computation well
large: O(8^depth) = O(2^(3·depth)) entries; a lookup table of depth 8 takes about 28 MB of space
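The table size grows with the node count of a complete octree; a quick check of the O(8^depth) claim (the exact 28 MB figure also depends on the per-node record size, which the slides do not specify):

```python
# Node count of a complete octree through a given depth: 1 + 8 + 64 + ...,
# i.e. (8**(d+1) - 1) / 7, which is O(8**d) = O(2**(3*d)).

def octree_nodes(depth):
    """Total nodes in a complete octree of the given depth (root = depth 0)."""
    return sum(8 ** level for level in range(depth + 1))

print(octree_nodes(2))  # 1 + 8 + 64 = 73
print(octree_nodes(8))  # ~19.2 million nodes per view at depth 8
```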
Half checkerboard distance transforms
two per image: positive and negative
each tells the maximum square fitting entirely in the foreground (positive) or background (negative)
an efficient way of estimating the projection's inclusion in or exclusion from the silhouette
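One plausible formulation of the positive transform (an assumption, not necessarily the authors' exact definition): for each pixel, the side of the largest axis-aligned square with its top-left corner at that pixel lying entirely in the foreground. A single reverse raster scan suffices:

```python
# Max-square transform sketch: D[i][j] = side of the largest foreground
# square anchored (top-left) at pixel (i, j). Built in one reverse scan via
# D[i][j] = 1 + min(right, down, diagonal) for foreground pixels.

def max_square_transform(mask):
    """mask: 2D list of 0/1 (1 = foreground). Returns a same-shape table D."""
    h, w = len(mask), len(mask[0])
    D = [[0] * w for _ in range(h)]
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if mask[i][j]:
                right = D[i][j + 1] if j + 1 < w else 0
                down = D[i + 1][j] if i + 1 < h else 0
                diag = D[i + 1][j + 1] if i + 1 < h and j + 1 < w else 0
                D[i][j] = 1 + min(right, down, diag)
    return D

mask = [[1, 1, 0],
        [1, 1, 1],
        [1, 1, 1]]
D = max_square_transform(mask)
print(D[1][0])  # 2: a 2x2 foreground square fits at row 1, col 0
```

With such a table, a bounding square of side s at (i, j) is entirely inside the foreground iff D[i][j] >= s, so the inclusion test is a single comparison.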
Final volume reconstruction
Intersecting the visual cones' occupancy maps. A voxel is:
empty, if any view claims it to be empty
occupied, if all views agree it is occupied
half-empty, otherwise
If a voxel is half-empty, it is split into 8 children, and each child's occupancy is decided recursively.
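The recursion above can be sketched as follows (illustrative representation: each view is a function classifying a cubic region as empty, full, or partial; the `half_space` "cones" are stand-ins for real silhouette tests):

```python
# Recursive visual cone intersection sketch over an axis-aligned cube
# (x, y, z, size). A region is carved if any view says empty, kept if all
# views say full, and subdivided into 8 octants otherwise.

EMPTY, FULL, PARTIAL = 'empty', 'full', 'partial'

def octants(region):
    x, y, z, s = region
    h = s / 2
    return [(x + dx * h, y + dy * h, z + dz * h, h)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

def intersect(region, views, depth, max_depth):
    labels = [view(region) for view in views]
    if EMPTY in labels:                     # any view carves it away
        return EMPTY
    if all(l == FULL for l in labels):      # all views agree: occupied
        return FULL
    if depth == max_depth:                  # half-empty leaf: stop splitting
        return FULL
    return [intersect(c, views, depth + 1, max_depth) for c in octants(region)]

def half_space(axis, bound):
    """Toy 'visual cone': everything with coordinate[axis] < bound."""
    def classify(region):
        lo, size = region[axis], region[3]
        if lo + size <= bound:
            return FULL
        if lo >= bound:
            return EMPTY
        return PARTIAL
    return classify

views = [half_space(0, 0.5), half_space(1, 0.5)]
tree = intersect((0.0, 0.0, 0.0, 1.0), views, 0, 1)
print(tree.count(FULL), tree.count(EMPTY))  # 2 full octants, 6 empty
```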
Distributed visual cone intersection
These features make the visual cone intersection procedure parallel and highly scalable:
works for any number of processors, not necessarily a power of 2
allows for pipelining of the visual cones
efficient in local memory and network bandwidth
Octree encoding
The system encodes each octree as a byte buffer in depth-first search (DFS) order:
memory efficient (memory is allocated/deallocated once for the whole tree)
compact (one byte per node, no need for pointers)
network transmission friendly (no need for serialization)
disk file streaming friendly (same reason)
convenient for DFS-order octree processing procedures (nodes are stored in DFS order)
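A minimal sketch of such a node-a-byte, DFS-order encoding (the byte values and tree representation are illustrative; the slides do not specify the exact codes):

```python
# One byte per node, children stored immediately after their parent in DFS
# order, so no pointers are needed and the buffer streams as-is.

EMPTY, FULL, SPLIT = 0, 1, 2

def encode(node, out):
    """node: EMPTY, FULL, or a list of 8 child nodes. Appends bytes to out."""
    if isinstance(node, list):
        out.append(SPLIT)
        for child in node:          # children follow the parent, DFS order
            encode(child, out)
    else:
        out.append(node)
    return out

def decode(buf, pos=0):
    """Rebuild the tree from the byte buffer; returns (node, next_pos)."""
    tag = buf[pos]
    pos += 1
    if tag != SPLIT:
        return tag, pos
    children = []
    for _ in range(8):
        child, pos = decode(buf, pos)
        children.append(child)
    return children, pos

tree = [FULL, EMPTY, FULL, EMPTY, EMPTY, EMPTY, EMPTY, [FULL] * 8]
buf = encode(tree, bytearray())
decoded, _ = decode(buf)
print(len(buf), decoded == tree)  # 17 True -- 17 bytes, exact round-trip
```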
Experimental results
Setup:
cluster of 16 Pentium II 450 MHz PCs
100 Mbit/s TCP/IP network
14 color cameras
2 × 2 × 2 m scene space
On 400-frame sequences of a moving person we observed the following average performance:

Octree depth   Voxel size (cm)   Frame rate (Hz)
     6              3.125              10
     7              1.563               8
     8              0.781               4
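The voxel sizes in the table follow directly from the octree depth: the 2 m (200 cm) scene edge is halved at every level, so at depth d the voxel edge is 200 / 2^d centimeters.

```python
# Voxel edge length per octree depth for a 200 cm scene edge.
for depth in (6, 7, 8):
    print(depth, 200 / 2 ** depth)  # 3.125, 1.5625, 0.78125
```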
Factors affecting performance
The system's frame rate depends on many factors:
octree depth, which drives visual cone construction/intersection and data transmission time; for an octree of depth 8:
visual cone construction 78 ms
cone intersection < 30 ms
octree transmission < 32 ms
static factors, independent of the reconstruction depth but dependent on the source image resolution and quality:
frame acquisition (from board/disk, color filtering): typically about 35 ms per frame when read from a disk file
image preprocessing (silhouette extraction and HDT map construction): between 45 and 65 ms, depending on the size of the silhouette and noise
latency due to the pipelining of the volume intersection procedure: for a reconstruction depth of 8, typically about 188 ms
Résumé
We built a real-time system for volume reconstruction featuring:
distributed visual cone intersection
fast occupancy analysis
a special octree encoding
Demo: live 2D and 3D video.