Steps Towards the Convergence of Graphics, Vision, and Video
P. Anandan, Microsoft Research
Graphics: The Traditional View
A synthetic camera renders a model to produce image output.
Vision: The Traditional View
Real cameras observe a real scene; the output is a model.
Video: The Traditional View
Encode (transform coefficients, motion vectors), then decode and display.
Summary…
Graphics renders models; it simulates (approximates) physics.
Vision builds models from images: "inverse physics and geometry."
And video… is for viewing.
And most videos are badly made
Collection is easy, but quality is poor: unstable footage, bad camera shots, long pauses between interesting content.
In linear form, access to the interesting content is hard.
Signal-level compression is unselective.
Content manipulation is extremely hard.
New View: A Picture = 1000 Words
Why throw away all the information we collect in images and video? Use it to tell stories. Geometry is useful, but it is not the central goal.
Application Scenarios Communication Commerce Entertainment (movies, games) Education and training
Convergence of Graphics, Vision, and Video Processing
Four Steps Towards Convergence
Representations and tools; software infrastructure; hardware pipeline; applications.
Focus of this talk: representations and tools.
Common Data Model: the geometry/image continuum
(Figure from the Kang/Szeliski/Anandan ICIP paper.)
Representations span a spectrum from image-centric to geometry-centric: light field and Lumigraph; concentric mosaics; sprites with depth; layered depth images; view-dependent texture; fixed geometry. The corresponding rendering techniques range from interpolation and warping to polygon rendering with texture mapping.
We will be developing an imaging model that captures this spectrum and permits easy use of all these techniques. The important point is that a common platform accommodating the entire spectrum gains us the flexibility to use each technique in the spectrum, and the efficiency to mix representations without a performance penalty.
Image-Based Rendering Use images as rendering primitives Panoramic images Light Field and Lumigraph Concentric mosaics Sprites with Depth (3D reconstruction) …mostly new view generation
Image-Based Modeling (and Rendering)
The geometry-centric approach: Images → Geometry + Texture → Rendered Images
The Lightfield and the Lumigraph
The collection of all the light rays: through space (3D), along different directions (2D), at all times (1D), over all colors (3D). Also various subsets of this, such as the Lumigraph.
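As a concrete illustration of sampling this ray space, here is a minimal sketch (not from the talk) of a light field stored as a 4D table under the two-plane parameterization, with a nearest-neighbor ray lookup; the array sizes, the synthetic fill pattern, and the function names are invented for the example:

```python
import numpy as np

# A ray is parameterized by where it crosses the camera plane (s, t)
# and the focal plane (u, v); radiance is stored in a 4D RGB array.
S, T, U, V = 4, 4, 8, 8          # camera-plane and focal-plane resolution
L = np.zeros((S, T, U, V, 3))

# Fill with a simple synthetic pattern so lookups return something.
for s in range(S):
    for t in range(T):
        for u in range(U):
            for v in range(V):
                L[s, t, u, v] = [s / S, u / U, v / V]

def sample_ray(s, t, u, v):
    """Nearest-neighbor radiance lookup; a real renderer would
    interpolate quadrilinearly across the nearest samples in
    each plane."""
    si, ti = int(round(s)), int(round(t))
    ui, vi = int(round(u)), int(round(v))
    return L[si, ti, ui, vi]

print(sample_ray(1.2, 2.7, 3.4, 5.9))  # radiance of the nearest stored ray
```

New views are rendered by evaluating such lookups for every pixel's viewing ray, which is what makes light-field rendering independent of scene geometry.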
Layered Sprites ... + ...
Stereo Imaging and Depth Maps
A collection of images of a 3-D scene → stereo → a "model" of the scene.
The notion of "disparity"
For a 3-D point imaged at uL in the left view and uR in the right view, the disparity d = uL - uR encodes the depth z, with z ∝ 1/d.
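The inverse relation above can be made concrete for a rectified stereo pair; in the sketch below the focal length f and baseline B are assumed calibration values, not numbers from the talk:

```python
# Depth from disparity for a rectified stereo pair:
# z = f * B / d, so depth is inversely proportional to disparity.
# f (focal length, pixels) and B (baseline, meters) are illustrative.
def depth_from_disparity(d, f=700.0, B=0.1):
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * B / d

u_left, u_right = 320.0, 305.0   # matched column coordinates (pixels)
d = u_left - u_right             # disparity, d = uL - uR
print(depth_from_disparity(d))   # 700 * 0.1 / 15 ≈ 4.67 m
```

Note how a 1-pixel disparity error matters far more for distant points (small d) than for near ones, one reason stereo depth degrades with range.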
Stereo
Matching criterion: measures the similarity of pixels.
Aggregation method: how the error function is computed.
Winner selection: computing the results.
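The three stages can be sketched on a single synthetic scanline; this is an illustrative toy (squared difference as the matching criterion, a box window for aggregation, winner-take-all selection), not the talk's own algorithm:

```python
import numpy as np

def stereo_scanline(left, right, max_d=4, win=1):
    """Winner-take-all disparity for each pixel of a 1-D scanline."""
    n = len(left)
    cost = np.full((max_d + 1, n), np.inf)
    for d in range(max_d + 1):
        # Matching criterion: squared intensity difference at shift d.
        diff = (left[d:] - right[:n - d]) ** 2
        # Aggregation: sum the error over a (2*win + 1)-pixel window.
        agg = np.convolve(diff, np.ones(2 * win + 1), mode="same")
        cost[d, d:] = agg
    # Winner selection: pick the lowest-cost disparity per pixel.
    return np.argmin(cost, axis=0)

left = np.array([0, 0, 9, 9, 0, 0, 0, 0], float)
right = np.array([9, 9, 0, 0, 0, 0, 0, 0], float)  # pattern shifted by 2
# Disparity 2 is recovered where the intensity edge provides texture;
# flat regions are ambiguous and fall back to the first tied candidate.
print(stereo_scanline(left, right))
```

The ambiguity in the flat regions is exactly the lack-of-texture failure mode discussed next.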
Problems with stereo Depth discontinuities Lack of texture (depth ambiguity) Non-rigid effects (highlights, reflection, translucency)
View-dependent geometry: concept
[Figure: cameras C1-C8 arranged around the scene with a virtual camera VC, contrasting the correct global geometry G_global with the view-dependent geometry G_VC.]
Video-Based Rendering
Image-Based Rendering: render from (real-world) images for efficiency, quality, and photo-realism.
Video-Based Rendering: use video instead of still images; generate computer video instead of computer graphics.
Virtual Viewpoint Video Capture multiple synchronized video streams
Acquisition setup
Our video capture system consists of 8 cameras, each with a resolution of 1024 by 768, capturing at 15 frames per second. Each group of 4 cameras is synchronized by a device called a concentrator, which pipes all the uncompressed video data to a bank of hard disks via a fiber-optic cable. The two concentrators are themselves synchronized and are controlled by a single laptop.
Representation
A main layer and a boundary layer, each storing color and depth; the boundary layer also carries matting information.
Other VBR Examples
Facial animation (Video Rewrite, …)
Layer/matte extraction (Video Matting, …)
Dynamic (stochastic) elements (Video Textures, …)
3-D world navigation
Video Matting
Pull a dynamic α-matte from video with complex backgrounds [Chuang et al. @ UW, SIGGRAPH 2002]
Video Matting Background modification examples
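Background modification follows directly from the compositing equation the matte satisfies, C = αF + (1 − α)B: once video matting recovers α and the foreground F per pixel, any new background can be composited in. A minimal sketch with invented single-pixel values:

```python
import numpy as np

def composite(alpha, F, B):
    """Over-composite foreground F onto background B using matte alpha."""
    return alpha * F + (1.0 - alpha) * B

F = np.array([0.9, 0.2, 0.1])       # recovered foreground color
alpha = 0.6                         # recovered matte value at this pixel
B_new = np.array([0.0, 0.0, 1.0])   # replacement background color
print(composite(alpha, F, B_new))   # pixel of the modified video
```

Running this per pixel per frame is all "background modification" requires; the hard part, which video matting solves, is estimating α and F from the original footage.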
Video Textures
How can we turn a short video clip into an arbitrarily long stream of continuous video?
Dynamic elements in 3D games and presentations
An alternative to 3D graphics animation?
[Schödl, Szeliski, Salesin, Essa, SIGGRAPH 2000]
Video Textures
Find cyclic structure in the video.
(Optional) region-based analysis.
Play frames with random shuffle.
Smooth over discontinuities (morph).
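The core of finding cyclic structure can be sketched as follows; this is an illustrative reduction (tiny synthetic "frames", invented threshold and jump probability), not the paper's full algorithm. A jump from frame i to frame j is allowed when frame j resembles the natural successor of frame i, so playback can loop without visible seams:

```python
import numpy as np

rng = np.random.default_rng(0)
# A 10-frame periodic clip: each "frame" is a 1-D sinusoid slice.
frames = [np.sin(np.linspace(0, 2 * np.pi, 16) + 2 * np.pi * t / 10)
          for t in range(10)]
n = len(frames)

# Frame-to-frame distance matrix over the whole clip.
D = np.array([[np.linalg.norm(frames[i] - frames[j]) for j in range(n)]
              for i in range(n)])

def next_frame(i, jump_prob=0.3, thresh=0.5):
    """Usually advance i -> i+1; occasionally jump to a frame j whose
    predecessor-compatibility D[(i+1) % n, j] is below threshold."""
    candidates = [j for j in range(n)
                  if j != i + 1 and D[(i + 1) % n, j] < thresh]
    if candidates and rng.random() < jump_prob:
        return int(rng.choice(candidates))
    return (i + 1) % n

# Generate 20 frames of "endless" playback from the 10-frame clip.
seq = [0]
for _ in range(19):
    seq.append(next_frame(seq[-1]))
print(seq)
```

Smoothing (cross-fading or morphing across each jump) would then hide whatever small discontinuity remains at the chosen transitions.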
Interactive fish This is the result that we get. We processed 8 minutes of fish video to generate the fish animation. The current goal is marked with the red dot, and the bar shows the current frame.
Video as a Space-time Volume
Video as a movie
The “flipbook” paradigm
(vs.) The Space-time Cube
Space-time video geometry
The video volume has spatial axes X and Y and a temporal axis T.
Automatic EPI-strip analysis and extraction
Slicing the space-time volume yields EPIs (epipolar plane images).
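Extracting an EPI from the volume is just a slice; the sketch below uses a synthetic volume and an invented helper name. For a horizontally translating camera, fixing a scanline y and stacking it over time gives a slice in which each scene point traces a straight line whose slope encodes its depth:

```python
import numpy as np

T, Y, X = 30, 40, 64
volume = np.zeros((T, Y, X))

# A point feature moving 2 pixels per frame along x on scanline y = 20.
for t in range(T):
    volume[t, 20, (5 + 2 * t) % X] = 1.0

def epi_slice(volume, y):
    """Fix scanline y; rows of the result are time, columns are x."""
    return volume[:, y, :]

epi = epi_slice(volume, 20)
print(epi.shape)   # one row per frame, one column per x position
# The nonzero entries of `epi` lie on the line x = 5 + 2t (mod 64),
# i.e. a straight EPI trace whose slope reflects the feature's motion.
```

Grouping pixels that share such a line (an "EPI strip", and across scanlines an "EpiTube") is what the automatic analysis on the following slides extracts.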
EpiTubes and layers: original data, the automatically extracted EpiTube, and the re-synthesized layer.
Layer extraction from a sequence: the original sequence, the front-most layer (the dodecahedron), and the background layer.
Specularities and Highlights
Specular reflections; highlights.
Taxonomy of specular EPI strips
A specularity can cut across multiple strips; a highlight can vary in color. The EPI trace of features and specularities decomposes as: EPI trace = extracted true feature trace + extracted specularity.
Specularity in the Lightfield volume
Original = Diffuse + Specular
Summary
Rather than explicitly modeling the dynamic 3D appearance and behavior of the world, learn it from large amounts of sample data.
The closer you stay to the original data, the greater the realism; the more you (automatically) model and abstract, the greater the control.
Summary
Great synergy exists between graphics, vision, and video.
MPEG-4 and MPEG-7