1
High-Quality Video View Interpolation
Larry Zitnick Interactive Visual Media Group Microsoft Research
2
3D video: the image-centric to geometry-centric spectrum
Figure from Kang, Szeliski, and Anandan's ICIP paper. We will be developing an imaging model that captures this spectrum and permits easy use of all of these techniques. The important thing to understand is that finding a common platform to accommodate this entire spectrum gains us the flexibility to make use of each technique represented in the spectrum and the efficiency to mix the representations without a performance penalty.
Representations along the spectrum: fixed geometry, view-dependent texture, view-dependent geometry, sprites with depth, layered depth images, Lumigraph, light field. Rendering techniques: polygon rendering + texture mapping, warping, interpolation.
3
Current practice in free-viewpoint video: many cameras vs. motion jitter
4
Current practice in free-viewpoint video: many cameras vs. motion jitter
5
Video view interpolation
Fewer cameras, smooth motion, automatic, real-time rendering
6
System overview (Video Capture stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: Selective Decompression → Render
Our rendering system consists of offline and online components. The video capture and processing are done offline. Video processing consists of stereo computation and data compression. The dynamic scene can then be interactively viewed by selectively decompressing the data file and rendering it.
7
Capture hardware: cameras, concentrators, hard disks, controlling laptop
Our video capture system consists of 8 cameras, each with a resolution of 1024 by 768, capturing at 15 frames per second. Each group of 4 cameras is synchronized using a device called a concentrator, which pipes all the uncompressed video data to a bank of hard disks via a fiber-optic cable. The two concentrators are themselves synchronized and are controlled by a single laptop.
8
Calibration (Zhengyou Zhang, 2000)
9
Input videos
10
System overview (Stereo stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: Selective Decompression → Render
11
Key to view interpolation: Geometry
Figure: stereo geometry relating Image 1 (Camera 1), Image 2 (Camera 2), and the virtual camera.
12
Image correspondence between Image 1 and Image 2
Figure: for a textured region (the leg), the correct match score is clearly distinguished from incorrect ones; for a low-texture region (the wall), good and bad matches give similar match scores.
13
Why segments? Better delineation of boundaries.
14
Why segments? Larger support for matching.
Handle gain and offset differences without a global model (Kim, Kolmogorov, and Zabih, 2003).
15
Why segments? More efficient: 786,432 pixels (1024 × 768) vs. ~1,000 segments.
Compute disparities per segment rather than per pixel.
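As a rough sketch of the efficiency point above (not the paper's exact cost function), the matching cost can be aggregated over all pixels of a segment so that a single disparity is estimated per segment; the SSD cost, horizontal-shift warp, and function names below are illustrative assumptions.

```python
import numpy as np

def segment_disparity_costs(img1, img2, segments, max_disp):
    """Aggregate a simple SSD matching cost over each segment instead of per pixel.

    img1, img2 : HxWx3 float arrays (rectified stereo pair) -- hypothetical inputs
    segments   : HxW int array of segment labels for img1
    max_disp   : number of disparity hypotheses to test
    Returns an array of shape (num_segments, max_disp) of aggregated costs.
    """
    h, w, _ = img1.shape
    num_segments = segments.max() + 1
    costs = np.zeros((num_segments, max_disp))
    for d in range(max_disp):
        # Shift img2 by d pixels to simulate a horizontal disparity hypothesis.
        shifted = np.roll(img2, d, axis=1)
        ssd = ((img1 - shifted) ** 2).sum(axis=2)
        ssd[:, :d] = 0.0  # pixels with no valid correspondence contribute nothing
        # Sum the per-pixel cost within each segment (one number per segment).
        costs[:, d] = np.bincount(segments.ravel(),
                                  weights=ssd.ravel(),
                                  minlength=num_segments)
    return costs

# One disparity per segment: the minimizing hypothesis.
# disparities = segment_disparity_costs(img1, img2, segments, 64).argmin(axis=1)
```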
16
Segmentation Many methods will work:
Graph-based (Felzenszwalb and Huttenlocher, 2004)
Mean shift (Comaniciu et al., 2001)
Min-cut (Boykov et al., 2001)
Others…
17
Segmentation: Important properties
Not too large, not too small… As large as possible while not spanning multiple objects.
18
Segmentation: Important properties
Stable Regions
19
Segmentation: Our Approach
First average (anisotropic smoothing), then segment.
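A minimal sketch of the "first average, then segment" idea using a Perona-Malik style anisotropic diffusion; the specific diffusion function and parameters are assumptions, not the paper's exact smoothing, and the subsequent segmentation can be any of the methods listed earlier.

```python
import numpy as np

def anisotropic_smooth(img, iterations=10, kappa=20.0, step=0.15):
    """Edge-preserving smoothing (Perona-Malik style) applied per color channel.

    img : HxWx3 float array. Smoothing is strong in flat regions and weak
    across strong edges, so segment boundaries stay sharp.
    """
    out = img.astype(np.float64).copy()
    for _ in range(iterations):
        # Finite differences to the four neighbors.
        north = np.roll(out, -1, axis=0) - out
        south = np.roll(out, 1, axis=0) - out
        east = np.roll(out, -1, axis=1) - out
        west = np.roll(out, 1, axis=1) - out
        # Conduction coefficient: small where the gradient is large (an edge).
        def g(d):
            return np.exp(-(d / kappa) ** 2)
        out += step * (g(north) * north + g(south) * south +
                       g(east) * east + g(west) * west)
    return out

# smoothed = anisotropic_smooth(frame)
# segments = any_segmentation_method(smoothed)  # e.g. Felzenszwalb or mean shift
```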
20
Segmentation: result (close-up)
21
Matching segments: many measures will work:
SSD, normalized correlation, mutual information. The choice depends on color balancing and image quality.
22
Matching segments: Important properties
Never remove correct matches. Remove as many false matches as possible. Use global methods to remove the remaining false positives.
23
Matching segments: Our approach
Create a gain histogram of pixel ratios between the two candidate segments. (Figure: gain histograms over the range 0.8 to 1.25 for a good match and for a bad match.)
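A minimal sketch of how such a gain histogram could be formed for a candidate segment match; the bin range (0.8 to 1.25, taken from the figure) and the peakedness test are illustrative assumptions rather than the paper's exact criterion.

```python
import numpy as np

def gain_histogram(pixels1, pixels2, bins=20, lo=0.8, hi=1.25):
    """Histogram of per-pixel intensity ratios (gains) between two candidate
    matching segments. A correct match tends to produce a concentrated
    histogram; a wrong match spreads the mass over many bins.

    pixels1, pixels2 : 1-D arrays of corresponding pixel intensities.
    """
    eps = 1e-6
    gains = pixels1 / (pixels2 + eps)
    hist, _ = np.histogram(np.clip(gains, lo, hi), bins=bins, range=(lo, hi))
    return hist / max(hist.sum(), 1)

def looks_like_good_match(hist, peak_fraction=0.5):
    # Illustrative test: accept if a single bin and its neighbors hold
    # most of the probability mass.
    peak = hist.argmax()
    window = hist[max(peak - 1, 0):peak + 2].sum()
    return window >= peak_fraction
```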
24
Local matching between Image 1 and Image 2; low-texture regions remain ambiguous.
25
Global regularization
Create an MRF (Markov Random Field): each segment is a node (segments A–F in Image 1 and P–U in Image 2 in the figure), and the number of states equals the number of depth levels.
26
Global regularization
The posterior over disparities given the images factors into a likelihood (data term) and a prior (regularization term).
27
Global regularization
Figure: neighboring segments A–F in Image 1 and P–U in Image 2; if color_A ≈ color_B then z_A ≈ z_B.
28
Global regularization
The prior is a normal distribution over a neighbor's disparity; its variance depends on the percentage of shared border and on the similarity of color (figure: distributions over disparity for segments A–F).
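A minimal sketch of segment-level regularization with this kind of neighbor prior, using a simple iterative relaxation instead of the paper's exact MRF solver; the inputs data_cost, neighbors, border_frac, and color_sim are assumed structures.

```python
import numpy as np

def regularize_disparities(data_cost, neighbors, border_frac, color_sim,
                           iterations=20, base_sigma=2.0):
    """Iteratively refine per-segment disparity beliefs on a segment MRF.

    data_cost   : (num_segments, num_disparities) matching costs (lower = better).
    neighbors   : dict segment -> list of adjacent segments.
    border_frac : dict (s, t) -> fraction of shared border (0..1).
    color_sim   : dict (s, t) -> color similarity (0..1).
    The prior pulls a segment's disparity toward its neighbors', more strongly
    when they share a long border and similar color (smaller sigma).
    """
    num_seg, num_disp = data_cost.shape
    disp_values = np.arange(num_disp)
    belief = np.exp(-data_cost)
    belief /= belief.sum(axis=1, keepdims=True)
    for _ in range(iterations):
        new_belief = np.exp(-data_cost)
        for s in range(num_seg):
            for t in neighbors.get(s, []):
                mean_t = (belief[t] * disp_values).sum()
                weight = border_frac[(s, t)] * color_sim[(s, t)]
                sigma = base_sigma / (weight + 1e-3)
                # Normal-distribution prior centered on the neighbor's expected disparity.
                prior = np.exp(-0.5 * ((disp_values - mean_t) / sigma) ** 2)
                new_belief[s] *= prior
        belief = new_belief / new_belief.sum(axis=1, keepdims=True)
    return belief.argmax(axis=1)
```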
29
Multiple disparity maps
Compute a disparity map for each image. We want the disparity maps to be consistent across images…
30
Consistent disparities
Figure: segment A in Image 1 (segments A–F) projects onto segments P, Q, and S in Image 2 (segments P–U), so z_A ≈ z_P, z_Q, z_S.
31
Consistent disparities
A segment's disparity depends on the neighboring images' disparities; the likelihood term includes those neighboring disparities.
32
Consistent disparities
Use the original data term if the segment is not occluded. If it is occluded, bias its disparity to lie behind the known surfaces.
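A minimal sketch of switching between the two data terms; the occlusion test, the flat cost behind the occluder, and the penalty value are illustrative assumptions.

```python
import numpy as np

def data_term(match_cost, occluded, occluder_disp, penalty=10.0):
    """Occlusion-aware data term for one segment.

    match_cost    : (num_disparities,) original matching cost.
    occluded      : bool, whether the segment is occluded in the other view.
    occluder_disp : disparity of the occluding surface (larger disparity = nearer).
    If occluded, keep a flat cost behind the occluder and penalize disparities
    that would place the segment in front of it.
    """
    if not occluded:
        return match_cost
    num_disp = match_cost.shape[0]
    cost = np.full(num_disp, match_cost.mean())
    cost[np.arange(num_disp) > occluder_disp] += penalty
    return cost
```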
33
Is the segment occluded?
Figure: the occluded and not-occluded cases with respect to image I_i.
34
If occluded… (Figure: the occluded segment's disparity relative to image I_i.)
35
Iteratively solve MRF
36
Depth through time
37
Matting
Figure: an interpolated view without matting shows artifacts at foreground/background boundaries. A strip of some width is extracted along the boundary between the foreground and background surfaces, and Bayesian matting (Chuang et al., 2001) estimates foreground color, background color, and alpha within that strip.
38
Rendering with matting
No Matting Matting
39
System overview (Representation stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: Selective Decompression → Render
40
Representation: two layers
Boundary layer (a strip of some width around each foreground/background boundary): color, depth, alpha.
Main layer: color, depth.
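A minimal sketch of what the two-layer per-camera representation could look like as a data structure; the class and field names and array shapes are illustrative, not the actual file format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MainLayer:
    color: np.ndarray  # (H, W, 3) texture for the whole frame
    depth: np.ndarray  # (H, W) per-pixel depth (or disparity)

@dataclass
class BoundaryLayer:
    # Sparse strip around depth discontinuities; only boundary pixels are stored.
    color: np.ndarray     # (N, 3) colors of boundary pixels
    depth: np.ndarray     # (N,) depths of boundary pixels
    alpha: np.ndarray     # (N,) matting alpha of boundary pixels
    position: np.ndarray  # (N, 2) pixel coordinates of the boundary strip

@dataclass
class CameraFrame:
    main: MainLayer
    boundary: BoundaryLayer
```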
41
System overview (Compression stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: Selective Decompression → Render
42
Compression (Figure: cameras 1–4 at time 0 and time 1)
Compression is used to reduce the large data set to a manageable size and to allow fast playback from disk. We developed our own codec to make use of both temporal and between-camera redundancy. Temporal prediction is used to compress the reference camera's data in terms of previously decoded results from an earlier frame time. Spatial prediction makes use of the reference camera's disparity map to transform its texture and disparity data into the viewpoint of a spatially adjacent camera. The differences between predicted and actual images are coded using a novel transform-based compression scheme which can simultaneously handle texture, disparity, and alpha-map data. To obtain real-time interactivity, the overall decoding scheme is highly optimized for speed and makes use of the GPU where possible.
43
Compression: temporal prediction. A reference camera's frame at time 1 is predicted from its previously decoded frame at time 0 (figure: cameras 1–4).
44
Compression: spatial prediction. The reference camera's texture and disparity are warped into the viewpoint of a spatially adjacent camera to predict it (figure: cameras 1–4 at times 0 and 1).
45
Spatial prediction (figure): the reference camera's depth and texture, and the camera to be predicted.
46
Spatial prediction (figure): the reference camera's depth and texture warped toward the predicted camera's viewpoint.
47
Spatial prediction (figure): the error signal is the difference between the predicted camera's actual image and the warped depth and texture of the reference camera.
48
Spatial prediction (figure): the predicted camera's view is reconstructed by adding the error signal back to the warped depth and texture of the reference camera.
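A minimal sketch of the spatial prediction and error-signal idea: forward-warp the reference texture with its disparity, take the difference with the predicted camera's actual image, and reconstruct by adding the error back. The purely horizontal warp and the lack of transform coding are simplifying assumptions.

```python
import numpy as np

def warp_with_disparity(ref_color, ref_disp):
    """Forward-warp the reference texture into the adjacent camera's view,
    shifting each pixel horizontally by its disparity (rectified-camera assumption)."""
    h, w, _ = ref_color.shape
    warped = np.zeros_like(ref_color)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xd = x + int(round(ref_disp[y, x]))
            if 0 <= xd < w:
                warped[y, xd] = ref_color[y, x]
                filled[y, xd] = True
    return warped, filled

def encode_decode(ref_color, ref_disp, target_color):
    predicted, filled = warp_with_disparity(ref_color, ref_disp)
    error = target_color - predicted   # error signal (would be transform coded)
    reconstructed = predicted + error  # decoder adds the error signal back
    return reconstructed, error, filled
```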
50
Boundary layer coding: depth, color texture, and alpha matte.
We use our own shape coding method, similar to MPEG-4.
51
System overview (Selective Decompression and Render highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: Selective Decompression → Render
52
Rendering: source cameras and the virtual camera
Next we describe the process of using the GPU to render a novel viewpoint from the compressed data. Given a novel viewpoint, the rendering program determines the two nearest cameras. The data from these two cameras is blended to create the new viewpoint. A block diagram of the rendering process is shown next.
53
Rendering block diagram: for each of the two nearest source cameras, project the main layer and the boundary layer, then composite the projections.
54
Rendering the main layer (Step 1)
At every frame time the background depth map is used to create a dense mesh, which is texture-mapped with the background color map. The mesh (position and texture coordinates) is projected into the virtual camera's view by a vertex shader, and a pixel shader writes the projected color buffer and Z-buffer on the GPU, rejecting any pixels that lie on large depth gradients. (Figure: videos of background depth and background color feed the vertex and pixel shaders.)
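A minimal CPU-side sketch of step 1: back-project the depth map into a dense mesh of 3D points and project it into the virtual camera. The pinhole projection math is standard, but the matrices, depth convention, and function names are assumptions; the actual system performs this with vertex and pixel shaders on the GPU.

```python
import numpy as np

def depth_to_points(depth, K):
    """Back-project a depth map into 3D camera-space points using intrinsics K."""
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x (h*w)
    rays = np.linalg.inv(K) @ pix
    return (rays * depth.reshape(-1)).T  # (h*w, 3) points

def project_points(points, K_virtual, R, t):
    """Project 3D points into the virtual camera (rotation R, translation t)."""
    cam = R @ points.T + t.reshape(3, 1)
    pix = K_virtual @ cam
    return (pix[:2] / pix[2]).T, cam[2]  # 2-D pixel coords and depths for the Z-buffer

def grid_triangles(h, w):
    """Two triangles per pixel quad: the dense mesh connecting neighboring depth samples."""
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    return np.concatenate([np.stack([a, b, c], 1), np.stack([b, d, c], 1)])
```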
57
Rendering the main layer (Step 2)
Locate depth discontinuities in the depth map: main-layer triangles connecting background to foreground (1 pixel wide) must be removed. To avoid modifying the mesh on a frame-by-frame basis, the CPU generates an "erase mesh" along the boundary, and a pixel shader sets the Z-buffer to far away and the colors to transparent there, updating the projected color buffer on the GPU.
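A minimal sketch of locating the depth discontinuities that the erase mesh covers; the gradient threshold is an illustrative assumption.

```python
import numpy as np

def depth_discontinuities(depth, threshold=0.05):
    """Mark pixels whose depth differs from a horizontal or vertical neighbor
    by more than a threshold; these are the 1-pixel-wide boundary strips where
    main-layer triangles would wrongly connect foreground to background."""
    dx = np.abs(np.diff(depth, axis=1))
    dy = np.abs(np.diff(depth, axis=0))
    mask = np.zeros_like(depth, dtype=bool)
    mask[:, :-1] |= dx > threshold
    mask[:, 1:] |= dx > threshold
    mask[:-1, :] |= dy > threshold
    mask[1:, :] |= dy > threshold
    return mask  # vertices here go into the "erase mesh"
```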
58
Rendering boundary layer
The boundary depth and RGBA data are used to generate a boundary mesh with vertex colors (CPU); the mesh is rendered and composited with the projected main layer into the projected color buffer, using the GPU Z-buffer.
59
Graphics for Vision Use the GPU for vision.
Real-time stereo (Yang and Pollefeys, CVPR 2003)
60
Rendering block diagram (revisited): project the boundary layer and the main layer from each of the two nearest cameras, then composite.
61
Compositing views
The projected views from Camera 1 and Camera 2 are blended in a pixel shader, with weights based on each camera's proximity to the virtual viewpoint, followed by a normalization pass (also a pixel shader on the GPU) to produce the final result.
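A minimal sketch of proximity-weighted blending of the two projected views followed by normalization; the scalar distance weights and hole handling are illustrative assumptions.

```python
import numpy as np

def blend_views(color1, color2, valid1, valid2, dist1, dist2):
    """Blend two projected camera views into the virtual view.

    color1, color2 : (H, W, 3) projected colors from the two nearest cameras.
    valid1, valid2 : (H, W) masks of pixels each camera actually covers.
    dist1, dist2   : scalar distances from the virtual viewpoint to each camera.
    The closer camera gets a higher weight; pixels covered by only one camera use it alone.
    """
    w1 = dist2 / (dist1 + dist2)  # proximity weight for camera 1
    w2 = dist1 / (dist1 + dist2)
    weight1 = w1 * valid1.astype(float)
    weight2 = w2 * valid2.astype(float)
    total = weight1 + weight2
    total[total == 0] = 1.0  # avoid division by zero in holes (normalization pass)
    out = (weight1[..., None] * color1 + weight2[..., None] * color2) / total[..., None]
    return out
```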
62
DEMO: running in real time on a xxx machine. Pause, interpolate, with/without playback. Decompressed and rendered in real time. 640x480 x N frames = 300 MB.
63
“Massive Arabesque” video clip
64
Future work: mesh simplification; more complicated scenes; temporal interpolation (using optical flow); wider range of virtual motion; 2D grid of cameras.
65
Summary: sparse camera configuration; high-quality depth recovery; automatic matting; new two-layer representation; inter-camera compression; real-time rendering.