Sequence-to-Sequence Alignment and Applications
Video > collection of image frames
Video = space-time volume (axes: X, Y, Time)
Sequence-to-Sequence Alignment [work with Yaron Caspi]
[Figure: Sequence 1 (Frame 1, Frame 2, Frame 3, ..., Frame n) and Sequence 2 (Frame 1, Frame 2, Frame 3, ..., Frame n)]
Align and Integrate Space-Time Info [work with Yaron Caspi]
[Figure: Video 1 and Video 2, each with Frame 1, Frame 2, Frame 3, ..., Frame n; axes x, y, t]
(a) Find temporal correspondences
(b) Find spatial correspondences
(x,y,t) -> (x’,y’,t’)
“Super Sensors”: Exceed Optical Bounds of Visual Sensors
Align and integrate space-time info to exceed limits on: spatial resolution, temporal resolution, spectral range, depth of focus, dynamic range, field-of-view (FOV), view point.
Image-to-Image Alignment
[Figure: Image 1, Image 2]
Not enough info for alignment in individual frames
Information in Video: Alignment uniquely defined
Appearance info (within frames)
Dynamic info (between frames): moving objects, non-rigid motion, varying illumination
Problem Formulation
[Equation not recoverable from the slide, followed by its “Where:” definitions]
Spatio-Temporal Alignment
SSD minimization: Gauss-Newton iterations, coarse-to-fine
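A minimal sketch of one ingredient of the SSD minimization, under strong simplifying assumptions: the only unknown is a sub-frame temporal shift dt, the spatial transformation is taken as identity, and no pyramid is used. The names warp_time and estimate_time_shift are hypothetical, and this is not the full spatio-temporal parameterization of the lecture.

```python
import numpy as np

def warp_time(seq, dt):
    """Resample seq (frames, height, width) at times t + dt, linearly in time."""
    n = seq.shape[0]
    t = np.clip(np.arange(n) + dt, 0, n - 1)
    t0 = np.floor(t).astype(int)
    t1 = np.minimum(t0 + 1, n - 1)
    w = (t - t0)[:, None, None]
    return (1 - w) * seq[t0] + w * seq[t1]

def estimate_time_shift(seq1, seq2, n_iters=10):
    """Gauss-Newton estimate of dt minimizing sum (seq1(t) - seq2(t + dt))^2."""
    dt = 0.0
    for _ in range(n_iters):
        warped = warp_time(seq2, dt)
        grad_t = np.gradient(warped, axis=0)    # temporal derivative of warped seq2
        err = seq1 - warped                     # current residual
        g, e = grad_t[2:-2], err[2:-2]          # skip frames touched by clipping (|dt| < 1)
        dt += (g * e).sum() / (g * g).sum()     # Gauss-Newton update for one parameter
    return dt

# Synthetic check: every pixel oscillates with a random phase (assumed test data).
rng = np.random.default_rng(0)
phase = rng.uniform(0, 2 * np.pi, size=(8, 8))
t = np.arange(40)[:, None, None]
true_dt = 0.4
seq2 = np.sin(0.3 * t + phase)
seq1 = np.sin(0.3 * (t + true_dt) + phase)
dt_est = estimate_time_shift(seq1, seq2)
print(round(dt_est, 2))
```

Each iteration linearizes seq2 around the current dt and solves the resulting one-parameter least-squares problem in closed form; with more parameters the scalar division would become a small linear solve.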
Coarse-to-Fine Minimization
[Figure: Sequence 1 and Sequence 2 along their time axes, each with its pyramid]
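A hedged sketch of how a pyramid of a sequence might be built for coarse-to-fine processing. It assumes plain 2x2x2 block averaging in x, y and t; the slide does not state which filter or which axes are reduced, so this choice and the names downsample2 and space_time_pyramid are illustrative only.

```python
import numpy as np

def downsample2(vol):
    """Halve a (frames, height, width) volume along all axes by 2x2x2 averaging."""
    f, h, w = (s - s % 2 for s in vol.shape)     # drop an odd trailing sample
    v = vol[:f, :h, :w].reshape(f // 2, 2, h // 2, 2, w // 2, 2)
    return v.mean(axis=(1, 3, 5))

def space_time_pyramid(vol, levels):
    """Return the volumes ordered from coarsest to finest."""
    pyramid = [vol]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid[::-1]

video = np.random.default_rng(0).random((32, 64, 48))   # synthetic stand-in sequence
pyr = space_time_pyramid(video, levels=3)
shapes = [p.shape for p in pyr]
print(shapes)
```

Alignment would start at the coarsest volume and pass its estimate down as the initial guess for the next finer level.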
[Video results, several examples: Sequence 1 and Sequence 2, shown before alignment and after alignment; one example includes illumination changes]
Increasing Space-Time Resolution in Video [work with Eli Shechtman & Yaron Caspi]
Super-resolution in space and in time.
[Figure: low-resolution input sequences (along time) -> high-resolution output sequence (along time)]
Spatial Super-Resolution
Multiple low-resolution input images -> high-resolution output image
Recover small details
What is Super-Resolution in Time?
Recover dynamic events that are “faster” than frame-rate (generate a “high-speed” camera).
Application areas: sports events, scientific imaging, etc.
Effects of “fast” events imaged by “slow” cameras:
(1) Motion aliasing
(2) Motion blur
(1) Motion Aliasing
The “wagon wheel” effect.
[Figure: continuous signal over time; the same signal sub-sampled in time; the resulting “slow motion”]
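The wagon-wheel effect can be illustrated with the standard aliasing formula. This sketch assumes a wheel with a single distinguishable spoke, so that its rotation rate is a single frequency; the function name apparent_frequency is hypothetical.

```python
def apparent_frequency(true_hz, frame_rate):
    """Rotation rate (Hz) perceived after sampling at frame_rate.

    The true rate is folded into [-frame_rate/2, frame_rate/2);
    a negative result means the wheel appears to turn backward.
    """
    half = frame_rate / 2.0
    return (true_hz + half) % frame_rate - half

slow_backward = apparent_frequency(24.0, 25.0)   # just below the frame rate
frozen = apparent_frequency(25.0, 25.0)          # exactly the frame rate
slow_forward = apparent_frequency(26.0, 25.0)    # just above the frame rate
print(slow_backward, frozen, slow_forward)
```

A wheel turning slightly slower than the frame rate therefore looks like a slow backward rotation, which no amount of slow-motion playback of the same frames can undo.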
(2) Motion Blur
Space-Time Super-Resolution
[Figure: low-resolution input sequences (x, y, t) and the high-resolution space-time volume S_h(x_h, y_h, t_h)]
Blur kernel: PSF (in space) and exposure time (in time)
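A toy sketch of the temporal part of this imaging model, for a single pixel. All specifics are assumptions for illustration: a box-shaped exposure, cameras whose sub-frame offsets are known exactly, cyclic time, no noise and no regularization. Each low-resolution sample is the average of the high-resolution signal over one exposure, and stacking the samples from several offset cameras gives a linear system in the unknown high-resolution signal.

```python
import numpy as np

# High-res frame count, frame-rate ratio, exposure length in high-res frames (all assumed).
n_high, factor, exposure = 9, 3, 2

def camera_rows(offset):
    """Rows of the imaging matrix for one camera starting at high-res frame `offset`."""
    rows = []
    for start in range(offset, n_high, factor):
        row = np.zeros(n_high)
        for k in range(exposure):
            row[(start + k) % n_high] = 1.0 / exposure   # box blur over the exposure time
        rows.append(row)
    return np.array(rows)

rng = np.random.default_rng(0)
x_true = rng.random(n_high)                     # unknown high-res temporal signal
A = np.vstack([camera_rows(off) for off in range(factor)])
y = A @ x_true                                  # all low-res measurements, stacked
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)   # recover the high-res signal
single_rank = np.linalg.matrix_rank(camera_rows(0))
print(np.allclose(x_hat, x_true), single_rank)
```

One camera alone supplies only n_high / factor equations, so the high-resolution signal is recoverable here only because the offset cameras sample different sub-frame times.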
Super Resolution in Time
Input 1, Input 2, Input 3, Input 4 (25 frames/sec)
[Video: input sequence in slow motion]
[Video: output sequence, super-resolved (75 frames/sec)]
Motion Blur
Motion Blur
Simulated sequences of a “fast” event: very long exposure-time, very low frame-rate
[Figure: overlay of frames for one low-res sequence, another low-res sequence, and another one]
Deblurring
Input: 3 out of 18 low-resolution input sequences (frame overlays)
Output: output sequence (x15 frame-rate); output trajectory (overlay of frames)
Without estimating motion of the ball!
Motion-Blur
4 input sequences: Video 1, Video 2, Video 3, Video 4
[Figure: input (low-res) frames at collision; output (high-res) frame at collision]
Optical Limits of Visual Sensors: spatial resolution, temporal resolution, spectral range, depth of focus, dynamic range, field-of-view (FOV), view point.
Very little common visual information!!!
Alignment of Non-Overlapping Sequences [work with Yaron Caspi]
Coherent appearance (Image-to-Image Alignment)
Coherent camera behavior
Coherent scene dynamics (Seq-to-Seq Alignment)
Sequence-to-Sequence Alignment: alignment in time and in space
When is it possible?
[Figure: the scene]
1) same center of projection
2) cameras fixed relative to each other
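Problem formulation (H = ?)
[Figure: Sequence 1 and Sequence 2, with H marked between them]
Input: [expression lost]  Output: [expressions lost], such that [equation lost]
Conjugate matrices have the same eigenvalues: [equation lost]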
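The remark about conjugate matrices can be spelled out as follows. This assumes the intended relation is conjugation by the unknown H, written with T and S for corresponding transformations in the two sequences; that pairing of symbols is inferred from the slide text, not copied from a surviving equation.

```latex
S = H\,T\,H^{-1}
\;\Longrightarrow\;
\det(S - \lambda I)
  = \det\!\bigl(H\,(T - \lambda I)\,H^{-1}\bigr)
  = \det(H)\,\det(T - \lambda I)\,\det(H)^{-1}
  = \det(T - \lambda I)
```

T and S then share a characteristic polynomial and hence their eigenvalues. If the relation only holds up to a scale factor s, as is usual for homogeneous matrices, S = s H T H^{-1} and every eigenvalue of S is s times an eigenvalue of T.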
Recovering Temporal Alignment (temporal shift = ?)
T and S have the same eigenvalues, up to scale: [equation lost]
Search for the temporal shift which minimizes: [expression lost]
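A hedged sketch of such a search on synthetic data. The cost used here, the distance between sorted and normalized eigenvalue magnitudes, is an assumed stand-in for the slide's lost expression. The shift is restricted to integers, and the names eig_signature and find_temporal_shift are hypothetical.

```python
import numpy as np

def eig_signature(m):
    """Sorted eigenvalue magnitudes, normalized to unit length (removes scale)."""
    mags = np.sort(np.abs(np.linalg.eigvals(m)))
    return mags / np.linalg.norm(mags)

def find_temporal_shift(Ts, Ss, max_shift):
    """Integer shift d that best matches the eigenvalues of Ts[i + d] and Ss[i]."""
    best_shift, best_cost = None, np.inf
    for d in range(-max_shift, max_shift + 1):
        pairs = [(i + d, i) for i in range(len(Ss)) if 0 <= i + d < len(Ts)]
        cost = np.mean([np.linalg.norm(eig_signature(Ts[a]) - eig_signature(Ss[b]))
                        for a, b in pairs])
        if cost < best_cost:
            best_shift, best_cost = d, cost
    return best_shift

# Synthetic data: random near-identity 3x3 transformations for sequence 1.
rng = np.random.default_rng(1)
H = rng.normal(size=(3, 3))
Ts = [np.eye(3) + 0.3 * rng.normal(size=(3, 3)) for _ in range(30)]
true_shift = 4
# Sequence 2 sees transformation i + true_shift, conjugated by H, at an arbitrary scale.
Ss = [rng.uniform(0.5, 2.0) * H @ Ts[i + true_shift] @ np.linalg.inv(H)
      for i in range(20)]
shift = find_temporal_shift(Ts, Ss, max_shift=8)
print(shift)
```

Using magnitudes discards the phase of complex eigenvalues, which keeps the sketch short but is weaker than comparing the eigenvalues themselves.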
Recovering Spatial Transformation
Given [expression lost]: solve a homogeneous set of linear equations in H
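One way such a homogeneous system can be set up, sketched under assumptions: the temporally matched pairs satisfy S_i H = H T_i exactly, with any per-pair scale already removed, and H is a 3x3 matrix. Each pair then contributes nine linear equations in the nine entries of H, and the solution is the null vector of the stacked system. The function name solve_H is hypothetical.

```python
import numpy as np

def solve_H(Ts, Ss):
    """Least-squares H (up to scale) with S_i H - H T_i = 0 for all pairs."""
    I = np.eye(3)
    # Row-major vec: vec(S H) = kron(S, I) vec(H) and vec(H T) = kron(I, T.T) vec(H).
    M = np.vstack([np.kron(S, I) - np.kron(I, T.T) for T, S in zip(Ts, Ss)])
    _, _, vt = np.linalg.svd(M)
    return vt[-1].reshape(3, 3)       # right singular vector of the smallest singular value

# Synthetic check with a random H and five transformation pairs.
rng = np.random.default_rng(2)
H_true = rng.normal(size=(3, 3))
Ts = [np.eye(3) + 0.3 * rng.normal(size=(3, 3)) for _ in range(5)]
Ss = [H_true @ T @ np.linalg.inv(H_true) for T in Ts]
H_est = solve_H(Ts, Ss)
# H is only defined up to scale and sign, so compare directions.
cosine = abs(np.sum(H_est * H_true)) / (np.linalg.norm(H_est) * np.linalg.norm(H_true))
print(round(cosine, 6))
```

A single pair is not enough, because every matrix that commutes with T_i yields another solution; several pairs with different T_i narrow the null space to one direction.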
Exceed Limited FOV
[Video: Sequence 1, Sequence 2, and the combined sequence]
Exceed Limited Field of View: Wide-Screen Movies
[Video: Sequence 1, Sequence 2, and the wide-screen movie]
Exceed Limited Spectral Range: Day and Night Vision
[Video: visible light (video), infra-red, and the fused sequence]
Exceed Limited Focal Length
[Video: zoomed-out and zoomed-in sequences, two examples]
Summary
Forget image frames: Video = space-time volume >> collection of images
Use all available spatio-temporal info for analysis, representation, and exploitation.
Applies to many problem areas:
1. Quick search in video.
2. Alignment and integration of information to exceed optical bounds of visual sensors.
3. Action analysis and recognition.
4. Synthesis of video data.
and many more…
A few comments and clarifications regarding Exercise 4: ON THE BOARD (please ask a friend if you were not in class)