Kernel-based tracking and video patch replacement
Igor Guskov
Overview
  Research areas
  –Geometry processing
  Compression of geometry
  Feature-based matching
  –Template matching in video
Projects
  Geometry processing
  –Semi-regular remeshing
  Parameterization
  Add structure to meshes
  Do wavelet compression
  –Dynamic mesh compression
  Soft-body animations
  Extract
  Do wavelet compression
Projects
  Matching
  –3D matching
  Automatic scan alignment
  Shape recognition
  –Tracking non-rigid geometry in video
  For geometry reconstruction
  –Real-time reconstruction
  For video editing and surveillance
Approximate Surface Alignment
  Approximate alignment
  Find approximate alignment automatically
  Registration: ICP
  Optimal alignment
  Joint work with Xinju Li
Video tracking
  Feature tracking
  –Classical approach: Lucas-Kanade tracker
  Based on mean-square error minimization
  We want to track larger patches
Tracking features
  Point features
  –Given point-to-point correspondences
  Can do reconstruction of 3D geometry, many other things
  Linear features
  –Track stick figures: limbs
  Reconstruct articulated characters
  –Recognize activity
  –Silhouettes
  Patch features
  –Active appearance models (AAMs)
  Geometry + texture + appearance
  –Face tracking
  –Video editing: monet from Imagineer Systems
Error-based tracking
  Mean-square error
  Image $I(x)$
  Template $T(y)$
  Warp map $x = W[z](y)$
  –For instance:
  $W[p](y) = y + p$ (small patch translated around)
  $W[(\sigma, t)](y) = \sigma y + t$ (translation + uniform scaling)
  $W[h](y) = h(y)$ (homography $h$)
  Minimize over the warp parameters: $\min_p \| I(W[p](y)) - T(y) \|^2$
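The translation case of the minimization above can be sketched in a few lines. This is a minimal illustration, not the Lucas-Kanade tracker itself: it swaps the gradient-based update for an exhaustive search over integer shifts, and the function name `ssd_translation_search` and its parameters are hypothetical.

```python
import numpy as np

def ssd_translation_search(image, template, top_left, radius):
    """Search integer shifts p minimising sum_y (I(y + p) - T(y))^2.

    `top_left` is the (row, col) where the template is expected;
    shifts within +/- `radius` pixels of it are tried exhaustively.
    """
    th, tw = template.shape
    r0, c0 = top_left
    best_p, best_err = (0, 0), np.inf
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            # Skip shifts that would push the patch outside the image.
            if r < 0 or c < 0 or r + th > image.shape[0] or c + tw > image.shape[1]:
                continue
            diff = image[r:r + th, c:c + tw].astype(float) - template
            err = float(np.sum(diff ** 2))
            if err < best_err:
                best_p, best_err = (dr, dc), err
    return best_p, best_err
```

The cost of this search grows with the patch area times the number of candidate shifts, which is one reason gradient-based or histogram-based schemes are attractive for larger patches.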
Quad-marked surface tracking
  Collection of quads
  SCA 2003
  –Real-time tracking and reconstruction
  –Four cameras
Mean-shift tracking
  Formulate tracking as mean-shift problem
  –Comaniciu, Ramesh, Meer CVPR 2000
  Replace a pixel by the distribution of color values in a neighborhood
  –Histogram
  –Best match of a histogram
  Robust to noisy data
  Very fast algorithm
Histogram matching
  Bhattacharyya coefficient $\rho(p,q)$
  –Given two distributions $p(z)$ and $q(z)$
  –Related to bounds on the probability of classification error between these two distributions
  $P(\text{error}) \le \rho(p,q)$
  –For matching, we want P(error)=1
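For reference, the standard definition of the Bhattacharyya coefficient, in continuous form and in the discrete form used for histograms. The bin index $u$ and bin count $m$ are notation introduced here, not taken from the slides.

```latex
\rho(p, q) = \int \sqrt{p(z)\, q(z)}\, dz,
\qquad
\rho(p, q) = \sum_{u=1}^{m} \sqrt{p_u\, q_u}
\quad \text{(histograms with } m \text{ bins)}
```

The coefficient lies in $[0, 1]$ and equals 1 exactly when the two distributions coincide, so a better histogram match means a larger $\rho$.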
Distance between distributions
  Metric space of histograms
  Not that important in the original paper
  Implement as a simple sum
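A minimal sketch of the "simple sum", assuming the distance $d = \sqrt{1 - \rho}$ used in the mean-shift tracking paper. The function names are hypothetical, and inputs are normalized defensively so raw bin counts can be passed in.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """rho(p, q) = sum_u sqrt(p_u * q_u) for two histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise so each histogram sums to one
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """d(p, q) = sqrt(1 - rho(p, q)); zero when the histograms coincide."""
    # Clamp at zero to absorb tiny negative values from rounding.
    return float(np.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q))))
```

Identical histograms give a distance of 0 (up to rounding) and histograms with disjoint support give 1.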
Where is mean-shift?
  The way the histograms are computed
  –Weighted histograms
  Pixels at the blob center contribute more
  Setting the gradient of the Bhattacharyya coefficient to zero, one gets
  –Each pixel contributes its opinion on how relevant it is to be the center of the blob
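One mean-shift step in the spirit of Comaniciu, Ramesh and Meer can be sketched as follows. The sketch assumes an Epanechnikov kernel profile (so the derivative weights are constant inside the window) and pixels already quantized into bin indices; the function names, the `(row, col)` coordinate layout and the target histogram `q` are assumptions made for illustration.

```python
import numpy as np

def weighted_histogram(bins, coords, center, h, n_bins):
    """Kernel-weighted histogram: pixels near `center` contribute more."""
    d2 = np.sum((coords - center) ** 2, axis=1) / h ** 2
    k = np.maximum(0.0, 1.0 - d2)  # Epanechnikov profile, zero outside radius h
    hist = np.bincount(bins, weights=k, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def mean_shift_step(bins, coords, center, h, q, n_bins):
    """Move `center` to the weighted mean of pixel positions in the window."""
    q = np.asarray(q, dtype=float)
    p = weighted_histogram(bins, coords, center, h, n_bins)
    # Each pixel votes with sqrt(q_u / p_u), where u is the pixel's bin.
    ratio = np.sqrt(np.divide(q, p, out=np.zeros_like(q), where=p > 0))
    inside = np.sum((coords - center) ** 2, axis=1) <= h ** 2
    w = ratio[bins] * inside
    if w.sum() == 0:
        return center  # no support in the window; stay put
    return (coords * w[:, None]).sum(axis=0) / w.sum()
```

Pixels whose bin is over-represented in the target histogram relative to the current window get large votes, which pulls the center toward the region that matches the target.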
Mean-shift clustering
  Comaniciu, Meer PAMI 2002
  –Kernel density estimation
  –Sum of bumps of width h
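The "sum of bumps" view can be illustrated with a small mode-seeking routine. This sketch assumes Gaussian bumps of width `h`; the function name `mean_shift_mode` and the stopping parameters are hypothetical.

```python
import numpy as np

def mean_shift_mode(points, start, h, n_iter=100, tol=1e-6):
    """Climb the kernel density estimate from `start` to a nearby mode."""
    points = np.asarray(points, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        # Gaussian weight of every sample as seen from the current estimate.
        w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / h ** 2)
        new_x = (points * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

Running the routine from every sample and grouping the samples by the mode they reach gives the clustering; the bandwidth `h` controls how many modes survive.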
Extensions
  Previous work
  –Translation + scale [Collins 03]
  –Particle-tracking [Perez et al 02]
  –Multiple collaborating trackers [Hager et al 04]
  Template alignment
  –More general warps
  –Warp is the key
  Translation does not really warp
  Need to account for that properly
Templates I
  Multiple blobs tracked together
  –Each has its own histogram $p_k[t]$
  Easy to do by considering the sum of squared distances
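One plausible way to write the joint objective for several blobs, assuming the distance $d = \sqrt{1 - \rho}$ from the mean-shift tracking paper. The reference histograms $q_k$ and the name $E$ are notation introduced here; the slide's own formula is not preserved in this transcript.

```latex
E(t) = \sum_k d^2\big(p_k[t],\, q_k\big)
     = \sum_k \Big(1 - \rho\big(p_k[t],\, q_k\big)\Big),
\qquad
d(p, q) = \sqrt{1 - \rho(p, q)}
```

Under this form, minimizing $E$ is the same as maximizing the sum of the per-blob Bhattacharyya coefficients.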
Templates II: warp
  Where is that weighted histogram coming from?
  –Random variable X
  Displacement from the blob’s center
  –Histogram bin $p_a$
  –With translation
  –General warp
Triangles
  Affine warps
  –Six parameters
  –Cannot account for perspective distortion
  Okay for weak perspective
  Multiple triangles needed
  –Relations among the collection of triangles
  Multiscale
A formula
  Histogram bin value
  All the pixels y in the image which fall into bin a
  Warp the pixel position back into canonical space and take its probability density
  Jacobian of the inverse warp
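The formula itself did not survive in this transcript. A hedged reconstruction from the four annotations above, in which $b(\cdot)$ (the map from a color to its bin index) and $f$ (the probability density of the displacement $X$ in canonical space) are symbols assumed here:

```latex
p_a[z] \;=\; \sum_{y \,:\, b(I(y)) = a}
  f\big(W[z]^{-1}(y)\big)\,
  \left| \det \frac{\partial\, W[z]^{-1}}{\partial y}(y) \right|
```

For a pure translation the Jacobian determinant is 1 and the expression reduces to the usual kernel-weighted histogram; for a general warp the determinant accounts for the change in area.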
Simple illumination model
  Cannot rely on colors being constant
  –Illumination changes
  Outdoors: clouds etc.
  –Shadowing
  –Cameras set on automatic exposure
  Always collect relative colors
  –Average illumination locally $L(x)$
  –Histogram of $I(W(X)) - L(W(X))$
  This requires some texture to be present
  Roll-ball video
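A minimal sketch of the relative-color idea, assuming that $L(x)$ is a box-filter average over a square window. The window shape, the replicate padding at the borders and the function names are assumptions, not details given in the talk.

```python
import numpy as np

def local_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, with edge pixels replicated."""
    k = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return window_sum / (k * k)

def relative_colors(img, radius):
    """I(x) - L(x): unchanged by a uniform additive brightness shift."""
    return img.astype(float) - local_mean(img, radius)
```

A constant image maps to all zeros under this transform, which illustrates why some texture has to be present for the histogram to carry information.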
Optimization
  Bhattacharyya coefficient
  Take the gradient w.r.t. z
  –Explicit formula
  Feed to the optimization library
Implementation
  YUV video
  –Histogram in two channels out of three
  –Y is luminance
  Higher resolution
  –UV is color
  Histograms 16x16 bins
  Templates have 120 blobs (16*15/2)
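A 16x16 two-channel histogram can be sketched as below. The slide does not say which two channels are used; this sketch assumes the U and V planes with 8-bit values in [0, 255], and the function name `uv_histogram` is hypothetical.

```python
import numpy as np

def uv_histogram(u, v, n_bins=16):
    """Joint n_bins x n_bins histogram of two 8-bit channels, normalised."""
    hist, _, _ = np.histogram2d(
        np.asarray(u).ravel(), np.asarray(v).ravel(),
        bins=n_bins, range=[[0, 256], [0, 256]],
    )
    total = hist.sum()
    return hist / total if total > 0 else hist
```

With 16 bins per channel, each bin covers 16 consecutive intensity levels.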
Results
  Videos
  –About one second per frame
  Extend to masked template
Video augmentation
  Previous work
  –Bartoli & Zisserman 2004
  RBF estimation & grid
  –Pilet et al
  Keypoint features
  Real-time detection
  –Lin 2005
  Near-regular textures
User input
  Masks for tracking and replacement
  Tracking of the templates
  –Warping of the replacement grid
  Poisson edit on the replacement region
Warping the grid
  Blend affine transformations
  –Warping of the replacement grid
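Blending affine transformations over a grid can be sketched as follows. The talk does not specify the blending weights; this sketch assumes normalized inverse-squared-distance weights to one anchor point per transformation, and every name in it is hypothetical.

```python
import numpy as np

def blend_affine_warp(grid, anchors, mats, offsets, eps=1e-9):
    """Warp grid points by a distance-weighted blend of affine maps.

    grid:    (N, 2) points to warp
    anchors: (K, 2) one reference point per affine map (assumed, e.g. blob centers)
    mats:    (K, 2, 2) linear parts A_k
    offsets: (K, 2) translations b_k
    """
    grid = np.asarray(grid, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    mats = np.asarray(mats, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    d2 = np.sum((grid[:, None, :] - anchors[None, :, :]) ** 2, axis=2)  # (N, K)
    w = 1.0 / (d2 + eps)
    w /= w.sum(axis=1, keepdims=True)  # weights sum to one per grid point
    # Apply every affine map to every grid point: result has shape (N, K, 2).
    per_map = np.einsum("kij,nj->nki", mats, grid) + offsets[None, :, :]
    return np.sum(w[:, :, None] * per_map, axis=1)
```

Because the weights sum to one, a grid point lying on an anchor follows that anchor's transformation almost exactly, and points in between move smoothly from one transformation to the next.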
Masks and grids
Replacement image
  Select replace
  Poisson edit
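Poisson editing solves for pixel values inside the mask whose Laplacian matches that of the replacement image while agreeing with the surrounding frame on the mask boundary. A minimal single-channel sketch, assuming a 4-neighbor Laplacian and plain Jacobi iterations chosen for brevity; this is an illustration and not the implementation used in the talk, and `poisson_blend` is a hypothetical name.

```python
import numpy as np

def poisson_blend(target, source, mask, n_iter=2000):
    """Solve lap(f) = lap(source) inside `mask`, with f = target elsewhere."""
    f = target.astype(float).copy()
    src = source.astype(float)
    # 4-neighbour Laplacian of the source at interior pixels.
    lap = np.zeros_like(src)
    lap[1:-1, 1:-1] = (src[:-2, 1:-1] + src[2:, 1:-1] +
                       src[1:-1, :-2] + src[1:-1, 2:] - 4.0 * src[1:-1, 1:-1])
    inner = mask.astype(bool).copy()
    # Pixels on the image border stay fixed; they lack four neighbours.
    inner[0, :] = inner[-1, :] = inner[:, 0] = inner[:, -1] = False
    for _ in range(n_iter):
        nb = np.zeros_like(f)
        nb[1:-1, 1:-1] = f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
        # Jacobi update: every masked pixel is recomputed from the old values.
        f[inner] = (nb[inner] - lap[inner]) / 4.0
    return f
```

Jacobi iteration converges slowly on large masks; a sparse direct solver or a multigrid scheme would be the usual choice for full video frames.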
Motion blur
  Necessary for visual quality
  Smear the replacement region
  –Perform Poisson gradient fitting in a larger region
Results
  Videos
Conclusions
  Basic tracking procedure
  –Imperfect match
  –Non-rigid patches
  –Large areas
  Replacement in videos
  –Simple user input
  –Warping and Poisson edit
Coming up…
  Better tracking of noisy videos
  –Multiple primitives
  Multilevel within each primitive
  –Editing pipeline
  –Texture replacement