Tracking Hands with Distance Transforms
Dave Bargeron, Noah Snavely
The problem
Input: A video with a (rigid) hand
Output: A sequence of hand locations and orientations
Approach
1. Generate hand templates for all possible orientations
2. Find the edges in each input image
3. For each edge image, find the template and location that minimize the chamfer distance
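The slides do not say which edge detector is used in step 2. As a minimal sketch, assuming a grayscale image stored as a list of rows and a plain Sobel gradient threshold (both are assumptions, and `sobel_edges` and `threshold` are hypothetical names), an edge image could be produced like this:

```python
def sobel_edges(image, threshold):
    """Return a binary edge map (1 = edge) from a grayscale image (list of rows)."""
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    # Border pixels are left at 0 because the 3x3 Sobel window does not fit there.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges
```

Any detector that yields a binary edge map would serve the same role in the pipeline.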
Step 1: Generate Templates
1. Create 3D hand model
2. Render in a set of orientations
3. Use depth buffer to find silhouette and contours
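One way to read item 3 is sketched below. It assumes the depth buffer is a list of rows with background pixels at a `far` value, that silhouette pixels are foreground pixels touching the background or the image border, and that interior contours are depth jumps larger than `jump`. The function name, both parameters, and their defaults are hypothetical.

```python
def template_edges(depth, far=1.0, jump=0.1):
    """Return (silhouette, contour) lists of (x, y) points from a depth buffer."""
    h, w = len(depth), len(depth[0])
    silhouette, contour = [], []
    for y in range(h):
        for x in range(w):
            if depth[y][x] >= far:
                continue  # background pixel
            on_silhouette = False
            on_contour = False
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or depth[ny][nx] >= far:
                    on_silhouette = True  # foreground next to background
                elif depth[ny][nx] - depth[y][x] > jump:
                    on_contour = True     # this pixel occludes a farther neighbour
            if on_silhouette:
                silhouette.append((x, y))
            elif on_contour:
                contour.append((x, y))
    return silhouette, contour
```

The two point lists together would form the template that is later matched against the edge image.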
Steps 2 + 3: Find the hand
1. Compute the distance transform of the edge image
2. Slide each template over the distance transform, compute the chamfer distance
3. Pick the template with the minimum chamfer distance
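The three steps above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes a binary edge image as a list of rows, uses a two-pass city-block distance transform (a Euclidean transform could be substituted), and takes the chamfer distance to be the mean distance-transform value under the template points. All function names are hypothetical.

```python
INF = float("inf")

def distance_transform(edges):
    """City-block distance from every pixel to the nearest edge pixel (two passes)."""
    h, w = len(edges), len(edges[0])
    dt = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    for y in range(h):                    # forward pass: top-left to bottom-right
        for x in range(w):
            if y > 0:
                dt[y][x] = min(dt[y][x], dt[y - 1][x] + 1)
            if x > 0:
                dt[y][x] = min(dt[y][x], dt[y][x - 1] + 1)
    for y in range(h - 1, -1, -1):        # backward pass: bottom-right to top-left
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dt[y][x] = min(dt[y][x], dt[y + 1][x] + 1)
            if x < w - 1:
                dt[y][x] = min(dt[y][x], dt[y][x + 1] + 1)
    return dt

def chamfer_distance(dt, template, ox, oy):
    """Mean distance-transform value under the template points shifted by (ox, oy)."""
    return sum(dt[oy + ty][ox + tx] for tx, ty in template) / len(template)

def best_match(dt, templates):
    """Slide every template over dt; return (distance, name, ox, oy) of the minimum."""
    h, w = len(dt), len(dt[0])
    best = (INF, None, 0, 0)
    for name, points in templates.items():
        tw = max(tx for tx, _ in points) + 1   # template width in pixels
        th = max(ty for _, ty in points) + 1   # template height in pixels
        for oy in range(h - th + 1):
            for ox in range(w - tw + 1):
                d = chamfer_distance(dt, points, ox, oy)
                if d < best[0]:
                    best = (d, name, ox, oy)
    return best
```

Because the distance transform is computed once per frame, scoring a template at one location costs only one lookup per template point.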
Problems
Large number of templates
–(3211 rotations) x (3072 translations) x (5 scales) = 49,320,960 templates
In a cluttered image:
–Chamfer distance has many local optima
–Global optimum may not be correct
Solving each frame separately is not a good idea
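The template count is a straightforward product of the three factors quoted on this slide and can be checked directly (variable names are illustrative):

```python
rotations, translations, scales = 3211, 3072, 5

# Every combination of rotation, translation and scale is a separate template.
total_templates = rotations * translations * scales
print(f"{total_templates:,}")  # 49,320,960
```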
Solution Part 1: Template Tree
Coarse-to-fine search in parameter space
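The slide gives no details of the tree itself, so the following is only a sketch of the general coarse-to-fine idea under simplifying assumptions: a single scalar parameter (for example one rotation angle), a hypothetical `cost` function standing in for the chamfer distance of the template at that parameter, and a fixed number of surviving cells per level. The names `coarse_to_fine`, `levels`, `branch`, and `keep` are illustrative.

```python
def coarse_to_fine(cost, lo, hi, levels=6, branch=4, keep=2):
    """Search [lo, hi] for a low-cost parameter by refining only the best cells."""
    cells = [(lo, hi)]
    best = None
    for _ in range(levels):
        scored = []
        for a, b in cells:
            step = (b - a) / branch
            for i in range(branch):
                c_lo, c_hi = a + i * step, a + (i + 1) * step
                centre = (c_lo + c_hi) / 2   # one template represents the whole cell
                scored.append((cost(centre), centre, c_lo, c_hi))
        scored.sort()
        best = scored[0]
        # Only the `keep` best cells are subdivided at the next level.
        cells = [(c_lo, c_hi) for _, _, c_lo, c_hi in scored[:keep]]
    return best[1], best[0]   # (parameter, cost)
```

With the defaults above, the search evaluates at most 44 candidates, whereas an exhaustive scan at the finest resolution would evaluate 4**6 = 4096. The price is that a cell containing the true optimum can be discarded early if its coarse representative scores poorly.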
Solution Part 2: Tracking
Detect the hand in frame 0
For each frame k > 0:
–Compute the most likely transition from the state in frame k-1
  Use chamfer distance as a likelihood
  Use transition probability (assumed Gaussian) as a prior
–Use transition probabilities to prune branches of the search tree
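A minimal sketch of this per-frame update follows. It assumes scalar states, a likelihood of the form exp(-lam * chamfer distance), and a Gaussian transition prior with standard deviation `sigma`; the exponential likelihood, the `gate` threshold used for pruning, and all names are assumptions made for illustration.

```python
def track(chamfer_per_frame, states, sigma=2.0, lam=1.0, gate=3.0):
    """Return the chosen state for each frame.

    chamfer_per_frame[k][i] is the chamfer distance of states[i] in frame k.
    """
    # Frame 0: plain detection, since no previous state is available.
    first = chamfer_per_frame[0]
    current = states[min(range(len(states)), key=lambda i: first[i])]
    path = [current]
    for distances in chamfer_per_frame[1:]:
        best_score, best_state = float("inf"), current
        for s, d in zip(states, distances):
            if abs(s - current) > gate * sigma:
                continue  # prune: transition too unlikely under the Gaussian prior
            # Negative log of likelihood times prior, up to an additive constant.
            score = lam * d + (s - current) ** 2 / (2 * sigma ** 2)
            if score < best_score:
                best_score, best_state = score, s
        current = best_state
        path.append(current)
    return path
```

In this sketch a distractor with a lower chamfer distance is ignored when it lies far from the previous state, which is how the prior addresses the clutter problem.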
Results – Hand Detection
Panels: Input, Edge image, Distance transform, Output
Results – Video
Panels: Edge images, Distance transforms
Extensions
Better tracking
–Use color in addition to shape
–Use edge orientations
–More templates; allow for on-line generation for refinement
More flexible tracking
–Track deformable hand
–Automatically determine hand parameters (e.g. finger length)
References
Björn Stenger’s Ph.D. thesis
–B. Stenger et al. “Filtering Using a Tree-Based Estimator.” ICCV
Pedro Felzenszwalb and Dan Huttenlocher. “Distance Transforms of Sampled Functions.”