Face Recognition and Tracking for Human-Robot Interaction using PeCoFiSH
Alex Eisner
Mentor: Dr. Andrew Fagg
This material is based upon work supported by the National Science Foundation under Grant No. IIS/REU/
Overview
– Mobile manipulator: navigation, grasping
– Human-robot interaction: intuitive cooperation; body language, hand gestures
– Face tracking: engage humans, ignore non-humans
Approach
– OpenCV implementation of Haar cascades finds faces in each frame (sketch below)
– Haar-like features (Haar wavelets) pick out signals of different frequencies, much like an FFT
– Locate patterns of intensity gradients while preserving spatial information
– Inexpensive to convolve over an image
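A minimal sketch of the per-frame detection step, assuming OpenCV's stock frontal-face cascade. The cv2.data.haarcascades path and the scaleFactor/minNeighbors values are illustrative assumptions from modern opencv-python builds, not the original implementation:

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade (path via cv2.data is an
# assumption; it exists in current opencv-python, not the 2008-era C API).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return (x, y, w, h) bounding boxes for raw face candidates in one frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
```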
Haar Features
Image sources: morph.cs.st-andrews.ac.uk, twoday.tuwien.ac.at
Approach
Haar cascades: highly effective, but noisy
High false-positive rate, caused by:
– Noise in the video stream
– "Face-like" objects in the environment
PeCoFiSH: Persistence and Color Filter Selection for Haar cascades
– Prefer objects that persist between frames
– Prefer objects that are skin-colored
Persistence Filter
– Store a fixed number of past frames
– Identify the locations of face candidates in each frame
– Locations that are candidates in at least N frames are considered faces (sketch below)
Image: Salvador Dalí
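One way to realize this idea, as a hedged sketch: the window size K, threshold N, and frame shape below are illustrative parameters, and the rasterize-and-count scheme is an assumption about how "same location" is judged:

```python
from collections import deque
import numpy as np

class PersistenceFilter:
    """Keep only detections that recur across recent frames. Window size K
    and persistence threshold N are illustrative, not the paper's values."""
    def __init__(self, k=10, n=5, shape=(480, 640)):
        self.history = deque(maxlen=k)
        self.n = n
        self.shape = shape

    def update(self, boxes):
        # Rasterize this frame's candidate boxes into a binary mask.
        mask = np.zeros(self.shape, dtype=np.uint8)
        for (x, y, w, h) in boxes:
            mask[y:y + h, x:x + w] = 1
        self.history.append(mask)
        # A pixel is persistent if it fell inside a candidate in >= N frames.
        return (np.stack(self.history).sum(axis=0) >= self.n).astype(np.uint8)
```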
Persistence Filter
Successfully filters out noise
– But not face-like objects in the environment
For those we turn to color filtering
Image source: optcorp.com
Color Filtering
– Construct a color model
– For a given image, assign each pixel a likelihood that it is part of a face
– A statistical model built from Gaussian mixture components
Color Filtering: Overview
Offline:
– Sample images captured with the Biclops
– Variety of skin tones, lighting conditions, etc.
– User tags locations in images containing skin
– Color values at each tagged pixel are used to create the color model
– Color model is passed to PeCoFiSH
Color Filtering: Overview
Online (PeCoFiSH):
– Mixture model gives the relative likelihood that each pixel is skin
– Pixels above a certain threshold are kept
– Series of dilate/erode operations (sketch below):
– Remove clusters which are very thin
– Connect clusters which are very close to each other
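A sketch of the dilate/erode cleanup using OpenCV morphology; the kernel shape and size, and the use of closing before opening, are assumptions about how the two stated goals are achieved:

```python
import cv2
import numpy as np

# Illustrative dilate/erode pass: closing bridges clusters that are very
# close together, opening removes clusters that are very thin.
KERNEL = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def clean_skin_mask(likelihood, threshold=0.5):
    """Threshold the per-pixel skin likelihood, then clean the binary mask."""
    mask = (likelihood > threshold).astype(np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, KERNEL)  # connect near clusters
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, KERNEL)   # drop thin clusters
    return mask
```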
Color Filtering: Implementation
– User-selected skin pixels represented in YCbCr space: luma (brightness), blue-difference, red-difference
– 3-dimensional Gaussian mixture model created from N Gaussians
– EM algorithm finds the parameters of the Gaussian mixture model (sketch below)
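A sketch of the EM fit using scikit-learn's GaussianMixture; the original likely used its own EM implementation, and the component count of 3 is an assumption standing in for N:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_skin_model(skin_pixels, n_components=3):
    """Fit a 3-D Gaussian mixture to user-tagged (Y, Cb, Cr) samples via EM.
    skin_pixels: (M, 3) array; the component count here is illustrative."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(skin_pixels)

def skin_likelihood(gmm, image_ycbcr):
    """Relative per-pixel likelihood of skin under the fitted mixture."""
    h, w, _ = image_ycbcr.shape
    logp = gmm.score_samples(image_ycbcr.reshape(-1, 3).astype(np.float64))
    rel = np.exp(logp - logp.max())   # rescale so the best pixel scores 1.0
    return rel.reshape(h, w)
```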
Pixel Samples in YCbCr Space
Combined Filters: PeCoFiSH
– Threshold applied to the persistence filter to give binary windows of persistent faces
– Pixel areas which match both the color and persistence filters are considered candidates
– Largest set of connected pixels is found, then a minimum-size threshold is applied (sketch below)
– The largest face is probably the closest face
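A sketch of the combination step; the min_area value is an illustrative threshold, and connectedComponentsWithStats is a modern OpenCV convenience standing in for whatever connected-components pass the original used:

```python
import cv2
import numpy as np

def pick_face(persist_mask, skin_mask, min_area=400):
    """Intersect the two binary masks, keep the largest connected region,
    and apply a minimum-size threshold (min_area is illustrative)."""
    combined = cv2.bitwise_and(persist_mask, skin_mask)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(combined)
    if n <= 1:
        return None                               # background only: no candidate
    areas = stats[1:, cv2.CC_STAT_AREA]           # skip label 0 (background)
    best = 1 + int(np.argmax(areas))
    if stats[best, cv2.CC_STAT_AREA] < min_area:
        return None
    x, y, w, h, _ = stats[best]
    return (x, y, w, h)                           # largest region ~ closest face
```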
Analysis
– True Positive (TP): an existing face was tagged
– False Positive (FP): a location with no face was tagged
– False Negative (FN): a location with a face was not tagged
Analysis
Accuracy (also called recall)
– How frequently the existing faces were tagged
– Accuracy = TP / ( TP + FN )
Precision
– How frequently the tagged locations were faces
– Precision = TP / ( TP + FP )
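The two formulas, directly as defined above:

```python
def accuracy_precision(tp, fp, fn):
    """Metrics as defined on this slide; 'accuracy' here is what is
    commonly called recall."""
    accuracy = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision
```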
Video: alex240-boxes
Analysis: Simple Case
Video: alexwalk2-raw
Video: alexwalk2-filter
Video: alexwalk2-boxes
Analysis: Complex Case
Analysis
The really cool part: throughout all these tests, at no time did PeCoFiSH select a major face location that was not a face!
The robot won't try to talk to the hat rack.
Performance
– By applying a scale factor, real-time processing is obtainable without sacrificing accuracy (sketch below)
– At 640x480: 2 FPS
– At 320x240: 10 FPS
[Chart: frames per second vs. scale factor]
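A sketch of the scale-factor trick: detect on a downscaled frame and map the boxes back to full resolution. The scale value and the reuse of the earlier detect_faces sketch are assumptions:

```python
import cv2

def detect_scaled(frame, scale=0.5):
    """Detect on a downscaled frame, then rescale boxes to full resolution
    (scale=0.5 turns 640x480 into 320x240, trading resolution for speed)."""
    small = cv2.resize(frame, None, fx=scale, fy=scale)
    return [(int(x / scale), int(y / scale), int(w / scale), int(h / scale))
            for (x, y, w, h) in detect_faces(small)]  # detect_faces: earlier sketch
```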
Current Work
– Saccadic face tracking with the Biclops
– Human-robot cooperation
Image source: tangledwing.wordpress.com
Questions?
This material is based upon work supported by the National Science Foundation under Grant No. IIS/REU/