On-the-fly Specific Person Retrieval University of Oxford 24 th May 2012 Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman
Overview People “Barack Obama” “George Bush” “Courtney Cox” Ranked ShotsTextual Queries Search for: On-the-fly i.e. with no previous knowledge or model for these queries Large collection of un-annotated videos Large collection of un-annotated videos
Scrubs Data Set 12 Episodes from Seasons 1-5 and 8 5 hours of video data About 400k frames, partitioned into 5k shots About 300k near frontal face detections 768 x 576 MPEG2 format
Demo Search for “Courteney Cox” in Scrubs dataset. Steps: 1.Download example images from Google 1.Train a ranking function 1.Apply ranking function to video collection
DEMO
Demo- Scrubs Data Set
On the fly person retrieval system Text Query “Courteney Cox” Negative Training Images Video Collection Face Tracks Facial Features & Descriptors Facial Features & Descriptors Results ON-LINE PROCESSING OFF-LINE PROCESSING Google Image Search “Courteney Cox” Facial Features & Descriptors Fast Linear Classifier Ranking
Detection and Tracking Viola-Jones face detection on each frame Tracking measures “ connectedness ” of a pair of faces by point tracks intersecting both Doesn ’ t require contiguous detections No drift Faces clustered into tracks [Everingham et al. 2006, Apostoloff & Zisserman, 2007]
Scrubs Data Set 12 Episodes from Seasons 1-5 and 8 5 hours of video data About 400k frames, partitioned into 5k shots 300k face detections 6k face tracks 768 x 576 MPEG2 format
Detecting facial feature points Pictorial structure model Joint model of feature appearance and position [Felzenszwalb and Huttenlocher’2004, Everingham et al ]
Face Appearance Representation Affine transformation of face to canonical frame Independent photometric normalization of parts Represent gradients over circle centred on facial feature points Feature descriptor is a 3849 dimensional vector [ Everingham et al ]
Negative Training Images Combination of faces from Random downloaded images Labeled Faces in the Wild dataset Caltech Faces dataset About 16k face detections. Caltech 10, 000 Web Faces: Labeled Faces in the Wild:
On-the-fly Person Retrieval Text Query “Courteney Cox” Negative Training Images Video Collection Face Tracks Facial Features & Descriptors Facial Features & Descriptors Results ON-LINE PROCESSING OFF-LINE PROCESSING Google Image Search “Courteney Cox” Facial Features & Descriptors Fast Linear Classifier Ranking
DEMO
Demo- Scrubs Data Set
TRECVid 2011 (IACC.1.B) About 200 hours of video data. 8k videos. MPEG4, 320x240 pixels 130k shots, About 3 million face detections 25,535 face tracks.
DEMO
Demo - TRECVid 2011 (IACC.1.B)
Facial attributes – FaceTracer project Examples: gender: male, female age: baby, child, youth, middle age, senior race: white, black, asian smiling, mustache, eye-wear, hair colour N. Kumar, P. N. Belhumeur and S. K. Nayar, FaceTracer: A Search Engine for Large Collections of Images with Faces, European Conference on Computer Vision (ECCV), Method person independent training set with attribute facial feature representation discriminative training of classifier for attribute
DEMO
Facial attributes – Glasses
DEMO Facial attributes – Beard
Facial attributes – Eyes Closed
Quantitative Performance - Scrubs Dataset Performance evaluation for 3 guest actors (Brendan Fraser, Courteney Cox and Michael J Fox) 12 dataset videos split into training and test sets (3 Training, 9 Testing) Annotations: Manual labeling of training and test set for each actor Manual labeling of positive training images from Google Negative training images from Caltech Faces dataset.
Quantitative Performance - Scrubs Dataset Retrieval Average Precision (AP) Training Examples SourceAverage Precision Positive Negative Brendan Fraser Courteney Cox Michael J Fox Scrubs GoogleScrubs GoogleCaltech
Quantitative Performance - Scrubs Dataset Using more training data per track Training Examples Source # samples per track Average Precision Positive Negative Brendan Fraser Courteney Cox Michael J Fox Scrubs Single Scrubs Multiple
Future Work Exploring sources for positive examples Better feature representations Combination of attributes and identities
Any Questions?