Perceptual Annotation: Measuring Human Vision to Improve Computer Vision
Walter J. Scheirer, Samuel E. Anthony, Ken Nakayama & David D. Cox
IEEE Transactions on Pattern Analysis and Machine Intelligence (2014), 36(8), 1679-1686
Presented by: Talia Retter
Introduction: Human performance vs. computer vision
“For many classes of problems, the goal of computer vision is to solve visual challenges for which human observers have effortless expertise…”
Further, the problems themselves are defined by human perception: the goal of computer vision is not to uncover the “ground truths” of an image, but to analyze it in a way that corresponds to human vision, i.e., that is functionally useful for us.
A case study in face detection
Face detection: deciding whether or not an image contains a face
Performance: measured in accuracy and speed (Fig. 7)
Human performance > computer vision
Especially in challenging views/environments (“in the wild”)
The inspiration of human vision
Past: computers learned from simple binary labels (“face” or “no face”)
Present: enrich computer learning (support vector machines) with “perceptual annotation” (guidance from the human “learnability” of each face)
Visual psychophysics for perceptual annotation
Steps 1 & 2) Two experiments:
“Face in the branches”
Only 10-30% of the face visible
3-alternative forced choice: “which of 3 images presented together contains a face?” (450 or 900 ms presentation)
102 trials per subject (~1,000 or 2,000 face images)
>3,000 subjects in ~7 weeks via the TestMyBrain website
“Fast face finder”
Images from the AFLW dataset
50 ms presentation: face or non-face?
204 trials per subject, 1/3 containing faces (~4,000 different face images)
>400 subjects in ~2 weeks
Measures: accuracy and response time
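Turning these trials into per-image behavioral measures is a simple aggregation: for every image, pool the trials in which it appeared and compute accuracy and a typical response time. A minimal sketch follows; the function name and the (image_id, correct, rt_ms) trial layout are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict
from statistics import median

def annotate(trials):
    """Aggregate forced-choice trials into per-image human performance.

    trials: iterable of (image_id, correct: bool, rt_ms) tuples.
    Returns {image_id: (accuracy, median RT over correct trials, or None)}.
    """
    hits = defaultdict(list)   # 1/0 outcomes per image
    rts = defaultdict(list)    # RTs on correct trials only
    for img, correct, rt in trials:
        hits[img].append(1 if correct else 0)
        if correct:
            rts[img].append(rt)
    return {img: (sum(v) / len(v), median(rts[img]) if rts[img] else None)
            for img, v in hits.items()}
```

These per-image accuracy/RT pairs are the raw material that the human-weighted loss function later consumes.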
Perceptual annotation for SVMs
Step 3) Train an SVM classifier to detect faces
A (non-convex) human-weighted loss function defines the cost of misclassifying each perceptually annotated image
Leads to fewer support vectors than a hinge-defined loss function
[Chart: number of support vectors, hinge vs. human-weighted loss]
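The core idea of step 3 can be illustrated with a plain subgradient-descent linear SVM whose hinge loss is scaled by a per-example human weight (derived, say, from accuracy or RT). This is a minimal convex stand-in for the paper's non-convex human-weighted loss, and every name below is an illustrative assumption, not the authors' implementation.

```python
def train_weighted_svm(X, y, human_w, lr=0.1, lam=0.01, epochs=200):
    """Linear SVM trained with a human-weighted hinge loss.

    X: list of feature vectors; y: labels in {-1, +1};
    human_w: per-example weights from perceptual annotation,
    scaling how costly it is to misclassify each example.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t, wh in zip(X, y, human_w):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # inside margin: weighted hinge subgradient step
                w = [wi - lr * (lam * wi - wh * t * xi) for wi, xi in zip(w, x)]
                b += lr * wh * t
            else:             # outside margin: L2 regularization only
                w = [wi * (1.0 - lr * lam) for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

Setting all weights to 1.0 recovers the standard hinge loss, which makes the comparison in the slide above easy to reproduce on toy data.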
Augment the face detector with the annotation
Steps 4 & 5) Improve the classifier with a two-stage detector:
Stage 1: filter candidate windows using Haar features (sliding window over varying spatial scales)
Stage 2: filter the survivors with the perceptually annotated SVM
Output: detection predictions
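The two-stage pipeline above is a cascade: a cheap first-stage filter rejects most candidate windows, and only the survivors are scored by the more expensive SVM. The sketch below captures that control flow; the scoring callables and thresholds are hypothetical stand-ins for the Haar-feature filter and the trained perceptually annotated SVM.

```python
def cascade_detect(windows, haar_score, svm_score, t1, t2):
    """Two-stage face detector sketch.

    windows: candidate image windows (any representation);
    haar_score / svm_score: callables returning a real-valued score
    (stand-ins for the Haar filter and the perceptually annotated SVM);
    t1 / t2: per-stage acceptance thresholds.
    """
    # Stage 1: cheap filter prunes the vast majority of windows
    survivors = [w for w in windows if haar_score(w) > t1]
    # Stage 2: the expensive SVM runs only on the survivors
    return [w for w in survivors if svm_score(w) > t2]
```

The cascade's speed comes from ordering: almost all windows never reach stage 2, so the SVM's cost is paid only on plausible candidates.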
Results (1/3) (new dataset: FDDB faces)
The human-weighted loss function outperforms the hinge loss at face detection (across stimulus sets, feature definitions, and both behavioral accuracy and RT weightings)
Results (2/3)
Perceptually annotated classifier with biologically-defined features outperforms all others
Results (3/3)
Perceptually annotated classifier with biologically-defined features outperforms all others
Conclusions
Human perceptual annotation is informative for machine learning (SVM classification)
Could be applied with neurophysiological human data
Could also be applied with other classifier techniques (e.g., neural networks)
Interplay between computer science and human perception: but might there be cases in which computers should perform unlike humans in order to perform better? (e.g., incorporating infrared imaging; Pavlidis & Symosek, 2000; Bebis et al., 2006)