Face Alignment at 3000 FPS via Regressing Local Binary Features

Presentation transcript:

Face Alignment at 3000 FPS via Regressing Local Binary Features
Shaoqing Ren, Xudong Cao, Yichen Wei, and Jian Sun
Visual Computing Group, Microsoft Research Asia

What is Face Alignment?
Find the face shape $S$, i.e. the locations of $L$ semantic facial points: $S = (x_1, y_1, \ldots, x_L, y_L)$.
Crucial for: recognition, modeling, tracking, animation, editing.

Challenges
Accuracy: must be robust to complex variations in occlusion, pose, lighting, and expression.
Speed: critical for phones, tablets, and system APIs.

Traditional Approaches
Active Shape Model (ASM): detects points from local features; sensitive to noise. [Cootes et al. 1992] [Milborrow et al. 2008] …
Active Appearance Model (AAM): sensitive to initialization; fragile to appearance change. [Cootes et al. 1998] [Matthews et al. 2004] ...
Regression-based: [Saragih et al. 2007] (AAM) [Sauer et al. 2011] (AAM) [Cristinacce et al. 2007] (ASM)

Cascade Shape Regression Framework
[Figure: the shape estimate at stages t = 0, 3, 5, refined by stage regressors R^1 … R^5]
$S^t = S^{t-1} + R^t(I, S^{t-1})$ (Cascaded pose regression, Dollár et al., CVPR 2010)
The regressor $R^t(I, S^{t-1})$ is learned to minimize the shape residual on the training data:
$R^t = \arg\min_R \sum_i \| \Delta S_i - R(I_i, S_i^{t-1}) \|_2^2$, where $\Delta S_i = S_i - S_i^{t-1}$ is the ground-truth shape residual.
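The update rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. A shape is a flat list $[x_1, y_1, \ldots, x_L, y_L]$. Each stage regressor is any callable that maps an image and the current shape to a shape increment. The names `run_cascade` and `toy_stage` are hypothetical. The toy stage moves the shape halfway toward a known target so that the additive refinement is visible; a learned $R^t$ would predict that increment from image features instead.

```python
from typing import Callable, List, Sequence

# A shape is a flat list [x1, y1, ..., xL, yL]; the image type is left abstract.
Shape = List[float]
StageRegressor = Callable[[object, Shape], Shape]


def run_cascade(image: object, initial_shape: Shape,
                stages: Sequence[StageRegressor]) -> Shape:
    """Apply S^t = S^{t-1} + R^t(I, S^{t-1}) for t = 1..T."""
    shape = list(initial_shape)
    for regress in stages:
        increment = regress(image, shape)  # R^t(I, S^{t-1})
        shape = [s + d for s, d in zip(shape, increment)]
    return shape


def toy_stage(target: Shape, rate: float) -> StageRegressor:
    """Hypothetical stage that moves the shape a fraction `rate` toward `target`.

    A learned regressor would estimate this increment from the image instead.
    """
    def regress(image: object, shape: Shape) -> Shape:
        return [rate * (t - s) for t, s in zip(target, shape)]
    return regress


if __name__ == "__main__":
    truth = [10.0, 20.0, 30.0, 20.0]  # two landmarks
    start = [0.0, 0.0, 0.0, 0.0]      # e.g. a mean shape placed in the face box
    result = run_cascade(None, start, [toy_stage(truth, 0.5)] * 5)
    print(result)  # each stage halves the remaining error
```

With five half-step stages, the remaining error shrinks to 1/32 of the initial error. This mirrors how each stage of the cascade only needs to correct what the previous stages left over.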

Analysis of Previous Methods
Explicit Shape Regression (ESR), Cao et al., CVPR 2012, and Robust Cascaded Pose Regression (RCPR), Burgos-Artizzu et al., ICCV 2013: boosted regression trees (local optimization, a weakness) on pixel-difference features (fast and learned from data, but too weak for this hard problem).
Supervised Descent Method (SDM), Xiong and De la Torre, CVPR 2013: linear regression (global optimization, a strength) on SIFT features extracted at the landmarks (slow and hand-crafted).

Overview of Our Approach
Tree-induced local binary features: learned from data, with global optimization; much stronger than previous regression trees; efficient training and testing.
Best accuracy on challenging benchmarks.
3,000 FPS on a desktop, or 300 FPS on mobile: the first face-tracking method on mobile.

Tracking in Real-World Videos
Demo video: https://www.youtube.com/watch?v=TOVFOYrXdIQ
Face tracking = per-frame alignment + classification

Our Approach
A simple form: each stage regressor is the sum of a large number of regression trees,
$R^t(I, S^{t-1}) = \sum_{k=1}^{K} \mathrm{reg\_tree}_k(I, S^{t-1})$.
Novel two-step learning:
1. Local learning of tree structure: learn an easier task and better features.
2. Global optimization of tree output: enforce dependence between points and reduce local estimation errors.
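The additive form $R^t = \sum_k \mathrm{reg\_tree}_k$ can be illustrated with a minimal, hypothetical tree representation in Python. Each split node thresholds one precomputed feature, and each leaf stores a shape increment. The `Node` class and the function names are assumptions made for illustration. The real trees are learned from data and split on pixel-difference features.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical regression-tree node: a split on one feature, or a leaf output."""
    feature: int = -1                     # feature index tested at a split node
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    output: Optional[List[float]] = None  # shape increment stored at a leaf


def tree_predict(node: Node, features: Sequence[float]) -> List[float]:
    """Walk from the root to a leaf and return that leaf's stored increment."""
    while node.output is None:
        node = node.left if features[node.feature] < node.threshold else node.right
    return node.output


def stage_regress(trees: Sequence[Node], features: Sequence[float],
                  dim: int) -> List[float]:
    """R^t(I, S^{t-1}) = sum over k of reg_tree_k(I, S^{t-1})."""
    total = [0.0] * dim
    for tree in trees:
        for j, value in enumerate(tree_predict(tree, features)):
            total[j] += value
    return total


if __name__ == "__main__":
    t1 = Node(feature=0, threshold=0.0,
              left=Node(output=[1.0, 0.0]), right=Node(output=[0.0, 1.0]))
    t2 = Node(feature=1, threshold=5.0,
              left=Node(output=[0.5, 0.5]), right=Node(output=[-1.0, 2.0]))
    print(stage_regress([t1, t2], [-3.0, 10.0], dim=2))
```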

Local Learning of Tree Structure
[Figure: estimated shape S^t and ground-truth shape S; one random forest per point, whose target is that point's offset]
Learn a standard random forest for each landmark independently: standard regression trees on pixel-difference features, using only pixels in the local patch around the point. Restricting pixels to the local patch acts as a regularization of feature selection.
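The following pure-Python sketch makes "pixel-difference features restricted to a local patch" concrete. It is not the paper's implementation, and it rests on several assumptions. The image is a plain 2-D list of grayscale intensities. Offsets are sampled uniformly in a square of half-width `radius` around the current landmark estimate, and the paper's exact sampling scheme and coordinate normalization may differ. All helper names are hypothetical. Each feature is $I(p + u) - I(p + v)$ for an offset pair $(u, v)$ around landmark $p$. A per-landmark random forest would then be trained on these candidates.

```python
import random
from typing import List, Tuple

Offset = Tuple[int, int]
Image = List[List[int]]  # image[row][col], grayscale intensities


def sample_offset_pairs(radius: int, count: int,
                        rng: random.Random) -> List[Tuple[Offset, Offset]]:
    """Sample candidate (u, v) offset pairs inside a square local patch."""
    def one() -> Offset:
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))
    return [(one(), one()) for _ in range(count)]


def pixel_at(image: Image, x: float, y: float) -> int:
    """Nearest-pixel lookup, clamped to the image border."""
    rows, cols = len(image), len(image[0])
    r = min(max(int(round(y)), 0), rows - 1)
    c = min(max(int(round(x)), 0), cols - 1)
    return image[r][c]


def pixel_difference_features(image: Image, landmark: Tuple[float, float],
                              pairs: List[Tuple[Offset, Offset]]) -> List[int]:
    """Shape-indexed features I(p + u) - I(p + v) around one landmark p = (x, y)."""
    px, py = landmark
    return [pixel_at(image, px + ux, py + uy) - pixel_at(image, px + vx, py + vy)
            for (ux, uy), (vx, vy) in pairs]


if __name__ == "__main__":
    ramp = [[col for col in range(20)] for _ in range(20)]  # intensity = column
    pairs = sample_offset_pairs(radius=3, count=4, rng=random.Random(0))
    print(pixel_difference_features(ramp, (10.0, 10.0), pairs))
```

Under the adaptive-region idea, a smaller `radius` would be passed at later cascade stages.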

Adaptive Local Region Size
Shrink the local region size from stage to stage during cascade regression learning.

From Local to Global
[Figure: estimated shape S^t, ground-truth shape S, and the per-point random forests]
Fix the tree structures and optimize the outputs stored in the tree leaves.

Global Optimization of Tree Output
[Figure: estimated shape S^t, ground-truth shape S, the feature mapping function, and the regression target]

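Global Optimization of Tree Output (continued)
Each leaf's output changes from a point offset to a face-shape increment: $(\Delta x_1, \Delta y_1) \to \Delta S$, $(\Delta x_5, \Delta y_5) \to \Delta S$.
Optimize all leaves simultaneously by minimizing
$\sum_i \| \Delta S_i - R^t(I_i, S_i^{t-1}) \|_2^2$.
The residual is linear in $R^t$. With the tree structures fixed, $R^t(I_i, S_i^{t-1}) = \sum_{k=1}^{K} \mathrm{reg\_tree}_k(I_i, S_i^{t-1})$ is in turn linear in the unknown leaf outputs.
This is simply linear regression, with a globally optimal solution!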
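Once the tree structures are fixed, the global step reduces to an ordinary least-squares problem over a binary design matrix. The NumPy sketch below makes two assumptions. First, the leaf indicators are stacked into a dense matrix `phi`, with one row per training sample and exactly one active leaf per tree. Second, the residuals $\Delta S_i$ are stacked into `delta_s`. `fit_global_leaf_outputs` is a hypothetical name. The sketch follows the slide's plain linear regression. A practical solver might exploit sparsity or add regularization, and this sketch does neither.

```python
import numpy as np


def fit_global_leaf_outputs(phi: np.ndarray, delta_s: np.ndarray) -> np.ndarray:
    """Solve W = argmin_W sum_i ||delta_S_i - phi_i @ W||^2 for all leaves jointly.

    phi:     (n_samples, n_leaves) binary matrix; phi[i, j] = 1 if sample i
             reaches leaf j (one active leaf per tree).
    delta_s: (n_samples, 2L) ground-truth residuals S_i - S_i^{t-1}.
    Returns W of shape (n_leaves, 2L): every leaf stores a whole-face increment.
    """
    # lstsq returns the minimum-norm solution when phi is rank-deficient,
    # which happens here because each tree's leaf indicators sum to one.
    w, *_ = np.linalg.lstsq(phi, delta_s, rcond=None)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_trees, leaves_per_tree, shape_dim = 200, 3, 4, 6
    # Hypothetical leaf assignments: one reached leaf per tree per sample.
    leaf_idx = rng.integers(0, leaves_per_tree, size=(n_samples, n_trees))
    phi = np.zeros((n_samples, n_trees * leaves_per_tree))
    for k in range(n_trees):
        phi[np.arange(n_samples), k * leaves_per_tree + leaf_idx[:, k]] = 1.0
    w_true = rng.normal(size=(n_trees * leaves_per_tree, shape_dim))
    delta_s = phi @ w_true  # synthetic residuals that the model can fit exactly
    w = fit_global_leaf_outputs(phi, delta_s)
    print(np.abs(phi @ w - delta_s).max())  # near 0: delta_s lies in phi's column space
```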

Tree-Induced Binary Features
Each leaf is a binary indicator function: 1 if the image sample arrives at the leaf, 0 otherwise.
Trees → high-dimensional, sparse binary features.
Efficient training using a linear SVM.
Efficient testing: add up the outputs of N leaves, one reached in each tree (N: the number of trees, usually a few hundred).
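The binary-feature view and the "add N leaves" test-time shortcut give the same answer. The pure-Python sketch below assumes that every tree has the same number of leaves and that leaves are indexed from 0 within each tree. It also assumes the learned outputs are stored as a matrix `w` with one row per leaf and one column per shape coordinate. All names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # w[leaf][coordinate]


def binary_features(leaf_indices: Sequence[int], leaves_per_tree: int) -> List[int]:
    """Concatenate one-hot leaf indicators: 1 for the leaf reached in each tree."""
    phi = [0] * (len(leaf_indices) * leaves_per_tree)
    for k, leaf in enumerate(leaf_indices):
        phi[k * leaves_per_tree + leaf] = 1
    return phi


def predict_dense(phi: Sequence[int], w: Matrix) -> List[float]:
    """Naive product phi^T W: touches every row of W, active or not."""
    out = [0.0] * len(w[0])
    for d, bit in enumerate(phi):
        for j, value in enumerate(w[d]):
            out[j] += bit * value
    return out


def predict_sparse(leaf_indices: Sequence[int], leaves_per_tree: int,
                   w: Matrix) -> List[float]:
    """Same result, adding only the N active rows (one per tree)."""
    out = [0.0] * len(w[0])
    for k, leaf in enumerate(leaf_indices):
        for j, value in enumerate(w[k * leaves_per_tree + leaf]):
            out[j] += value
    return out


if __name__ == "__main__":
    w = [[float(d * 10 + j) for j in range(4)] for d in range(6)]  # 2 trees x 3 leaves
    leaves = [2, 0]  # tree 0 reached leaf 2, tree 1 reached leaf 0
    print(predict_sparse(leaves, 3, w) == predict_dense(binary_features(leaves, 3), w))
```

The sparse version costs N row additions rather than a pass over every leaf. This is why testing stays cheap even with hundreds of deep trees.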

Experiments
Benchmark   #landmarks   #training images   #testing images
LFPW        29           717                249
Helen       194          2000               330
300-W       68           3149               689

Two variants of our method:
Accurate: LBF, 1200 trees of depth 7.
Fast: LBF fast, 300 trees of depth 5.

Comparison with Other Methods
Cascade shape regression methods: Explicit Shape Regression (ESR) [2], Robust Cascaded Pose Regression (RCPR) [3], Supervised Descent Method (SDM) [4].
Other methods: exemplar-based methods [1, 5]; AAM- or ASM-based methods [6, 7].
[1] P. N. Belhumeur, D. W. Jacobs, D. J. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. CVPR 2011.
[2] X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression. CVPR 2012.
[3] X. P. Burgos-Artizzu, P. Perona, and P. Dollar. Robust face landmark estimation under occlusion. ICCV 2013.
[4] X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. CVPR 2013.
[5] F. Zhou, J. Brandt, and Z. Lin. Exemplar-based graph matching for robust facial landmark localization. ICCV 2013.
[6] S. Milborrow and F. Nicolls. Locating facial features with an extended active shape model. ECCV 2008.
[7] V. Le, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang. Interactive facial feature localization. ECCV 2012.

Results: LBF is more accurate and a few times faster
LFPW (29 landmarks)
Method      Error   FPS
[1]         3.99    ≈1
ESR [2]     3.47    220
RCPR [3]    3.50    -
SDM [4]     3.49    160
EGM [5]     3.98    <1
LBF         3.35    460
LBF fast    -       4200

Helen (194 landmarks)
Method        Error   FPS
STASM [6]     11.1    -
CompASM [7]   9.10    -
ESR [2]       5.70    70
RCPR [3]      6.50    -
SDM [4]       5.85    21
LBF           5.41    200
LBF fast      5.80    1500

300-W (68 landmarks)
Method      Full set   Common subset   Challenging subset   FPS
ESR [2]     7.58       5.28            17.00                120
SDM [4]     7.52       5.60            15.40                70
LBF         6.32       4.95            11.98                320
LBF fast    7.37       5.38            15.50                3100

LBF has the lowest error on every benchmark and is a few times (about 2 to 10 times) faster than ESR and SDM. LBF fast is about as accurate as ESR and SDM (better than both on the 300-W full set) and dozens of times (roughly 19 to 71 times) faster.
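The claims "a few times faster" and "dozens of times faster" can be checked directly from the FPS columns. The short Python sketch below uses only the FPS figures for ESR, SDM, LBF, and LBF fast, transcribed from the results slide. Those are the methods that report FPS on all three benchmarks. The `speedups` helper is a hypothetical name.

```python
from typing import Dict, Tuple

# FPS values transcribed from the results slide.
FPS: Dict[str, Dict[str, float]] = {
    "LFPW": {"ESR": 220, "SDM": 160, "LBF": 460, "LBF fast": 4200},
    "Helen": {"ESR": 70, "SDM": 21, "LBF": 200, "LBF fast": 1500},
    "300-W": {"ESR": 120, "SDM": 70, "LBF": 320, "LBF fast": 3100},
}


def speedups(table: Dict[str, Dict[str, float]], ours: str,
             baselines: Tuple[str, ...] = ("ESR", "SDM")) -> Dict[str, Dict[str, float]]:
    """Ratio of our FPS to each baseline's FPS, per benchmark."""
    return {bench: {b: row[ours] / row[b] for b in baselines}
            for bench, row in table.items()}


for variant in ("LBF", "LBF fast"):
    ratios = [r for row in speedups(FPS, variant).values() for r in row.values()]
    print(f"{variant}: {min(ratios):.1f}x to {max(ratios):.1f}x faster")
```

For LBF, the ratios range from about 2.1 (LFPW vs. ESR) to 9.5 (Helen vs. SDM). For LBF fast, they range from about 19 to 71.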

Local Learning > Global Learning
Global feature learning: uses the whole face region.
Local feature learning: uses the local patch around each point (our method).

Binary Features Are Effective
Local forest regression: uses the local random forests' outputs as features for a global linear regression.
Tree-induced binary features: our method.

Examples

Summary
State-of-the-art face alignment: best accuracy on challenging benchmarks; dozens of times faster than previous methods; faster-than-real-time face tracking on mobile.
Thank you! You are welcome to try our live demo.