Face Alignment by Explicit Shape Regression


Face Alignment by Explicit Shape Regression
Xudong Cao, Yichen Wei, Fang Wen, Jian Sun
Visual Computing Group, Microsoft Research Asia

Problem: face shape estimation
Find semantic facial points $S = \{(x_i, y_i)\}$.
Crucial for: recognition, modeling, tracking, animation, editing.

Desirable properties
Robust: to complex appearance (occlusion, pose, lighting, expression) and to rough initialization.
Accurate: error $\|\hat{S} - S\|$, where $\hat{S}$ is the ground truth shape.
Efficient: training in minutes, testing in milliseconds.

Previous approaches
Active Shape Model (ASM): detects points from local features; sensitive to noise. [Cootes et al. 1992] [Milborrow et al. 2008] …
Active Appearance Model (AAM): sensitive to initialization; fragile to appearance change. [Cootes et al. 1998] [Matthews et al. 2004] …
All use a parametric (PCA) shape model.

Previous approaches: cont.
Boosted regression for face alignment: predicts model parameters; fast. [Saragih et al. 2007] (AAM), [Sauer et al. 2011] (AAM), [Cristinacce et al. 2007] (ASM)
Cascaded pose regression [Dollar et al. 2010]: pose-indexed features; also uses a parametric pose model.

The parametric shape model is dominant, but it has drawbacks
Parameter error ≠ alignment error: minimizing parameter error is suboptimal.
Hard to specify model capacity: usually heuristic and fixed (e.g., PCA dimension), and not flexible for an iterative alignment (strict initially? flexible finally?).

Can we discard a parametric model?
Directly estimate shape $S$ by regression? Yes.
Overcome the challenges (high-dimensional output, highly non-linear, large variations in facial appearance, large training data and feature space)? Yes.
Still preserve the shape constraint? Yes.

Our approach: Explicit Shape Regression
Directly estimate shape $S$ by regression? Yes: a boosted (cascade) regression framework that minimizes $\|\hat{S} - S\|$ from coarse to fine.
Overcome the challenges? Yes: a two-level cascade for better convergence, efficient and effective features, and fast correlation-based feature selection.
Still preserve the shape constraint? Yes: an automatic and adaptive shape constraint.

Approach overview
Stages t = 0, t = 1, t = 2, …, t = 10, …; the shape is initialized from a face detector. (Diagram labels: affine transform; transform back.)
Regressor $R^t$ updates the previous shape $S^{t-1}$ incrementally: $S^t = S^{t-1} + R^t(I, S^{t-1})$, where $I$ is the image.
$R^t = \arg\min_R \|\Delta\hat{S} - R(I, S^{t-1})\|$, over all training examples.
$\Delta\hat{S} = \hat{S} - S^{t-1}$: ground truth shape residual.
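The incremental update on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `run_cascade` and the toy `halfway` regressor are hypothetical names, shapes are assumed to be NumPy arrays of (x, y) landmarks, and the affine normalization step shown in the diagram is left out.

```python
import numpy as np

def run_cascade(image, initial_shape, regressors):
    """Apply S^t = S^{t-1} + R^t(I, S^{t-1}) for each stage regressor in order."""
    shape = np.asarray(initial_shape, dtype=float).copy()
    for regressor in regressors:
        # Each stage predicts a shape increment from the image and the current shape.
        shape = shape + regressor(image, shape)
    return shape

# Toy stand-in for a learned regressor (assumption for illustration only):
# it moves the current shape halfway towards a fixed target shape.
target = np.array([[10.0, 20.0], [30.0, 40.0]])

def halfway(image, shape):
    return 0.5 * (target - shape)

final = run_cascade(image=None, initial_shape=np.zeros((2, 2)),
                    regressors=[halfway] * 10)
```

With ten such stages the remaining distance to the target shrinks by a factor of 2 per stage, which mirrors the coarse-to-fine behaviour the slide describes.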

Regressor learning
Cascade: $S^0 \to S^1 \to \dots \to S^{t-1} \to S^t \to \dots \to S^{T-1} \to S^T$, with regressors $R^1, \dots, R^t, \dots, R^T$.
What's the structure of $R^t$? What are the features? How to select features?

Two level cascade
× Top-level cascade $S^0 \to S^1 \to \dots \to S^T$ where each $R^t$ is a simple regressor, e.g., a decision tree: too weak $R^t$ → slow convergence and poor generalization.
Two level cascade: each $R^t$ is itself a cascade $r_1, \dots, r_k, \dots, r_K$ taking $S^{t-1}$ to $S^t$: stronger $R^t$ → rapid convergence.

Trade-off between two levels
#stages in top level      5000    100     10      5
#stages in bottom level   1       50      500     1000
error (×10^-2)            5.2     4.5     3.3     6.2
With the fixed number (5,000) of regressors $r_k$.

Pixel difference feature
Powerful on large training data: [Ozuysal et al. 2010], key point recognition; [Dollar et al. 2010], object pose estimation; [Shotton et al. 2011], body part recognition; …
Extremely fast to compute: no need to warp the image, just transform pixel coordinates.
Examples: $I_{\text{left eye}} \approx I_{\text{right eye}}$, $I_{\text{mouth}} \gg I_{\text{nose tip}}$.

How to index pixels?
× Global coordinate $(x, y)$ in the (normalized) image: sensitive to personal variations in face shape.

Shape indexed pixels
√ Relative to current shape: $(\Delta x, \Delta y, \text{nearest point})$. More robust to personal geometry variations.
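A shape-indexed pixel can be sketched as "nearest landmark plus a fixed offset", so the sampling location follows the current shape estimate. The function names and array layout below are assumptions for illustration, and the sketch skips the transform between normalized and image coordinates that a full implementation would need.

```python
import numpy as np

def shape_indexed_intensities(image, shape, landmark_ids, offsets):
    """Sample the image at landmark + offset for each (landmark id, offset) pair.

    image: (H, W) grayscale array; shape: (L, 2) landmarks as (x, y);
    landmark_ids: (P,) indices into shape; offsets: (P, 2) as (dx, dy).
    """
    coords = shape[landmark_ids] + offsets
    cols = np.clip(np.rint(coords[:, 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(coords[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols].astype(float)

def pixel_difference(intensities, m, n):
    """Feature value: intensity of sampled pixel m minus that of pixel n."""
    return intensities[m] - intensities[n]

# Toy image whose value encodes its position: image[row, col] = 10*row + col.
image = np.arange(100).reshape(10, 10)
shape = np.array([[2.0, 3.0], [6.0, 7.0]])
landmark_ids = np.array([0, 1])
offsets = np.array([[1.0, 0.0], [-1.0, 1.0]])

vals = shape_indexed_intensities(image, shape, landmark_ids, offsets)
feature = pixel_difference(vals, 0, 1)

# Moving the shape by one pixel moves every sampling location with it.
shifted = shape_indexed_intensities(image, shape + 1.0, landmark_ids, offsets)
```

Because only coordinates are transformed, no image warping is needed, which is the speed argument made on the pixel difference slide.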

Tree based regressor $r_k$
Node split function: $\text{feature} > \text{threshold}$, e.g., $I_{x_1} - I_{y_1} > t_1$?, then $I_{x_2} - I_{y_2} > t_2$?
Select $(\text{feature}, \text{threshold})$ to maximize the variance reduction after the split.
Leaf output: $\Delta S_{\text{leaf}} = \arg\min_{\Delta S} \sum_{i \in \text{leaf}} \|\hat{S}_i - (S_i + \Delta S)\| = \frac{\sum_{i \in \text{leaf}} (\hat{S}_i - S_i)}{\text{leaf size}}$
$\hat{S}_i$: ground truth; $S_i$: from last step.
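The two ingredients of the tree, the leaf output and the split score, can be sketched as follows. This is a hedged illustration: `leaf_increment` and `variance_reduction` are hypothetical helper names, and the split score is written as a reduction in summed squared deviation, which is one common way to express variance reduction.

```python
import numpy as np

def leaf_increment(true_shapes, current_shapes):
    """Leaf output: mean residual (ground truth minus current) over the leaf's samples."""
    return (true_shapes - current_shapes).mean(axis=0)

def _sse(block):
    """Sum of squared deviations from the block mean (0 for an empty block)."""
    if len(block) == 0:
        return 0.0
    return float(((block - block.mean(axis=0)) ** 2).sum())

def variance_reduction(residuals, feature_values, threshold):
    """Score a split 'feature > threshold' by how much it reduces residual spread."""
    go_right = feature_values > threshold
    return _sse(residuals) - _sse(residuals[go_right]) - _sse(residuals[~go_right])

# Leaf output on two toy samples with one landmark each.
true_shapes = np.array([[[2.0, 2.0]], [[4.0, 6.0]]])
current_shapes = np.zeros_like(true_shapes)
delta = leaf_increment(true_shapes, current_shapes)

# Split scoring: the feature sign separates the two residual clusters perfectly.
residuals = np.array([[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]])
feature_values = np.array([5.0, 6.0, -5.0, -6.0])
good = variance_reduction(residuals, feature_values, threshold=0.0)
poor = variance_reduction(residuals, feature_values, threshold=5.5)
```

The threshold at 0 removes all residual spread in both children, so it scores higher than the threshold at 5.5, which isolates a single sample.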

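Non-parametric shape constraint
$\Delta S_{\text{leaf}} = \arg\min_{\Delta S} \sum_{i \in \text{leaf}} \|\hat{S}_i - (S_i + \Delta S)\| = \frac{\sum_{i \in \text{leaf}} (\hat{S}_i - S_i)}{\text{leaf size}}$
$S^{t+1} = S^t + \Delta S$, so $S^t = S^0 + \sum_i w_i \hat{S}_i$.
All shapes $S^t$ are in the linear space of all training shapes $\hat{S}_i$ if the initial shape $S^0$ is.
Unlike PCA, the constraint is learned from data: automatically, and coarse-to-fine.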
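The span claim follows by induction over stages. A short derivation, written under the assumption that every leaf increment is the mean residual of the training samples in that leaf; the notation $S_i^{t}$ for the current estimate of training sample $i$ is introduced here only for illustration:

```latex
\begin{align*}
\Delta S_{\text{leaf}}
  &= \frac{1}{|\text{leaf}|} \sum_{i \in \text{leaf}} \bigl(\hat{S}_i - S_i^{t}\bigr), \\
S_i^{t} \in \operatorname{span}\{\hat{S}_j\} \ \text{for all } i
  &\;\Longrightarrow\; \Delta S_{\text{leaf}} \in \operatorname{span}\{\hat{S}_j\}, \\
S^{t+1} = S^{t} + \Delta S_{\text{leaf}}
  &\in \operatorname{span}\{\hat{S}_j\}.
\end{align*}
```

The base case is the condition stated on the slide: the initial shape $S^0$ must itself lie in the linear space of the training shapes.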

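Learned coarse-to-fine constraint
Apply PCA (keep 95% variance) to all $\Delta S_{\text{leaf}}$ in each first level stage.
[Figure: number of principal components (#PCs) per stage; first three principal components (#1 PC, #2 PC, #3 PC) at Stage 1 and Stage 10.]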

Challenges in feature selection
Large feature pool: $N$ pixels → $N^2$ features; $N = 400$ → 160,000 features.
Random selection: poor accuracy.
Exhaustive selection: too slow.

Correlation based feature selection
A discriminative feature is also highly correlated with the regression target, and correlation computation is fast: $O(N)$ time.
For each tree node (with the samples in it):
1. Project the regression target $\Delta S$ onto a random direction.
2. Select the feature with the highest correlation to the projection.
3. Select the best threshold to minimize the variance after the split.
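The projection and selection steps can be sketched as below. This is an illustration under stated assumptions, not the authors' code: `select_pixel_pair` is a hypothetical name, features are taken to be differences of two sampled pixel intensities, and the correlation of a difference is assembled from pixel-level covariances, so only one covariance per pixel against the target is computed from the data.

```python
import numpy as np

def select_pixel_pair(pixels, residuals, rng):
    """Pick the pixel pair (m, n) whose difference correlates most with a
    random projection of the regression target.

    pixels: (samples, P) sampled intensities; residuals: (samples, D) targets.
    """
    direction = rng.standard_normal(residuals.shape[1])
    y = residuals @ direction                 # scalar target per sample
    y = y - y.mean()
    centered = pixels - pixels.mean(axis=0)
    count = len(y)

    cov_yp = centered.T @ y / count           # cov(y, pixel), one value per pixel
    cov_pp = centered.T @ centered / count    # pixel-pixel covariance
    var_p = np.diag(cov_pp)

    # cov(y, p_m - p_n) and var(p_m - p_n) for every pair, from the tables above.
    num = cov_yp[:, None] - cov_yp[None, :]
    var_diff = var_p[:, None] + var_p[None, :] - 2.0 * cov_pp
    denom = np.sqrt(np.maximum(var_diff, 1e-12) * max(y.var(), 1e-12))
    corr = num / denom
    np.fill_diagonal(corr, 0.0)               # a pixel minus itself is not a feature

    m, n = np.unravel_index(np.argmax(np.abs(corr)), corr.shape)
    return int(m), int(n), float(corr[m, n])

# Toy data: every target dimension is driven by pixel 1 minus pixel 4.
rng = np.random.default_rng(0)
pixels = rng.uniform(0.0, 255.0, size=(200, 6))
residuals = np.outer(pixels[:, 1] - pixels[:, 4], [1.0, 2.0, 3.0])
m, n, c = select_pixel_pair(pixels, residuals, rng)
```

On this toy data the selected pair is pixels 1 and 4 (in either order), with a correlation of magnitude close to 1.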

More details
Fast correlation computation: $O(N)$ instead of $O(N^2)$, where $N$ is the number of pixels.
Training data augmentation: introduces sufficient variation in initial shapes.
Multiple initialization: merge multiple results for more robustness.
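Merging results from multiple initializations might look like the sketch below. The slide does not say how results are merged, so the per-coordinate median used here is an assumption, and `align_with_multiple_inits` and `align_fn` are hypothetical names.

```python
import numpy as np

def align_with_multiple_inits(image, initial_shapes, align_fn):
    """Run the aligner from several initial shapes and merge the results.

    The merge rule (per-coordinate median) is an assumption for illustration.
    """
    results = np.stack([align_fn(image, s0) for s0 in initial_shapes])
    return np.median(results, axis=0)

# Toy aligner that returns its input unchanged, so one bad start stays bad.
initial_shapes = [np.array([[0.0, 0.0]]),
                  np.array([[1.0, 1.0]]),
                  np.array([[100.0, 100.0]])]
merged = align_with_multiple_inits(None, initial_shapes, lambda img, s: s)
```

A median ignores the single outlying run here, which is the kind of robustness the slide attributes to multiple initialization.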

Performance
#points                   5         29        87
Training (2000 images)    5 mins    10 mins   21 mins
Testing (per image)       0.32 ms   0.91 ms   2.9 ms
≈300+ FPS
Testing is extremely fast: pixel access and comparison, plus vector addition (SIMD).

Results on challenging web images
Comparison to [Belhumeur et al. 2011]: P. Belhumeur, D. Jacobs, D. Kriegman, and N. Kumar. Localizing parts of faces using a consensus of exemplars. In CVPR, 2011. 29 points, LFPW dataset; 2000 training images from the web; the same 300 testing images.
Comparison to [Liang et al. 2008]: L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In ECCV, 2008. 87 points, LFW dataset; the same training (4002) and test (1716) images.

Compare with [Belhumeur et al. 2011]
Our method is 2,000+ times faster.
[Figure: relative error reduction by our approach for each of the 29 points; point radius: mean error; legend: better by >10%, better by <10%, worse.]

Results of 29 points

Compare with [Liang et al. 2008]
87 points, many are texture-less; shape constraint is more important.
Mean error       < 5 pixels   < 7.5 pixels   < 10 pixels
Method in [2]    74.7%        93.5%          97.8%
Our Method       86.1%        95.2%          98.2%
Percentage of test images with $\text{error} < \text{threshold}$.

Results of 87 points

Summary
Challenges: heuristic and fixed shape model (e.g., PCA); large variation in face appearance/geometry; large training data and feature space.
Our techniques: non-parametric shape constraint; cascaded regression and shape indexed features; correlation based feature selection.