Face Alignment at 3000 FPS via Regressing Local Binary Features (CVPR 2014). Shaoqing Ren, Xudong Cao, Yichen Wei, Jian Sun. Presented by Sung Sil Kim.
What is face alignment?
Challenges
Traditional approaches. Active Shape Model (ASM): continuously tracks the features that form the shape of an object; sensitive to noise and to sample (person) variation. Active Appearance Model (AAM): sensitive to initialization; needs highly discriminative features (fragile to appearance change). Regression-based methods (rough estimation).
Cascade Shape Regression Framework
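In a cascade, each stage adds a shape increment predicted from the image and the shape produced by the previous stage. A minimal Python sketch, assuming shapes are lists of (x, y) points and each stage regressor is an arbitrary callable; `cascade_regress`, the generic stage symbol R^t, and the toy `halfway` regressor are hypothetical names for illustration, not the authors' code:

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = List[Point]  # one (x, y) point per landmark
# A stage regressor maps (image, current shape) to one (dx, dy) increment per landmark.
StageRegressor = Callable[[object, Shape], Shape]


def cascade_regress(image: object, initial_shape: Shape,
                    stages: Sequence[StageRegressor]) -> Shape:
    """Refine the shape stage by stage: S^t = S^{t-1} + R^t(I, S^{t-1})."""
    shape = list(initial_shape)
    for regressor in stages:
        # Each stage sees the image and the shape produced by the previous stage.
        delta = regressor(image, shape)
        shape = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, delta)]
    return shape


if __name__ == "__main__":
    target = [(10.0, 20.0), (30.0, 20.0)]  # toy "ground truth", illustration only

    def halfway(image: object, shape: Shape) -> Shape:
        # Toy stage: predicts half of the remaining offset to the target.
        return [((tx - x) / 2, (ty - y) / 2) for (x, y), (tx, ty) in zip(shape, target)]

    print(cascade_regress(None, [(0.0, 0.0), (0.0, 0.0)], [halfway] * 3))
    # [(8.75, 17.5), (26.25, 17.5)]
```

The toy regressor only shows the data flow: a learned stage would predict the increment from image features indexed by the current shape.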
Issues with cascade shape regression. 1. Practical issue: using the entire face region as training input -> extremely large feature pool -> unaffordable training cost. 2. Generalization issue: a large feature pool contains many noisy features -> causes overfitting -> hurts performance at test time.
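The practical issue follows from quadratic growth: if features are pixel differences (the feature type named in the approach overview), every unordered pair of candidate pixels is one feature. A small sketch of that count; the 100 x 100 face region and the 10-pixel local radius are illustrative numbers chosen here, not values from the paper:

```python
def pixel_difference_pool_size(num_candidate_pixels: int) -> int:
    """Number of distinct pixel-difference features (unordered pixel pairs)."""
    return num_candidate_pixels * (num_candidate_pixels - 1) // 2


def pixels_in_disc(radius: int) -> int:
    """Count integer pixel offsets (dx, dy) with dx^2 + dy^2 <= radius^2."""
    return sum(1
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               if dx * dx + dy * dy <= radius * radius)


if __name__ == "__main__":
    face_region = 100 * 100          # illustrative: every pixel of a 100 x 100 face crop
    local_disc = pixels_in_disc(10)  # illustrative: pixels within 10 px of one landmark
    print(local_disc)                                  # 317
    print(pixel_difference_pool_size(face_region))     # 49995000
    print(pixel_difference_pool_size(local_disc))      # 50086
```

Under these assumed sizes, restricting candidates to one landmark's neighborhood shrinks the pool by roughly three orders of magnitude.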
Approach overview. Tree-induced local binary features, learned from data (pixel differences, vs. SIFT on landmarks). Efficient training and testing: 3000 FPS on a desktop.
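A pixel-difference feature compares two intensities sampled at offsets relative to the current landmark estimate, so the feature moves with the shape. A minimal sketch, assuming a grayscale image stored as nested lists, nearest-pixel sampling with clamping at the border, and offsets given directly in image pixels (any coordinate normalization the paper applies is not reproduced here); `pixel_at` and `pixel_difference` are hypothetical helpers:

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities
Vec2 = Tuple[float, float]


def pixel_at(image: Image, x: float, y: float) -> int:
    """Intensity at (x, y), rounded to the nearest pixel and clamped to the image."""
    h, w = len(image), len(image[0])
    col = min(max(int(round(x)), 0), w - 1)
    row = min(max(int(round(y)), 0), h - 1)
    return image[row][col]


def pixel_difference(image: Image, landmark: Vec2, offset_a: Vec2, offset_b: Vec2) -> int:
    """Shape-indexed feature: I(landmark + offset_a) - I(landmark + offset_b)."""
    lx, ly = landmark
    a = pixel_at(image, lx + offset_a[0], ly + offset_a[1])
    b = pixel_at(image, lx + offset_b[0], ly + offset_b[1])
    return a - b


if __name__ == "__main__":
    img = [[0, 10, 20],
           [30, 40, 50],
           [60, 70, 80]]
    print(pixel_difference(img, (1, 1), (1, 0), (-1, 0)))  # 20 (50 - 30)
```

Each feature costs two memory reads and one subtraction, which is part of why the features are cheap to evaluate.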
Approach overview
Demo
The “locality” principle. 1. The most discriminative texture information lies in the local region around the landmark estimated in the previous stage. 2. The shape context (locations of the other landmarks) and the local texture of this landmark provide sufficient information. Therefore: 1) learn a forest for each landmark independently, and 2) consider only the pixel features in the local region of that landmark.
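The two consequences of the locality principle can be sketched as one independent pool of candidate offsets per landmark, drawn only from a disc around that landmark. A minimal sketch, assuming uniform sampling in a disc by rejection sampling; the function names, the radius, and the pool sizes are hypothetical illustration choices:

```python
import random
from typing import List, Tuple

Offset = Tuple[float, float]


def sample_local_offsets(rng: random.Random, num: int, radius: float) -> List[Offset]:
    """Draw `num` offsets uniformly from the disc of `radius` around a landmark."""
    offsets: List[Offset] = []
    while len(offsets) < num:
        # Rejection sampling: draw from the bounding square, keep points inside the disc.
        dx, dy = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if dx * dx + dy * dy <= radius * radius:
            offsets.append((dx, dy))
    return offsets


def build_local_pools(num_landmarks: int, num_offsets: int, radius: float,
                      seed: int = 0) -> List[List[Offset]]:
    """One independent offset pool per landmark; each pool feeds only that landmark's forest."""
    rng = random.Random(seed)
    return [sample_local_offsets(rng, num_offsets, radius) for _ in range(num_landmarks)]


if __name__ == "__main__":
    pools = build_local_pools(num_landmarks=5, num_offsets=50, radius=3.0, seed=1)
    print(len(pools), len(pools[0]))  # 5 50
```

Pairs of offsets from a landmark's own pool then define that landmark's candidate pixel-difference features, so no forest ever looks at pixels far from its landmark.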
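Cascade Shape Regression Framework: $\Delta S^t = W^t \Phi^t(I, S^{t-1})$, where $W^t$ is the linear regression matrix, $\Phi^t$ the feature mapping function, $\Delta S^t$ the shape increment, $I$ the input image, and $S^{t-1}$ the shape from the previous stage; the stage output is $S^t = S^{t-1} + \Delta S^t$.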
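When the feature mapping produces a binary vector, multiplying it by the linear regression matrix reduces to summing the matrix columns at the active (equal to 1) positions. A minimal sketch of that shortcut, assuming the binary vector is stored as a list of active indices and the matrix as nested lists with one row per shape coordinate (x1, y1, x2, y2, ...); the names are hypothetical:

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = shape coordinates, columns = binary features


def apply_global_regression(W: Matrix, active: Sequence[int]) -> List[float]:
    """Shape increment W @ phi for a 0/1 vector phi given by its indices that equal 1.

    Since phi is binary, W @ phi is just the sum of W's columns at those indices,
    so the cost scales with the number of active features, not the full dimension.
    """
    return [sum(row[j] for j in active) for row in W]


def dense_matvec(W: Matrix, phi: Sequence[float]) -> List[float]:
    """Reference dense product, used only to check the sparse version."""
    return [sum(w * p for w, p in zip(row, phi)) for row in W]


if __name__ == "__main__":
    W = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0]]  # toy 2 x 4 matrix (one landmark, 4 binary features)
    print(apply_global_regression(W, [0, 3]))  # [5.0, 13.0]
    print(dense_matvec(W, [1, 0, 0, 1]))       # [5.0, 13.0]
```

This lookup-and-add structure is one plausible reason the regression step stays fast even though the binary feature vector is high-dimensional.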
Approach: two-step learning process
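The first learning step turns each landmark's forest into a sparse binary vector: a sample falls into exactly one leaf per tree, that leaf becomes a one-hot block, and the blocks are concatenated. A minimal sketch, assuming complete binary trees stored in array form whose (feature index, threshold) splits are already learned; the tree layout and the names `leaf_index` and `local_binary_features` are hypothetical, and neither split learning nor the second step (the global linear regression over the concatenated vectors of all landmarks) is shown:

```python
from typing import List, Sequence, Tuple

Split = Tuple[int, float]  # (index into this landmark's pixel-difference features, threshold)


def leaf_index(splits: Sequence[Split], features: Sequence[float]) -> int:
    """Route a sample through a complete binary tree stored in array form.

    Internal node i has children 2*i + 1 (left, feature < threshold) and 2*i + 2 (right).
    Returns the leaf number in [0, len(splits) + 1).
    """
    node = 0
    num_internal = len(splits)
    while node < num_internal:
        feat, thresh = splits[node]
        node = 2 * node + 1 if features[feat] < thresh else 2 * node + 2
    return node - num_internal


def local_binary_features(forest: Sequence[Sequence[Split]],
                          features: Sequence[float]) -> List[int]:
    """Concatenate one one-hot leaf block per tree into this landmark's binary vector."""
    phi: List[int] = []
    for splits in forest:
        one_hot = [0] * (len(splits) + 1)  # a complete tree has one more leaf than splits
        one_hot[leaf_index(splits, features)] = 1
        phi.extend(one_hot)
    return phi


if __name__ == "__main__":
    tree_a = [(0, 0.0), (1, 5.0), (1, -5.0)]  # depth-2 tree: 3 splits, 4 leaves
    tree_b = [(0, 0.5)]                        # depth-1 tree: 1 split, 2 leaves
    features = [-1.0, 10.0]  # toy pixel-difference values for one landmark
    print(local_binary_features([tree_a, tree_b], features))  # [0, 1, 0, 0, 1, 0]
```

Each block contains exactly one 1, so the vector holds as many ones as there are trees, which is what keeps it sparse.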
Experiment. Datasets: LFPW, Helen, 300-W. Compared methods: Explicit Shape Regression (ESR) [X. Cao, Y. Wei, F. Wen, and J. Sun. Face alignment by explicit shape regression. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.] and Supervised Descent Method (SDM) [X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.]
Result: comparison on accuracy. Achieves a significant error reduction of 30% relative to ESR and SDM.
Result: comparison on speed
Result: validation of the proposed approach. Learning binary features over the local region (rather than the global shape) reduces error by 25%.
Result: tree-induced binary features. Locally learned, high-dimensional binary features outperform simple 2D offset vectors of each landmark as input to the global regression.
Conclusion. State-of-the-art face alignment: faster and more accurate on challenging benchmarks. Opens new opportunities for face applications on mobile devices.
Insights. SmartMirror: face augmentation and makeup simulation, which requires both accuracy and speed.