Regularizing Face Verification Nets To Discrete-Valued Pain Regression

Regularizing Face Verification Nets To Discrete-Valued Pain Regression Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille Johns Hopkins University &  University of Electronic Science and Technology of China

Monitoring pain through cameras Patients are typically not monitored in the waiting room, so deteriorations happen unnoticed. Pain is a sign of deterioration; e.g., in the US there are 1.3 million deteriorations in waiting rooms per year. Automated monitoring helps in two ways: 1. It simplifies the pain reporting process and reduces the strain on manual effort. 2. It standardizes the feedback mechanism by ensuring a single automated metric performs all assessments, thus reducing bias.

Limited data with labeled pain intensities 1. Labeling pain requires expertise, and per-frame labeling is tedious. 2. Models easily overfit such a small dataset and do not generalize well to real data. 3. Labels are discrete, and the majority are 0.

CNNs need large datasets CNNs have incredible representational power for images, but they need lots of data to train their huge number of parameters. Transfer learning is a good technique for small datasets, e.g. ImageNet (1M images) -> PASCAL VOC (10K images) for object detection. [Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? NIPS 2014. Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.]
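
As a minimal sketch of this transfer-learning recipe (illustrative PyTorch; the ImageNet backbone and the single-output head are assumptions for illustration, not the authors' setup):

```python
# Minimal transfer-learning sketch (illustrative, not the authors' code).
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# 1. Start from a backbone pretrained on a large source dataset (here ImageNet).
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)

# 2. Replace the task-specific head with a small head for the target task.
model.fc = nn.Linear(model.fc.in_features, 1)  # e.g. one regression output

# 3. Fine-tune on the small target dataset with a small learning rate,
#    adapting the pretrained features instead of training from scratch.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
```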

Datasets for pain regression are small UNBC Shoulder-Pain: 200 videos, each with ~200 frames. Video frames are redundant.

Datasets for face recognition are big CASIA-WebFace: 10k identities, 0.5M images. VGG Face: 2,622 identities, 2.6M images. MS-Celeb-1M: 1M identities, 8M images. Such scale makes deeper CNNs trainable; in this work, we use a 28-layer ResNet. [Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition. ECCV 2016.]

Transfer from Face Recognition Net to Pain Regression Net Original model: a 28-layer ResNet trained on 0.5M images with identity labels, using a recognition loss regularized by center loss; it achieves ~99% on the LFW 6,000-pairs protocol. Transfer to: the UNBC Shoulder-Pain dataset, trained on 10K images with pain-level labels, using a regression loss regularized by center loss.
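
In symbols (the notation below, including the weight $\lambda$, is an assumed shorthand rather than the slides' own), the fine-tuning objective pairs a regression term with the center-loss regularizer:

$L = L_{\mathrm{reg}}(\hat{y}, y) + \lambda\, L_C$

where $\hat{y}$ is the predicted pain level, $y$ the ground-truth label, and $L_C$ the center loss over the deep features.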

Center loss for face verification Proposed in ECCV 2016. Used as regularization on the softmax loss for verification tasks. Smaller intra-class variance -> better generalization. [Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition. ECCV 2016.]
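
For reference, the center loss of Wen et al. penalizes the distance between each deep feature $x_i$ and the learned center $c_{y_i}$ of its class $y_i$:

$L_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2$

and the verification network is trained on the softmax loss plus $\lambda L_C$.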

Center loss for discrete-label regression Training on small datasets usually needs stronger regularization, and regression tasks also need to generalize. Center loss works for regression as well, and it is best suited to discrete labels, since each discrete pain level can keep its own feature center.
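
A minimal sketch of center loss over discrete pain levels (illustrative PyTorch, assuming 6 levels and a 512-d feature; here the centers are learned by gradient descent, whereas Wen et al. update them with a moving average):

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Pull each deep feature toward the learned center of its discrete pain level."""
    def __init__(self, num_levels=6, feat_dim=512):
        super().__init__()
        self.centers = nn.Parameter(torch.zeros(num_levels, feat_dim))

    def forward(self, features, labels):
        # features: (batch, feat_dim); labels: (batch,) integer pain levels 0..5
        centers_batch = self.centers[labels]        # center of each sample's level
        return 0.5 * (features - centers_batch).pow(2).sum(dim=1).mean()

# Usage sketch (lambda = 0.01 is a placeholder, not the paper's value):
# center_loss_fn = CenterLoss()
# loss = regression_loss + 0.01 * center_loss_fn(deep_features, pain_labels)
```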

Other experimental details Face images are aligned to be upright. Fewer neurons in the newly added fully-connected layer (less overfitting). 50% dropout is applied before the fully-connected layer (less overfitting). The output is squashed by a 5 * sigmoid function to fit the pain levels 0-5. Smooth L1 loss [1] is used instead of L2 loss (numerically stable); here we use t = 1. [1] Girshick R. Fast R-CNN. ICCV 2015.
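
A rough sketch of this output head and loss (illustrative PyTorch; the feature dimension and the generalized-t form of smooth L1 are assumptions):

```python
import torch
import torch.nn as nn

class PainHead(nn.Module):
    """Dropout -> fully-connected -> 5 * sigmoid, bounding the prediction to [0, 5]."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(feat_dim, 1)

    def forward(self, features):
        return 5.0 * torch.sigmoid(self.fc(self.dropout(features)))

def smooth_l1(pred, target, t=1.0):
    """Smooth L1 loss: quadratic for |error| < t, linear beyond (numerically stable)."""
    diff = (pred - target).abs()
    return torch.where(diff < t, 0.5 * diff ** 2 / t, diff - 0.5 * t).mean()
```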

Experimental results MAE: Mean Absolute Error. MSE: Mean Square Error. PCC: Pearson Correlation Coefficient. Interesting result: an all-zero prediction already surpasses several state-of-the-art models!

Imbalance problem 91%+ of the images are labeled as no pain. It's safe for models to predict more zeros. From MAE and MSE, we can't distinguish whether the model fits better or just predicts more zeros.
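
A toy numeric example of the effect (the non-zero labels below are made up for illustration; only the ~91% zero fraction comes from the slide):

```python
import numpy as np

# Hypothetical distribution: 91 of 100 frames at pain level 0, the rest at levels 1-5.
labels = np.array([0] * 91 + [1, 1, 1, 2, 2, 3, 3, 4, 5])
all_zero_pred = np.zeros_like(labels)

mae = np.abs(all_zero_pred - labels).mean()
print(mae)  # 0.22 -- looks competitive even though no pain is ever detected
```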

New evaluation metrics We propose wMAE and wMSE to fairly evaluate the performance. “w” means weighted. MAE and MSE are weighted by the inverse frequency of each pain level.
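
A minimal sketch of such weighted metrics (illustrative NumPy; averaging the per-level errors is one reading of "weighted by the inverse frequency of each pain level", equivalent up to normalization):

```python
import numpy as np

def weighted_errors(pred, labels, levels=range(6)):
    """wMAE / wMSE sketch: average per-pain-level errors so that rare
    (high-pain) levels count as much as the dominant zero level."""
    maes, mses = [], []
    for level in levels:
        mask = labels == level
        if mask.any():
            diff = pred[mask] - labels[mask]
            maes.append(np.abs(diff).mean())
            mses.append((diff ** 2).mean())
    return float(np.mean(maes)), float(np.mean(mses))
```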

Data sampling Remove redundant images: only train on frames where the pain level changes. Uniform sampling: sample one image from each pain level in sequence, i.e. 0 1 2 3 4 5 0 1 2 3 4 5 ...
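
A minimal sketch of the uniform sampling scheme (illustrative Python; the frames_by_level mapping is an assumed data structure):

```python
import itertools
import random

def uniform_level_sampler(frames_by_level, num_samples):
    """Cycle through pain levels 0..5, drawing one frame per level in turn,
    so every level appears equally often during training."""
    levels = sorted(frames_by_level)                  # e.g. [0, 1, 2, 3, 4, 5]
    for level in itertools.islice(itertools.cycle(levels), num_samples):
        yield random.choice(frames_by_level[level])
```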

Summary Contributions: 1. Transfer from a face recognition model. 2. Center loss for regression tasks. 3. New metrics to fairly evaluate the performance. Future work: 1. Labels are expensive -> semi-supervised learning. 2. Modeling the temporal information.

Thank you!