Regularizing Face Verification Nets To Discrete-Valued Pain Regression

Presentation transcript:

Regularizing Face Verification Nets To Discrete-Valued Pain Regression
Feng Wang, Xiang Xiang, Chang Liu, Trac D. Tran, Austin Reiter, Gregory D. Hager, Harry Quon, Jian Cheng, Alan L. Yuille
Johns Hopkins University & University of Electronic Science and Technology of China

Monitoring pain through cameras
Patients are typically not monitored in the waiting room, so deteriorations happen unnoticed. Pain is a sign of deterioration; e.g., in the US there are 1.3 million deteriorations in waiting rooms per year.
Automated pain monitoring has two benefits:
1. It simplifies the pain-reporting process and reduces the strain of manual effort.
2. It standardizes the feedback mechanism: a single automated metric performs all assessments, which reduces bias.

Limited data with labeled pain intensities
1. Labeling pain requires expertise, and per-frame labeling is tedious.
2. Models easily overfit a small dataset and do not generalize well to real data.
3. Labels are discrete, and the majority are 0.

CNNs need large datasets
CNNs are very powerful at image representation, but they need lots of data to train their huge number of parameters. Transfer learning is a good technique for small datasets, e.g. ImageNet (1M) -> PASCAL VOC (10K) for object detection.
Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? NIPS 2014.
Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.

Datasets for pain regression are small
UNBC Shoulder-Pain: 200 videos, each with ~200 frames. Video frames are redundant.

Datasets for face recognition are big
CASIA-WebFace: 10k identities, 0.5M images.
VGG Face: 2,622 identities, 2.6M images.
MS-Celeb-1M: 1M identities, 8M images.
These datasets are large enough to train deeper CNNs. In this work, we use a 28-layer ResNet.
Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition. ECCV 2016.

Transfer from face recognition net to pain regression net
Original model: a 28-layer ResNet trained on 0.5M images with identity labels, using a recognition loss regularized by center loss. It reaches ~99% on the LFW 6,000-pair protocol.
Transferred model: trained on the UNBC Shoulder-Pain dataset (10K images with pain-level labels), using a regression loss regularized by center loss.

Center loss for face verification
Proposed in ECCV 2016. Used as regularization on the softmax loss for verification tasks. Smaller intra-class variance -> better generalization.
Wen Y, Zhang K, Li Z, et al. A discriminative feature learning approach for deep face recognition. ECCV 2016.
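As a rough illustration of the idea, the sketch below computes a center loss in plain Python: half the summed squared distance between each feature vector and the center of its class, following the definition in Wen et al. The function names, the dictionary layout of `centers`, and the `alpha` step size are assumptions made for this example and do not come from the slides.

```python
def center_loss(features, labels, centers):
    """Half the summed squared distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total


def update_centers(features, labels, centers, alpha=0.5):
    """Move each center toward the features of its class (alpha is an assumed step size)."""
    new_centers = {k: list(c) for k, c in centers.items()}
    for k, c in centers.items():
        members = [x for x, y in zip(features, labels) if y == k]
        for d in range(len(c)):
            # The "1 +" in the denominator keeps the update defined for empty classes.
            delta = sum(c[d] - x[d] for x in members) / (1 + len(members))
            new_centers[k][d] = c[d] - alpha * delta
    return new_centers


# Toy 2-D features for two classes (illustrative values only).
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
centers = {0: [2.0, 0.0], 1: [0.0, 0.0]}
loss = center_loss(features, labels, centers)
centers_after = update_centers(features, labels, centers)
```

A lower value means features of the same class sit closer to their center, which is the smaller intra-class variance the slide refers to.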

Center loss for discrete-label regression
Training on small datasets usually needs stronger regularization, and regression tasks need good generalization too. Center loss also works well for regression. It fits best when the labels are discrete, since each label value can then act as a class with its own center.
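One way to write such a regularized objective is sketched below. The slides do not show the formula, so the weighting factor \lambda, the symbol L_R for the regression loss, and the batch size m are assumptions; the center-loss term follows the form given by Wen et al., with the discrete pain level y_i selecting the center.

```latex
L = L_{R} + \lambda L_{C},
\qquad
L_{C} = \frac{1}{2} \sum_{i=1}^{m} \left\lVert x_i - c_{y_i} \right\rVert_2^2,
\qquad
y_i \in \{0, 1, \dots, 5\}
```

Here x_i is the deep feature of the i-th image and c_{y_i} is the learned center for its pain level.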

Other experimental details
- Face images are aligned to be upright.
- The newly added fully-connected layer has fewer neurons (less overfitting).
- 50% dropout is applied before the fully-connected layer (less overfitting).
- The output is bounded by 5 * sigmoid to fit the pain-level range 0-5.
- Smooth L1 loss [1] is used instead of L2 loss (numerically more stable). Here we use t=1.
[1] Girshick R. Fast R-CNN. ICCV 2015.
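The output bounding and the loss can be sketched in a few lines of plain Python. The slide does not show how t enters the loss, so the parameterized form below is an assumption; at t=1 it reduces to the smooth L1 definition in Fast R-CNN. The function names are made up for this example.

```python
import math


def bounded_output(z, max_level=5.0):
    """Map a raw network output z to (0, max_level) with a scaled sigmoid."""
    # Two branches avoid overflow in exp() for large |z|.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return max_level * s


def smooth_l1(error, t=1.0):
    """Quadratic for |error| < t, linear beyond it (assumed parameterization)."""
    a = abs(error)
    if a < t:
        return 0.5 * a * a / t
    return a - 0.5 * t


prediction = bounded_output(0.0)       # midpoint of the 0-5 range
small_error_loss = smooth_l1(0.5)      # quadratic branch
large_error_loss = smooth_l1(-2.0)     # linear branch
```

The linear branch keeps the gradient bounded for large errors, which is one reading of the slide's remark about numerical stability.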

Experimental results
MAE: Mean Absolute Error. MSE: Mean Squared Error. PCC: Pearson Correlation Coefficient.
Interesting result: an all-zero prediction already surpasses several state-of-the-art models!
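For reference, the three metrics can be written in plain Python as below. The toy labels are invented for illustration and are not results from the talk; they only show why an all-zero prediction can score a low MAE on mostly-zero labels. Returning NaN for a constant input in `pcc` is a choice made for this sketch.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient; NaN when either input is constant."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return float("nan")
    return cov / math.sqrt(var_t * var_p)


# Toy labels: nine "no pain" frames and one frame at level 3.
toy_labels = [0] * 9 + [3]
all_zero = [0] * 10
toy_mae = mae(toy_labels, all_zero)
toy_mse = mse(toy_labels, all_zero)
toy_pcc = pcc(toy_labels, all_zero)
```

On this toy data the all-zero predictor gets a small MAE even though it never detects pain, while its correlation with the labels is undefined.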

Imbalance problem
More than 91% of images are labeled as no pain, so it is safe for models to predict more zeros. From MAE and MSE alone, we can’t tell whether a model fits better or just predicts more zeros.

New evaluation metrics
We propose wMAE and wMSE to evaluate performance fairly. “w” means weighted: the errors in MAE and MSE are weighted by the inverse frequency of each pain level.
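A minimal sketch of inverse-frequency weighting follows. The slides do not give the normalization, so this version assumes the weights are normalized so that each pain level present in the data contributes equally, which amounts to averaging the per-level errors. The function name and the toy data are illustrative only.

```python
from collections import defaultdict


def weighted_errors(y_true, y_pred):
    """Return (wMAE, wMSE): per-level mean errors averaged over the levels present."""
    abs_by_level = defaultdict(list)
    sq_by_level = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        abs_by_level[t].append(abs(t - p))
        sq_by_level[t].append((t - p) ** 2)
    levels = sorted(abs_by_level)
    # Dividing by the count of each level is the inverse-frequency weight.
    wmae = sum(sum(abs_by_level[k]) / len(abs_by_level[k]) for k in levels) / len(levels)
    wmse = sum(sum(sq_by_level[k]) / len(sq_by_level[k]) for k in levels) / len(levels)
    return wmae, wmse


# Toy data: nine frames at level 0, one at level 5, and an all-zero predictor.
labels = [0] * 9 + [5]
zeros = [0] * 10
wmae, wmse = weighted_errors(labels, zeros)
```

On this toy data the plain MAE is 0.5, while the weighted version is 2.5, because the single missed level-5 frame now counts as much as all the no-pain frames together.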

Data sampling
Remove redundant images.
- Only train on images where the pain level changes.
Uniform sampling.
- Sample one image from each pain level in sequence, i.e. 0 1 2 3 4 5 0 1 2 3 4 5 ...
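Both sampling steps can be sketched in plain Python. The first function reads "when pain level changes" as keeping a frame whenever its label differs from the previous frame's label; that reading, the function names, and the fixed random seed are assumptions made for this example.

```python
import random
from collections import defaultdict


def keep_level_changes(labels):
    """Indices of frames whose pain level differs from the previous frame."""
    kept = []
    previous = object()  # sentinel that never equals a real label
    for i, level in enumerate(labels):
        if level != previous:
            kept.append(i)
        previous = level
    return kept


def round_robin_sample(labels, n_samples, seed=0):
    """Pick frame indices by cycling through the pain levels in ascending order."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for i, level in enumerate(labels):
        by_level[level].append(i)
    levels = sorted(by_level)
    picks = []
    for step in range(n_samples):
        level = levels[step % len(levels)]
        picks.append(rng.choice(by_level[level]))
    return picks


# Toy per-frame labels for one video (illustrative values only).
video_labels = [0, 0, 1, 1, 1, 0, 2, 2]
kept = keep_level_changes(video_labels)
picks = round_robin_sample(video_labels, 6)
picked_levels = [video_labels[i] for i in picks]
```

Cycling through the levels gives every pain level the same share of each training pass, regardless of how rare it is in the raw data.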

Summary
Contributions:
1. Transfer from a face recognition model.
2. Center loss for regression tasks.
3. New metrics to fairly evaluate performance.
Future work:
1. Labels are expensive -> semi-supervised learning.
2. Modeling temporal information.

Thank you!