Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601.

Introduction
Face identification is complicated by alterations to an individual's appearance (beard, glasses, sunglasses, wig, hairstyle, hair color, hat, etc.), which degrade recognition performance.
Facial key-points are needed to analyze the shape of the face.
Two main state-of-the-art methods:
1. Feature-extraction algorithms (e.g. Gabor features) that combine texture-based and shape-based features to detect the facial key-points
2. Probabilistic graphical models that capture the relationships between pixels and features to detect the facial key-points
Training a DNN for this task is very challenging because the available datasets are small; a larger training dataset generally yields better performance.

Transfer Learning
The lack of data means designers have to use transfer learning.
Transfer learning is a machine learning research problem in which knowledge gained from solving one problem is applied to a different but related problem (e.g. knowledge gained identifying cars can be reused to identify trucks).
Performance might be sufficient, but the model may still under-perform: data insufficiency can prevent proper fine-tuning of the pre-trained DNNs.
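The idea above can be sketched in a few lines of NumPy (illustrative only, not the paper's code): a "pretrained" feature extractor is frozen and reused, and only a small new classification head is trained on the new task. All names and the toy task are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor: its weights are frozen (never updated).
W_frozen = rng.normal(size=(4, 8))          # maps 4-d inputs to 8-d features

def extract_features(x):
    return np.tanh(x @ W_frozen)            # frozen backbone, reused as-is

# Tiny synthetic 2-class task standing in for the new (related) problem.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the new head is trained (logistic regression on frozen features).
w = np.zeros(8)
for _ in range(500):
    p = 1 / (1 + np.exp(-extract_features(X) @ w))
    w -= 0.5 * extract_features(X).T @ (p - y) / len(y)   # gradient step

acc = np.mean((extract_features(X) @ w > 0) == (y == 1))
```

In a real fine-tuning setup the frozen backbone would be a network trained on a large dataset, and, when enough new data is available, some of its layers would be unfrozen as well; the slide's point is that small disguise datasets often make that second step infeasible.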

Contributions of this Paper
Disguised Face Identification (DFI) Framework: a Spatial Fusion Deep Convolutional Network (DCN) extracts 14 key-points (essential for describing the facial structure). The extracted points are connected to form a star-net, and the orientations of the points are used by the classification framework for face identification.
Simple and Complex Face Disguise Datasets: two Face Disguise (FG) datasets, with simple and complex backgrounds, that researchers can use in the future to train DCNs for facial key-point detection.

14 Essential Facial key-points (S. Zhang et al. 2016)

Simple and Complex Face Disguise Datasets
Existing databases for disguise-related research have limited disguise variations, while the DCN requires images of people with beards, glasses, different hairstyles, scarves, caps, etc.
The paper proposes two Face Disguise datasets of 2000 photos each, with simple and complex backgrounds and varied illumination: 8 different backgrounds, 25 subjects, and 10 different disguises.
Note that in the complex-background images, the background occupies a higher percentage of the picture as a whole.

Convolutional Neural Networks: A Review

Overview of the DCN Process
8 convolution layers extract increasingly specific features, ending in the Loss 1 function (a regression loss comparing the output with the ground truth).
5 spatial fusion layers follow, ending in the Loss 2 function (the mean squared error between the predicted and ground-truth heat-maps).
Heat maps of the 14 key-points are generated and connected into a star-net structure.
Classification is based on the orientations of the star-net points.

Disguised Face Identification (DFI) Framework
The Spatial Fusion Convolutional Network predicts and temporally aligns the facial key-points of all neighboring frames to a particular frame by warping backwards and forwards in time using tracks from dense optical flow.
Optical flow is the pattern of apparent motion caused by relative motion between an observer and a scene; dense optical flow considers every pixel, while sparse optical flow tracks only a subset of the pixels.
The confidence in the particular frame is strengthened by a set of "expert opinions" (with corresponding confidences) from frames in the neighborhood, from which the facial key-points can be estimated accurately. The spatial fusion network is more accurate in this respect than other DNNs.
The detected points are connected into a star-net and used for classification.
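The "expert opinions with confidences" idea can be sketched as a confidence-weighted average of neighboring frames' key-point estimates. This is an illustrative sketch only: the warping of each frame's estimates to the reference frame (via dense optical flow) is assumed to have already happened, and the function name is invented.

```python
import numpy as np

def fuse_keypoints(estimates, confidences):
    """Confidence-weighted fusion of per-frame keypoint estimates.

    estimates:   (n_frames, n_points, 2) keypoint coordinates, already
                 warped to the reference frame (e.g. via dense optical flow)
    confidences: (n_frames,) how much each neighboring frame is trusted
    """
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                          # normalize to a weighting
    return np.einsum('f,fpc->pc', w, np.asarray(estimates, dtype=float))

# Three "expert opinions" about one keypoint; the low-confidence outlier
# frame contributes little to the fused estimate.
est = [[[10.0, 10.0]], [[12.0, 10.0]], [[50.0, 50.0]]]
fused = fuse_keypoints(est, [0.45, 0.45, 0.10])
```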

Facial KeyPoint Detection
Key-point detection is a regression problem modeled by the Spatial Fusion Convolutional Network: the CNN takes an image and outputs the pixel coordinates of each key-point.
The output of the last layer is an i x j x k dimensional cube (here 64 x 64 x 14, for the 14 key-points).
Training objective: estimate the network weights lambda (λ) from the available training data set D = (x, y) and the regressor Φ(), the activation function (by analogy with the rate of action-potential firing in neurons), where the Gaussian function G_i,j,k(y_k) defines a heat-map peak centred on key-point k.
CNNs are not scale/shift invariant, so the Gaussian distribution is applied to put the target values into a known range.
The Loss 2 function is computed on the squared pixel-wise differences between the predicted and ground-truth heat-maps.
MatConvNet is used to train and validate the Fusion Convolutional Network in MATLAB.
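A minimal NumPy sketch (not the authors' MatConvNet code) of building the Gaussian heat-map regression targets described above, using the slide's stated settings (64x64 heat-maps, sigma = 1.5); the normalization constant is dropped since only the squared pixel-wise differences matter up to scale.

```python
import numpy as np

def gaussian_heatmaps(keypoints, size=64, sigma=1.5):
    """Ground-truth heat-maps G[i, j, k]: a 2-D Gaussian centred on
    key-point k, used as the regression target for the Loss 2 function."""
    ii, jj = np.mgrid[0:size, 0:size]        # ii = rows (y), jj = cols (x)
    maps = np.zeros((size, size, len(keypoints)))
    for k, (x, y) in enumerate(keypoints):
        maps[:, :, k] = np.exp(-((jj - x) ** 2 + (ii - y) ** 2)
                               / (2 * sigma ** 2))
    return maps

# With the paper's 14 key-points this would be a 64 x 64 x 14 target cube.
hm = gaussian_heatmaps([(20, 30)], size=64, sigma=1.5)
peak = np.unravel_index(hm[:, :, 0].argmax(), (64, 64))
```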

Facial KeyPoint Detection (cont.)
The key-point locations (coordinates) produced by the network are connected into a star network whose angles are used later for classification.
The nose key-point is used as the reference point when determining the angles of the other points.
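Computing the star-net orientations can be sketched as follows (an illustrative NumPy sketch, not the paper's code): each key-point's angle is taken relative to the nose reference point. The index convention and toy coordinates are assumptions for the example.

```python
import numpy as np

def starnet_angles(keypoints, nose_idx=10):
    """Orientation (in degrees) of each key-point relative to the nose
    reference point (P11, i.e. index 10 when the points are 0-indexed)."""
    pts = np.asarray(keypoints, dtype=float)
    d = pts - pts[nose_idx]                  # vectors from nose to each point
    return np.degrees(np.arctan2(d[:, 1], d[:, 0]))

# Toy example with two points: the nose, and one point directly "above" it
# in (x, y) coordinates, which comes out at 90 degrees.
angles = starnet_angles([(0.0, 1.0), (0.0, 0.0)], nose_idx=1)
```

Note that in image coordinates (y pointing down) the sign convention flips; what matters for the classifier is only that the same convention is used for the disguised and non-disguised images.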

Disguised Face Classification
The disguised face is compared to 5 non-disguised faces (including the person in the disguise).
Classification is accurate if tau (τ) is the minimum for the comparison between the disguised image and the non-disguised image of the same person.
Similarity is estimated by computing the L1 norm between the orientations of the key-points (from the star-net structure): τ is the similarity, θi is the orientation of the ith key-point of the disguised image, and φi is the corresponding angle in the non-disguised image.
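The classification rule above reduces to a few lines (an illustrative sketch; function names and the toy gallery are invented): compute τ = Σ|θi − φi| against each non-disguised candidate and pick the minimum.

```python
import numpy as np

def similarity(theta, phi):
    """tau: L1 norm between the key-point orientations of the disguised
    image (theta) and one non-disguised candidate image (phi)."""
    return float(np.abs(np.asarray(theta) - np.asarray(phi)).sum())

def classify(disguised_angles, gallery):
    """Pick the gallery identity whose orientations minimize tau."""
    taus = [similarity(disguised_angles, g) for g in gallery]
    return int(np.argmin(taus))

# Toy gallery of 3 identities, each described by 3 key-point angles;
# the disguised face is closest (smallest tau) to identity 0.
gallery = [[10.0, 45.0, 90.0], [12.0, 44.0, 88.0], [60.0, 5.0, 130.0]]
best = classify([11.0, 46.0, 89.0], gallery)
```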

Experimental Results
Results are split between the Simple Background and Complex Background Face Disguise data sets.
Individual key-point accuracy is presented, along with a comparison with other architectures, followed by an analysis of classification performance.

Spatial Fusion ConvNet Training
The Spatial Fusion CNN is trained on 1000 images, with 500 validation images and 500 test images.
The network is trained for 90 cycles with a batch size of 20.
A 248x248 sub-image is randomly cropped from every input image, randomly flipped, randomly rotated between -40 and 40 degrees, and resized to 256x256 before being passed into the CNN.
The variance of the Gaussian is set to 1.5, and the heat-map size is 64x64.
The base learning rate is 10^(-5), decreased to 10^(-6) after 20 iterations; momentum is 0.9.
The momentum update results in better convergence on deep networks (based on a physical perspective of the optimization problem).
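The augmentation recipe above can be sketched in NumPy (illustrative only; the paper uses MatConvNet in MATLAB, and the rotation/resize here use crude nearest-neighbour indexing rather than proper interpolation):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=248, out=256, max_deg=40):
    """One training-time augmentation pass: random 248x248 crop, random
    horizontal flip, random rotation in [-40, 40] degrees, resize to 256."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]   # random sub-image
    if rng.random() < 0.5:                          # random horizontal flip
        patch = patch[:, ::-1]
    t = np.deg2rad(rng.uniform(-max_deg, max_deg))  # random rotation angle
    c, s = np.cos(t), np.sin(t)
    # Combined rotate-about-centre + resize, via a nearest-neighbour grid.
    yy, xx = np.mgrid[0:out, 0:out] * (crop / out)
    cy = cx = crop / 2
    src_y = np.clip(c * (yy - cy) + s * (xx - cx) + cy, 0, crop - 1).astype(int)
    src_x = np.clip(-s * (yy - cy) + c * (xx - cx) + cx, 0, crop - 1).astype(int)
    return patch[src_y, src_x]

aug = augment(rng.random((256, 256)))
```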

Key Point Detection
Row 1: disguised images; Row 2: key-point mapping; Row 3: star-net construction.

Key-Point Detection Performance
A key point is deemed correct if it is located within d pixels of the marked key-point; accuracy therefore increases as d increases.
Green: complex background. Red: simple background.
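The evaluation metric just described (a detection counts as correct within d pixels of the ground truth) is simple to state precisely; a small NumPy sketch with invented toy coordinates:

```python
import numpy as np

def keypoint_accuracy(pred, gt, d):
    """Fraction of key-points whose predicted location lies within
    d pixels (Euclidean distance) of the marked ground-truth location."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(dist <= d))

# Toy example: the first two predictions are within 5 px, the third is not.
pred = [(10, 10), (20, 25), (40, 40)]
gt   = [(11, 10), (20, 20), (80, 80)]
acc5 = keypoint_accuracy(pred, gt, d=5)
```

Because the threshold only loosens as d grows, accuracy is monotonically non-decreasing in d, which is exactly the trend the slide's plot shows.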

Key-Point Detection Performance (cont.)
The simple background yields higher accuracy than the complex background: background clutter interferes with identifying the outer-region facial key-points.

Key-Point Performance Analysis with Reference to Background Clutter
Background clutter significantly interferes with key-point detection performance, as observed by analyzing key-point detection in the lips, nose and eye regions.

Eye Region Key-Point Detection
Relevant key-points: P1 - P10.
P1, P4, P5, and P10 are the most prominently affected (they are closest to the face border).
Accuracy at pixel distances close to the ground truth is significantly higher for the simple background than for the complex one.

Nose Key-Point Detection Performance
The nose key-point (P11) is not affected by background clutter, probably because P11 is buffered by the surrounding key-points.

Lips Region Key-Point Detection Performance
P12, P13 and P14 comprise the lips region.
P12 and P14 are affected by background clutter while P13 is not, because P12 and P14 lie closer to the face edge than P13.

Facial Key-Point Detection: Multiple Persons
A Viola-Jones face detector is used to find all faces in the image, and DFI is run on each face.
Key-point detection classification performance on the simple and complex datasets, respectively:
Single face in the image: 85% and 56%
2 faces in the image: 80% and 50%
3 faces in the image: 76% and 43%
Accuracy decreases as the number of faces increases.
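The multi-person pipeline above is just detect-then-identify per crop. A sketch of the control flow (illustrative only: both callables are stubs here; a real system would plug in a Viola-Jones detector such as OpenCV's cv2.CascadeClassifier for detect_faces and the DFI key-point/classification step for identify_face):

```python
import numpy as np

def identify_all_faces(image, detect_faces, identify_face):
    """Multi-person pipeline: detect every face bounding box, then run
    the (stubbed) DFI key-point + classification step on each crop."""
    results = []
    for (x, y, w, h) in detect_faces(image):
        results.append(identify_face(image[y:y + h, x:x + w]))
    return results

# Stub detector and identifier, to exercise the control flow only.
img = np.zeros((100, 100))
boxes = lambda im: [(0, 0, 50, 50), (50, 50, 50, 50)]   # two fake faces
ident = lambda crop: crop.shape                         # fake "identity"
out = identify_all_faces(img, boxes, ident)
```

Since each face is processed independently, the accuracy drop the slide reports with more faces comes from the detection and cropping stages (smaller faces, more clutter per crop), not from the per-face identification code path itself.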

Comparison of Key-Point Detection Performance with Other Architectures
CN = CoordinateNet; CNE = CoordinateNet Extended; and SpatialNet.
Evaluated at d = 5 pixels from the ground truth.
In accordance with the findings from the other architectures, background clutter decreases accuracy.

Classification Performance and Comparison with the State of the Art
The more heavily disguised the face, the lower the accuracy.
The state-of-the-art baseline is unnamed; this paper's framework outperforms the current state of the art.

Conclusion
The paper proposes two datasets that can be used to train future disguised-face recognition networks.
Background clutter affects the outer-region key-points, so images should be taken with the simplest background possible for the highest accuracy.
The Disguised Face Identification (DFI) Framework outperforms the state of the art by first detecting 14 facial key-points and connecting them into a star-net.

References
https://arxiv.org/pdf/1708.09317.pdf

Thank you!