Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis Speaker: ZHAO Jian (zhaojian90@u.nus.edu)


Homepage: https://zhaoj9014.github.io/
Affiliation: Learning and Vision Group, ECE, NUS

[Figure: face frontalization examples; panels labeled S and GT for input poses of 90°, 75°, and 45°]

$$L_{tv} = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2 \right)^{\beta/2}$$
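The total variation term can be made concrete with a small sketch. It reads the formula as a sum, over pixel positions (i, j), of the squared horizontal difference plus the squared vertical difference, raised to the power β/2. The function name, the single-channel nested-list input, and the choice to skip the last row and column (where a neighbor is missing) are assumptions for illustration, not details taken from the slides.

```python
def total_variation_loss(image, beta=1.0):
    """Total variation of a single-channel image given as a 2D list of floats.

    Assumption: border pixels without a right or lower neighbor are skipped.
    """
    height = len(image)
    width = len(image[0])
    total = 0.0
    for i in range(height - 1):
        for j in range(width - 1):
            dx = image[i][j + 1] - image[i][j]  # horizontal difference
            dy = image[i + 1][j] - image[i][j]  # vertical difference
            total += (dx * dx + dy * dy) ** (beta / 2.0)
    return total


flat = [[0.5, 0.5], [0.5, 0.5]]     # constant image: no variation
checker = [[0.0, 1.0], [1.0, 0.0]]  # one interior position with dx = dy = 1
print(total_variation_loss(flat))
print(total_variation_loss(checker, beta=2.0))
```

A constant image gives zero loss, while sharp local changes increase it, which is why the term acts as a smoothness regularizer on the synthesized face.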

Implementation Details
LR: 0.0001, BATCH_SIZE = 10, INPUT_SIZE = 128*128*3, BETA = 1, ALPHA = 0.001, LAMBDA_1 = 0.3, LAMBDA_2 = 0.001, LAMBDA_3 = 0.003, LAMBDA_4 = 0.0001
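To see how these weights interact, here is a rough sketch of a weighted sum of loss terms. The slides do not say which loss each LAMBDA multiplies; the mapping below (LAMBDA_1 for symmetry, LAMBDA_2 for adversarial, LAMBDA_3 for identity preserving, LAMBDA_4 for total variation, with the pixel loss at weight 1.0) is an assumption, as are the function and key names. ALPHA is left out because its role is not stated here.

```python
# Assumed mapping of the listed hyperparameters to loss terms (not given in the slides).
WEIGHTS = {
    "pixel": 1.0,          # pixel loss weight
    "symmetry": 0.3,       # LAMBDA_1 (assumed)
    "adversarial": 0.001,  # LAMBDA_2 (assumed)
    "identity": 0.003,     # LAMBDA_3 (assumed)
    "tv": 0.0001,          # LAMBDA_4 (assumed)
}


def synthesis_loss(terms, weights=WEIGHTS):
    """Weighted sum of the raw loss values in `terms` (name -> float)."""
    return sum(weights[name] * value for name, value in terms.items())


def weighted_shares(terms, weights=WEIGHTS):
    """Fraction of the total objective contributed by each weighted term."""
    total = synthesis_loss(terms, weights)
    return {name: weights[name] * value / total for name, value in terms.items()}


# Hypothetical case: every raw loss has the same magnitude.
equal_terms = {name: 1.0 for name in WEIGHTS}
shares = weighted_shares(equal_terms)
print(round(synthesis_loss(equal_terms), 4))
print({name: round(share, 4) for name, share in shares.items()})
```

Under this assumed mapping and equal raw magnitudes, the pixel term alone accounts for roughly three quarters of the objective, and the adversarial, identity, and total variation terms together for well under one percent.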

[Figure: results for input poses of 90°, 75°, 60°, 45°, 30°, and 15°, alongside the ground truth (GT)]

Quantitative Results

Underlying Problems for Re-implementation
The "Template Landmark Location" used for patch position aggregation is not given. How, then, should the feature maps of the 4 landmark-located patch networks be fused (estimate the template from the frontal GT)?
What if the 4 patches cannot all be detected (left eye <-> right eye)?
The network architecture of the discriminator is not given; it might be the same as the encoder of the generator's global network.
Training details are not given.
Re-implementation is tricky and time-consuming, and it is not guaranteed to work properly.
The weight for the pixel loss is too large, while the weights for the other losses are too small. The other losses therefore seem to act as a gimmick in this paper, which is not reasonable for a GAN-based framework and leads to a severe overfitting problem.
But we can make some improvements (e.g., acceleration, adaptive aggregation of losses, Siamese structure, learning to learn, more training data) and give it a try.
Q & A

Re-implementation results (preliminary)
Dataset: Multi-PIE (4324 img, 250 sub, 0-90°)
LR: 0.0001, BATCH_SIZE = 10, INPUT_SIZE = 128*128*3, BETA = 1, ALPHA = 0.001, LAMBDA_1 = 0.3, LAMBDA_2 = 0.001, LAMBDA_3 = 0.003, LAMBDA_4 = 0.0001
Iterations: ~100k
Training time: ~1d; Testing time: 20ms / img

Re-implementation results (preliminary)
Problems: poor generalization capacity -> overfitting? / sub-optimal hyperparameters? / unreasonable weights for losses?
[Figure: results w/o pixel loss]

Re-implementation results (preliminary)
Problems: poor generalization capacity -> overfitting? / sub-optimal hyperparameters? / unreasonable weights for losses?
[Figure: TP-GAN on other data]

Present results

Possible solutions
Incorporate all available Multi-PIE data for training TP-GAN.
Improve and modify the pixel loss (L1 loss): this supervision signal is too strong and causes the network to overfit quickly. In the original version of TP-GAN, the discriminator does not seem to contribute much to optimizing the generator, which suggests that the authors rely on the pixel loss (with a large weight of 1.0) to make the generator memorize each frontal ground truth!
Improvement on the generator: domain-adversarial training -> global generator & 4 local patch generators.

Possible solutions
Improvement on the discriminator: adopt a Siamese-like discriminator and inject dynamic convolution to capture more information via domain transfer learning (use a pre-trained LightCNN to predict the dynamic kernel weights of the discriminator, replacing the ip loss), so as to optimize the generator for photorealistic frontal face synthesis.
Tune the relevant parameters.

Thank you!