Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis
Speaker: ZHAO Jian (zhaojian90@u.nus.edu)
Homepage: https://zhaoj9014.github.io/
Affiliation: Learning and Vision Group, ECE, NUS
[Figure: qualitative results, synthesized (S) vs. ground truth (GT) frontal faces for inputs at 90°, 75°, and 45° yaw]
$L_{tv} = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2 \right)^{\beta/2}$
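As a sanity check, here is a minimal NumPy sketch of this total-variation term, assuming `x` is a single-channel H x W image and `beta` is the exponent above (the implementation details on the next slide use BETA = 1):

```python
import numpy as np

def tv_loss(x, beta=1.0):
    # dh[i, j] = x[i, j+1] - x[i, j]; dv[i, j] = x[i+1, j] - x[i, j],
    # both restricted to the common (H-1) x (W-1) region so the two
    # squared terms align elementwise before summation.
    dh = x[:-1, 1:] - x[:-1, :-1]
    dv = x[1:, :-1] - x[:-1, :-1]
    return np.sum((dh ** 2 + dv ** 2) ** (beta / 2.0))
```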
Implementation Details
- LR: 0.0001, BATCH_SIZE: 10, INPUT_SIZE: 128 x 128 x 3
- BETA: 1, ALPHA: 0.001
- LAMBDA_1: 0.3, LAMBDA_2: 0.001, LAMBDA_3: 0.003, LAMBDA_4: 0.0001
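For orientation, a hedged sketch of how these weights combine the TP-GAN generator losses, following the paper's synthesis objective L_syn = L_pixel + λ1·L_sym + λ2·L_adv + λ3·L_ip + λ4·L_tv; the individual loss values are assumed to be precomputed scalars:

```python
LAMBDA_1 = 0.3     # symmetry loss weight
LAMBDA_2 = 0.001   # adversarial loss weight
LAMBDA_3 = 0.003   # identity-preserving (ip) loss weight
LAMBDA_4 = 0.0001  # total-variation loss weight

def synthesis_loss(l_pixel, l_sym, l_adv, l_ip, l_tv):
    # The pixel loss carries an implicit weight of 1.0, which dwarfs
    # the other terms -- the imbalance criticized later in this deck.
    return (l_pixel
            + LAMBDA_1 * l_sym
            + LAMBDA_2 * l_adv
            + LAMBDA_3 * l_ip
            + LAMBDA_4 * l_tv)
```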
[Figure: synthesized frontal views for inputs at 90°, 75°, 60°, 45°, 30°, and 15° yaw, with ground truth (GT)]
Quantitative Results
Underlying Problems for Re-implementation
- The "Template Landmark Location" for patch position aggregation is not given. How should the feature maps of the 4 Landmark Located Patch Networks be fused? One option is to estimate the template from the frontal GT (a hypothetical sketch follows this slide).
- What if we cannot detect all 4 patches (e.g., left eye <-> right eye confusion under extreme pose)?
- The network architecture of the discriminator is not given; it might be the same as the encoder of the generator's global network.
- Training details are not given.
- Re-implementation is tricky, time-consuming, and not guaranteed to work properly.
- The weight for the pixel loss is too large, while the weights for the other losses are too small. The other losses therefore act more as gimmicks, which is unreasonable for a GAN-based framework and leads to a severe overfitting problem.
- Still, we can try some improvements (e.g., acceleration, adaptive aggregation of losses, a Siamese structure, learning to learn, more training data).
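One hypothetical way to fill in the unspecified patch aggregation, as mentioned above: estimate template landmark centers from the frontal GT images, paste each patch network's feature map onto a blank canvas at its template location, and max-fuse any overlaps. The template coordinates, canvas size, and fusion rule below are all assumptions, not the authors' design.

```python
import numpy as np

# Assumed template landmark centers (row, col) on a 128 x 128 canvas,
# e.g. estimated by averaging detected landmarks over frontal GT images.
TEMPLATE_CENTERS = {
    "left_eye":  (54, 44),
    "right_eye": (54, 84),
    "nose":      (74, 64),
    "mouth":     (98, 64),
}

def fuse_patches(patch_maps, canvas_hw=(128, 128)):
    """patch_maps: dict name -> (h, w, c) feature map; each patch must
    fit inside the canvas when centered at its template location."""
    channels = next(iter(patch_maps.values())).shape[-1]
    canvas = np.zeros(canvas_hw + (channels,), dtype=np.float32)
    for name, fmap in patch_maps.items():
        h, w, _ = fmap.shape
        r0 = TEMPLATE_CENTERS[name][0] - h // 2
        c0 = TEMPLATE_CENTERS[name][1] - w // 2
        # Max-fuse where patches overlap (a simple, arbitrary choice).
        canvas[r0:r0 + h, c0:c0 + w] = np.maximum(
            canvas[r0:r0 + h, c0:c0 + w], fmap)
    return canvas
```

Q & A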
Re-implementation results (preliminary)
- Dataset: Multi-PIE (4,324 images, 250 subjects, 0-90° yaw)
- LR: 0.0001, BATCH_SIZE: 10, INPUT_SIZE: 128 x 128 x 3
- BETA: 1, ALPHA: 0.001, LAMBDA_1: 0.3, LAMBDA_2: 0.001, LAMBDA_3: 0.003, LAMBDA_4: 0.0001
- Iterations: ~100k
- Training time: ~1 day; testing time: 20 ms / image
Re-implementation results (preliminary)
Problems: poor generalization capacity -> overfitting? Sub-optimal hyperparameters? Unreasonable loss weights?
[Figure: synthesis results w/o the pixel loss]
Re-implementation results (preliminary)
Problems: poor generalization capacity -> overfitting? Sub-optimal hyperparameters? Unreasonable loss weights?
[Figure: TP-GAN on other data]
Present results
Possible solutions
- Incorporate all available Multi-PIE data for training TP-GAN.
- Improve and modify the pixel loss (L1 loss), since this supervision signal is too strong and makes the network overfit quickly. In the original TP-GAN, the discriminator seems to contribute little to the optimization of the generator, which suggests the authors rely on the pixel loss (with its large weight of 1.0) to make the generator memorize each frontal ground truth. One possible remedy is sketched after this slide.
- Improvement on the generator: domain-adversarial training -> global generator & 4 local patch generators.
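A minimal sketch of one way to weaken the overly strong pixel supervision: decay the pixel-loss weight over training instead of fixing it at 1.0. The schedule and its constants (`half_life`, `floor`) are illustrative assumptions, not tuned values.

```python
def pixel_loss_weight(step, w0=1.0, half_life=20000, floor=0.1):
    # Halve the pixel-loss weight every `half_life` iterations,
    # never letting it drop below `floor`.
    return max(floor, w0 * 0.5 ** (step / half_life))

def l1_pixel_loss(pred, gt):
    # Plain L1 pixel loss between the synthesized and GT frontal faces;
    # works for NumPy arrays and PyTorch tensors alike.
    return abs(pred - gt).mean()

# Usage inside a training loop (other loss terms omitted):
#   loss = pixel_loss_weight(step) * l1_pixel_loss(pred, gt) + other_terms
```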
Possible solutions
- Improvement on the discriminator: adopt a Siamese-like discriminator and inject dynamic convolution to capture more information via domain transfer learning, i.e., use a pre-trained LightCNN to predict the dynamic kernel weights of the discriminator (replacing the ip loss), so as to better optimize the generator for photorealistic frontal face synthesis. A hypothetical sketch follows this slide.
- Tune the relevant parameters.
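A hypothetical PyTorch sketch of the dynamic-convolution idea above: an identity embedding (standing in for pre-trained LightCNN features) predicts a per-sample conv kernel that is applied to the discriminator's feature map. The layer sizes (`id_dim`, `in_ch`, `out_ch`, `k`) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvHead(nn.Module):
    """Discriminator head whose conv kernel is predicted per sample
    from an identity embedding (e.g., frozen LightCNN features)."""

    def __init__(self, id_dim=256, in_ch=64, out_ch=64, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # Map the identity embedding to a full conv kernel.
        self.kernel_pred = nn.Linear(id_dim, out_ch * in_ch * k * k)

    def forward(self, feat, id_embed):
        # feat: (B, in_ch, H, W); id_embed: (B, id_dim) from the
        # pre-trained identity network (kept fixed during training).
        b = feat.size(0)
        w = self.kernel_pred(id_embed).view(
            b * self.out_ch, self.in_ch, self.k, self.k)
        # Grouped-conv trick: apply a different kernel to each sample.
        feat = feat.view(1, b * self.in_ch, *feat.shape[2:])
        out = F.conv2d(feat, w, padding=self.k // 2, groups=b)
        return out.view(b, self.out_ch, *out.shape[2:])
```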
Thank you!