1
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis Speaker: ZHAO Jian Homepage: Affiliation: Learning and Vision Group, ECE, NUS
2
(Figure: source profile faces (S) at 90°, 75°, and 45° with synthesized frontal views and ground truth (GT).)
3
$$L_{tv} = \sum_{i,j} \left( (x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2 \right)^{\beta/2}$$
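The total variation loss above can be computed directly from the double summation; here is a minimal NumPy sketch (the function name is mine, and the sum is restricted to interior pixels where both finite differences exist):

```python
import numpy as np

def tv_loss(x, beta=1.0):
    """Total variation loss for a 2-D image array x."""
    dh = x[1:, :] - x[:-1, :]   # vertical differences:   x[i+1, j] - x[i, j]
    dw = x[:, 1:] - x[:, :-1]   # horizontal differences: x[i, j+1] - x[i, j]
    # sum over pixels where both differences are defined
    return np.sum((dw[:-1, :] ** 2 + dh[:, :-1] ** 2) ** (beta / 2.0))
```

With β = 1 (the value used on the next slide) this reduces to the isotropic TV norm, which encourages spatial smoothness in the synthesized face without over-penalizing sharp edges.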
4
Implementation Details
LR = , BATCH_SIZE = 10, INPUT_SIZE = 128×128×3, BETA = 1, ALPHA = 0.001, LAMBDA_1 = 0.3, LAMBDA_2 = 0.001, LAMBDA_3 = 0.003, LAMBDA_4 =
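The constants above suggest a weighted sum of loss terms. A hypothetical aggregation sketch follows; the mapping of each LAMBDA to a specific loss term is my assumption (the slide does not state it), and LAMBDA_4 is left as a placeholder because its value is not given:

```python
# Loss weights as listed on the slide; the assignment of each weight
# to a particular loss term below is an assumption for illustration.
LAMBDA_1, LAMBDA_2, LAMBDA_3 = 0.3, 0.001, 0.003

def total_loss(l_pixel, l_sym, l_adv, l_ip, l_tv, lambda_4=0.0):
    """Weighted sum of loss terms; lambda_4 is a placeholder
    because its value is not given on the slide."""
    return (l_pixel                  # pixel loss, implicit weight 1.0
            + LAMBDA_1 * l_sym       # symmetry loss (assumed mapping)
            + LAMBDA_2 * l_adv       # adversarial loss (assumed mapping)
            + LAMBDA_3 * l_ip        # identity-preserving loss (assumed mapping)
            + lambda_4 * l_tv)       # total variation loss (assumed mapping)
```

The dominance of the unit-weight pixel term over the milli-scale weights is exactly the imbalance criticized on the later "Underlying Problems" slide.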
5
(Figure: synthesis results for source poses 90°, 75°, 60°, 45°, 30°, and 15°, with ground truth (GT).)
7
Quantitative Results
8
Underlying Problems for Re-implementation
The "Template Landmark Location" used for patch position aggregation is not given. How, then, should the feature maps of the 4 Landmark Located Patch Networks be fused (estimate the template from the frontal GT)?
What if the 4 patches cannot be detected (e.g., left eye <-> right eye confusion)?
The network architecture of the discriminator is not given; it might be the same as the encoder of the generator's global network.
Training details are not given. Re-implementation is tricky, time-consuming, and not guaranteed to work properly.
**The weight of the pixel loss is too large, while the weights of the other losses are too small. The other losses therefore act as little more than a gimmick, which is unreasonable for a GAN-based framework and leads to a severe overfitting problem.
Still, we can try several improvements (e.g., acceleration, adaptive aggregation of losses, a Siamese structure, learning to learn, more training data).
Q & A
9
Re-implementation results (preliminary)
Dataset: Multi-PIE (4,324 images, 250 subjects, 0-90°)
LR = , BATCH_SIZE = 10, INPUT_SIZE = 128×128×3, BETA = 1, ALPHA = 0.001, LAMBDA_1 = 0.3, LAMBDA_2 = 0.001, LAMBDA_3 = 0.003, LAMBDA_4 =
Iterations: ~100k
Training time: ~1 day; testing time: 20 ms / image
10
Re-implementation results (preliminary)
Problems: poor generalization capacity -> overfitting? / sub-optimal hyperparameters? / unreasonable loss weights? (Results shown w/o the pixel loss.)
11
Re-implementation results (preliminary)
Problems: poor generalization capacity -> overfitting? / sub-optimal hyperparameters? / unreasonable loss weights? (TP-GAN applied to other data.)
12
Present results
13
Possible solutions
Incorporate all available Multi-PIE data for training TP-GAN.
Improve and modify the pixel loss (l1 loss): this supervision signal is too strong and makes the network overfit quickly. In the original version of TP-GAN, the discriminator seems not to contribute much to the optimization of the generator, which means the authors rely on the pixel loss (with a large weight of 1.0) to make the generator memorize each frontal ground truth!
Improvement on the generator: Domain-Adversarial Training -> a global generator & 4 local patch generators.
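For reference, the l1 pixel loss criticized above is simply the mean absolute error between the synthesized frontal face and the ground truth; a minimal NumPy sketch (function name is mine):

```python
import numpy as np

def pixel_l1_loss(pred, gt):
    """Mean absolute error between a synthesized frontal face
    and its ground-truth frontal image (same shape)."""
    return np.mean(np.abs(pred - gt))
```

Because every pixel of the ground truth is supervised directly, a large weight on this term pushes the generator toward memorizing the training frontals rather than learning a pose-invariant mapping, which is the overfitting mechanism described above.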
14
Possible solutions
Improvement on the discriminator: adopt a Siamese-like discriminator and inject dynamic convolution to capture more information via domain transfer learning (use a pre-trained LightCNN to predict the dynamic kernel weights of the discriminator, replacing the ip loss), so as to optimize the generator for photorealistic frontal face synthesis.
Tune relevant parameters.
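The dynamic-convolution idea can be sketched minimally: instead of learning the discriminator's kernel directly, an external network (here, the pre-trained LightCNN mentioned above) would predict it per input. The sketch below shows only the core operation, a 1×1 convolution whose kernel is supplied externally; the function name and shapes are my assumptions:

```python
import numpy as np

def dynamic_conv1x1(feat, kernel):
    """Apply a 1x1 convolution whose kernel is predicted by an
    external network rather than learned as a fixed parameter.

    feat:   feature map of shape (C_in, H, W)
    kernel: predicted weights of shape (C_out, C_in)
    """
    c_in, h, w = feat.shape
    # a 1x1 conv is a channel-wise linear map at every spatial location
    out = kernel @ feat.reshape(c_in, h * w)
    return out.reshape(kernel.shape[0], h, w)
```

In the proposed setup, `kernel` would be a function of the LightCNN identity features, so the discriminator's response adapts to the identity of each input pair, which is what lets it stand in for an explicit identity-preserving loss.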
15
Thank you!