Generative Adversarial Network (GAN)


Generative Adversarial Network (GAN). http://www.inference.vc/my-summary-of-adversarial-training-nips-workshop/ (I should follow this.) https://epsilon-lee.github.io/blog/GANs-Basics.html Variants to come: f-GAN, W-GAN, LS-GAN (loss-sensitive), EB-GAN (energy-based). Restricted Boltzmann Machine: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RBM%20(v2).ecm.mp4/index.html Gibbs Sampling: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/MRF%20(v2).ecm.mp4/index.html

NIPS 2016 Tutorial: Generative Adversarial Networks. Author: Ian Goodfellow. Paper: https://arxiv.org/abs/1701.00160 Video: https://channel9.msdn.com/Events/Neural-Information-Processing-Systems-Conference/Neural-Information-Processing-Systems-Conference-NIPS-2016/Generative-Adversarial-Networks Paper collection: https://github.com/zhangqianhui/AdversarialNetsPapers You can find tips for training GANs here: https://github.com/soumith/ganhacks

Review

Generation. Writing poems? Drawing? It is like teaching a child to draw. http://www.rb139.com/index.php?s=/Lot/44547

Review: Auto-encoder. An NN Encoder compresses an image into a code, and an NN Decoder reconstructs an image from the code, trained so the reconstruction is as close as possible to the input. For generation: randomly generate a vector, use it as the code, and feed it to the NN Decoder — is the output an image?

Review: Auto-encoder. Make the code 2D, then sweep code values over a grid (roughly from −1.5 to 1.5 on each axis) and feed each point to the NN Decoder to see what it generates. SOM: http://alaric-research.blogspot.tw/2011/02/self-organizing-map.html

Review: Auto-encoder. (Figure: the grid of images decoded from 2D codes over the region from −1.5 to 1.5 on each axis.) SOM: http://alaric-research.blogspot.tw/2011/02/self-organizing-map.html

Auto-encoder vs. VAE. Auto-encoder: NN Encoder maps the input to a code, NN Decoder maps the code to the output; minimize the reconstruction error. VAE: the NN Encoder outputs means $(m_1, m_2, m_3)$ and $(\sigma_1, \sigma_2, \sigma_3)$; draw $(e_1, e_2, e_3)$ from a normal distribution and form the code $c_i = \exp(\sigma_i) \times e_i + m_i$, which the NN Decoder maps to the output. Minimize the reconstruction error plus $\sum_{i=1}^{3}\left(\exp(\sigma_i) - (1+\sigma_i) + m_i^2\right)$. Auto-Encoding Variational Bayes, https://arxiv.org/abs/1312.6114
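To make the reparameterization concrete, here is a small NumPy sketch (not from the slides; the three-dimensional code and the example values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(m, sigma):
    # c_i = exp(sigma_i) * e_i + m_i, with e_i drawn from a standard normal,
    # exactly as on the slide.
    e = rng.standard_normal(m.shape)
    return np.exp(sigma) * e + m

def vae_regularizer(m, sigma):
    # sum_i exp(sigma_i) - (1 + sigma_i) + m_i^2, minimized together with
    # the reconstruction error.
    return np.sum(np.exp(sigma) - (1.0 + sigma) + m ** 2)

m = np.array([0.5, -0.2, 0.1])      # illustrative encoder mean outputs
sigma = np.array([0.0, 0.3, -0.4])  # illustrative encoder sigma outputs

print(reparameterize(m, sigma))
print(vae_regularizer(m, sigma))
```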

Problems of VAE. It does not really try to simulate real images: the NN Decoder output is only trained to be as close as possible to the target. Two outputs may each differ from the target by one pixel, yet one is realistic and the other is obviously fake — VAE's loss treats them the same.

The evolution of generation. NN Generator v1 produces images; a binary classifier, Discriminator v1, is trained to tell them from real images. NN Generator v2 is then trained to fool Discriminator v1, Discriminator v2 is trained against Generator v2, and so on through Generator v3 and Discriminator v3.

GAN – Discriminator. NN Generator v1 (something like the Decoder in VAE) takes a randomly sampled vector and outputs an image. Discriminator v1 takes an image and outputs 1/0 (real or fake): real images get label 1, generated images get label 0.

GAN – Generator. Randomly sample a vector and feed it to NN Generator v1; update the parameters of the generator (obtaining generator v2) so that the output is classified as "real" (as close to 1 as possible — e.g. pushing the discriminator's output from 0.13 toward 1.0). Generator + discriminator = one network: use gradient descent to update the parameters in the generator, but fix the discriminator.
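The "generator + discriminator = one network" trick is what the Keras demos linked later rely on. A minimal Keras-style sketch (the architectures and sizes are made up for illustration; only the freezing pattern matters):

```python
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy architectures; any generator/discriminator would do.
generator = Sequential([Dense(128, activation='relu', input_dim=100),
                        Dense(784, activation='tanh')])
discriminator = Sequential([Dense(128, activation='relu', input_dim=784),
                            Dense(1, activation='sigmoid')])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Stack G and D into one network, but freeze D: training `gan` toward the
# label "real" (1) then only moves the generator's parameters.
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')
```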

GAN – generating anime-style character faces (二次元人物頭像鍊成). Source of images: https://zhuanlan.zhihu.com/p/24767059 DCGAN: https://github.com/carpedm20/DCGAN-tensorflow

GAN – generating anime-style character faces: after 100 rounds

GAN – generating anime-style character faces: after 1000 rounds

GAN – generating anime-style character faces: after 2000 rounds

GAN – generating anime-style character faces: after 5000 rounds

GAN – generating anime-style character faces: after 10,000 rounds

GAN – generating anime-style character faces: after 20,000 rounds

GAN – generating anime-style character faces: after 50,000 rounds

Basic Idea of GAN

Maximum Likelihood Estimation. Given a data distribution $P_{data}(x)$, we have a distribution $P_G(x;\theta)$ parameterized by $\theta$. E.g. $P_G(x;\theta)$ is a Gaussian mixture model, where $\theta$ collects the means and variances of the Gaussians. We want to find $\theta$ such that $P_G(x;\theta)$ is close to $P_{data}(x)$. Sample $x^1, x^2, \ldots, x^m$ from $P_{data}(x)$; we can compute $P_G(x^i;\theta)$, so the likelihood of generating the samples is $L = \prod_{i=1}^{m} P_G(x^i;\theta)$. Find the $\theta^*$ maximizing the likelihood.

Maximum Likelihood Estimation. With $x^1, x^2, \ldots, x^m$ from $P_{data}(x)$: $\theta^* = \arg\max_\theta \prod_{i=1}^{m} P_G(x^i;\theta) = \arg\max_\theta \sum_{i=1}^{m} \log P_G(x^i;\theta) \approx \arg\max_\theta E_{x \sim P_{data}}[\log P_G(x;\theta)] = \arg\max_\theta \left[\int_x P_{data}(x) \log P_G(x;\theta)\,dx - \int_x P_{data}(x) \log P_{data}(x)\,dx\right] = \arg\min_\theta KL\left(P_{data}(x) \,\|\, P_G(x;\theta)\right)$, where the subtracted term is legal because it does not depend on $\theta$. How can we have a very general $P_G(x;\theta)$?
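As a tiny sanity check of "maximum likelihood pulls $P_G$ toward $P_{data}$" (my own example, with $P_G$ a single Gaussian, for which the MLE has a closed form):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=100_000)  # stand-in for P_data

# For a single Gaussian P_G(x; theta), theta* = (sample mean, sample variance).
mu_hat, var_hat = samples.mean(), samples.var()
print(mu_hat, var_hat)  # close to (2.0, 2.25): P_G moves toward P_data
```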

Now $P_G(x;\theta)$ is an NN. $P_G(x) = \int_z P_{prior}(z)\, I[G(z)=x]\, dz$: the generator maps each prior sample $z$ to $x = G(z)$, and this integral is difficult to compute — so the likelihood is hard to evaluate. https://blog.openai.com/generative-models/

Basic Idea of GAN. Generator G: a function with input $z$ and output $x$. Given a prior distribution $P_{prior}(z)$, the function G defines a probability distribution $P_G(x)$ — hard to learn by maximum likelihood. Discriminator D: a function with input $x$ and a scalar output; it evaluates the "difference" between $P_G(x)$ and $P_{data}(x)$. There is a function $V(G,D)$ such that $G^* = \arg\min_G \max_D V(G,D)$.

Basic Idea. $G^* = \arg\min_G \max_D V(G,D)$, with $V = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1-D(x))]$. Given a generator G, $\max_D V(G,D)$ evaluates the "difference" between $P_G$ and $P_{data}$; we then pick the G defining the $P_G$ most similar to $P_{data}$ (compare $V(G_1,D)$, $V(G_2,D)$, $V(G_3,D)$, each at its maximizing D).

$\max_D V(G,D)$, where $G^* = \arg\min_G \max_D V(G,D)$. Given G, what is the optimal $D^*$ maximizing $V$? $V = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1-D(x))] = \int_x P_{data}(x)\log D(x)\,dx + \int_x P_G(x)\log(1-D(x))\,dx = \int_x \left[P_{data}(x)\log D(x) + P_G(x)\log(1-D(x))\right] dx$. Assume that $D(x)$ can have any value here; then, given each $x$, find the $D^*$ maximizing $P_{data}(x)\log D(x) + P_G(x)\log(1-D(x))$.

$\max_D V(G,D)$. Given $x$, find the $D^*$ maximizing $f(D) = a\log D + b\log(1-D)$, where $a = P_{data}(x)$ and $b = P_G(x)$. Setting $\frac{df(D)}{dD} = a \times \frac{1}{D} + b \times \frac{1}{1-D} \times (-1) = 0$ gives $a \times \frac{1}{D^*} = b \times \frac{1}{1-D^*}$, so $a(1-D^*) = bD^*$, $a - aD^* = bD^*$, and $D^* = \frac{a}{a+b}$, i.e. $D^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$, with $0 < D^*(x) < 1$ (realizable by a sigmoid output).
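A quick numeric check of the closed form (the values of $a$ and $b$ are arbitrary):

```python
import numpy as np

a, b = 0.7, 0.3  # stand-ins for P_data(x) and P_G(x) at one fixed x
D = np.linspace(1e-6, 1 - 1e-6, 1_000_001)
f = a * np.log(D) + b * np.log(1 - D)

print(D[np.argmax(f)])  # numerically ~0.7
print(a / (a + b))      # the derived optimum D* = a / (a + b)
```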

$\max_D V(G,D)$. (Figure: $V(G_1,D)$, $V(G_2,D)$, $V(G_3,D)$ plotted as functions of D; each curve peaks at its optimal discriminator, $D_1^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_{G_1}(x)}$, $D_2^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_{G_2}(x)}$, and the peak height $V(G_1, D_1^*)$ is the "difference" between $P_{G_1}$ and $P_{data}$.)

$\max_D V(G,D)$ with $D^*(x) = \frac{P_{data}(x)}{P_{data}(x)+P_G(x)}$: $\max_D V(G,D) = V(G,D^*) = E_{x \sim P_{data}}\left[\log\frac{P_{data}(x)}{P_{data}(x)+P_G(x)}\right] + E_{x \sim P_G}\left[\log\frac{P_G(x)}{P_{data}(x)+P_G(x)}\right] = \int_x P_{data}(x)\log\frac{\frac{1}{2}P_{data}(x)}{\frac{P_{data}(x)+P_G(x)}{2}}\,dx + \int_x P_G(x)\log\frac{\frac{1}{2}P_G(x)}{\frac{P_{data}(x)+P_G(x)}{2}}\,dx$, where pulling out the two factors of $\frac{1}{2}$ contributes $2\log\frac{1}{2} = -2\log 2$.

$\max_D V(G,D) = V(G,D^*) = -2\log 2 + \int_x P_{data}(x)\log\frac{P_{data}(x)}{\left(P_{data}(x)+P_G(x)\right)/2}\,dx + \int_x P_G(x)\log\frac{P_G(x)}{\left(P_{data}(x)+P_G(x)\right)/2}\,dx = -2\log 2 + KL\!\left(P_{data}(x)\,\Big\|\,\frac{P_{data}(x)+P_G(x)}{2}\right) + KL\!\left(P_G(x)\,\Big\|\,\frac{P_{data}(x)+P_G(x)}{2}\right) = -2\log 2 + 2\,JSD\!\left(P_{data}(x)\,\|\,P_G(x)\right)$, where JSD is the Jensen-Shannon divergence.
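The identity $\max_D V(G,D) = -2\log 2 + 2\,JSD$ can be checked numerically on small discrete distributions (my own example):

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    m = (p + q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.5, 0.3, 0.2])
p_g = np.array([0.2, 0.2, 0.6])

# V(G, D*) with the optimal discriminator D* = p_data / (p_data + p_g)
d_star = p_data / (p_data + p_g)
v = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1 - d_star))

print(v)                                      # identical ...
print(-2 * np.log(2) + 2 * jsd(p_data, p_g))  # ... to -2 log 2 + 2 JSD
```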

In the end… Generator G, discriminator D. Looking for the $G^*$ such that $G^* = \arg\min_G \max_D V(G,D)$, with $V = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1-D(x))]$. Given G, $\max_D V(G,D) = -2\log 2 + 2\,JSD(P_{data}(x)\,\|\,P_G(x))$, where the JSD lies between $0$ and $\log 2$. What is the optimal G? The one driving the JSD to zero: $P_G(x) = P_{data}(x)$.

Algorithm. $G^* = \arg\min_G \max_D V(G,D)$; write $L(G) = \max_D V(G,D)$. To find the best G minimizing the loss function $L(G)$: $\theta_G \leftarrow \theta_G - \eta\,\frac{\partial L(G)}{\partial \theta_G}$, where $\theta_G$ defines G. But how do we differentiate a max? If $f(x) = \max\{D_1(x), D_2(x), D_3(x)\}$, then $\frac{df(x)}{dx} = \frac{dD_i(x)}{dx}$ for whichever $D_i(x)$ is the maximal one at that $x$.

Algorithm. $G^* = \arg\min_G \max_D V(G,D)$, $L(G) = \max_D V(G,D)$. Given $G_0$: find $D_0^*$ maximizing $V(G_0,D)$ — $V(G_0, D_0^*)$ is the JS divergence between $P_{data}(x)$ and $P_{G_0}(x)$. Update $\theta_G \leftarrow \theta_G - \eta\,\frac{\partial V(G, D_0^*)}{\partial \theta_G}$, obtaining $G_1$: this decreases the JS divergence(?). Find $D_1^*$ maximizing $V(G_1,D)$ — $V(G_1, D_1^*)$ is the JS divergence between $P_{data}(x)$ and $P_{G_1}(x)$. Update $\theta_G \leftarrow \theta_G - \eta\,\frac{\partial V(G, D_1^*)}{\partial \theta_G}$, obtaining $G_2$: decreasing the JS divergence(?). And so on.

Algorithm. (Note: is this slide correct?) Given $G_0$, find $D_0^*$ maximizing $V(G_0,D)$; $V(G_0, D_0^*)$ is the JS divergence between $P_{data}(x)$ and $P_{G_0}(x)$. Updating $\theta_G \leftarrow \theta_G - \eta\,\frac{\partial V(G, D_0^*)}{\partial \theta_G}$ to obtain $G_1$ makes $V(G_1, D_0^*)$ smaller than $V(G_0, D_0^*)$ — but does it decrease the JS divergence(?). Only if $D_0^* \approx D_1^*$, so that $V(G_1, D_0^*) \approx V(G_1, D_1^*)$. Hence: don't update G too much.

In practice… $V = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1-D(x))]$. Given G, how do we compute $\max_D V(G,D)$? Sample $x^1, x^2, \ldots, x^m$ from $P_{data}(x)$ and sample $\tilde{x}^1, \tilde{x}^2, \ldots, \tilde{x}^m$ from the generator $P_G(x)$, then maximize $\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\left(1-D(\tilde{x}^i)\right)$. This is exactly training a binary classifier whose output is $D(x)$ by minimizing cross-entropy: if $x$ is a positive example, minimize $-\log D(x)$; if $x$ is a negative example, minimize $-\log(1-D(x))$.

D is a binary classifier (can be deep) with parameters $\theta_d$ and output $f(x) = D(x)$, trained by minimizing cross-entropy: minimize $-\log f(x)$ for positive examples and $-\log(1-f(x))$ for negative examples. Take $x^1, \ldots, x^m$ from $P_{data}(x)$ as positive examples and $\tilde{x}^1, \ldots, \tilde{x}^m$ from $P_G(x)$ as negative examples. Then minimizing the cross-entropy loss $L = -\left[\frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\left(1-D(\tilde{x}^i)\right)\right]$ is the same as maximizing $\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\left(1-D(\tilde{x}^i)\right)$.
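Numerically, the two objectives are the same quantity with opposite sign (the discriminator outputs below are illustrative):

```python
import numpy as np

d_real = np.array([0.9, 0.8, 0.7])  # D's outputs on samples from P_data
d_fake = np.array([0.2, 0.3, 0.1])  # D's outputs on generated samples

v_tilde = np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))
cross_entropy = -np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake))

print(v_tilde, -cross_entropy)  # equal: maximizing V~ == minimizing the loss
```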

Algorithm. Initialize $\theta_d$ for D and $\theta_g$ for G. In each training iteration: Learning D (repeat k times): sample m examples $x^1, x^2, \ldots, x^m$ from the data distribution $P_{data}(x)$; sample m noise samples $z^1, z^2, \ldots, z^m$ from the prior $P_{prior}(z)$; obtain generated data $\tilde{x}^1, \tilde{x}^2, \ldots, \tilde{x}^m$ with $\tilde{x}^i = G(z^i)$. Update discriminator parameters $\theta_d$ to maximize $\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\left(1-D(\tilde{x}^i)\right)$: $\theta_d \leftarrow \theta_d + \eta\,\nabla\tilde{V}(\theta_d)$. (This can only find a lower bound of $\max_D V(G,D)$.) Learning G (only once): sample another m noise samples $z^1, z^2, \ldots, z^m$ from $P_{prior}(z)$ and update generator parameters $\theta_g$ to minimize $\tilde{V} = \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\left(1-D(G(z^i))\right)$ — the first term is constant in $\theta_g$: $\theta_g \leftarrow \theta_g - \eta\,\nabla\tilde{V}(\theta_g)$. (Student note: why the "$1-$"?)
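A compact sketch of this loop in PyTorch (my own toy setup: 1-D "data", tiny MLPs, k = 1; the slides themselves use the Keras demos linked below):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
m, k = 64, 1

for step in range(2000):
    for _ in range(k):                     # learning D, repeated k times
        x = torch.randn(m, 1) * 0.5 + 2.0  # samples from P_data (toy Gaussian)
        z = torch.randn(m, 8)              # samples from P_prior
        x_fake = G(z).detach()             # generated data, G held fixed
        # ascend V~ = mean log D(x) + mean log(1 - D(x~))
        loss_d = -(torch.log(D(x)).mean() + torch.log(1 - D(x_fake)).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    z = torch.randn(m, 8)                  # learning G, only once
    # descend mean log(1 - D(G(z))); only opt_g steps, so D stays fixed
    loss_g = torch.log(1 - D(G(z))).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```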

Objective Function for Generator in Real Implementation. The theoretical generator objective $V = E_{x \sim P_G}\left[\log(1-D(x))\right]$ is slow at the beginning of training: when D confidently rejects generated samples ($D(x)$ near 0), $\log(1-D(x))$ is nearly flat. Real implementations instead use $V = E_{x \sim P_G}\left[-\log D(x)\right]$, which has a large gradient exactly there; in practice this amounts to labeling the $x$ from $P_G$ as positive when updating the generator.
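The "slow at the beginning" claim is visible directly in the derivatives (illustrative values of $D(G(z))$ near 0, as early in training):

```python
import numpy as np

d = np.array([1e-4, 0.01, 0.1, 0.5])  # D(G(z)) early in training is near 0

print(-1 / (1 - d))  # d/dD of log(1 - D): magnitude stays ~1 near D = 0
print(-1 / d)        # d/dD of -log D: huge gradient near D = 0
```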

Demo. The code used in the demo is from: https://github.com/osh/KerasGAN/blob/master/MNIST_CNN_GAN_v2.ipynb (local copy: http://140.112.21.35:6001/notebooks/GAN%20MNIST%20demo.ipynb). See also keras-adversarial: https://github.com/bstriner/keras-adversarial and a simple TensorFlow example: https://github.com/ProofByConstruction/better-explanations/blob/master/experiments/generative-adversarial-nets/Simple%20GAN.ipynb (I could not run it, probably because of a version issue).

Issue about Evaluating the Divergence

Evaluating JS divergence. Martin Arjovsky, Léon Bottou, "Towards Principled Methods for Training Generative Adversarial Networks", arXiv preprint, 2017.

Evaluating JS divergence. https://arxiv.org/abs/1701.07875 (Figure: discriminator training curves for a weak generator and a strong generator; the JS divergence estimated by the discriminator tells little information — it saturates in both cases.)

Discriminator. $V = E_{x \sim P_{data}}[\log D(x)] + E_{x \sim P_G}[\log(1-D(x))] \approx \frac{1}{m}\sum_{i=1}^{m}\log D(x^i) + \frac{1}{m}\sum_{i=1}^{m}\log\left(1-D(\tilde{x}^i)\right)$, and in theory $\max_D V(G,D) = -2\log 2 + 2\,JSD(P_{data}(x)\,\|\,P_G(x))$, with the JSD ranging from $0$ (identical) to $\log 2$ (no overlap). Reason 1 the estimate misbehaves: we approximate the expectations by sampling, and a strong discriminator can perfectly separate the two finite sample sets even when the distributions overlap. Weaken your discriminator? But can a weak discriminator compute the JS divergence?

Discriminator. Reason 2: the nature of the data. Both $P_{data}(x)$ and $P_G(x)$ are low-dimensional manifolds in a high-dimensional space, so usually they do not have any overlap — and then the JS divergence is stuck at $\log 2$ no matter how close they are.

Evaluation. http://www.guokr.com/post/773890/ (Figure: a generated distribution that is closer to the data distribution is intuitively better.)

Evaluation. (Figure: a sequence of generators $P_{G_0}(x), \ldots, P_{G_{50}}(x), \ldots, P_{G_{100}}(x)$ moving toward $P_{data}(x)$.) While the distributions do not overlap, the divergence is blind to progress: $JS(P_{G_0}\,\|\,P_{data}) = \log 2$ and $JS(P_{G_{50}}\,\|\,P_{data}) = \log 2$, so $P_{G_{50}}$ is "not really better" by this measure even though it is closer — until overlap actually occurs, e.g. $P_{G_{100}} = P_{data}$ gives $JS(P_{G_{100}}\,\|\,P_{data}) = 0$.

Add Noise. Add some artificial noise to the inputs of the discriminator, and make the labels noisy for the discriminator. Then the discriminator cannot perfectly separate real and generated data, and $P_{data}(x)$ and $P_G(x)$ have some overlap. The noise should decay over time.
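A sketch of the input-noise trick (the function and the linear decay schedule are my own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_discriminator_inputs(x_real, x_fake, step, total_steps, sigma0=0.5):
    # Add Gaussian noise to both real and generated inputs so the noisy
    # P_data and P_G overlap; the noise level decays linearly to zero.
    sigma = sigma0 * max(0.0, 1.0 - step / total_steps)
    return (x_real + sigma * rng.standard_normal(x_real.shape),
            x_fake + sigma * rng.standard_normal(x_fake.shape))
```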

Mode Collapse

Mode Collapse. (Figure: the data distribution covers a wide region, but the generated distribution concentrates on a small part of it.)

Mode Collapse. What we want: $P_G$ spread over all the modes of $P_{data}$. In reality: the generated distribution collapses onto only a few of them.

Flaw in Optimization? (Modified from Ian Goodfellow's tutorial.) $KL = \int P_{data}\log\frac{P_{data}}{P_G}\,dx$; reverse $KL = \int P_G\log\frac{P_G}{P_{data}}\,dx$. Maximum likelihood minimizes $KL(P_{data}\,\|\,P_G)$, which stretches $P_G$ to cover every mode of $P_{data}$; minimizing $KL(P_G\,\|\,P_{data})$ (reverse KL) instead concentrates $P_G$ on a single mode, which resembles mode collapse. However, this may not be the real reason (based on Ian Goodfellow's tutorial).
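The mode-covering vs. mode-seeking behavior shows up even on a 3-point toy distribution (my own numbers):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    return np.sum(p * np.log((p + eps) / (q + eps)))

p_data = np.array([0.5, 0.0, 0.5])     # two modes
spread = np.array([0.34, 0.32, 0.34])  # P_G spread over everything
single = np.array([0.98, 0.01, 0.01])  # P_G collapsed onto one mode

# Forward KL(P_data || P_G) punishes missing a mode -> prefers `spread`.
print(kl(p_data, spread), kl(p_data, single))
# Reverse KL(P_G || P_data) punishes mass where P_data has none -> prefers
# `single`, i.e. it tolerates mode collapse.
print(kl(spread, p_data), kl(single, p_data))
```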

So many GANs… Modifying the optimization of GAN: fGAN, WGAN, Least-square GAN, Loss-sensitive GAN, Energy-based GAN, Boundary-seeking GAN, Unrolled GAN, … Different structure from the original GAN: Conditional GAN, Semi-supervised GAN, InfoGAN, BiGAN, Cycle GAN, Disco GAN, VAE-GAN, … Application: deep convolutional generative adversarial networks (DCGANs).

Conditional GAN

Motivation. Generator: Text → Image. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee, "Generative Adversarial Text-to-Image Synthesis", ICML 2016 (https://github.com/paarthneekhara/text-to-image). Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaolei Huang, Xiaogang Wang, Dimitris Metaxas, "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks", arXiv preprint, 2016. Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee, "Learning What and Where to Draw", NIPS 2016.

Motivation. Challenge: a conventional NN maps text $c$ to one image $x$ — a point, not a distribution. For the text "train", the training set contains many different images of trains, so the NN output is a blurry average of them.

Conditional GAN. Training data: pairs $(\hat{c}, \hat{x})$; learn to approximate $P(x|c)$. The generator G takes the condition $\hat{c}$ together with $z$ drawn from the prior distribution and outputs $x = G(\hat{c})$, trained so that the output is classified as positive (with dropout; G may simply learn to ignore the $z$ term). Discriminator v1 takes only $x$ and outputs a scalar — but then G can generate $x$ not related to $c$. Discriminator v2 takes both $c$ and $x$ and outputs a scalar. Positive examples: $(\hat{c}, \hat{x})$; negative examples: $(\hat{c}, G(\hat{c}))$ and $(c', \hat{x})$ — a real image paired with the wrong condition.
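A sketch of how the v2 discriminator's training batches could be assembled (the `generate` callable is a hypothetical stand-in for $G(\hat{c})$):

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator_batches(c, x, generate):
    # Positive examples: true (condition, image) pairs.
    positives = list(zip(c, x))
    # Negative examples: (c, G(c)), and (c', x) with a shuffled, wrong condition.
    negatives = list(zip(c, generate(c)))
    negatives += list(zip(rng.permutation(c), x))
    return positives, negatives
```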

Text to Image - Results

Text to Image - Results "red flower with black center"

Image-to-image Translation. (Façade: the face or front of a building, its outward appearance — "That is the facade of the Palace.") Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks", arXiv preprint, 2016.

Image-to-image Translation - Results

Speech Enhancement GAN https://arxiv.org/abs/1703.09452

Speech Enhancement GAN Using Least-square GAN

Least-square GAN. The discriminator D has a linear output. For the discriminator: $\min_D \frac{1}{2} E_{x \sim P_{data}}\left[(D(x)-b)^2\right] + \frac{1}{2} E_{x \sim P_G}\left[(D(x)-a)^2\right]$, pushing real outputs toward the label $b$ and generated outputs toward $a$. For the generator: $\min_G \frac{1}{2} E_{z \sim P_{prior}}\left[(D(G(z))-c)^2\right]$, pushing generated outputs toward the label $c$.
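A sketch of the two least-square objectives ($a=0$, $b=1$, $c=1$ is one common label choice, assumed here):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    # (1/2) E[(D(x) - b)^2] + (1/2) E[(D(G(z)) - a)^2]; D has a linear output.
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)

def lsgan_g_loss(d_fake, c=1.0):
    # (1/2) E[(D(G(z)) - c)^2]
    return 0.5 * np.mean((d_fake - c) ** 2)
```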
