
1 Deep Neural Networks
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University

2 Concept of Modeling
Modeling: given desired i/o pairs (the training set) of the form (x1, ..., xn; y), construct a model that matches the i/o pairs.
Two steps in modeling (see the sketch below):
Structure identification: input selection, model complexity
Parameter identification: optimal parameters
[Diagram: inputs x1, ..., xn feed both the unknown target system, with output y, and the model, with output y*]
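A minimal sketch of the two steps in Python/NumPy (my own illustration, not from the slides): structure identification here is the assumed choice of a linear model y* = w1*x + w0; parameter identification then finds the optimal weights by least squares.

import numpy as np

# Training set of i/o pairs (x; y) sampled from an unknown target system
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + 0.1 * np.random.randn(20)    # noisy observations

# Structure identification (assumed): a linear model y* = w1*x + w0
A = np.column_stack([x, np.ones_like(x)])

# Parameter identification: optimal parameters by least squares
w, *_ = np.linalg.lstsq(A, y, rcond=None)
y_star = A @ w                                   # model output y*
print("w1, w0 =", w, "MSE =", np.mean((y - y_star) ** 2))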

3 Neural Networks
Supervised learning: multilayer perceptrons, radial basis function networks, modular neural networks, LVQ (learning vector quantization)
Unsupervised learning: competitive learning networks, Kohonen self-organizing networks, ART (adaptive resonance theory)
Others: Hopfield networks

4 Single-layer Perceptrons
Proposed by Widrow & Hoff in 1960; AKA ADALINE (Adaptive Linear Neuron) or single-layer perceptron. A sketch of the training rule follows below.
[Diagram: a single neuron with weights w0, w1, w2 mapping inputs x1 and x2 to output y; training data plotted with x1 = hair length, x2 = voice freq.]
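A minimal NumPy sketch (my own illustration; the data and learning rate are arbitrary) of the Widrow-Hoff (LMS) rule, which moves the weights against the gradient of the squared error of the linear output:

import numpy as np

# Toy training data: rows are (x1, x2), labels are +1 / -1
X = np.array([[0.2, 0.9], [0.1, 0.8], [0.9, 0.2], [0.8, 0.1]])
t = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(3)                               # w0 (bias), w1, w2
eta = 0.1                                     # learning rate
Xb = np.column_stack([np.ones(len(X)), X])    # prepend bias input

for _ in range(100):
    for xi, ti in zip(Xb, t):
        y = xi @ w                    # linear output (before the signum)
        w += eta * (ti - y) * xi      # Widrow-Hoff (LMS) update
print("weights:", w, "predictions:", np.sign(Xb @ w))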

5 Multilayer Perceptrons (MLPs)
Extending the SLP to an MLP yields complex decision boundaries.
How to train MLPs?
Use the logistic function to replace the signum function
Use gradient descent to update the parameters
[Diagram: a two-input, two-output MLP mapping x1, x2 to y1, y2]

6 Continuous Activation Functions
To use gradient descent, we need to replace the signum function with a continuous version, such as the three below (sketched in code afterwards).
Logistic: y = 1/(1+exp(-x))
Hyperbolic tangent: y = tanh(x/2)
Identity: y = x
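The three functions as plain NumPy code (my own sketch; the derivative notes follow from direct differentiation):

import numpy as np

def logistic(x):        # y = 1/(1+exp(-x)); derivative y*(1-y)
    return 1.0 / (1.0 + np.exp(-x))

def hyper_tangent(x):   # y = tanh(x/2); derivative (1 - y**2)/2
    return np.tanh(x / 2.0)

def identity(x):        # y = x; derivative 1
    return x

x = np.linspace(-4, 4, 9)
print(logistic(x), hyper_tangent(x), identity(x), sep="\n")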

7 Classical MLPs
Typical 2-layer MLPs. Learning rules:
Gradient descent (backpropagation)
Conjugate gradient method
All optimization methods using the first derivative
Derivative-free optimization
[Diagram: a two-input, two-output 2-layer MLP]

8 MLP Examples
XOR problem: training data and network architecture; a runnable sketch follows below.
Training data (x1, x2; y): (0, 0; 0), (0, 1; 1), (1, 0; 1), (1, 1; 0)
[Diagram: candidate network architectures computing y from inputs x1 and x2]
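A minimal Keras sketch (my own illustration; the hidden-layer size and epoch count are arbitrary choices) of an MLP that learns XOR:

import numpy as np
from tensorflow import keras

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(4, activation="tanh"),      # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),   # output in (0, 1)
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=2000, verbose=0)
print(model.predict(X, verbose=0).round(2))        # close to [0, 1, 1, 0]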

9 MLP Decision Boundaries
Single-layer: half planes.
[Figures: decision regions a single-layer network can form for the exclusive-OR problem, meshed regions, and the most general regions]

10 MLP Decision Boundaries
Two-layer: convex regions.
[Figures: decision regions a two-layer network can form for the exclusive-OR problem, meshed regions, and the most general regions]

11 MLP Decision Boundaries
Three-layer: arbitrary regions.
[Figures: decision regions a three-layer network can form for the exclusive-OR problem, meshed regions, and the most general regions]

12 Summary: MLP Decision Boundaries
1-layer: half planes; 2-layer: convex regions; 3-layer: arbitrary regions.
[Figures: decision regions of each structure on the XOR, intertwined, and general cases]

13 MLP Configurations

14 Deep Neural Networks

15 Training an MLP
Methods for training an MLP:
Gradient descent
Gauss-Newton method
Levenberg-Marquardt method
Backpropagation: a systematic way to compute gradients, starting from the NN's output.

16 Simple Derivatives
Review of the chain rule starts from a single function: for y = f(x), the derivative is simply dy/dx = f'(x).
[Network representation: a node f mapping input x to output y]

17 Chain Rule for Composite Functions
Review of the chain rule for a composite function: with y = f(x) and z = g(y), the derivative is dz/dx = (dz/dy)(dy/dx) = g'(y) f'(x).
[Network representation: x feeds node f to produce y, which feeds node g to produce z]
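A concrete instance (my own example, not from the slides): with y = f(x) = x^2 and z = g(y) = sin(y),
dz/dx = (dz/dy)(dy/dx) = cos(y) * 2x = 2x cos(x^2),
which matches differentiating z = sin(x^2) directly.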

18 Chain Rule for Network Representation
Review of the chain rule when a node has several incoming paths: with y = f(x), z = g(x), and u = h(y, z), the derivative is du/dx = (∂u/∂y)(dy/dx) + (∂u/∂z)(dz/dx).
[Network representation: x feeds nodes f and g to produce y and z, which both feed node h to produce u]
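A concrete instance (my own example, not from the slides): with y = f(x) = x^2, z = g(x) = exp(x), and u = h(y, z) = y*z,
du/dx = (∂u/∂y)(dy/dx) + (∂u/∂z)(dz/dx) = z * 2x + y * exp(x) = (2x + x^2) exp(x),
which matches differentiating u = x^2 exp(x) directly.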

19 Backpropagation
Backpropagation:
A systematic way of computing the gradient
Computes the gradient from the output toward the input
Example: see the sketch below.
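A minimal NumPy sketch (my own illustration, not the slides' example) of backpropagation on a one-hidden-layer MLP with logistic activations and squared error; gradients are computed from the output layer back toward the input:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny network: 2 inputs -> 2 hidden units -> 1 output, squared-error loss
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
x, t = np.array([1.0, 0.5]), np.array([1.0])   # one training pair

# Forward pass
h = sigmoid(x @ W1)           # hidden activations
y = sigmoid(h @ W2).ravel()   # network output
E = 0.5 * np.sum((y - t) ** 2)

# Backward pass: start from dE/dy at the output, move toward the input
dy = (y - t) * y * (1 - y)    # dE/d(net input of the output unit)
dW2 = np.outer(h, dy)         # gradient for W2
dh = (W2 @ dy) * h * (1 - h)  # propagate the error to the hidden layer
dW1 = np.outer(x, dh)         # gradient for W1

eta = 0.5                     # gradient descent step on both layers
W1 -= eta * dW1
W2 -= eta * dW2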

20 Use of Mini-batch in Gradient Descent
Goal: to speed up training with a large dataset.
Approach: update by mini-batch instead of by epoch (an epoch is one pass through all the data). Updating once per epoch is slow; updating once per mini-batch is faster. For example, if the dataset size is 1000:
Batch size = 10 → 100 updates per epoch
Batch size = 100 → 10 updates per epoch
A sketch of the mini-batch loop follows below.
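A minimal sketch (my own illustration) of the mini-batch update loop; grad_loss is a hypothetical stand-in for any gradient routine, such as backpropagation:

import numpy as np

def minibatch_gd(w, X, y, grad_loss, eta=0.01, batch_size=10, epochs=5):
    """One weight update per mini-batch instead of one per epoch."""
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)    # shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= eta * grad_loss(w, X[idx], y[idx])   # update per mini-batch
    return w

# With n = 1000 and batch_size = 10, each epoch performs 100 updates.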

21 Use of Momentum Term in Gradient Descent
Purpose of using a momentum term:
Avoid oscillations in gradient descent (e.g., on the banana function!)
Escape from local minima
Formula (original vs. updated):
Original: Δw = -η ∇E
Updated: Δw(t) = -η ∇E + α Δw(t-1), where α Δw(t-1) is the momentum term
[Figure: gradient descent paths over the contours of the banana function]
A sketch of the updated rule follows below.
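A minimal sketch (my own illustration; eta, alpha, and the iteration count are arbitrary choices) of the updated rule on Rosenbrock's banana function:

import numpy as np

def banana_grad(w):
    """Gradient of the banana function f(w) = (1-x)^2 + 100*(y-x^2)^2."""
    x, y = w
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

w = np.array([-1.5, 2.0])
dw = np.zeros(2)            # previous update, Δw(t-1)
eta, alpha = 1e-4, 0.9      # learning rate and momentum coefficient

for _ in range(50000):
    dw = -eta * banana_grad(w) + alpha * dw   # Δw(t) = -η∇E + αΔw(t-1)
    w += dw
print(w)   # moves toward the minimum at (1, 1)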

22 Optimizers in Keras
Choices of optimization methods in Keras:
SGD: stochastic gradient descent
Adagrad: adaptive learning rate
RMSprop: similar to Adagrad
Adam: similar to RMSprop + momentum
Nadam: Adam + Nesterov momentum
An optimizer is selected when compiling a model, as below.
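A minimal usage sketch (my own illustration; the tiny model is just a placeholder):

from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(2,)), keras.layers.Dense(1)])

# Pass an optimizer by name ...
model.compile(optimizer="adam", loss="mse")

# ... or as an object, to set hyperparameters such as momentum
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss="mse")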

23 Loss Functions for Regression

24 Loss Functions for Classification

25 Activation Functions

26 Learning Rate Selection

27 Exercises
Express the derivative of y = f(x) in terms of y:
Derive the derivative of tanh(x/2) in terms of sigmoid(x).
Express tanh(x/2) in terms of sigmoid(x).
Given y = sigmoid(x) and y' = y(1-y), find the derivative of tanh(x/2).

