Deep Neural Networks (DNN)


1 Deep Neural Networks (DNN)
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University, 2019/1/16

2 Concept of Modeling
Given desired i/o pairs (training set) of the form (x1, ..., xn; y), construct a model to match the i/o pairs.
Two steps in modeling:
Structure identification: input selection, model complexity
Parameter identification: optimal parameters
[Diagram: inputs x1, ..., xn feed both the unknown target system (output y) and the model (output y*).]

3 Neural Networks
Supervised learning: multilayer perceptrons, radial basis function networks, modular neural networks, LVQ (learning vector quantization)
Unsupervised learning: competitive learning networks, Kohonen self-organizing networks, ART (adaptive resonance theory)
Others: Hopfield networks

4 Single-layer Perceptrons
Proposed by Widrow & Hoff in 1960; also known as ADALINE (Adaptive Linear Neuron) or the single-layer perceptron.
[Diagram: perceptron with inputs x1, x2, weights w0, w1, w2, and output y; training data plotted with x1 = hair length and x2 = voice frequency. Quiz!]
Demo: perceptronDemo.m (a rough Python equivalent is sketched below)
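The original perceptronDemo.m is not reproduced in the transcript; the following is a minimal NumPy sketch of the Widrow-Hoff (LMS) update on made-up 2-D data. The data values, learning rate, and epoch count are illustrative assumptions, not taken from the slide.

```python
import numpy as np

# Toy training data: columns are (hair length, voice frequency); labels are +1 / -1.
# The numbers are invented purely for illustration.
X = np.array([[0.8, 0.9], [0.7, 0.8], [0.2, 0.3], [0.1, 0.2]])
y = np.array([1, 1, -1, -1])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)   # weights w1, w2
w0 = 0.0                            # bias weight
lr = 0.1                            # learning rate (arbitrary choice)

# Widrow-Hoff (LMS) rule: the error is taken against the *linear* output,
# which is what distinguishes ADALINE from the classic perceptron rule.
for epoch in range(50):
    for xi, target in zip(X, y):
        out = w @ xi + w0           # linear activation
        err = target - out
        w += lr * err * xi
        w0 += lr * err

print("predicted classes:", np.sign(X @ w + w0))   # should match y
```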

5 Multilayer Perceptrons (MLPs)
Extending the SLP to an MLP yields more complex decision boundaries.
How to train MLPs? Replace the signum function with a sigmoidal function, then use gradient descent to update the parameters.
[Diagram: MLP with inputs x1, x2 and outputs y1, y2.]

6 Continuous Activation Functions
To use gradient descent, we need to replace the signum function with continuous alternatives (a NumPy sketch follows):
Sigmoid: y = 1/(1+exp(-x))
Hyperbolic tangent: y = tanh(x/2)
Identity: y = x
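A minimal NumPy sketch of the three activations listed above (plotting code omitted; the function names are my own labels):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: y = 1 / (1 + exp(-x)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def hyper_tangent(x):
    """y = tanh(x/2), output in (-1, 1); equals 2*sigmoid(x) - 1."""
    return np.tanh(x / 2.0)

def identity(x):
    """y = x, typically used at the output layer for regression."""
    return x

x = np.linspace(-6, 6, 5)
print(sigmoid(x))
print(hyper_tangent(x))
print(np.allclose(hyper_tangent(x), 2 * sigmoid(x) - 1))   # True
```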

7 Activation Functions

8 Classical MLPs
Typical 2-layer MLPs. [Diagram: inputs x1, x2; outputs y1, y2.]
Learning rules: gradient descent (backpropagation), the conjugate gradient method, any optimization method using first derivatives, and derivative-free optimization.

9 MLP Examples
XOR problem. Training data (the XOR truth table over inputs x1, x2 with target y):
x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0
[Diagram: network architecture with inputs x1, x2 and output y, plus the corresponding decision regions in the (x1, x2) plane. A concrete sketch follows.]
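To make the XOR example concrete, here is a NumPy sketch of one possible 2-2-1 network whose hand-picked weights realize XOR with hard-threshold units. The weights are an illustrative choice, not the ones shown on the slide.

```python
import numpy as np

def step(x):
    """Signum-style threshold unit: 1 if x > 0, else 0."""
    return (x > 0).astype(int)

# Hidden unit 1 fires for OR(x1, x2); hidden unit 2 fires for AND(x1, x2).
W_hidden = np.array([[1.0, 1.0],    # weights into hidden unit 1
                     [1.0, 1.0]])   # weights into hidden unit 2
b_hidden = np.array([-0.5, -1.5])

# Output unit: OR minus AND gives XOR.
w_out = np.array([1.0, -1.0])
b_out = -0.5

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
h = step(X @ W_hidden.T + b_hidden)
y = step(h @ w_out + b_out)
print(y)   # [0 1 1 0]
```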

10 MLP Decision Boundaries
Single-layer: half planes. [Figure: example decision regions for the exclusive-OR problem, meshed (intertwined) regions, and the most general regions, with classes A and B.]

11 MLP Decision Boundaries
Two-layer: convex regions. [Figure: same three example problems as above, classes A and B.]

12 MLP Decision Boundaries
Three-layer: arbitrary regions. [Figure: same three example problems as above, classes A and B.]

13 Summary: MLP Decision Boundaries
Quiz! Decision boundaries by depth, illustrated on the XOR, intertwined, and general-region problems (classes A and B):
1-layer: half planes
2-layer: convex regions
3-layer: arbitrary regions

14 MLP Configurations

15 Deep Neural Networks

16 Training an MLP
Methods for training an MLP: gradient descent, the Gauss-Newton method, and the Levenberg-Marquardt method.
Backpropagation: a systematic way to compute gradients, starting from the NN's output. (Standard update rules are sketched below.)
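As a reference for the methods listed above, the standard update rules (with \(\theta\) the parameters, \(E\) the error, \(e\) the residual vector, \(J\) its Jacobian, \(\eta\) the learning rate, and \(\lambda\) a damping factor; the slide itself does not spell these out) are roughly:

\[
\begin{aligned}
\text{Gradient descent:}\quad & \theta \leftarrow \theta - \eta\,\nabla E(\theta)\\
\text{Gauss-Newton:}\quad & \theta \leftarrow \theta - (J^{\mathsf T}J)^{-1}J^{\mathsf T}e\\
\text{Levenberg-Marquardt:}\quad & \theta \leftarrow \theta - (J^{\mathsf T}J + \lambda I)^{-1}J^{\mathsf T}e
\end{aligned}
\]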

17 Simple Derivatives
Review of the chain rule, with its network representation. [Diagram: x → node → y.]

18 Chain Rule for Composite Functions
Review of the chain rule, with its network representation: x → f(·) → y → g(·) → z. (Formula below.)
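In the diagram's notation, with y = f(x) and z = g(y), the chain rule reads

\[
\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = g'\bigl(f(x)\bigr)\,f'(x).
\]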

19 Chain Rule for Network Representation
Review of the chain rule, with its network representation: y = f(x) and u = g(x) both feed h(·, ·) to give z. (Formula below.)
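For the two-path network on this slide, with y = f(x), u = g(x), and z = h(y, u), the total derivative sums the contributions along both paths:

\[
\frac{dz}{dx} = \frac{\partial h}{\partial y}\,\frac{dy}{dx} + \frac{\partial h}{\partial u}\,\frac{du}{dx}
             = \frac{\partial h}{\partial y}\,f'(x) + \frac{\partial h}{\partial u}\,g'(x).
\]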

20 Backpropagation in Adaptive Networks (1/3)
A way to compute the gradient from the output toward the input.
[Diagram: adaptive network in which inputs x and y feed nodes 1 and 2 (outputs u and v), which feed node 3 (output o).]

21 Backpropagation in Adaptive Networks (2/3)
[Diagram: the same adaptive network as the previous slide: x → node 1 → u, y → node 2 → v, and (u, v) → node 3 → o.]

22 Backpropagation in Adaptive Networks (3/3)
[Diagram: a five-node adaptive network with inputs x and y, intermediate outputs u and v, parameters p and q, and final output o.] You don't need to!

23 Summary of Backpropagation
General formula for backpropagation, assuming o is the network's final output and a is a parameter in node 1. Backpropagation!
[Diagram: node 1 with input x and parameter a; its output fans out toward y1, y2, y3 on the way to o.]
(A reconstruction of the formula is sketched below.)
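A hedged reconstruction of the formula, assuming node 1's output \(x_1\) (a function of the parameter \(a\)) fans out to nodes with outputs \(y_1, y_2, y_3\):

\[
\frac{\partial o}{\partial a}
  = \frac{\partial o}{\partial x_1}\,\frac{\partial x_1}{\partial a},
\qquad
\frac{\partial o}{\partial x_1}
  = \sum_{i=1}^{3}\frac{\partial o}{\partial y_i}\,\frac{\partial y_i}{\partial x_1}.
\]

The factors \(\partial o/\partial y_i\) are already available at the downstream nodes, so derivatives propagate from the output back toward node 1, which is the essence of backpropagation.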

24 Backpropagation in NN (1/2)
[Diagram: a two-layer NN with inputs x1, x2, hidden-node outputs y1, y2, and a single output o.]

25 Backpropagation in NN (2/2)
[Diagram: a three-layer NN with inputs x1, x2, x3, first-layer outputs y1, y2, y3, second-layer outputs z1, z2, and final output o.]
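Since these two slides are diagrams, here is a compact NumPy sketch of backpropagation for a small sigmoid network trained on XOR. The 2-3-1 architecture, learning rate, and epoch count are illustrative assumptions, not the slide's exact network.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR data as a stand-in training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# A 2-3-1 sigmoid network (sizes chosen for illustration)
W1 = rng.normal(size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(10000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)            # hidden outputs
    O = sigmoid(H @ W2 + b2)            # network output
    # Backward pass for squared error E = 0.5 * sum((O - T)**2):
    # deltas are dE/d(net input), propagated from the output back to the hidden layer
    dO = (O - T) * O * (1 - O)
    dH = (dO @ W2.T) * H * (1 - H)
    # Gradient-descent updates
    W2 -= lr * H.T @ dO; b2 -= lr * dO.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel(), 2))
# typically converges toward [0, 1, 1, 0]
```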

26 Use of Mini-batch in Gradient Descent
Goal: speed up training on large datasets.
Approach: update by mini-batch instead of by epoch (an epoch is one pass through all the data). Updating once per epoch gives slower updates; updating per mini-batch gives faster updates.
If the dataset size is 1000:
Batch size = 10 → 100 updates per epoch (mini-batch)
Batch size = 100 → 10 updates per epoch (mini-batch)
Batch size = 1000 → 1 update per epoch (full batch)
(A sketch of the bookkeeping follows.)
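A minimal sketch of the mini-batch bookkeeping described above. The update_parameters callback stands in for whatever gradient step is used; it is a hypothetical placeholder, not a library call.

```python
import numpy as np

def run_one_epoch(X, y, batch_size, update_parameters):
    """One epoch = one pass through all the data, split into mini-batches."""
    n = len(X)
    indices = np.random.permutation(n)           # shuffle once per epoch
    num_updates = 0
    for start in range(0, n, batch_size):
        batch = indices[start:start + batch_size]
        update_parameters(X[batch], y[batch])    # one gradient update per mini-batch
        num_updates += 1
    return num_updates

X, y = np.zeros((1000, 2)), np.zeros(1000)       # dataset of size 1000
noop = lambda xb, yb: None                       # placeholder update step
for bs in (10, 100, 1000):
    print(bs, "->", run_one_epoch(X, y, bs, noop), "updates per epoch")
# 10 -> 100 updates, 100 -> 10 updates, 1000 -> 1 update (full batch)
```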

27 Use of Momentum Term in Gradient Descent
Purpose of the momentum term: avoid oscillations in gradient descent (think of the banana function!) and, perhaps, escape from local minima.
Formula: original update vs. update with a momentum term (written out below).
[Figure: contours of the banana function, with the momentum term indicated.]
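The two updates the slide contrasts, in standard notation (\(\theta\) the parameters, \(\eta\) the learning rate, \(\alpha\) the momentum coefficient), are typically written as:

\[
\text{Original:}\quad \Delta\theta_t = -\eta\,\nabla E(\theta_t),
\qquad
\text{With momentum:}\quad \Delta\theta_t = -\eta\,\nabla E(\theta_t) + \alpha\,\Delta\theta_{t-1}.
\]

The extra term \(\alpha\,\Delta\theta_{t-1}\) averages successive steps, which damps the zig-zagging seen in narrow valleys such as the banana function.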

28 Learning Rate Selection

29 Optimizers in Keras
Choices of optimization methods in Keras (a usage sketch follows):
SGD: stochastic gradient descent
Adagrad: adaptive learning rates
RMSprop: similar to Adagrad
Adam: roughly RMSprop + momentum
Nadam: Adam + Nesterov momentum
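A minimal Keras sketch showing how these optimizers are selected at compile time. The toy model, data, and hyperparameters are arbitrary illustrations, not from the slide.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(8, activation="sigmoid"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Any optimizer from the list above can be plugged in here,
# either by name ("sgd", "adagrad", "rmsprop", "adam", "nadam")
# or as a configured object:
optimizer = keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)

model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
model.fit(X, y, epochs=10, batch_size=2, verbose=0)
```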

30 Loss Functions for Regression

31 Loss Functions for Classification
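These two slides are figures in the original deck; for reference, the standard choices the titles suggest are mean squared error for regression and cross-entropy for classification (my summary, not the slide content):

\[
E_{\text{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \hat y_i\bigr)^2,
\qquad
E_{\text{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k} t_{ik}\,\log \hat y_{ik},
\]

where \(\hat y\) is the network output and \(t_{ik}\) is the one-hot target.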

32 Exercises
Quiz!
Express the derivative of y = f(x) in terms of y.
Derive the derivative of tanh(x/2) in terms of sigmoid(x):
(a) Express tanh(x/2) in terms of sigmoid(x).
(b) Given y = sigmoid(x) and y' = y(1-y), find the derivative of tanh(x/2).
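One way to check your answers (worked out here rather than on the slide), writing \(\sigma\) for the sigmoid:

\[
y = \sigma(x) = \frac{1}{1+e^{-x}} \;\Rightarrow\; y' = y(1-y),
\qquad
\tanh\!\left(\frac{x}{2}\right) = \frac{1-e^{-x}}{1+e^{-x}} = 2\,\sigma(x) - 1,
\]

so

\[
\frac{d}{dx}\tanh\!\left(\frac{x}{2}\right) = 2\,\sigma'(x) = 2\,\sigma(x)\bigl(1-\sigma(x)\bigr)
 = \frac{1-\tanh^2(x/2)}{2}.
\]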

