Deep Neural Networks (DNN)


Deep Neural Networks (DNN)
J.-S. Roger Jang (張智星), jang@mirlab.org, http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University
2019/1/16

Concept of Modeling
Given desired i/o pairs (training set) of the form (x1, ..., xn; y) from an unknown target system, construct a model whose output y* matches the i/o pairs.
Two steps in modeling:
Structure identification: input selection, model complexity
Parameter identification: optimal parameters

Neural Networks
Supervised learning: multilayer perceptrons, radial basis function networks, modular neural networks, LVQ (learning vector quantization)
Unsupervised learning: competitive learning networks, Kohonen self-organizing networks, ART (adaptive resonance theory)
Others: Hopfield networks

Single-layer Perceptrons
Proposed by Widrow & Hoff in 1960
AKA ADALINE (Adaptive Linear Neuron) or single-layer perceptron
Network: inputs x1 (hair length) and x2 (voice freq.) with weights w1, w2 and bias w0 produce the output y
Demo: perceptronDemo.m
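A minimal NumPy sketch of the ADALINE idea above, trained with the Widrow-Hoff (LMS) rule; the feature values, learning rate, and epoch count are illustrative assumptions, not taken from perceptronDemo.m:

```python
import numpy as np

# Toy training set: each row is (hair length, voice freq.), both rescaled to [0, 1];
# labels are +1 / -1.  All values are illustrative.
X = np.array([[0.9, 0.2], [0.8, 0.3], [0.2, 0.8], [0.1, 0.9]])
y = np.array([1, 1, -1, -1])

Xa = np.hstack([np.ones((len(X), 1)), X])    # prepend a 1 so w[0] plays the role of the bias w0
w = np.zeros(Xa.shape[1])
eta = 0.1                                    # learning rate (assumed)

for epoch in range(50):
    for xi, target in zip(Xa, y):
        output = w @ xi                      # ADALINE learns from the linear output ...
        w += eta * (target - output) * xi    # ... via the Widrow-Hoff (LMS) update

print(np.sign(Xa @ w))                       # signum of the linear output gives the class labels
```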

Multilayer Perceptrons (MLPs)
Extend the single-layer perceptron to multiple layers to obtain complex decision boundaries (diagram: inputs x1, x2; outputs y1, y2)
How to train MLPs?
Replace the signum function with a sigmoidal (differentiable) activation function
Use gradient descent to update the parameters

Continuous Activation Functions
To use gradient descent, we need to replace the signum function by a continuous version:
Sigmoid: y = 1/(1+exp(-x))
Hyperbolic tangent: y = tanh(x/2)
Identity: y = x
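A short NumPy sketch of the three activation functions above, together with the identity tanh(x/2) = 2*sigmoid(x) - 1 used in the exercises at the end (an illustrative sketch):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: y = 1 / (1 + exp(-x)); its derivative is y * (1 - y)."""
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    """y = tanh(x / 2); note that tanh(x/2) = 2 * sigmoid(x) - 1."""
    return np.tanh(x / 2.0)

def identity(x):
    """y = x; its derivative is 1."""
    return x

x = np.linspace(-5, 5, 11)
print(sigmoid(x))
print(np.allclose(hyperbolic_tangent(x), 2 * sigmoid(x) - 1))  # True: the identity holds
```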

Activation Functions

Classical MLPs
Typical 2-layer MLPs (inputs x1, x2; outputs y1, y2)
Learning rules:
Gradient descent (backpropagation)
Conjugate gradient method
All optimization methods using the first derivative
Derivative-free optimization

MLP Examples
XOR problem
Training data (x1, x2 → y):
0 0 → 0
0 1 → 1
1 0 → 1
1 1 → 0
Network architecture: inputs x1 and x2, a hidden layer, and a single output y
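A minimal Keras sketch of a small MLP that can learn the XOR training data above; the hidden-layer size, optimizer, loss, and epoch count are assumptions for illustration:

```python
import numpy as np
from tensorflow import keras

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(4, activation="tanh"),     # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.5), loss="mse")
model.fit(X, y, epochs=2000, verbose=0)

# Outputs should approximate [0, 1, 1, 0]; results depend on the random initialization.
print(model.predict(X, verbose=0).round(2))
```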

MLP Decision Boundaries
Single-layer: half planes
(Figure: class regions A/B for the exclusive-OR problem, meshed regions, and the most general regions)

MLP Decision Boundaries
Two-layer: convex regions
(Figure: class regions A/B for the exclusive-OR problem, meshed regions, and the most general regions)

MLP Decision Boundaries
Three-layer: arbitrary regions
(Figure: class regions A/B for the exclusive-OR problem, meshed regions, and the most general regions)

Summary: MLP Decision Boundaries (Quiz!)
Example problems: XOR, intertwined regions, general regions
1-layer: half planes
2-layer: convex regions
3-layer: arbitrary regions

MLP Configurations

Deep Neural Networks

Training an MLP
Methods for training an MLP:
Gradient descent
Gauss-Newton method
Levenberg-Marquardt method
Backpropagation: a systematic way to compute gradients, starting from the NN's output

Simple Derivatives
Review of the chain rule, starting from a single function: for y = f(x), dy/dx = f'(x)
Network representation: x → f(.) → y

Chain Rule for Composite Functions
For y = f(x) and z = g(y): dz/dx = (dz/dy)(dy/dx) = g'(f(x)) f'(x)
Network representation: x → f(.) → y → g(.) → z

Chain Rule for Network Representation
For y = f(x), z = g(x), and u = h(y, z):
du/dx = (∂u/∂y)(dy/dx) + (∂u/∂z)(dz/dx) = h_y(y, z) f'(x) + h_z(y, z) g'(x)
Network representation: x feeds both f(.) → y and g(.) → z, and (y, z) feed h(. , .) → u
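A concrete instance of the multi-path chain rule above, with functions chosen purely for illustration:

$$
f(x) = x^2,\quad g(x) = \sin x,\quad h(y, z) = yz \;\;\Rightarrow\;\; u = x^2 \sin x,
$$
$$
\frac{du}{dx} = \frac{\partial u}{\partial y}\frac{dy}{dx} + \frac{\partial u}{\partial z}\frac{dz}{dx}
= z \cdot 2x + y \cdot \cos x = 2x\sin x + x^2\cos x .
$$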

Backpropagation in Adaptive Networks (1/3)
A way to compute gradients from the output toward the input
Adaptive network: x → node 1 → u, y → node 2 → v, and (u, v) → node 3 → o

Backpropagation in Adaptive Networks (2/3)
(Figure: the same network, x → node 1 → u, y → node 2 → v, (u, v) → node 3 → o, with the gradient of o propagated backward from node 3 to nodes 1 and 2)

Backpropagation in Adaptive Networks (3/3)
(Figure: a larger network with nodes 1-5, inputs x and y, intermediate outputs u and v, parameters p and q, and final output o; you don't need to expand the whole gradient expression by hand, since backpropagation computes it node by node)

Summary of Backpropagation
General formula for backpropagation, assuming o is the network's final output and a is a parameter in node 1, whose outputs y1, y2, y3 feed the rest of the network:
∂o/∂a = Σ_i (∂o/∂y_i)(∂y_i/∂a)
Backpropagation: the terms ∂o/∂y_i are already available from the backward pass through the later nodes.

Backpropagation in NN (1/2)
(Figure: a network with inputs x1, x2, hidden outputs y1, y2, and output o; the chain rule is applied backward from o to each node's weights)

Backpropagation in NN (2/2)
(Figure: a deeper network with inputs x1, x2, x3, first-hidden-layer outputs y1, y2, y3, second-hidden-layer outputs z1, z2, and output o; gradients are propagated backward layer by layer)
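A compact NumPy sketch of backpropagation for a one-hidden-layer sigmoid network trained with squared error and gradient descent; the layer sizes, learning rate, epoch count, and the reuse of the XOR data are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny training set (XOR, reused for illustration).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
eta = 0.5                                       # learning rate (assumed)

for epoch in range(10000):
    # Forward pass
    y = sigmoid(X @ W1 + b1)             # hidden activations
    o = sigmoid(y @ W2 + b2)             # network output

    # Backward pass: chain rule applied from the output toward the input
    d_o = (o - t) * o * (1 - o)          # dE/d(net) at the output, with E = 0.5 * sum((o - t)^2)
    d_y = (d_o @ W2.T) * y * (1 - y)     # propagate the error back to the hidden layer

    # Gradient-descent parameter updates
    W2 -= eta * (y.T @ d_o); b2 -= eta * d_o.sum(axis=0)
    W1 -= eta * (X.T @ d_y); b1 -= eta * d_y.sum(axis=0)

print(o.round(2).ravel())  # typically approaches [0, 1, 1, 0]
```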

Use of Mini-batch in Gradient Descent
Goal: speed up training with a large dataset
Approach: update by mini-batch instead of by epoch (an epoch is one pass through all the data)
If the dataset size is 1000:
Batch size = 10 → 100 updates per epoch (mini-batch)
Batch size = 100 → 10 updates per epoch (mini-batch)
Batch size = 1000 → 1 update per epoch (full batch)
Updating once per epoch is slower; updating per mini-batch gives faster (more frequent) updates.
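A generic mini-batch gradient-descent loop as a sketch; the gradient function, learning rate, batch size, and the linear-regression example are assumptions for illustration:

```python
import numpy as np

def minibatch_sgd(X, t, w, grad_fn, eta=0.01, batch_size=10, epochs=5, seed=0):
    """Mini-batch gradient descent: grad_fn(w, Xb, tb) returns dE/dw for one mini-batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for epoch in range(epochs):
        order = rng.permutation(n)               # shuffle once per epoch
        for start in range(0, n, batch_size):    # n = 1000, batch_size = 10 -> 100 updates per epoch
            idx = order[start:start + batch_size]
            w -= eta * grad_fn(w, X[idx], t[idx])
    return w

# Example: linear regression with E = 0.5 * ||X w - t||^2, so dE/dw = X^T (X w - t).
X = np.random.default_rng(1).normal(size=(1000, 3))
t = X @ np.array([1.0, -2.0, 0.5])
w = minibatch_sgd(X, t, np.zeros(3), lambda w, Xb, tb: Xb.T @ (Xb @ w - tb))
print(w)  # should be close to [1, -2, 0.5]
```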

Use of Momentum Term in Gradient Descent
Purpose of using a momentum term:
Avoid oscillations in gradient descent (e.g., on the banana function)
Possibly escape from local minima
Formula
Original: Δw(t) = -η ∇E(w(t))
With momentum term: Δw(t) = -η ∇E(w(t)) + α Δw(t-1)
(Figure: contours of the banana function, illustrating the effect of the momentum term)
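A minimal NumPy sketch of the momentum update rule above; to keep it short, the banana function from the slide is replaced by a simple ill-conditioned quadratic, and the learning rate and momentum coefficient are assumed values:

```python
import numpy as np

def grad(w):
    """Gradient of the ill-conditioned quadratic E(x, y) = 0.5 * (x^2 + 25 * y^2)."""
    return np.array([w[0], 25.0 * w[1]])

w = np.array([10.0, 1.0])   # starting point (illustrative)
delta = np.zeros(2)         # previous update step: the momentum "memory"
eta, alpha = 0.07, 0.9      # learning rate and momentum coefficient (assumed)

for _ in range(500):
    delta = -eta * grad(w) + alpha * delta   # Δw(t) = -η ∇E(w(t)) + α Δw(t-1)
    w = w + delta

print(w)  # approaches the minimum at (0, 0); with alpha = 0 this reduces to plain gradient descent
```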

Learning Rate Selection

Optimizer in Keras
Choices of optimization methods in Keras:
SGD: stochastic gradient descent
Adagrad: adaptive learning rate
RMSprop: similar to Adagrad
Adam: similar to RMSprop + momentum
Nadam: Adam + Nesterov momentum
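A short sketch of how one of these optimizers is selected when compiling a Keras model; the architecture and hyperparameter values are illustrative assumptions:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Any of the optimizers listed above can be passed to compile(), e.g. SGD, Adagrad, RMSprop, Adam, Nadam.
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
```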

Loss Functions for Regression
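For reference, two commonly used regression losses (which specific losses the slide presents is an assumption here):

$$
\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \hat{y}_i\bigr)^2,
\qquad
\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\bigl|y_i - \hat{y}_i\bigr| .
$$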

Loss Functions for Classification
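Similarly, the cross-entropy losses typically used for classification (again an assumption about the slide's exact content), where t denotes the target label and ŷ the predicted probability:

$$
\text{Binary cross-entropy: } -\frac{1}{N}\sum_{i=1}^{N}\Bigl[t_i\log \hat{y}_i + (1-t_i)\log\bigl(1-\hat{y}_i\bigr)\Bigr],
\qquad
\text{Categorical cross-entropy: } -\frac{1}{N}\sum_{i=1}^{N}\sum_{k} t_{ik}\log \hat{y}_{ik} .
$$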

Exercises (Quiz!)
1. Express the derivative of y = f(x) in terms of y itself, for each of the activation functions above.
2. Derive the derivative of tanh(x/2) in terms of sigmoid(x):
   a. Express tanh(x/2) in terms of sigmoid(x).
   b. Given y = sigmoid(x) and y' = y(1-y), find the derivative of tanh(x/2).
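A worked sketch of the second exercise, included here as a hint rather than as part of the original slides:

$$
\tanh(x/2) = \frac{1 - e^{-x}}{1 + e^{-x}} = 2\,\mathrm{sigmoid}(x) - 1,
\qquad
\frac{d}{dx}\tanh(x/2) = 2\,y\,(1 - y), \quad y = \mathrm{sigmoid}(x).
$$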