Lecture 14 – Neural Networks

Lecture 14 – Neural Networks Machine Learning March 18, 2010

Last Time: Perceptrons; Perceptron Loss vs. Logistic Regression Loss; Training Perceptrons and Logistic Regression Models using Gradient Descent.

Today: Multilayer Neural Networks; Feed-Forward; Error Back-Propagation.

Recall: The Neuron Metaphor. Neurons accept information from multiple inputs and transmit information to other neurons. Multiply inputs by weights along edges; apply some function to the combined inputs at each node.

Types of Neurons: Linear Neuron, Logistic Neuron, Perceptron, and potentially more. Each requires a convex loss function for gradient descent training.
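As a rough sketch of these neuron types (illustrative code, not from the slides; the function and parameter names are assumptions), each one applies a different activation to the same weighted sum of inputs:

import numpy as np

def neuron_output(x, w, b, kind="logistic"):
    """Weighted sum of inputs followed by an activation function."""
    a = np.dot(w, x) + b                      # weighted sum along incoming edges
    if kind == "linear":
        return a                              # linear neuron: identity activation
    if kind == "logistic":
        return 1.0 / (1.0 + np.exp(-a))       # logistic (sigmoid) neuron
    if kind == "perceptron":
        return 1.0 if a > 0 else 0.0          # perceptron: hard threshold
    raise ValueError("unknown neuron type: %s" % kind)

The linear and logistic neurons are differentiable, which is what gradient descent training relies on.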

Multilayer Networks: Cascade neurons together; the output from one layer is the input to the next. Each layer has its own set of weights.

Linear Regression Neural Networks What happens when we arrange linear neurons in a multilayer network?

Linear Regression Neural Networks Nothing special happens. The product of two linear transformations is itself a linear transformation. f(x,\vec{\theta})=\sum_{i=0}^{D}\theta_{1,i}\sum_{n=0}^{N-1}\theta_{0,i,n}x_n=\sum_{i=0}^{D}\theta_{1,i}[\theta_{0,i}^{T}\vec{x}]=\sum_{i=0}^{D}[\hat{\theta}_{i}^{T}\vec{x}]
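A quick numerical check of this claim (a sketch; the matrix shapes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)                  # input vector
Theta0 = rng.normal(size=(4, 3))        # first layer of linear neurons
Theta1 = rng.normal(size=(1, 4))        # second layer of linear neurons

two_layers = Theta1 @ (Theta0 @ x)      # cascade of two linear layers
one_layer = (Theta1 @ Theta0) @ x       # a single equivalent linear map
print(np.allclose(two_layers, one_layer))   # True: nothing is gained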

Neural Networks We want to introduce non-linearities to the network. Non-linearities allow a network to identify complex regions in space

Linear Separability: A 1-layer network cannot handle XOR. More layers can handle more complicated spaces, but require more parameters. Each node splits the feature space with a hyperplane. If the second layer computes an AND of these half-spaces, a 2-layer network can represent any convex region.
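To make the AND-of-hyperplanes idea concrete, here is a hand-built 2-layer network of threshold units that represents XOR (the weights are chosen by hand for illustration; they are not from the slides):

import numpy as np

def step(a):
    return (a > 0).astype(float)       # perceptron-style threshold unit

# Hidden layer: each node splits the feature space with a hyperplane.
W_hidden = np.array([[ 1.0,  1.0],     # fires when x1 + x2 > 0.5 (at least one input on)
                     [-1.0, -1.0]])    # fires when x1 + x2 < 1.5 (not both inputs on)
b_hidden = np.array([-0.5, 1.5])

# Output layer: AND of the two half-spaces (both hidden units must fire).
w_out = np.array([1.0, 1.0])
b_out = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x, dtype=float) + b_hidden)
    y = step(np.dot(w_out, h) + b_out)
    print(x, int(y))                   # prints 0, 1, 1, 0: the XOR pattern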

Feed-Forward Networks Predictions are fed forward through the network to classify
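A minimal vectorized sketch of that feed-forward pass (assuming logistic units throughout; the names here are illustrative, not the lecture's own code):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def feed_forward(x, weights, biases):
    """Push an input through the layers: each layer's output z becomes
    the next layer's input, and each layer has its own weights."""
    z = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        a = W @ z + b          # weighted input to this layer
        z = sigmoid(a)         # output of this layer
    return z                   # final prediction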

Error Backpropagation We will do gradient descent on the whole network. Training will proceed from the last layer to the first.

Error Backpropagation Introduce variables over the neural network, \vec{\theta}=\{w_{ij}, w_{jk}, w_{kl}\}, and distinguish the input a and the output z of each node.

Error Backpropagation Training: Take the gradient of the last component and iterate backwards \vec{\theta}=\{w_{ij}, w_{jk}, w_{kl}\}

Error Backpropagation Empirical Risk Function
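The risk function itself is not written out in this transcript; given the squared loss that appears in the gradient slides below, it is presumably the average per-example loss:

R(\vec{\theta})=\frac{1}{N}\sum_{n=1}^{N}L_n=\frac{1}{N}\sum_{n=1}^{N}\frac{1}{2}\left(y_n-f(x_n,\vec{\theta})\right)^2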

Error Backpropagation Optimize last layer weights w_{kl} via the calculus chain rule: \frac{\partial R}{\partial w_{kl}}=\frac{1}{N}\sum_n\left[\frac{\partial L_n}{\partial a_{l,n}}\right]\left[\frac{\partial a_{l,n}}{\partial w_{kl}}\right]

Error Backpropagation Optimize last hidden weights w_{jk}, starting from the completed last-layer gradient: \frac{\partial R}{\partial w_{kl}}=\frac{1}{N}\sum_n\left[\frac{\partial \frac{1}{2}(y_n-g(a_{l,n}))^2}{\partial a_{l,n}}\right]\left[\frac{\partial z_{k,n}w_{kl}}{\partial w_{kl}}\right]=\frac{1}{N}\sum_n\left[-(y_n-z_{l,n})g'(a_{l,n})\right]z_{k,n}=\frac{1}{N}\sum_n\delta_{l,n}z_{k,n}

Error Backpropagation Optimize last hidden weights w_{jk} via the multivariate chain rule: \frac{\partial R}{\partial w_{jk}}=\frac{1}{N}\sum_n\left[\sum_l\delta_{l,n}w_{kl}g'(a_{k,n})\right]z_{j,n}

Error Backpropagation Repeat for all previous layers: \frac{\partial R}{\partial w_{ij}}=\frac{1}{N}\sum_n\left[\frac{\partial L_n}{\partial a_{j,n}}\right]\left[\frac{\partial a_{j,n}}{\partial w_{ij}}\right]=\frac{1}{N}\sum_n\left[\sum_k\delta_{k,n}w_{jk}g'(a_{j,n})\right]z_{i,n}=\frac{1}{N}\sum_n\delta_{j,n}z_{i,n}
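Collecting the pattern from the last three slides (this summary is implied rather than stated explicitly), each layer's error term is built from the next layer's:

\delta_{l,n}=-(y_n-z_{l,n})g'(a_{l,n}),\qquad \delta_{k,n}=g'(a_{k,n})\sum_l w_{kl}\delta_{l,n},\qquad \delta_{j,n}=g'(a_{j,n})\sum_k w_{jk}\delta_{k,n}

so that in every layer \frac{\partial R}{\partial w_{jk}}=\frac{1}{N}\sum_n\delta_{k,n}z_{j,n}.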

Error Backpropagation Now that we have well-defined gradients for each parameter, update using gradient descent: w_{jk}\leftarrow w_{jk}-\eta\frac{\partial R}{\partial w_{jk}}, where \frac{\partial R}{\partial w_{jk}}=\frac{1}{N}\sum_n\left[\sum_l\delta_{l,n}w_{kl}g'(a_{k,n})\right]z_{j,n}
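Putting the forward pass, the delta recursion, and the gradient descent update together, here is a minimal NumPy sketch for a one-hidden-layer network with logistic units and squared loss (all names, sizes, and hyperparameters are illustrative choices, not the lecture's code):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train(X, y, n_hidden=4, eta=1.0, n_iters=20000, seed=0):
    """Batch gradient descent on a 1-hidden-layer network with squared loss."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    W1 = rng.normal(size=(n_hidden, D))   # input -> hidden weights (w_jk)
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=n_hidden)        # hidden -> output weights (w_kl)
    b2 = 0.0

    for _ in range(n_iters):
        # Forward pass: inputs a and outputs z for each layer
        a1 = X @ W1.T + b1                # (N, n_hidden)
        z1 = sigmoid(a1)
        a2 = z1 @ W2 + b2                 # (N,)
        z2 = sigmoid(a2)

        # Backward pass: deltas from the last layer to the first
        delta2 = -(y - z2) * z2 * (1 - z2)               # output-layer delta
        delta1 = (delta2[:, None] * W2) * z1 * (1 - z1)  # hidden-layer deltas

        # Gradients of the empirical risk R = (1/N) sum_n (1/2)(y_n - z2_n)^2
        dW2 = delta2 @ z1 / N
        db2 = delta2.mean()
        dW1 = delta1.T @ X / N
        db1 = delta1.mean(axis=0)

        # Gradient descent update: theta <- theta - eta * dR/dtheta
        W2 -= eta * dW2; b2 -= eta * db2
        W1 -= eta * dW1; b1 -= eta * db1

    return W1, b1, W2, b2

# Example: learn XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, b1, W2, b2 = train(X, y)
pred = sigmoid(sigmoid(X @ W1.T + b1) @ W2 + b2)
print(np.round(pred, 2))   # should be close to [0, 1, 1, 0] if training converged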

Error Back-propagation Error backprop unravels the multivariate chain rule and solves the gradient for each partial component separately. The target values for each layer come from the next layer. This feeds the errors back along the network.

Problems with Neural Networks Interpretation of Hidden Layers Overfitting

Interpretation of Hidden Layers What are the hidden layers doing? Feature extraction. The non-linearities in the feature extraction can make interpretation of the hidden layers very difficult. This leads to neural networks being treated as black boxes.

Overfitting in Neural Networks Neural networks are especially prone to overfitting. Recall the perceptron error: zero training error is possible, but so is more extreme overfitting. (Figure: logistic regression vs. perceptron fits.)

Bayesian Neural Networks Recall Bayesian logistic regression, obtained by inserting a prior on the weights, which is equivalent to L2 regularization. We can do the same here; error backprop then becomes Maximum A Posteriori (MAP) rather than Maximum Likelihood (ML) training.
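Concretely (this is standard and not spelled out in the transcript), a zero-mean Gaussian prior on the weights adds a squared penalty to the empirical risk, so each gradient picks up a weight-decay term:

R_{\mathrm{MAP}}(\vec{\theta})=\frac{1}{N}\sum_{n=1}^{N}L_n+\lambda\|\vec{\theta}\|^2,\qquad \frac{\partial R_{\mathrm{MAP}}}{\partial w_{jk}}=\frac{\partial R}{\partial w_{jk}}+2\lambda w_{jk}

where \lambda absorbs the prior variance.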

Handwriting Recognition Demo: http://yann.lecun.com/exdb/lenet/index.html

Convolutional Network The network is not fully connected. Different nodes are responsible for different regions of the image. This allows for robustness to transformations.

Other Neural Networks: Multiple Outputs; Skip Layer Networks; Recurrent Neural Networks.

Multiple Outputs Used for N-way classification. Each node in the output layer corresponds to a different class. There is no guarantee that the sum of the output vector will equal 1.

Skip Layer Network Input nodes are also sent directly to the output layer.

Recurrent Neural Networks Output or hidden layer information is stored in a context or memory layer. (Diagram: Input Layer, Hidden Layer, Context Layer, Output Layer.)
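A minimal sketch of that idea, assuming an Elman-style network where the context layer simply stores the previous hidden activations (the names are illustrative):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def recurrent_step(x, context, W_in, W_context, W_out):
    """One time step: the hidden layer sees both the current input and the
    context layer (a copy of the previous hidden activations)."""
    hidden = sigmoid(W_in @ x + W_context @ context)
    output = sigmoid(W_out @ hidden)
    return output, hidden        # the new hidden layer becomes the next context

Looping this over a sequence and feeding each returned hidden vector back in as the context gives the network a memory of earlier inputs.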

Time-Delayed Recurrent Neural Networks (TDRNN) The output layer from time t is used as an input to the hidden layer at time t+1, with an optional decay. (Diagram: Input Layer, Hidden Layer, Output Layer.)

Maximum Margin The perceptron can lead to many equally valid choices for the decision boundary. Are these really "equally valid"?

Max Margin How can we pick which is best? Maximize the size of the margin. (Figure: small margin vs. large margin.) Are these really "equally valid"?

Next Time: Maximum Margin Classifiers; Support Vector Machines; Kernel Methods.