Classification / Regression Neural Networks 2


Neural networks: topics
- Perceptrons: structure, training, expressiveness
- Multilayer networks: possible structures, activation functions, training with gradient descent and backpropagation

Neural network application: ALVINN, An Autonomous Land Vehicle In a Neural Network (Carnegie Mellon University Robotics Institute, 1989-1997). ALVINN is a perception system that learns to control the NAVLAB vehicles by watching a person drive. ALVINN's architecture consists of a single-hidden-layer back-propagation network. The input layer of the network is a 30x32-unit two-dimensional "retina" which receives input from the vehicle's video camera. Each input unit is fully connected to a layer of five hidden units, which are in turn fully connected to a layer of 30 output units. The output layer is a linear representation of the direction the vehicle should travel in order to keep the vehicle on the road.
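As a rough illustration of the scale of this architecture, the sketch below builds a network with the layer sizes quoted on the slide (a 30x32 retina, 5 hidden units, 30 outputs) and runs one forward pass. The weight initialization and the sigmoid hidden activation are assumptions for illustration, not ALVINN's exact details.

```python
import numpy as np

# Layer sizes taken from the slide: 30x32 input "retina", 5 hidden units, 30 outputs.
n_in, n_hid, n_out = 30 * 32, 5, 30

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(n_hid, n_in))   # input -> hidden weights (assumed init)
b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.01, size=(n_out, n_hid))  # hidden -> output weights (assumed init)
b2 = np.zeros(n_out)

def forward(image):
    """Forward pass for one 30x32 camera frame (pixel values in [0, 1])."""
    x = image.reshape(-1)                          # flatten the retina to a 960-vector
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))       # sigmoid hidden layer (assumed)
    return W2 @ h + b2                             # linear output: one unit per steering direction

steering = forward(rng.random((30, 32)))
print(steering.shape)                              # (30,)
```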

Neural network application ALVINN drives 70 mph on highways!

General structure of a multilayer neural network. [Figure: layered network diagram, highlighting hidden unit i.] Training a multilayer neural network means learning the weights of the inter-layer connections.

Neural network architectures. All multilayer neural network architectures have:
- at least one hidden layer
- feedforward connections from inputs to hidden layer(s) to outputs
More general architectures also allow for:
- multiple hidden layers
- recurrent connections: from a node to itself, between nodes in the same layer, or between nodes in one layer and nodes in another layer above it

Neural network architectures. [Figures: a network with recurrent connections; a network with more than one hidden layer.]

Neural networks: roles of nodes.
A node in the input layer:
- distributes the value of some component of the input vector to the nodes in the first hidden layer, without modification
A node in a hidden layer:
- forms a weighted sum of its inputs
- transforms this sum according to some activation function (also known as a transfer function)
- distributes the transformed sum to the nodes in the next layer
A node in the output layer:
- forms a weighted sum of its inputs
- (optionally) transforms this sum according to some activation function
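A minimal sketch of the hidden-node computation just described: a weighted sum of the inputs passed through an activation (transfer) function. The specific weights, bias, and the choice of tanh are made up for illustration.

```python
import numpy as np

def hidden_node_output(inputs, weights, bias, activation=np.tanh):
    """One hidden unit: form a weighted sum of the inputs,
    then transform it with an activation (transfer) function."""
    weighted_sum = np.dot(weights, inputs) + bias
    return activation(weighted_sum)       # this value is distributed to the next layer

x = np.array([0.5, -1.0, 2.0])            # values passed on unchanged by the input-layer nodes
w = np.array([0.1, 0.4, -0.3])
print(hidden_node_output(x, w, bias=0.2))
```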

Neural network activation functions

Neural network architectures. The architecture most widely used in practice is fairly simple:
- one hidden layer
- no recurrent connections (feedforward only)
- non-linear activation function in the hidden layer (usually sigmoid or tanh)
- no activation function in the output layer (summation only)
This architecture can model any bounded continuous function.
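In code, this architecture is just two matrix-vector products with a tanh in between. A minimal sketch, with layer sizes and weights invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 4, 8, 1     # illustrative sizes
W1 = rng.normal(size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_outputs, n_hidden))
b2 = np.zeros(n_outputs)

def network(x):
    """One hidden layer, feedforward only, tanh hidden activation,
    no activation (summation only) at the output."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

print(network(np.array([0.2, -0.5, 1.0, 0.3])))
```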

Neural network architectures. [Figures: network configurations for regression and for two-class classification.]

Neural network architectures. [Figure: network configuration for classification with multiple classes.]

Classification: multiple classes. When outcomes are one of k possible classes, they can be encoded using k dummy variables. If an outcome is class j, then the j-th dummy variable = 1 and all other dummy variables = 0. An example with four class labels is shown in the sketch below.
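A minimal sketch of this dummy-variable (one-hot) encoding; the specific labels below are made up to mirror the slide's four-class example.

```python
import numpy as np

def one_hot(labels, k):
    """Encode integer class labels 0..k-1 as k dummy variables:
    the j-th dummy variable is 1 for class j, all others are 0."""
    encoded = np.zeros((len(labels), k))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

y = np.array([2, 0, 3, 1])   # outcomes drawn from four possible classes
print(one_hot(y, k=4))
# [[0. 0. 1. 0.]
#  [1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 1. 0. 0.]]
```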

Algorithm for learning a neural network.
Initialize the connection weights w = (w0, w1, …, wm), where w includes all connections between all layers; usually small random values.
Adjust the weights so that the output of the neural network is consistent with the class label / dependent variable of the training samples. The typical loss function is squared error:
E(w) = (1/2) Σ_i ( y_i − f(x_i; w) )^2, where f(x_i; w) is the network output for training sample i.
Find the weights w that minimize this loss function.
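A small sketch of the squared-error loss over training samples; the target and predicted values below are invented for illustration.

```python
import numpy as np

def squared_error(y_true, y_pred):
    """E(w) = 1/2 * sum_i (y_i - f(x_i; w))^2 over the training samples."""
    return 0.5 * np.sum((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0])        # class labels / dependent variable
y_pred = np.array([0.8, 0.3, 0.6])        # network outputs for the current weights
print(squared_error(y_true, y_pred))      # 0.5 * (0.04 + 0.09 + 0.16) = 0.145
```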

Sigmoid unit

Sigmoid unit: training. We can derive gradient descent rules to train:
- a single sigmoid unit
- multilayer networks of sigmoid units, where the procedure is referred to as backpropagation
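A sketch of the gradient descent rule for a single sigmoid unit under squared-error loss, where the update is w ← w + η Σ_i (y_i − o_i) o_i (1 − o_i) x_i. The toy data, learning rate, and epoch count are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_unit(X, y, eta=0.5, epochs=2000):
    """Batch gradient descent on squared error for a single sigmoid unit:
    w <- w + eta * sum_i (y_i - o_i) * o_i * (1 - o_i) * x_i"""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = sigmoid(X @ w)                 # unit output for every training sample
        delta = (y - o) * o * (1 - o)      # per-sample error term
        w += eta * X.T @ delta             # sum the updates over the samples
    return w

# toy data: learn a logical AND; the constant-1 column acts as the bias weight w0
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
w = train_sigmoid_unit(X, y)
print(np.round(sigmoid(X @ w), 2))         # outputs move toward [0, 0, 0, 1] as training proceeds
```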

Backpropagation. Example: stochastic gradient descent for a feedforward network with two layers of sigmoid units.
Do until convergence:
For each training sample i = ⟨ x_i, y_i ⟩:
1. Propagate the input forward through the network:
- calculate the output o_h of every hidden unit
- calculate the output o_k of every network output unit
2. Propagate the errors backward through the network:
- for each network output unit k, calculate its error term δ_k: δ_k = o_k ( 1 − o_k ) ( y_ik − o_k )
- for each hidden unit h, calculate its error term δ_h: δ_h = o_h ( 1 − o_h ) Σ_k w_hk δ_k
- update each network weight w_ba: w_ba = w_ba + η δ_b z_ba, where z_ba is the a-th input to unit b and η is the learning rate
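A minimal sketch of these updates in code, for a network with one layer of sigmoid hidden units and sigmoid output units. The XOR toy data, network size, learning rate, and number of passes are all illustrative assumptions, not part of the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_epoch(X, Y, W_ih, W_ho, eta=0.5):
    """One pass of stochastic gradient descent, following the slide:
    delta_k = o_k (1 - o_k) (y_k - o_k)
    delta_h = o_h (1 - o_h) sum_k w_hk delta_k
    w_ba   <- w_ba + eta * delta_b * z_ba"""
    for x, y in zip(X, Y):
        o_h = sigmoid(W_ih @ x)                          # forward: hidden unit outputs
        o_k = sigmoid(W_ho @ o_h)                        # forward: network outputs
        delta_k = o_k * (1 - o_k) * (y - o_k)            # backward: output error terms
        delta_h = o_h * (1 - o_h) * (W_ho.T @ delta_k)   # backward: hidden error terms
        W_ho += eta * np.outer(delta_k, o_h)             # update hidden -> output weights
        W_ih += eta * np.outer(delta_h, x)               # update input -> hidden weights
    return W_ih, W_ho

# toy XOR data; the constant-1 input column provides bias weights for the hidden units
rng = np.random.default_rng(1)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W_ih = rng.normal(scale=0.5, size=(3, 3))                # 3 hidden units, 3 inputs
W_ho = rng.normal(scale=0.5, size=(1, 3))                # 1 output unit
for _ in range(5000):
    W_ih, W_ho = backprop_epoch(X, Y, W_ih, W_ho)
print(np.round(sigmoid(W_ho @ sigmoid(W_ih @ X.T)), 2))  # trained outputs for the four inputs
```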

More on backpropagation

MATLAB interlude: matlab_demo_14.m. Neural network classification of crab gender: 200 samples, 6 features, 2 classes.

Neural networks for data compression

Convergence of backpropagation

Overfitting in neural networks Robot perception task (example 1)

Overfitting in neural networks Robot perception task (example 2)

Avoiding overfitting in neural networks

Expressiveness of multilayer neural networks

Expressiveness of multilayer neural networks. Trained two-layer network with three hidden units (tanh activation function) and one linear output unit. Blue dots: 50 data points from f(x), with x uniformly sampled over the range (-1, 1). Grey dashed curves: outputs of the three hidden units. Red curve: overall network function. Panels: f(x) = x^2 and f(x) = sin(x).

Expressiveness of multilayer neural networks. Trained two-layer network with three hidden units (tanh activation function) and one linear output unit. Blue dots: 50 data points from f(x), with x uniformly sampled over the range (-1, 1). Grey dashed curves: outputs of the three hidden units. Red curve: overall network function. Panels: f(x) = abs(x) and f(x) = H(x), the Heaviside step function.
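A rough sketch of how such a fit could be reproduced: 50 points of f(x) = x^2 on (-1, 1), a two-layer network with three tanh hidden units and a linear output unit, trained by gradient descent on squared error. The learning rate, iteration count, and initialization below are guesses, not the settings used to produce the slide's figures.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)                     # 50 points, x uniform on (-1, 1)
t = x ** 2                                          # target function f(x) = x^2

n_hidden = 3
W1 = rng.normal(size=(n_hidden, 1))                 # input -> hidden weights
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=n_hidden)                      # hidden -> output weights
b2 = 0.0
eta = 0.1

for _ in range(20000):                              # batch gradient descent on squared error
    h = np.tanh(W1 @ x[None, :] + b1[:, None])      # (3, 50) tanh hidden unit outputs
    y = w2 @ h + b2                                 # (50,) linear output unit
    err = (y - t) / len(x)                          # mean-squared-error gradient scaling
    grad_w2 = h @ err
    grad_b2 = err.sum()
    delta_h = (w2[:, None] * err[None, :]) * (1 - h ** 2)
    grad_W1 = delta_h @ x[None, :].T                # (3, 1)
    grad_b1 = delta_h.sum(axis=1)
    w2 -= eta * grad_w2
    b2 -= eta * grad_b2
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1

final = w2 @ np.tanh(W1 @ x[None, :] + b1[:, None]) + b2
print(np.mean((final - t) ** 2))                    # small if the three tanh units fit x^2 well
```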

Expressiveness of multilayer neural networks. Two-class classification problem with synthetic data. Trained two-layer network with two inputs, two hidden units (tanh activation function), and one logistic sigmoid output unit. Blue lines: z = 0.5 contours for the hidden units. Red line: y = 0.5 decision surface for the overall network. Green line: optimal decision boundary computed from the distributions used to generate the data.