Machine Learning: Supervised Learning, Classification and Regression

Machine Learning
– Supervised Learning
  – Classification and Regression
  – K-Nearest Neighbor Classification
  – Fisher’s Criteria & Linear Discriminant Analysis
  – Perceptron: Linearly Separable
  – Multilayer Perceptron & EBP & Deep Learning, RBF Network
  – Support Vector Machine
  – Ensemble Learning: Voting, Boosting (Adaboost)
– Unsupervised Learning
  – Principal Component Analysis
  – Independent Component Analysis
  – Clustering: K-means
– Semi-supervised Learning & Reinforcement Learning

Multi-Layer Perceptron (MLP)
– Each layer receives its inputs from the previous layer and forwards its outputs to the next layer: a feed-forward structure.
– Output layer: sigmoid activation function for classification, linear activation function for regression problems.
– Referred to as a two-layer network (two layers of weights).

Architecture of MLP (N-H-M): Forward Propagation
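As an illustration of the N-H-M forward pass (N inputs, H sigmoid hidden units, M outputs), here is a minimal NumPy sketch; the variable names W1, b1, W2, b2 and the example layer sizes are assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, task="classification"):
    """Forward pass of an N-H-M MLP.
    x: (N,) input, W1: (H, N), b1: (H,), W2: (M, H), b2: (M,)."""
    h = sigmoid(W1 @ x + b1)                              # hidden layer: sigmoid activations
    a = W2 @ h + b2                                       # output pre-activations
    y = sigmoid(a) if task == "classification" else a    # linear output for regression
    return h, y

# Example with N=3 inputs, H=4 hidden units, M=2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0.0, 0.1, (4, 3)), np.zeros(4)
W2, b2 = rng.normal(0.0, 0.1, (2, 4)), np.zeros(2)
h, y = forward(rng.normal(size=3), W1, b1, W2, b2)
```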

Expressiveness of MLPs: Representational Power
– An MLP with 3 layers can represent arbitrary functions:
  – Each hidden unit represents a soft threshold function in the input space.
  – Combine two opposite-facing threshold functions to make a ridge.
  – Combine two perpendicular ridges to make a bump.
  – Add bumps of various sizes and locations to fit any surface.
– Universal approximator: a two-layer (linear output) network can approximate any continuous function on a compact input domain to arbitrary accuracy, given a sufficiently large number of hidden units.

MLP Learning
– Start from an untrained network.
– Present a training example Xi to the input layer, pass the signals through the network, and determine the output at the output layer.
– Compare the output to the target values; any difference corresponds to an error.
– Adjust the weights to reduce this error on the training data.

Training of MLP: Error Back-Propagation Algorithm

The Error Backpropagation Algorithm
– Forward pass: given X, compute the hidden units and the output units.
– Compute errors: compute the error for each output unit.
– Compute output deltas.
– Compute hidden deltas.
– Compute the gradient:
  – the gradient for the hidden-to-output weights,
  – the gradient for the input-to-hidden weights.
– For each weight, take a gradient step (a single step is sketched below).
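A minimal sketch of one such step for a sigmoid MLP with squared-error loss, reusing the layer shapes from the forward-pass sketch above; the function name, the learning rate eta, and the in-place updates are our assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W1, b1, W2, b2, eta=0.1):
    """One online gradient step for a sigmoid MLP with squared-error loss.
    Updates the weight arrays in place and returns the example's error."""
    # Forward pass: hidden units, then output units
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    # Output deltas: (y_k - t_k) * y_k * (1 - y_k)
    delta_out = (y - t) * y * (1.0 - y)
    # Hidden deltas: back-propagate the output deltas through W2
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    # Gradients and gradient step for both weight layers
    W2 -= eta * np.outer(delta_out, h); b2 -= eta * delta_out
    W1 -= eta * np.outer(delta_hid, x); b1 -= eta * delta_hid
    return 0.5 * np.sum((y - t) ** 2)
```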

Error Backpropagation Training
– Create a three-layer network with N hidden units, fully connect the network, and assign small random weights.
– Until all training examples produce the correct output or the MSE ceases to decrease:
  – Begin epoch
  – For each training example:
    – compute the network output,
    – compute the error,
    – back-propagate this error from layer to layer and adjust the weights to decrease it.
  – End epoch
(A sketch of this epoch loop follows below.)
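A minimal epoch loop built on the backprop_step sketch above; the stopping tolerance tol and the data layout (X as an array of input vectors, T as an array of targets) are illustrative assumptions.

```python
def train(X, T, W1, b1, W2, b2, eta=0.1, max_epochs=1000, tol=1e-4):
    """Run epochs of backprop_step until the MSE ceases to decrease."""
    prev_mse = float("inf")
    for epoch in range(max_epochs):
        sse = 0.0
        for x, t in zip(X, T):                       # one epoch over all examples
            sse += backprop_step(x, t, W1, b1, W2, b2, eta)
        mse = sse / len(X)
        if prev_mse - mse < tol:                     # MSE has stopped decreasing
            break
        prev_mse = mse
    return W1, b1, W2, b2
```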

Remarks on Training
– No guarantee of convergence; training may oscillate or reach a local minimum.
– In practice, many large networks can be adequately trained on large amounts of data for realistic problems, e.g.:
  – driving a car,
  – recognizing handwritten zip codes,
  – playing world-championship-level Backgammon.
– Many epochs (thousands) may be needed for adequate training; large data sets may require hours or days of CPU time.
– Termination criteria can be:
  – a fixed number of epochs,
  – a threshold on training set error,
  – increased error on a validation set.
– To avoid local minima, run several trials starting from different initial random weights and:
  – take the result with the best training or validation performance, or
  – build a committee of networks that vote during testing, possibly weighting each vote by training or validation accuracy.

Notes on Proper Initialization
– Start in the "linear" regions: keep all weights near zero, so that all sigmoid units are in their linear regions. Otherwise nodes can be initialized into the flat regions of the sigmoid, causing very small gradients.
– Break symmetry:
  – If we start with all weights equal, what will happen? Every hidden unit then computes the same function and receives the same update, so the units never differentiate.
  – Ensure that each hidden unit has different input weights so that the hidden units move in different directions.
– Set each weight to a random number in a small range scaled by the fan-in, e.g. (-1/sqrt(fan-in), +1/sqrt(fan-in)), where "fan-in" is the number of inputs to the unit.
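A small sketch of fan-in-scaled random initialization, assuming the ±1/sqrt(fan-in) range suggested above; the layer sizes are illustrative.

```python
import numpy as np

def init_weights(n_in, n_out, rng=np.random.default_rng()):
    """Weights uniform in (-1/sqrt(fan_in), +1/sqrt(fan_in)); biases start at zero."""
    limit = 1.0 / np.sqrt(n_in)          # fan-in = number of inputs to each unit
    W = rng.uniform(-limit, limit, (n_out, n_in))
    return W, np.zeros(n_out)

W1, b1 = init_weights(3, 4)   # input-to-hidden layer
W2, b2 = init_weights(4, 2)   # hidden-to-output layer
```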

Batch, Online, and Online with Momentum
– Batch: sum the gradient over every example i, then take one gradient-descent step.
– Online: take a gradient-descent step as each input is processed (this is the algorithm described above).
– Momentum: make the (t+1)-th update depend on the t-th update, Δw(t+1) = αΔw(t) − η∇E. α is called the momentum factor and typically takes values in the range [0.7, 0.95]. This tends to keep the weights moving in the same direction and improves convergence.
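A sketch of the momentum update for one weight matrix, assuming the Δw(t+1) = αΔw(t) − η∇E form given above; grad_W and the hyperparameter values are illustrative.

```python
def momentum_step(W, grad_W, velocity, eta=0.1, alpha=0.9):
    """Momentum update: delta_w(t+1) = alpha * delta_w(t) - eta * gradient."""
    velocity = alpha * velocity - eta * grad_W   # keeps the weights moving in the same direction
    return W + velocity, velocity

# Usage: keep a `velocity` array (initialized to zeros) per weight matrix and
# carry it from one update to the next.
```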

Overtraining Prevention
– Running too many epochs may overtrain the network and result in overfitting: the network tries too hard to exactly match the training data.
– Keep a validation set and test accuracy after every epoch. Maintain the weights of the best-performing network on the validation set and return them when performance decreases significantly beyond that point.
– To avoid losing training data to validation:
  – Use 10-fold cross-validation to determine the average number of epochs that optimizes validation performance (cross-validation is discussed later in the course).
  – Train on the full data set for that many epochs to produce the final result.
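A sketch of this early-stopping scheme, reusing forward and backprop_step from the earlier sketches; the params packing, the patience counter, and the hyperparameter values are our assumptions.

```python
import copy
import numpy as np

def val_mse(params, X_val, T_val):
    """Mean squared error on the validation set (uses forward from the sketch above)."""
    W1, b1, W2, b2 = params
    return np.mean([np.sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2) / 2.0
                    for x, t in zip(X_val, T_val)])

def train_early_stopping(params, X_tr, T_tr, X_val, T_val,
                         eta=0.1, max_epochs=1000, patience=20):
    """Return the weights that performed best on the validation set, stopping
    once validation error has not improved for `patience` epochs."""
    best_err, best_params, bad = float("inf"), copy.deepcopy(params), 0
    for epoch in range(max_epochs):
        for x, t in zip(X_tr, T_tr):
            backprop_step(x, t, *params, eta=eta)    # one epoch of online updates
        err = val_mse(params, X_val, T_val)
        if err < best_err:
            best_err, best_params, bad = err, copy.deepcopy(params), 0
        else:
            bad += 1
            if bad >= patience:                      # validation error keeps rising: stop
                break
    return best_params
```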

Over-fitting Prevention
– Too few hidden units prevent the system from adequately fitting the data and learning the concept.
– Too many hidden units lead to over-fitting.
– A validation set or cross-validation can also be used to decide on an appropriate number of hidden units.
– Another approach to preventing over-fitting is weight decay: multiply all weights by some fraction between 0 and 1 after each epoch.
  – Encourages smaller weights and less complex hypotheses.
  – Equivalent to including an additive penalty in the error function proportional to the sum of the squares of the network's weights.
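A one-line sketch of the multiplicative weight-decay step described above; the decay factor 0.999 is an illustrative choice.

```python
def weight_decay(weights, decay=0.999):
    """Shrink every weight after each epoch; equivalent to an L2 penalty on the error."""
    return [decay * W for W in weights]

# e.g. W1, W2 = weight_decay([W1, W2])   # applied once per epoch, after the updates
```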

Example

Learning Time Series Data: Time-Delay Neural Networks

Learning Time Series Data: Recurrent Neural Networks
– Recurrent connections from units to themselves, or to the previous layer.
– These connections act as a short-term memory.

Deep Rectifier Networks In order to efficiently represent symmetric/antisymmetric behavior in the data, a rectifier network would need twice as many hidden units as a network of symmetric/antisymmetric activation functions.

Radial Basis Function Networks: locally-tuned units (a common form is sketched below).
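A sketch of one locally-tuned unit, assuming the commonly used Gaussian form with center m_h and spread s_h; the names are ours, not from the slides.

```python
import numpy as np

def rbf_unit(x, m_h, s_h):
    """Gaussian locally-tuned unit: the response peaks at the center m_h and
    falls off with squared distance from it, scaled by the spread s_h."""
    return np.exp(-np.sum((x - m_h) ** 2) / (2.0 * s_h ** 2))
```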

Local vs. Distributed Representation

Training RBF Network Parameters
– Hybrid learning:
  – first layer: centers and spreads (unsupervised, k-means),
  – second layer: weights (supervised, gradient descent).
  (A sketch of this hybrid scheme follows below.)
– Fully supervised:
  – similar to backpropagation in an MLP: gradient descent on all parameters.
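A sketch of the hybrid scheme, assuming the Gaussian units above; scikit-learn's KMeans places the centers, a single shared spread stands in for the per-unit spreads, and the linear output weights are fit by gradient descent on the squared error. The function names and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def design_matrix(X, centers, spread):
    """Gaussian RBF activations: one row per example, one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spread ** 2))

def train_rbf_hybrid(X, T, n_centers=10, spread=1.0, eta=0.01, epochs=200):
    # 1) Unsupervised: place the centers with k-means on the inputs.
    centers = KMeans(n_clusters=n_centers, n_init=10).fit(X).cluster_centers_
    Phi = design_matrix(X, centers, spread)
    # 2) Supervised: gradient descent on the linear output weights.
    W = np.zeros((n_centers, T.shape[1]))
    for _ in range(epochs):
        Y = Phi @ W                           # network outputs for all examples
        W -= eta * Phi.T @ (Y - T) / len(X)   # squared-error gradient step
    return centers, W
```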

RBF Network: Fully Supervised Method
Similar to backpropagation in an MLP: gradient descent is applied to all parameters (centers, spreads, and output weights).