Lecture 08 Classification-based Learning


Lecture 08 Classification-based Learning
Topics: Basics, Decision Trees, Multi-Layered Perceptrons, Applications

Basics
Machine Learning: inductive learning, deductive learning, abductive learning (reasoning), reinforcement learning, collaborative learning.
Classification: given a set of examples, each labelled as belonging to a specific class, learn how to classify new examples. Classification is a form of supervised learning.

Basics
Decision Trees: a symbolic representation of a reasoning process; a decision tree describes a data set by a tree-like structure.
Multi-Layered Perceptrons (MLP): a subsymbolic representation architecture with multiple layers of perceptrons; inference is parallel, and learning proceeds by generalizing patterns to approximate functions.

Decision Trees Example

Decision Trees
Data representation: attribute-based language.
Attribute list and two examples:
A-list: [Gender Age Blood Smoking Caffeine HT?]
Ex1: [Male 50-59 high ≥1pack ≥3cups high]
Ex2: [Female 50-59 low ≥1pack ≥3cups normal]
where Blood: blood pressure, Caffeine: caffeine intake, HT: hypertension (the class label).

Decision Trees
Knowledge representation: tree-like structure.
The tree always starts from the root node and grows down by splitting the data at each level into new nodes according to some predictor (attribute, feature). The root node contains the entire data set (all data records), and child nodes hold respective subsets of that set.
A split in a decision tree corresponds to the predictor with the maximum separating power. The best split creates nodes in which a single class dominates.
Two of the best-known measures of a predictor's separating power are the Gini coefficient and entropy.

Decision Trees
The Gini coefficient is calculated as the area between the Lorenz curve and the diagonal, divided by the whole area below the diagonal, i.e., Gini = A/(A + B), where A is the area between the Lorenz curve and the diagonal and B is the area under the Lorenz curve. The Gini coefficient ranges from 0 (perfect equality) to 1 (perfect inequality). When splitting, we look for the predictor yielding the largest Gini coefficient.
Brown's formula (with $X_k$ the cumulative share of the total population, $Y_k$ the cumulative share of the class, and $X_0 = Y_0 = 0$):
$G = \left| 1 - \sum_{k} (X_k - X_{k-1})(Y_k + Y_{k-1}) \right|$

Decision Trees Selecting an optimal tree with Gini splitting

Decision Trees
Calculation of the Gini coefficient for Class A (7 leaf nodes forming the Lorenz curve)
Instances of Class A per leaf, sorted in non-increasing order: 59, 23, 11, 4, 2, 1, 0
Corresponding cumulative percentages for Class A: 59/100, 82/100, 93/100, 97/100, 99/100, 100/100, 100/100
Corresponding cumulative percentages for the total population: 60/150, 86/150, 97/150, 102/150, 105/150, 114/150, 150/150
G(A) = |1 − [60/150 · 59/100 + (86−60)/150 · (59+82)/100 + (97−86)/150 · (82+93)/100 + (102−97)/150 · (93+97)/100 + (105−102)/150 · (97+99)/100 + (114−105)/150 · (99+100)/100 + (150−114)/150 · (100+100)/100]| = 0.311
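Not from the slides: a short Python sketch that reproduces this calculation with Brown's formula. The helper name gini_brown is ours; the leaf counts are the example's values.

```python
def gini_brown(class_counts, node_totals):
    """Gini coefficient via Brown's formula.

    class_counts: instances of the target class in each leaf,
                  sorted in non-increasing order.
    node_totals:  total instances in the corresponding leaves.
    """
    total_class = sum(class_counts)
    total_pop = sum(node_totals)
    g, x_prev, y_prev = 1.0, 0.0, 0.0
    for c, t in zip(class_counts, node_totals):
        x = x_prev + t / total_pop          # cumulative population share X_k
        y = y_prev + c / total_class        # cumulative class share Y_k
        g -= (x - x_prev) * (y + y_prev)    # trapezoid term of Brown's formula
        x_prev, y_prev = x, y
    return abs(g)

# Example from the slide: 7 leaves, 100 Class A instances, 150 instances in total
class_a = [59, 23, 11, 4, 2, 1, 0]
totals  = [60, 26, 11, 5, 3, 9, 36]
print(round(gini_brown(class_a, totals), 3))   # 0.311
```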

Decision Trees Gain chart of Class A

Decision Trees Selecting an optimal tree with random splitting

Decision Trees
Extracting rules from decision trees: the path from the root node to a bottom leaf reveals a decision rule. For example, the rule associated with the bottom-right leaf of the Gini-split tree in the figure can be represented as follows:
if (Predictor 1 = no) and (Predictor 4 = no) and (Predictor 6 = no) then class = Class A

Decision Trees
Entropy
Node A contains n classes c_i, i = 1, …, n, each with probability p(c_i).
Entropy of node A: $H(A) = -\sum_{i=1}^{n} p(c_i) \log_2 p(c_i)$
Entropy of the child nodes of A produced by some split (CN_i: child node i; p(CN_i): probability of falling into CN_i): $H_{split}(A) = \sum_{i} p(CN_i)\, H(CN_i)$
Gain of the split at node A: $Gain(A) = H(A) - H_{split}(A)$
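A minimal Python sketch of these formulas (the helper names entropy and split_gain and the example counts are ours):

```python
import math

def entropy(counts):
    """Entropy of a node, given the instance count of each class in it."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_gain(parent_counts, children_counts):
    """Information gain of a split: parent entropy minus the
    probability-weighted entropy of the child nodes."""
    total = sum(parent_counts)
    children_entropy = sum(
        (sum(child) / total) * entropy(child) for child in children_counts
    )
    return entropy(parent_counts) - children_entropy

# Hypothetical node with two classes, split into two children
print(split_gain([30, 30], [[25, 5], [5, 25]]))  # ≈ 0.35
```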

Decision Trees
Select the split with the largest gain.
When to stop splitting? Calculate the deviation D of the split from a purely random split:
$D = \sum_{k} \left[ \frac{(p_k - p'_k)^2}{p'_k} + \frac{(n_k - n'_k)^2}{n'_k} \right]$
where $p'_k$ and $n'_k$ are the numbers of instances of class p and class n expected in child node k if the instances were distributed randomly.
Null hypothesis: under random splitting, the deviation D is distributed according to the χ² distribution with n − 1 degrees of freedom.

Decision Trees
χ² distribution: critical values χ²(r, p), where r is the degrees of freedom and p is the p-value.
Stop splitting if D ≤ χ²(r, p) for the chosen significance level p.

r \ p   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01   0.005  0.0025  0.001  0.0005
1       1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14    10.83  12.12
2       2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21   10.60  11.98   13.82  15.20
3       4.11   4.64   5.32   6.25   7.81   9.35   9.84   11.34  12.84  14.32   16.27  17.73
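Not from the slides: a sketch of the stopping test using SciPy's χ² quantile function in place of the printed table. The function name keep_split and the default 5% significance level are our own choices.

```python
from scipy.stats import chi2

def keep_split(deviation, degrees_of_freedom, p_value=0.05):
    """Keep the split only if its deviation D exceeds the chi-square
    critical value; otherwise stop splitting (the split looks random)."""
    critical = chi2.ppf(1.0 - p_value, degrees_of_freedom)
    return deviation > critical

print(chi2.ppf(0.95, 1))    # ≈ 3.84, matching the table entry for r = 1, p = 0.05
print(keep_split(6.2, 1))   # True: the deviation is significant at the 5% level
```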

Decision Trees
The main advantage of the decision-tree approach to classification is that it visualises the solution: it is easy to follow any path through the tree. The relationships learned by a decision tree can be expressed as a set of rules, which can then be used in developing an intelligent system.

Decision Trees
Limitations:
Data preprocessing: continuous data, such as age or income, have to be grouped into ranges, which can unwittingly hide important patterns, and missing or inconsistent data have to be restored or resolved.
Inability to examine more than one variable at a time: this confines trees to problems that can be solved by dividing the solution space into several successive rectangles.

Multi-Layered Perceptrons Example

Multi-Layered Perceptrons Emulating biological neural network

Multi-Layered Perceptrons
Data representation: coded attribute-based language. Each input node takes one attribute (feature).
Coded attribute list and two examples:
A-list: [Gender Age Blood Smoking Caffeine HT?]
Ex1: [0 0.5 1 1 1 1]
Ex2: [1 0.5 0 1 1 0]
Coding:
Gender (categorical): 0: male; 1: female
Age (continuous): age/100 (or 0.1: <10 yrs; 0.2: 20~29; …; 0.9: 90~99; 1.0: >99)
Blood (blood pressure): 0: low; 1: high
Smoking: cigarettes/40 (or 0: 0 cigarettes; 0.1: <10 cigarettes; 0.5: 10~19; 1: ≥1 pack)
Caffeine (caffeine intake, continuous): cups/3 (or 0: 0 cups; 0.5: 1~2 cups; 1.0: ≥3 cups)
HT (hypertension, class label): 0: normal; 1: high
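Not from the slides: a hypothetical helper showing how one record could be coded this way before being fed to the network. The slide's binned alternatives would give slightly different values (e.g. Age 0.5 rather than 0.55).

```python
def encode_record(gender, age, blood, cigarettes_per_day, cups_per_day):
    """Encode one patient record into [0, 1] inputs following the coding above."""
    return [
        0.0 if gender == "male" else 1.0,        # Gender
        min(age / 100.0, 1.0),                   # Age (age/100 coding)
        0.0 if blood == "low" else 1.0,          # Blood pressure
        min(cigarettes_per_day / 40.0, 1.0),     # Smoking (cigarettes/40 coding)
        min(cups_per_day / 3.0, 1.0),            # Caffeine intake (cups/3 coding)
    ]

# Roughly Ex1 from the slide: male, in his fifties, high blood pressure, heavy smoker, ≥3 cups
print(encode_record("male", 55, "high", 40, 3))  # [0.0, 0.55, 1.0, 1.0, 1.0]
```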

Multi-Layered Perceptrons Knowledge representation: weight, neuron, and network structure

Multi-Layered Perceptrons
Neuron structure: a neuron computes the weighted sum of its input signals and compares the result with a threshold value θ. If the net input is less than the threshold, the neuron output is −1; if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1.
The neuron uses the following transfer (activation) function:
$X = \sum_{i=1}^{n} x_i w_i, \qquad Y(X) = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}$
Y(X) is called the sign function, $Y^{sign}$.

Multi-Layered Perceptrons Sample activation functions
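The figure itself is not reproduced in this transcript; below is a small sketch of the activation functions usually shown on such a slide (step, sign, sigmoid, linear). The function names are our own.

```python
import math

def step(x):    return 1.0 if x >= 0 else 0.0       # hard limiter with outputs in {0, 1}
def sign(x):    return 1.0 if x >= 0 else -1.0      # hard limiter with outputs in {-1, +1}
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))   # smooth, differentiable, in (0, 1)
def linear(x):  return x                            # identity

print(sign(0.3), sigmoid(0.3))   # 1.0 0.574...
```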

MLP - Perceptrons
Network structure: perceptrons.
A perceptron is the simplest form of a neural network, consisting of a single neuron with adjustable synaptic weights and a hard limiter.
A single-layer two-input perceptron.

MLP - Perceptrons
The aim of the perceptron is to classify inputs x1, x2, …, xn into one of two classes, say Y = A1 or Y = A2. In the case of an elementary perceptron, the n-dimensional input space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:
$\sum_{i=1}^{n} x_i w_i - \theta = 0$

MLP - Perceptrons
cf. a line ax + by = c in two dimensions, and a plane ax + by + cz = d in three.

MLP - Perceptrons
A perceptron learns its classification task by making small adjustments to the weights to reduce the difference (error) between the desired and actual outputs of the perceptron. The initial weights are randomly assigned, usually in a small range, and then updated to obtain outputs consistent with the training examples. If the error is positive, we need to increase the perceptron output; if it is negative, we need to decrease it.

MLP - Perceptron learning algorithm
Step 1: Initialization
Set the initial weights w1, w2, …, wn and the threshold θ to random numbers in the range [−0.5, 0.5].

MLP - Perceptron learning algorithm
Step 2: Activation
(a) Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:
$Y(p) = \mathrm{step}\left[ \sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta \right]$
where n is the number of perceptron inputs.
(b) Calculate the output error:
$e(p) = Y_d(p) - Y(p)$

MLP - Perceptron learning algorithm
Step 3: Weight training
Calculate the weight correction at iteration p, Δwi(p), using the delta rule:
$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$
where α is the learning rate. Update the weights of the perceptron:
$w_i(p+1) = w_i(p) + \Delta w_i(p)$
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
Epoch: one epoch completes the weight adjustment over the whole set of training examples.
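Not from the slides: a compact Python sketch of Steps 1-4, assuming a step activation with outputs 0/1, a learning rate of 0.1, and the logical AND function as a stand-in training set; the function name train_perceptron is ours.

```python
import random

def step(x):
    return 1.0 if x >= 0 else 0.0

def train_perceptron(examples, n_inputs, alpha=0.1, max_epochs=100):
    """Perceptron learning: random initialization, then repeated
    activation, error calculation, and delta-rule weight updates."""
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        converged = True
        for inputs, desired in examples:       # one epoch = one pass over all examples
            actual = step(sum(x * w for x, w in zip(inputs, weights)) - theta)
            error = desired - actual
            if error != 0:
                converged = False
                for i in range(n_inputs):      # delta rule
                    weights[i] += alpha * inputs[i] * error
                theta -= alpha * error         # threshold treated as a weight on input -1
        if converged:
            break
    return weights, theta

# Logical AND: linearly separable, so a single perceptron can learn it
random.seed(0)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, t = train_perceptron(data, n_inputs=2)
print([step(sum(x * wi for x, wi in zip(xs, w)) - t) for xs, _ in data])
# [0.0, 0.0, 0.0, 1.0] once training has converged
```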

Multi-Layered Perceptrons
Linearly inseparable problems cannot be solved by a single perceptron.
Increasing the number of layers: from perceptrons to MLPs.
An MLP is a feedforward neural network with one or more hidden layers. The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons. The input signals are propagated in a forward direction on a layer-by-layer basis.

Multi-Layered Perceptrons MLP with two hidden layers

Multi-Layered Perceptrons
Learning in an MLP proceeds the same way as for a perceptron. First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated; this procedure is the back-propagation (BP) learning algorithm.

Multi-Layered Perceptrons

MLP - BP Learning Algorithm
Step 1: Initialization
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:
$\left( -\frac{2.4}{F_i},\ +\frac{2.4}{F_i} \right)$
where Fi is the total number of inputs of neuron i in the network. The weight initialization is done on a neuron-by-neuron basis.

MLP - BP Learning Algorithm
Step 2: Activation
Activate the MLP by applying inputs x1(p), x2(p), …, xn(p) and desired outputs yd,1(p), yd,2(p), …, yd,l(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
$y_j(p) = \mathrm{sigmoid}\left[ \sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j \right]$
where j = 1, …, m.

MLP - BP Learning Algorithm
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
$y_k(p) = \mathrm{sigmoid}\left[ \sum_{j=1}^{m} y_j(p)\, w_{jk}(p) - \theta_k \right]$
where k = 1, …, l.
(c) Calculate the output errors of the neurons in the output layer:
$e_k(p) = y_{d,k}(p) - y_k(p)$

MLP - BP Learning Algorithm
Step 3: Weight training
(a) Calculate the error gradient for the neurons in the output layer:
$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\, e_k(p)$
Calculate the weight corrections by the delta rule:
$\Delta w_{jk}(p) = \alpha \cdot y_j(p) \cdot \delta_k(p)$
Update the weights at the output neurons:
$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$

MLP - BP Learning Algorithm
Step 3: Weight training (continued)
(b) Calculate the propagated errors for the neurons in the hidden layer:
$e_j(p) = \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$
Calculate the error gradient for the neurons in the hidden layer:
$\delta_j(p) = y_j(p)\,[1 - y_j(p)]\, e_j(p)$
Calculate the weight corrections by the delta rule:
$\Delta w_{ij}(p) = \alpha \cdot x_i(p) \cdot \delta_j(p)$
Update the weights at the hidden neurons:
$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$

MLP - BP Learning Algorithm Step 4: Iteration Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
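Not from the slides: a self-contained Python sketch of the BP algorithm for one hidden layer and a single output neuron, with sigmoid activations as in the slides. The variable names, the learning rate, the epoch budget, and the XOR training set are our own choices.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(data, n_in, n_hidden, alpha=0.5, epochs=10000):
    """Back-propagation for an MLP with one hidden layer and one output neuron."""
    # Step 1: initialization, uniform in (-2.4/Fi, +2.4/Fi)
    r_h, r_o = 2.4 / n_in, 2.4 / n_hidden
    w_ih = [[random.uniform(-r_h, r_h) for _ in range(n_hidden)] for _ in range(n_in)]
    th_h = [random.uniform(-r_h, r_h) for _ in range(n_hidden)]
    w_ho = [random.uniform(-r_o, r_o) for _ in range(n_hidden)]
    th_o = random.uniform(-r_o, r_o)

    for _ in range(epochs):                      # Step 4: iterate over epochs
        for x, y_d in data:
            # Step 2: activation (forward pass)
            y_h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(n_in)) - th_h[j])
                   for j in range(n_hidden)]
            y_o = sigmoid(sum(y_h[j] * w_ho[j] for j in range(n_hidden)) - th_o)
            e = y_d - y_o

            # Step 3: weight training
            d_o = y_o * (1.0 - y_o) * e                       # output-layer gradient
            d_h = [y_h[j] * (1.0 - y_h[j]) * d_o * w_ho[j]    # hidden-layer gradients
                   for j in range(n_hidden)]                  #   (error propagated back)
            for j in range(n_hidden):                         # delta-rule updates
                w_ho[j] += alpha * y_h[j] * d_o
                for i in range(n_in):
                    w_ih[i][j] += alpha * x[i] * d_h[j]
                th_h[j] += alpha * (-1.0) * d_h[j]
            th_o += alpha * (-1.0) * d_o
    return w_ih, th_h, w_ho, th_o

# XOR, the classic linearly inseparable problem (convergence depends on the random seed)
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
random.seed(1)
weights = train_bp(xor, n_in=2, n_hidden=2)
```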

Multi-Layered Perceptrons
We can accelerate training by including a momentum term in the delta rule, turning it into the generalized delta rule:
$\Delta w_{jk}(p) = \beta\, \Delta w_{jk}(p-1) + \alpha \cdot y_j(p) \cdot \delta_k(p)$
where β is a positive number (0 ≤ β ≤ 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
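Not from the slides: a one-function sketch of the generalized delta rule, with variable names of our choosing.

```python
def momentum_update(delta_w_prev, alpha, beta, y_j, grad_k):
    """Generalized delta rule: previous correction scaled by the momentum
    constant beta plus the usual delta-rule term."""
    return beta * delta_w_prev + alpha * y_j * grad_k

print(momentum_update(0.02, alpha=0.1, beta=0.95, y_j=0.8, grad_k=0.05))  # ≈ 0.023
```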

Multi-Layered Perceptrons
To accelerate convergence and yet avoid the danger of instability, we can apply two heuristics. The sum of squared errors over an epoch is used as the network performance measure:
$E = \sum_{p} \sum_{k=1}^{l} e_k^2(p)$
Heuristic 1: If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, the learning rate parameter α should be increased.
Heuristic 2: If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, the learning rate parameter α should be decreased.

Multi-Layered Perceptrons Adapting the learning rate If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated. If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
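Not from the slides: the same rule as a small Python sketch. The function name and the returned "recompute" flag are our own, while 1.04, 0.7, and 1.05 are the typical values quoted above.

```python
def adapt_learning_rate(alpha, sse_now, sse_prev, ratio=1.04, down=0.7, up=1.05):
    """Decrease the learning rate (and signal that the weight update should be
    redone) when the error grew too much; increase it when the error fell."""
    if sse_now > sse_prev * ratio:
        return alpha * down, True     # True: discard the new weights and recompute
    if sse_now < sse_prev:
        return alpha * up, False
    return alpha, False

print(adapt_learning_rate(0.1, sse_now=0.52, sse_prev=0.48))  # (≈ 0.07, True)
```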

Applications
Decision trees: data mining, churn model construction
Multi-Layered Perceptrons: function approximation, pattern recognition, handwriting recognition, case-based retrieval