CS 484 – Artificial Intelligence 1
Announcements
List of 5 sources for research paper
Homework 5 due Tuesday, October 30
Book Review due Tuesday, October 30

Classification problems and Machine Learning Lecture 10

CS 484 – Artificial Intelligence 3
EnjoySport concept learning task
Given
  Instances X: possible days, each described by the attributes
    Sky (with possible values Sunny, Cloudy, and Rainy)
    AirTemp (with values Warm and Cold)
    Humidity (with values Normal and High)
    Wind (with values Strong and Weak)
    Water (with values Warm and Cool), and
    Forecast (with values Same and Change)
  Hypotheses H: each hypothesis is a conjunction of constraints on the attributes; each constraint may be "?" (any value), "Ø" (no value), or a specific value
  Target concept c: EnjoySport : X → {0,1}
  Training examples D: positive and negative examples of the target function
Determine
  A hypothesis h in H such that h(x) = c(x) for all x in X
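To make the representation concrete, here is a small Python sketch (not from the slides; the tuple encoding and the names ANY, NONE, and matches are illustrative assumptions) showing how a conjunctive hypothesis classifies an instance.

```python
# A minimal sketch of the EnjoySport hypothesis representation (illustrative only).
# Each hypothesis is a 6-tuple of constraints over
# (Sky, AirTemp, Humidity, Wind, Water, Forecast).
ANY = "?"     # "?" : any value is acceptable
NONE = "Ø"    # "Ø" : no value is acceptable

def matches(hypothesis, instance):
    """Return True if the conjunctive hypothesis classifies the instance as positive."""
    for constraint, value in zip(hypothesis, instance):
        if constraint == NONE:                       # Ø rejects every value
            return False
        if constraint != ANY and constraint != value:
            return False
    return True

# Example: h = <Sunny, Warm, ?, Strong, ?, ?>
h = ("Sunny", "Warm", ANY, "Strong", ANY, ANY)
x = ("Sunny", "Warm", "High", "Strong", "Cool", "Change")
print(matches(h, x))   # True, i.e. h(x) = 1
```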

CS 484 – Artificial Intelligence 4
Find-S: Finding a Maximally Specific Hypothesis (review)
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x, then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Begin: h ← ⟨Ø, Ø, Ø, Ø, Ø, Ø⟩

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
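The Find-S loop above can be sketched in a few lines of Python. This is an illustration rather than the course implementation; the function name find_s and the ANY/NONE markers are assumptions.

```python
# Find-S sketch for conjunctive hypotheses (illustrative, not the official course code).
ANY, NONE = "?", "Ø"

def find_s(examples):
    """examples: list of (instance_tuple, label) pairs, label True for positive."""
    n = len(examples[0][0])
    h = [NONE] * n                       # most specific hypothesis <Ø, ..., Ø>
    for x, positive in examples:
        if not positive:
            continue                     # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] == NONE:             # first positive example: copy its values
                h[i] = value
            elif h[i] != value:          # mismatch: generalize this constraint to "?"
                h[i] = ANY
    return tuple(h)

training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(training))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```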

CS 484 – Artificial Intelligence 5
Candidate Elimination
Candidate Elimination derives a description of the set of all hypotheses consistent with the training data (the version space), taking the negative examples into account as well as the positive ones.
It maintains two boundary sets, initialized to
  G: {⟨?, ?, ?, ?, ?, ?⟩}   (maximally general)
  S: {⟨Ø, Ø, Ø, Ø, Ø, Ø⟩}   (maximally specific)

CS 484 – Artificial Intelligence 6
Candidate-Elimination Learning Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
  If d is a positive example
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is not consistent with d
      Remove s from S
      Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is not consistent with d
      Remove g from G
      Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
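The following Python sketch shows one way the boundary-set bookkeeping above can be implemented for conjunctive hypotheses. It is illustrative only: the helper names, the set-based representation of S and G, and the explicit attribute-domain argument are my own assumptions, and edge cases are simplified.

```python
# Candidate-Elimination sketch for conjunctive hypotheses (illustrative only).
ANY, NONE = "?", "Ø"

def matches(h, x):
    """True if conjunctive hypothesis h classifies instance x as positive."""
    return all(c != NONE and (c == ANY or c == v) for c, v in zip(h, x))

def consistent(h, x, positive):
    """h is consistent with example (x, positive) if it classifies x the same way."""
    return matches(h, x) == positive

def more_general_or_equal(h1, h2):
    """True if h1 is at least as general as h2, attribute by attribute."""
    return all(c1 == ANY or c1 == c2 or c2 == NONE for c1, c2 in zip(h1, h2))

def min_generalizations(s, x):
    """The (unique) minimal generalization of s that covers positive example x."""
    return [tuple(v if c in (NONE, v) else ANY for c, v in zip(s, x))]

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude negative example x."""
    result = []
    for i, c in enumerate(g):
        if c == ANY:
            for value in domains[i]:
                if value != x[i]:
                    h = list(g)
                    h[i] = value
                    result.append(tuple(h))
    return result

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {tuple([NONE] * n)}        # maximally specific boundary
    G = {tuple([ANY] * n)}         # maximally general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if consistent(g, x, True)}
            for s in list(S):
                if not consistent(s, x, True):
                    S.remove(s)
                    for h in min_generalizations(s, x):
                        if any(more_general_or_equal(g, h) for g in G):
                            S.add(h)
            S = {s1 for s1 in S
                 if not any(s1 != s2 and more_general_or_equal(s1, s2) for s2 in S)}
        else:
            S = {s for s in S if consistent(s, x, False)}
            for g in list(G):
                if not consistent(g, x, False):
                    G.remove(g)
                    for h in min_specializations(g, x, domains):
                        if any(more_general_or_equal(h, s) for s in S):
                            G.add(h)
            G = {g1 for g1 in G
                 if not any(g1 != g2 and more_general_or_equal(g2, g1) for g2 in G)}
    return S, G
```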

CS 484 – Artificial Intelligence 7
Example
E1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, positive
E2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, positive
G0 ← {⟨?, ?, ?, ?, ?, ?⟩}
G1 ← {⟨?, ?, ?, ?, ?, ?⟩}
G2 ← {⟨?, ?, ?, ?, ?, ?⟩}
S0 ← {⟨Ø, Ø, Ø, Ø, Ø, Ø⟩}
S1 ← {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
S2 ← {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}

CS 484 – Artificial Intelligence 8
Example (cont. 2)
E3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, negative
G2 ← {⟨?, ?, ?, ?, ?, ?⟩}
G3 ← {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
S2 ← {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
S3 ← {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}

CS 484 – Artificial Intelligence 9
Example (cont. 3)
E4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, positive
G3 ← {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
G4 ← {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
S3 ← {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
S4 ← {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
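Assuming the candidate_elimination sketch given after the algorithm slide, the four training examples reproduce exactly this trace; the variable names and domain lists below are illustrative.

```python
# Running the earlier candidate_elimination sketch on the four EnjoySport examples
# (this assumes the functions defined in that sketch are in scope).
domains = [
    ("Sunny", "Cloudy", "Rainy"),      # Sky
    ("Warm", "Cold"),                  # AirTemp
    ("Normal", "High"),                # Humidity
    ("Strong", "Weak"),                # Wind
    ("Warm", "Cool"),                  # Water
    ("Same", "Change"),                # Forecast
]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples, domains)
# S == {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G == {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```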

CS 484 – Artificial Intelligence 10
Decision Tree Learning
Has two major benefits over Find-S and Candidate Elimination
  Can cope with noisy data
  Capable of learning disjunctive expressions
Limitation
  There may be many valid decision trees for the given training data; the learner prefers small trees over large trees
Applies to a broad range of learning tasks
  Classify medical patients by their disease
  Classify equipment malfunctions by their cause
  Classify loan applicants by their likelihood of defaulting on payments

CS 484 – Artificial Intelligence 11
Decision Tree Example: days on which to play tennis
Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rain → Wind
    Strong → No
    Weak → Yes

CS 484 – Artificial Intelligence 12
Decision Tree Induction (1)
Decision tree induction involves creating, from a set of training data, a decision tree that can be used to correctly classify that training data. ID3 is an example of a decision tree learning algorithm. ID3 builds the decision tree from the top down, at each stage selecting the attribute of the training data that provides the most information.

CS 484 – Artificial Intelligence 13
Decision Tree Induction (2)
ID3 selects attributes based on information gain. Information gain is the reduction in entropy caused by splitting on an attribute.
Entropy is defined as:
  H(S) = −p1 log2(p1) − p0 log2(p0)
  p1 is the proportion of the training examples that are positive
  p0 is the proportion that are negative
Intuition about H(S)
  Zero (minimum value) when all the examples are the same (all positive or all negative)
  One (maximum value) when half are positive and half are negative
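A small Python sketch of the two-class entropy above (illustrative; the function name is an assumption):

```python
# Two-class entropy H(S), with 0*log2(0) taken as 0 (illustrative sketch).
from math import log2

def entropy(num_pos, num_neg):
    """H(S) = -p1*log2(p1) - p0*log2(p0)."""
    total = num_pos + num_neg
    h = 0.0
    for count in (num_pos, num_neg):
        p = count / total
        if p > 0:                      # 0 log2 0 is defined as 0
            h -= p * log2(p)
    return h

print(f"{entropy(9, 5):.3f}")   # 0.940  (the full PlayTennis data set)
print(f"{entropy(2, 3):.3f}")   # 0.971  (the Outlook = Sunny subset)
print(f"{entropy(4, 0):.3f}")   # 0.000  (the Outlook = Overcast subset)
```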

CS 484 – Artificial Intelligence 14
Example – Training Data
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

CS 484 – Artificial Intelligence 15
Calculate Information Gain
Initial entropy (all examples in one collection)
  9 positive examples, 5 negative examples
  H(init) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Calculate the entropy for each value of an attribute, then combine them as a weighted sum
Entropy of "Outlook"
  Sunny: 5 examples, 2 positive, 3 negative
    H(Sunny) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.971
  Overcast: 4 examples, 4 positive, 0 negative
    H(Overcast) = −1 log2(1) − 0 log2(0) = 0   (0 log2 0 is defined as 0)
  Rain: 5 examples, 3 positive, 2 negative
    H(Rain) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.971
  H(Outlook) = 0.357·(0.971) + 0.286·(0) + 0.357·(0.971) = 0.694
Information gain = initial entropy − weighted entropy after the split
  Gain(Outlook) = 0.940 − 0.694 = 0.246
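The calculation can be checked with a short script. This sketch is my own illustration; the tuple encoding of the table and the function names are assumptions.

```python
# Information gain over the PlayTennis data (illustrative sketch).
from collections import Counter
from math import log2

# (Outlook, Temperature, Humidity, Wind) -> PlayTennis, copied from the table above.
DATA = [
    (("Sunny",    "Hot",  "High",   "Weak"),   "No"),
    (("Sunny",    "Hot",  "High",   "Strong"), "No"),
    (("Overcast", "Hot",  "High",   "Weak"),   "Yes"),
    (("Rain",     "Mild", "High",   "Weak"),   "Yes"),
    (("Rain",     "Cool", "Normal", "Weak"),   "Yes"),
    (("Rain",     "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny",    "Mild", "High",   "Weak"),   "No"),
    (("Sunny",    "Cool", "Normal", "Weak"),   "Yes"),
    (("Rain",     "Mild", "Normal", "Weak"),   "Yes"),
    (("Sunny",    "Mild", "Normal", "Strong"), "Yes"),
    (("Overcast", "Mild", "High",   "Strong"), "Yes"),
    (("Overcast", "Hot",  "Normal", "Weak"),   "Yes"),
    (("Rain",     "Mild", "High",   "Strong"), "No"),
]
ATTRS = ("Outlook", "Temperature", "Humidity", "Wind")

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain(examples, attr_index):
    """Initial entropy minus the weighted entropy after splitting on one attribute."""
    before = entropy([label for _, label in examples])
    after = 0.0
    for v in {x[attr_index] for x, _ in examples}:
        subset = [label for x, label in examples if x[attr_index] == v]
        after += (len(subset) / len(examples)) * entropy(subset)
    return before - after

for i, name in enumerate(ATTRS):
    print(f"{name}: {gain(DATA, i):.3f}")
# Outlook: 0.247, Temperature: 0.029, Humidity: 0.152, Wind: 0.048
# (the slide's 0.246 and 0.151 come from rounding the intermediate entropies first)
```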

CS 484 – Artificial Intelligence 16
Maximize Information Gain
Gain of each attribute
  Gain(Outlook) = 0.246
  Gain(Humidity) = 0.151
  Gain(Wind) = 0.048
  Gain(Temperature) = 0.029
Outlook is chosen as the root: {D1, D2, …, D14} [9+,5−]
  Sunny → {D1, D2, D8, D9, D11} [2+,3−] → ?
  Overcast → {D3, D7, D12, D13} [4+,0−] → Yes
  Rain → {D4, D5, D6, D10, D14} [3+,2−] → ?
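Putting the pieces together, here is a compact recursive ID3 sketch. It is illustrative rather than the course code; the names are my own, and the commented demo at the end refers to the DATA and ATTRS defined in the previous snippet.

```python
# Recursive ID3 sketch for discrete attributes (illustrative only).
from collections import Counter
from math import log2

def partition(examples, i):
    """Group (instance, label) pairs by the value of attribute i."""
    parts = {}
    for x, label in examples:
        parts.setdefault(x[i], []).append((x, label))
    return parts

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(examples, i):
    before = entropy([label for _, label in examples])
    after = sum(
        (len(sub) / len(examples)) * entropy([label for _, label in sub])
        for sub in partition(examples, i).values()
    )
    return before - after

def id3(examples, attr_indices):
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # pure node: return the class label
        return labels[0]
    if not attr_indices:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attr_indices, key=lambda i: gain(examples, i))
    children = {
        value: id3(subset, [i for i in attr_indices if i != best])
        for value, subset in partition(examples, best).items()
    }
    return (best, children)                   # internal node: (attribute index, branches)

# Example (using DATA and ATTRS from the previous snippet):
# tree = id3(DATA, list(range(len(ATTRS))))
# The root of `tree` is attribute 0, i.e. Outlook, as expected.
```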

CS 484 – Artificial Intelligence 17
Unbiased Learner
Provide a hypothesis space capable of representing every teachable concept
  Every possible subset of the instances X (the power set of X)
How large is this space?
  For EnjoySport, there are 96 instances in X
  The power set has 2^|X| members, so EnjoySport has 2^96 ≈ 10^28 distinct target concepts
Allows disjunctions, conjunctions, and negations
But the learner can no longer generalize beyond the observed examples
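A quick arithmetic check of those counts (illustrative only):

```python
# Number of distinct EnjoySport instances and of possible target concepts.
values_per_attribute = [3, 2, 2, 2, 2, 2]   # Sky, AirTemp, Humidity, Wind, Water, Forecast
num_instances = 1
for v in values_per_attribute:
    num_instances *= v
print(num_instances)        # 96
print(2 ** num_instances)   # 79228162514264337593543950336, roughly 10**28
```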

CS 484 – Artificial Intelligence 18
Inductive Bias
All learning methods have an inductive bias.
The inductive bias of a learning method is the set of assumptions and restrictions it imposes on the hypotheses it will consider.
Without an inductive bias, a learning method could not generalize.
A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.

CS 484 – Artificial Intelligence 19
Bias in Learning Algorithms
Rote-Learner: stores every training example. If a new instance is found in memory, the stored classification is returned; otherwise the system refuses to classify the new instance. (It has no inductive bias.)
Find-S: finds the most specific hypothesis consistent with the training examples, then uses this hypothesis to classify all subsequent instances. (Its bias: the target concept is contained in H, and instances are classified negative unless the current hypothesis implies otherwise.)

CS 484 – Artificial Intelligence 20
Candidate-Elimination Bias
Candidate-Elimination will converge to the true target concept provided it is given accurate training examples and its initial hypothesis space contains the true target concept
Its hypotheses only consider conjunctions of attribute values
  Cannot represent "Sky = Sunny or Sky = Cloudy"
What if the target concept is not contained in the hypothesis space?

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

CS 484 – Artificial Intelligence 21
Bias of ID3
Chooses the first acceptable tree it encounters in its simple-to-complex, hill-climbing search
  Favors shorter trees over longer ones
  Selects trees that place the attributes with the highest information gain closest to the root
The interaction between the attribute-selection heuristic and the training examples makes it difficult to characterize its bias precisely

CS 484 – Artificial Intelligence 22
ID3 vs. Candidate Elimination
The two differ in the type of inductive bias and in the hypothesis space searched
ID3 searches a complete hypothesis space (one that can represent any discrete-valued target function) but searches it incompletely
  Its inductive bias is a consequence of the ordering of hypotheses by its search strategy
Candidate-Elimination searches an incomplete hypothesis space but searches that space completely
  Its inductive bias is a consequence of the expressive power of its hypothesis representation

CS 484 – Artificial Intelligence 23
Why Prefer Short Hypotheses?
Occam's razor
  Prefer the simplest hypothesis that fits the data
Applying Occam's razor
  There are fewer short hypotheses than long ones, so it is less likely that a short hypothesis will coincidentally fit the training data
  A 5-node tree that fits the data is less likely to be a statistical coincidence than a 500-node tree, so we prefer the 5-node hypothesis
Problems with this argument
  By the same argument, you could put many more specific qualifications on the class of trees you prefer. Would that be better?
  The size of a hypothesis is determined by the particular representation used internally by the learner
Don't reject Occam's razor altogether
  Evolution will tend to create internal representations that make the learning algorithm's inductive bias a self-fulfilling prophecy, simply because it can alter the representation more easily than it can alter the learning algorithm

CS 484 – Artificial Intelligence 24
The Problem of Overfitting
Black dots represent positive examples, white dots negative. The two lines represent two different hypotheses. In the first diagram there are just a few items of training data, correctly classified by the hypothesis represented by the darker line. In the second and third diagrams we see the complete set of data, and the simpler hypothesis, which matched the training data less well, matches the rest of the data better than the more complex hypothesis, which overfits.

CS 484 – Artificial Intelligence 25
The Nearest Neighbor Algorithm (1)
This is an example of instance-based learning. Instance-based learning involves storing the training data and using it to classify new data as it arrives. The nearest neighbor algorithm works with data that consists of vectors of numeric attributes; each vector represents a point in n-dimensional space.

CS 484 – Artificial Intelligence 26
The Nearest Neighbor Algorithm (2)
When an unseen data item is to be classified, the Euclidean distance is calculated between this item and all training data. The distance between ⟨x1, …, xn⟩ and ⟨y1, …, yn⟩ is:
  d(x, y) = sqrt((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)
The classification for the unseen data item is usually the class that is most common amongst its k nearest neighbors.
Shepard's method allows all training data to contribute to the classification, with each contribution weighted in inverse proportion to its distance from the item being classified.
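A minimal sketch of both classification rules, assuming numeric feature vectors. The function names, the choice of k, and the use of 1/d weights for the Shepard-style rule (inverse squared distance is also common) are my own illustrative choices.

```python
# k-nearest-neighbor and distance-weighted (Shepard-style) classification sketch.
from collections import Counter, defaultdict
from math import dist   # Euclidean distance, Python 3.8+

def knn_classify(training, query, k=3):
    """training: list of (vector, label); returns the majority label of the k nearest points."""
    neighbors = sorted(training, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def shepard_classify(training, query):
    """Every training point votes, weighted in inverse proportion to its distance."""
    votes = defaultdict(float)
    for vector, label in training:
        d = dist(vector, query)
        if d == 0.0:
            return label              # exact match: return its label directly
        votes[label] += 1.0 / d
    return max(votes, key=votes.get)

training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]
print(knn_classify(training, (1.1, 1.0), k=3))   # "A"
print(shepard_classify(training, (4.0, 4.0)))    # "B"
```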