Learning CPSC 386 Artificial Intelligence Ellen Walker Hiram College

What is learning?
A process that changes a system to enable it to do the same task, or tasks drawn from the same population, more efficiently the next time (improving performance).
Examples (in increasing abstraction):
– Rote learning
– Performance enhancement (problem solving)
– Classification
– Knowledge acquisition

Designing a Learning Agent
– Which components of the performance element are to be learned?
– What feedback is available to learn this?
– What representation is used?

Symbolic vs. Non-Symbolic Learning
If you "open the system up" after it has learned, can the knowledge be easily expressed?
– Symbolic learning uses accessible internal representations.
– Non-symbolic learning uses inaccessible internal representations.

Learning Examples Classified

Inductive Learning
Given a set of examples (x, y), where x is an input and y is the corresponding output, learn a function y = f(x) that:
– Returns correct results for all (x, y) pairs in the training set of examples
– Generalizes well: returns correct results for x values not in the training set

Ockham's Razor
If two functions fit the data, pick the simpler one.
There is an inevitable tradeoff between the complexity of the hypothesis function and its degree of fit to the data.

Decision Trees
Each node is a question; each leaf is a decision.
[Figure: example tree with internal questions "hair?", "legs?", and "Pet?" leading to the leaves snake, frog, Cat, and Lion.]

Learning Decision Trees from Examples
Silly example: should I buy this car?
1. red VW (foreign, small, red) YES
2. green Cadillac (domestic, large, green) NO
3. blue Subaru (foreign, small, blue) YES
4. blue Mercedes (foreign, large, blue) NO
5. red Saturn (domestic, small, red) YES
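
To make the later slides concrete, here is one possible encoding of these five examples in Common Lisp (the lecture's Lisp flavor). The property-list layout and the attribute names :origin, :size, and :color are illustrative assumptions, not anything fixed by the slides; the later sketches build on this.

    ;; Each example is a property list; the :label key holds the
    ;; classification (YES or NO).
    (defparameter *examples*
      '((:origin foreign  :size small :color red   :label yes)
        (:origin domestic :size large :color green :label no)
        (:origin foreign  :size small :color blue  :label yes)
        (:origin foreign  :size large :color blue  :label no)
        (:origin domestic :size small :color red   :label yes)))

    (defun label-of (example)
      "The YES/NO classification attached to EXAMPLE."
      (getf example :label))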

Three Types of Learning
Supervised
– The system learns a function from examples of inputs and outputs.
– Correct outputs must be available during training.
Unsupervised
– The system learns without feedback, based on a global optimization criterion.
Reinforcement
– The system is rewarded (or punished) for its decisions.
– This is the most general kind, and models most human learning (outside of school).

Recursive Splitting
Start with one big class.
– If it contains some YES and some NO examples, choose an attribute to split on (we now have 2 recursive subproblems).
– Otherwise, we are done.
When all recursive subproblems are solved, the remaining classes will be all YES or all NO.
Each decision used for a split is a branch on the tree.

Recursive Splitting Example
Initial class:
{ (foreign, small, red, YES), (domestic, large, green, NO), (foreign, small, blue, YES), (foreign, large, blue, NO), (domestic, small, red, YES) }
Split on size:
small: { (foreign, small, red, YES), (foreign, small, blue, YES), (domestic, small, red, YES) }
large: { (domestic, large, green, NO), (foreign, large, blue, NO) }

Choosing an Attribute to Split On
We want to split on an attribute that gives us information:
– If an attribute splits the class into all-positive and all-negative subclasses, that's best!
– Otherwise, if an attribute splits the class roughly evenly, and one subclass is mostly positive while the other is mostly negative, that's pretty good.

A Formal Notation of "Best"
The goal is to maximize information gain:
gain = (number of bits of information needed before the split) − (number of bits still needed after the split)
Information for a set with p positive and n negative examples (lg is log base 2):
I(p, n) = −( (p/(p+n)) lg(p/(p+n)) + (n/(p+n)) lg(n/(p+n)) )
The "bits needed after the split" term is the sum of the information of each subclass, weighted by the fraction of items falling into it.
Example: splitting (4, 2) into (3, 0) and (1, 2), where each subclass gets half the items:
gain = I(4, 2) − (1/2) I(3, 0) − (1/2) I(1, 2)
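
The formula translates directly into code. A minimal Common Lisp sketch that reproduces the slide's worked example:

    (defun lg (x)
      "Logarithm base 2."
      (log x 2))

    (defun information (p n)
      "Bits needed to classify a set of P positive and N negative examples."
      (if (or (zerop p) (zerop n))
          0                              ; a pure set needs no more information
          (let ((total (+ p n)))
            (- (+ (* (/ p total) (lg (/ p total)))
                  (* (/ n total) (lg (/ n total))))))))

    ;; Splitting (4,2) into (3,0) and (1,2), each subclass holding
    ;; half (3 of 6) of the items:
    (- (information 4 2)
       (* 1/2 (information 3 0))
       (* 1/2 (information 1 2)))
    ;; => about 0.459 bits gained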

Updating Our Recursive Algorithm
(defun tree (examples) ...)
– If all examples are positive (or negative), return the examples.
– Else:
  – Choose the best attribute using information gain.
  – Divide the examples into sublists, one per value of that attribute.
  – Return (cons attribute (mapcar #'tree sublists)).
The result is a tree in which each element is an attribute together with a list of branches.
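
For completeness, here is one way the pseudocode might be fleshed out into runnable Common Lisp, reusing *examples*, label-of, and information from the earlier sketches. Returning the first example's label when attributes run out is a stand-in for a proper majority vote.

    (defun all-same-label-p (examples)
      "True when every example carries the same label."
      (let ((l (label-of (first examples))))
        (every (lambda (e) (eql (label-of e) l)) examples)))

    (defun partition (examples attribute)
      "Alist of (value . sub-examples), one entry per value of ATTRIBUTE."
      (let ((groups '()))
        (dolist (e examples groups)
          (let ((cell (assoc (getf e attribute) groups)))
            (if cell
                (push e (cdr cell))
                (push (list (getf e attribute) e) groups))))))

    (defun gain (examples attribute)
      "Information gain from splitting EXAMPLES on ATTRIBUTE."
      (flet ((info (set)
               (information (count 'yes set :key #'label-of)
                            (count 'no  set :key #'label-of))))
        (- (info examples)
           (loop for (nil . subset) in (partition examples attribute)
                 sum (* (/ (length subset) (length examples))
                        (info subset))))))

    (defun build-tree (examples attributes)
      (if (or (all-same-label-p examples) (null attributes))
          (label-of (first examples))    ; leaf
          (let ((best (first (sort (copy-list attributes) #'>
                                   :key (lambda (a) (gain examples a))))))
            (cons best
                  (loop for (value . subset) in (partition examples best)
                        collect (cons value
                                      (build-tree subset
                                                  (remove best attributes))))))))

    ;; (build-tree *examples* '(:origin :size :color))
    ;; => (:SIZE (LARGE . NO) (SMALL . YES))
    ;; One split on size separates the YES and NO cars completely.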

Assessing a Learning System
– Collect a large set of examples.
– Divide it into disjoint training and test sets.
– Apply the learning algorithm to the training set (only).
– Measure its performance on the test set (only).
– Repeat for different sizes of training sets.
– Repeat for different randomly selected training sets of each size.
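
A sketch of that methodology in code, continuing the Common Lisp examples. The shuffle, the classify walker, and the 60/40 split are illustrative choices rather than anything the slide prescribes, and five examples are of course far too few for a real assessment.

    (defun shuffle (list)
      "Fisher-Yates shuffle of a fresh copy of LIST."
      (let ((v (coerce list 'vector)))
        (loop for i from (1- (length v)) downto 1
              do (rotatef (aref v i) (aref v (random (1+ i)))))
        (coerce v 'list)))

    (defun classify (tree example)
      "Walk TREE down to a YES/NO leaf; NIL if EXAMPLE has an unseen value."
      (if (atom tree)
          tree
          (classify (cdr (assoc (getf example (car tree)) (cdr tree)))
                    example)))

    (defun accuracy (tree test-set)
      "Fraction of TEST-SET classified correctly by TREE."
      (/ (count-if (lambda (e) (eql (classify tree e) (label-of e)))
                   test-set)
         (length test-set)))

    ;; Train on a random 60% of the data, test on the rest.
    (let* ((data    (shuffle *examples*))
           (n-train (floor (* 6/10 (length data))))
           (train   (subseq data 0 n-train))
           (test    (subseq data n-train)))
      (accuracy (build-tree train '(:origin :size :color)) test))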

Learning Curve
[Figure: learning curve plotting % correct on the y-axis against training set size (% of total) on the x-axis.]

Learning Depends on Training
If the test set is not a random sample of the overall set of examples, strange results can occur!
– What if the test set contains only small cars and the training set only large cars?
If the overall set of examples doesn't "cover the space," the wrong concept will be learned.
– The classic tank-and-weather anecdote: a classifier trained to spot tanks instead learned to tell sunny photos from cloudy ones.

Overfitting is Bad
An algorithm is fully trained when it classifies every training case perfectly.
But what if every leaf is a set with only one element?
– The training set is perfectly classified...
– ...but each element of the test set creates a new category: we have no experience with it!
Avoid this by requiring a minimum information gain in order to split a set (see the sketch below).
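
That last point amounts to a one-line guard on the splitting step. A hedged sketch reusing the earlier functions; the 0.1-bit cutoff is an arbitrary illustrative value, not one from the lecture.

    (defun build-tree-pruned (examples attributes &optional (min-gain 0.1))
      "Like BUILD-TREE, but refuse splits gaining fewer than MIN-GAIN bits."
      (let ((best (and attributes
                       (first (sort (copy-list attributes) #'>
                                    :key (lambda (a) (gain examples a)))))))
        (if (or (all-same-label-p examples)
                (null best)
                (< (gain examples best) min-gain))
            (label-of (first examples)) ; leaf (first label standing in for majority)
            (cons best
                  (loop for (value . subset) in (partition examples best)
                        collect (cons value
                                      (build-tree-pruned subset
                                                         (remove best attributes)
                                                         min-gain)))))))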

One Example at a Time
At any given point we have a current hypothesis that explains the examples seen so far.
– A positive example that was incorrectly classified as negative extends the hypothesis until it includes the new example.
– A negative example that was incorrectly classified as positive restricts the hypothesis until it does not include the new example.

Extending and Restricting
To extend a hypothesis, "add in" the new information:
– extended hypothesis = hypothesis | positive example
To restrict a hypothesis, "subtract out" the new information:
– restricted hypothesis = hypothesis & not(negative example)
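
A minimal sketch of the "extend" half in Common Lisp, representing the min (most specific) hypothesis as a per-attribute list of allowed values. This encoding is an illustrative assumption, and the corresponding "restrict" step for the max hypothesis is omitted, since maximally general hypotheses need a richer representation. The trace on the next two slides matches what these updates produce.

    ;; Most specific hypothesis consistent with example 1 (the red VW):
    (defparameter *min-hypothesis*
      '(:origin (foreign) :size (small) :color (red)))

    (defun covers-p (hypothesis example)
      "True when every attribute value of EXAMPLE is allowed."
      (loop for (attr allowed) on hypothesis by #'cddr
            always (member (getf example attr) allowed)))

    (defun extend (hypothesis example)
      "Minimally generalize HYPOTHESIS to cover a positive EXAMPLE."
      (loop for (attr allowed) on hypothesis by #'cddr
            append (list attr (adjoin (getf example attr) allowed))))

    ;; Example 3 (the blue Subaru) widens color to (red or blue):
    ;; (extend *min-hypothesis* '(:origin foreign :size small :color blue))
    ;; => (:ORIGIN (FOREIGN) :SIZE (SMALL) :COLOR (BLUE RED))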

Candidate Elimination (Car Example)
1. red VW (foreign, small, red) YES
– Min hypothesis: all foreign, small, red things are good cars
– Max hypothesis: everything is a good car
2. green Cadillac (domestic, large, green) NO
– Min hypothesis: all foreign, small, red things are good cars
– Max hypothesis: everything foreign, or small, or not green is a good car
3. blue Subaru (foreign, small, blue) YES
– Min hypothesis: all foreign, small, (red or blue) things are good cars
– Max hypothesis: everything foreign, or small, or not green is a good car

Candidate Elimination (Car Example), cont.
4. blue Mercedes (foreign, large, blue) NO
– Min hypothesis: all foreign, small, (red or blue) things are good cars
– Max hypothesis: everything small, or (domestic and not green), or (foreign and not blue), or red is a good car
5. red Saturn (domestic, small, red) YES
– Min hypothesis: all small, (red or blue) things are good cars
– Max hypothesis: everything small, or (domestic and not green), or (foreign and not blue), or red is a good car

Version Space Learning
Consider the set of all hypotheses consistent with the examples.
– This is the "range" from the min to the max hypothesis in the preceding example.
– It is called a version space, and it is updated after each example.
This is a least-commitment algorithm:
– We take no great leaps, but make only the minimal changes required for the concept to fit the examples.

Evaluating These Algorithms
– Decision tree learning is faster, but it requires all examples to be available in advance.
– Decision trees make disjunctions easier to express.
– Both decision tree and version space learning depend heavily on having the right attributes available.
– Both are highly susceptible to noise (incorrect training examples).