Learning from Observations
Chapter 18, Spring 2004
Copyright, 1996 © Dale Carnegie & Associates, Inc.

CS 471/598 by H. Liu, slide 2
Learning agents
- Learning agents improve their behavior through diligent study of their own experiences: Acting -> Experience -> Better Acting
- We'll study how to make a learning agent learn, what is needed for learning, and some representative methods of learning from observations.

CS 471/598 by H. Liu, slide 3
A general model
What are the components of a learning agent?
- Learning element: learns and improves (Fig 2.15)
- Performance element: the agent itself, which perceives and acts
- Problem generator: suggests exploratory actions
- Critic: provides feedback on how the agent is doing
The design of a learning agent is affected by four issues: prior information, feedback, representation, and performance.

CS 471/598 by H. Liu, slide 4
What do we need?
- Components of the performance element: each component should be learnable given feedback
- Representation of the components: propositional logic, FOL, or others
- Available feedback: supervised, reinforcement, or unsupervised
- Prior knowledge: nil, some (why not all?)
Put it all together as learning some functions.

CS 471/598 by H. Liu, slide 5
Inductive learning
- Data are described by examples; an example is a pair (x, f(x)).
- Induction: given a collection of examples of f, return a function h that approximates f (data in Fig 18.3).
- Concepts about learning (explained using Fig 18.1): hypothesis, bias.
- Learning incrementally or in batch.
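To make the induction step concrete, here is a minimal sketch (the data and candidate hypotheses below are invented for illustration, not from the slides): return the first, i.e. simplest, candidate h that is consistent with every example (x, f(x)).

    # Inductive learning in miniature: pick the simplest candidate
    # hypothesis that agrees with every observed example (x, f(x)).
    examples = [(0, 0), (1, 1), (2, 4), (3, 9)]  # pairs (x, f(x))

    # Candidates ordered from simplest to most complex: a crude
    # stand-in for a bias toward simpler hypotheses.
    candidates = [
        ("identity", lambda x: x),
        ("double",   lambda x: 2 * x),
        ("square",   lambda x: x * x),
    ]

    def induce(examples, candidates):
        for name, h in candidates:
            if all(h(x) == y for x, y in examples):
                return name, h
        return None  # no consistent hypothesis in this space

    print(induce(examples, candidates)[0])  # -> "square"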

CS 471/598 by H. Liu, slide 6
Some questions about inductive learning
- Are there many forms of inductive learning? (We'll learn some.)
- Can we achieve both expressiveness and efficiency?
- How can one possibly know that one's learning algorithm has produced a theory that will correctly predict the future? If one does not, how can one say that the algorithm is any good?

CS 471/598 by H. Liu, slide 7
Learning decision trees
- A decision tree takes as input an object described by a set of properties and outputs a yes/no "decision".
- One of the simplest and yet most successful forms of learning.
- To make the decision "wait" or "not wait", we need information such as the 10 attributes of the data set in Fig 18.3 (page 654), e.g.:
  Patrons(Full) ^ WaitEstimate(0-10) ^ Hungry(N) => WillWait

CS 471/598 by H. Liu, slide 8
Let's make a decision
Where to start?

CS 471/598 by H. Liu, slide 9
Expressiveness of a DT
- Continued from slide 7: a possible DT (e.g., Fig 18.2).
- The decision tree language is essentially propositional, with each attribute test being a proposition.
- Any Boolean function can be written as a decision tree (truth tables -> DTs).
- DTs can represent many functions with much smaller trees, but not all Boolean functions compactly (e.g., parity, majority).
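As a concrete illustration of "truth tables -> DTs" (a minimal sketch, not from the slides): 2-bit parity (XOR) written as a decision tree must test every attribute on every path, which is why parity trees grow exponentially with the number of attributes.

    # XOR as a decision tree: branch on A, then on B in each subtree.
    # Every path tests every attribute, so a parity tree on n
    # attributes has 2^n leaves.
    xor_tree = {
        "attr": "A",
        0: {"attr": "B", 0: False, 1: True},
        1: {"attr": "B", 0: True,  1: False},
    }

    def classify(tree, example):
        while isinstance(tree, dict):
            tree = tree[example[tree["attr"]]]
        return tree

    assert classify(xor_tree, {"A": 1, "B": 0}) is True
    assert classify(xor_tree, {"A": 1, "B": 1}) is False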

CS 471/598 by H. Liu, slide 10
- How many different functions are in the set of all Boolean functions on n attributes?
- How do we find consistent hypotheses in the space of all possible ones?
- And which one is most likely the best?
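The answer to the first question (a standard counting argument, not spelled out on the slide): n Boolean attributes give 2^n distinct rows in a truth table, and a Boolean function may assign true or false to each row independently, so there are 2^(2^n) distinct functions. Already for n = 6 that is 2^64, roughly 1.8 * 10^19 hypotheses, which is why enumerating the space is hopeless.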

CS 471/598 by H. Liu, slide 11
Inducing DTs from examples
- Extracting a pattern (a DT) means being able to describe a large number of cases in a concise way: a consistent and concise tree.
- Apply Occam's razor: the most likely hypothesis is the simplest one that is consistent with all observations.
- How do we find the smallest DT? Examine the most important attribute first (Fig 18.4).
- Algorithm (Fig 18.5, page 658); another DT (Fig 18.6); see the sketch below.
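A compact sketch of the tree-growing loop in the spirit of Fig 18.5 (a paraphrase, not the book's exact pseudocode; choose_attribute is left as a parameter here and is filled in with information gain two slides below):

    from collections import Counter

    def majority(examples):
        """Most common classification among the examples."""
        return Counter(label for _, label in examples).most_common(1)[0][0]

    def learn_tree(examples, attrs, default, choose_attribute):
        # examples: list of (attribute-dict, label) pairs
        if not examples:
            return default
        labels = {label for _, label in examples}
        if len(labels) == 1:            # all remaining examples agree
            return labels.pop()
        if not attrs:                   # attributes exhausted: noise/ambiguity
            return majority(examples)
        best = choose_attribute(attrs, examples)
        tree = {"attr": best}
        for value in {ex[best] for ex, _ in examples}:
            subset = [(ex, l) for ex, l in examples if ex[best] == value]
            rest = [a for a in attrs if a != best]
            tree[value] = learn_tree(subset, rest, majority(examples),
                                     choose_attribute)
        return tree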

CS 471/598 by H. Liu, slide 12
Choosing the best attribute
- A computational method: information theory.
- Information: informally, the more surprise you have, the more information you have; mathematically,
  I(P(v1), ..., P(vn)) = sum_i -P(vi) log2 P(vi)
  so I(1/2, 1/2) = 1 and I(0, 1) = I(1, 0) = 0.
- Information alone can't help much to answer "what is the correct classification?".
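The information formula transcribes directly to code (a minimal sketch; the standard convention 0 * log 0 = 0 is handled by skipping zero probabilities):

    import math

    def information(*probs):
        """I(P(v1), ..., P(vn)) = sum_i -P(vi) * log2 P(vi), in bits."""
        return sum(-p * math.log2(p) for p in probs if p > 0)

    print(information(0.5, 0.5))  # -> 1.0 bit (maximum surprise)
    print(information(1.0, 0.0))  # -> 0.0 bits (no surprise)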

CS 471/598 by H. Liu, slide 13
Information gain: the difference between the original information requirement and the one remaining after testing attribute A:
  Remainder(A) = p1*I(B1) + ... + pn*I(Bn), where pi is the fraction of examples sent down branch i (so p1 + ... + pn = 1) and I(Bi) is the information of that branch's examples
  Gain(A) = I(parent set) - Remainder(A)
Pick the attribute with the largest gain.
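Continuing the sketches above (this assumes the information() helper from the previous slide and the Counter import from the slide 11 sketch; choose_attribute plugs straight into learn_tree):

    def information_of(examples):
        """Information of a set of (attrs, label) examples, from label frequencies."""
        counts = Counter(label for _, label in examples)
        total = sum(counts.values())
        return information(*(c / total for c in counts.values()))

    def gain(attr, examples):
        # Remainder(attr): branch fractions weighted by branch information.
        remainder = 0.0
        for value in {ex[attr] for ex, _ in examples}:
            subset = [(ex, l) for ex, l in examples if ex[attr] == value]
            remainder += len(subset) / len(examples) * information_of(subset)
        return information_of(examples) - remainder

    def choose_attribute(attrs, examples):
        return max(attrs, key=lambda a: gain(a, examples))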

CS 471/598 by H. Liu, slide 14
Which attribute?
Revisit the "wait" or "not wait" example using your favorite two attributes.

CS 471/598 by H. Liu, slide 15
Assessing the performance
- A fair assessment uses data the learner has not seen.
- Errors; training and test sets: divide the data into two sets, learn on the training set, test on the test set, and if necessary shuffle the data and repeat. See the sketch below.
- Learning curve: a "happy graph" (Fig 18.7).
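A minimal sketch of this protocol (learn() and accuracy() are hypothetical placeholders for any learner and any error metric):

    import random

    def evaluate(data, learn, accuracy, train_fraction=0.8, repeats=5):
        """Shuffle, split, train, test; repeat and average, per the slide."""
        scores = []
        for _ in range(repeats):
            random.shuffle(data)
            cut = int(train_fraction * len(data))
            train, test = data[:cut], data[cut:]
            h = learn(train)                  # fit only on the training set
            scores.append(accuracy(h, test))  # score only on unseen data
        return sum(scores) / len(scores)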

CS 471/598 by H. Liu, slide 16
Practical use of DT learning
- BP's use of GASOIL
- Learning to fly on a flight simulator
- An industrial-strength system: Quinlan's C4.5
- Who's the next hero?

CS 471/598 by H. Liu, slide 17
Some issues in DT applications
- Missing values
- Multivalued attributes
- Continuous-valued attributes

CS 471/598 by H. Liu, slide 18
Why does learning work?
- How can one possibly know that one's learning algorithm will correctly predict the future? How do we know that h is close enough to f without knowing f?
- Computational learning theory has provided some answers. The basic idea: because any seriously wrong h will make an incorrect prediction, it will be found out with high probability after a small number of examples. So, if h is consistent with a sufficient number of examples, it is unlikely to be seriously wrong: it is probably approximately correct (PAC).
- Stationarity assumption: the training and test sets are drawn from the same probability distribution.
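To make "a small number of examples" precise, here is the standard PAC bound for a finite hypothesis space H (a detail not on the slide): if h is consistent with m independent examples and
  m >= (1/epsilon) * (ln|H| + ln(1/delta)),
then with probability at least 1 - delta the error of h is at most epsilon. The reasoning: a hypothesis with error greater than epsilon survives one random example with probability at most 1 - epsilon, hence survives all m with probability at most (1 - epsilon)^m <= e^(-epsilon*m); a union bound over the |H| hypotheses and solving |H| * e^(-epsilon*m) <= delta gives the bound.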

CS 471/598 by H. Liu, slide 19
Summary
- Learning is essential for intelligent agents: it deals with the unknown and improves an agent's capability over time.
- All types of learning can be considered as learning an accurate representation h of a function f.
- Inductive learning: inducing h from data about f.
- Decision trees: deterministic Boolean functions.