
Lecture 01 of 42: Introduction to Machine Learning and the Version Space Formalism
Wednesday, 24 January 2008
William H. Hsu, Department of Computing and Information Sciences, Kansas State University

Reading for next class: syllabus and course intro handout; Chapters 1-2, Mitchell (at K-State Union Copy Center)

Review: Course Administration

- Course pages (KSOL) and class web page
- Instructional e-mail addresses (for Topics in AI, substitute 830 for 732): one reaches the instructor and TA (always use this to reach them); the other goes to everyone
- Instructor: William Hsu, Nichols 213
  - Contact: office phone, home phone, IM (AIM/MSN/YIM hsuwh/rizanabsith, ICQ, Google banazir)
  - Office hours: after class Mon/Wed/Fri; other times by appointment
- Graduate teaching assistant: Jing Xia
  - Office location: Nichols 213a
  - Office hours: to be announced on the class web board
- Grading policy
  - Midterm (in-class, open-book): 15%; final (take-home): 20%
  - Machine problems and problem sets (6 of 8): 30%; term project: 20%
  - Paper reviews (10 of 12): 10%; class participation (HW, Q&A): 5%

Learning Functions

Notation and definitions
- Instance: x = (x_1, x_2, ..., x_n); sometimes written x_j, 1 <= j <= m, with components x_ji, 1 <= i <= n
- Instance space X such that x ∈ X
- Data set: D = {x_1, x_2, ..., x_m}, where x_j = (x_j1, x_j2, ..., x_jn)

Clustering
- Mapping from old x = (x_1, x_2, ..., x_n) to new x' = (x'_1, x'_2, ..., x'_k), k << n
- Attributes x'_i of the new instance are not necessarily named
- Idea: project the instance space X into a lower-dimensional space X'
- Goal: keep groups of instances that are similar in X together in X'

Regression
- Idea: given independent variable x and dependent variable y = f(x), fit f
- Goal: given a new (previously unseen) x, approximate f(x)

Classification
- Similar to regression, except that f is boolean- or nominal-valued
- "Curve fitting" is figurative: the approximator may be a logical formula

Skills
- Evaluation functions
- Policies

Prototypical Concept Learning Tasks

Given
- Instances X: possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
- Target function c ≡ EnjoySport: X → {0, 1}, where X ≡ {Rainy, Sunny, Cloudy} × {Warm, Cold} × {Normal, High} × {None-Mild, Strong} × {Cool, Warm} × {Same, Change}
- Hypotheses H: conjunctions of literals, e.g., ⟨?, Cold, High, ?, ?, ?⟩
- Training examples D: positive and negative examples of the target function

Determine
- A hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
- Such h are consistent with the training data

Training examples
- Assumption: no missing attribute values in X
- Noise in the values of c (contradictory labels)?
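This attribute-vector representation maps naturally onto tuples. Below is a minimal Python sketch (not from the course materials): a hypothesis is a 6-tuple whose entries are a specific value, "?" (any value acceptable), or None (standing in for the empty constraint ∅); the helper name `satisfies` is illustrative.

```python
# Sketch: EnjoySport instances and conjunctive hypotheses as 6-tuples.
# Each hypothesis entry is a specific value, "?" (any value is acceptable),
# or None (the empty constraint ∅, which no value satisfies).

ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

def satisfies(h, x):
    """h(x) = 1 iff every attribute constraint in h is met by x."""
    return all(c == "?" or c == v for c, v in zip(h, x))

x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(satisfies(("?", "Cold", "High", "?", "?", "?"), x))     # False: AirTemp, Humidity fail
print(satisfies(("Sunny", "?", "?", "Strong", "?", "?"), x))  # True
print(satisfies((None,) * 6, x))                              # False: ∅ matches nothing
```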

The Inductive Learning Hypothesis

Fundamental assumption of inductive learning

Informal statement
- Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples
- Definitions deferred: "sufficiently large", "approximate well", "unobserved"

Later: formal statements, justification, analysis
- Different operational definitions: see Chapters 5-7, Mitchell (1997)
- Statistical: sufficient statistics
- Probabilistic: distributions over training and validation/test data
- Computational: sample complexity, confidence and error bounds

Next: how to find such a hypothesis?

Instances, Hypotheses, and the Partial Ordering Less-General-Than

Instances X
- x_1 = ⟨Sunny, Warm, High, Strong, Cool, Same⟩
- x_2 = ⟨Sunny, Warm, High, None-Mild, Warm, Same⟩

Hypotheses H
- h_1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩
- h_2 = ⟨Sunny, ?, ?, ?, Cool, ?⟩
- h_3 = ⟨Sunny, ?, ?, ?, ?, ?⟩

Ordering (specific → general): h_1 ≤_P h_3, h_2 ≤_P h_3

≤_P ≡ Less-General-Than (the corresponding sets of instances are ordered by Subset-Of); the ordering runs from False ≡ ⟨∅, ..., ∅⟩ at the bottom to True ≡ ⟨?, ..., ?⟩ at the top
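Under the tuple encoding sketched above, the ≤_P test reduces to a per-attribute comparison. A hedged sketch (function names are illustrative, not from the course):

```python
# Sketch: h_general is more-general-than-or-equal-to h_specific iff every
# constraint of h_general admits at least the values its counterpart admits.
# "?" admits everything; None (∅) admits nothing.

def constraint_subsumes(general, specific):
    if general == "?":
        return True             # "?" admits any value
    if specific is None:
        return True             # ∅ admits nothing, so any constraint subsumes it
    return general == specific  # otherwise the constraints must agree

def more_general_or_equal(h_general, h_specific):
    return all(constraint_subsumes(g, s) for g, s in zip(h_general, h_specific))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h3 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h3, h1))  # True: h1 <=P h3
print(more_general_or_equal(h1, h3))  # False: h1 constrains Wind, h3 does not
```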

Find-S Algorithm

1. Initialize h to the most specific hypothesis in H
   (H: the hypothesis space, a partially ordered set under the relation Less-Specific-Than)
2. For each positive training instance x:
   For each attribute constraint a_i in h:
   - If constraint a_i in h is satisfied by x, do nothing
   - Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
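A runnable sketch of Find-S for the conjunctive tuple representation, using the four training examples traced on the next slide (the function name `find_s` is illustrative):

```python
# Sketch: Find-S over 6-tuple hypotheses with entries in {value, "?", None (∅)}.

def find_s(examples):
    """examples: list of (instance, label) pairs; label is 1 (positive) or 0."""
    n = len(examples[0][0])
    h = [None] * n                  # most specific hypothesis: <∅, ..., ∅>
    for x, label in examples:
        if label != 1:
            continue                # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value        # first positive: adopt its attribute values
            elif h[i] != value:
                h[i] = "?"          # minimally generalize the failing constraint
    return tuple(h)

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
print(find_s(data))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```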

Hypothesis Space Search by Find-S

Instances X (with labels)
- x_1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, +
- x_2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, +
- x_3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, -
- x_4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, +

Hypotheses H (trace)
- h_0 = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
- h_1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
- h_2 = h_3 = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩
- h_4 = ⟨Sunny, Warm, ?, Strong, ?, ?⟩

Shortcomings of Find-S
- Can't tell whether it has learned the concept
- Can't tell when the training data are inconsistent
- Picks a maximally specific h (why?)
- Depending on H, there might be several!

Version Spaces

Definition: consistent hypotheses
- A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example ⟨x, c(x)⟩ in D
- Consistent(h, D) ≡ ∀⟨x, c(x)⟩ ∈ D . h(x) = c(x)

Given
- Hypothesis space H
- Data set D: a set of training examples

Definition: version space
- The version space VS_{H,D} with respect to H and D is the subset of hypotheses from H consistent with all training examples in D
- VS_{H,D} ≡ { h ∈ H | Consistent(h, D) }
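The consistency predicate transcribes directly; a brief sketch (names illustrative):

```python
# Sketch: Consistent(h, D) ≡ for every <x, c(x)> in D, h(x) = c(x).

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, examples):
    """examples: iterable of (x, label) pairs with label in {0, 1}."""
    return all(satisfies(h, x) == bool(label) for x, label in examples)

D = [(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
     (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), 0)]
print(consistent(("Sunny", "?", "?", "?", "?", "?"), D))  # True
print(consistent(("?",) * 6, D))                          # False: matches the negative
```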

List-Then-Eliminate Algorithm

1. Initialization: VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

Example version space (EnjoySport, after the four training examples above)
- S: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
- G: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
- ≤_P: less general (fewer instances)
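List-Then-Eliminate is only feasible when H is small enough to enumerate, which it is here (a later slide counts 973 semantically distinct conjunctive hypotheses). A sketch, assuming we represent all semantically empty hypotheses by a single all-None tuple:

```python
# Sketch: List-Then-Eliminate by brute-force enumeration of the conjunctive
# EnjoySport hypothesis space (972 syntactic hypotheses plus one ∅ representative).

from itertools import product

VALUES = [["Rainy", "Sunny", "Cloudy"], ["Warm", "Cold"], ["Normal", "High"],
          ["None-Mild", "Strong"], ["Cool", "Warm"], ["Same", "Change"]]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def enumerate_H():
    yield (None,) * 6                       # one representative empty hypothesis
    for h in product(*[vals + ["?"] for vals in VALUES]):
        yield h

def list_then_eliminate(examples):
    version_space = list(enumerate_H())     # 1. start with every h in H
    for x, label in examples:               # 2. drop each h with h(x) != c(x)
        version_space = [h for h in version_space
                         if satisfies(h, x) == bool(label)]
    return version_space                    # 3. output the surviving hypotheses

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
print(len(list_then_eliminate(data)))       # 6 hypotheses survive
```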

Hypothesis Spaces As Lattices

Meet semilattice
- Every pair of hypotheses h_i and h_j has a greatest lower bound (GLB) h_i ∧ h_j
- Is H a meet semilattice? Yes: the bottom element is ⟨∅, ..., ∅⟩

Join semilattice
- Every pair of hypotheses h_i and h_j has a least upper bound (LUB) h_i ∨ h_j
- Is H a join semilattice? Yes: the top element is ⟨?, ..., ?⟩

(Full) lattice
- Every pair of hypotheses has both GLB h_i ∧ h_j and LUB h_i ∨ h_j
- Both a meet semilattice and a join semilattice
- Partial ordering: Less-General-Than

Representing Version Spaces As Lattices

Definition: general (upper) boundary
- The general boundary G of version space VS_{H,D} is its set of most general members
- Most general ≡ maximal elements of VS_{H,D}: the "set of necessary conditions"

Definition: specific (lower) boundary
- The specific boundary S of version space VS_{H,D} is its set of least general members
- Most specific ≡ minimal elements of VS_{H,D}: the "set of sufficient conditions"

Version space
- VS_{H,D} ≡ consistent poset (a partially ordered subset of H)
- Every member of the version space lies between S and G
- VS_{H,D} ≡ { h ∈ H | ∃ s ∈ S, ∃ g ∈ G . s ≤_P h ≤_P g }, where ≤_P ≡ Less-General-Than
- "The version space is the set of hypotheses sandwiched between some specific s and some general g (given the data)"

Candidate Elimination Algorithm [1]

1. Initialization
   - G_0 ← most general hypothesis in H, denoted {⟨?, ?, ?, ?, ?, ?⟩}
   - S_0 ← least general hypothesis in H, denoted {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
2. For each training example d:
   If d is a positive example (Update-S)  // generalize
   - Remove from G any hypotheses inconsistent with d
   - For each hypothesis s in S that is not consistent with d:
     - Remove s from S  // "move S upwards"
     - Add to S all minimal generalizations h of s such that
       1. h is consistent with d, and
       2. some member of G is more general than h
       (these are least upper bounds, or joins, s ∨ d, in VS_{H,D})
     - Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominating elements)

Candidate Elimination Algorithm [2]

2. (continued)
   If d is a negative example (Update-G)  // specialize
   - Remove from S any hypotheses inconsistent with d
   - For each hypothesis g in G that is not consistent with d:
     - Remove g from G  // "move G downwards"
     - Add to G all minimal specializations h of g such that
       1. h is consistent with d, and
       2. some member of S is less general than h
       (these are greatest lower bounds, or meets, g ∧ d, in VS_{H,D})
     - Remove from G any hypothesis that is less general than another hypothesis in G (remove any dominated elements)
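Putting [1] and [2] together: a compact sketch for the conjunctive EnjoySport language, in which the minimal generalization of s against a positive d is unique (the attribute-wise join) and the minimal specializations of g against a negative d each pin one "?" to a value the negative instance does not have. All helper names are illustrative, not from the course.

```python
# Sketch: candidate elimination over 6-tuple conjunctive hypotheses.

VALUES = [["Rainy", "Sunny", "Cloudy"], ["Warm", "Cold"], ["Normal", "High"],
          ["None-Mild", "Strong"], ["Cool", "Warm"], ["Same", "Change"]]

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general(h1, h2):
    """h2 <=P h1: every constraint of h1 admits whatever h2 admits."""
    return all(c1 == "?" or c2 is None or c1 == c2 for c1, c2 in zip(h1, h2))

def min_generalization(s, x):
    """Least upper bound s ∨ x (unique for conjunctive hypotheses)."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x):
    """All minimal specializations of g that exclude the negative instance x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, (c, v) in enumerate(zip(g, x)) if c == "?"
            for value in VALUES[i] if value != v]

def candidate_elimination(examples):
    S = [(None,) * 6]    # S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}
    G = [("?",) * 6]     # G0 = {<?, ?, ?, ?, ?, ?>}
    for x, label in examples:
        if label:        # positive example: Update-S (generalize)
            G = [g for g in G if satisfies(g, x)]
            S = [s if satisfies(s, x) else min_generalization(s, x) for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
            S = [s for s in S                       # drop dominating elements
                 if not any(s2 != s and more_general(s, s2) for s2 in S)]
        else:            # negative example: Update-G (specialize)
            S = [s for s in S if not satisfies(s, x)]
            G = [h for g in G
                 for h in (min_specializations(g, x) if satisfies(g, x) else [g])]
            G = [g for g in G if any(more_general(g, s) for s in S)]
            G = [g for g in G                       # drop dominated elements
                 if not any(g2 != g and more_general(g2, g) for g2 in G)]
    return S, G

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), 1),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), 1),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), 0),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), 1),
]
S, G = candidate_elimination(data)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```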

Example Trace

Training examples
- d_1 = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, +
- d_2 = ⟨Sunny, Warm, High, Strong, Warm, Same⟩, +
- d_3 = ⟨Rainy, Cold, High, Strong, Warm, Change⟩, -
- d_4 = ⟨Sunny, Warm, High, Strong, Cool, Change⟩, +

Boundary sets
- S_0 = {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}; G_0 = G_1 = G_2 = {⟨?, ?, ?, ?, ?, ?⟩}
- S_1 = {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
- S_2 = S_3 = {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
- G_3 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
- S_4 = {⟨Sunny, Warm, ?, Strong, ?, ?⟩}; G_4 = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

What Next Training Example?

Active learning: what query should the learner make next?

Current boundaries
- S: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
- G: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

How should these be classified?
- ⟨Sunny, Warm, Normal, Strong, Cool, Change⟩
- ⟨Rainy, Cold, Normal, None-Mild, Warm, Same⟩
- ⟨Sunny, Warm, Normal, None-Mild, Warm, Same⟩
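The S and G boundaries answer such questions without enumerating the whole version space: if every member of S matches x, every consistent hypothesis classifies x positive; if no member of G matches x, every consistent hypothesis classifies x negative; otherwise the version space is split, and such an x makes a good next query. A sketch (names illustrative):

```python
# Sketch: classify with boundary sets only, via the version space
# representation theorem (every h in VS lies between some s and some g).

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def classify(S, G, x):
    if all(satisfies(s, x) for s in S):
        return "+"   # unanimously positive across the version space
    if not any(satisfies(g, x) for g in G):
        return "-"   # unanimously negative across the version space
    return "?"       # the version space disagrees: a useful next query

S = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
print(classify(S, G, ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))     # +
print(classify(S, G, ("Rainy", "Cold", "Normal", "None-Mild", "Warm", "Same")))    # -
print(classify(S, G, ("Sunny", "Warm", "Normal", "None-Mild", "Warm", "Same")))    # ?
```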

What Justifies This Inductive Leap?

Example: inductive generalization
- Positive example: ⟨Sunny, Warm, Normal, Strong, Cool, Change⟩
- Positive example: ⟨Sunny, Warm, Normal, None-Mild, Warm, Same⟩
- Induced S: ⟨Sunny, Warm, Normal, ?, ?, ?⟩

Why believe we can classify the unseen?
- e.g., ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
- When is there enough information (in a new case) to make a prediction?

An Unbiased Learner

Inductive bias
- Any preference for one hypothesis over another, besides consistency
- Example: H ≡ conjunctive concepts with don't-cares
- What concepts can H not express? (Hint: what are its syntactic limitations?)

Idea
- Choose an unbiased H': one that expresses every teachable concept (i.e., the power set of X)
- Recall: |A → B| = |B|^|A| (here A = X, B = {labels}, H' = A → B)
- X ≡ {Rainy, Sunny, Cloudy} × {Warm, Cold} × {Normal, High} × {None-Mild, Strong} × {Cool, Warm} × {Same, Change}; labels = {0, 1}

An exhaustive hypothesis language
- Consider: H' = disjunctions (∨), conjunctions (∧), and negations (¬) over H
- |H'| = 2^(3·2·2·2·2·2) = 2^96; |H| = 1 + (4·3·3·3·3·3) = 973

What are S and G for the hypothesis language H'?
- S ≡ the disjunction of all positive examples
- G ≡ the conjunction of all negated negative examples
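A quick sanity check of the counting argument (a sketch, not from the slides): each attribute contributes its domain size to |X|; each conjunctive constraint offers the domain values plus "?", and all hypotheses containing ∅ collapse into one semantically empty concept.

```python
# Sketch: sizes of the instance space, the conjunctive hypothesis space,
# and the unbiased power-set language for EnjoySport.

domain_sizes = [3, 2, 2, 2, 2, 2]   # Sky, AirTemp, Humidity, Wind, Water, Forecast

X = 1
for d in domain_sizes:
    X *= d                          # |X| = 3*2*2*2*2*2 = 96 instances

H = 1                               # the single all-∅ (empty) concept
prod = 1
for d in domain_sizes:
    prod *= d + 1                   # each attribute: d values or "?"
H += prod                           # |H| = 1 + 4*3*3*3*3*3 = 973

H_prime = 2 ** X                    # unbiased H' = power set of X
print(X, H, H_prime)                # 96 973 (2**96, about 7.9e28)
```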

Summary Points

Concept learning as search through H
- Hypothesis space H as a state space
- Learning: finding the correct hypothesis

General-to-specific ordering over H
- Partially ordered set: the Less-Specific-Than (More-General-Than) relation
- Upper and lower bounds in H

Version space candidate elimination algorithm
- The S and G boundaries characterize the learner's uncertainty
- The version space can be used to make predictions over unseen cases
- The learner can generate useful queries

Next lecture: when and why are inductive leaps possible?

Terminology

Hypotheses
- Classification function or classifier: a nominal-valued function f
- Regression function or regressor: an integer- or real-valued function f
- Hypothesis: a proposed function h believed to be similar to c (or f)
- Instance (unlabeled example): a tuple of the form x = (x_1, x_2, ..., x_n)
- Example (labeled instance): a tuple of the form ⟨x, c(x)⟩

The version space algorithm
- Consistent hypothesis: one that correctly predicts all observed examples
- Version space: the space of all currently consistent (or satisfiable) hypotheses
- Meet semilattice (GLB ∧), join semilattice (LUB ∨), lattice (both GLB and LUB)
- Algorithms: Find-S, List-Then-Eliminate, candidate elimination

Inductive learning
- Inductive generalization: the process of generating hypotheses that describe cases not yet observed
- Inductive learning hypothesis: the assumption that such generalization is feasible
- Inductive bias: any preference for one h over another, other than consistency with the data D