Computational Learning Theory Part 1: Preliminaries

VERSION SPACE Concept Learning by Induction

Much of human learning involves acquiring general concepts from specific training examples; this is called inductive learning. Example: the concept of a ball, learned from instances such as (red, round, small), (green, round, small) and (red, round, medium). Concepts can also be far more complicated, e.g. "situations in which I should study more to pass the exam".

Each concept can be thought of as a Boolean-valued function whose value is true for some inputs and false for all the rest (e.g. a function defined over all animals whose value is true for birds and false for every other animal). This lecture is about the problem of automatically inferring the general definition of a concept, given examples labeled as members or nonmembers of the concept. This task is called concept learning: approximating (inferring) a Boolean-valued function from examples.

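Target concept to be learned: "days on which Aldo enjoys his favorite water sport". The training examples are:

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes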

The training examples are described by the values of seven attributes. The task is to learn to predict the value of the attribute EnjoySport for an arbitrary day, based on the values of its other six attributes.

VERSION SPACE Concept Learning by Induction: Hypothesis Representation

The possible concepts are called hypotheses, and we need an appropriate representation for them. Here, let each hypothesis be a conjunction of constraints on the attribute values.

If sky = sunny ∧ temp = warm ∧ humidity = ? ∧ wind = strong ∧ water = ? ∧ forecast = same, then EnjoySport = Yes; else EnjoySport = No. Alternatively, this can be written as: {sunny, warm, ?, strong, ?, same}

For each attribute, the hypothesis specifies one of the following constraints: ? (any value is acceptable), a single specific value (only that value is acceptable), or ∅ (no value is acceptable).

If some instance (example/observation) satisfies all the constraints of a hypothesis, then the hypothesis classifies it as positive (belonging to the concept). The most general hypothesis is {?, ?, ?, ?, ?, ?}; it classifies every example as positive. The most specific hypothesis is {∅, ∅, ∅, ∅, ∅, ∅}; it classifies every example as negative.
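
A minimal Python sketch of this classification rule follows. The tuple encoding is an assumption made for illustration: the string "?" means "any value", None stands in for ∅, and any other string requires that exact value.

```python
# Hypothesis encoding (an assumption for illustration):
#   "?"        -> any value is acceptable
#   None       -> stands in for the empty constraint ∅ (no value acceptable)
#   any string -> only that exact value is acceptable
ANY, EMPTY = "?", None


def satisfies(instance, hypothesis):
    """Return True if the instance meets every constraint of the hypothesis."""
    return all(c == ANY or c == v for c, v in zip(hypothesis, instance))


day = ("Sunny", "Warm", "High", "Strong", "Cool", "Change")
most_general = (ANY,) * 6
most_specific = (EMPTY,) * 6
h = ("Sunny", "Warm", ANY, "Strong", ANY, "Same")

print(satisfies(day, most_general))   # True: every instance is positive
print(satisfies(day, most_specific))  # False: every instance is negative
print(satisfies(day, h))              # False: Forecast is Change, not Same
```

Because None never equals "?" or an attribute value, a single ∅ anywhere in the tuple is enough to make every instance fail, which is exactly the behavior of the most specific hypothesis.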

An alternative hypothesis representation could have been a disjunction of several conjunctions of constraints on the attribute values. Example: {sunny, warm, normal, strong, warm, same} ∨ {sunny, warm, high, strong, warm, same} ∨ {sunny, warm, high, strong, cool, change}

Another alternative representation could have been a conjunction of constraints on the attribute values, where each constraint may be a disjunction of values. Example: {sunny, warm, normal ∨ high, strong, warm ∨ cool, same ∨ change}

Yet another alternative representation could have incorporated negations. Example: {¬sunny, warm, ¬(normal ∨ high), ?, ?, ?}

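By selecting a hypothesis representation, we implicitly define the space of all hypotheses that the program can ever represent, and therefore can ever learn. In our example, Sky has three possible values and each of the other five attributes has two, so the instance space X contains $3 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 96$ distinct instances. Each attribute can also be constrained by ? or ∅, which gives 5 choices for Sky and 4 for each other attribute: $5 \cdot 4 \cdot 4 \cdot 4 \cdot 4 \cdot 4 = 5120$ syntactically distinct hypotheses. Since every hypothesis containing even one ∅ classifies every instance as negative, all of those collapse into a single class; the remaining hypotheses use only values and ?, so the number of semantically distinct hypotheses is $1 + 4 \cdot 3 \cdot 3 \cdot 3 \cdot 3 \cdot 3 = 973$.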
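
The three counts can be checked by brute force. This Python sketch assumes the attribute domains of Mitchell's EnjoySport example (e.g. Wind takes Strong or Weak); the slides give only the domain sizes implied by the count of 96. It uses "?" for any value and None for ∅ (an illustrative encoding), and counts semantically distinct hypotheses by grouping hypotheses that accept exactly the same set of instances.

```python
from itertools import product

# Assumed attribute domains for EnjoySport: Sky has three values and the
# other five attributes have two, which is what the count of 96 requires.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"],  # Sky
    ["Warm", "Cold"],              # AirTemp
    ["Normal", "High"],            # Humidity
    ["Strong", "Weak"],            # Wind
    ["Warm", "Cool"],              # Water
    ["Same", "Change"],            # Forecast
]
ANY, EMPTY = "?", None


def satisfies(instance, hypothesis):
    return all(c == ANY or c == v for c, v in zip(hypothesis, instance))


instances = list(product(*DOMAINS))

# Syntactic hypotheses: each attribute gets a specific value, "?" or ∅.
hypotheses = list(product(*[d + [ANY, EMPTY] for d in DOMAINS]))

# Two hypotheses are semantically the same if they accept the same
# instances, so count the distinct sets of accepted instances.
extensions = {
    frozenset(x for x in instances if satisfies(x, h)) for h in hypotheses
}

print(len(instances), len(hypotheses), len(extensions))  # 96 5120 973
```

Counting distinct extensions, rather than plugging into the closed-form $1 + 4 \cdot 3^5$, independently confirms that every hypothesis containing ∅ falls into one always-negative class.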

Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces.

VERSION SPACE Concept Learning by Induction: Search in Hypothesis Space

Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find the hypothesis that best fits the training examples.

VERSION SPACE Concept Learning by Induction: Basic Assumption

Once a hypothesis that best fits the training examples is found, we can use it to predict the class label of new examples. The basic assumption behind using this hypothesis is: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

VERSION SPACE Concept Learning by Induction: General to Specific Ordering

If we view learning as a search problem, it is natural that our study of learning algorithms will examine different strategies for searching the hypothesis space. Many algorithms for concept learning organize this search by relying on a general-to-specific ordering of hypotheses.

Example: consider h1 = {sunny, ?, ?, strong, ?, ?} and h2 = {sunny, ?, ?, ?, ?, ?}. Any instance classified positive by h1 will also be classified positive by h2, because h2 imposes fewer constraints on the instance. Hence h2 is more general than h1, and h1 is more specific than h2.
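
For conjunctive hypotheses, the more-general-than-or-equal-to relation can be tested constraint by constraint. A minimal Python sketch, using an illustrative encoding ("?" for any value, None for ∅); h3 below is a hypothetical hypothesis chosen only to show an incomparable pair.

```python
ANY, EMPTY = "?", None


def more_general_or_equal(hj, hk):
    """True if every instance that satisfies hk also satisfies hj."""
    if EMPTY in hk:
        return True  # hk accepts nothing, so any hj is at least as general
    # Otherwise compare constraint by constraint: "?" in hj accepts anything,
    # and a specific value in hj must match hk's value exactly.
    return all(a == ANY or a == b for a, b in zip(hj, hk))


def more_general(hj, hk):
    """Strict version of the ordering."""
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)


h1 = ("Sunny", ANY, ANY, "Strong", ANY, ANY)
h2 = ("Sunny", ANY, ANY, ANY, ANY, ANY)
h3 = ("Sunny", ANY, ANY, ANY, "Cool", ANY)  # hypothetical, for illustration

print(more_general(h2, h1))  # True
print(more_general(h1, h2))  # False
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))  # False False
```

The last line shows that the relation is only a partial order: h1 and this h3 each constrain an attribute the other leaves open, so neither is more general.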

Consider the three hypotheses h1, h2 and h3: neither h1 nor h3 is more general than the other, while h2 is more general than both h1 and h3. The ordering is therefore only partial; some pairs of hypotheses are incomparable.

VERSION SPACE Find-S Algorithm

How can we find a hypothesis consistent with the observed training examples? (A hypothesis is consistent with the training examples if it correctly classifies all of them.) One way is to begin with the most specific possible hypothesis, then generalize it each time it fails to cover a positive training example (i.e. classifies it as negative). The algorithm based on this method is called Find-S.

We say that a hypothesis covers a positive training example if it correctly classifies the example as positive. A positive training example is an example of the concept to be learned; similarly, a negative training example is not an example of the concept.

The nodes shown in the diagram are the possible hypotheses allowed by our hypothesis representation scheme. Note that the search is guided by the positive examples, and we consider only those hypotheses that are consistent with the positive training examples. The search moves from hypothesis to hypothesis, from the most specific to progressively more general hypotheses.

At each step, the hypothesis is generalized only as far as necessary to cover the new positive example. Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to that point; hence the name Find-S.

Note that the algorithm simply ignores every negative example. However, since at each step the current hypothesis is the most specific one consistent with the positive examples, it will never cover (falsely classify as positive) any negative example; in other words, it will always be consistent with every negative training example. This guarantee holds only if the data are noise-free and the true concept can be described in our hypothesis representation.
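
Find-S can be sketched in a few lines of Python. The data are the four EnjoySport training examples from Chapter 2 of Mitchell's Machine Learning; the encoding ("?" for any value, None for ∅) is an illustrative assumption.

```python
ANY, EMPTY = "?", None

# EnjoySport training examples: (instance, is_positive).
# Attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast.
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def find_s(examples, n_attrs=6):
    h = [EMPTY] * n_attrs  # start from the most specific hypothesis
    for x, positive in examples:
        if not positive:
            continue  # Find-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is EMPTY:
                h[i] = value  # first positive example: copy its value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY  # conflicting value: minimally relax to "?"
    return tuple(h)


print(find_s(EXAMPLES))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The negative example (Rainy, Cold, ...) is never consulted, yet the returned hypothesis rejects it, as the argument above predicts for noise-free data.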

VERSION SPACE Definition: Version Space

The version space is the set of hypotheses consistent with the training examples of a problem. The Find-S algorithm finds one hypothesis in the version space, but there may be others.

VERSION SPACE List-then-Eliminate Algorithm

This algorithm first initializes the version space to contain all possible hypotheses, then eliminates any hypothesis found to be inconsistent with any training example. The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.

For the EnjoySport data, we can list all 973 semantically distinct hypotheses and then test each one to see whether it is consistent with our training data.

For this data, we are left with the following six hypotheses:
h1 = {Sunny, Warm, ?, Strong, ?, ?}
h2 = {Sunny, ?, ?, Strong, ?, ?}
h3 = {Sunny, Warm, ?, ?, ?, ?}
h4 = {?, Warm, ?, Strong, ?, ?}
h5 = {Sunny, ?, ?, ?, ?, ?}
h6 = {?, Warm, ?, ?, ?, ?}
Note that the Find-S algorithm finds only h1.

If insufficient data is available to narrow the version space to a single hypothesis, the algorithm can output the entire set of hypotheses consistent with the observed data. This has the advantage of guaranteeing that every hypothesis consistent with the training data is output. Unfortunately, it requires an exhaustive listing of all hypotheses, which is an unrealistic requirement for practical problems.
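
A brute-force List-Then-Eliminate sketch for the EnjoySport data follows, assuming Mitchell's attribute domains and the four training examples from his Chapter 2, with "?" as an illustrative encoding of "any value".

```python
from itertools import product

ANY = "?"
# Assumed domains: Sky, AirTemp, Humidity, Wind, Water, Forecast.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def satisfies(x, h):
    return all(c == ANY or c == v for c, v in zip(h, x))


def list_then_eliminate(examples):
    # Enumerate every ∅-free conjunctive hypothesis (972 of them). The one
    # remaining semantic class, "always negative", is skipped: it cannot be
    # consistent with data containing a positive example, as here.
    candidates = product(*[d + [ANY] for d in DOMAINS])
    # Keep exactly the hypotheses that classify every example correctly.
    return [h for h in candidates
            if all(satisfies(x, h) == label for x, label in examples)]


version_space = list_then_eliminate(EXAMPLES)
print(len(version_space))  # 6
for h in version_space:
    print(h)
```

Even in this tiny space every candidate is checked against every example, which illustrates why the exhaustive listing does not scale.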

VERSION SPACE Candidate Elimination Algorithm

Instead of listing all the members of the version space, the Candidate Elimination algorithm employs a much more compact representation: the version space is represented by its most general (maximally general) and most specific (maximally specific) members. These members form the general (G) and specific (S) boundary sets that delimit the version space; every other member of the version space lies between these boundaries.

It begins by initializing the version space to the set of all hypotheses, that is, by initializing the G and S boundary sets to {?, ?, …, ?} and {∅, ∅, …, ∅} respectively. As each training example is considered, the S boundary is generalized and the G boundary is specialized, so as to eliminate from the version space any hypotheses found inconsistent with the new training example.

VERSION SPACE Candidate Elimination Algorithm: Example

Initially: S0 = {∅, ∅, ∅, ∅, ∅, ∅}, G0 = {?, ?, ?, ?, ?, ?}

After example 1 (positive): S1 = {sunny, warm, normal, strong, warm, same}, G1 = G0 = {?, ?, ?, ?, ?, ?}

After example 2 (positive): S2 = {sunny, warm, ?, strong, warm, same}, G2 = G1 = G0 = {?, ?, ?, ?, ?, ?}

After example 3 (negative): S3 = S2 = {sunny, warm, ?, strong, warm, same}; G3 contains {sunny, ?, ?, ?, ?, ?}, {?, warm, ?, ?, ?, ?} and {?, ?, ?, ?, ?, same}

After example 4 (positive): S4 = {sunny, warm, ?, strong, ?, ?}; G4 contains {sunny, ?, ?, ?, ?, ?} and {?, warm, ?, ?, ?, ?}. The member {?, ?, ?, ?, ?, same} of G3 is removed because it does not cover example 4, whose forecast is change.
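
This trace can be reproduced with a compact Candidate Elimination sketch for conjunctive hypotheses. It is a minimal illustration, not an optimized implementation: S and G are plain Python sets of tuples, "?" and None encode "any value" and ∅ (an assumption), and the domains and data are assumed from Mitchell's Chapter 2.

```python
ANY, EMPTY = "?", None

# Assumed domains: Sky, AirTemp, Humidity, Wind, Water, Forecast.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def satisfies(x, h):
    return all(c == ANY or c == v for c, v in zip(h, x))


def geq(hj, hk):
    """hj is more general than or equal to hk."""
    return EMPTY in hk or all(a == ANY or a == b for a, b in zip(hj, hk))


def min_generalization(s, x):
    # Fill ∅ with x's value and relax any conflicting value to "?".
    return tuple(v if c is EMPTY or c == v else ANY for c, v in zip(s, x))


def min_specializations(g, x):
    # Replace one "?" in g by any value that x does not have.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == ANY
            for v in DOMAINS[i] if v != x[i]]


def candidate_elimination(examples):
    S = {(EMPTY,) * len(DOMAINS)}
    G = {(ANY,) * len(DOMAINS)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if satisfies(x, g)}  # drop g that miss x
            new_S = set()
            for s in S:
                if satisfies(x, s):
                    new_S.add(s)
                    continue
                h = min_generalization(s, x)
                if any(geq(g, h) for g in G):  # h must stay below G
                    new_S.add(h)
            # keep only maximally specific members
            S = {s for s in new_S if not any(s != t and geq(s, t) for t in new_S)}
        else:
            S = {s for s in S if not satisfies(x, s)}  # drop s that cover x
            new_G = set()
            for g in G:
                if not satisfies(x, g):
                    new_G.add(g)  # g already rejects x
                    continue
                for h in min_specializations(g, x):
                    if any(geq(h, s) for s in S):  # h must stay above S
                        new_G.add(h)
            # keep only maximally general members
            G = {g for g in new_G if not any(g != t and geq(t, g) for t in new_G)}
    return S, G


S, G = candidate_elimination(EXAMPLES)
print(S)          # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(sorted(G))  # [('?', 'Warm', '?', '?', '?', '?'), ('Sunny', '?', '?', '?', '?', '?')]
```

Calling candidate_elimination(EXAMPLES[:3]) instead yields the S3 and G3 sets of the trace, since the function simply stops after the third example.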

VERSION SPACE Candidate Elimination Algorithm: Noisy Data

Suppose the 2nd example is presented as negative: S0 = {∅, ∅, ∅, ∅, ∅, ∅}; S1 = S2 = {sunny, warm, normal, strong, warm, same}; G0 = G1 = {?, ?, ?, ?, ?, ?}; G2 = {?, ?, normal, ?, ?, ?}, the only minimal specialization of G1 that excludes example 2 while remaining more general than S2.

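The 3rd (negative) example is already rejected by G2, so G3 = G2 = {?, ?, normal, ?, ?, ?}. When the 4th (positive) example arrives, its humidity is high, so the only general hypothesis is wiped out and G4 becomes empty; S4 becomes empty as well, because its generalization no longer lies below any member of G. The version space collapses, signaling that no hypothesis in the representation is consistent with the (noisy) data.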
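
A brute-force check confirms the collapse. This sketch enumerates conjunctive hypotheses instead of running Candidate Elimination; the mislabeled data set is constructed here for illustration by flipping the label of example 2, and the domains and "?" encoding are assumptions.

```python
from itertools import product

ANY = "?"
# Assumed domains: Sky, AirTemp, Humidity, Wind, Water, Forecast.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]
# EnjoySport examples with example 2 deliberately mislabeled as negative.
NOISY = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), False),  # noise
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def satisfies(x, h):
    return all(c == ANY or c == v for c, v in zip(h, x))


def consistent_hypotheses(examples):
    # ∅-free conjunctive hypotheses suffice: the data contain a positive.
    return [h for h in product(*[d + [ANY] for d in DOMAINS])
            if all(satisfies(x, h) == label for x, label in examples)]


print(len(consistent_hypotheses(NOISY[:3])))  # 32: humidity forced to Normal
print(len(consistent_hypotheses(NOISY)))      # 0: the version space is empty
```

The 32 survivors after three examples are exactly the hypotheses between S3 = {sunny, warm, normal, strong, warm, same} and G3 = {?, ?, normal, ?, ?, ?}; example 4 eliminates all of them.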

VERSION SPACE Candidate Elimination Algorithm: Is the target concept present in the hypothesis representation?

The algorithm will fail if the target concept cannot be described in the hypothesis representation. Our hypothesis representation is a conjunction of attribute values. Example:

Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny   Warm     Normal    Strong  Cool   Change    Yes
Cloudy  Warm     Normal    Strong  Cool   Change    Yes
Rainy   Warm     Normal    Strong  Cool   Change    No

Our representation is unable to represent disjunctive target concepts such as sky = sunny ∨ sky = cloudy. The most specific conjunctive hypothesis covering both positive examples is {?, warm, normal, strong, cool, change}, and it also covers the negative third example, so no conjunctive hypothesis is consistent with these data.
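
The argument can be checked mechanically. The sketch below computes the most specific conjunctive hypothesis covering the two positive rows (the same minimal generalization that Find-S performs) and tests it on the negative row; the "?" encoding is an assumption made for illustration.

```python
ANY = "?"

# The three rows from the example: (instance, is_positive).
# Attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast.
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Rainy", "Warm", "Normal", "Strong", "Cool", "Change"), False),
]


def satisfies(x, h):
    return all(c == ANY or c == v for c, v in zip(h, x))


def most_specific_cover(positives):
    # Start from the first positive and relax every disagreement to "?".
    h = list(positives[0])
    for x in positives[1:]:
        h = [c if c == v else ANY for c, v in zip(h, x)]
    return tuple(h)


positives = [x for x, label in EXAMPLES if label]
h = most_specific_cover(positives)
print(h)                             # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(satisfies(EXAMPLES[2][0], h))  # True: the negative row is covered
```

Every conjunctive hypothesis that covers both positives is at least as general as this one, so each of them also covers the negative row; that is why the version space is empty here.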

The obvious solution to the problem is to provide a hypothesis space capable of representing every teachable concept (by allowing arbitrary disjunctions, conjunctions and negations of attributes and hypotheses to form new hypotheses). However, this raises another problem: the algorithm becomes unable to generalize beyond the training instances. Example: let there be three positive instances x1, x2 and x3 and two negative instances x4 and x5. The S boundary then becomes {x1 ∨ x2 ∨ x3} and the G boundary becomes {¬(x4 ∨ x5)}, so the only instances the version space classifies unambiguously are the training instances themselves.
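
To see the problem concretely, here is a small sketch of an unbiased hypothesis space in which every subset of the instance space is a hypothesis. The three-Boolean-attribute instance space and the choice of which instances play the roles of x1 to x5 are hypothetical, picked only to keep the enumeration small (256 hypotheses).

```python
from itertools import combinations, product

# Hypothetical toy instance space: three Boolean attributes -> 8 instances.
X = list(product([0, 1], repeat=3))

# Unbiased hypothesis space: every subset of X is a hypothesis (2**8 = 256).
H = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

# Hypothetical training data: x1..x3 positive, x4 and x5 negative.
pos, neg = X[:3], X[3:5]
version_space = [h for h in H
                 if all(x in h for x in pos) and not any(x in h for x in neg)]

# S is the disjunction of the positives; G is everything except the negatives.
S = min(version_space, key=len)
G = max(version_space, key=len)

for x in X[5:]:  # the three unseen instances
    votes = sum(x in h for h in version_space)
    print(x, votes, "of", len(version_space))  # 4 of 8 for each unseen instance
```

Each unseen instance is labeled positive by exactly half of the version space, so neither a unanimous decision nor a majority vote can classify it.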

VERSION SPACE Reference

Sections – 2.6 of T. Mitchell, Machine Learning (1997)