Implementation of Learning Systems

Implementation of Learning Systems Building a learning system involves: Formulating the Problem, Engineering the Representation, Collecting and Preparing Data, Evaluating the Learned Knowledge, and Gaining User Acceptance of the Learning System.

Inductive Examples A toddler applies inductive principles to generalize and categorize different objects from the repeated samples that he or she sees over time. For instance: everything that is round and bounces is a ball.

Practicality of these General Techniques These general techniques, although very powerful, are not practical for general problem solving on their own. They are used as components in more practical learning algorithms. For instance, induction is at the heart of many learning algorithms.

Concept Learning A concept is merely a function, which we don't know yet. We do have some of the inputs and their corresponding outputs. From these input-output pairs we try to find the generic function that generated them.

Concept GOOD STUDENT Two attributes define a student: Grade and Class Participation. The learner acquires examples:
Student (GOOD STUDENT): Grade (High) ^ Class Participation (High)
Student (GOOD STUDENT): Grade (High) ^ Class Participation (Low)
Student (NOT GOOD STUDENT): Grade (Low) ^ Class Participation (High)
Student (NOT GOOD STUDENT): Grade (Low) ^ Class Participation (Low)
Final Rule for Good Student: Student (GOOD STUDENT): Grade (High) ^ Class Participation (?)
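
As a minimal illustration (a sketch, not part of the original slides), the examples above can be written as attribute tuples and checked against the final rule; the encoding is my own:

```python
# Sketch: the GOOD STUDENT examples as (Grade, Class Participation, label) tuples.
examples = [
    ("High", "High", True),   # GOOD STUDENT
    ("High", "Low",  True),   # GOOD STUDENT
    ("Low",  "High", False),  # NOT GOOD STUDENT
    ("Low",  "Low",  False),  # NOT GOOD STUDENT
]

def good_student(grade, participation):
    # Final rule from the slide: Grade (High) ^ Class Participation (?)
    # '?' means the value of Class Participation does not matter.
    return grade == "High"

# The learned rule is consistent with every training example.
assert all(good_student(g, p) == label for g, p, label in examples)
print("Rule 'Grade = High AND Class Participation = ?' fits all examples")
```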

SICK (SK) Attributes: Temperature (T), with values Low (L), Normal (N), High (H); and Blood Pressure (BP), with values Low (L), Normal (N), High (H).

Instance Space (X) The instance space X contains all nine combinations of T and BP; the value of SK is not yet known for any of them:

X    T    BP    SK
x1   L    L     -
x2   L    N     -
x3   L    H     -
x4   N    L     -
x5   N    N     -
x6   N    H     -
x7   H    L     -
x8   H    N     -
x9   H    H     -

These tables will be used extensively in the next slides and in the next lectures, so be careful with them.

Concept as a Function The solution to any problem is a function that converts its inputs to corresponding outputs. A concept is merely a function, which we don't know yet. We do have some of the inputs and their corresponding outputs. From these input-output pairs we try to find the generic function that generates these results.

Concept Space (C) One possible concept for SICK can be enumerated as a table that assigns an output of 0 or 1 to each of the nine instances x1 through x9 (for instance, a concept that assigns 1 to x3 and 0 to the rest).

Concept Space But there are many other possibilities besides this one. The question is: how many total concepts can be generated out of this given situation? The answer is 2^|X|. Here that is 2^9 = 512, since |X| = 9.

Concept Space In short the true concept SK is a function defined over the attributes T and BP, such that it gives a 0 or a 1 as output for each of the 9 instances xi belonging to Instance Space X.

Concept Space For any arbitrary concept C, its outputs can also be arranged as a 3 x 3 grid, one cell per instance:

C(x3)  C(x6)  C(x9)
C(x2)  C(x5)  C(x8)
C(x1)  C(x4)  C(x7)

Concept Space Since we don't know the true concept yet, there are 2^9 = 512 candidate concepts C1, C2, C3, ..., C512, each producing a different combination of outputs over the nine instances.
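
A tiny sketch (not from the slides) that enumerates the instance space and confirms the count of candidate concepts; the instance ordering is an assumption:

```python
from itertools import product

# The nine instances: every combination of T and BP.
T_values = ["L", "N", "H"]
BP_values = ["L", "N", "H"]
instances = list(product(T_values, BP_values))
assert len(instances) == 9

# A "concept" is any assignment of 0/1 to the nine instances,
# so the concept space contains 2^|X| candidate concepts.
num_concepts = 2 ** len(instances)
print(num_concepts)  # 512
```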

So what is a Concept? A concept is nothing more than a function whose independent variables are the attributes, in this case T and BP. Maybe the true concept is some complicated arrangement of conjunctions and disjunctions, such as: C = < (T = H AND BP = H) OR (T = N AND BP = H) OR (T = H AND BP = N) >

Instance Space (X) Recall the instance space:

X    T    BP    SK
x1   L    L     -
x2   L    N     -
x3   L    H     -
x4   N    L     -
x5   N    N     -
x6   N    H     -
x7   H    L     -
x8   H    N     -
x9   H    H     -

Concept Space (C) One possible concept for SICK can be enumerated as a table that assigns an output of 0 or 1 to each of the nine instances x1 through x9 (for instance, a concept that assigns 1 to x3 and 0 to the rest).

Concept Space For any arbitrary concept C, the 3 x 3 grid is another format for representing the output corresponding to each instance; each C(xi) is the output of the concept C for that particular instance:

C(x3)  C(x6)  C(x9)
C(x2)  C(x5)  C(x8)
C(x1)  C(x4)  C(x7)

Concept Space With two attributes of 3 possible values each, there are 9 instances and hence 2^9 = 512 unique combinations, or functions, C1, C2, C3, ..., C512.

Training Data Set (D) The training set D is a small table of labelled instances, each row giving T, BP, and the observed SK value; for example, a first row x1 with T = N, BP = L, SK = 1, followed by rows for x2 and x3.

Hypothesis Space (H) The learner has to adopt some restriction (a search bias) that reduces the size of the concept space. This reduced concept space becomes the hypothesis space.

Hypothesis Space For example, the most common bias is one that uses the AND relationship between the attributes. In other words, the hypothesis space uses conjunctions (AND) of the attributes T and BP, i.e. h = <T, BP>.

Hypothesis Space H denotes the hypothesis space; here it is the space of conjunctions of the attributes T and BP. Written in English, a hypothesis h = <t, bp> means: IF Temperature = t AND Blood Pressure = bp THEN h = 1, OTHERWISE h = 0. In other words, there is a hypothesis for each conjunction of T and BP values, e.g. H and H, H and L, H and N, etc.

Hypothesis Space h = <H, H> (that is, <temp, bp>): in the 3 x 3 grid over T and BP, only the cell with T = H and BP = H is 1; every other cell is 0.

Hypothesis Space h = <L, L>: only the cell with T = L and BP = L is 1; every other cell is 0. Notice that this is the C2 that we discussed earlier in the Concept Space section.

Hypothesis Space h = <T, BP>, where T and BP can each take on five values: H, N, L (High, Normal, Low), plus ? and Ø. ? means the attribute is unconstrained (don't care): any value of that input gives h = 1. Ø means there is no value of that input for which h will be 1.
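
A minimal sketch (an assumed encoding, not from the slides) of how such a hypothesis <t, bp> with ? and Ø classifies an instance:

```python
# A hypothesis is a pair (t, bp), where each element is
# 'H', 'N', 'L', '?' (don't care), or None standing in for the empty constraint Ø.
def classify(hypothesis, instance):
    """Return 1 if the hypothesis labels the (T, BP) instance as sick, else 0."""
    for constraint, value in zip(hypothesis, instance):
        if constraint is None:        # Ø: no value is acceptable
            return 0
        if constraint != "?" and constraint != value:
            return 0
    return 1

print(classify(("?", "H"), ("N", "H")))    # 1: BP is High, T doesn't matter
print(classify(("H", "H"), ("N", "H")))    # 0: T must be High
print(classify((None, None), ("H", "H")))  # 0: <Ø, Ø> labels nothing as sick
```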

Hypothesis Space For example, h1 = <?, ?>: [For any value of T and BP, the person is sick] The person is always sick; every cell of the 3 x 3 grid is 1.

Hypothesis Space Similarly, h2 = <?, H>: [For any value of T, and for BP = High, the person is sick] Irrespective of temperature, if BP is High the person is sick; every cell with BP = H is 1, all other cells are 0.

Hypothesis Space h3 = <Ø, Ø>: [For no value of T or BP is the person sick] The person is never sick; every cell of the grid is 0.

Hypothesis Space Having said all this, how does this reduce the hypothesis space, and where does the figure of 17 come from? It's simple: each attribute, Temp and BP, can now take 5 values: L, N, H, ? and Ø. So there are 5 x 5 = 25 possible hypotheses in total.

Hypothesis Space This is a tremendous reduction, from 2^9 = 512 to 25. The number can be reduced further: there are redundancies within these 25 hypotheses, caused by Ø.

Hypothesis Space These redundancies are caused by Ø. Whenever Ø appears in any of the inputs and we are taking conjunctions (a min operation), the output is always 0. So whether the Ø is in T, in BP, or in both, we get the same hypothesis: all zeros. The 9 hypotheses containing Ø therefore collapse into one, leaving 25 - 9 + 1 = 17 semantically distinct hypotheses. For a single ?, we get either a full column of 1s or a full row of 1s in the concept matrix representation; for both ?, all 1s.
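
A brute-force check of these counts (a sketch, not part of the slides; the encoding of ? and Ø is assumed):

```python
from itertools import product

values = ["L", "N", "H"]
constraints = values + ["?", None]          # None stands in for Ø
instances = list(product(values, values))   # the 9 instances over (T, BP)

def classify(hypothesis, instance):
    for c, v in zip(hypothesis, instance):
        if c is None or (c != "?" and c != v):
            return 0
    return 1

hypotheses = list(product(constraints, constraints))
print(len(hypotheses))  # 25 syntactically distinct hypotheses

# Two hypotheses are the same concept if they label every instance identically.
distinct = {tuple(classify(h, x) for x in instances) for h in hypotheses}
print(len(distinct))    # 17 semantically distinct hypotheses
```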

Hypothesis Space For example, h = <N, Ø> and h = <Ø, Ø> produce exactly the same all-zero grid, so they represent the same hypothesis.

Concept Learning as Search We assume that the concept lies in the hypothesis space, so we search for a hypothesis in this hypothesis space that best fits the training examples, such that the output given by the hypothesis is the same as the true output of the concept. When the search succeeds, it has learned the actual concept from the given training set.

Concept Learning as Search In short: assume c ∈ H, and search for an h ∈ H that best fits D, such that for all xi ∈ D, h(xi) = c(xi), where c is the concept we are trying to determine (the output over the training set), H is the hypothesis space, D is the training set, h is the hypothesis, and xi is the i-th instance of the instance space.

Ordering of Hypothesis Space General-to-Specific Ordering of the Hypothesis Space. Most General Hypothesis: hg = <?, ?> Most Specific Hypothesis: hs = <Ø, Ø>

Ordering of Hypothesis Space SK = <T, BP>, with T = {H, N, L} and BP = {H, N, L}. The hypotheses form a lattice from general to specific:

Most general:        < ?, ? >
One value fixed:     < H, ? >  < N, ? >  < L, ? >  < ?, H >  < ?, N >  < ?, L >
Both values fixed:   < H, H >  < H, N >  < H, L >  < N, H >  < N, N >  < N, L >  < L, H >  < L, N >  < L, L >
Most specific:       < Ø, Ø >
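
A small sketch (not from the slides) of the "more general than or equal to" relation that orders this lattice; the function name and encoding are my own:

```python
def more_general_or_equal(h1, h2):
    """True if hypothesis h1 covers every instance that hypothesis h2 covers."""
    if None in h2:                     # h2 contains Ø, so it covers nothing at all
        return True
    for c1, c2 in zip(h1, h2):
        if c1 == "?":
            continue                   # ? accepts any value
        if c1 is None or c1 != c2:     # Ø accepts nothing; a fixed value must match
            return False
    return True

# <?, ?>  >=  <H, ?>  >=  <H, H>  >=  <Ø, Ø>
print(more_general_or_equal(("?", "?"), ("H", "?")))    # True
print(more_general_or_equal(("H", "?"), ("H", "H")))    # True
print(more_general_or_equal(("H", "H"), (None, None)))  # True
print(more_general_or_equal(("H", "H"), ("?", "H")))    # False
```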

Find-S Algorithm FIND-S finds the most specific hypothesis possible within the version space, given a set of training data. It uses the general-to-specific ordering to search through the hypothesis space.

Find-S Algorithm
1. Initialize hypothesis h to the most specific hypothesis in H (the hypothesis space)
2. For each positive training instance x (i.e. output is 1):
       For each attribute constraint ai in h:
           If the constraint ai is satisfied by x, then do nothing
           Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
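
A runnable sketch of FIND-S for this two-attribute domain (my own encoding: None stands for Ø, '?' for don't care; the demo data are taken from the SICK example that follows):

```python
def find_s(examples):
    """FIND-S for conjunctive hypotheses over (T, BP).

    examples: list of ((T, BP), label) pairs, with label 1 = positive, 0 = negative.
    Returns the most specific hypothesis consistent with the positive examples.
    """
    h = [None, None]                       # most specific hypothesis <Ø, Ø>
    for instance, label in examples:
        if label != 1:                     # FIND-S ignores negative examples
            continue
        for i, value in enumerate(instance):
            if h[i] is None:               # Ø: tighten to this exact value
                h[i] = value
            elif h[i] != value and h[i] != "?":
                h[i] = "?"                 # next more general constraint
    return tuple(h)

# Training set from the slides (the BP of the negative example x2 is assumed;
# any value works, since negative examples are ignored).
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "H"), 1)]
print(find_s(D))   # ('?', 'H')
```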

Find-S Algorithm To illustrate this algorithm, assume the learner is given the following sequence of training examples from the SICK domain:

D    T    BP    SK
x1   H    H     1
x2   L          0
x3   N    H     1

The first step of FIND-S is to initialize hypothesis h to the most specific hypothesis in H: h = <Ø, Ø>

Find-S Algorithm The first training example is positive: x1 with T = H, BP = H, SK = 1. But h = <Ø, Ø> fails on this first instance, because h(x1) = 0: Ø gives 0 for any attribute value. Since h = <Ø, Ø> is so specific that it labels no instance as positive, we change it to the next more general hypothesis that fits this first instance x1 of the training data set D: h = <H, H>

Find-S Algorithm SK = <T, BP>, with T = {H, N, L} and BP = {H, N, L}. FIND-S moves through this lattice from specific to general:

Most general:        < ?, ? >
One value fixed:     < H, ? >  < N, ? >  < L, ? >  < ?, H >  < ?, N >  < ?, L >
Both values fixed:   < H, H >  < H, N >  < H, L >  < N, H >  < N, N >  < N, L >  < L, H >  < L, N >  < L, L >
Most specific:       < Ø, Ø >

Find-S Algorithm Upon encountering the second example x2 (T = L), a negative example, the algorithm makes no change to h. In fact, FIND-S simply ignores every negative example. So the hypothesis remains: h = <H, H>

Find-S Algorithm The third example x3 (T = N, BP = H) is positive, so the T constraint is generalized to ?. Final Hypothesis: h = <?, H>. What does this hypothesis state? It will label every future patient with BP = H as SICK, for all values of T.

Find-S Algorithm In the general-to-specific lattice, FIND-S has moved along the path <Ø, Ø> → <H, H> → <?, H>, driven by the two positive training examples.

Candidate-Elimination Algorithm Although FIND-S does find a consistent hypothesis, in general there may be many hypotheses consistent with D, of which FIND-S finds only one. Candidate-Elimination finds all of them: the Version Space.

Version Space (VS) The version space is the set of all hypotheses that are consistent with all the training examples. By consistent we mean h(xi) = c(xi) for every instance xi belonging to the training set D.

Version Space Let us take the following training set D:

D    T    BP    SK
x1   H    H     1
x2   L          0
x3   N          0

Another representation of this set D is a partially filled grid over T and BP: the cell T = H, BP = H holds a 1, the cells for x2 and x3 hold 0s, and the remaining cells are unknown (-).

Version Space Is there a hypothesis that can generate this D? One of the consistent hypotheses is h1 = <H, H>, whose grid has a 1 only in the cell T = H, BP = H.

Version Space There are other hypotheses consistent with D, such as h2 = <H, ?> (1 wherever T = H) and h3 = <?, H> (1 wherever BP = H).

Version Space The version space is denoted VS_{H,D} = {h1, h2, h3}. This translates as: the version space is the subset of hypothesis space H, here composed of h1, h2 and h3, that is consistent with D. In other words, the version space is the set of all hypotheses consistent with D, not just the single hypothesis we found in the previous case.
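
A brute-force sketch of computing this version space (not from the slides). The BP values of the two negative examples are not given in the transcript, so non-H values are assumed here (L and N):

```python
from itertools import product

values = ["L", "N", "H"]

def classify(h, x):
    """1 if hypothesis h = (t_constraint, bp_constraint) covers instance x, else 0."""
    return int(all(c is not None and (c == "?" or c == v) for c, v in zip(h, x)))

# Training set D; the BP values of x2 and x3 are assumptions (any non-H value works).
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "N"), 0)]

# The version space: every hypothesis consistent with all of D.
hypotheses = product(values + ["?", None], repeat=2)
version_space = [h for h in hypotheses
                 if all(classify(h, x) == label for x, label in D)]
print(version_space)   # [('H', 'H'), ('H', '?'), ('?', 'H')]
```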

Candidate-Elimination Algorithm Candidate-Elimination works with two sets: the set G (general hypotheses) and the set S (specific hypotheses). It starts with G0 = {<?, ?>}, which is modified only in response to negative examples, and S0 = {<Ø, Ø>}, which is modified only in response to positive examples. Initially, the entire hypothesis space lies within these two boundaries.

Candidate-Elimination Algorithm Intuitively: as the training examples are observed one by one, the S boundary is made more and more general and the G boundary is made more and more specific. This eliminates from the version space any hypothesis found inconsistent with the new training example. At the end, we are left with the VS.

Candidate-Elimination Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do:
    If d is a positive example:
        Remove from G any hypothesis inconsistent with d
        For each hypothesis s in S that is inconsistent with d:
            Remove s from S
            Add to S all minimal generalizations h of s, such that h is consistent with d and some member of G is more general than h
        Remove from S any hypothesis that is more general than another hypothesis in S
    If d is a negative example:
        Remove from S any hypothesis inconsistent with d
        For each hypothesis g in G that is inconsistent with d:
            Remove g from G
            Add to G all minimal specializations h of g, such that h is consistent with d and some member of S is more specific than h
        Remove from G any hypothesis that is less general than another hypothesis in G
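
A runnable sketch of Candidate-Elimination for this two-attribute conjunctive domain (my own encoding, not the slides' code). The demo uses the training set assumed above, with the BP values of the negative examples assumed:

```python
from itertools import product

VALUES = ["L", "N", "H"]

def covers(h, x):
    return all(c is not None and (c == "?" or c == v) for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    if None in h2:                       # h2 covers nothing, so anything is >= h2
        return True
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))

def min_generalizations(s, x):
    """Minimal generalizations of s that cover the positive instance x."""
    return [tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))]

def min_specializations(g, x):
    """Minimal specializations of g that exclude the negative instance x."""
    result = []
    for i, c in enumerate(g):
        if c == "?":
            for v in VALUES:
                if v != x[i]:
                    result.append(g[:i] + (v,) + g[i + 1:])
    return result

def candidate_elimination(examples):
    G = [("?", "?")]
    S = [(None, None)]
    for x, label in examples:
        if label == 1:                                   # positive example
            G = [g for g in G if covers(g, x)]
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                    continue
                for h in min_generalizations(s, x):
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            S = [s for s in new_S
                 if not any(s != s2 and more_general_or_equal(s, s2) for s2 in new_S)]
        else:                                            # negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for h in min_specializations(g, x):
                    if not covers(h, x) and any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            G = [g for g in new_G
                 if not any(g != g2 and more_general_or_equal(g2, g) for g2 in new_G)]
    return S, G

# Training set from the slides; the BP values of the negative examples are assumed.
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "N"), 0)]
S, G = candidate_elimination(D)
print("S =", S)   # S = [('H', 'H')]
print("G =", G)   # G = [('H', '?'), ('?', 'H')]
```

With these boundaries, the version space enclosed between S and G is exactly {<H, H>, <H, ?>, <?, H>}, matching the version space found by brute force earlier.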

Candidate-Elimination Algorithm Applying this to the training set above (x1: T = H, BP = H, SK = 1; x2: T = L, SK = 0; x3: T = N, SK = 0), we start with G0 = {<?, ?>} (most general) and S0 = {<Ø, Ø>} (most specific).