Implementation of Learning Systems
- Formulating the Problem
- Engineering the Representation
- Collecting and Preparing Data
- Evaluating the Learned Knowledge
- Gaining User Acceptance of the Learning System
Inductive Examples
A toddler applies inductive principles to generalize and categorize different objects from the repeated samples he/she sees over time. For instance: everything that is round and bounces is a ball.
Practicality of these General Techniques
These general techniques, although very powerful, are not practical for general problem solving on their own. Instead, they are used as components in more practical learning algorithms. For instance, induction is at the heart of many learning algorithms.
Concept Learning
A concept itself is merely a function, which we don't know yet. We do have some of the inputs and their corresponding outputs. From these input-output pairs we try to find the generic function that generated them.
Concept: GOOD STUDENT
Two attributes define a student: Grade and Class Participation. The learner acquires examples:
- Student (GOOD STUDENT): Grade (High) ^ Class Participation (High)
- Student (GOOD STUDENT): Grade (High) ^ Class Participation (Low)
- Student (NOT GOOD STUDENT): Grade (Low) ^ Class Participation (High)
- Student (NOT GOOD STUDENT): Grade (Low) ^ Class Participation (Low)
Final rule for GOOD STUDENT (also sketched in code below):
Student (GOOD STUDENT): Grade (High) ^ Class Participation (?)
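As a minimal sketch (not part of the original slides), the final rule can be written as a Python predicate; the function and parameter names are illustrative:

```python
# The learned GOOD STUDENT rule as a predicate:
# Grade must be High; Class Participation is a don't-care ("?").
def is_good_student(grade: str, participation: str) -> bool:
    return grade == "High"  # participation can be anything

# The four training examples above are classified correctly:
assert is_good_student("High", "High")      # GOOD STUDENT
assert is_good_student("High", "Low")       # GOOD STUDENT
assert not is_good_student("Low", "High")   # NOT GOOD STUDENT
assert not is_good_student("Low", "Low")    # NOT GOOD STUDENT
```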
SICK (SK)
Attributes:
- Temperature (T): Low (L), Normal (N), High (H)
- Blood Pressure (BP): Low (L), Normal (N), High (H)
Instance Space (X)

X  | T | BP | SK
x1 | L | L  | -
x2 | N | L  | -
x3 | H | L  | -
x4 | L | N  | -
x5 | N | N  | -
x6 | H | N  | -
x7 | L | H  | -
x8 | N | H  | -
x9 | H | H  | -

These tables will be used extensively in the coming slides and lectures, so study them carefully.
Concept as a Function
The solution to any problem is a function that converts its inputs to the corresponding outputs. A concept itself is merely a function, which we don't know yet. We do have some of the inputs and their corresponding outputs, and from these input-output pairs we try to find the generic function that generates these results.
Concept Space (C)
One of the possible concepts for the concept SICK might be enumerated in the following table:

X  | T | BP | SK
x1 | L | L  | 0
x2 | N | L  | 0
x3 | H | L  | 1
x4 | L | N  | 0
x5 | N | N  | 0
x6 | H | N  | 0
x7 | L | H  | 0
x8 | N | H  | 0
x9 | H | H  | 0
Concept Space
But there are many other possibilities besides this one. The question is: how many concepts in total can be generated for this situation? The answer is 2^|X|; here that is 2^9 = 512, since |X| = 9.
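To make the 2^|X| count concrete, here is a small Python sketch (not from the original slides) that enumerates all 512 candidate labelings of the nine instances with itertools:

```python
from itertools import product

# Each concept assigns 0 or 1 to each of the 9 instances x1..x9,
# so the concept space is every 9-bit labeling: 2**9 = 512 concepts.
instances = [f"x{i}" for i in range(1, 10)]
concepts = list(product([0, 1], repeat=len(instances)))

print(len(concepts))   # 512
print(concepts[0])     # (0, 0, 0, 0, 0, 0, 0, 0, 0) -- the "never sick" concept
print(concepts[-1])    # (1, 1, 1, 1, 1, 1, 1, 1, 1) -- the "always sick" concept
```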
Concept Space
In short, the true concept SK is a function defined over the attributes T and BP, such that it gives a 0 or a 1 as output for each of the 9 instances x_i belonging to the instance space X.
Concept Space
For any arbitrary concept C, the outputs can also be arranged in a 3 x 3 matrix, one cell per instance:

| C(x3)  C(x6)  C(x9) |
| C(x2)  C(x5)  C(x8) |
| C(x1)  C(x4)  C(x7) |
Concept Space
Since we don't know the true concept yet, there are 2^9 = 512 candidate concepts, each producing a different pattern of outputs over the nine instances: C1, C2, C3, C4, ..., C512.
So what is a Concept?
A concept is nothing more than a function whose independent variables are the attributes, in this case T and BP. Maybe the true concept is some complicated arrangement of conjunctions and disjunctions, like:
C = (T = H AND BP = H) OR (T = N AND BP = H) OR (T = H AND BP = N)
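As a sketch, that candidate concept can be written directly as a Python function; the H/N/L string encoding follows the tables above:

```python
# The example concept above, written directly as a function of T and BP.
def c(t: str, bp: str) -> int:
    if (t == "H" and bp == "H") or (t == "N" and bp == "H") or (t == "H" and bp == "N"):
        return 1
    return 0

print(c("H", "H"))  # 1
print(c("L", "L"))  # 0
```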
Concept Space
For any arbitrary concept C, the following matrix is another format for representing the output corresponding to each instance; each C(x_i) is the output of the concept C for that particular instance:

| C(x3)  C(x6)  C(x9) |
| C(x2)  C(x5)  C(x8) |
| C(x1)  C(x4)  C(x7) |
Concept Space
With two attributes, each taking 3 possible values, there are 3 x 3 = 9 instances, and hence 2^9 = 512 unique concepts (functions): C1, C2, C3, C4, ..., C512.
Training Data Set (D)
The training set is the portion of the instance space for which the output of the true concept is known, e.g.:

D  | T | BP | SK
x1 | N | L  | 1
x2 |   |    |
x3 |   |    |
Hypothesis Space (H)
The learner has to apply some bias that restricts the search, reducing the size of the concept space. This reduced concept space becomes the hypothesis space.
Hypothesis Space
For example, the most common bias is one that uses the AND relationship between the attributes. In other words, the hypothesis space uses conjunctions (AND) of the attributes T and BP, i.e. h = <T, BP>.
Hypothesis Space
H denotes the hypothesis space; here it is the conjunction of attributes T and BP. Written in English it would mean:
H = <t, bp>: IF Temperature = t AND Blood Pressure = bp THEN H = 1, OTHERWISE H = 0
In other words, there is one hypothesis for every conjunction of T and BP values, e.g. H and H, H and L, H and N, etc.
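A minimal sketch of such a conjunctive hypothesis as a Python function; encoding h as a (t, bp) tuple of value strings is this sketch's assumption:

```python
# A conjunctive hypothesis h = <t, bp>: output 1 only when both
# attribute values of the instance match the hypothesis exactly.
def h(hyp: tuple, t: str, bp: str) -> int:
    ht, hbp = hyp
    return 1 if (t == ht and bp == hbp) else 0

print(h(("H", "H"), "H", "H"))  # 1
print(h(("H", "H"), "H", "L"))  # 0
```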
Hypothesis Space
h = <H, H>:

          T=L  T=N  T=H
BP = H |   0    0    1
BP = N |   0    0    0
BP = L |   0    0    0
Hypothesis Space
h = <L, L>:

          T=L  T=N  T=H
BP = H |   0    0    0
BP = N |   0    0    0
BP = L |   1    0    0

Notice that this is the C2 we discussed earlier in the Concept Space section.
Hypothesis Space
H = <T, BP>, where T and BP can each take on five values: H, N, L (High, Normal, Low), plus ? and Ø.
- ? means don't care: for all values of that input, H = 1
- Ø means the null value: there is no value for which H will be 1
Hypothesis Space
For example, h1 = <?, ?>: for any value of T and BP the person is sick. The person is always sick.

          T=L  T=N  T=H
BP = H |   1    1    1
BP = N |   1    1    1
BP = L |   1    1    1
Hypothesis Space
Similarly h2 = <?, H>: for any value of T, AND for BP = High, the person is sick. Irrespective of temperature, if BP is High the person is sick.

          T=L  T=N  T=H
BP = H |   1    1    1
BP = N |   0    0    0
BP = L |   0    0    0
Hypothesis Space
h3 = <Ø, Ø>: for no value of T or BP is the person sick. The person is never sick.

          T=L  T=N  T=H
BP = H |   0    0    0
BP = N |   0    0    0
BP = L |   0    0    0
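Pulling h1, h2 and h3 together, here is a sketch of hypothesis evaluation extended to handle ? and Ø; using the string "Ø" for the null value is just this sketch's encoding:

```python
# Evaluate a hypothesis with the special values "?" (don't care:
# any instance value matches) and "Ø" (null: nothing matches).
def h(hyp, instance):
    return 1 if all(a != "Ø" and (a == "?" or a == v)
                    for a, v in zip(hyp, instance)) else 0

print(h(("?", "?"), ("L", "N")))    # 1 -- h1: always sick
print(h(("?", "H"), ("L", "H")))    # 1 -- h2: sick whenever BP is High
print(h(("?", "H"), ("L", "N")))    # 0
print(h(("Ø", "Ø"), ("H", "H")))    # 0 -- h3: never sick
```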
Hypothesis Space
Having said all this, how does this reduce the hypothesis space to 17? First note that each attribute, Temp and BP, can now take 5 values each: L, N, H, ? and Ø. So there are 5 x 5 = 25 possible hypotheses in total.
Hypothesis Space
Now this is a tremendous reduction, from 2^9 = 512 concepts to 25 hypotheses. This number can be reduced further: there are redundancies within these 25 hypotheses, caused by Ø.
Hypothesis Space
These redundancies are caused by Ø. Since we are considering conjunctions (min), whenever Ø appears in any of the inputs the output is always 0: if Ø is in T, in BP, or in both, we get the same hypothesis, with all-zero outcomes. For a single ?, we get either a full column of 1s or a full row of 1s in the matrix representation; for two ?s, all 1s. The 9 hypotheses containing a Ø therefore collapse into one, leaving 4 x 4 = 16 Ø-free hypotheses plus 1, i.e. 17 semantically distinct hypotheses.
Hypothesis Space
For example, h = <N, Ø> and h = <Ø, Ø> are the same hypothesis (also verified in the sketch below):

h = <N, Ø>:                    h = <Ø, Ø>:

          T=L  T=N  T=H                  T=L  T=N  T=H
BP = H |   0    0    0         BP = H |   0    0    0
BP = N |   0    0    0         BP = N |   0    0    0
BP = L |   0    0    0         BP = L |   0    0    0
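The 25-to-17 reduction can be checked by brute force; this sketch (not from the original slides) enumerates all 25 hypotheses and counts their distinct extensions, i.e. output patterns over the 9 instances:

```python
from itertools import product

VALUES = ["L", "N", "H"]
instances = list(product(VALUES, VALUES))   # the 9 (T, BP) instances

def h(hyp, inst):
    return 1 if all(a != "Ø" and (a == "?" or a == v)
                    for a, v in zip(hyp, inst)) else 0

def extension(hyp):
    # A hypothesis's extension is its output over all 9 instances.
    return tuple(h(hyp, inst) for inst in instances)

hyps = list(product(VALUES + ["?", "Ø"], repeat=2))
print(len(hyps))                              # 25 raw hypotheses
print(len({extension(hy) for hy in hyps}))    # 17 semantically distinct
```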
Concept Learning as Search
We assume that the concept lies in the hypothesis space, so we search for a hypothesis belonging to this hypothesis space that best fits the training examples, such that the output given by the hypothesis is the same as the true output of the concept. When the search succeeds, the actual concept has been learned from the given training set.
Concept Learning as Search
In short: assume c ∈ H, and search for an h ∈ H that best fits D, such that ∀ x_i ∈ D, h(x_i) = c(x_i), where:
- c is the concept we are trying to determine (the output of the training set)
- H is the hypothesis space
- D is the training set
- h is the hypothesis
- x_i is the i-th instance of the instance space
Ordering of Hypothesis Space
General-to-specific ordering of the hypothesis space:
Most general hypothesis: h_g = <?, ?>
Most specific hypothesis: h_s = <Ø, Ø>
Ordering of Hypothesis Space
SK = <T, BP>, T = {H, N, L} and BP = {H, N, L}. The lattice, from most general (top) to most specific (bottom):

<?, ?>
<H, ?>  <N, ?>  <L, ?>  <?, H>  <?, N>  <?, L>
<H, H>  <H, N>  <H, L>  <N, H>  <N, N>  <N, L>  <L, H>  <L, N>  <L, L>
<Ø, Ø>
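A sketch of the attribute-wise more-general-than-or-equal test that induces this ordering, following the standard constraint comparison (? is more general than any value, and any constraint is more general than Ø):

```python
# h1 >= h2 in the general-to-specific ordering if, attribute by
# attribute, h1's constraint is at least as general as h2's.
def more_general_or_equal(h1, h2):
    return all(a1 == "?" or a2 == "Ø" or a1 == a2
               for a1, a2 in zip(h1, h2))

print(more_general_or_equal(("?", "?"), ("H", "?")))   # True
print(more_general_or_equal(("H", "?"), ("H", "H")))   # True
print(more_general_or_equal(("H", "H"), ("?", "H")))   # False
```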
Find-S Algorithm
FIND-S finds the most specific hypothesis possible that is consistent with a given set of training data (the most specific member of the version space). It uses the general-to-specific ordering for searching through the hypothesis space.
Find-S Algorithm
1. Initialize hypothesis h to the most specific hypothesis in H (the hypothesis space)
2. For each positive training instance x (i.e. output is 1):
   For each attribute constraint a_i in h:
     If the constraint a_i is satisfied by x, then do nothing
     Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
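A minimal Python sketch of FIND-S for the two-attribute SICK domain. The training set matches the worked example on the following slides; x2's blood pressure value is not recoverable from the slides and is assumed to be L here:

```python
def find_s(training_data):
    h = ["Ø", "Ø"]                        # start from the most specific hypothesis
    for instance, label in training_data:
        if label != 1:                    # FIND-S ignores negative examples
            continue
        for i, (ai, xi) in enumerate(zip(h, instance)):
            if ai == "Ø":                 # first positive example: adopt its value
                h[i] = xi
            elif ai != "?" and ai != xi:  # conflicting value: generalize to "?"
                h[i] = "?"
    return tuple(h)

# Training set from the worked example; x2's BP is an assumption.
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "H"), 1)]
print(find_s(D))   # ('?', 'H')
```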
Find-S Algorithm
To illustrate this algorithm, assume the learner is given the following sequence of training examples from the SICK domain:

D  | T | BP | SK
x1 | H | H  | 1
x2 | L | L  | 0
x3 | N | H  | 1

The first step of FIND-S is to initialize hypothesis h to the most specific hypothesis in H: h = <Ø, Ø>
Find-S Algorithm
The first training example is positive:

D  | T | BP | SK
x1 | H | H  | 1

But h = <Ø, Ø> fails on this first instance, because h(x1) = 0 (Ø gives 0 for any attribute value). Since h = <Ø, Ø> is so specific that it classifies not even one instance as positive, we change it to the next more general hypothesis that fits this first instance x1 of the training data set D: h = <H, H>
Find-S Algorithm
Upon encountering the second example, which is negative, the algorithm makes no change to h. In fact, FIND-S simply ignores every negative example. So the hypothesis still remains: h = <H, H>

D  | T | BP | SK
x1 | H | H  | 1
x2 | L | L  | 0
Find-S Algorithm

D  | T | BP | SK
x1 | H | H  | 1
x2 | L | L  | 0
x3 | N | H  | 1

The third example is positive with T = N and BP = H, so h must generalize to cover it. Final hypothesis: h = <?, H>. What does this hypothesis state? It will term all future patients who have BP = H as SICK, for all the different values of T.
Find-S Algorithm
Tracing the search through the lattice on this D: h starts at <Ø, Ø>, jumps to <H, H> after the positive example x1, stays at <H, H> through the negative example x2, and generalizes along the path <H, H> → <?, H> after the positive example x3.
Candidate-Elimination Algorithm
FIND-S does find a consistent hypothesis. In general, however, there may be more hypotheses consistent with D, of which FIND-S finds only one. Candidate-Elimination finds all the hypotheses in the version space.
Version Space (VS)
The version space is the set of all hypotheses that are consistent with all the training examples. By consistent we mean h(x_i) = c(x_i) for all instances x_i belonging to the training set D.
Version Space
Let us take the following training set D:

D  | T | BP | SK
x1 | H | H  | 1
x2 | L | L  | 0
x3 | N | N  | 0

Another representation of this set D (1 = sick, 0 = not sick, - = unknown):

          T=L  T=N  T=H
BP = H |   -    -    1
BP = N |   -    0    -
BP = L |   0    -    -
Version Space
Is there a hypothesis that can generate this D? One consistent hypothesis is h1 = <H, H>:

          T=L  T=N  T=H
BP = H |   0    0    1
BP = N |   0    0    0
BP = L |   0    0    0
Version Space
There are other hypotheses consistent with D, such as h2 = <H, ?>:

          T=L  T=N  T=H
BP = H |   0    0    1
BP = N |   0    0    1
BP = L |   0    0    1

And another, h3 = <?, H>:

          T=L  T=N  T=H
BP = H |   1    1    1
BP = N |   0    0    0
BP = L |   0    0    0
Version Space
The version space is denoted VS_{H,D} = {h1, h2, h3}. This translates as: the version space is the subset of the hypothesis space H, composed of h1, h2 and h3, that is consistent with D. In other words, the version space is the group of all hypotheses consistent with D, not just the single hypothesis we saw in the previous case.
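Since this hypothesis space is tiny, the whole version space can be computed by brute force, keeping every hypothesis consistent with D; a sketch, reusing the encoding from the earlier snippets:

```python
from itertools import product

VALUES = ["L", "N", "H"]

def h(hyp, inst):
    return 1 if all(a != "Ø" and (a == "?" or a == v)
                    for a, v in zip(hyp, inst)) else 0

# Keep every hypothesis whose output matches SK on all of D.
D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "N"), 0)]
VS = [hy for hy in product(VALUES + ["?", "Ø"], repeat=2)
      if all(h(hy, x) == sk for x, sk in D)]
print(VS)   # [('H', 'H'), ('H', '?'), ('?', 'H')]
```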
Candidate-Elimination Algorithm
Candidate-Elimination works with two sets:
- Set G (general hypotheses), starting with G0 = {<?, ?>}; it considers negative examples only
- Set S (specific hypotheses), starting with S0 = {<Ø, Ø>}; it considers positive examples only
Within these two boundaries lies the entire hypothesis space.
Candidate-Elimination Algorithm
Intuitively, as the training examples are observed one by one:
- The S boundary is made more and more general
- The G boundary is made more and more specific
This eliminates from the version space any hypothesis found inconsistent with the new training example. At the end, we are left with the VS.
Candidate-Elimination Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do:
  If d is a positive example:
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is inconsistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s, such that h is consistent with d and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example:
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is inconsistent with d:
      Remove g from G
      Add to G all minimal specializations h of g, such that h is consistent with d and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
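A compact Python sketch of the algorithm above for the SICK domain, reusing the earlier encoding. The within-boundary pruning steps (removing hypotheses subsumed by others inside S or G) are no-ops for this small example and are omitted for brevity:

```python
VALUES = ["L", "N", "H"]

def h(hyp, inst):
    return 1 if all(a != "Ø" and (a == "?" or a == v)
                    for a, v in zip(hyp, inst)) else 0

def more_general_or_equal(h1, h2):
    return all(a1 == "?" or a2 == "Ø" or a1 == a2 for a1, a2 in zip(h1, h2))

def min_generalization(s, x):
    # Smallest step that makes s cover the positive instance x.
    return tuple(v if a == "Ø" else (a if a == v else "?")
                 for a, v in zip(s, x))

def min_specializations(g, x):
    # All one-step specializations of g that exclude the negative instance x.
    out = []
    for i, a in enumerate(g):
        if a == "?":
            for v in VALUES:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(D):
    G = [("?", "?")]                     # maximally general boundary
    S = [("Ø", "Ø")]                     # maximally specific boundary
    for x, sk in D:
        if sk == 1:                      # positive example: generalize S, filter G
            G = [g for g in G if h(g, x) == 1]
            S = [min_generalization(s, x) if h(s, x) == 0 else s for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:                            # negative example: specialize G, filter S
            S = [s for s in S if h(s, x) == 0]
            G = [spec for g in G
                 for spec in ([g] if h(g, x) == 0 else min_specializations(g, x))
                 if any(more_general_or_equal(spec, s) for s in S)]
    return G, S

D = [(("H", "H"), 1), (("L", "L"), 0), (("N", "N"), 0)]
G, S = candidate_elimination(D)
print("G =", G)   # [('H', '?'), ('?', 'H')]
print("S =", S)   # [('H', 'H')]
```

Together, G and S bound exactly the version space {<H, H>, <H, ?>, <?, H>} computed by brute force earlier.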
Candidate-Elimination Algorithm

D  | T | BP | SK
x1 | H | H  | 1
x2 | L | L  | 0
x3 | N | N  | 0

G0 = {<?, ?>} (most general)
S0 = {<Ø, Ø>} (most specific)