1  2003, G.Tecuci, Learning Agents Laboratory 1 Learning Agents Laboratory Computer Science Department George Mason University Prof. Gheorghe Tecuci 3. Inductive Learning from Examples: Version space learning

2  2003, G.Tecuci, Learning Agents Laboratory 2 Overview: Instances, concepts and generalization; Concept learning from examples; Version spaces and the candidate elimination algorithm; The LEX system; The learning bias; Discussion; Recommended reading

3  2003, G.Tecuci, Learning Agents Laboratory 3 Basic ontological elements: instances and concepts. An instance is a representation of a particular entity from the application domain. A concept is a representation of a set of instances. [Diagram: government_of_US_1943 and government_of_Britain_1943 are each instance_of state_government.] “state_government” represents the set of all entities that are governments of states. This set includes “government_of_US_1943” and “government_of_Britain_1943”, which are called positive examples. “instance_of” is the relationship between an instance and the concept to which it belongs. An entity which is not an instance of a concept is called a negative example of that concept.

4  2003, G.Tecuci, Learning Agents Laboratory 4 An instance is a representation of a specific entity, such as “US_1943” or the “government_of_US_1943”. A concept is a representation of a set of instances. For example, “state_government” represents the set of all entities that are governments of states. This set includes “government_of_US_1943”. One may use a concept name to refer to an unspecified individual from the set represented by the concept. For example, one could say that ?X is a state government, meaning that ?X might be “government_of_US_1943” or “government_of_Britain_1943” or any other instance from the set “state_government”. The relationship between an instance and the concept to which it belongs is called “instance_of”.

5  2003, G.Tecuci, Learning Agents Laboratory 5 Concept generality. A concept P is more general than another concept Q if and only if the set of instances represented by P includes the set of instances represented by Q. Example: state_government is more general than democratic_government and totalitarian_government, and democratic_government is in turn more general than representative_democracy and parliamentary_democracy. “subconcept_of” is the relationship between a concept and a more general concept: democratic_government subconcept_of state_government.

6  2003, G.Tecuci, Learning Agents Laboratory 6 A concept represents a set of instances. The larger this set is, the more general the concept is said to be. For example, “democratic_government” represents the set of all the state governments that are democratic. The set of democratic governments is included into the set of state governments. Therefore the concept “democratic_government” is said to be less general than the concept “state_government”. The formal relationship between them is called “subconcept_of”. Similarly, the concept “state_government” is said to be more general than “democratic_government”.

7  2003, G.Tecuci, Learning Agents Laboratory 7 A generalization hierarchy. [Figure: a hierarchy of governing bodies whose concepts include governing_body, group_governing_body, ad_hoc_governing_body, established_governing_body, other_type_of_governing_body, state_government, feudal_god_king_government, totalitarian_government, democratic_government, theocratic_government, monarchy, military_dictatorship, police_state, religious_dictatorship, fascist_state, communist_dictatorship, representative_democracy, parliamentary_democracy, theocratic_democracy, dictator, deity_figure, chief_and_tribal_council, autocratic_leader, democratic_council_or_board, other_group_governing_body and other_state_government, and whose instances include government_of_US_1943, government_of_Britain_1943, government_of_Germany_1943, government_of_Italy_1943 and government_of_USSR_1943.]

8  2003, G.Tecuci, Learning Agents Laboratory 8 The instances and the concepts are organized into generalization hierarchies like this hierarchy of governing bodies. Notice, however, that the generalization hierarchies are not always as strict as this one, where each concept is a subconcept of only one concept. For instance, the concept “strategic_raw_material” is both a subconcept of “raw_material” and a subconcept of “strategically_essential_resource_or_infrastructure_element”.

9  2003, G.Tecuci, Learning Agents Laboratory 9 Overview: Instances, concepts and generalization; Concept learning from examples; Version spaces and the candidate elimination algorithm; The LEX system; The learning bias; Discussion; Recommended reading

10  2003, G.Tecuci, Learning Agents Laboratory 10 Empirical inductive concept learning from examples: illustration. Given: positive examples of cups (P1, P2, ...) and negative examples of cups (N1, ...). Learn: a description of the cup concept, e.g. has-handle(x), ... Approach: compare the positive and the negative examples of a concept, in terms of their similarities and differences, and learn the concept as a generalized description of the similarities of the positive examples. Why is concept learning important? It allows the agent to recognize other entities as being instances of the learned concept.

11  2003, G.Tecuci, Learning Agents Laboratory 11 The learning problem. Given: a language of instances; a language of generalizations; a set of positive examples (E1, ..., En) of a concept; a set of negative examples (C1, ..., Cm) of the same concept; a learning bias; other background knowledge. Determine: a concept description that is a generalization of the positive examples and that does not cover any of the negative examples. Purpose of concept learning: predict if an instance is an example of the learned concept.

12  2003, G.Tecuci, Learning Agents Laboratory 12 Generalization and specialization rules A generalization rule is a rule that transforms an expression into a more general expression. A specialization rule is a rule that transforms an expression into a less general expression. The reverse of any generalization rule is a specialization rule. Learning a concept from examples is based on generalization and specialization rules.

13  2003, G.Tecuci, Learning Agents Laboratory 13 Discussion. Indicate several generalizations of the following sentence: Students who have lived in Fairfax for more than 3 years. Indicate several specializations of the following sentence: Students who have lived in Fairfax for more than 3 years.

14  2003, G.Tecuci, Learning Agents Laboratory 14 Generalization (and specialization) rules Climbing the generalization hierarchy Dropping condition Generalizing numbers Adding alternatives Turning constants into variables

15  2003, G.Tecuci, Learning Agents Laboratory 15 Turning constants into variables. Generalizes an expression by replacing a constant with a variable.
Specific expression: ?O1 is multi_group_force, number_of_subgroups 5 — the set of multi_group_forces with 5 subgroups.
General expression: ?O1 is multi_group_force, number_of_subgroups ?N1 — the set of multi_group_forces with any number of subgroups.
Generalization: replace 5 with ?N1; specialization: replace ?N1 with 5.
(Instances shown in the figure: Allied_forces_operation_Husky, Axis_forces_Sicily, Japan_1944_Armed_Forces.)

16  2003, G.Tecuci, Learning Agents Laboratory 16 The top expression represents the following concept: the set of multi group forces with 5 subgroups. This set contains, for instance, Axis_forces_Sicily from the Sicily_1943 scenario (the invasion of Sicily by the Allied Forces in 1943). By replacing 5 with a variable ?N1 that can take any value, we generalize this concept to the following one: the set of multi group forces with any number of subgroups. In particular ?N1 could be 5. Therefore the second concept includes the first one. Conversely, by replacing ?N1 with 5, we specialize the bottom concept to the top one. The important thing to notice here is that by a simple syntactic operation (transforming a number into a variable) we can generalize a concept. This is one way in which an agent generalizes concepts.
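As a small illustration of this rule (our own sketch, not from the slides), a pattern can be represented as a dictionary of feature/value pairs, with a wildcard playing the role of the variable ?N1; the instance below and its number of subgroups are invented for illustration.

```python
# Sketch (not from the slides): a pattern is a dict of feature -> value,
# where the wildcard ANY plays the role of a variable such as ?N1.
ANY = "?"

def matches(pattern, instance):
    """True if the instance satisfies every constraint in the pattern."""
    return all(v == ANY or instance.get(f) == v for f, v in pattern.items())

def turn_constant_into_variable(pattern, feature):
    """Generalization rule: replace the constant value of `feature` with a variable."""
    generalized = dict(pattern)
    generalized[feature] = ANY
    return generalized

specific = {"type": "multi_group_force", "number_of_subgroups": 5}
general = turn_constant_into_variable(specific, "number_of_subgroups")

# A hypothetical force with 7 subgroups (the number is made up for illustration).
some_force = {"type": "multi_group_force", "number_of_subgroups": 7}
print(matches(specific, some_force))   # False: requires exactly 5 subgroups
print(matches(general, some_force))    # True: any number of subgroups is accepted
```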

17  2003, G.Tecuci, Learning Agents Laboratory 17 Climbing the generalization hierarchies. Generalizes an expression by replacing a concept with a more general one.
Specific expression: ?O1 is single_state_force, has_as_governing_body ?O2; ?O2 is representative_democracy — the set of single state forces governed by representative democracies.
General expression: ?O1 is single_state_force, has_as_governing_body ?O2; ?O2 is democratic_government — the set of single state forces governed by democracies.
Generalization: representative_democracy → democratic_government; specialization: democratic_government → representative_democracy.
(Hierarchy fragment: democratic_government has the subconcepts representative_democracy and parliamentary_democracy.)

18  2003, G.Tecuci, Learning Agents Laboratory 18 One can also generalize an expression by replacing a concept from its description with a more general concept, according to some generalization hierarchy. The reverse operation, of replacing a concept with a less general one, leads to the specialization of an expression. The agent can also generalize a concept by dropping a condition. That is, by dropping a constraint that its instances must satisfy. This rule is illustrated in the next slide.
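A hedged sketch (representation and names are ours) of how an agent might implement the climbing rule: the hierarchy is stored as a child-to-parent map, a concept covers another if it is the same concept or one of its ancestors, and climbing replaces a concept with its parent.

```python
# Sketch (assumed representation): the generalization hierarchy as a child -> parent map.
PARENT = {
    "representative_democracy": "democratic_government",
    "parliamentary_democracy": "democratic_government",
    "democratic_government": "state_government",
    "totalitarian_government": "state_government",
}

def ancestors(concept):
    """Yield every concept more general than `concept` along the hierarchy."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept

def covers(general, specific):
    """True if `general` is the same concept as `specific` or one of its ancestors."""
    return general == specific or general in ancestors(specific)

def climb(concept):
    """Generalization rule: replace a concept with its direct parent, if it has one."""
    return PARENT.get(concept, concept)

print(climb("representative_democracy"))                      # democratic_government
print(covers("state_government", "parliamentary_democracy"))  # True
```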

19  2003, G.Tecuci, Learning Agents Laboratory 19 Dropping conditions. Generalizes an expression by removing a constraint from its description.
Specific expression: ?O1 is multi_member_force, has_international_legitimacy “yes” — the set of multi-member forces that have international legitimacy.
General expression: ?O1 is multi_member_force — the set of multi-member forces (that may or may not have international legitimacy).

20  2003, G.Tecuci, Learning Agents Laboratory 20 Extending intervals. Generalizes an expression by replacing a number with an interval, or by replacing an interval with a larger interval.
Most specific: ?O1 is multi_group_force, number_of_subgroups 5 — the set of multi_group_forces with exactly 5 subgroups.
More general: ?O1 is multi_group_force, number_of_subgroups ?N1, ?N1 is-in [3 .. 7] — the set of multi_group_forces with at least 3 and at most 7 subgroups.
Most general: ?O1 is multi_group_force, number_of_subgroups ?N1, ?N1 is-in [2 .. 10] — the set of multi_group_forces with at least 2 and at most 10 subgroups.
Generalization: 5 → [3 .. 7] → [2 .. 10]; specialization: the reverse replacements.

21  2003, G.Tecuci, Learning Agents Laboratory 21 A concept may also be generalized by replacing a number with an interval containing it, or by replacing an interval with a larger interval. The reverse operations specialize the concept. Yet another generalization rule, which is illustrated in the next slide, is to add alternatives. According to the expression from the top of this slide, ?O1 is any alliance. Therefore this expression represents the following concept: the set of all alliances. This concept can be generalized by adding another alternative for ?O1, namely the alternative of being a coalition. Now ?O1 could be either an alliance or coalition. Consequently, the expression from the bottom of this slide represents the following more general concept: the set of all alliances and coalitions.

22  2003, G.Tecuci, Learning Agents Laboratory 22 Adding alternatives. Generalizes an expression by replacing a concept C1 with the union (C1 U C2), which is a more general concept.
Specific expression: ?O1 is alliance, has_as_member ?O2 — the set of alliances.
General expression: ?O1 is alliance OR coalition, has_as_member ?O2 — the set including both the alliances and the coalitions.

23  2003, G.Tecuci, Learning Agents Laboratory 23 Generalization and specialization rules. Generalization rules: turning constants into variables; climbing the generalization hierarchies; dropping conditions; extending intervals; adding alternatives. Specialization rules: turning variables into constants; descending the generalization hierarchies; adding conditions; reducing intervals; dropping alternatives.

24  2003, G.Tecuci, Learning Agents Laboratory 24 Types of generalizations and specializations: operational definition of generalization/specialization; generalization/specialization of two concepts; minimally general generalization of two concepts; least general generalization of two concepts; maximally general specialization of two concepts.

25  2003, G.Tecuci, Learning Agents Laboratory 25 Operational definition of generalization. Operational definition: a concept P is said to be more general than another concept Q if and only if Q can be transformed into P by applying a sequence of generalization rules. Non-operational definition: a concept P is said to be more general than another concept Q if and only if the set of instances represented by P includes the set of instances represented by Q. Why isn’t this an operational definition? Because it requires showing that each instance from a potentially infinite set Q is also in the set P.

26  2003, G.Tecuci, Learning Agents Laboratory 26 Generalization of two concepts. How would you define this? Definition: the concept Cg is a generalization of the concepts C1 and C2 if and only if Cg is more general than C1 and Cg is more general than C2. Example: MANEUVER-UNIT is a generalization of ARMORED-UNIT and INFANTRY-UNIT. Is the above definition operational? Operational definition: the concept Cg is a generalization of the concepts C1 and C2 if and only if both C1 and C2 can be transformed into Cg by applying generalization rules (assuming the existence of a complete set of rules).

27  2003, G.Tecuci, Learning Agents Laboratory 27 Generalization of two concepts: example.
C1: ?O1 IS COURSE-OF-ACTION, TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 10, TYPE OFFENSIVE
C2: ?O1 IS COURSE-OF-ACTION, TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS 5
C:  ?O1 IS COURSE-OF-ACTION, TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS ?N1, ?N1 IS-IN [5 .. 10]
From C1: generalize 10 to [5 .. 10] and drop “?O1 TYPE OFFENSIVE”; from C2: generalize 5 to [5 .. 10].
Remark: COA = Course of Action.
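This generalization can be computed mechanically; the sketch below (our own representation: conjunctive concepts as feature dictionaries, not the slides' notation) combines the extending-intervals and dropping-condition rules shown above.

```python
# Sketch (our representation): a conjunctive concept as a dict of feature -> value,
# where a value is a constant or a (low, high) interval.
def generalize_pair(c1, c2):
    """Minimally general generalization of two conjunctive descriptions:
    keep shared constants, join differing numbers into an interval
    (extending-intervals rule), and drop features present in only one
    description (dropping-condition rule)."""
    g = {}
    for feature in c1.keys() & c2.keys():
        v1, v2 = c1[feature], c2[feature]
        if v1 == v2:
            g[feature] = v1
        elif isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
            g[feature] = (min(v1, v2), max(v1, v2))
        # otherwise the feature is simply dropped in this simple sketch
    return g

c1 = {"is": "COURSE-OF-ACTION", "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 10, "TYPE": "OFFENSIVE"}
c2 = {"is": "COURSE-OF-ACTION", "TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS": 5}
print(generalize_pair(c1, c2))
# e.g. {'is': 'COURSE-OF-ACTION', 'TOTAL-NUMBER-OF-OFFENSIVE-ACTIONS': (5, 10)}
```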

28  2003, G.Tecuci, Learning Agents Laboratory 28 Specialization of two concepts. Definition: the concept Cs is a specialization of the concepts C1 and C2 if and only if Cs is less general than C1 and Cs is less general than C2. Operational definition: the concept Cs is a specialization of the concepts C1 and C2 if and only if both C1 and C2 can be transformed into Cs by applying specialization rules (or Cs can be transformed into both C1 and C2 by applying generalization rules). This assumes a complete set of rules. Example: PENETRATE-MILITARY-TASK is a specialization of MILITARY-MANEUVER and MILITARY-ATTACK.

29  2003, G.Tecuci, Learning Agents Laboratory 29 Other useful definitions The concept G is a minimally general generalization of A and B if and only if G is a generalization of A and B, and G is not more general than any other generalization of A and B. If there is only one minimally general generalization of two concepts A and B, then this generalization is called the least general generalization of A and B. The concept C is a maximally general specialization of two concepts A and B if and only if C is a specialization of A and B and no other specialization of A and B is more general than C. Minimally general generalization Least general generalization Maximally general specialization Specialization of a concept with a negative example

30  2003, G.Tecuci, Learning Agents Laboratory 30 Concept learning: another illustration (cautious learner).
Positive examples:
Allied_Forces_1943 is equal_partner_multi_state_alliance, has_as_member US_1943
European_Axis_1943 is dominant_partner_multi_state_alliance, has_as_member Germany_1943
Negative examples:
Somali_clans_1992 is equal_partner_multi_group_coalition, has_as_member Isasq_somali_clan_1992
Learned concept:
?O1 is multi_state_alliance, has_as_member ?O2; ?O2 is single_state_force — a multi-state alliance that has as member a single state force.

31  2003, G.Tecuci, Learning Agents Laboratory 31 Discussion. What could be said about the predictions of a cautious learner? (Figure: the concept to be learned versus the concept learned by a cautious learner.)

32  2003, G.Tecuci, Learning Agents Laboratory 32 There are many different generalizations of the positive examples that do not cover the negative examples. For instance, a cautious learner might attempt to learn the most specific generalization. When such a learner classifies an instance as a positive example of a concept, this classification is most likely to be correct. However, the learner may more easily make mistakes when classifying an instance as a negative example (this type of error is called “error of omission” because some positive examples are omitted – are classified as negative examples).

33  2003, G.Tecuci, Learning Agents Laboratory 33 Concept learning: yet another illustration (aggressive learner).
Positive examples:
Allied_Forces_1943 is equal_partner_multi_state_alliance, has_as_member US_1943
European_Axis_1943 is dominant_partner_multi_state_alliance, has_as_member Germany_1943
Negative examples:
Somali_clans_1992 is equal_partner_multi_group_coalition, has_as_member Isasq_somali_clan_1992
Learned concept:
?O1 is multi_member_force, has_as_member ?O2; ?O2 is single_state_force — a multi-member force that has as member a single state force.

34  2003, G.Tecuci, Learning Agents Laboratory 34 What could be said about the predictions of an aggressive learner? Discussion Concept learned by an aggressive learner Concept to be learned

35  2003, G.Tecuci, Learning Agents Laboratory 35 A more aggressive learner, on the other hand, might attempt to learn the most general generalization. When such a learner classifies an instance as a negative example of a concept, this classification is most likely to be correct. However, the learner may more easily make mistakes when classifying an instance as a positive example (this type of error is called an “error of commission” because some negative examples are wrongly included – they are classified as positive examples).

36  2003, G.Tecuci, Learning Agents Laboratory 36 Discussion. How could one synergistically integrate a cautious learner with an aggressive learner, taking advantage of their qualities to compensate for each other’s weaknesses? (Figures: the concept to be learned compared with the concept learned by a cautious learner and with the concept learned by an aggressive learner.)

37  2003, G.Tecuci, Learning Agents Laboratory 37 Overview: Instances, concepts and generalization; Concept learning from examples; Version spaces and the candidate elimination algorithm; The LEX system; The learning bias; Discussion; Recommended reading

38  2003, G.Tecuci, Learning Agents Laboratory 38 Basic idea of version space concept learning. Consider the examples E1, …, En in sequence. Initialize the lower bound to the first positive example (LB = E1) and the upper bound (UB) to the most general generalization of E1. If the next example is a positive one, then generalize LB as little as possible to cover it. If the next example is a negative one, then specialize UB as little as possible to uncover it and to remain more general than LB. Repeat the above two steps with the rest of the examples until UB = LB. This is the learned concept.

39  2003, G.Tecuci, Learning Agents Laboratory 39 The main idea of the version space concept learning is to combine the relative strengths of the cautious and aggressive learners, to compensate for each other’s weaknesses. It attempts to always find both the most general generalization and the least general generalization. It has been demonstrated that if there are enough negative and positive examples, and if a generalization exists, then it will eventually be found as the convergence of the two generalizations.

40  2003, G.Tecuci, Learning Agents Laboratory 40 The candidate elimination algorithm (Mitchell, 1978). Let us suppose that we have an example e1 of a concept to be learned. Then any sentence of the representation language which is more general than this example is a plausible hypothesis for the concept. The version space is: H = { h | h is more general than e1 }

41  2003, G.Tecuci, Learning Agents Laboratory 41 The candidate elimination algorithm (cont.). (Figure: the hypotheses in H ordered from more general, near UB, to more specific, near LB.) As new examples and counterexamples are presented to the program, candidate concepts are eliminated from H. This is done in practice by updating the set G (which is the set of the most general elements in H) and the set S (which is the set of the most specific elements in H).

42  2003, G.Tecuci, Learning Agents Laboratory 42 Version spaces and the candidate elimination algorithm: general presentation. This is a concept learning method based on exhaustive search, developed by Mitchell and his colleagues. Let us suppose that we have an example e1 of a concept to be learned. Then any sentence of the representation language which is more general than this example is a plausible hypothesis for the concept. The set H of all the plausible hypotheses for the concept to be learned is called the version space: H = { h | h is more general than e1 }. Thus the version space H is the set of all concept descriptions that are consistent with all the training instances seen so far (each hypothesis can be pictured as a point in a network ordered by generality). Let S be the set containing the example e1, and G be the set containing the most general description of the representation language which is more general than e1: S = { e1 }, G = { eg }. Because the more-general-than relation is a partial ordering relation, one may represent the version space H by its boundaries: H = { h | h is more general than e1 and h is less general than eg }, or H = {S, G}. As new examples and counterexamples are presented to the program, candidate concepts are eliminated from H. This is done in practice by updating the set G (the set of the most general elements in H) and the set S (the set of the most specific elements in H). When the set H contains only one candidate concept, the desired concept has been found.

43  2003, G.Tecuci, Learning Agents Laboratory 43 The candidate elimination algorithm
1. Initialize S to the first positive example and G to its most general generalization.
2. Accept a new training instance I.
   If I is a positive example then:
   - remove from G all the concepts that do not cover I;
   - generalize the elements in S as little as possible to cover I but remain less general than some concept in G;
   - keep in S the minimally general concepts.
   If I is a negative example then:
   - remove from S all the concepts that cover I;
   - specialize the elements in G as little as possible to uncover I and be more general than at least one element from S;
   - keep in G the maximally general concepts.
3. Repeat step 2 until G = S and they contain a single concept C (this is the learned concept).
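To make the algorithm concrete, here is a minimal Python sketch (ours, not part of the original slides) for the simple attribute-vector language used in the illustration on the next slide. The function and variable names are our own; a hypothesis is a tuple whose components are either a specific value or the wildcard '?', and the first presented example is assumed to be positive.

```python
# A compact sketch (ours) of the candidate elimination algorithm for a simple
# attribute-vector language: a hypothesis is a tuple whose components are
# either a specific value or the wildcard ANY.
ANY = "?"

def covers(h, x):
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 is at least as general as h2."""
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def minimal_generalization(s, x):
    """Generalize s as little as possible so that it covers the positive example x."""
    return tuple(sv if sv == xv else ANY for sv, xv in zip(s, x))

def minimal_specializations(g, x, domains):
    """Specialize g as little as possible so that it no longer covers the negative x."""
    specs = []
    for i, gv in enumerate(g):
        if gv == ANY:
            specs += [g[:i] + (v,) + g[i + 1:] for v in domains[i] if v != x[i]]
    return specs

def candidate_elimination(examples, domains):
    """examples: list of (instance_tuple, is_positive); the first example is positive."""
    S, G = [], [(ANY,) * len(domains)]
    for x, positive in examples:
        if positive:
            if not S:
                S = [x]                                 # initialize S with the first positive
                continue
            G = [g for g in G if covers(g, x)]          # remove concepts that do not cover x
            S = [s2 for s in S for s2 in [minimal_generalization(s, x)]
                 if any(more_general_or_equal(g, s2) for g in G)]
        else:
            S = [s for s in S if not covers(s, x)]      # remove concepts that cover x
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    new_G += [g2 for g2 in minimal_specializations(g, x, domains)
                              if any(more_general_or_equal(g2, s) for s in S)]
            # keep only the maximally general elements of G
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)]
    return S, G
```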

44  2003, G.Tecuci, Learning Agents Laboratory 44 Illustration of the candidate elimination algorithm.
Language of instances: (shape, size), with shape in {ball, brick, cube} and size in {large, small}.
Language of generalizations: (shape, size), with shape in {ball, brick, cube, any-shape} and size in {large, small, any-size}.
Input examples (shape, size, class): (ball, large, +), (ball, small, +), (brick, small, -), (cube, large, -).
Learning process:
1. +(ball, large): S = {(ball, large)}, G = {(any-shape, any-size)}
2. -(brick, small): S unchanged, G = {(ball, any-size), (any-shape, large)}
3. -(cube, large): S unchanged, G = {(ball, any-size)}
4. +(ball, small): S = {(ball, any-size)} = G, so the learned concept is (ball, any-size).
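Assuming the sketch given after the algorithm above, this slide's trace could be reproduced as follows (the example order follows the numbered steps):

```python
domains = [("ball", "brick", "cube"), ("large", "small")]
examples = [
    (("ball", "large"), True),    # step 1: S = {(ball, large)},  G = {(any-shape, any-size)}
    (("brick", "small"), False),  # step 2: G = {(ball, any-size), (any-shape, large)}
    (("cube", "large"), False),   # step 3: G = {(ball, any-size)}
    (("ball", "small"), True),    # step 4: S = {(ball, any-size)} = G
]
S, G = candidate_elimination(examples, domains)
print(S, G)   # [('ball', '?')] [('ball', '?')]  i.e. the learned concept (ball, any-size)
```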

45  2003, G.Tecuci, Learning Agents Laboratory 45 Another illustration of the learning algorithm. Let us suppose that the positive and the negative examples are objects used in the process of loudspeaker manufacturing. All that is known about these objects (the background knowledge) is a generalization hierarchy (shown in the figure). Let us also consider that the concept to be learned represents the set of objects that could be used to clean the membrane of a loudspeaker. The positive examples of this concept are objects which can be used to clean a membrane: alcohol, acetone, air-press. The negative examples are objects which cannot be used to clean a membrane: ventilator, emery-paper. The problem is to determine the concept that covers all the positive examples (i.e. alcohol, acetone, air-press) and none of the negative examples (i.e. ventilator, emery-paper). The version space method is iterative, that is, it analyzes the examples one after the other, in the order in which they are presented. The first example should be a positive one. Let us suppose that the examples are presented in the following order: instance1(+): alcohol; instance2(+): acetone; instance3(-): ventilator; instance4(+): air-press; instance5(-): emery-paper. Step 1: instance1(+): alcohol. Any concept which is more general than this example is a plausible hypothesis for the concept to be learned.

46  2003, G.Tecuci, Learning Agents Laboratory 46 This set H of all the plausible hypotheses is the version space. Because this space is partially ordered, one may represent it by its boundaries. G (upper bound): (something), the most general generalization of the example. S (lower bound): (alcohol), the example.
Step 2, instance2(+): acetone. G covers 'acetone', therefore it is not changed. There are two least general generalizations of acetone and alcohol: g(acetone, alcohol) = (solvent, inflammable-obj). Therefore the new S is (solvent, inflammable-obj).
Step 2, instance3(-): ventilator. No concept from S covers 'ventilator', therefore S remains unchanged. G covers 'ventilator', therefore it has to be specialized. There are four possible specializations of G: s1: (air-jet-device), not acceptable because it does not cover any element of S; s2: (cleaner), acceptable; s3: (inflammable-object), acceptable; s4: (loudspeaker-component), not acceptable because it does not cover any element of S. Neither of s2 and s3 is more general than the other, therefore both are kept in G: G = (cleaner, inflammable-obj).
Step 2, instance4(+): air-press. Remove 'inflammable-object' from G because it does not cover 'air-press': G = (cleaner). Generalize the elements of S so as to cover the new positive example: S = g(old-S, air-press) = (g(solvent, air-press), g(inflammable-obj, air-press)) = (soft-cleaner, something). Remove 'something' from S because it is more general than 'soft-cleaner': S = (soft-cleaner).
Step 2, instance5(-): emery-paper. S does not cover 'emery-paper' and is not changed. G covers 'emery-paper' and has to be specialized. The only possible specialization is G = (soft-cleaner).
Step 3: S = G = (soft-cleaner). Therefore, the concept that covers all the positive examples and none of the negative examples is soft-cleaner.

47  2003, G.Tecuci, Learning Agents Laboratory 47 The performed specializations and generalizations are shown in the following figure:

48  2003, G.Tecuci, Learning Agents Laboratory 48 Overview: Instances, concepts and generalization; Concept learning from examples; Version spaces and the candidate elimination algorithm; The LEX system; The learning bias; Discussion; Recommended reading

49  2003, G.Tecuci, Learning Agents Laboratory 49 The LEX system. LEX is a system that uses the version space method to learn heuristics for suggesting when the integration operators should be applied for solving symbolic integration problems. The problem of learning control heuristics:
Given the operators for symbolic integration:
OP1: ∫ r f(x) dx --> r ∫ f(x) dx
OP2: ∫ u dv --> uv - ∫ v du, where u = f1(x) and dv = f2(x) dx
OP3: 1 · f(x) --> f(x)
OP4: ∫ (f1(x) + f2(x)) dx --> ∫ f1(x) dx + ∫ f2(x) dx
OP5: ∫ sin(x) dx --> -cos(x) + C
OP6: ∫ cos(x) dx --> sin(x) + C
Find heuristics for applying the operators, such as: to solve ∫ rx transc(x) dx, apply OP2 with u = rx and dv = transc(x) dx.

50  2003, G.Tecuci, Learning Agents Laboratory 50 Remarks. The integration operators assure a satisfactory level of competence to the LEX system. That is, LEX is able in principle to solve a significant class of symbolic integration problems. However, in practice, it may not be able to solve many of these problems because this would require too many resources of time and space. The description of an operator shows when the operator is applicable, while a heuristic associated with an operator shows when the operator should be applied in order to solve a problem. LEX tries to discover, for each operator OPi, the definition of the concept: situations in which OPi should be used.

51  2003, G.Tecuci, Learning Agents Laboratory 51 The architecture of LEX: PROBLEM GENERATOR --> PROBLEM SOLVER --> CRITIC --> LEARNER.
Version space of a proposed heuristic:
G: ∫ f1(x) f2(x) dx --> Apply OP2 with u = f1(x), dv = f2(x) dx
S: ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx
One of the suggested positive training instances: ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx. (Solution trace: ∫ 3x cos(x) dx, by OP2 with u = 3x, dv = cos(x) dx, becomes 3x sin(x) - ∫ 3 sin(x) dx, and then 3x sin(x) + 3 cos(x) + C.)
Questions: 1. What search strategy to use for problem solving? 2. How to characterize individual problem solving steps? 3. How to learn from these steps? How is the initial version space defined? 4. How to generate a new problem?

52  2003, G.Tecuci, Learning Agents Laboratory 52 The problem solver. This module uses the operators and heuristics to solve a given problem, for instance ∫ 3x cos(x) dx. It conducts a uniform-cost search: at each step it chooses the one expansion of the search tree that has the smallest estimated cost (in terms of time and space). For each integration problem it has a time and space limit; if it runs out of these limits, it gives up. The output is a detailed trace of the search performed in attempting to solve the problem.
The critic. This module examines the trace to assign credit or blame to the individual decisions made by the problem solver. It labels as a positive instance every search step along the minimum-cost solution path. It labels as a negative instance every step that (a) leads from a node on the minimum-cost path to a node not on this path and (b) leads to a solution path whose length is greater than or equal to 1.15 times the length of the minimum-cost path. For instance, ∫ 3x cos(x) dx --> 3x sin(x) - ∫ 3 sin(x) dx is a positive example for the application of the operator OP2 with u = 3x and dv = cos(x) dx. The trace also shows positive examples for OP1 and OP5: ∫ 3 sin(x) dx --> 3 ∫ sin(x) dx is a positive example for OP1, and ∫ sin(x) dx --> -cos(x) + C is a positive example for OP5.
The learner. Learns heuristics from examples by using the version space method.
The problem generator. Inspects the current content of the knowledge base (i.e. operators and heuristics) and generates problems to solve that are useful for learning. Strategies for problem generation: find an operator for which the version space is still unrefined and select a problem that matches only half of the patterns in S and G; take a solved problem and slightly modify it, guided by the generalization hierarchy; if the version spaces for two operators are overlapping, choose a problem for which both are considered to be applicable, in order to learn a preference for one of them.
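A rough sketch of the critic's labeling rule as described above; the step-record layout (a flag marking whether the step is on the minimum-cost path, and the length of the best solution path reachable from it) is an assumption made only for illustration, not LEX's actual data structures.

```python
# Rough sketch of the critic's labeling rule; the step-record layout is assumed.
MIN_COST_FACTOR = 1.15

def label_steps(steps, min_cost_path_length):
    """Split recorded search steps into positive and negative training instances."""
    positives, negatives = [], []
    for step in steps:
        if step["on_min_cost_path"]:
            positives.append(step)                  # on the minimum-cost solution path
        elif step["solution_path_length"] >= MIN_COST_FACTOR * min_cost_path_length:
            negatives.append(step)                  # leads to a clearly worse solution path
    return positives, negatives
```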


54  2003, G.Tecuci, Learning Agents Laboratory 54 Illustration of the learning process. Continue learning the heuristic for applying OP2: the problem generator generates a new problem to solve that is useful for learning; the problem solver solves this problem; the critic extracts positive and negative examples from the problem solving tree; the learner refines the version space of the heuristic.

55  2003, G.Tecuci, Learning Agents Laboratory 55 Illustration of the learning process. The initial positive example is: ∫ 3x cos(x) dx --> Apply OP2 with u = 3x and dv = cos(x) dx. The initial version space of this heuristic was shown in the architecture figure. Notice that S is the training instance, and G is the most general pattern for which OP2 is legal. Similarly, initial version spaces are defined for OP1 and OP5. Let us suppose that the next problem generated by the problem generator is ∫ 5x sin(x) dx. The problem solver builds the following problem solving tree:
∫ 5x sin(x) dx, by OP2 with u = 5x, dv = sin(x) dx, becomes -5x cos(x) + ∫ 5 cos(x) dx and then, by OP1 and OP6, -5x cos(x) + 5 sin(x) + C;
∫ 5x sin(x) dx, by OP2 with u = sin(x), dv = 5x dx, becomes (5/2) x² sin(x) - ∫ (5/2) x² cos(x) dx, a harder problem.
This tree shows a positive and a negative example for OP2:
∫ 5x sin(x) dx --> Apply OP2 with u = 5x and dv = sin(x) dx (positive example)
∫ 5x sin(x) dx --> Apply OP2 with u = sin(x) and dv = 5x dx (negative example)
Consequently, the version space for OP2 is modified as indicated in the following figure.

56  2003, G.Tecuci, Learning Agents Laboratory 56 With a few more training instances, the heuristic for OP2 converges to the form: ∫ f1(x) transc(x) dx --> Apply OP2 with u = f1(x) and dv = transc(x) dx.
(Figure: the version space of the OP2 heuristic, from general to specific:
G: ∫ f1(x) f2(x) dx --> Apply OP2 with u = f1(x), dv = f2(x) dx
∫ poly(x) f2(x) dx --> Apply OP2 with u = poly(x), dv = f2(x) dx
∫ f1(x) transc(x) dx --> Apply OP2 with u = f1(x), dv = transc(x) dx
∫ kx trig(x) dx --> Apply OP2 with u = kx, dv = trig(x) dx
S: ∫ 3x cos(x) dx --> Apply OP2 with u = 3x, dv = cos(x) dx
Training instances: ∫ 5x sin(x) dx --> Apply OP2 with u = 5x, dv = sin(x) dx (positive); ∫ 5x sin(x) dx --> Apply OP2 with u = sin(x), dv = 5x dx (negative).)

57  2003, G.Tecuci, Learning Agents Laboratory 57 Overview: Instances, concepts and generalization; Concept learning from examples; Version spaces and the candidate elimination algorithm; The LEX system; The learning bias; Discussion; Recommended reading

58  2003, G.Tecuci, Learning Agents Laboratory 58 The learning bias. A bias is any basis for choosing one generalization over another, other than strict consistency with the observed training examples. Types of bias: restricted hypothesis space bias; preference bias.

59  2003, G.Tecuci, Learning Agents Laboratory 59 Restricted hypothesis space bias. The hypothesis space H (i.e. the space containing all the possible concept descriptions) is defined by the generalization language. This language may not be capable of expressing all possible classes of instances. Consequently, the hypothesis space in which the concept description is searched is restricted. Some of the restricted spaces investigated: logical conjunctions (i.e. the learning system will look for a concept description in the form of a conjunction); linear threshold functions (for exemplar-based representations); three-layer neural networks with a fixed number of hidden units.

60  2003, G.Tecuci, Learning Agents Laboratory 60 Restricted hypothesis space bias: example. The language of instances consists of triples of bits, for example: (0, 1, 1), (1, 0, 1). How many concepts are in this space? There are 2^3 = 8 instances, so the total number of subsets of instances is 2^8 = 256. The language of generalizations consists of triples of 0, 1, and *, where * means any bit, for example: (0, *, 1), (*, 0, 1). How many concepts could be represented in this language? This hypothesis space consists of 3 x 3 x 3 = 27 elements.
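These counts can be checked with a few lines of Python (our own illustration, not from the slides):

```python
from itertools import product

instances = list(product([0, 1], repeat=3))        # the 8 possible instances
print(2 ** len(instances))                         # 256 possible concepts (subsets of instances)

hypotheses = list(product([0, 1, "*"], repeat=3))  # the 27 expressible generalizations
print(len(hypotheses))                             # 27

def extension(h):
    """Set of instances covered by a hypothesis with '*' wildcards."""
    return frozenset(i for i in instances
                     if all(hv == "*" or hv == iv for hv, iv in zip(h, i)))

print(len({extension(h) for h in hypotheses}))     # 27 distinct concepts out of 256 possible
```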

61  2003, G.Tecuci, Learning Agents Laboratory 61 Preference bias. A preference bias places a preference ordering over the hypotheses in the hypothesis space H. The learning algorithm can then choose the most preferred hypothesis f in H that is consistent with the training examples, and produce this hypothesis as its output. Most preference biases attempt to minimize some measure of syntactic complexity of the hypothesis representation (e.g. shortest logical expression, smallest decision tree). These are variants of Occam's Razor, the bias first defined by William of Occam (1300-1349): given two explanations of the data, all other things being equal, the simpler explanation is preferable.

62  2003, G.Tecuci, Learning Agents Laboratory 62 Preference bias: representation. How could the preference bias be represented? In general, the preference bias may be implemented as an order relationship 'better(f1, f2)' over the hypothesis space H. Then, the system will choose the "best" hypothesis f, according to the "better" relationship. An example of such a relationship: "less-general-than", which produces the least general expression consistent with the data.
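A toy illustration (ours) of a preference bias as an ordering: among the hypotheses consistent with the examples, prefer the one with the fewest wildcards, a rough stand-in for the "less-general-than" / syntactic-simplicity preference; `covers` is the coverage test from the candidate elimination sketch given earlier.

```python
# Toy sketch of a preference bias as an ordering over consistent hypotheses.
def consistent(h, examples):
    """True if h classifies every (instance, is_positive) example correctly."""
    return all(covers(h, x) == positive for x, positive in examples)

def most_preferred(hypotheses, examples):
    """Return the consistent hypothesis preferred by the bias (fewest wildcards)."""
    candidates = [h for h in hypotheses if consistent(h, examples)]
    return min(candidates, key=lambda h: sum(v == "?" for v in h), default=None)
```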

63  2003, G.Tecuci, Learning Agents Laboratory 63 Overview: Instances, concepts and generalization; Concept learning from examples; Version spaces and the candidate elimination algorithm; The LEX system; The learning bias; Discussion; Recommended reading

64  2003, G.Tecuci, Learning Agents Laboratory 64 Problem.
Set of examples (color, shape, size, class):
i1: orange, square, large, +
i2: blue, ellipse, small, -
i3: red, triangle, small, +
i4: green, rectangle, small, -
i5: yellow, circle, large, +
Language of instances: an instance is defined by a triple of the form (specific-color, specific-shape, specific-size).
Language of generalizations: (color-concept, shape-concept, size-concept).
Background knowledge: generalization hierarchies for colors, shapes and sizes (given as figures).
Task: apply the candidate elimination algorithm to learn the concept represented by the above examples.

65  2003, G.Tecuci, Learning Agents Laboratory 65 Solution:
+i1: (color = orange) & (shape = square) & (size = large)
S: {[(color = orange) & (shape = square) & (size = large)]}
G: {[(color = any-color) & (shape = any-shape) & (size = any-size)]}
-i2: (color = blue) & (shape = ellipse) & (size = small)
S: {[(color = orange) & (shape = square) & (size = large)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)], [(color = any-color) & (shape = polygon) & (size = any-size)], [(color = any-color) & (shape = any-shape) & (size = large)]}
+i3: (color = red) & (shape = triangle) & (size = small)
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)], [(color = any-color) & (shape = polygon) & (size = any-size)]}
-i4: (color = green) & (shape = rectangle) & (size = small)
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
+i5: (color = yellow) & (shape = circle) & (size = large)
S: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)]}
The concept is: (color = warm-color) & (shape = any-shape) & (size = any-size); a warm-color object.

66  2003, G.Tecuci, Learning Agents Laboratory 66 Does the order of the examples count? Why and how? Consider the following order (color, shape, size, class):
i1: orange, square, large, +
i3: red, triangle, small, +
i5: yellow, circle, large, +
i2: blue, ellipse, small, -
i4: green, rectangle, small, -

67  2003, G.Tecuci, Learning Agents Laboratory 67 What happens if there are not enough examples for S and G to become identical? Discussion Could we still learn something useful? How could we classify a new instance? When could we be sure that the classification is the same as the one made if the concept were completely learned? Could we be sure that the classification is correct?

68  2003, G.Tecuci, Learning Agents Laboratory 68 What happens if there are not enough examples for S and G to become identical? Let us assume that one learns only from the first 3 examples (color, shape, size, class):
i1: orange, square, large, +
i2: blue, ellipse, small, -
i3: red, triangle, small, +
The final version space will be:
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)], [(color = any-color) & (shape = polygon) & (size = any-size)]}

69  2003, G.Tecuci, Learning Agents Laboratory 69 Assume that the final version space is:
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)], [(color = any-color) & (shape = polygon) & (size = any-size)]}
How could we classify the following examples, how certain are we about the classification, and why?
blue, circle, large: negative (covered by no element of G);
orange, square, small: positive (covered by S);
red, ellipse, large: don't know (covered by an element of G but not by S);
blue, polygon, small: don't know (covered by an element of G but not by S).

70  2003, G.Tecuci, Learning Agents Laboratory 70 Could the examples contain errors? What kind of errors could be found in an example? What will be the result of the learning algorithm if there are errors in examples? What could we do if we know that there are errors? Discussion

71  2003, G.Tecuci, Learning Agents Laboratory 71 Could the examples contain errors? What kind of errors could be found in an example? Discussion - Classification errors: - positive examples labeled as negative - negative examples labeled as positive - Measurement errors - errors in the values of the attributes

72  2003, G.Tecuci, Learning Agents Laboratory 72 What will be the result of the learning algorithm if there are errors in the examples? Let us assume that the 4th example is incorrectly classified (color, shape, size, class):
i1: orange, square, large, +
i2: blue, ellipse, small, -
i3: red, triangle, small, +
i4: green, rectangle, small, + (incorrect classification)
i5: yellow, circle, large, +
The version space after the first three examples is:
S: {[(color = warm-color) & (shape = polygon) & (size = any-size)]}
G: {[(color = warm-color) & (shape = any-shape) & (size = any-size)], [(color = any-color) & (shape = polygon) & (size = any-size)]}
Continue learning.

73  2003, G.Tecuci, Learning Agents Laboratory 73 What could we do if we know that there might be errors in the examples? If we cannot find a concept consistent with all the training examples, then we may try to find a concept that is consistent with all but one of the examples. If this fails, then we may try to find a concept that is consistent with all but two of the examples, and so on. What is a problem with this approach? Combinatorial explosion.
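The "all but one, all but two, ..." idea can be sketched as follows (our own sketch, reusing candidate_elimination from the earlier code); the nested loop over subsets of dropped examples is exactly where the combinatorial explosion comes from.

```python
from itertools import combinations

def learn_tolerating_errors(examples, domains, max_dropped=2):
    """Try to converge while ignoring up to max_dropped (possibly erroneous) examples.
    Reuses candidate_elimination from the earlier sketch."""
    for k in range(max_dropped + 1):
        # All ways of ignoring k examples: this grows combinatorially with k.
        for dropped in combinations(range(len(examples)), k):
            kept = [e for i, e in enumerate(examples) if i not in dropped]
            S, G = candidate_elimination(kept, domains)
            if S and S == G:
                return S[0], dropped        # learned concept and the examples ignored
    return None, None
```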

74  2003, G.Tecuci, Learning Agents Laboratory 74 What happens if we extend the generalization language to include conjunction, disjunction and negation of examples?
Set of examples (color, shape, size, class):
i1: orange, square, large, +
i2: blue, ellipse, small, -
i3: red, triangle, small, +
i4: green, rectangle, small, -
i5: yellow, circle, large, +
Background knowledge: the generalization hierarchies given earlier.
Task: learn the concept represented by the above examples by applying the Version Space method.

75  2003, G.Tecuci, Learning Agents Laboratory 75 Set of examples (color, shape, size, class): i1: orange, square, large, +; i2: blue, ellipse, small, -; i3: red, triangle, small, +; i4: green, rectangle, small, -; i5: yellow, circle, large, +.
Learning trace:
+i1: G = {all the examples}, S = {i1}
-i2: G = {¬i2} (all the examples except i2), S = {i1}
+i3: G = {¬i2}, S = {i1 or i3}
-i4: G = {¬i2 and ¬i4} (all the examples except i2 and i4), S = {i1 or i3}
+i5: G = {¬i2 and ¬i4}, S = {i1 or i3 or i5}
These are the minimal generalizations and specializations.

76  2003, G.Tecuci, Learning Agents Laboratory 76 The futility of bias-free learning A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instance.

77  2003, G.Tecuci, Learning Agents Laboratory 77 What happens if we extend the generalization language to include internal disjunction? Does the algorithm still generalize over the observed data?
Set of examples (color, shape, size, class):
i1: orange, square, large, +
i2: blue, ellipse, small, -
i3: red, triangle, small, +
i4: green, rectangle, small, -
i5: yellow, circle, large, +
Task: learn the concept represented by the above examples by applying the Version Space method.
Generalization(i1, i3) = (orange or red, square or triangle, large or small). Is it different from i1 or i3?

78  2003, G.Tecuci, Learning Agents Laboratory 78 How is the generalization language extended by the internal disjunction? Consider the following generalization hierarchy: any-shape has the subconcepts polygon and circle; polygon has the subconcepts triangle and rectangle.

79  2003, G.Tecuci, Learning Agents Laboratory 79 How is the generalization language extended by the internal disjunction? The above hierarchy is replaced with the following one: at the bottom, the leaves triangle, rectangle and circle; above them, the internal disjunctions triangle or rectangle (i.e. polygon), triangle or circle, and rectangle or circle; and at the top, triangle or rectangle or circle (i.e. polygon or circle, i.e. any-shape).
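The extended language can be generated mechanically; a small sketch of our own builds every internal disjunction of the leaf shapes, which is exactly the set of nodes of the replaced hierarchy.

```python
from itertools import combinations

leaves = ["triangle", "rectangle", "circle"]
# Every non-empty disjunction of leaf shapes becomes a node of the extended hierarchy.
nodes = [" or ".join(c) for r in range(1, len(leaves) + 1)
         for c in combinations(leaves, r)]
print(nodes)
# ['triangle', 'rectangle', 'circle', 'triangle or rectangle', 'triangle or circle',
#  'rectangle or circle', 'triangle or rectangle or circle']
```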

80  2003, G.Tecuci, Learning Agents Laboratory 80 Consider now the following generalization hierarchy: any-color has the subconcepts warm-color and cold-color; warm-color has the subconcepts red, yellow and orange; cold-color has the subconcepts black, blue and green. Which is the corresponding hierarchy containing disjunctions?

81  2003, G.Tecuci, Learning Agents Laboratory 81 Could you think of another approach to learning a disjunctive concept with the candidate elimination algorithm? Find a concept1 that is consistent with some of the positive examples and none of the negative examples. Remove the covered positive examples from the training set and repeat the procedure for the rest of examples, computing another concept2 that covers some positive examples, and so on, until there is no positive example left. The learned concept is “concept1 or concept2 or …” Could you specify this algorithm better? Hint: Initialize S with the first positive example, …
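One possible way to flesh out this covering procedure is the greedy sketch below (our own, reusing covers() and minimal_generalization() from the earlier candidate elimination code): start each disjunct from an uncovered positive example, generalize it over further positives only while no negative example becomes covered, then remove the covered positives and repeat.

```python
def learn_disjunction(positives, negatives, domains):
    """Greedy covering sketch: the learned concept is disjunct1 OR disjunct2 OR ..."""
    remaining = list(positives)
    disjuncts = []
    while remaining:
        concept = remaining[0]                     # initialize S with a positive example
        for p in remaining[1:]:
            candidate = minimal_generalization(concept, p)
            # Accept the generalization only if it still excludes every negative example.
            if not any(covers(candidate, n) for n in negatives):
                concept = candidate
        disjuncts.append(concept)
        remaining = [p for p in remaining if not covers(concept, p)]
    return disjuncts
```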

82  2003, G.Tecuci, Learning Agents Laboratory 82 Exercise. Consider the following: the instance language color: {red, orange, yellow, blue, green, black}; the generalization language color: {red, orange, yellow, blue, green, black, warm-color, cold-color, any-color}; a sequence of positive and negative examples of a concept; and the background knowledge represented by a generalization hierarchy of colors (given as a figure). Apply the candidate elimination algorithm to learn the concept represented by the examples.

83  2003, G.Tecuci, Learning Agents Laboratory 83 Features of the version space method. In its original form, the method learns only conjunctive descriptions; however, it can be applied successively to learn disjunctive descriptions. It requires an exhaustive set of examples. It conducts an exhaustive bi-directional breadth-first search, and the sets S and G can be very large for complex problems. It is very important from a theoretical point of view, clarifying the process of inductive concept learning from examples. It has very limited practical applicability because of the combinatorial explosion of the S and G sets. It is the basis of the powerful Disciple multistrategy learning method, which has practical applications.

84  2003, G.Tecuci, Learning Agents Laboratory 84 Recommended reading.
Mitchell T.M., Machine Learning, Chapter 2: Concept Learning and the General-to-Specific Ordering, pp. 20-51, McGraw Hill, 1997.
Mitchell T.M., Utgoff P.E., Banerji R., Learning by Experimentation: Acquiring and Refining Problem-Solving Heuristics, in Readings in Machine Learning.
Tecuci G., Building Intelligent Agents, Chapter 3: Knowledge Representation and Reasoning, pp. 31-75, Academic Press, 1998.
Barr A. and Feigenbaum E. (Eds.), The Handbook of Artificial Intelligence, vol. III, pp. 385-400 and 484-493.

