Knowledge in Learning Copyright, 1996 © Dale Carnegie & Associates, Inc. Chapter 21
CS 471/598 by H. Liu
Using prior knowledge
- For decision tree (DT) and logical description learning, we assumed no prior knowledge.
- In practice we do have some prior knowledge, so how can we use it?
- We need a logical formulation, as opposed to treating learning purely as function learning.
Inductive learning in the logical setting
- The objective is to find a hypothesis that explains the classifications of the examples, given their descriptions:
  Hypothesis ∧ Descriptions ⊨ Classifications
  - Descriptions: the conjunction of all the example descriptions
  - Classifications: the conjunction of all the example classifications
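As a minimal sketch of this constraint, assume attribute-based descriptions and a hypothesis that fully defines the goal predicate. Under that assumption the entailment reduces to checking that the hypothesis predicts every example's classification. The restaurant-style attributes, the candidate hypothesis, and the `explains` helper are hypothetical illustrations.

```python
# Hypothetical restaurant-style examples: (description, classification of WillWait).
EXAMPLES = [
    ({"Patrons": "Full", "Hungry": True}, True),
    ({"Patrons": "Some", "Hungry": False}, True),
    ({"Patrons": "None", "Hungry": True}, False),
]

def hypothesis(d):
    # Candidate definition: WillWait <=> Patrons=Some or (Patrons=Full and Hungry)
    return d["Patrons"] == "Some" or (d["Patrons"] == "Full" and d["Hungry"])

def explains(h, examples):
    """Hypothesis ∧ Descriptions ⊨ Classifications, checked example by example."""
    return all(h(desc) == label for desc, label in examples)
```

A hypothesis such as "WillWait <=> Hungry" fails the check, because it misclassifies the second example.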
A cumulative learning process
- Fig 21.1
- The new approach is to design agents that already know something and are trying to learn more.
- Intuitively, this should be faster and better than learning without prior knowledge, assuming that what is known is always correct.
- How can we implement this cumulative learning with increasing knowledge?
Some examples of using knowledge
- One can leap to general conclusions after only one observation.
  - Any such experience of your own?
- Traveling to Brazil: language and names
  - ?
- A pharmacologically ignorant but diagnostically sophisticated medical student …
  - ?
Some general schemes
- Explanation-based learning (EBL)
  - Hypothesis ∧ Descriptions ⊨ Classifications
  - Background ⊨ Hypothesis
    - EBL doesn't learn anything factually new from the instance.
- Relevance-based learning (RBL)
  - Hypothesis ∧ Descriptions ⊨ Classifications
  - Background ∧ Descriptions ∧ Classifications ⊨ Hypothesis
    - RBL is deductive in nature.
- Knowledge-based inductive learning (KBIL)
  - Background ∧ Hypothesis ∧ Descriptions ⊨ Classifications
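To make the KBIL constraint concrete, the sketch below checks propositional entailment by enumerating every truth assignment. The ground propositions (`FatherAC` standing for Father(A, C), and so on) and the specific clauses are hypothetical. The point is that this hypothesis explains the classification only together with the background clause Father(C, B) ⇒ Parent(C, B).

```python
from itertools import product

# Hypothetical ground propositions for a tiny family example.
SYMBOLS = ["FatherAC", "FatherCB", "ParentCB", "GrandfatherAB"]

def implies(p, q):
    return (not p) or q

def background(m):       # Father(C, B) => Parent(C, B)
    return implies(m["FatherCB"], m["ParentCB"])

def hypothesis(m):       # Father(A, C) ∧ Parent(C, B) => Grandfather(A, B)
    return implies(m["FatherAC"] and m["ParentCB"], m["GrandfatherAB"])

def descriptions(m):     # Father(A, C) ∧ Father(C, B)
    return m["FatherAC"] and m["FatherCB"]

def classifications(m):  # Grandfather(A, B)
    return m["GrandfatherAB"]

def entails(premises, conclusion):
    """Truth-table check: conclusion holds in every model where all premises hold."""
    for values in product([False, True], repeat=len(SYMBOLS)):
        model = dict(zip(SYMBOLS, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False
    return True
```

Dropping `background` from the premises breaks the entailment: the model with Parent(C, B) false satisfies the hypothesis and descriptions but not the classification.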
Inductive logic programming (ILP)
- ILP can formulate hypotheses in general first-order logic.
  - Other methods, such as DTs, use more restricted languages.
- Prior knowledge is used to reduce the complexity of learning:
  - prior knowledge further reduces the hypothesis (H) space
  - prior knowledge helps find shorter hypotheses
  - again, assuming the prior knowledge is correct
Explanation-based learning
- A method for extracting general rules from individual observations
- The goal is to solve similar problems faster next time.
- Memoization: speed up by saving results, so that a problem need not be solved from scratch again
- EBL goes one step further: from observations to general rules
Why EBL?
- Explaining why something is a good idea is much easier than coming up with the idea.
- Once something is understood, it can be generalized and reused in other circumstances.
- Extracting general rules from examples
- EBL constructs two proof trees simultaneously; the second is obtained by variabilizing the constants in the first.
- An example (Fig 21.2)
Basic EBL
- Given an example, construct a proof tree for the goal using the background knowledge.
- In parallel, construct a generalized proof tree for the variabilized goal, using the same inference steps.
- Construct a new rule whose left-hand side is the leaves of the proof tree and whose right-hand side is the variabilized goal (leaves ⇒ root).
- Drop any conditions that are true regardless of the values of the variables in the goal.
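The steps above can be sketched for Horn clauses in the style of Prolog-based explanation-based generalization. This is a minimal sketch under stated assumptions: atoms are flat tuples with no function symbols, variables are strings starting with `?`, the family rules and facts are hypothetical, and `Father`/`Mother` are taken to be the operational predicates. The specific goal and the variabilized goal are proved in lockstep with the same rule choices; at an operational leaf the specific proof is matched against a fact, while the generalized proof only records the literal as a condition. The final "drop conditions" step is not implemented.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    """Follow variable bindings in substitution s as far as they go."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two flat atoms (tuples); return an extended substitution or None."""
    if s is None or len(a) != len(b):
        return None
    for x, y in zip(a, b):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s = {**s, x: y}
        elif is_var(y):
            s = {**s, y: x}
        else:
            return None
    return s

def substitute(atom, s):
    return tuple(walk(t, s) for t in atom)

_fresh = itertools.count()

def rename(rule):
    """Give a rule fresh variable names for each use (standardizing apart)."""
    head, body = rule
    n = next(_fresh)
    def fix(atom):
        return tuple(f"{t}_{n}" if is_var(t) else t for t in atom)
    return fix(head), [fix(b) for b in body]

def solve(goals, gen_goals, s, gs, leaves, rules, facts, operational):
    """Prove goals under s while mirroring every step on gen_goals under gs."""
    if not goals:
        yield gs, leaves
        return
    g, gg = goals[0], gen_goals[0]
    if g[0] in operational:
        # Leaf: the specific proof must match a fact; the generalized proof
        # keeps the literal as a condition of the new rule instead.
        for fact in facts:
            s2 = unify(g, fact, s)
            if s2 is not None:
                yield from solve(goals[1:], gen_goals[1:], s2, gs, leaves + [gg],
                                 rules, facts, operational)
    else:
        for rule in rules:
            head, body = rename(rule)
            s2, gs2 = unify(g, head, s), unify(gg, head, gs)
            if s2 is not None and gs2 is not None:
                yield from solve(body + goals[1:], body + gen_goals[1:], s2, gs2,
                                 leaves, rules, facts, operational)

def tidy(head, body):
    """Rename variables to ?v1, ?v2, ... in order of first appearance."""
    names = {}
    def fix(atom):
        return tuple(names.setdefault(t, f"?v{len(names) + 1}") if is_var(t) else t
                     for t in atom)
    return fix(head), [fix(b) for b in body]

def ebl(goal, gen_goal, rules, facts, operational):
    """Return (head, body) of the rule extracted from the first proof, or None."""
    proof = next(solve([goal], [gen_goal], {}, {}, [], rules, facts, operational), None)
    if proof is None:
        return None
    gs, leaves = proof
    return tidy(substitute(gen_goal, gs), [substitute(lit, gs) for lit in leaves])

RULES = [
    (("Grandfather", "?x", "?z"), [("Father", "?x", "?y"), ("Parent", "?y", "?z")]),
    (("Parent", "?x", "?y"), [("Mother", "?x", "?y")]),
    (("Parent", "?x", "?y"), [("Father", "?x", "?y")]),
]
FACTS = [("Father", "Philip", "Charles"), ("Father", "Charles", "William"),
         ("Mother", "Diana", "William")]

learned = ebl(("Grandfather", "Philip", "William"), ("Grandfather", "?a", "?b"),
              RULES, FACTS, operational={"Father", "Mother"})
print(learned)
```

On the example Grandfather(Philip, William) this extracts Father(?v1, ?v3) ∧ Father(?v3, ?v2) ⇒ Grandfather(?v1, ?v2): the intermediate Parent step has been compiled away into operational conditions.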
Efficiency of EBL
- Choosing which general rules to extract:
  - too many rules slow down inference
  - aim for gain: a significant increase in speed
  - make rules as general as possible
- Operationality: a subgoal is operational if it is easy to solve
  - There is a trade-off between operationality and generality.
- Empirical analysis of efficiency is an important part of studying EBL.
Learning using relevant information
- Prior knowledge: people in a given country usually speak the same language.
- Observation: Fernando is Brazilian and speaks Portuguese.
- We can then logically conclude, via resolution, that all Brazilians speak Portuguese.
Functional dependencies
- We have seen a form of relevance called a determination: language (e.g., Portuguese) is a function of nationality (e.g., Brazilian).
- A determination is really a relationship between predicates.
- The corresponding generalization follows logically from the determinations and the descriptions.
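A minimal sketch of how one example plus a determination yields a general rule. It assumes examples are attribute dictionaries and the determination is given as a list of left-hand-side attribute names; `generalize` and the sample people are hypothetical.

```python
def generalize(determination, example, target):
    """Turn one example plus a determination (LHS attributes ≻ target) into a rule."""
    conditions = {a: example[a] for a in determination}
    value = example[target]

    def rule(obj):
        # Fires only for objects that match the example on every LHS attribute.
        if all(obj.get(a) == v for a, v in conditions.items()):
            return value
        return None  # no conclusion for objects that differ on the LHS
    return rule

fernando = {"Name": "Fernando", "Nationality": "Brazil", "Language": "Portuguese"}
speaks = generalize(["Nationality"], fernando, "Language")
```

The resulting rule predicts Portuguese for any other Brazilian but draws no conclusion about people of other nationalities.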
- We can generalize from Fernando to all Brazilians, but not to all nations. So determinations can limit the hypothesis space to be considered.
- Determinations specify a sufficient basis vocabulary from which to construct hypotheses concerning the target predicate.
- A reduction in the size of the hypothesis space should make it easier to learn the target predicate.
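As a rough arithmetic illustration, assume n Boolean attributes. The number of distinct Boolean functions over them is 2^(2^n), so a determination that singles out d ≤ n relevant attributes shrinks the space to 2^(2^d). The helper name and the values of n and d are hypothetical.

```python
def num_boolean_functions(k):
    """Distinct Boolean functions of k Boolean attributes: 2 ** (2 ** k)."""
    return 2 ** (2 ** k)

n, d = 10, 3  # hypothetical: 10 attributes, of which a determination picks 3
full, reduced = num_boolean_functions(n), num_boolean_functions(d)
```

With n = 10 and d = 3 the space drops from 2^1024 candidate functions to 256.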
Learning using relevant information
- A determination P ≻ Q says that if any examples match on P, they must also match on Q.
- Find the simplest determination consistent with the observations:
  - search through the space of determinations with one predicate, then two predicates, and so on
  - algorithm: Fig 21.3 (page 635)
  - time complexity is on the order of (n choose p), where n is the number of attributes and p is the size of the smallest consistent determination
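A minimal sketch of this subset search, assuming attribute-value examples stored as dictionaries. The function names and toy data are hypothetical, and this is not a transcription of Fig 21.3. Subsets are tried in order of increasing size, so the first consistent one found is a smallest consistent determination.

```python
from itertools import combinations

def consistent_det(attrs, examples, target):
    """True if any two examples that agree on all of attrs also agree on target."""
    seen = {}
    for e in examples:
        key = tuple(e[a] for a in attrs)
        if seen.setdefault(key, e[target]) != e[target]:
            return False
    return True

def minimal_consistent_det(attributes, examples, target):
    """Try attribute subsets of size 0, 1, 2, ... and return the first consistent one."""
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            if consistent_det(subset, examples, target):
                return subset
    return None

EXAMPLES = [
    {"Nationality": "Brazil", "Age": 30, "Gender": "F", "Language": "Portuguese"},
    {"Nationality": "Brazil", "Age": 45, "Gender": "M", "Language": "Portuguese"},
    {"Nationality": "Chile", "Age": 30, "Gender": "M", "Language": "Spanish"},
    {"Nationality": "Portugal", "Age": 45, "Gender": "F", "Language": "Portuguese"},
]
```

Here `minimal_consistent_det(["Age", "Gender", "Nationality"], EXAMPLES, "Language")` returns `("Nationality",)`. The pair ("Age", "Gender") is also consistent on this data, which is why the search prefers the smallest subset.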
- Combining relevance-based learning with decision tree learning gives RBDTL.
- Its learning performance improves over plain DT learning (Fig 21.4).
- Other issues:
  - noise handling
  - using other kinds of prior knowledge
  - moving from attribute-based representations to FOL
Inductive logic programming
- It combines inductive methods with FOL.
- ILP represents theories as logic programs.
- ILP offers complete algorithms for inducing general, first-order theories from examples.
- It can learn successfully in domains where attribute-based algorithms fail completely.
- An example: a typical family tree (Fig 21.5)
Inverse resolution
- If Classifications follow from Background ∧ Hypothesis ∧ Descriptions, then we can prove this by resolution refutation, since resolution is refutation-complete.
- If we run the proof backwards, we can find an H such that the proof goes through.
- Generating inverse proofs
  - A family tree example (Fig 21.6)
- Inverse resolution involves search
  - Each inverse resolution step is nondeterministic.
    - For a given resolvent C and parent clause C1, there can be many choices for the other parent C2.
- Discovering new knowledge with IR
  - It's not easy: a monkey and a typewriter
- Discovering new predicates with IR
  - Fig 21.7
- The ability to use background knowledge provides significant advantages.
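A propositional simplification of one inverse resolution step; the first-order version in Fig 21.6 also needs inverse substitution, which is omitted here. Clauses are frozensets of string literals with `~` marking negation, and all names are hypothetical. Given the resolvent C and one parent C1, the function enumerates every non-tautological C2 that resolves with C1 to give back C, which makes the nondeterminism visible.

```python
from itertools import chain, combinations

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2, lit):
    """Propositional resolution of c1 (containing lit) with c2 (containing ~lit)."""
    assert lit in c1 and neg(lit) in c2
    return (c1 - {lit}) | (c2 - {neg(lit)})

def subsets(items):
    items = sorted(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def inverse_resolve(c, c1):
    """All non-tautological clauses c2 such that resolving c1 with c2 yields c."""
    results = set()
    for lit in c1:
        rest1 = c1 - {lit}
        if not rest1 <= c:
            continue  # everything else in c1 must survive into the resolvent
        core = (c - rest1) | {neg(lit)}
        # c2 may also repeat any of c1's surviving literals: one source of choice
        for extra in subsets(rest1):
            c2 = core | set(extra)
            if any(neg(x) in c2 for x in c2):
                continue  # skip tautologies
            if resolve(c1, c2, lit) == c:
                results.add(frozenset(c2))
    return results

C = frozenset({"~P", "R"})    # resolvent: P => R
C1 = frozenset({"~P", "Q"})   # known parent: P => Q
candidates = inverse_resolve(C, C1)
```

For this C and C1 it returns two candidates, ¬Q ∨ R and ¬P ∨ ¬Q ∨ R; the first reads as the hypothesis Q ⇒ R.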
Top-down learning (FOIL)
- A generalization of DT induction to the first-order case, by the same author as C4.5
  - Start with a general rule and specialize it to fit the data.
  - Now we use first-order literals instead of attributes, and H is a set of clauses instead of a decision tree.
- Example: learning Grandfather(x, y), starting from the clause with an empty body, ⇒ Grandfather(x, y) (page 642)
  - positive and negative examples
  - add literals one at a time to the left-hand side
  - e.g., Father(x, y) ⇒ Grandfather(x, y)
  - How do we choose a literal? (Algorithm on page 643)
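A simplified sketch of FOIL's greedy loop for learning one clause. Assumptions, all hypothetical: a toy family database, at most one new variable `z`, and a FOIL-style information-gain score computed on example counts rather than on variable bindings as FOIL itself does. The score of adding literal L is p1 · (log2(p1/(p1+n1)) − log2(p0/(p0+n0))), where p and n count positive and negative examples covered before (0) and after (1) adding L.

```python
import math

# Hypothetical toy family database (not the textbook's Fig 21.5 data).
FATHER = {("Philip", "Charles"), ("Philip", "Anne"),
          ("Charles", "William"), ("Charles", "Harry")}
MOTHER = {("Elizabeth", "Charles"), ("Elizabeth", "Anne"),
          ("Diana", "William"), ("Diana", "Harry")}
# Background predicates; Parent is listed first, so it wins ties in max().
RELATIONS = {"Parent": FATHER | MOTHER, "Father": FATHER, "Mother": MOTHER}
PEOPLE = sorted({p for pair in FATHER | MOTHER for p in pair})

POS = {("Philip", "William"), ("Philip", "Harry")}  # Grandfather(x, y)
NEG = {(a, b) for a in PEOPLE for b in PEOPLE if a != b} - POS

def covers(body, example):
    """True if some value of z makes every body literal true for (x, y)."""
    x, y = example
    for z in PEOPLE:
        env = {"x": x, "y": y, "z": z}
        if all((env[a], env[b]) in RELATIONS[p] for p, a, b in body):
            return True
    return False

def foil_gain(body, literal):
    p0 = sum(covers(body, e) for e in POS)
    n0 = sum(covers(body, e) for e in NEG)
    new_body = body + [literal]
    p1 = sum(covers(new_body, e) for e in POS)
    n1 = sum(covers(new_body, e) for e in NEG)
    if p1 == 0:
        return 0.0  # a literal that loses every positive example is useless
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def learn_clause(max_literals=3):
    """Greedily add the best-scoring literal until no negatives are covered."""
    body = []
    candidates = [(p, a, b) for p in RELATIONS for a in "xyz" for b in "xyz" if a != b]
    while any(covers(body, e) for e in NEG) and len(body) < max_literals:
        best = max(candidates, key=lambda lit: foil_gain(body, lit))
        if foil_gain(body, best) <= 0:
            break
        body.append(best)
    return body

clause = learn_clause()
```

On this data the first literal chosen is Father(x, z). Parent(z, y) and Father(z, y) then tie, and the tie goes to Parent because it is listed first, giving Father(x, z) ∧ Parent(z, y) ⇒ Grandfather(x, y).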
Summary
- Using prior knowledge in cumulative learning
- Prior knowledge allows for shorter hypotheses.
- Prior knowledge plays different logical roles, as expressed by the different entailment constraints:
  - EBL, RBL, KBIL
- ILP can generate new predicates so that concise new theories can be expressed.