
Knowledge in Learning (Chapter 19, Spring 2005). Copyright, 1996 © Dale Carnegie & Associates, Inc.

CSE 471/598, CBS 598 by H. Liu

A logical formulation of learning
- What are the goal and the hypotheses?
- Goal predicate Q: WillWait
- Learning is finding an equivalent logical expression with which we can classify examples
- Each hypothesis proposes such an expression, a candidate definition of Q:
  ∀r WillWait(r) ⇔ Pat(r,Some) ∨ (Pat(r,Full) ∧ Hungry(r) ∧ Type(r,French)) ∨ …

Hypothesis space
- The set of all hypotheses the learning algorithm is designed to entertain
- One of the hypotheses is assumed correct: H1 ∨ H2 ∨ … ∨ Hn
- Each Hi predicts a certain set of examples: the extension of the goal predicate
- Two hypotheses with different extensions are logically inconsistent with each other; with the same extension, they are logically equivalent

What are examples?
- An example is an object of some logical description to which the goal concept may or may not apply, e.g.
  Alt(X1) ∧ ¬Bar(X1) ∧ ¬Fri/Sat(X1) ∧ …
- Ideally, we want to find a hypothesis that agrees with all the examples
- The possible relations between the true function f and a hypothesis h: ++, --, +- (false negative), -+ (false positive)
- If either of the last two occurs, the example and h are logically inconsistent

Current-best hypothesis search
- Maintain a single hypothesis; adjust it as new examples arrive to maintain consistency (Fig 19.1)
- Generalize when a positive example is misclassified; specialize when a negative example is misclassified
- Algorithm: Fig 19.2 (p. 681)
- Each time a new example is taken, consistency must be checked against all previous examples
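The generalize/specialize moves above can be sketched as follows. This is a minimal, hedged illustration with made-up attribute names, not the book's Fig 19.2 algorithm: it generalizes greedily by dropping literals and specializes by adding one, and it omits the consistency recheck and backtracking the slide mentions.

```python
# Minimal sketch of current-best hypothesis search. A hypothesis is a
# single conjunction, stored as a set of required (attribute, value)
# literals; examples are (attribute-dict, label) pairs. All names are
# illustrative, not from the text.

def matches(hyp, ex):
    return all(ex.get(a) == v for a, v in hyp)

def current_best(examples):
    hyp = set()                       # empty conjunction: matches everything
    positives = []
    for ex, label in examples:
        if label:
            positives.append(ex)
            if not matches(hyp, ex):  # false negative -> generalize:
                # drop every literal this positive example violates
                hyp = {(a, v) for a, v in hyp if ex.get(a) == v}
        elif matches(hyp, ex):        # false positive -> specialize:
            # add one literal that all positives satisfy but this negative violates
            for a, v in (positives[0].items() if positives else []):
                if ex.get(a) != v and all(p.get(a) == v for p in positives):
                    hyp.add((a, v))
                    break
    return hyp
```

Real current-best learning must recheck all earlier examples after each adjustment and backtrack when no consistent repair exists; this sketch shows only the two repair moves.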

Example: WillWait
- Current-Best-Learning on the restaurant data (Fig 18.3)
- Problems: nondeterministic choices, no guarantee of the simplest or a correct h, may need to backtrack

Least-commitment search
- Keeping only one h as the best guess is the problem; can we keep as many as possible?
- Version space (candidate elimination) algorithm: incremental, least-commitment
- From intervals to boundary sets: the G-set and the S-set
  - S0, the most specific boundary, initially covers nothing
  - G0, the most general boundary, initially covers everything
- Everything between the boundaries is guaranteed to be consistent with the examples seen so far
- The version space algorithm generalizes S0 and specializes G0 incrementally

Version space
- An example with 4 instances, from Tom Mitchell's book
- Generalization and specialization (Fig 19.4):
  - False positive for Si: too general, discard it
  - False negative for Si: too specific, generalize it minimally
  - False positive for Gi: too general, specialize it minimally
  - False negative for Gi: too specific, discard it
- When to stop:
  - One concept left (Si = Gi)
  - The version space collapses (G is more specific than S, or …)
  - We run out of examples
- One major problem: cannot handle noise
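The four update rules can be sketched for simple conjunctive hypotheses over attribute tuples, using Mitchell's notation ('?' matches anything, '0' matches nothing). This is a hedged, minimal sketch on a two-example toy rather than the book's four-instance example, and it omits pruning of non-maximal/non-minimal boundary members.

```python
# Candidate elimination sketch. A hypothesis is a tuple of attribute
# values, with '?' = any value and '0' = no value (Mitchell's notation).

def consistent(h, x):
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

def more_general(g, h):
    return all(gv == '?' or gv == hv for gv, hv in zip(g, h))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple('0' for _ in range(n))]   # most specific boundary
    G = [tuple('?' for _ in range(n))]   # most general boundary
    for x, label in examples:
        if label:                        # positive example
            G = [g for g in G if consistent(g, x)]      # discard false negatives
            S = [tuple(xv if sv in ('0', xv) else '?'   # minimally generalize
                       for sv, xv in zip(s, x))
                 for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
        else:                            # negative example
            S = [s for s in S if not consistent(s, x)]  # discard false positives
            G2 = []
            for g in G:
                if not consistent(g, x):
                    G2.append(g)
                    continue
                for i, v in ((i, v) for i in range(n) if g[i] == '?'
                             for v in domains[i] if v != x[i]):
                    h = g[:i] + (v,) + g[i + 1:]        # minimal specialization
                    if any(more_general(h, s) for s in S):
                        G2.append(h)
            G = G2
    return S, G
```

On a positive then a negative example, S tightens to the observed positive and G splits into the two maximally general hypotheses that still exclude the negative.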

Using prior knowledge
- For decision-tree and logical-description learning, we assumed no prior knowledge
- We do have some prior knowledge, so how can we use it?
- We need a logical formulation, as opposed to function learning

Inductive learning in the logical setting
- The objective is to find a hypothesis that explains the classifications of the examples, given their descriptions:
  Hypothesis ∧ Descriptions ⊨ Classifications
- Descriptions: the conjunction of all the example descriptions
- Classifications: the conjunction of all the example classifications

A cumulative learning process (Fig 19.6, p. 687)
- The new approach is to design agents that already know something and are trying to learn some more
- Intuitively, this should be faster and better than learning without knowledge, assuming what is known is correct
- How do we implement this cumulative learning with increasing knowledge?

Some examples of using knowledge
- One can leap to general conclusions after only one observation. Have you had such an experience?
- Traveling to Brazil: from one Brazilian, what can you conclude about language, and about names?
- A pharmacologically ignorant but diagnostically sophisticated medical student …?

Some general schemes
- Explanation-based learning (EBL):
  Hypothesis ∧ Descriptions ⊨ Classifications
  Background ⊨ Hypothesis
  → does not learn anything factually new from the instance
- Relevance-based learning (RBL):
  Hypothesis ∧ Descriptions ⊨ Classifications
  Background ∧ Descriptions ∧ Classifications ⊨ Hypothesis
  → deductive in nature
- Knowledge-based inductive learning (KBIL):
  Background ∧ Hypothesis ∧ Descriptions ⊨ Classifications

Inductive logic programming (ILP)
- ILP can formulate hypotheses in general first-order logic; other methods, such as decision trees, use more restricted languages
- Prior knowledge is used to reduce the complexity of learning:
  - it further reduces the hypothesis space
  - it helps find shorter hypotheses
- Again, this assumes the prior knowledge is correct

Explanation-based learning
- A method for extracting general rules from individual observations
- The goal is to solve a similar problem faster next time
- Memoization: speed up by saving results and avoiding solving a problem from scratch
- EBL goes one step further: from observations to rules

Why EBL?
- Explaining why something is a good idea is much easier than coming up with the idea
- Once something is understood, it can be generalized and reused in other circumstances
- Extracting general rules from examples: EBL constructs two proof trees simultaneously, by variabilizing the constants in the first tree
- An example (Fig 19.7)

Basic EBL
- Given an example, construct a proof tree using the background knowledge
- In parallel, construct a generalized proof tree for the variabilized goal
- Construct a new rule: leaves ⇒ root
- Drop any conditions that are true regardless of the variables in the goal
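The "variabilize and read off a rule" steps can be sketched as below. This is a hedged toy: it assumes the ground proof tree is already built, skips the condition-dropping step, and uses illustrative predicate and constant names.

```python
# EBL rule-extraction sketch: replace each distinct constant in the
# ground proof with a fresh variable, then emit "leaves => root".
# Atoms are tuples: ('Pred', const1, const2, ...).

def variabilize(atom, mapping):
    pred, *args = atom
    vs = []
    for a in args:
        if a not in mapping:                 # same constant -> same variable
            mapping[a] = f"x{len(mapping) + 1}"
        vs.append(mapping[a])
    return (pred, *vs)

def fmt(atom):
    pred, *args = atom
    return f"{pred}({','.join(args)})"

def ebl_rule(root, leaves):
    mapping = {}                             # shared constant-to-variable map
    head = variabilize(root, mapping)        # generalize the goal first
    body = [variabilize(l, mapping) for l in leaves]
    return " & ".join(map(fmt, body)) + " => " + fmt(head)
```

For example, a ground proof of Grandfather(George, Anne) from Father(George, Elizabeth) and Parent(Elizabeth, Anne) yields the general rule Father(x1,x3) & Parent(x3,x2) => Grandfather(x1,x2).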

Efficiency of EBL
- Choosing a general rule:
  - too many rules slow down inference
  - aim for gain: a significant increase in speed
  - make rules as general as possible
- Operationality: a subgoal is operational if it is easy to solve
- There is a trade-off between operationality and generality
- Efficiency is assessed by empirical analysis in EBL studies

Learning using relevant information
- Prior knowledge: people in a country usually speak the same language
  Nat(x,n) ∧ Nat(y,n) ∧ Lang(x,l) ⇒ Lang(y,l)
- Observation: given nationality, language is fully determined; given that Fernando is Brazilian and speaks Portuguese:
  Nat(Fernando,B) ∧ Lang(Fernando,P)
- We can logically conclude:
  Nat(y,B) ⇒ Lang(y,P)

Functional dependencies
- We have seen a form of relevance: determination; language (Portuguese) is a function of nationality (Brazil)
- A determination is really a relationship between the predicates
- The corresponding generalization follows logically from the determinations and the descriptions

- We can generalize from Fernando to all Brazilians, but not to all nations
- So determinations can limit the hypothesis space to be considered
- Determinations specify a sufficient basis vocabulary from which to construct hypotheses concerning the target predicate
- A reduction in the hypothesis-space size should make it easier to learn the target predicate
- For n Boolean features, if the determination contains d features, what is the saving in the required number of examples according to PAC learning?
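A worked answer to the slide's question, as a hedged back-of-the-envelope: for Boolean functions over n features there are 2^(2^n) hypotheses, so ln|H| = 2^n ln 2, and the standard PAC bound m ≥ (1/ε)(ln|H| + ln(1/δ)) is O(2^n). If a determination says only d features matter, |H| shrinks to 2^(2^d) and the bound drops to O(2^d).

```python
from math import log

def pac_examples(n_relevant, eps=0.1, delta=0.05):
    # m >= (1/eps) * (ln|H| + ln(1/delta)), with |H| = 2^(2^n)
    ln_H = (2 ** n_relevant) * log(2)
    return (ln_H + log(1 / delta)) / eps

# With n = 10 features but a determination containing d = 3 of them,
# the bound drops from roughly 7100 examples to roughly 85.
```

The epsilon and delta values here are illustrative; the point is the exponential-in-n versus exponential-in-d gap.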

Learning using relevant information
- A determination P ≻ Q says that if any two examples match on P, they must also match on Q
- Find the simplest determination consistent with the observations
- Search through the space of determinations: first one predicate, then two predicates, …
- Algorithm: Fig 19.8 (p. 696)
- The number of candidate sets of size p is (n choose p)
- Feature selection is an active research area in machine learning, pattern recognition, and statistics
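The search can be sketched as follows, as a hedged stand-in for Fig 19.8 with made-up helper names and attribute data: a candidate attribute set P is a determination if no two examples agree on P yet disagree on the class, and we return the smallest such P.

```python
from itertools import combinations

def is_determination(P, examples):
    # P determines the class if equal P-projections imply equal classes
    seen = {}
    for attrs, cls in examples:
        key = tuple(attrs[a] for a in P)
        if seen.setdefault(key, cls) != cls:
            return False
    return True

def minimal_determination(attributes, examples):
    # try subsets in order of size: (n choose p) candidates at size p
    for size in range(len(attributes) + 1):
        for P in combinations(attributes, size):
            if is_determination(P, examples):
                return P
    return None
```

On Fernando-style data, nationality alone determines language, so the search stops at the single-attribute set.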

- Combining relevance-based learning with decision-tree learning gives RBDTL
- Its learning performance improves (Fig 19.9), measured as a function of training-set size
- Gains: time saving and less chance of overfitting
- Other issues for relevance-based learning: noise handling, using other kinds of prior knowledge, moving from attribute-based representations to FOL

Inductive logic programming
- It combines inductive methods with FOL
- ILP represents theories as logic programs
- ILP offers complete algorithms for inducing general, first-order theories from examples
- It can learn successfully in domains where attribute-based algorithms fail completely
- An example: a typical family tree (Fig 19.11)

Inverse resolution
- If Classifications follow from Background ∧ Hypothesis ∧ Descriptions, we can prove this by resolution with refutation (resolution is complete)
- If we run the proof backwards, we can find a Hypothesis such that the proof goes through
- Ordinary resolution takes C1 and C2 and derives the resolvent C; inverse resolution takes C and C1 and produces a C2
- Generating inverse proofs: a family tree example (Fig 19.13)
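In the propositional case, one inverse-resolution step can be sketched directly. This is a hedged, minimal version that generates only the smallest candidate second parents: since C = (C1 − {l}) ∪ (C2 − {¬l}) for the literal l resolved upon, a minimal C2 is (C − (C1 − {l})) ∪ {¬l}.

```python
# Clauses are frozensets of literal strings; "~p" is the negation of "p".

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(C, C1):
    # enumerate minimal candidates for C2, given resolvent C and parent C1
    candidates = []
    for lit in C1:                       # guess the literal resolved upon
        rest = C1 - {lit}
        if lit not in C and rest <= C:   # C1's other literals must survive in C
            candidates.append((C - rest) | {negate(lit)})
    return candidates
```

Running resolution forwards, {P, Q} and {~Q, R} resolve to {P, R}; running this step backwards from {P, R} and {P, Q} recovers {~Q, R}. Non-minimal choices of C2 (with extra literals) add further nondeterminism, which is why inverse resolution involves search.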

Inverse resolution involves search
- Each inverse resolution step is nondeterministic: for any C and C1, there can be many C2
- Discovering new knowledge with inverse resolution is not easy (a monkey and a typewriter)
- Discovering new predicates with inverse resolution (Fig …)
- The ability to use background knowledge provides significant advantages

Top-down learning (FOIL)
- A generalization of decision-tree induction to the first-order case, by the author of C4.5
- Start with a general rule and specialize it to fit the data
- We now use first-order literals instead of attributes, and the hypothesis is a set of clauses instead of a decision tree
- Example: ⇒ Grandfather(x,y) (p. 701)
  - positive and negative examples
  - add literals one at a time to the left-hand side, e.g. Father(x,y) ⇒ Grandfather(x,y)
  - how to choose a literal? (algorithm on p. 702)
  - the rule should agree with some positive examples and none of the negative examples
- FOIL removes the covered positive examples and repeats
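A greatly simplified FOIL-style loop on the grandfather example can be sketched as below. This is a hedged illustration: a tiny hand-made fact base, a fixed candidate-literal list, and a crude positives-minus-negatives score in place of FOIL's information-gain heuristic.

```python
# Facts are sets of ground pairs; a clause body is a list of
# (predicate, var, var) literals over the variables x, y, z.

father = {('George', 'Elizabeth'), ('Philip', 'Charles')}
parent = {('Elizabeth', 'Charles'), ('Charles', 'William')}
relations = {'Father': father, 'Parent': parent}

def covers(body, xv, yv):
    consts = {c for r in relations.values() for pair in r for c in pair}
    for zv in consts:                      # try every binding for z
        env = {'x': xv, 'y': yv, 'z': zv}
        if all((env[a], env[b]) in relations[p] for p, a, b in body):
            return True
    return False

def foil_clause(pos, neg, candidates):
    body = []
    neg = set(neg)
    while neg:                             # specialize until no negative is covered
        lit = max(candidates, key=lambda l:
                  sum(covers(body + [l], x, y) for x, y in pos)
                  - sum(covers(body + [l], x, y) for x, y in neg))
        body.append(lit)
        neg = {(x, y) for x, y in neg if covers(body, x, y)}
    return body
```

On positives {Grandfather(George, Charles), Grandfather(Philip, William)} and a couple of negatives, the greedy loop first adds Father(x,z), then Parent(z,y), giving the expected clause Father(x,z) ∧ Parent(z,y) ⇒ Grandfather(x,y).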

Summary
- Using prior knowledge in cumulative learning
- Prior knowledge allows for shorter hypotheses
- Prior knowledge plays different logical roles, expressed as the entailment constraints of EBL, RBL, and KBIL
- ILP generates new predicates so that concise new theories can be expressed