CS 540 - Fall 2015 (Shavlik©), Lecture 30, Week 14 (12/8/15) — presentation transcript

1 Today’s Topics (only on final at a high level; Sec 19.5 and Sec 18.5 readings below are ‘skim only’)
HW5 must be turned in by 11:55pm Fri (solution out early Sat)
Read Chapters 26 and 27 of the textbook for next Tuesday
Exam (comprehensive, with focus on material since the midterm): Thurs 5:30-7:30pm, in this room; two pages of notes and a simple calculator (log, e, * / + -) allowed
Next Tues we’ll cover my Fall 2014 final (Spring 2013 final next Weds?)
A Short Introduction to Inductive Logic Programming (ILP) – Sec 19.5 of the textbook: learning FOPC ‘rule sets’; could, in a follow-up step, learn MLN weights on these rules (ie, learn ‘structure’ then learn ‘weights’)
A Short Introduction to Computational Learning Theory (COLT) – Sec 18.5 of the text

2 Inductive Logic Programming (ILP)
Use mathematical logic to
– Represent training examples (goes beyond fixed-length feature vectors)
– Represent learned models (FOPC rule sets)
ML work in the late ’70s through early ’90s was logic-based; then statistical ML ‘took over’

3 Examples in FOPC (not all have the same # of ‘features’)
PosEx1:
on(ex1, block1, table) ∧ on(ex1, block2, block1) ∧
color(ex1, block1, blue) ∧ color(ex1, block2, blue) ∧
size(ex1, block1, large) ∧ size(ex1, block2, small)
[PosEx2: a second positive example, shown as a figure on the slide]
Learned Concept: tower(?E) if on(?E, ?A, table), on(?E, ?B, ?A).
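To make the representation concrete, here is a minimal sketch (my own illustration, not code from the lecture) of how a variable-length example and the learned tower rule could be encoded; the fact tuples and the `is_tower` helper are assumptions for illustration, not the syntax any particular ILP system uses.

```python
# Illustrative only: each example is a set of ground facts, so different
# examples can have different numbers of facts (unlike fixed-length vectors).
pos_ex1 = {
    ("on", "ex1", "block1", "table"),
    ("on", "ex1", "block2", "block1"),
    ("color", "ex1", "block1", "blue"),
    ("color", "ex1", "block2", "blue"),
    ("size", "ex1", "block1", "large"),
    ("size", "ex1", "block2", "small"),
}

def is_tower(example_id, facts):
    """Hand-coded check of the learned rule:
    tower(?E) if on(?E, ?A, table), on(?E, ?B, ?A)."""
    on_facts = [f for f in facts if f[0] == "on" and f[1] == example_id]
    for (_, _, a, place) in on_facts:          # bind ?A to a block on the table
        if place == "table":
            for (_, _, b, below) in on_facts:  # bind ?B to a block sitting on ?A
                if below == a:
                    return True
    return False

print(is_tower("ex1", pos_ex1))  # True: block2 is on block1, which is on the table
```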

4 Searching for a Good Rule (propositional-logic version)
[Figure: a general-to-specific search lattice over rules. The root is “P is always true”; its children are the one-literal rules “P if A”, “P if B”, “P if C”; these are then specialized further, e.g. to “P if B and C” and “P if B and D”.]

5 All Possible Extensions of a Clause (capital letters are variables)
Assume we are expanding this node: q(X, Z) ← p(X, Y)
What are the possible extensions using r/3?
r(X,X,X)  r(Y,Y,Y)  r(Z,Z,Z)  r(1,1,1)
r(X,Y,Z)  r(Z,Y,X)  r(X,X,Y)  r(X,X,1)
r(X,Y,A)  r(X,A,B)  r(A,A,A)  r(A,B,1)
and many more …
Choose from: old variables, constants, new variables
Huge branching factor in our search!
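To see where the branching factor comes from, here is a small hedged sketch (my own, not the lecture’s code) that enumerates candidate argument tuples for r/3 when the clause already contains X, Y, Z, allowing one sample constant and two fresh variables:

```python
from itertools import product

# Illustrative only: count candidate r/3 literals when extending
# q(X, Z) <- p(X, Y).  Argument choices are the old variables, one sample
# constant, and two fresh variables (real ILP systems allow many more).
old_vars = ["X", "Y", "Z"]
constants = ["1"]
new_vars = ["A", "B"]          # fresh variables introduced by the new literal
choices = old_vars + constants + new_vars

candidates = [f"r({a},{b},{c})" for a, b, c in product(choices, repeat=3)]
print(len(candidates))         # 6^3 = 216 candidate literals for r/3 alone
print(candidates[:3])          # ['r(X,X,X)', 'r(X,X,Y)', 'r(X,X,Z)']
```

Even with this toy restriction there are 216 candidates for a single predicate, which is why ILP systems prune the search aggressively (see slide 11).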

6 Example: ILP in the Blocks World
Consider this training set [figure: several POS and several NEG block configurations]
Can you guess an FOPC rule?

7 Searching for a Good Rule (FOPC version; cap letters are vars)
Assume we have: tall(X), wide(Y), square(X), on(X,Y), red(X), green(X), blue(X), block(X) …
[Figure: a search tree over rules. The root is “true → POS”; children include “on(X,Y) → POS”, “blue(X) → POS”, “tall(X) → POS”, …; candidates are scored against the + and – examples.]
POSSIBLE RULE LEARNED: If on(X,Y) ∧ block(Y) ∧ blue(X) Then POS
– hard to learn with fixed-length feature vectors!

8 Covering Algorithms (learn a rule, then recur; so disjunctive)
[Figure: a scatter of + and – examples. A box around one cluster of +’s marks the examples covered by Rule 1; the remaining +’s are the examples still to cover, used to learn Rule 2.]
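A minimal sketch of the covering loop (my own illustration, not the lecture’s code), assuming a `learn_one_rule` subroutine that greedily builds a single rule as in the search slides above, and rule objects with a `covers(example)` predicate:

```python
def covering_algorithm(pos, neg, learn_one_rule):
    """Learn a disjunctive set of rules: repeatedly learn one rule,
    remove the positive examples it covers, and recur on the rest."""
    rules = []
    uncovered = list(pos)
    while uncovered:
        rule = learn_one_rule(uncovered, neg)   # e.g. greedy best-first search
        covered = [ex for ex in uncovered if rule.covers(ex)]
        if not covered:                         # no progress: stop to avoid looping
            break
        rules.append(rule)
        uncovered = [ex for ex in uncovered if not rule.covers(ex)]
    return rules                                # the disjunction of the learned rules
```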

9 Using Background Knowledge (BK) in ILP
Now consider adding some domain knowledge about the task being learned
For example: If Q, R, and W are all true, Then you can infer Z is true
Can also do arithmetic, etc. in BK rule bodies:
If SOME_TRIG_CALCS_OUTSIDE_OF_LOGIC Then openPassingLane(P1, P2, Radius, Angle)
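A hedged sketch (my own, not from the lecture) of how a propositional BK rule like “if Q, R, and W then Z” can be applied to an example’s facts before rule search, so the deduced feature Z becomes available to the learner:

```python
# Illustrative only: apply propositional BK rules (body -> head) to an
# example's facts until nothing new can be derived (simple forward chaining).
bk_rules = [({"Q", "R", "W"}, "Z")]   # if Q, R, and W are all true, infer Z

def apply_bk(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= facts and head not in facts:
                facts.add(head)       # Z now looks like any other feature
                changed = True
    return facts

print(apply_bk({"Q", "R", "W", "B"}, bk_rules))  # now includes 'Z' as a deduced feature
```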

10 Searching for a Good Rule using Deduced Features (eg, Z)
[Figure: the same specialization lattice as before — “P is always true” at the root, then “P if A”, “P if B”, “P if C”, then “P if B and C”, “P if B and D” — now extended with nodes that use the deduced feature, such as “P if Z” and “P if B & Z”.]
Note that more BK can lead to slower learning! But hopefully less search depth is needed

11 Controlling the Search for a Good Rule
Choose a ‘seed’ positive example, then only consider properties that are true about this example
Specify argument types and whether arguments are ‘input’ (+) or ‘output’ (-)
– Only consider adding a literal if all of its input arguments are already present in the rule
– For example, enemies(+person, -person): only if a variable of type PERSON is already in the rule [eg, murdered(person)], consider adding that person’s enemies
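A small sketch of the input-argument check just described (the names and data structures are my own assumptions, roughly in the spirit of mode declarations in systems such as Progol or Aleph):

```python
# Illustrative only: a candidate literal may be added to a clause only if every
# '+' (input) argument has a type already bound by a variable in the clause.
def admissible(candidate_modes, bound_types):
    """candidate_modes: list like [('+', 'person'), ('-', 'person')]
    bound_types: set of types already bound in the partial rule."""
    return all(t in bound_types for sign, t in candidate_modes if sign == "+")

enemies_modes = [("+", "person"), ("-", "person")]
print(admissible(enemies_modes, {"person"}))   # True: murdered(person) already bound a person
print(admissible(enemies_modes, set()))        # False: no person variable in the rule yet
```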

12 Formal Specification of the ILP Task
Given: a set of pos examples (P), a set of neg examples (N), some background knowledge (BK)
Do: induce additional knowledge (AK) such that
– BK ∧ AK allows all/most examples in P to be proved
– BK ∧ AK allows none/few examples in N to be proved
Technically, the BK also contains all the facts about the pos and neg examples, plus some rules
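Written out formally (my rendering of the slide’s two conditions; the all/most and none/few relaxation applies in practice):

```latex
% Ideal (noise-free) version of the ILP task:
\forall p \in P:\; BK \wedge AK \models p
\qquad\text{and}\qquad
\forall n \in N:\; BK \wedge AK \not\models n
% In practice these are relaxed to "most of P" and "few of N".
```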

13 ILP Wrapup
Use best-first search with a large beam
Commonly used scoring function: #posExCovered – #negExCovered – ruleLength
Performs ML without requiring fixed-length feature vectors
Produces human-readable rules (straightforward to convert FOPC to English)
Can be slow due to the large search space
Appealing ‘inner loop’ for probabilistic-logic learning
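The scoring function above as a few lines of Python (an illustration of the formula, not the lecture’s code; `rule.covers` and `rule.length` are assumed interfaces):

```python
def score(rule, pos, neg):
    """Slide 13's scoring function: reward positive coverage, penalize
    negative coverage and rule length (to prefer simpler rules)."""
    pos_covered = sum(1 for ex in pos if rule.covers(ex))
    neg_covered = sum(1 for ex in neg if rule.covers(ex))
    return pos_covered - neg_covered - rule.length()
```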

14 COLT: Probably Approximately Correct (PAC) Learning
PAC theory of learning (Valiant ’84)
Given:
C – class of possible concepts
c ∈ C – target concept
H – hypothesis space (usually H = C)
ε, δ – correctness bounds
N – polynomial number of examples

15 Probably Approximately Correct (PAC) Learning
Do: with probability 1 - δ, return an h in H whose accuracy is at least 1 - ε
Do this for any probability distribution over the examples
In other words: Prob[ error(h, c) > ε ] < δ
[Figure: the concept c and hypothesis h drawn as overlapping regions; the shaded regions, where they disagree, are where errors occur.]

16 How Many Examples Needed to be PAC?
Consider finite hypothesis spaces
Let H_bad ≡ { h_1, …, h_z }, the set of hypotheses whose (‘testset’) error is > ε
Goal: with high prob, eliminate all items in H_bad via (noise-free) training examples

17 How Many Examples Needed to be PAC?
How can an h look bad, even though it is correct on all the training examples?
If we never see any examples in the shaded regions [where h and c disagree]
We’ll compute an N s.t. the odds of this are sufficiently low (recall, N = number of examples)

18 H_bad
Consider H_1 ∈ H_bad and ex ∈ {N}, where {N} denotes the set of N training examples
What is the probability that H_1 is consistent with ex?
Prob[ consistent_A(ex, H_1) ] ≤ 1 - ε   (since H_1 is bad, its error rate is at least ε)

19 H_bad (cont.)
What is the probability that H_1 is consistent with all N examples?
Prob[ consistent_B({N}, H_1) ] ≤ (1 - ε)^|N|   (by the iid assumption)

20 H_bad (cont.)
What is the probability that some member of H_bad is consistent with the examples in {N}?
Prob[ consistent_C({N}, H_bad) ] = Prob[ consistent_B({N}, H_1) ∨ … ∨ consistent_B({N}, H_z) ]
  ≤ |H_bad| × (1 - ε)^|N|   // P(A ∨ B) = P(A) + P(B) - P(A ∧ B); the last term is ignored in the upper-bound calc
  ≤ |H| × (1 - ε)^|N|       // since H_bad ⊆ H

21 Solving for #Examples, |N|
We want: Prob[ consistent_C({N}, H_bad) ] ≤ |H| × (1 - ε)^|N| < δ
Recall that we want the prob of a bad concept surviving to be less than δ, our bound on learning a poor concept
Assume that if many consistent hypotheses survive, we get unlucky and choose a bad one (we’re doing a worst-case analysis)

22 Solving for |N| (number of examples needed to be confident of getting a good model)
Solving: |N| > [ log(1/δ) + log(|H|) ] / -ln(1 - ε)
Since ε ≤ -ln(1 - ε) over [0, 1), we get |N| > [ log(1/δ) + log(|H|) ] / ε
(Aside: notice that this calculation assumed we could always find a hypothesis that fits the training data)
Notice we made NO assumptions about the prob dist of the data (other than that it does not change)
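Filling in the algebra between slides 21 and 22 (my worked steps; all logs are natural):

```latex
|H|\,(1-\epsilon)^{|N|} < \delta
\;\Longleftrightarrow\;
\ln|H| + |N|\,\ln(1-\epsilon) < \ln\delta
\;\Longleftrightarrow\;
|N|\,\bigl(-\ln(1-\epsilon)\bigr) > \ln|H| + \ln(1/\delta)
\;\Longleftrightarrow\;
|N| > \frac{\ln(1/\delta) + \ln|H|}{-\ln(1-\epsilon)}
% and since \epsilon \le -\ln(1-\epsilon) on [0,1),
% requiring |N| > \frac{\ln(1/\delta) + \ln|H|}{\epsilon} suffices.
```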

23 Example: Number of Instances Needed
Assume F = 100 binary features
H = all (pure) conjuncts [3^|F| possibilities (∀i: use f_i, use ¬f_i, or ignore f_i), so log|H| = |F| × log 3 ≈ |F|]
ε = 0.01, δ = 0.01
N = [ log(1/δ) + log(|H|) ] / ε = 100 × [ log(100) + 100 ] ≈ 10^4
But how many real-world concepts are pure conjuncts with noise-free training data?
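Reproducing the arithmetic (my sketch; natural logs, as in the slide’s approximation):

```python
import math

F = 100                      # number of binary features
log_H = F * math.log(3)      # |H| = 3^F pure conjuncts, so ln|H| = F * ln 3 ≈ F
eps = delta = 0.01

N = (math.log(1 / delta) + log_H) / eps
print(round(N))              # ~11447, i.e. on the order of 10^4
```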

24 Agnostic Learning
So far we’ve assumed we knew the concept class – but that is unrealistic on real-world data
In agnostic learning we relax this assumption
We instead aim to find a hypothesis arbitrarily close (ie, < ε error) to the best* hypothesis in our hypothesis space
We now need |N| ≥ [ log(1/δ) + log(|H|) ] / (2ε²)   (the denominator had been just ε before)
* ie, closest to the true concept
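Plugging the slide-23 numbers into the agnostic bound (my arithmetic, natural logs):

```latex
|N| \;\ge\; \frac{\ln(1/\delta) + \ln|H|}{2\epsilon^{2}}
\;=\; \frac{\ln(100) + 100\,\ln 3}{2\,(0.01)^{2}}
\;\approx\; \frac{114.5}{0.0002}
\;\approx\; 5.7 \times 10^{5}
```

This is roughly 50 times the slide-23 estimate, the factor 1/(2ε) introduced by the squared denominator.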

25 Two Senses of Complexity
Sample complexity (number of examples needed)
vs.
Time complexity (time needed to find an h ∈ H that is consistent with the training examples)
– in CS, we usually only address time complexity

26 Complexity (cont.)
– Some concepts require a polynomial number of examples, but an exponential amount of time (in the worst case)
– Eg, optimally training neural networks is NP-hard (recall BP is a ‘greedy’ algorithm that finds a local min)

27 Some Other COLT Topics
COLT + clustering + k-NN + RL + SVMs + ANNs + ILP, etc.
Average-case analysis (vs. worst case)
Learnability of natural languages (language innate?)
Learnability in parallel

28 Summary of COLT Strengths
Formalizes the learning task
Allows for imperfections (eg, ε and δ in PAC)
Work on boosting is an excellent case of ML theory influencing ML practice
Shows what concepts are intrinsically hard to learn (eg, k-term DNF*)
* though a superset of this class is PAC learnable!

29 Summary of COLT Weaknesses
Most analyses are worst case
Hence, bounds are often much higher than what works in practice (see the Domingos article assigned early this semester)
Use of ‘prior knowledge’ is not captured very well yet

