
1 CS-424 Gregory Dudek Lecture 14 Learning –Probably approximately correct learning (cont’d) –Version spaces –Decision trees

2 CS-424 Gregory Dudek PAC: definition Relax this requirement by not requiring that the learning program necessarily achieve a small error, but only that it keep the error small with high probability. A hypothesis is probably approximately correct (PAC) with probability δ and error at most ε if, given any set of training examples drawn according to the fixed distribution, the program outputs a hypothesis f such that Pr(Error(f) > ε) < δ.

3 CS-424 Gregory Dudek PAC Training examples Theorem: If the number of hypotheses |H| is finite, then a program that returns a hypothesis consistent with m = ln(δ/|H|) / ln(1-ε) training examples (drawn according to Pr) is guaranteed to be PAC with probability δ and error bounded by ε.
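As a quick sanity check on the theorem, here is a minimal Python sketch that evaluates m = ln(δ/|H|)/ln(1-ε) for a finite hypothesis space; the function name and the example numbers are illustrative, not from the lecture.

```python
import math

def pac_sample_bound(num_hypotheses, epsilon, delta):
    """Number of consistent training examples m that guarantees
    Pr(Error(f) > epsilon) < delta, using m = ln(delta/|H|) / ln(1 - epsilon)."""
    return math.ceil(math.log(delta / num_hypotheses) / math.log(1.0 - epsilon))

# Illustrative numbers: |H| = 3**10 conjunctions, 10% error, 5% failure probability.
print(pac_sample_bound(num_hypotheses=3**10, epsilon=0.1, delta=0.05))  # 133
```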

4 CS-424 Gregory Dudek We want… PAC (so far) describes the accuracy of the hypothesis and the chances of finding such a concept. –How many examples do we need to rule out the “really bad” hypotheses? We also want the process to proceed quickly.

5 CS-424 Gregory Dudek PAC learnable spaces A class of concepts C is said to be PAC learnable for a hypothesis space H if there exists a polynomial-time algorithm A such that: for any c ∈ C, distribution Pr, ε, and δ, if A is given a quantity of training examples polynomial in 1/ε and 1/δ, then with probability 1-δ the algorithm will return a hypothesis f from H such that error(f) ≤ ε.

6 CS-424 Gregory Dudek Observations on PAC PAC learnability doesn’t tell us how to find the learning algorithm. The number of examples needed grows only logarithmically with the size of the concept space, and polynomially with the other key parameters (1/ε and 1/δ).

7 CS-424 Gregory Dudek Example Target and learned concepts are conjunctions with up to n predicates. (This is our bias.) –Each predicate may appear in positive form, in negated form, or be absent: 3 options. –This gives 3^n possible conjunctions in the hypothesis space.

8 CS-424 Gregory Dudek Result I have such a formula in mind. I’ll give you some examples. You try to guess what the formula is. A concept that matches all our examples will be PAC if m is at least (n/ε) ln(3/δ).
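A small sketch of what that bound works out to numerically; the helper name and the chosen values of n, ε, and δ are illustrative.

```python
import math

def conjunction_sample_bound(n, epsilon, delta):
    """The slide's bound for conjunctions over n predicates: m >= (n/epsilon) * ln(3/delta)."""
    return math.ceil((n / epsilon) * math.log(3.0 / delta))

# Illustrative: n = 10 predicates, epsilon = 0.1, delta = 0.05.
print(conjunction_sample_bound(10, 0.1, 0.05))   # about 410 examples
```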

9 CS-424 Gregory Dudek How How can we actually find a suitable concept? One key approach: start with the examples themselves and try to generalize. E.g., given f(3,5) and f(5,5): –We might try replacing the first argument with a variable X: f(X,5). –We might try replacing both arguments with variables: f(X,Y). –We want to get as general as possible, but not too general. The converse of this generalization is specialization.
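A minimal sketch of that generalization step, assuming ground examples are represented as (predicate, argument-tuple) pairs; the representation and helper name are assumptions, not part of the lecture.

```python
def generalize(example1, example2):
    """Least general generalization of two ground atoms with the same predicate:
    keep arguments that agree, replace disagreements with fresh variables."""
    pred1, args1 = example1
    pred2, args2 = example2
    assert pred1 == pred2 and len(args1) == len(args2)
    gen_args = []
    for i, (a, b) in enumerate(zip(args1, args2)):
        gen_args.append(a if a == b else f"X{i}")   # fresh variable where they differ
    return (pred1, tuple(gen_args))

# The slide's example: f(3,5) and f(5,5) generalize to f(X0, 5).
print(generalize(("f", (3, 5)), ("f", (5, 5))))     # ('f', ('X0', 5))
```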

10 CS-424 Gregory Dudek Version Space [RN 18.5; DAA 5.3] Deals with conjunctive concepts. Consider a concept C as being identified with the set of positive examples it is associated with. –C: odd-numbered hearts = {3♥, 5♥, 7♥, 9♥}. A concept C1 is a specialization of concept C2 if the examples associated with C1 are a subset of those associated with C2. The 3-of-hearts is more specialized than odd hearts.

11 CS-424 Gregory Dudek Specialization/Generalization [Hierarchy diagram: Cards at the top, splitting into Black and Red; Red splitting into Odd red and Even red; below that Even-♥ (red is implied); the individual cards 3♥, 5♥, 7♥, 9♥ at the bottom.]

12 CS-424 Gregory Dudek Immediate Immediate specialization: no intermediate concept in between. 2-of-hearts is not an immediate specialization of Red. Hearts and diamonds are the immediate specializations of Red. –Note: This observation depends on knowing the hypothesis space restriction.

13 CS-424 Gregory Dudek Algorithm outline Incrementally process training data. Keep a list of the most and least specific concepts consistent with the observed data. –For two concepts A and B that are consistent with the data, the concept C = (A AND B) will also be consistent, yet more specific. Tied in a subtle way to conjunctions. –Disjunctive concepts can be obtained trivially by joining examples, but they’re not interesting.

14 CS-424 Gregory Dudek VS example Training sequence (card: label): 4♥: no; 5♥: yes; 5 (different suit): no; 7♥: yes; 9♥: ---; 3♥: yes. [Hierarchy diagram as on slide 11: Cards → Black, Red; Red → Even red, Odd red; Odd-♥ (red is implied); the individual cards 3♥, 5♥, 7♥, 9♥.]

15 CS-424 Gregory Dudek Algorithm specifics Maintain two bounding concepts: –The most specialized (the specific boundary, S-set). –The broadest (the general boundary, G-set). Each example we see is either positive (yes) or negative (no). Positive examples (+) tend to make the concept more general (more inclusive); they move the specific boundary “up”. Negative examples (-) are used to make the concept more exclusive (to reject them); they move the general boundary “down”. Detailed algorithm: RN p. 549; DAA pp. 191-192.
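A simplified sketch, in Python, of the S-set/G-set updates for conjunctive hypotheses over discrete attributes, loosely in the spirit of the candidate-elimination algorithm referenced above (see RN/DAA for the full version). The attribute-vector representation and all names are illustrative; this keeps a single most-specific hypothesis S and a list of maximally general hypotheses G.

```python
# A hypothesis is a tuple with one entry per attribute: a concrete value,
# '?' (anything), or None (matches nothing yet).

def matches(hypothesis, example):
    return all(h == '?' or h == e for h, e in zip(hypothesis, example))

def generalize_S(S, example):
    # Minimal generalization of S that covers a positive example.
    return tuple(s if s == e else '?' for s, e in zip(S, example))

def specialize_G(g, example, S):
    # Minimal specializations of g that exclude a negative example
    # while still covering the current S.
    out = []
    for i, (gi, si) in enumerate(zip(g, S)):
        if gi == '?' and si != '?' and si != example[i]:
            out.append(g[:i] + (si,) + g[i + 1:])
    return out

def candidate_elimination(examples, n_attrs):
    S = (None,) * n_attrs              # most specific: matches nothing yet
    G = [('?',) * n_attrs]             # most general: matches everything
    for x, label in examples:
        if label:                      # positive: generalize S, prune G
            S = x if S[0] is None else generalize_S(S, x)
            G = [g for g in G if matches(g, x)]
        else:                          # negative: specialize any G member covering x
            G = [h for g in G
                 for h in ([g] if not matches(g, x) else specialize_G(g, x, S))]
    return S, G

# Toy run on the office-size data from the later slides (size, dept, status).
data = [(('large', 'cs', 'faculty'), True),
        (('large', 'ee', 'faculty'), False),
        (('large', 'cs', 'student'), True)]
print(candidate_elimination(data, 3))   # S = ('large', 'cs', '?'), G = [('?', 'cs', '?')]
```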

16 CS-424 Gregory Dudek Observations It allows you to GENERALIZE from a training set to examples never before seen! –In contrast, consider table lookup or rote learning. Why is that good? 1. It allows you to infer things about new data (the whole point of learning). 2. It allows you to (potentially) remember old data much more efficiently. The version space method is optimal for a conjunction of positive literals. –How does it perform with noisy data?


18 Learning Decision Trees A decision tree takes as input a set of properties and outputs yes/no “decisions”. It is a tree structure such that: –Internal nodes correspond to questions. –Arcs are associated with answers to questions. –Terminal (leaf) nodes are decisions. Example goal predicate: WillWait (do we want to wait for a table at a restaurant?)
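A minimal sketch of that structure: internal nodes hold a question (an attribute), arcs are its possible answers, and leaves hold decisions. The toy WillWait-style tree below is invented for illustration, not the tree from the lecture.

```python
class Leaf:
    def __init__(self, decision):
        self.decision = decision        # terminal decision (yes/no)

class Node:
    def __init__(self, attribute, children):
        self.attribute = attribute      # question asked at this node
        self.children = children        # answer -> subtree (arc labels)

def classify(tree, example):
    while isinstance(tree, Node):
        tree = tree.children[example[tree.attribute]]
    return tree.decision

# Toy WillWait-style tree (structure invented for illustration).
tree = Node("Patrons", {
    "None": Leaf(False),
    "Some": Leaf(True),
    "Full": Node("Hungry", {"Y": Leaf(True), "N": Leaf(False)}),
})
print(classify(tree, {"Patrons": "Full", "Hungry": "N"}))   # False
```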

19 CS-424 Gregory Dudek Restaurant Selector Example attributes: 1. Alternate, 2. Bar, 3. Fri/Sat, 4. Hungry, 5. Patrons, 6. Price, etc. For all restaurants r: Patrons(r, Full) AND WaitEstimate(r, under_10_min) AND Hungry(r, N) -> WillWait(r)
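For concreteness, the rule above can be transcribed directly as a predicate; the dictionary representation of a restaurant and the attribute-value strings are assumptions for illustration.

```python
def will_wait(r):
    """Direct transcription of the rule on this slide (representation assumed)."""
    return (r["Patrons"] == "Full"
            and r["WaitEstimate"] == "under_10_min"
            and r["Hungry"] == "N")

print(will_wait({"Patrons": "Full", "WaitEstimate": "under_10_min", "Hungry": "N"}))  # True
```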

20 CS-424 Gregory Dudek Example 2 Maybe we should have made a reservation? (Using a decision tree.) Restaurant lookup: you’ve heard Joe’s is good. –Lookup Joe’s –Lookup Chez Joe –Lookup Restaurant Joe’s –Lookup Bistro Joe’s –Lookup Restaurant Chez Joe’s –Lookup Le Restaurant Chez Joe –Lookup Le Bar Restaurant Joe –Lookup Le Restaurant Casa Chez Joe

21 CS-424 Gregory Dudek Decision trees: issues Constructing a decision tree is easy… really easy! –Just add examples in turn. Difficulty: how can we extract a simplified decision tree? –This implies (among other things) establishing a preference order (bias) among alternative decision trees. –Finding the smallest one proves to be VERY hard. Improving over the trivial one is okay.

22 CS-424 Gregory Dudek Office size example Training examples:
1. large ^ cs ^ faculty -> yes
2. large ^ ee ^ faculty -> no
3. large ^ cs ^ student -> yes
4. small ^ cs ^ faculty -> no
5. small ^ cs ^ student -> no
The questions about office size, department, and status tell us something about the mystery attribute. Let’s encode all this as a decision tree.

23 CS-424 Gregory Dudek Decision tree #1
size
  large -> dept
    cs -> yes {1,3}
    ee -> no {2}
  small -> no {4,5}

24 CS-424 Gregory Dudek Decision tree #2
status
  faculty -> dept
    cs -> size
      large -> yes
      small -> no {4}
    ee -> no
  student -> dept
    ee -> ?
    cs -> size
      large -> yes
      small -> no {5}

25 CS-424 Gregory Dudek Making a tree How can we build a decision tree (that might be good)? Objective: an algorithm that builds a decision tree from the root down. Each node in the decision tree is associated with a set of training examples that are split among its children. Input: a node in a decision tree with no children, associated with a set of training examples. Output: a decision tree that classifies all of the examples, i.e., all of the training examples stored in each leaf of the decision tree are in the same class.

26 CS-424 Gregory Dudek Procedure: Buildtree If all of the training examples are in the same class, then quit; else:
1. Choose an attribute to split the examples.
2. Create a new child node for each attribute value.
3. Redistribute the examples among the children according to the attribute values.
4. Apply Buildtree to each child node.
Is this a good decision tree? Maybe? How do we decide?
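A minimal recursive sketch of the Buildtree procedure, run on the office-size examples from slide 22. Step 1 here just takes the first remaining attribute (the information-gain criterion comes on the following slides); all names are illustrative.

```python
from collections import Counter

def buildtree(examples, attributes):
    """Recursive sketch of Buildtree.
    examples: list of (attribute_dict, label); attributes: names still available."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                 # all in the same class -> leaf
        return labels[0]
    if not attributes:                        # no attribute left: majority vote
        return Counter(labels).most_common(1)[0][0]
    attr = attributes[0]                      # step 1: naive choice (no information gain yet)
    tree = {attr: {}}
    for v in {ex[attr] for ex, _ in examples}:          # steps 2-3: child per value, redistribute
        subset = [(ex, label) for ex, label in examples if ex[attr] == v]
        tree[attr][v] = buildtree(subset, attributes[1:])   # step 4: recurse
    return tree

# Office-size data from slide 22 (size, dept, status -> yes/no).
data = [({"size": "large", "dept": "cs", "status": "faculty"}, "yes"),
        ({"size": "large", "dept": "ee", "status": "faculty"}, "no"),
        ({"size": "large", "dept": "cs", "status": "student"}, "yes"),
        ({"size": "small", "dept": "cs", "status": "faculty"}, "no"),
        ({"size": "small", "dept": "cs", "status": "student"}, "no")]
print(buildtree(data, ["size", "dept", "status"]))   # matches decision tree #1 on slide 23
```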

27 CS-424 Gregory Dudek A bad tree To identify an animal (goat, dog, housecat, tiger), ask in sequence: Is it a dog? Is it a housecat? Is it a tiger? Is it a goat? (A linear chain of questions.)

28 CS-424 Gregory Dudek A good tree? Is it a cat (cat family)? (If yes, ask what kind.) Is it a dog? Max depth: 2 questions.

29 CS-424 Gregory Dudek Best property Need to select a property / feature / attribute. Goal: find a short tree (Occam’s razor). –Measure “short” by the MAXIMUM depth. Select the most informative feature: –One that best splits (classifies) the examples. Use a measure from information theory: –Claude Shannon (1949).

30 CS-424 Gregory Dudek Entropy Entropy is often described as a measure of disorder. –Maybe better to think of it as a measure of unpredictability. Low entropy = highly ordered. High entropy = unpredictable = surprising -> chaotic. As defined, entropy is related to the number of bits needed to encode an outcome. Over some set of states or outcomes with probabilities p_i: H = - ∑_i p_i log_2 p_i

31 CS-424 Gregory Dudek Entropy measures the (im)purity in a collection S of examples: Entropy(S) = - p_+ log_2(p_+) - p_- log_2(p_-), where p_+ is the proportion of positive examples and p_- is the proportion of negative examples.
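A small sketch of the entropy measure and the expected-entropy-reduction ("information gain") criterion hinted at on slide 29, run on the office-size data from slide 22; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i log2(p_i) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Expected entropy reduction from splitting `examples` on `attr`."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for v in {ex[attr] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attr] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

data = [({"size": "large", "dept": "cs", "status": "faculty"}, "yes"),
        ({"size": "large", "dept": "ee", "status": "faculty"}, "no"),
        ({"size": "large", "dept": "cs", "status": "student"}, "yes"),
        ({"size": "small", "dept": "cs", "status": "faculty"}, "no"),
        ({"size": "small", "dept": "cs", "status": "student"}, "no")]
for a in ("size", "dept", "status"):
    print(a, round(information_gain(data, a), 3))   # "size" gives the largest gain
```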


