CS-424 Gregory Dudek
Today’s outline
Administrative issues
–Assignment deadlines: 1 day = 24 hrs (holidays are special)
–The project
–Assignment 3
–Midterm results & review
Learning
–Decision trees: building them, building good ones, entropy, ID3
–Sub-symbolic learning: neural networks

CS-424 Gregory Dudek The Midterm

CS-424 Gregory Dudek
Decision trees: issues
Constructing a decision tree is easy… really easy!
–Just add examples in turn.
Difficulty: how can we extract a simplified decision tree?
–This implies (among other things) establishing a preference order (bias) among alternative decision trees.
–Finding the smallest one proves to be VERY hard. Improving over the trivial one is okay.

CS-424 Gregory Dudek
Office size example
Training examples:
1. large ^ cs ^ faculty -> yes
2. large ^ ee ^ faculty -> no
3. large ^ cs ^ student -> yes
4. small ^ cs ^ faculty -> no
5. small ^ cs ^ student -> no
The questions about office size, department and status tell us something about the mystery attribute. Let’s encode all this as a decision tree.

CS-424 Gregory Dudek
Decision tree #1
size?
 large -> dept?
   cs -> yes {1,3}
   ee -> no {2}
 small -> no {4,5}

CS-424 Gregory Dudek
Decision tree #2
status?
 faculty -> dept?
   cs -> size?
     large -> yes
     small -> no {4}
   ee -> no
 student -> dept?
   ee -> ? (no examples)
   cs -> size?
     large -> yes
     small -> no {5}

CS-424 Gregory Dudek
Making a tree
How can we build a decision tree (that might be good)?
Objective: an algorithm that builds a decision tree from the root down. Each node in the decision tree is associated with a set of training examples that are split among its children.
Input: a node in a decision tree with no children, associated with a set of training examples.
Output: a decision tree that classifies all of the examples, i.e., all of the training examples stored in each leaf of the decision tree are in the same class.

CS-424 Gregory Dudek
Procedure: Buildtree
If all of the training examples are in the same class, then quit; else:
1. Choose an attribute to split the examples.
2. Create a new child node for each value of the attribute.
3. Redistribute the examples among the children according to the attribute values.
4. Apply buildtree to each child node.
Is this a good decision tree? Maybe? How do we decide? (A code sketch of the procedure follows below.)
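Here is a minimal Python sketch of Buildtree on the office-size examples from the earlier slide; the dict encoding and function names are illustrative assumptions, not something given on the slides.

    # Office-size training examples: three attributes plus the class label.
    examples = [
        {"size": "large", "dept": "cs", "status": "faculty", "class": "yes"},   # 1
        {"size": "large", "dept": "ee", "status": "faculty", "class": "no"},    # 2
        {"size": "large", "dept": "cs", "status": "student", "class": "yes"},   # 3
        {"size": "small", "dept": "cs", "status": "faculty", "class": "no"},    # 4
        {"size": "small", "dept": "cs", "status": "student", "class": "no"},    # 5
    ]

    def buildtree(examples, attributes):
        classes = {e["class"] for e in examples}
        if len(classes) == 1:                 # all examples in the same class: quit
            return classes.pop()              # leaf
        attr = attributes[0]                  # 1. choose an attribute (here: just the first one)
        children = {}
        for value in {e[attr] for e in examples}:                # 2. one child per attribute value
            subset = [e for e in examples if e[attr] == value]   # 3. redistribute the examples
            children[value] = buildtree(subset, attributes[1:])  # 4. recurse on each child
        return (attr, children)

    print(buildtree(examples, ["size", "dept", "status"]))

Because the attribute is chosen blindly (first in the list), this happens to reproduce decision tree #1 above; choosing the attribute more cleverly is exactly what the rest of the lecture is about.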

CS-424 Gregory Dudek
A “Bad” tree
To identify an animal (goat, dog, housecat, tiger), ask one question at a time:
 Is it a wolf? yes -> wolf
 no -> Is it in the cat family? no -> dog
   yes -> Is it a tiger? yes -> tiger, no -> cat

CS-424 Gregory Dudek Max depth 3. To get to fish or goat, it takes three questions. In general, a bad tree for N categories can take N questions.

CS-424 Gregory Dudek
Can’t we do better? A good tree?
Max depth 2 questions.
–More generally, log2(N) questions.
Cat family?
 yes -> Tiger?
 no -> Dog?

CS-424 Gregory Dudek
Best Property
Need to select a property / feature / attribute.
Goal: find a short tree (Occam’s razor).
1. Base this on MAXIMUM depth
2. Base this on the AVERAGE depth
 A) over all leaves
 B) over all queries
Select the most informative feature
–One that best splits (classifies) the examples
Use a measure from information theory
–Claude Shannon (1949)

CS-424 Gregory Dudek
Optimizing the tree
All based on buildtree.
To minimize maximum depth, we want to build a balanced tree.
Put the training set (TS) into any order. For each question Q:
–Construct a K-tuple of 0s and 1s, where the jth entry in the tuple is
 –1 if the jth instance in the TS has answer YES to Q
 –0 if it has answer NO
Discard all questions with only one answer (all 0s or all 1s). (A sketch in code follows below.)
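A small sketch of this tuple construction, reusing the examples list from the Buildtree sketch above; the question names and predicates are illustrative assumptions.

    def answer_tuple(question, training_set):
        # jth entry is 1 if the jth instance answers YES to the question, else 0
        return tuple(1 if question(inst) else 0 for inst in training_set)

    questions = {
        "large office?": lambda e: e["size"] == "large",
        "cs dept?":      lambda e: e["dept"] == "cs",
        "faculty?":      lambda e: e["status"] == "faculty",
    }
    tuples = {name: answer_tuple(q, examples) for name, q in questions.items()}

    # Discard questions whose tuple is all 0s or all 1s: they split nothing.
    useful = {name: t for name, t in tuples.items() if 0 < sum(t) < len(t)}
    print(useful)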

CS-424 Gregory Dudek
Min Max Depth
Minimize max depth: at each query, come as close as possible to cutting the number of samples in the subtree in half. This suggests the number of questions per subtree is given by the log2 of the number of sample categories to be subdivided.
–Why log2??
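One way to see the log2 (not spelled out on the slide): if every question cuts the remaining categories roughly in half, then after k questions about N / 2^k categories remain, and isolating a single category requires N / 2^k <= 1, i.e. k >= log2(N).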

CS-424 Gregory Dudek
Entropy
Measures the (im)purity in a collection S of examples:
Entropy(S) = - [ p+ log2(p+) + p- log2(p-) ]
p+ is the proportion of positive examples.
p- is the proportion of negative examples.
N.B. This is not a fully general definition of entropy.

CS-424 Gregory Dudek
Example
S: 14 examples, 9 positive, 5 negative.
Entropy([9+,5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
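A quick check of this arithmetic in Python (a throwaway sketch; the helper name is ours):

    from math import log2

    def entropy(pos, neg):
        total = pos + neg
        e = 0.0
        for p in (pos / total, neg / total):
            if p > 0:              # take 0 * log2(0) to be 0
                e -= p * log2(p)
        return e

    print(entropy(9, 5))   # 0.9402859586706311, i.e. 0.940 as above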

CS-424 Gregory Dudek
Intuition / Extremes
Entropy of a collection is zero if all examples are in the same class.
Entropy is 1 if there are equal numbers of positive and negative examples.
Intuition: if you pick a random example, how many bits do you need to specify which class the example belongs to?

CS-424 Gregory Dudek
Entropy: definition
Often referred to as “randomness”.
How useful is a question?
–How much guessing does knowing the answer save?
–How much “surprise” value is there in a question?

CS-424 Gregory Dudek Information Gain

CS-424 Gregory Dudek
General definition
Entropy(S) = - Σ_{i=1..c} p_i log2(p_i)
where p_i is the proportion of examples in S belonging to class i.

CS-424 Gregory Dudek
ID3
Considers the information implicit in a query about a set of examples.
–This provides the total amount of information implicit in a decision tree.
–Each question along the tree provides some fraction of this total information. How much?
Consider the information gain for an attribute A over an example set X:
–Gain(A:X) = E(X) - E(A:X)
–The info needed to complete the tree is the weighted sum over the subtrees: E(A:X) = Σ_i (|E_i| / |X|) I(E_i), where E_i is the subset of X with the ith value of A and I(E_i) is its entropy.
ID3 has been used with success for diverse problems including chess endgames and loans [Quinlan 1983, in Michalski et al. (eds.), Machine Learning; Quinlan 1986, Machine Learning journal].
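A sketch of the gain computation on the office-size data, reusing the examples list from the Buildtree sketch; the multi-class entropy helper and function names are our own.

    from collections import Counter
    from math import log2

    def entropy_of(subset):
        counts = Counter(e["class"] for e in subset)
        total = len(subset)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def gain(attr, X):
        # E(X) minus the weighted sum of subset entropies, E(A:X)
        remainder = 0.0
        for value in {e[attr] for e in X}:
            X_i = [e for e in X if e[attr] == value]
            remainder += len(X_i) / len(X) * entropy_of(X_i)
        return entropy_of(X) - remainder

    for a in ("size", "dept", "status"):
        print(a, round(gain(a, examples), 3))

On these five examples, size has the largest gain (about 0.42), which is consistent with decision tree #1 splitting on size at the root.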

CS-424 Gregory Dudek
In this lecture we consider some alternative hypothesis spaces based on continuous functions.
Consider the following Boolean circuit. [circuit diagram: three inputs x1, x2, x3 feed NOT, AND, and OR gates to produce a single output f(x1, x2, x3)]

CS-424 Gregory Dudek The topology is fixed and the logic elements are fixed, so there is a single Boolean function. Is there a fixed topology that can be used to represent a family of functions? Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed-topology networks.

CS-424 Gregory Dudek
The idealized neuron
Artificial neural networks come in several “flavors”.
–Most are based on a simplified model of a neuron:
 A set of (many) inputs.
 One output.
 The output is a function of the sum of the inputs.
–Typical functions: weighted sum, threshold, Gaussian.
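A minimal sketch of such a unit (the weights and threshold here are arbitrary, chosen only for illustration):

    def neuron(inputs, weights, threshold):
        # The output is a function of the weighted sum of the inputs.
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s >= threshold else 0    # threshold activation

    print(neuron([1, 0, 1], weights=[0.5, -0.3, 0.8], threshold=1.0))   # -> 1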
