Collective Intelligence Week 7: Decision Trees


Collective Intelligence Week 7: Decision Trees
Old Dominion University
Department of Computer Science
CS 795/895 Spring 2009
Michael L. Nelson <mln@cs.odu.edu>
2/25/09

Decision Trees
[Figure: a decision tree for classifying fruit]

Scenario: Predicting Subscriptions
Your web site offers premium content. You run a promotion offering free subscriptions for some period of time. You collect mostly HTTP-level information (referrer, location, whether the FAQ was read, pages viewed) about the people signing up. Can we predict who will sign up for basic or premium service at the end of the trial period, based on the data we've collected?

User Data
[Table of user data from the book: referrer, location, read FAQ, pages viewed, service chosen]
Add this row to make the table match the data / code: slashdot, UK, no, 21, None

Is Reading the FAQ a Good Predictor For Subscription?

>>> import treepredict
>>> treepredict.divideset(treepredict.my_data,2,'yes')
([['slashdot', 'USA', 'yes', 18, 'None'],
  ['google', 'France', 'yes', 23, 'Premium'],
  ['digg', 'USA', 'yes', 24, 'Basic'],
  ['kiwitobes', 'France', 'yes', 23, 'Basic'],
  ['slashdot', 'France', 'yes', 19, 'None'],
  ['digg', 'New Zealand', 'yes', 12, 'Basic'],
  ['google', 'UK', 'yes', 18, 'Basic'],
  ['kiwitobes', 'France', 'yes', 19, 'Basic']],
 [['google', 'UK', 'no', 21, 'Premium'],
  ['(direct)', 'New Zealand', 'no', 12, 'None'],
  ['(direct)', 'UK', 'no', 21, 'Basic'],
  ['google', 'USA', 'no', 24, 'Premium'],
  ['digg', 'USA', 'no', 18, 'None'],
  ['google', 'UK', 'no', 18, 'None'],
  ['kiwitobes', 'UK', 'no', 19, 'None'],
  ['slashdot', 'UK', 'no', 21, 'None']])

line breaks & spaces added for clarity (cf. Table 7-2)
Eyeballing the result, it doesn't appear FAQ is a good predictor.
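For reference, a minimal sketch of what a divideset-style splitter does, assuming the book's convention that numeric columns are split with >= and everything else with equality (the names here are illustrative, not the book's exact code):

def divide_set(rows, column, value):
    # Numeric values split on >=, categorical values split on equality
    if isinstance(value, (int, float)):
        test = lambda row: row[column] >= value
    else:
        test = lambda row: row[column] == value
    set1 = [row for row in rows if test(row)]      # rows that pass the test
    set2 = [row for row in rows if not test(row)]  # rows that fail it
    return set1, set2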

Gini Impurity
a measure of set homogeneity; explanation/code on p. 147; 0 = homogeneous set

>>> treepredict.giniimpurity(treepredict.my_data)
0.6328125
>>> set1,set2=treepredict.divideset(treepredict.my_data,2,'yes')
>>> treepredict.giniimpurity(set1)
0.53125
>>> treepredict.giniimpurity(set2)
>>> set1
[['slashdot', 'USA', 'yes', 18, 'None'],
 ['google', 'France', 'yes', 23, 'Premium'],
 ['digg', 'USA', 'yes', 24, 'Basic'],
 ['kiwitobes', 'France', 'yes', 23, 'Basic'],
 ['slashdot', 'France', 'yes', 19, 'None'],
 ['digg', 'New Zealand', 'yes', 12, 'Basic'],
 ['google', 'UK', 'yes', 18, 'Basic'],
 ['kiwitobes', 'France', 'yes', 19, 'Basic']]

For set1 (2 None, 5 Basic, 1 Premium out of 8):
Gini = (#none/#total)(1-#none/#total) + (#basic/#total)(1-#basic/#total) + (#premium/#total)(1-#premium/#total)
     = (2/8)(6/8) + (5/8)(3/8) + (1/8)(7/8)
     = 0.53125
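A minimal sketch of the Gini calculation just shown -- count the class frequencies in the last column and sum p(1-p) -- with illustrative names rather than the book's exact code:

def gini_impurity(rows):
    total = len(rows)
    if total == 0:
        return 0.0
    counts = {}
    for row in rows:                                # last column is the class label
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    # Probability that two items drawn at random from the set have different labels
    return sum((c / total) * (1 - c / total) for c in counts.values())

With set1 above (2 None, 5 Basic, 1 Premium), this reproduces the 0.53125 value.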

Entropy
a measure of set disorder; explanation/code on p. 148; 0 = homogeneous set

>>> treepredict.entropy(treepredict.my_data)
1.5052408149441479
>>> set1,set2=treepredict.divideset(treepredict.my_data,2,'yes')
>>> treepredict.entropy(set1)
1.2987949406953985
>>> treepredict.entropy(set2)
>>> set1
[['slashdot', 'USA', 'yes', 18, 'None'],
 ['google', 'France', 'yes', 23, 'Premium'],
 ['digg', 'USA', 'yes', 24, 'Basic'],
 ['kiwitobes', 'France', 'yes', 23, 'Basic'],
 ['slashdot', 'France', 'yes', 19, 'None'],
 ['digg', 'New Zealand', 'yes', 12, 'Basic'],
 ['google', 'UK', 'yes', 18, 'Basic'],
 ['kiwitobes', 'France', 'yes', 19, 'Basic']]

For set1 (2 None, 5 Basic, 1 Premium out of 8):
H = -(2/8)log2(2/8) - (5/8)log2(5/8) - (1/8)log2(1/8)
  = 0.5 + 0.424 + 0.375
  ≈ 1.299
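The corresponding entropy score, again as an illustrative sketch rather than the book's exact code:

from math import log2

def entropy(rows):
    total = len(rows)
    if total == 0:
        return 0.0
    counts = {}
    for row in rows:                                # last column is the class label
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    # Sum of -p * log2(p) over the class proportions
    return -sum((c / total) * log2(c / total) for c in counts.values())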

Building a Decision Tree
Maximize Information Gain: the difference between the entropy of the current set and the weighted-average entropy of the two new groups, i.e., choose the split i that gives max(H - H(i)).
Recursively repeat on each branch of the tree until no candidate split has a positive Information Gain, i.e., stop when splitting would only create more disorder. A sketch of this recursion appears below.
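A compact sketch of the recursive construction just described, reusing the divide_set and entropy helpers sketched earlier (a simplified stand-in for the book's buildtree, not its exact code):

class Node:
    def __init__(self, col=-1, value=None, results=None, tb=None, fb=None):
        self.col, self.value = col, value   # which column/value the node tests
        self.results = results              # class counts, only for leaf nodes
        self.tb, self.fb = tb, fb           # true branch / false branch

def build_tree(rows, scoref=entropy):
    if not rows:
        return Node()
    current_score = scoref(rows)
    best_gain, best_criteria, best_sets = 0.0, None, None
    for col in range(len(rows[0]) - 1):                 # every column except the label
        for value in set(row[col] for row in rows):
            set1, set2 = divide_set(rows, col, value)
            p = len(set1) / len(rows)
            # information gain = current entropy - weighted entropy of the split
            gain = current_score - p * scoref(set1) - (1 - p) * scoref(set2)
            if gain > best_gain and set1 and set2:
                best_gain, best_criteria, best_sets = gain, (col, value), (set1, set2)
    if best_gain > 0:
        return Node(col=best_criteria[0], value=best_criteria[1],
                    tb=build_tree(best_sets[0], scoref),
                    fb=build_tree(best_sets[1], scoref))
    # no split improves on the current set: make a leaf holding the class counts
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return Node(results=counts)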

Building the Tree

>>> treepredict.entropy(treepredict.my_data)
1.5052408149441479
>>> set1,set2=treepredict.divideset(treepredict.my_data,0,'slashdot')
>>> treepredict.entropy(set1)
0.0
>>> set1
[['slashdot', 'USA', 'yes', 18, 'None'],
 ['slashdot', 'France', 'yes', 19, 'None'],
 ['slashdot', 'UK', 'no', 21, 'None']]
>>> treepredict.entropy(set2)
1.5262349099495225
>>> set1,set2=treepredict.divideset(treepredict.my_data,0,'digg')
0.91829583405448956
>>> set1,set2=treepredict.divideset(treepredict.my_data,0,'(direct)')
1.0
1.5306189948485172
>>> set1,set2=treepredict.divideset(treepredict.my_data,0,'kiwitobes')
>>> set1,set2=treepredict.divideset(treepredict.my_data,0,'google')
1.3709505944546687
0.99403021147695647
>>> set1
[['google', 'France', 'yes', 23, 'Premium'],
 ['google', 'UK', 'no', 21, 'Premium'],
 ['google', 'USA', 'no', 24, 'Premium'],
 ['google', 'UK', 'no', 18, 'None'],
 ['google', 'UK', 'yes', 18, 'Basic']]
>>> set2
[['slashdot', 'USA', 'yes', 18, 'None'],
 ['digg', 'USA', 'yes', 24, 'Basic'],
 ['kiwitobes', 'France', 'yes', 23, 'Basic'],
 ['(direct)', 'New Zealand', 'no', 12, 'None'],
 ['(direct)', 'UK', 'no', 21, 'Basic'],
 ['slashdot', 'France', 'yes', 19, 'None'],
 ['digg', 'USA', 'no', 18, 'None'],
 ['kiwitobes', 'UK', 'no', 19, 'None'],
 ['digg', 'New Zealand', 'yes', 12, 'Basic'],
 ['slashdot', 'UK', 'no', 21, 'None'],
 ['kiwitobes', 'France', 'yes', 19, 'Basic']]

Weighted-average entropy H(i) after splitting on each referrer value:
slashdot  = 0*3/16 + 1.52*13/16 = 1.24
digg      = 0.91*3/16 + 1.52*13/16 = 1.41
(direct)  = 1.0*2/16 + 1.53*14/16 = 1.46
kiwitobes = 0.91*3/16 + 1.52*13/16 = 1.41
google    = 1.37*5/16 + 0.99*11/16 = 1.10

i='google' gives max(H-H(i))

(The transcript is trimmed: the lone numbers after the digg, (direct), kiwitobes, and google splits are the entropies of the resulting subsets, with the entropy() calls themselves not shown; set1,set2 cardinality is not shown for digg, (direct), kiwitobes.)
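To make the arithmetic above concrete, a few lines that reproduce the weighted-entropy comparison for the referrer column, using the divide_set and entropy sketches from the previous slides and assuming my_data is the 16-row list from treepredict:

H = entropy(my_data)                      # about 1.505 for the full 16-row set
for value in ('slashdot', 'digg', '(direct)', 'kiwitobes', 'google'):
    s1, s2 = divide_set(my_data, 0, value)
    p = len(s1) / len(my_data)
    weighted = p * entropy(s1) + (1 - p) * entropy(s2)
    print(value, round(weighted, 2), 'gain:', round(H - weighted, 2))
# 'google' has the lowest weighted entropy (about 1.10), hence the largest gain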

Viewing the Tree

>>> tree=treepredict.buildtree(treepredict.my_data)
>>> treepredict.printtree(tree)
0:google?
T-> 3:21?
  T-> {'Premium': 3}
  F-> 2:yes?
    T-> {'Basic': 1}
    F-> {'None': 1}
F-> 0:slashdot?
  T-> {'None': 3}
  F-> 2:yes?
    T-> {'Basic': 4}
    F-> 3:21?
      T-> {'Basic': 1}
      F-> {'None': 3}
>>> treepredict.drawtree(tree, jpeg='treeview.jpg')

http://mln-web.cs.odu.edu/~mln/cs895-s09/chapter7/treeview.jpg

Pruning The Tree

The tree can become overfitted to the training data. Pruning checks pairs of leaf nodes with a common parent to see if merging them would increase the entropy by less than a given threshold; if so, they are merged (a sketch of this rule follows the transcript below).

>>> treepredict.printtree(tree)
0:google?
T-> 3:21?
  T-> {'Premium': 3}
  F-> 2:yes?
    T-> {'Basic': 1}
    F-> {'None': 1}
F-> 0:slashdot?
  T-> {'None': 3}
  F-> 2:yes?
    T-> {'Basic': 4}
    F-> 3:21?
      T-> {'Basic': 1}
      F-> {'None': 3}
>>> treepredict.prune(tree,0.1)
(same tree)
>>> treepredict.prune(tree,0.5)
>>> treepredict.prune(tree,0.75)
>>> treepredict.prune(tree,0.90)
F-> {'None': 6, 'Basic': 5}
>>> treepredict.drawtree(tree,jpeg='pruned-tree.jpeg')

(After pruning with threshold 0.90, the entire non-google branch of the root has collapsed into the single leaf F-> {'None': 6, 'Basic': 5}.)

http://mln-web.cs.odu.edu/~mln/cs895-s09/chapter7/pruned-tree.jpeg
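A sketch of the merge test described above, in terms of the Node class and entropy sketch from earlier (a simplified stand-in for the book's prune, not its exact code; here the entropy increase is measured against the average entropy of the two leaves being merged):

def prune(node, mingain):
    if node.results is not None:            # leaf: nothing to prune
        return
    # Recurse into non-leaf children first, so merging happens bottom-up
    if node.tb.results is None:
        prune(node.tb, mingain)
    if node.fb.results is None:
        prune(node.fb, mingain)
    # If both children are now leaves, test whether merging them is cheap enough
    if node.tb.results is not None and node.fb.results is not None:
        tb_rows, fb_rows = [], []
        for label, count in node.tb.results.items():
            tb_rows += [[label]] * count
        for label, count in node.fb.results.items():
            fb_rows += [[label]] * count
        # Entropy added by merging the two leaves into one
        delta = entropy(tb_rows + fb_rows) - (entropy(tb_rows) + entropy(fb_rows)) / 2
        if delta < mingain:
            # Merge: turn this node into a leaf holding the combined class counts
            node.tb, node.fb = None, None
            merged = {}
            for row in tb_rows + fb_rows:
                merged[row[0]] = merged.get(row[0], 0) + 1
            node.results = merged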

Missing Data

>>> # reminder: referer,location,FAQ,pages
>>> treepredict.mdclassify(['google',None,'yes',None],tree)
{'Premium': 2.25, 'Basic': 0.25}
>>> treepredict.mdclassify(['google','France',None,None],tree)
{'None': 0.125, 'Premium': 2.25, 'Basic': 0.125}
>>> treepredict.mdclassify(['google',None,None,'14'],tree)
{'None': 0.5, 'Basic': 0.5}

ex1: location & pages unknown
  FAQ = yes, so the 2:yes? node has 1 outcome: faq_weight = 1/1, basic = 1 * 1.0
  pages unknown at the 3:21? node: 3 outcomes if pages >= 21, else 1 outcome
  pages_true_weight = 3/4, pages_false_weight = 1/4
  premium = 3 * 3/4 = 2.25, basic = 1.0 * 1/4 = 0.25

ex2: FAQ & pages unknown
  FAQ unknown at the 2:yes? node: 1 outcome on each side
  faq_true_weight = 1/2, faq_false_weight = 1/2
  none = 1 * 0.5, basic = 1 * 0.5
  pages unknown at the 3:21? node: 3 outcomes if pages >= 21, else 2 outcomes (each with weight 0.5)
  pages_true_weight = 3/4, pages_false_weight = 1/4
  premium = 3 * 3/4 = 2.25, basic = 0.5 * 1/4 = 0.125, none = 0.5 * 1/4 = 0.125
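The fractional counts above come from following both branches when a value is missing and weighting each side by how much training data went that way. A sketch of that logic in terms of the Node class from the earlier build_tree sketch (illustrative names, not the book's exact mdclassify):

def md_classify(observation, node):
    if node.results is not None:            # leaf: return its class counts
        return dict(node.results)
    value = observation[node.col]
    if value is None:
        # Missing value: follow both branches and weight each result
        # by the fraction of items that ended up on that side
        tr = md_classify(observation, node.tb)
        fr = md_classify(observation, node.fb)
        tcount, fcount = sum(tr.values()), sum(fr.values())
        tw, fw = tcount / (tcount + fcount), fcount / (tcount + fcount)
        result = {}
        for k, v in tr.items():
            result[k] = result.get(k, 0) + v * tw
        for k, v in fr.items():
            result[k] = result.get(k, 0) + v * fw
        return result
    # Known value: follow the single matching branch
    if isinstance(value, (int, float)):
        branch = node.tb if value >= node.value else node.fb
    else:
        branch = node.tb if value == node.value else node.fb
    return md_classify(observation, branch)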

Numerical, Not Categorical Outcomes

height (in) = {56, 59, 59, 61, 62, 74, 76, 76, 78}

This list could be categorized as short (<65") and tall (>72"), or we could use the integers as values directly; in that case we would use variance as our measure of dispersion instead of Gini impurity or entropy (see the sketch below).
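A variance score function that can stand in for entropy or Gini impurity when the outcome column is numeric (a sketch matching the earlier helpers):

def variance(rows):
    if not rows:
        return 0.0
    values = [float(row[-1]) for row in rows]   # numeric outcome in the last column
    mean = sum(values) / len(values)
    # Low variance means the rows in this node have similar outcome values
    return sum((v - mean) ** 2 for v in values) / len(values)

# usable as a drop-in score function, e.g. build_tree(rows, scoref=variance)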

Zillow API

I could not get the Cambridge, MA example to work, so I did my street.
N.B. -- nonconsecutive house numbers; zillow will just make up results for non-existent houses.

>>> import zillow
>>> housedata=zillow.getpricelist()
510 Rhode Island
511 Rhode Island
516 Rhode Island
517 Rhode Island
519 Rhode Island
520 Rhode Island
523 Rhode Island
524 Rhode Island
527 Rhode Island
530 Rhode Island
532 Rhode Island
535 Rhode Island
536 Rhode Island
539 Rhode Island
>>> import treepredict
>>> housetree=treepredict.buildtree(housedata,scoref=treepredict.variance)
>>> treepredict.drawtree(housetree,'norfolk.jpeg')
>>> #zip,type,yearbuilt,bathrooms,bedrooms,rooms(always 1),est. value
>>> housedata
[(u'23508', u'SingleFamily', 1918, 2.0, 3, 1, u'281500'),
 (u'23508', u'SingleFamily', 1925, 2.0, 5, 1, u'408000'),
 (u'23508', u'SingleFamily', 1918, 1.0, 3, 1, u'367000'),
 (u'23508', u'SingleFamily', 1920, 1.0, 3, 1, u'317500'),
 (u'23508', u'SingleFamily', 1932, 2.0, 4, 1, u'329500'),
 (u'23508', u'SingleFamily', 1923, 1.0, 3, 1, u'239500'),
 (u'23508', u'SingleFamily', 1923, 2.5, 3, 1, u'262000'),
 (u'23508', u'SingleFamily', 1918, 1.5, 3, 1, u'272000'),
 (u'23508', u'SingleFamily', 1918, 2.0, 4, 1, u'279500'),
 (u'23508', u'SingleFamily', 1914, 2.0, 4, 1, u'306500'),
 (u'23508', u'SingleFamily', 1913, 1.0, 3, 1, u'266500'),
 (u'23508', u'Quadruplex', 1920, 4.0, 8, 1, u'541500'),
 (u'23508', u'SingleFamily', 1927, 2.0, 3, 1, u'321000'),
 (u'23508', u'SingleFamily', 1918, 1.0, 3, 1, u'229500')]
>>> treepredict.mdclassify(['23508','SingleFamily',1920,2.0,4,1,None],housetree)
{u'279500': 1}
>>> treepredict.mdclassify(['23508','SingleFamily',1920,1.5,4,1,None],housetree)
{u'317500': 1}

http://mln-web.cs.odu.edu/~mln/cs895-s09/chapter7/norfolk.jpeg
not enough training?

Hot or Not?

Get 500 random profiles; for each profile, get data (gender, age, region, rating).

>>> import hotornot
>>> l1=hotornot.getrandomratings(500)
>>> len(l1)
440
>>> pdata=hotornot.getpeopledata(l1)
>>> pdata[0]
(u'male', 18, 'Mid Atlantic', 9)
>>> pdata[1]
(u'male', 25, 'West', 9)
>>> pdata[2]
(u'male', 25, 'Midwest', 7)
>>> hottree=treepredict.buildtree(pdata,scoref=treepredict.variance)
>>> treepredict.drawtree(hottree,'hottree1.jpeg')
>>> treepredict.prune(hottree,0.5)
>>> treepredict.drawtree(hottree,'hottree2.jpeg')
>>> south=treepredict.mdclassify((None,None,'South'),hottree)
>>> ne=treepredict.mdclassify((None,None,'New England'),hottree)
>>> south[10]/sum(south.values())
2.1563156135103039e-05
>>> ne[10]/sum(ne.values())
0.022630716686444379
>>> south
{8: 3.7865227968936028, 9: 4.7331534961170032, 10: 0.00030714296600017508, 6: 0.044110401925615043, 7: 5.679784195340404}
>>> ne
{5: 0.12937459784334263, 6: 0.11040139147344154, 7: 4.3740057708741569, 8: 2.9160038472494381, 9: 3.6450048090617977, 10: 0.25874919568668525}
>>> south[9]/sum(south.values())
0.33229387987391373
>>> south[9]/sum(ne.values())
0.41397097107803504
>>> ne[9]/sum(ne.values())
0.31879933360059354

(N.B. the south[9]/sum(ne.values()) line divides a 'south' count by the 'ne' total; the intended comparison is the following line, ne[9]/sum(ne.values()).)

http://mln-web.cs.odu.edu/~mln/cs895-s09/chapter7/hottree1.jpeg
http://mln-web.cs.odu.edu/~mln/cs895-s09/chapter7/hottree2.jpeg

Decision Trees Summary

pros:
- easy to interpret
- both predictive & descriptive
- handles categorical + numerical data
- can have nodes with many outcomes (probabilistic outcomes)

cons:
- only <, > splits on numerical data
- doesn't handle many inputs/outcomes well
- can't uncover complex relationships between inputs