
1 DECISION TREES Asher Moody, CS 157B

2 Overview
• Definition
• Motivation
• Algorithms
• ID3
• Example
• Entropy
• Information Gain
• Applications
• Conclusion

3 Decision Tree
• Decision trees are a fundamental technique used in data mining.
• Decision trees are used for classification, clustering, feature selection, and prediction.

4 Motivation
• Decision trees help accurately classify data.
• Decision trees help us understand the predictive nature of the data by recognizing patterns.
• Decision trees depict the relationships between input data and target outputs.

5 Algorithms
• Decision tree algorithms are greedy: once a test has been selected to partition the data, other options are not explored.
• Popular algorithms:
  – Computer science: ID3, C4.5, and C5.0
  – Statistics: Classification and Regression Trees (CART)

6 ID3 Algorithm
• Given: examples (S); target attribute (C); attributes (R)
• Initialize Root
• Function ID3(S, C, R):
  – Create a Root node for the tree;
  – IF S is empty, return a single node with value Failure;
  – IF every example in S has the same value of the target attribute C, return a single node with that value;
  – IF R is empty, return a single node with the most frequent value of the target attribute C;
  – ELSE BEGIN… (next slide)

7 ID3 (cont.)
• BEGIN
  – Let D be the attribute with the largest Gain(D, S) among the attributes in R;
  – Let {d_j | j = 1, 2, …, n} be the values of attribute D;
  – Let {S_j | j = 1, 2, …, n} be the subsets of S consisting respectively of the records with value d_j for attribute D;
  – Return a tree with root labeled D and arcs d_1, d_2, …, d_n going respectively to the subtrees built below;
  – For each branch in the tree:
    – IF S_j is empty, add a leaf labeled with the most frequent value of C;
    – ELSE attach ID3(S_1, C, R – {D}), ID3(S_2, C, R – {D}), …, ID3(S_n, C, R – {D})
• END ID3
• Return Root
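A minimal Python sketch of the recursion above. It is an illustration, not code from the original slides: the data layout (each example as a dict of attribute values plus a target key), the nested-dict tree representation, and the helper names entropy and gain are assumptions made for the example.

import math
from collections import Counter

def entropy(examples, target):
    # Entropy(S) = -sum_i p_i * log2(p_i) over the values of the target attribute.
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(examples, attribute, target):
    # Reduction in entropy obtained by splitting the examples on one attribute.
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    # Returns the tree as nested dicts: {attribute: {value: subtree}}.
    if not examples:                           # S is empty -> Failure
        return "Failure"
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                  # every record has the same target value
        return labels[0]
    if not attributes:                         # R is empty -> most frequent target value
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the largest information gain.
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:   # one arc per observed value of the chosen attribute
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree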

8 Example 1

9 Example 2

10 Entropy
• Entropy gives us a measure of how uncertain we are about the data.
• Maximum: the measure is maximal when all outcomes are equally likely (uncertainty is highest when all possible events are equiprobable).
• Entropy(S) = −Σ_{i=1}^{c} p_i log2(p_i), where p_i is the proportion of instances in the dataset that take the i-th value of the target attribute.
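As a worked illustration (the counts are chosen only for the example): a set of 14 instances with 9 positive and 5 negative target values has Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940. A set whose instances all share one target value has entropy 0, and an even 50/50 split has entropy 1, the maximum for two classes.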

11 Information Gain
• Gain calculates the reduction in entropy (gain in information) that would result from splitting the data on a particular attribute A.
• Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v), where v is a value of A, |S_v| is the number of instances of S for which A takes the value v, and |S| is the total number of instances.
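Continuing the illustrative 14-instance set from the entropy example above: if some attribute A splits it into subsets with (6 positive, 2 negative) and (3 positive, 3 negative) instances, then Gain(S, A) = 0.940 − (8/14)(0.811) − (6/14)(1.000) ≈ 0.048, so A removes relatively little uncertainty. ID3 computes this quantity for every candidate attribute and splits on the one with the largest gain.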

12 Applications
• Business: to track purchasing patterns
• Medicine: to identify potential risks associated with diseases
• Banking: to identify potential credit risks
• Government: to determine features of potential terrorists
• Seismology: to predict earthquakes

13 Conclusion
• Search through the attributes to find the class proportions.
• Calculate the entropy for each possible value of a particular attribute.
• Calculate the gain for each attribute.
• Make the attribute with the highest gain the root node.
• Continue the process recursively until the decision tree is complete (a usage sketch follows below).
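As a usage illustration of the id3 sketch given after slide 7, the toy weather-style dataset below is invented for the example; any categorical dataset in the same layout would work the same way.

# Hypothetical toy dataset, invented only for illustration.
examples = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rain",     "windy": "false", "play": "yes"},
    {"outlook": "rain",     "windy": "true",  "play": "no"},
]

tree = id3(examples, target="play", attributes=["outlook", "windy"])
print(tree)
# e.g. {'outlook': {'sunny': 'no', 'overcast': 'yes',
#                   'rain': {'windy': {'false': 'yes', 'true': 'no'}}}}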

14 References
• Berry, M. W. (2006). Lecture Notes in Data Mining. World Scientific.
• http://www.decisiontrees.net
• http://en.wikipedia.org/wiki/Entropy
• http://en.wikipedia.org/wiki/Information_gain_in_decision_trees

