DECISION TREES
Decision trees
One possible representation for hypotheses
Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
Which is a better choice? Patrons (it yields purer subsets).
Using information theory
Implement Choose-Attribute in the DTL algorithm based on information content, measured by entropy.
Entropy is the measure of uncertainty of a random variable: more uncertainty leads to higher entropy; more knowledge leads to lower entropy.
Entropy
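The formula on this slide was an image that did not survive the text export; the standard definition it refers to, for a discrete random variable X with possible values x_1, ..., x_n, measured in bits:

```latex
H(X) = -\sum_{i=1}^{n} P(x_i)\,\log_2 P(x_i)
```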
Entropy Examples
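The worked examples were likewise lost in the export; two standard illustrations of the definition above (these particular numbers are my own, not recovered from the slide):

```latex
H(\text{fair coin}) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}
H(\text{biased coin, } P(\text{heads}) = 0.99) = -0.99\log_2 0.99 - 0.01\log_2 0.01 \approx 0.08 \text{ bits}
```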
Information Gain
Measures the reduction in entropy achieved by the split. Choose the split that achieves the largest reduction (maximizes information gain).
Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
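In symbols (the standard definition behind the slide, with S the set of examples at the node and A the attribute being tested):

```latex
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

The sum is the weighted average entropy of the child nodes, so many small, pure partitions push it toward zero, which is exactly the bias noted above.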
Information Gain Example
Consider the attributes Patrons and Type: Patrons has the highest information gain of all attributes, and so is chosen by the DTL algorithm as the root.
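A short sketch of that comparison on the 12-example restaurant data set (the per-value class counts below follow the standard Russell and Norvig example; treat them as an assumption if your copy of the data differs):

```python
from math import log2

def entropy(pos, neg):
    """Entropy (in bits) of a Boolean-labelled set with pos/neg examples."""
    if pos == 0 or neg == 0:
        return 0.0
    p = pos / (pos + neg)
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gain(splits):
    """Information gain of a split, given (pos, neg) counts for each child node."""
    total_pos = sum(p for p, _ in splits)
    total_neg = sum(n for _, n in splits)
    total = total_pos + total_neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(total_pos, total_neg) - remainder

# Patrons: None -> (0+, 2-), Some -> (4+, 0-), Full -> (2+, 4-)
print(gain([(0, 2), (4, 0), (2, 4)]))          # ~0.541 bits
# Type: French (1+, 1-), Italian (1+, 1-), Thai (2+, 2-), Burger (2+, 2-)
print(gain([(1, 1), (1, 1), (2, 2), (2, 2)]))  # 0.0 bits
```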
Learned Restaurant Tree
Decision tree learned from the 12 examples: substantially simpler than the full tree; Raining and Reservation were not necessary to classify all the data.
Stopping Criteria
Stop expanding a node when all the records belong to the same class.
Stop expanding a node when all the records have similar attribute values.
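A minimal sketch of those two checks, assuming each record is an (attribute-tuple, label) pair and taking "similar attribute values" to mean identical ones (names and data are illustrative):

```python
def should_stop(records):
    """Stop expanding a node if all labels agree or all attribute tuples agree."""
    labels = {label for _, label in records}
    attribute_values = {tuple(attrs) for attrs, _ in records}
    return len(labels) <= 1 or len(attribute_values) <= 1

print(should_stop([((1, "sunny"), "yes"), ((2, "rainy"), "yes")]))  # True: same class
print(should_stop([((1, "sunny"), "yes"), ((1, "sunny"), "no")]))   # True: identical attributes
print(should_stop([((1, "sunny"), "yes"), ((2, "rainy"), "no")]))   # False: keep splitting
```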
Overfitting
Overfitting results in decision trees that are more complex than necessary.
Training error does not provide a good estimate of how well the tree will perform on previously unseen records (we need a test set).
How to Address Overfitting 1 …
How to Address Overfitting 2 …
How to Address Overfitting…
Is the early stopping rule strictly better than pruning (i.e., generating the full tree and then cutting it)?
Remaining Challenges…
Continuous values: need to be split into discrete categories. Sort all values, then consider split points between two examples in sorted order that have different classifications (see the sketch below).
Missing values: affect how an example is classified, information gain calculations, and the test set error rate. Pretend that the example has all possible values for the missing attribute, weighted by each value's frequency among all the examples in the current node.
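A minimal sketch of the candidate-split idea for a continuous attribute, as referenced above (the function name and data are illustrative):

```python
def candidate_thresholds(values, labels):
    """Midpoints between consecutive sorted values whose labels differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds

# Example: a numeric attribute with yes/no classifications
print(candidate_thresholds([60, 70, 75, 85, 90], ["no", "no", "yes", "yes", "no"]))
# -> [72.5, 87.5]
```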
Summary
Advantages of decision trees:
Inexpensive to construct
Extremely fast at classifying unknown records
Easy to interpret for small-sized trees
Accuracy is comparable to other classification techniques for many simple data sets
Learning performance = prediction accuracy measured on a test set
K-NEAREST NEIGHBORS
K-Nearest Neighbors
What value do we assign to the green sample?
K-Nearest Neighbors
k = 1 vs. k = 3
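A minimal sketch of k-NN classification with Euclidean distance (the helper name and tiny data set are illustrative assumptions, not from the slides):

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_classify(query, points, labels, k=3):
    """Label the query point by majority vote among its k nearest neighbors."""
    neighbors = sorted(zip(points, labels), key=lambda pl: dist(query, pl[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

points = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7)]
labels = ["+", "+", "+", "o", "o", "o"]
print(knn_classify((2, 2), points, labels, k=1))  # "+"
print(knn_classify((5, 5), points, labels, k=3))  # "o"
```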
Decision Regions for 1-NN
K-Nearest Neighbors
Weighting the Distance to Remove Irrelevant Features
(figures: the same +/o data set with a query point "?", shown before and after weighting the distance; the plots themselves did not survive the text export)
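A minimal sketch of the weighting idea, assuming per-feature weights are supplied; how the weights are chosen is not recoverable from the slides, so the example simply zeroes out an irrelevant feature:

```python
def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)) ** 0.5

a, b = (1.0, 100.0), (1.2, -50.0)        # feature 2 is irrelevant noise
print(weighted_distance(a, b, (1, 1)))   # ~150: dominated by the irrelevant feature
print(weighted_distance(a, b, (1, 0)))   # ~0.2: only the relevant feature counts
```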
Nearest Neighbors Search
(figure: a query point q and a data point p)
Quadtree
Quadtree Construction
Input: point set P
while some cell C contains more than 1 point do
    split cell C
end
(figure: example points a through l in the plane and the resulting subdivision; the coordinate labels did not survive the text export)
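A minimal sketch of that construction loop, assuming 2-D points that all lie inside a half-open square root cell (class and field names are illustrative):

```python
class QuadtreeNode:
    """Splits its square cell into four children until it holds at most one point."""
    def __init__(self, points, x, y, size):
        self.x, self.y, self.size = x, y, size    # lower-left corner and side length
        self.points = points
        self.children = []
        if len(points) > 1:                        # "some cell contains more than 1 point"
            half = size / 2
            for dx in (0, half):
                for dy in (0, half):
                    inside = [p for p in points
                              if x + dx <= p[0] < x + dx + half
                              and y + dy <= p[1] < y + dy + half]
                    self.children.append(QuadtreeNode(inside, x + dx, y + dy, half))

root = QuadtreeNode([(30, 310), (55, 210), (70, 90), (350, 380)], 0, 0, 400)
print(len(root.children))  # 4 quadrants at the root
```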
Nearest Neighbor Search
Quadtree - Query
(figures: the query point descends the tree by comparing against each node's split coordinates X1, Y1 and following the quadrant whose condition, e.g. P ≥ X1 and P < Y1, it satisfies)
In many cases this works.
Quadtree – Pitfall 1
In some cases it doesn't: there could be points in adjacent buckets that are closer.
Quadtree – Pitfall 2
Could result in query time exponential in the number of dimensions.
Quadtree
Simple data structure. Versatile, easy to implement. Often space and time inefficient.
kd-trees (k-dimensional trees)
Main ideas:
one-dimensional splits
instead of splitting in the middle, choose the split “carefully” (many variations)
nearest neighbor queries: same as for quadtrees
2-dimensional kd-trees
Algorithm:
Choose the x or y coordinate (alternate between them).
Choose the median of that coordinate; this defines a horizontal or vertical splitting line.
Recurse on both sides until there is only one point left, which is stored as a leaf.
We get a binary tree:
Size O(n). Construction time O(n log n). Depth O(log n).
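A minimal construction sketch following those steps (alternate axis, split at the median; the dictionary node layout is an illustrative choice):

```python
def build_kdtree(points, depth=0):
    """Build a 2-d kd-tree: alternate x/y, split at the median, leaves hold one point."""
    if len(points) <= 1:
        return {"leaf": True, "point": points[0] if points else None}
    axis = depth % 2                        # 0: split on x, 1: split on y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "leaf": False,
        "axis": axis,
        "value": points[mid][axis],         # the horizontal or vertical splitting line
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid:], depth + 1),
    }

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["axis"], tree["value"])  # the root splits on x at the median x-coordinate
```

(Sorting at every level makes this sketch O(n log^2 n); the O(n log n) bound on the slide needs presorting or linear-time median selection.)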
Nearest Neighbor with KD Trees
We traverse the tree looking for the nearest neighbor of the query point.
Examine nearby points first: explore the branch of the tree that is closest to the query point first.
When we reach a leaf node, compute the distance to each point in the node.
Then we can backtrack and try the other branch at each node visited.
Each time a new closest point is found, we can update the distance bounds.
Using the distance bounds and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
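A minimal search sketch over the tree from the construction example above, with the nearby-branch-first, backtracking, and pruning steps these slides describe (again an illustrative layout, not the lecture's code):

```python
from math import dist  # Euclidean distance, Python 3.8+

def nn_search(node, query, best=None):
    """Return the stored point closest to `query`, pruning branches that cannot win."""
    if node["leaf"]:
        p = node["point"]
        if p is not None and (best is None or dist(query, p) < dist(query, best)):
            best = p
        return best
    axis, value = node["axis"], node["value"]
    near, far = ((node["left"], node["right"]) if query[axis] < value
                 else (node["right"], node["left"]))
    best = nn_search(near, query, best)            # explore the closest branch first
    if best is None or abs(query[axis] - value) < dist(query, best):
        best = nn_search(far, query, best)         # backtrack only if the split line is near
    return best

# `build_kdtree` and `tree` come from the construction sketch above
print(nn_search(tree, (9, 2)))  # (8, 1)
```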
Summary of K-Nearest Neighbor