Lecture 12, CS567: Decisions, Decisions
Concepts
Naïve Bayesian Classification
Decision Trees
–General Algorithm
–Refinements: Accuracy, Scalability
–Strengths and Limitations
Lecture 12, CS567: Concepts
Problem
–Will there be a pop quiz today?
Data
–(Duration of) delay in entering class
–Instructor's bag (bulging/not bulging)
–Instructor (has/does not have) a wicked/impish smile on face
–There (was/wasn't) a quiz last class
Naïve Bayesian
–Calculate P(Pop Quiz) from the data with no regard to the order in which attributes are considered
Decision Tree
–Evaluate the data in a particular branching sequence: if ... then ... elsif ...
Lecture 12, CS567: Naïve Bayesian
Goal: To estimate P(M|D), aka the posterior
"Naïve" assumption
–All data have "free will": all attributes have independent probability distributions (mutual information between every pair of attributes = 0)
Prior = P(M). Binomial distribution with parameter p = P(Pop Quiz). Thus p = ?
P(Pop Quiz|D) = P(D|Pop Quiz) P(Pop Quiz) / P(D), where, for the i attributes constituting the data:
–P(D|Pop Quiz) = Π_i P(D_i|Pop Quiz)
–P(D) = K (uniform assumption) OR P(D) = Π_i P(D_i)
Thus, either calculate the explicit P(Pop Quiz|D), OR do a maximum-likelihood comparison of P(Pop Quiz|D) and P(No Pop Quiz|D)
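A minimal Python sketch of this calculation, assuming a tiny hand-made dataset built from the attributes on the previous slide; the records, attribute names, and values below are illustrative only, not lecture data.

```python
# Naive Bayesian posterior sketch for the pop-quiz example (illustrative data only).
from collections import Counter, defaultdict

# Each record: ({attribute: value}, class label)
data = [
    ({"delay": "long",  "bag_bulging": "yes", "wicked_smile": "yes", "quiz_last_class": "no"},  "quiz"),
    ({"delay": "short", "bag_bulging": "no",  "wicked_smile": "no",  "quiz_last_class": "yes"}, "no quiz"),
    ({"delay": "long",  "bag_bulging": "yes", "wicked_smile": "no",  "quiz_last_class": "no"},  "quiz"),
    ({"delay": "short", "bag_bulging": "no",  "wicked_smile": "yes", "quiz_last_class": "yes"}, "no quiz"),
    ({"delay": "long",  "bag_bulging": "no",  "wicked_smile": "yes", "quiz_last_class": "no"},  "quiz"),
    ({"delay": "short", "bag_bulging": "yes", "wicked_smile": "no",  "quiz_last_class": "yes"}, "no quiz"),
]

class_counts = Counter(label for _, label in data)              # for the prior P(C)
value_counts = defaultdict(Counter)                             # (attribute, class) -> value counts
for attrs, label in data:
    for name, value in attrs.items():
        value_counts[(name, label)][value] += 1

def posterior_scores(attrs):
    """P(C|D) proportional to P(C) * prod_i P(D_i|C), per the independence assumption."""
    scores = {}
    for label, n in class_counts.items():
        p = n / len(data)                                       # prior P(C)
        for name, value in attrs.items():
            p *= value_counts[(name, label)][value] / n         # P(D_i|C); real code would smooth zeros
        scores[label] = p
    return scores

# Compare P(Pop Quiz|D) against P(No Pop Quiz|D) for today's observations
print(posterior_scores({"delay": "long", "bag_bulging": "yes",
                        "wicked_smile": "yes", "quiz_last_class": "no"}))
```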
Lecture 12, CS567: Decision Trees
Directed graph for reaching a decision
Decision =
–Verdict
–More generally, classification into one of several classes
–If (..) OR (..) .. then Pop Quiz; if (..) OR (..) .. then no Pop Quiz
Given i attributes/data about an instance, navigate the graph based on the values of {i} (a small navigation sketch follows below)
Based on minimizing uncertainty
–Greedy approach: the largest drops in uncertainty occur first
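A tiny sketch of that navigation step, assuming a dictionary node layout (an "attribute" key plus a "children" map, with leaves stored as plain class labels); the same layout is reused in the training sketch further below.

```python
# Sketch of navigating a decision tree to classify one instance; the node layout
# ({"attribute": ..., "children": {...}}, leaves as plain labels) is assumed here.
def classify(tree, instance):
    """Follow branches matching the instance's attribute values until a leaf label."""
    while isinstance(tree, dict):
        tree = tree["children"][instance[tree["attribute"]]]
    return tree

# Example: the nested if/elsif on the slide, expressed as a two-level tree
example_tree = {
    "attribute": "bag_bulging",
    "children": {
        "yes": {"attribute": "wicked_smile",
                "children": {"yes": "Pop Quiz", "no": "no Pop Quiz"}},
        "no": "no Pop Quiz",
    },
}
print(classify(example_tree, {"bag_bulging": "yes", "wicked_smile": "yes"}))  # -> Pop Quiz
```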
Lecture 12, CS567: Decision Trees - Training
Given: a set of labeled data (D_i = {a_i}, C_k)
Goal: To find the best classification tree
Maximum uncertainty = log2(|classes|) = 1 bit for a 2-class problem
Entropy of the given data: E(D) = -Σ_k P(C_k) log2 P(C_k), summed over the n classes
Conditional/residual entropy after splitting on attribute a_i: E(D|a_i) = Σ_l (|D_l| / |D|) E(D_l), summed over the l subsets D_l
Reduction in uncertainty = gain of information: G(a_i) = E(D) - E(D|a_i) (see the sketch below)
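A short sketch of these entropy and gain formulas, assuming the same ({attribute: value}, label) record format as the earlier naive Bayes sketch.

```python
# Entropy and information gain as defined on this slide.
import math
from collections import Counter, defaultdict

def entropy(records):
    """E(D) = -sum_k P(C_k) log2 P(C_k), over the class labels present in records."""
    counts = Counter(label for _, label in records)
    return -sum((c / len(records)) * math.log2(c / len(records)) for c in counts.values())

def split(records, attr):
    """Partition records into the subsets D_l, one per observed value of attr."""
    subsets = defaultdict(list)
    for rec in records:
        subsets[rec[0][attr]].append(rec)
    return subsets

def information_gain(records, attr):
    """G(a) = E(D) - sum_l (|D_l| / |D|) * E(D_l)."""
    residual = sum(len(s) / len(records) * entropy(s) for s in split(records, attr).values())
    return entropy(records) - residual
```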
Lecture 12, CS567: Decision Trees - Training
Find a_i such that G(a_i) > G(a_j) for all j ≠ i
Root = a_i, with children = the subsets of data falling into each range of a_i
Iterate through the remaining list of attributes till all a_i have been considered (a sketch of this loop follows below)
Label each subset with the majority class label
Optional, highly recommended, steps:
–Prepruning and postpruning: avoid over-fitting
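A sketch of this greedy construction loop, assuming the information_gain() and split() helpers from the previous sketch and the same record format; pruning is deliberately omitted.

```python
# Greedy tree construction: pick the attribute with the largest gain, split,
# recurse on each subset, and label leaves with the majority class.
from collections import Counter

def build_tree(records, attributes):
    labels = [label for _, label in records]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]        # leaf: majority class label
    best = max(attributes, key=lambda a: information_gain(records, a))
    remaining = [a for a in attributes if a != best]
    return {"attribute": best,
            "children": {value: build_tree(subset, remaining)
                         for value, subset in split(records, best).items()}}

# Example (with the pop-quiz data and classify() from the earlier sketches):
# tree = build_tree(data, ["delay", "bag_bulging", "wicked_smile", "quiz_last_class"])
```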
Lecture 12, CS567: Decision Trees - Training
Caveat with the previous approach:
–Subsetting with a single data point or few data points is highly favored
–In other words, attributes with higher resolution (a wider range of possible values) are favored
Gain ratio
–Alternative to information gain as the criterion for choice of attributes
–Compensates for the bias towards high scores for a_i with high resolution (a higher number of states for that attribute)
–Gain ratio = G(a_i) / E(a_i), where E(a_i) is the entropy of a_i's own value distribution
Gini
–Recommended for attribute selection on large training sets, for scalability
–Gini Index = 1 - Σ_k P_k² (sketches of both criteria follow below)
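Sketches of the gain ratio and Gini index under the same assumptions as the earlier helpers (information_gain() and the ({attribute: value}, label) record format).

```python
# Gain ratio and Gini index, matching the formulas on this slide.
import math
from collections import Counter

def split_info(records, attr):
    """E(a_i): entropy of the attribute's own value distribution over the records."""
    counts = Counter(rec[0][attr] for rec in records)
    return -sum((c / len(records)) * math.log2(c / len(records)) for c in counts.values())

def gain_ratio(records, attr):
    """Gain ratio = G(a_i) / E(a_i); penalizes attributes with many distinct values."""
    denominator = split_info(records, attr)
    return information_gain(records, attr) / denominator if denominator > 0 else 0.0

def gini_index(records):
    """Gini index = 1 - sum_k P_k^2, over the class labels in records."""
    counts = Counter(label for _, label in records)
    return 1.0 - sum((c / len(records)) ** 2 for c in counts.values())
```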
Lecture 12, CS567: Decision Trees - Limitations
Rectangular (hypercuboidal) partitioning of the data space is assumed
Not the best solution where the separating hyperplane is not orthogonal to the data dimensions
Greedy strategy can easily lead to overfitting