Lecture 12, CS567: Decisions, Decisions (presentation transcript)

Slide 1: Decisions, Decisions
Concepts
Naïve Bayesian Classification
Decision Trees
–General Algorithm
–Refinements
  Accuracy
  Scalability
–Strengths and Limitations

Slide 2: Concepts
Problem
–Will there be a pop quiz today?
Data
–Duration of delay in entering class
–Instructor's bag (bulging/not bulging)
–Instructor (has/does not have) a wicked/impish smile on face
–There (was/wasn't) a quiz last class
Naïve Bayesian
–Calculate P(Pop Quiz) from the data with no regard to order of calculation
Decision Tree
–Evaluate the data in a particular branching sequence: if ... then ... elsif ...

Slide 3: Naïve Bayesian
Goal: to estimate P(M|D), a.k.a. the posterior
"Naïve" assumption
–All data have "free will": all attributes have independent probability distributions (mutual information between every pair of attributes = 0)
Prior = P(M). Binomial distribution with parameter p = P(Pop Quiz). Thus p = ?
P(Pop Quiz|D) = P(D|Pop Quiz) P(Pop Quiz) / P(D), where, for the i attributes constituting the data:
–P(D|Pop Quiz) = ∏_i P(D_i|Pop Quiz)
–P(D) = K (uniform assumption) OR P(D) = ∏_i P(D_i)
Thus, either calculate the explicit P(Pop Quiz|D), OR do a maximum-likelihood comparison of P(Pop Quiz|D) and P(No Pop Quiz|D)
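A minimal sketch of this calculation in Python. The attribute names and the toy conditional probabilities are invented for illustration, not estimates from real class data:

```python
def posterior(prior, cond_probs, observed):
    """Unnormalized P(class|D): prior * product over attributes of P(D_i|class)."""
    p = prior
    for attr, value in observed.items():
        p *= cond_probs[attr][value]
    return p

# Hypothetical probability estimates (assumptions, for illustration only):
p_quiz = 0.3                      # prior P(Pop Quiz)
p_no_quiz = 1 - p_quiz
cond_quiz = {"bag": {"bulging": 0.8, "flat": 0.2},
             "smile": {"wicked": 0.7, "plain": 0.3}}
cond_no_quiz = {"bag": {"bulging": 0.3, "flat": 0.7},
                "smile": {"wicked": 0.2, "plain": 0.8}}

d = {"bag": "bulging", "smile": "wicked"}
quiz = posterior(p_quiz, cond_quiz, d)
no_quiz = posterior(p_no_quiz, cond_no_quiz, d)

# Maximum-likelihood comparison: P(D) cancels, so it never has to be computed.
print("Pop quiz!" if quiz > no_quiz else "No quiz.")
# Explicit posterior, normalizing by P(D) summed over the two classes:
print("P(Pop Quiz|D) =", quiz / (quiz + no_quiz))
```

Note how the comparison route sidesteps P(D) entirely, which is why the slide offers it as an alternative to computing the explicit posterior.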

Slide 4: Decision Trees
Directed graph for reaching a decision
Decision =
–Verdict
–More generally, classification into one of several classes
–If (..) OR (..) .. then Pop Quiz; if (..) OR (..) .. then no Pop Quiz
Given i attributes/data about an instance, navigate the graph based on the values of {i}
Based on minimizing uncertainty
–Greedy approach: the largest drops in uncertainty occur first
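A sketch of navigating such a tree for one instance; the nested-dict encoding and the particular pop-quiz tree are assumptions for illustration, not the lecture's own code:

```python
# Internal nodes test one attribute; leaves hold the class label (a string).
tree = {"attr": "bag",
        "branches": {"bulging": {"attr": "smile",
                                 "branches": {"wicked": "Pop Quiz",
                                              "plain": "No Pop Quiz"}},
                     "flat": "No Pop Quiz"}}

def classify(node, instance):
    # Follow the branch matching the instance's value until a leaf is reached.
    while isinstance(node, dict):
        node = node["branches"][instance[node["attr"]]]
    return node

print(classify(tree, {"bag": "bulging", "smile": "wicked"}))  # Pop Quiz
```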

Slide 5: Decision Trees - Training
Given: a set of labeled data (D_i = {a_i}, C_k)
Goal: to find the best classification tree
Maximum uncertainty = log2(|classes|) = 1 bit for a 2-class problem
Entropy of the given data: E(D) = -Σ_k P(C_k) log2 P(C_k), over the n classes
Conditional/residual entropy: E(D|a_i) = Σ_l (|D_l| / |D|) · E(D_l), over the l subsets induced by the values of a_i
Reduction in uncertainty = gain of information
Gain: G(a_i) = E(D) - E(D|a_i)
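A minimal sketch of these two formulas, assuming data rows are dicts mapping attribute names to values and labels is a parallel list of class labels (function names are mine, for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """E(D) = -sum_k P(C_k) * log2 P(C_k)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(data, labels, attr):
    """G(a_i) = E(D) - sum_l (|D_l|/|D|) * E(D_l), over the values of attr."""
    total = len(labels)
    residual = 0.0
    for value in set(row[attr] for row in data):
        subset = [lab for row, lab in zip(data, labels) if row[attr] == value]
        residual += (len(subset) / total) * entropy(subset)
    return entropy(labels) - residual

# A balanced 2-class set hits the 1-bit maximum noted above:
print(entropy(["quiz", "quiz", "no quiz", "no quiz"]))  # 1.0
```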

Slide 6: Decision Trees - Training
Find a_i such that G(a_i) > G(a_j) for all j ≠ i
Root = a_i, with children = the subsets of data falling into each range of a_i
Iterate through the remaining list of attributes until all a_i have been considered
Label each subset with the majority class label
Optional, highly recommended, steps:
–Prepruning and postpruning: avoid over-fitting
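A greedy training sketch in the spirit of this slide (ID3-style), reusing entropy() and information_gain() from the previous sketch; the stopping rules are standard assumptions, and the recommended pruning steps are omitted:

```python
from collections import Counter

def build_tree(data, labels, attrs):
    # Stop when the subset is pure or no attributes remain;
    # label the leaf with the majority class, as on the slide.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the attribute with the largest information gain.
    best = max(attrs, key=lambda a: information_gain(data, labels, a))
    node = {"attr": best, "branches": {}}
    for value in set(row[best] for row in data):
        rows = [(r, l) for r, l in zip(data, labels) if r[best] == value]
        sub_data, sub_labels = [r for r, _ in rows], [l for _, l in rows]
        node["branches"][value] = build_tree(sub_data, sub_labels,
                                             [a for a in attrs if a != best])
    return node
```

The resulting nested dict has the same shape as the navigation sketch on slide 4, so classify() can be applied to it directly.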

Slide 7: Decision Trees - Training
Caveat with the previous approach:
–Subsetting with a single or few data point(s) is highly favored
–In other words, attributes with higher resolution (a wider range of possible values) are favored
Gain ratio
–Alternative to information gain as the criterion for choosing attributes
–Compensates for the bias towards high scores for a_i with high resolution (a higher number of states for that attribute)
–Gain ratio = G(a_i) / E(a_i)
Gini
–Recommended for attribute selection on large training sets, for scalability
–Gini index = 1 - Σ_k P_k²
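Minimal sketches of both criteria, again reusing entropy() and information_gain() from the slide 5 sketch; the function names and data layout are my assumptions:

```python
from collections import Counter

def gain_ratio(data, labels, attr):
    """G(a_i) / E(a_i): gain normalized by the entropy of the attribute's own
    value distribution, penalizing attributes with many distinct values."""
    split_entropy = entropy([row[attr] for row in data])
    if split_entropy == 0:
        return 0.0  # attribute has a single value; it cannot split the data
    return information_gain(data, labels, attr) / split_entropy

def gini_index(labels):
    """Gini = 1 - sum_k P_k^2; cheaper than entropy (no logarithms),
    hence the scalability recommendation above."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())
```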

Slide 8: Decision Trees - Limitations
Rectangular (hypercuboidal) partitioning of the data space is assumed
Not the best solution where the separating hyperline/hyperplane is not orthogonal to the data dimensions
The greedy strategy can easily lead to overfitting

