1
CIS 335 Data Mining: Classification, Part I
2
What is a model? It can be a set of statistics, a set of rules, a tree, a neural net, a linear function, etc. How do we build the model? What assumptions does it make?
3
What are some applications of classification?
4
Labels
The goal is to predict the class of an unlabeled instance. What are examples of classes? How many labels can each instance have? Is it feasible to get labeled instances? The class label is discrete and unordered (why?). Numeric prediction is done by regression.
5
Sets
training set
validation set
test set
cross-validation (n-fold)
6
Definitions
instances = tuples = records = samples = rows, ...
attributes = features = variables, ...
7
Two-step process: learning (induction), then predicting. How does this relate to your own decision-making process?
8
Data mining
supervised - there is a class, and labeled instances are available
  classification
  anomaly detection
unsupervised - no class
  clustering
  association analysis
9
Mapping function
y = f(X)
X is the instance; f is the model, learned from the training data; y is the class. Sometimes there are several "discriminators": f1, f2, f3, one for each class.
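A minimal sketch of the per-class discriminator idea: the three scoring functions below are hypothetical stand-ins for learned models, and the overall classifier picks the class whose discriminator scores highest.

```python
# One discriminator per class; the predicted label is the class whose
# discriminator scores highest. The three functions are made up.

def f1(x): return x[0] + x[1]          # score for class 1 (illustrative)
def f2(x): return x[0] - x[1]          # score for class 2 (illustrative)
def f3(x): return -x[0]                # score for class 3 (illustrative)

def f(x):
    """The overall model y = f(X): return the class with the highest score."""
    scores = {1: f1(x), 2: f2(x), 3: f3(x)}
    return max(scores, key=scores.get)

print(f((2.0, 0.5)))   # -> 1 for this instance
```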
10
Overfitting
The model describes the training data too accurately and doesn't do very well on new instances. Imagine a classifier that predicts student success based on g-number; one that generalizes can be better. Sometimes post-processing can improve generalization. How do you overfit?
11
Accuracy
number of correct predictions / total predictions
For the confusion matrix below, accuracy is 98/115 = .85:

            predicted
             0    1
actual  0   46   10
        1    7   52

What about the one below?

            predicted
             a    b    c
actual  a   21    3    1
        b    5   45    2
        c    7    4   27
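Accuracy can be read off any confusion matrix as the diagonal (correct predictions) divided by the grand total. A minimal sketch in Python, using the two matrices as reconstructed above:

```python
# Accuracy from a confusion matrix: correct predictions are on the
# diagonal, so accuracy = trace / total.

def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

binary = [[46, 10],
          [7, 52]]
three_class = [[21, 3, 1],
               [5, 45, 2],
               [7, 4, 27]]

print(accuracy(binary))       # 98/115 ≈ 0.852
print(accuracy(three_class))  # 93/115 ≈ 0.809
```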
12
Decision trees
The model is the tree itself. Each branch is a test, and the leaves are labels. To classify an instance, trace the path through the tree. Where have you seen decision trees?
[example tree: old? yes -> male? yes -> uncle, no -> aunt; old? no -> cousin]
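To make "trace the path" concrete, here is a minimal sketch of the relative-labeling tree above (the tree structure is an assumption reconstructed from the slide figure):

```python
# A tiny hand-built decision tree as nested structures: internal nodes
# test an attribute, leaves are class labels.
tree = ("old", {                     # test: is the relative old?
    "y": ("male", {                  # if old, test: male?
        "y": "uncle",
        "n": "aunt",
    }),
    "n": "cousin",                   # if not old, label is cousin
})

def classify(node, instance):
    """Trace the path from the root to a leaf for one instance."""
    while not isinstance(node, str):          # a str means we reached a leaf
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

print(classify(tree, {"old": "y", "male": "n"}))  # -> aunt
```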
13
What are they good for?
classifying, of course
giving a description of the data (exploratory)
the tree form is intuitive
simple and fast
14
Induction
ID3 -> C4.5 and CART were early classifiers (J48 is C4.5). The input is instances with attributes and labels; the output is a tree.
15
Goal: pure leaves
Use splits to isolate classes; each split makes the leaves more pure.
[tree figure: tests on color = yellow? and size = small?, with leaves tang, lemon, and orange]

color   size   fruit
orange  small  tang
yellow  small  lemon
yellow  small  lemon
orange  small  tang
orange  small  orange
        large  orange
        large  orange
16
Measuring Purity
[figure: a split on attr x; the y branch holds +8 -2, the n branch holds +6 -14]
Gini is a common metric:
gini(left) = 1 - (.8^2 + .2^2) = .32
gini(right) = 1 - (.7^2 + .3^2) = .42
For the entire split, use the weighted sum:
gini(split) = .32 * .33 + .42 * .67 = .387
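A minimal sketch of the computation in Python, using the leaf counts from the figure above (the weights .33 and .67 are 10/30 and 20/30):

```python
# Gini impurity of a leaf, and the weighted gini of a split.

def gini(counts):
    """1 - sum of squared class proportions; 0 means a pure leaf."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

left, right = [8, 2], [6, 14]          # (+, -) counts in each branch
n = sum(left) + sum(right)

g_split = (gini(left) * sum(left) / n +
           gini(right) * sum(right) / n)

print(round(gini(left), 2))    # 0.32
print(round(gini(right), 2))   # 0.42
print(round(g_split, 3))       # 0.387
```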
17
Expanding the tree
Nodes that are not very pure can be further split on another attribute. The process can continue until all nodes are pure or a threshold is met.
[tree figure: attr x splits into y: +8 -2 and n: +6 -14; the impure n node is split again on attr y (branches 0 and 1) into +5 -2 and +1 -12]
18
Numeric attributes and other splits
Choose a good number, one that produces the lowest gini; evaluate all possible splits. Multiway splits are also possible, e.g. marital status: S, D, M.
[figure: attr z split at a threshold, <10 vs >=10]
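A minimal sketch of the threshold search, assuming made-up numeric values and labels; candidate thresholds are taken midway between consecutive sorted values:

```python
# Find the numeric split threshold with the lowest weighted gini.
# The (value, label) data below is made up for illustration.

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_gini(groups):
    n = sum(sum(g) for g in groups)
    return sum(gini(g) * sum(g) / n for g in groups if sum(g) > 0)

def counts(pairs):
    pos = sum(1 for _, label in pairs if label == "+")
    return [pos, len(pairs) - pos]

data = [(3, "+"), (5, "+"), (8, "-"), (12, "-"), (15, "-"), (7, "+")]
values = sorted({v for v, _ in data})

best = None
for lo, hi in zip(values, values[1:]):
    t = (lo + hi) / 2                      # candidate threshold
    left = [p for p in data if p[0] < t]
    right = [p for p in data if p[0] >= t]
    score = weighted_gini([counts(left), counts(right)])
    if best is None or score < best[1]:
        best = (t, score)

print(best)   # (7.5, 0.0): splitting at 7.5 separates + from - perfectly
```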
19
Greedy Algorithms
example: TSP (the traveling salesperson problem)
20
Greedy algorithm
look through each attribute
calculate the result of a split using gini or another measure
select the attribute/split with the best result (see the sketch below)
the split can be discrete, a continuous value, or binary with splitting sets (careful about ordinal attributes)
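A minimal sketch of the greedy selection step for discrete attributes; the toy rows are adapted from the fruit table on the "pure leaves" slide:

```python
# Greedy attribute selection: try every attribute, score the resulting
# split by weighted gini, keep the best. Discrete attributes only.
from collections import Counter, defaultdict

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def split_score(rows, attribute):
    """Weighted gini after splitting rows on one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row["fruit"])
    n = len(rows)
    return sum(gini(list(Counter(labels).values())) * len(labels) / n
               for labels in groups.values())

rows = [
    {"color": "orange", "size": "small", "fruit": "tang"},
    {"color": "yellow", "size": "small", "fruit": "lemon"},
    {"color": "yellow", "size": "small", "fruit": "lemon"},
    {"color": "orange", "size": "small", "fruit": "tang"},
    {"color": "orange", "size": "large", "fruit": "orange"},
]

best = min(["color", "size"], key=lambda a: split_score(rows, a))
print(best)   # -> 'color': its split leaves the purest groups
```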
21
Selection measures
based on purity:
information gain
gain ratio
gini
22
Pruning
Postprocessing: subtrees can be removed if the purity is "good enough". Sometimes subtrees can be repeated or replicated.
23
Bayes classifier
based on Bayes' theorem
good accuracy and speed
assumes independence of the attributes given the class (the "naive" assumption)
24
Probability

teen  m/f  buy
y     f    y
n     m    y
n     f    y
n     f    y
y     m    y
n     f    y
n     m    n
y     m    n
y     m    n
n     f    n
y     f    n
y     m    n
y     f    n
y     m    n

Counts: how many total records _______ how many teens _______ how many female _______ how many buy _______
What is the probability of teens -> p(teen=y) ______ of males -> p(gender=male) _____ of buying -> p(buy=y) _______
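A minimal sketch that does the counting mechanically; the 14 records are the table above:

```python
# Marginal probabilities straight from counts over the purchase table.
records = [
    ("y","f","y"), ("n","m","y"), ("n","f","y"), ("n","f","y"),
    ("y","m","y"), ("n","f","y"), ("n","m","n"), ("y","m","n"),
    ("y","m","n"), ("n","f","n"), ("y","f","n"), ("y","m","n"),
    ("y","f","n"), ("y","m","n"),
]

n = len(records)
teens = sum(1 for t, g, b in records if t == "y")
males = sum(1 for t, g, b in records if g == "m")
buys  = sum(1 for t, g, b in records if b == "y")

print(n, teens, males, buys)              # 14 8 7 6
print(teens / n, males / n, buys / n)     # p(teen), p(male), p(buy)
```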
25
Conditional Probability
(same purchase table as above)
Of those that bought, how many are teens _____ how many male _____
p(teen=y | buy=y): the probability of being a teen given that you bought.
What is the conditional probability p(teen | buy) ______ p(female | not buy) ______
26
Conditional Probability, cont.
(same purchase table as above)
Formula: p(x | y) = p(x, y) / p(y)
Let x be the event that the customer is a teen and y be the event that they buy.
What is p(x,y)? _______ what is p(x)? _______ what is p(x|y)? ________
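A minimal sketch applying the formula to the same counts (the records list repeats the purchase table so the snippet runs on its own):

```python
# p(x,y), p(x), and p(x|y) for x = teen, y = buy.
records = [
    ("y","f","y"), ("n","m","y"), ("n","f","y"), ("n","f","y"),
    ("y","m","y"), ("n","f","y"), ("n","m","n"), ("y","m","n"),
    ("y","m","n"), ("n","f","n"), ("y","f","n"), ("y","m","n"),
    ("y","f","n"), ("y","m","n"),
]
n = len(records)

joint = sum(1 for t, g, b in records if t == "y" and b == "y") / n  # p(teen, buy)
p_x   = sum(1 for t, g, b in records if t == "y") / n               # p(teen)
p_y   = sum(1 for t, g, b in records if b == "y") / n               # p(buy)

print(joint)          # p(x,y) = 2/14
print(p_x)            # p(x)   = 8/14
print(joint / p_y)    # p(x|y) = (2/14) / (6/14) = 2/6
```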
27
Bayes formula derivation
p(x,y) is the same as p(y,x).
According to the definition of conditional probability:
p(x|y) = p(x,y) / p(y) and p(y|x) = p(y,x) / p(x)
and so
p(x|y) p(y) = p(x,y) = p(y,x) = p(y|x) p(x)
and thus, rearranging, we have
p(x|y) = p(y|x) p(x) / p(y)
28
Bayes theorem
Variables: X is an instance, C is the class.
We want to know p(C0 | X), the probability that the class is 0 given the evidence X.
p(C0 | X) is the posterior probability
p(C0) is the prior
p(X) is the evidence
p(X | C0) is the likelihood
so p(C0 | X) = p(X | C0) p(C0) / p(X)
29
Calculating posterior directly
(same purchase table as above)
p(buy | teen) = 2/8, p(buy | male) = 2/7; this can be done easily for one attribute.
For p(buy | not teen, male) there are only two instances.
Think about it for 100 attributes: the data is just not available.
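A quick numeric check of Bayes' theorem on the same counts, confirming the posterior matches the direct count from the table:

```python
# Verify p(teen | buy) = p(buy | teen) * p(teen) / p(buy) on the records.
p_teen = 8 / 14
p_buy  = 6 / 14
p_buy_given_teen = 2 / 8          # teens who bought / all teens

posterior = p_buy_given_teen * p_teen / p_buy
print(posterior)                  # 2/6 ≈ 0.333, matching the direct count
```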
30
Example
We want to predict whether or not you will have a good day based on whether you have had breakfast and whether the sun is shining.
Let X = {x1, x2} be an instance; x1 is breakfast (Y/N), x2 is sunshine (Y/N).
C is the class: 0 = bad day, 1 = good day.
31
Naive Bayes
p(C0 | x1, x2) = p(x1, x2 | C0) p(C0) / p(X)
The problem is that p(x1, x2 | C0) is complex. Simplify by assuming attribute values are independent of each other given the class:
p(x1, x2 | C0) = p(x1 | C0) p(x2 | C0)
32
Collecting data for discrete attributes
p(C0) = number of bad days / number of days
p(x1=1 | C0) is the fraction of bad days on which you had breakfast
p(x2=1 | C0) is the fraction of bad days on which the sun was shining
p(x1=0 | C0) is the fraction of bad days on which you didn't have breakfast
p(x2=0 | C0) is the fraction of bad days on which the sun wasn't shining
p(x1=0, x2=0) is the percentage of days on which you didn't have breakfast and the sun wasn't shining
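A minimal count-based sketch of the whole procedure, assuming a made-up log of days (1 = yes, 0 = no for breakfast and sunshine; class 1 = good day). It also illustrates the next slide's point: p(X) is skipped and the unnormalized scores are compared directly.

```python
# Naive Bayes from counts: estimate p(C), p(x1|C), p(x2|C) from a log of
# days, then score an unseen day. The day log below is made up.
days = [  # (breakfast, sunshine, good_day)
    (1, 1, 1), (1, 1, 1), (0, 1, 1), (1, 0, 1),
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0),
]

def score(cls, x1, x2):
    """Unnormalized naive Bayes score p(x1|C) p(x2|C) p(C)."""
    in_class = [d for d in days if d[2] == cls]
    p_c = len(in_class) / len(days)
    p_x1 = sum(1 for d in in_class if d[0] == x1) / len(in_class)
    p_x2 = sum(1 for d in in_class if d[1] == x2) / len(in_class)
    return p_x1 * p_x2 * p_c

# Had breakfast, sun shining: compare unnormalized scores per class.
good, bad = score(1, 1, 1), score(0, 1, 1)
print("good day" if good > bad else "bad day")   # -> good day
```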
33
Another simplification
We do not have to calculate p(X), since it is the same for all posteriors regardless of class:
if p(X | C0) p(C0) > p(X | C1) p(C1), then p(C0 | X) > p(C1 | X)
34
Collecting data for continuous attributes
Same general idea as for the discrete attributes:
separate all values for an attribute for a particular class
calculate the mean and s.d.
use these to calculate the probability for a particular value
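A minimal sketch, assuming the usual Gaussian model for a continuous attribute (the values are made up):

```python
# Gaussian likelihood for a continuous attribute: fit mean and s.d. per
# class, then evaluate the normal density at a new value. Values made up.
import math

values = [21.0, 23.5, 19.8, 22.1, 20.6]   # attribute values for one class

mean = sum(values) / len(values)
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def gaussian(x, mean, sd):
    """Normal density; used as p(x | C) for a continuous attribute."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

print(gaussian(22.0, mean, sd))   # likelihood of observing 22.0 in this class
```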
35
Comparing results
confusion matrix:

              predicted
               yes   no
actual  yes    TP    FN
        no     FP    TN

precision = TP / (TP + FP)
recall = TP / (TP + FN)
accuracy = (TP + TN) / (TP + FN + FP + TN)
f1 metric = 2 * precision * recall / (precision + recall)
36
Example

              predicted
               yes   no
actual  yes    95     3
        no     14    87

precision = 95 / 109 = 0.87
recall = 95 / 98 = 0.97
accuracy = 182 / 199 = 0.91
f1 = 0.92
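A minimal sketch that reproduces these numbers from the matrix above:

```python
# Precision, recall, accuracy, and F1 from the slide's confusion matrix.
TP, FN = 95, 3
FP, TN = 14, 87

precision = TP / (TP + FP)
recall = TP / (TP + FN)
accuracy = (TP + TN) / (TP + FN + FP + TN)
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 2), round(recall, 2),
      round(accuracy, 2), round(f1, 2))   # 0.87 0.97 0.91 0.92
```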