1
Intro. to Data Mining Chapter 6. Bayesian
2
What Is Classification?
[Figure: classification workflow. Training instances are used in model learning; the resulting model is applied to test instances in the prediction step, labeling each instance as positive or negative.]
3
Typical Classification Methods
[Figures: examples of classifiers. A decision tree branching on age (<=30, 31..40, >40), student (yes/no), and credit rating (fair/excellent); a Bayesian network over Family History, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea; plus Support Vector Machines, Neural Networks, and many more.]
4
Pattern-Based Classification, Why?
Pattern-based classification: an integration of two themes, frequent pattern mining and classification.
Why pattern-based classification?
- Feature construction: higher-order, compact, discriminative features, e.g., single word → phrase (Apple pie, Apple iPad). A single feature is often not enough.
- Complex data modeling: graphs (no predefined feature vectors), sequences, semi-structured/unstructured data. Complex data is difficult to handle with plain feature vectors.
5
Pattern-Based Classification on Graphs
Use frequent patterns as features for classification.
[Figure: graph transactions labeled Active/Inactive are mined for frequent subgraphs (e.g., g1 and g2 with min_sup = 2); each graph is then transformed into a feature vector indicating which frequent subgraphs it contains, as sketched below.]
Related work: the approach is not confined to rule-based methods; one can select the most discriminative features and use any classifier (e.g., emerging patterns).
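A minimal sketch of the pattern-as-feature idea, assuming the frequent patterns have already been mined and that a hypothetical contains(graph, pattern) subgraph test is available; scikit-learn's LogisticRegression stands in for "any classifier" (none of these names come from the slides):

```python
from sklearn.linear_model import LogisticRegression

def to_feature_vector(graph, patterns, contains):
    # One binary feature per frequent pattern: 1 if the pattern occurs in the graph.
    return [1 if contains(graph, p) else 0 for p in patterns]

def train_pattern_classifier(graphs, labels, patterns, contains):
    # Transform each graph into a pattern-indicator vector, then fit any classifier.
    X = [to_feature_vector(g, patterns, contains) for g in graphs]
    clf = LogisticRegression()
    clf.fit(X, labels)
    return clf
```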
6
Discrete Random Variables
A discrete random variable X takes values in a finite set of possible outcomes. E.g., X binary: X ∈ {0, 1}, with P(X = 1) = 1 − P(X = 0).
7
Continuous Random Variable
Probability distribution (density function) over continuous values
8
Conditional probability
9
Mutually exclusive / independence
10
Joint / marginal probability
11
Example
12
Bayes Rule
Uses the prior probability of each category, given no information about an item. Categorization produces a posterior (conditional) probability distribution over the possible categories, given a description of an item.
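For reference, Bayes' rule in the form used for classification (a standard identity, stated here for completeness):

P(Ci | X) = P(X | Ci) · P(Ci) / P(X)

We predict the class Ci that maximizes P(X | Ci) · P(Ci), since P(X) is the same for all classes.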
13
Naïve Bayes Classifier: Training Dataset
Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
Data to be classified: X = (age <= 30, Income = medium, Student = yes, Credit_rating = fair)
14
Naïve Bayes Classifier: another calculation example
P(Ci):
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357
Compute P(X|Ci) for each class:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci):
P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci) * P(Ci):
P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007
Therefore, X belongs to class "buys_computer = yes".
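A small Python sketch reproducing the computation above (the counts are the ones listed on the slide; the code itself is illustrative, not part of the original deck):

```python
# Class priors and per-attribute conditional probabilities from the slide.
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit=fair": 2 / 5},
}

x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

scores = {}
for c in priors:
    # Naive Bayes: P(X|C) * P(C), assuming attribute independence given the class.
    p = priors[c]
    for attr in x:
        p *= cond[c][attr]
    scores[c] = p

print(scores)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))  # 'yes'
```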
15
Naive Bayes
16
Naive Bayes example
17
Naive Bayes example
18
Naive Bayes example
19
Different types of variables
20
Discrete variables
21
Continuous variables
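A common way to handle a continuous attribute in Naïve Bayes (the usual textbook choice; whether this slide uses exactly this form is an assumption) is a class-conditional Gaussian estimated from the training tuples of each class:

P(x | Ci) = g(x, μCi, σCi) = (1 / (√(2π) · σCi)) · exp( −(x − μCi)² / (2 σCi²) )

where μCi and σCi are the mean and standard deviation of the attribute within class Ci.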
22
Continuous variables example
23
Bayes example
24
Bayes classifier example
25
Bayes classifier example
26
Bayes classifier example
27
Bayes classifier example
28
Bayes classifier with several features
29
Bayes classifier with several features
30
Language model
How to compute a joint probability such as P(its, water, is, so, transparent, that)?
Recall the definition of conditional probability: P(B|A) = P(A,B) / P(A)
Rewriting: P(A,B) = P(A) P(B|A)
More variables (the chain rule): P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
Applied to the sentence:
P("its water is so transparent that") = P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so) × P(that | its water is so transparent)
31
N-gram models
Example mini-corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
In general this is an insufficient model of language, because language has long-distance dependencies, but n-gram models often work well in practice (a bigram sketch on this corpus follows below).
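A minimal sketch of bigram maximum-likelihood estimation on this toy corpus (illustrative code, not from the slides):

```python
from collections import Counter

# Bigram maximum-likelihood estimates from the toy corpus above.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, prev):
    """MLE estimate P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p("I", "<s>"))   # 2/3
print(p("am", "I"))    # 2/3
print(p("Sam", "am"))  # 1/2
```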
32
Naive Bayes Example
33
Discussion of Bayes'
34
Discussion of Bayes'
35
Example of Bayes'
36
Laplace estimator
37
Laplace estimator
38
Laplace estimator
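In its standard form (the slide's own notation may differ), the Laplace (add-one) estimator for an attribute with k possible values is:

P(attribute = a | Ci) = (count(a, Ci) + 1) / (count(Ci) + k)

This prevents a single unseen attribute value from forcing the whole product P(X | Ci) to zero.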
39
M-estimate
40
M-estimate
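In the same notation, the m-estimate generalizes the Laplace correction with a prior estimate p and an equivalent sample size m (standard form, stated for completeness):

P(attribute = a | Ci) = (count(a, Ci) + m·p) / (count(Ci) + m)

Choosing p = 1/k and m = k recovers the Laplace estimator.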
41
M-estimator example
42
Naïve Bayes Classifier: Comments
Advantages:
- Easy to implement
- Good results obtained in most of the cases
Disadvantages:
- Assumes class conditional independence, which causes a loss of accuracy
- In practice, dependencies exist among variables. E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.). Dependencies among these cannot be modeled by a Naïve Bayes classifier.
How to deal with these dependencies? Bayesian Belief Networks.
43
Discussion of Bayes'
44
Bayesian Belief Networks
Bayesian belief networks (also known as Bayesian networks or probabilistic networks) allow class conditional independencies between subsets of variables.
A (directed acyclic) graphical model of causal relationships:
- Represents dependency among the variables
- Gives a specification of the joint probability distribution
Nodes: random variables. Links: dependencies. The graph has no loops/cycles.
[Figure: a small network in which X and Y are the parents of Z, Y is the parent of P, and there is no dependency between Z and P.]
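The defining property of such a network (a standard fact, stated for completeness) is that the joint distribution factorizes according to the graph:

P(x1, …, xn) = ∏i P(xi | Parents(Xi))

For the small network above, assuming X and Y themselves have no parents, this gives P(x, y, z, p) = P(x) · P(y) · P(z | x, y) · P(p | y).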
45
Bayesian Belief Networks
46
Bayesian Belief Networks
47
Examples of 3-way Bayesian Networks
Marginal independence (no edges between A, B, C): p(A,B,C) = p(A) p(B) p(C)
Conditionally independent effects (A → B, A → C): p(A,B,C) = p(B|A) p(C|A) p(A). B and C are conditionally independent given A; e.g., A is a disease, and we model B and C as conditionally independent symptoms given A.
48
Examples of 3-way Bayesian Networks
Independent causes (A → C ← B): p(A,B,C) = p(C|A,B) p(A) p(B). The "explaining away" effect: given C, observing A makes B less likely, e.g., the earthquake/burglary/alarm example. A and B are (marginally) independent, but become dependent once C is known (a numerical sketch follows below).
Markov dependence (A → B → C): p(A,B,C) = p(C|B) p(B|A) p(A)
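A numerical sketch of the explaining-away effect in the independent-causes structure, using the burglary/earthquake/alarm naming from the slide; the probability values below are illustrative choices, not taken from the deck:

```python
# Structure: Burglary -> Alarm <- Earthquake (illustrative numbers).
p_b, p_e = 0.001, 0.002                       # P(Burglary), P(Earthquake)
p_alarm = {(0, 0): 0.001, (0, 1): 0.29,       # P(Alarm=1 | Burglary, Earthquake)
           (1, 0): 0.94,  (1, 1): 0.95}

def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(A|B,E) * P(B) * P(E)."""
    pa = p_alarm[(b, e)] if a == 1 else 1.0 - p_alarm[(b, e)]
    return pa * (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e)

# P(Burglary=1 | Alarm=1): marginalize over Earthquake.
den = sum(joint(b, e, 1) for b in (0, 1) for e in (0, 1))
print(sum(joint(1, e, 1) for e in (0, 1)) / den)              # ~0.37

# P(Burglary=1 | Alarm=1, Earthquake=1): the earthquake "explains away"
# the alarm, so burglary becomes much less likely.
print(joint(1, 1, 1) / sum(joint(b, 1, 1) for b in (0, 1)))   # ~0.003
```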
49
Bayesian Belief Networks
50
Bayesian Belief Networks example
51
Bayesian Belief Networks example
52
Discussion of Bayesian Belief Networks
53
Conditional Independence
A variable (node) is conditionally independent of its non-descendants given its parents.
[Figure: a network over Age, Gender, Exposure to Toxics, Smoking, Cancer, Serum Calcium, and Lung Tumor. Age and Gender are non-descendants of Cancer; Exposure to Toxics and Smoking are its parents; Serum Calcium and Lung Tumor are its descendants.]
Example: Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.
54
The learning task
Input: training data, i.e., cases over the variables (in the example: Burglary, Earthquake, Alarm, Call, Newscast).
Output: a Bayesian network modeling the data.
Key questions: are the data cases fully or partially observable? Do we learn the parameters only, or also the structure?
[Figure: example data cases (b, e, a, c, n) and the learned alarm network.]
55
Structure learning
Goal: find a "good" BN structure (relative to the data).
Solution: do heuristic search over space of network structures.
56
Search space
Space = network structures
Operators = add/reverse/delete edges
57
Heuristic search
Use a scoring function to do heuristic search (any algorithm). Greedy hill-climbing with randomness works pretty well (a sketch follows below).
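A high-level sketch of greedy hill-climbing with random restarts over network structures; random_start, neighbors, and score are hypothetical callables (e.g., a random-DAG generator, an edge add/delete/reverse move generator, and a BIC scorer), not functions defined in the slides:

```python
def hill_climb(data, random_start, neighbors, score, restarts=5, max_steps=100):
    """Return the best-scoring structure found over several random restarts."""
    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = random_start()               # random initial DAG
        current_score = score(current, data)
        for _ in range(max_steps):
            # Candidate structures reachable by adding, deleting, or
            # reversing one edge while keeping the graph acyclic.
            candidates = list(neighbors(current))
            if not candidates:
                break
            nxt = max(candidates, key=lambda s: score(s, data))
            nxt_score = score(nxt, data)
            if nxt_score <= current_score:
                break                          # local optimum reached
            current, current_score = nxt, nxt_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best
```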
58
Statistical independence testing
The G² statistic is used to test for the independence of A and B:
G² = 2 · Σ_{a,b} s_ab · ln( (s_ab · M) / (s_a · s_b) )
where
s_a = the number of times the expression level of A = a,
s_b = the number of times the expression level of B = b,
s_ab = the number of times the expression levels of A = a and B = b simultaneously,
M = total number of data cases.
G² has the chi-square distribution with (r_A − 1)(r_B − 1) degrees of freedom, where r_A and r_B are the numbers of expression levels of A and B.
[Richard E. Neapolitan, "Learning Bayesian Networks", 2004]
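A small sketch computing G² and its p-value for a contingency table of counts (the table values here are placeholders, not the slide's data):

```python
import numpy as np
from scipy.stats import chi2

counts = np.array([[10.0, 20.0],         # rows: levels of A, columns: levels of B
                   [30.0, 25.0]])        # placeholder counts
M = counts.sum()
# Expected counts under independence: s_a * s_b / M
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / M

mask = counts > 0                         # skip empty cells (0 * ln 0 -> 0)
g2 = 2.0 * np.sum(counts[mask] * np.log(counts[mask] / expected[mask]))

dof = (counts.shape[0] - 1) * (counts.shape[1] - 1)
p_value = chi2.sf(g2, dof)
print(f"G^2 = {g2:.3f}, dof = {dof}, p = {p_value:.3f}")
# A large p-value means we cannot reject independence of A and B.
```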
59
Example (statistical independence testing)
Suppose G1 and G2 each have expression levels {+, −}.
[Table: eight observed cases of (G1, G2) and the resulting 2×2 contingency table of counts.]
We cannot reject the hypothesis that G1 and G2 are independent.