
1 Intro. to Data Mining Chapter 6. Bayesian

2 What Is Classification?
[Diagram: training instances → model learning → classification model; test instances → prediction → positive or negative.]

3 Typical Classification Methods
Decision tree (e.g., splits on age?, student?, credit rating?, with branches such as <=30, 31..40, >40, yes/no, fair/excellent), Support Vector Machine, Bayesian network (e.g., nodes Family History, Smoker, Lung Cancer, Emphysema, PositiveXRay, Dyspnea), Neural network, and many more…

4 Pattern-Based Classification, Why?
Pattern-based classification: an integration of two themes, frequent pattern mining and classification.
Why pattern-based classification?
Feature construction: higher-order, compact, discriminative features, e.g., single word → phrase ("Apple pie", "Apple iPad"); a single feature is often not enough.
Complex data modeling: graphs (no predefined feature vectors), sequences, and semi-structured/unstructured data; complex data is difficult to model with a fixed feature vector.

5 Pattern-Based Classification on Graphs
Use frequent patterns (frequent subgraphs) as features for classification: mine frequent subgraphs g1, g2 (min_sup = 2) from the labeled graphs (active / inactive), then transform each graph into a feature vector over g1 and g2 for learning a classifier.
Related work: not confined to rule-based methods; the most discriminative features can be used with any classifier (e.g., accurate classification with emerging patterns).
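A minimal Python sketch of the transform step, assuming graphs are simplified to sets of labeled edges so that subgraph containment reduces to a subset test (real frequent-subgraph mining and isomorphism checking are more involved); all data values below are made up for illustration:

    # Each "graph" is simplified to a set of labeled edges plus a class label.
    graphs = {
        "G1": ({("C", "C"), ("C", "O")}, "active"),
        "G2": ({("C", "N")}, "inactive"),
        "G3": ({("C", "C"), ("C", "N")}, "active"),
    }

    # Frequent subgraph patterns g1, g2 (assume they were mined with min_sup = 2)
    patterns = {"g1": {("C", "C")}, "g2": {("C", "N")}}

    def to_feature_vector(edge_set):
        # 1 if the graph contains the pattern, 0 otherwise
        return [1 if pat <= edge_set else 0 for pat in patterns.values()]

    X = [to_feature_vector(edges) for edges, _ in graphs.values()]
    y = [label for _, label in graphs.values()]
    print(X, y)  # these feature vectors can now be fed to any standard classifier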

6 Discrete Random Variables
A discrete random variable has a finite set of possible outcomes, e.g., X binary: X ∈ {0, 1}.

7 Continuous Random Variable
Probability distribution (density function) over continuous values

8 Conditional probability

9 Mutually exclusive / independence

10 Joint / marginal probability

11 Example

12 Bayes Rule Uses the prior probability (a priori probability) of each category, given no information about an item. Categorization produces a posterior probability (conditional probability) distribution over the possible categories, given a description of the item.
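In symbols (the standard statement of Bayes' rule, with C a category and X the description of an item):
P(C | X) = P(X | C) · P(C) / P(X), where P(X) = Σ_C' P(X | C') · P(C')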

13 Naïve Bayes Classifier: Training Dataset
C1: buys_computer = 'yes'; C2: buys_computer = 'no'.
Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

14 Naïve Bayes Classifier: another calculation example
P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643; P(buys_computer = “no”) = 5/14 = 0.357
Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222; P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.600
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444; P(income = “medium” | buys_computer = “no”) = 2/5 = 0.400
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667; P(student = “yes” | buys_computer = “no”) = 1/5 = 0.200
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667; P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.400
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci): P(X | buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044; P(X | buys_computer = “no”) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
P(X|Ci) × P(Ci): P(X | buys_computer = “yes”) × P(buys_computer = “yes”) = 0.028; P(X | buys_computer = “no”) × P(buys_computer = “no”) = 0.007
Therefore, X belongs to class “buys_computer = yes”.
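A minimal Python sketch of the same computation, with the conditional probabilities hard-coded from the counts above (the dictionary keys and overall structure are illustrative, not from the deck):

    # Naive Bayes scoring for X = (age<=30, income=medium, student=yes, credit_rating=fair)
    priors = {"yes": 9 / 14, "no": 5 / 14}

    # P(attribute value | class), taken from the counts on the slide
    likelihoods = {
        "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9, "student=yes": 6 / 9, "credit=fair": 6 / 9},
        "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5, "student=yes": 1 / 5, "credit=fair": 2 / 5},
    }

    x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature in x:
            score *= likelihoods[c][feature]   # class conditional independence assumption
        scores[c] = score

    print(scores)                        # {'yes': ~0.028, 'no': ~0.007}
    print(max(scores, key=scores.get))   # 'yes'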

15 Naive Bayes

16 Naive Bayes example

17 Naive Bayes example

18 Naive Bayes example

19 Different types of variables

20 Discrete variables

21 Continuous variables

22 Continuous variables example

23 Bayes example

24 Bayes classifier example

25 Bayes classifier example

26 Bayes classifier example

27 Bayes classifier example

28 Bayes classifier with several features

29 Bayes classifier with several features

30 Language model
How to compute the joint probability P(its, water, is, so, transparent, that)?
Recall the definition of conditional probability: P(B|A) = P(A,B) / P(A). Rewriting: P(A,B) = P(A) P(B|A).
More variables (the chain rule): P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C).
Applied to the phrase: P(“its water is so transparent”) = P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so)

31 N-gram models
In general this is an insufficient model of language, because language has long-distance dependencies, but…
Example corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>
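A small Python sketch of estimating a bigram model from this toy corpus by maximum likelihood (counts of word pairs divided by counts of the preceding word); the function and variable names are illustrative:

    from collections import Counter

    corpus = [
        "<s> I am Sam </s>",
        "<s> Sam I am </s>",
        "<s> I do not like green eggs and ham </s>",
    ]

    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    def p_bigram(word, prev):
        # Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)
        return bigrams[(prev, word)] / unigrams[prev]

    print(p_bigram("I", "<s>"))   # 2/3
    print(p_bigram("Sam", "am"))  # 1/2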

32 Naive Bayes example

33 Discussion of Bayes’

34 Discussion of Bayes’

35 Example of Bayes’

36 Laplace estimator
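As a minimal illustration (assuming the usual add-one form of the Laplace estimator), smoothing the counts prevents a zero conditional probability when an attribute value never occurs with a class; the example below assumes income has three levels (low, medium, high):

    def laplace_estimate(count, class_total, num_values):
        # Add-one (Laplace) smoothing: no attribute value gets probability zero
        return (count + 1) / (class_total + num_values)

    # e.g., P(income = "medium" | buys_computer = "yes"), assuming 3 income levels
    print(laplace_estimate(4, 9, 3))   # (4 + 1) / (9 + 3) ~= 0.417 instead of 4/9 ~= 0.444
    # A value never seen with the class still gets a small nonzero probability:
    print(laplace_estimate(0, 9, 3))   # 1 / 12 ~= 0.083 instead of 0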

37 Laplace estimator

38 Laplace estimator

39 M-estimate

40 M-estimate

41 M-estimator example

42 Naïve Bayes Classifier: Comments
Advantages: easy to implement; good results obtained in most cases.
Disadvantages: the class conditional independence assumption causes a loss of accuracy, because in practice dependencies exist among variables. E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.); dependencies among these cannot be modeled by a naïve Bayes classifier.
How to deal with these dependencies? Bayesian belief networks.

43 Discussion of Bayes’

44 Bayesian Belief Networks
Bayesian belief networks (also known as Bayesian networks, probabilistic networks) allow class conditional independencies between subsets of variables.
A (directed acyclic) graphical model of causal relationships: it represents dependencies among the variables and gives a specification of the joint probability distribution.
Nodes: random variables. Links: dependency.
Example: X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P. The graph has no loops/cycles.
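For this example structure, the joint distribution factorizes over each node's parents (the standard Bayesian-network factorization, written for the slide's node names):
P(X, Y, Z, P) = P(X) · P(Y) · P(Z | X, Y) · P(P | Y)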

45 Bayesian Belief Networks

46 Bayesian Belief Networks

47 Examples of 3-way Bayesian Networks
Marginal independence: p(A,B,C) = p(A) p(B) p(C) (no edges among A, B, C).
Conditionally independent effects: p(A,B,C) = p(B|A) p(C|A) p(A). B and C are conditionally independent given A; e.g., A is a disease, and we model B and C as conditionally independent symptoms given A.

48 Examples of 3-way Bayesian Networks
Independent causes: p(A,B,C) = p(C|A,B) p(A) p(B). “Explaining away” effect: given C, observing A makes B less likely (e.g., the earthquake/burglary/alarm example); A and B are (marginally) independent but become dependent once C is known.
Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A).
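A small Python sketch of the explaining-away effect for the earthquake/burglary/alarm structure, computing P(burglary | alarm) with and without also observing the earthquake; the probability values are assumed for illustration, not taken from the deck:

    from itertools import product

    # Illustrative (assumed) probabilities for Burglary (B), Earthquake (E), Alarm (A)
    p_b = 0.01
    p_e = 0.02
    p_a = {  # P(Alarm = True | B, E)
        (True, True): 0.95, (True, False): 0.94,
        (False, True): 0.29, (False, False): 0.001,
    }

    def joint(b, e, a):
        pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
        return (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e) * pa

    def prob(query, evidence):
        # P(query | evidence) by brute-force enumeration over (B, E, A)
        num = den = 0.0
        for b, e, a in product([True, False], repeat=3):
            world = {"B": b, "E": e, "A": a}
            if all(world[k] == v for k, v in evidence.items()):
                p = joint(b, e, a)
                den += p
                if all(world[k] == v for k, v in query.items()):
                    num += p
        return num / den

    print(prob({"B": True}, {"A": True}))             # burglary fairly likely given the alarm
    print(prob({"B": True}, {"A": True, "E": True}))  # much lower once the earthquake explains the alarm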

49 Bayesian Belief Networks

50 Bayesian Belief Networks example

51 Bayesian Belief Networks example

52 Discussion of Bayesian Belief Networks

53 Conditional Independence
A variable (node) is conditionally independent of its non-descendants given its parents.
Example network: Exposure to Toxics and Smoking are the parents of Cancer; Age and Gender are non-descendants of Cancer; Serum Calcium and Lung Tumor are its descendants. Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.
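In symbols, the independence stated on the slide is:
P(Cancer | Age, Gender, Exposure to Toxics, Smoking) = P(Cancer | Exposure to Toxics, Smoking)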

54 The learning task Output: BN modeling data ... Input: training data
B E A C N Call Alarm Burglary Earthquake Newscast Output: BN modeling data e a c b n b e a c n ... Input: training data Input: fully or partially observable data cases? Output: parameters or also structure?

55 Structure learning Goal: find “good” BN structure (relative to data)
Solution: do heuristic search over space of network structures.

56 Search space Space = network structures
Operators = add/reverse/delete edges

57 Heuristic search
Use a scoring function to guide the heuristic search (any search algorithm); greedy hill-climbing with randomness works pretty well.
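A rough Python sketch of greedy hill-climbing over structures with add/delete/reverse edge operators; the scoring function is a stand-in (in practice a data-dependent score such as BIC would be used), and every name here is illustrative:

    import random

    def is_acyclic(edges, nodes):
        # Repeatedly remove nodes with no incoming edges; a cycle leaves leftovers
        remaining, es = set(nodes), set(edges)
        while remaining:
            roots = {n for n in remaining if not any(dst == n for _, dst in es)}
            if not roots:
                return False
            remaining -= roots
            es = {(a, b) for a, b in es if a in remaining and b in remaining}
        return True

    def neighbors(edges, nodes):
        # Candidate structures reachable by adding, deleting, or reversing one edge
        result = []
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                if (a, b) in edges:
                    result.append(edges - {(a, b)})              # delete
                    result.append(edges - {(a, b)} | {(b, a)})   # reverse
                else:
                    result.append(edges | {(a, b)})              # add
        return [e for e in result if is_acyclic(e, nodes)]

    def hill_climb(nodes, score, steps=100):
        current = set()  # start from the empty graph
        for _ in range(steps):
            options = neighbors(current, nodes)
            if not options:
                break
            best = max(options, key=score)
            if score(best) > score(current):
                current = best                     # greedy improvement
            elif random.random() < 0.1:
                current = random.choice(options)   # occasional random move to escape plateaus
            else:
                break
        return current

    def toy_score(edges):
        # Placeholder score; a real score measures fit of the structure to the data (e.g., BIC)
        return -abs(len(edges) - 3)

    print(hill_climb(["B", "E", "A", "C", "N"], toy_score))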

58 Statistical independence testing
The G² statistic is used to test for the independence of A and B:
G² = 2 Σ_{a,b} s_ab · ln( (s_ab · M) / (s_a · s_b) )
where s_a = the number of times the expression level of A = a, s_b = the number of times the expression level of B = b, s_ab = the number of times the expression levels of A = a and B = b simultaneously, and M = the total number of data cases. G² has a chi-square distribution with (r_A − 1)(r_B − 1) degrees of freedom, where r_A and r_B are the numbers of expression levels of A and B. [Richard E. Neapolitan, Learning Bayesian Networks, 2004]
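A short Python sketch of the G² test on a 2×2 contingency table, using scipy.stats.chi2 for the tail probability; the table counts below are illustrative only:

    import math
    from scipy.stats import chi2

    # Illustrative 2x2 contingency table: rows = levels of A, columns = levels of B
    table = [[1, 2],
             [4, 1]]

    M = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    g2 = 0.0
    for i, row in enumerate(table):
        for j, s_ab in enumerate(row):
            if s_ab > 0:
                # G^2 = 2 * sum s_ab * ln(s_ab * M / (s_a * s_b))
                g2 += 2 * s_ab * math.log(s_ab * M / (row_totals[i] * col_totals[j]))

    df = (len(table) - 1) * (len(table[0]) - 1)
    p_value = chi2.sf(g2, df)   # survival function = P(Chi2_df >= g2)
    print(g2, p_value)          # here the p-value is large, so independence is not rejected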

59 Example (statistical independence testing)
Suppose G1 and G2 each have expression levels {+, −}, observed over 8 data cases, and the counts are summarized in a 2×2 contingency table of G1 × G2. [Table: the 8 data cases and the resulting G1 × G2 contingency table.] Applying the G² test to this table, we cannot reject the hypothesis that G1 and G2 are independent.

