
1 Final Exam: May 10 Thursday

2 Bayesian reasoning: If event E occurs, then the probability that event H will occur is p(H | E). IF E (evidence) is true THEN H (hypothesis) is true with probability p.

3 Bayesian reasoning example: Cancer and Test. P(C) = 0.01, P(¬C) = 0.99, P(+|C) = 0.9, P(-|C) = 0.1, P(+|¬C) = 0.2, P(-|¬C) = 0.8. P(C|+) = ?
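
The slide's worked answer is not included in this transcript; a minimal sketch of the calculation with the numbers above, using plain Bayes' rule:

```python
# Cancer/test example from slide 3. Values are from the slide; the posterior
# P(C|+) follows from Bayes' rule.
p_c = 0.01         # P(C): prior probability of cancer
p_not_c = 0.99     # P(~C)
p_pos_c = 0.9      # P(+|C)
p_pos_not_c = 0.2  # P(+|~C)

# Bayes' rule: P(C|+) = P(+|C) P(C) / [P(+|C) P(C) + P(+|~C) P(~C)]
p_pos = p_pos_c * p_c + p_pos_not_c * p_not_c
p_c_given_pos = p_pos_c * p_c / p_pos

print(f"P(+)   = {p_pos:.3f}")          # 0.207
print(f"P(C|+) = {p_c_given_pos:.3f}")  # ~0.043
```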

4 Bayesian reasoning with multiple hypotheses and evidences: expand the Bayesian rule to work with multiple hypotheses (H1 ... Hm) and evidences (E1 ... En), assuming conditional independence among the evidences E1 ... En.
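
The slide's formula appears only as an image in the original. Under the stated conditional-independence assumption, the expanded rule generally takes the following form (a reconstruction, not the slide's own rendering):

```latex
p(H_i \mid E_1 E_2 \ldots E_n) =
  \frac{p(E_1 \mid H_i)\, p(E_2 \mid H_i) \cdots p(E_n \mid H_i)\, p(H_i)}
       {\sum_{k=1}^{m} p(E_1 \mid H_k)\, p(E_2 \mid H_k) \cdots p(E_n \mid H_k)\, p(H_k)}
```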

5 Bayesian reasoning example: expert data.

6 The user observes E3, E1, E2.

7 Bayesian reasoning example: the user observes E2 and the expert system computes the posterior probabilities.

8 Propagation of CFs. For a single antecedent rule: cf(E) is the certainty factor of the evidence; cf(R) is the certainty factor of the rule.

9 Single antecedent rule example: IF patient has toothache THEN problem is cavity {cf 0.3}. Patient has toothache {cf 0.9}. What is cf(cavity, toothache)?
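
The propagation formula itself is not shown in this transcript; with the usual single-antecedent rule, cf(H, E) = cf(E) × cf(R), this gives cf(cavity, toothache) = 0.9 × 0.3 = 0.27.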

10 Propagation of CFs (multiple antecedents). For conjunctive rules: IF E1 AND ... AND En THEN H {cf}. For two evidences E1 and E2: cf(E1 AND E2) = min(cf(E1), cf(E2)).

11 Propagation of CFs (multiple antecedents). For disjunctive rules: IF E1 OR ... OR En THEN H {cf}. For two evidences E1 and E2: cf(E1 OR E2) = max(cf(E1), cf(E2)).

12 Exercise: IF (P1 AND P2) OR P3 THEN C1 (0.7) AND C2 (0.3). Assume cf(P1) = 0.6, cf(P2) = 0.4, cf(P3) = 0.2. What are cf(C1) and cf(C2)?
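
A minimal sketch of a solution, assuming the standard propagation rule that a conclusion's cf is the combined antecedent cf multiplied by the rule's cf (the slides' own propagation formula is not included in this transcript):

```python
# Exercise from slide 12: IF (P1 AND P2) OR P3 THEN C1 (0.7) AND C2 (0.3)
cf_p1, cf_p2, cf_p3 = 0.6, 0.4, 0.2

cf_and = min(cf_p1, cf_p2)          # cf(P1 AND P2) = min(0.6, 0.4) = 0.4
cf_antecedent = max(cf_and, cf_p3)  # cf((P1 AND P2) OR P3) = max(0.4, 0.2) = 0.4

# Assumed propagation rule: cf(conclusion) = cf(antecedent) * cf(rule)
cf_c1 = cf_antecedent * 0.7         # 0.28
cf_c2 = cf_antecedent * 0.3         # 0.12

print(f"cf(C1) = {cf_c1:.2f}, cf(C2) = {cf_c2:.2f}")
```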

13 Defining fuzzy sets with fit-vectors. A fuzzy set A can be defined as a fit-vector of membership/value pairs. So, for example: Tall men = (0/180, 1/190); Short men = (1/160, 0/170); Average men = (0/165, 1/175, 0/185).
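
A minimal sketch of how such fit-vectors might be stored and evaluated. The fit-vectors list only a few anchor points; how membership behaves between them is not shown in this transcript, so linear interpolation is assumed here purely for illustration:

```python
# Fit-vectors from slide 13: (value, membership) pairs, e.g. 0/180 means
# membership 0 at height 180 cm.
tall_men    = [(180, 0.0), (190, 1.0)]
short_men   = [(160, 1.0), (170, 0.0)]
average_men = [(165, 0.0), (175, 1.0), (185, 0.0)]

def membership(fit_vector, x):
    """Membership defined by a fit-vector, interpolated linearly (assumption)."""
    points = sorted(fit_vector)
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, m0), (x1, m1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return m0 + (m1 - m0) * (x - x0) / (x1 - x0)

print(membership(tall_men, 185))     # 0.5
print(membership(average_men, 180))  # 0.5
```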

14 Qualifiers & Hedges. What about linguistic values with qualifiers, e.g. very tall, extremely short? Hedges are qualifying terms that modify the shape of fuzzy sets, e.g. very, somewhat, quite, slightly, extremely.

15 Representing Hedges
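
The content of slides 15-17 (the hedge definitions and plots) appears only as images in the original. A common convention, consistent with the "very tall men" values on slide 20, is that "very" squares the membership, "extremely" cubes it, and "more or less" takes the square root; a minimal sketch under that assumption:

```python
# Hedges as membership-modifying operations (a common convention; the slides'
# own table is not in this transcript). Squaring reproduces the "very tall men"
# values on slide 20.
tall_men = {180: 0.0, 182: 0.25, 185: 0.5, 187: 0.75, 190: 1.0}

def apply_hedge(fuzzy_set, power):
    """Raise every membership value to the given power."""
    return {x: round(m ** power, 2) for x, m in fuzzy_set.items()}

very_tall      = apply_hedge(tall_men, 2)    # {180: 0.0, 182: 0.06, 185: 0.25, 187: 0.56, 190: 1.0}
extremely_tall = apply_hedge(tall_men, 3)
more_or_less   = apply_hedge(tall_men, 0.5)

print(very_tall)
```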


18 Crisp Set Operations

19 Fuzzy Set Operations: Complement. To what degree do elements not belong to this set? μ¬A(x) = 1 - μA(x). tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190}; not tall men = {1/180, 0.75/182, 0.5/185, 0.25/187, 0/190}.

20 Fuzzy Set Operations: Containment. Which sets belong to other sets? tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190}; very tall men = {0/180, 0.06/182, 0.25/185, 0.56/187, 1/190}. Each element of the fuzzy subset has membership no larger than in the containing set.
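
A small sketch of complement and containment on the sets from slides 19-20 (the dictionary representation is illustrative):

```python
# Fuzzy sets from slides 19-20, stored as value -> membership.
tall_men      = {180: 0.0, 182: 0.25, 185: 0.5, 187: 0.75, 190: 1.0}
very_tall_men = {180: 0.0, 182: 0.06, 185: 0.25, 187: 0.56, 190: 1.0}

# Complement: mu_notA(x) = 1 - mu_A(x)
not_tall_men = {x: 1 - m for x, m in tall_men.items()}
print(not_tall_men)  # {180: 1.0, 182: 0.75, 185: 0.5, 187: 0.25, 190: 0.0}

# Containment: A is contained in B if mu_A(x) <= mu_B(x) for every x.
is_subset = all(very_tall_men[x] <= tall_men[x] for x in tall_men)
print(is_subset)     # True: very tall men is contained in tall men
```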

21 Fuzzy Set Operations: Intersection. To what degree is the element in both sets? μA∩B(x) = min[μA(x), μB(x)].

22 tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190}; average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190}; tall men ∩ average men = {0/165, 0/175, 0/180, 0.25/182, 0/185, 0/190}, or, more briefly, tall men ∩ average men = {0/180, 0.25/182, 0/185}. μA∩B(x) = min[μA(x), μB(x)].

23 Fuzzy Set Operations: Union. To what degree is the element in either or both sets? μA∪B(x) = max[μA(x), μB(x)].

24 tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190}; average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190}; tall men ∪ average men = {0/165, 1/175, 0.5/180, 0.25/182, 0.5/185, 1/190}. μA∪B(x) = max[μA(x), μB(x)].
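
A small sketch of intersection and union on the sets from slides 22 and 24 (the dictionary representation is illustrative):

```python
# Fuzzy sets from slides 22 and 24, stored as value -> membership.
tall_men    = {165: 0.0, 175: 0.0, 180: 0.0, 182: 0.25, 185: 0.5, 190: 1.0}
average_men = {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.0, 190: 0.0}

# Intersection: mu_(A and B)(x) = min(mu_A(x), mu_B(x))
intersection = {x: min(tall_men[x], average_men[x]) for x in tall_men}

# Union: mu_(A or B)(x) = max(mu_A(x), mu_B(x))
union = {x: max(tall_men[x], average_men[x]) for x in tall_men}

print(intersection)  # nonzero only at 182, matching slide 22
print(union)         # {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.5, 190: 1.0}
```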

25 Choosing the Best Attribute: Binary Classification. We want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum value when it makes no distinction. Information theory (Shannon and Weaver, 1949). Entropy: a measure of the uncertainty of a random variable. A coin that always comes up heads --> 0 bits; a flip of a fair coin (heads or tails) --> 1 bit; the roll of a fair four-sided die --> 2 bits. Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute.

26 Formula for Entropy. Examples: Suppose we have a collection of 10 examples, 5 positive and 5 negative: H(1/2, 1/2) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit. Suppose we have a collection of 100 examples, 1 positive and 99 negative: H(1/100, 99/100) = -0.01 log2(0.01) - 0.99 log2(0.99) ≈ 0.08 bits.
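
A minimal sketch of the two calculations above, using the general entropy formula H = -Σ p_i log2 p_i:

```python
import math

def entropy(probs):
    """H = -sum p_i log2 p_i (terms with p_i = 0 contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 5 positive, 5 negative out of 10 examples
print(entropy([0.5, 0.5]))     # 1.0 bit

# 1 positive, 99 negative out of 100 examples
print(entropy([0.01, 0.99]))   # ~0.08 bits
```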

27 Information gain. Information gain (from an attribute test) is the difference between the original information requirement and the new requirement, i.e. the reduction in entropy achieved by the attribute test. Choose the attribute with the largest IG.

28 Information gain For the training set, p = n = 6, I(6/12, 6/12) = 1 bit Consider the attributes Patrons and Type (and others too): Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root
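
The IG formula and the restaurant data appear only as images in the original slides. Assuming the standard AIMA restaurant example counts (Patrons: None = 0+/2-, Some = 4+/0-, Full = 2+/4-; Type splits the examples into equal positive/negative halves), a sketch of the computation:

```python
import math

def entropy(pos, neg):
    """Entropy of a Boolean class distribution with pos/neg counts."""
    total = pos + neg
    return -sum(p * math.log2(p) for p in (pos / total, neg / total) if p > 0)

def information_gain(splits, total_pos=6, total_neg=6):
    """IG = H(parent) - sum over children of (child weight * H(child))."""
    total = total_pos + total_neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(total_pos, total_neg) - remainder

# Assumed class counts per attribute value (standard AIMA restaurant data):
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_   = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(information_gain(patrons))  # ~0.541 bits
print(information_gain(type_))    # 0.0 bits
```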

29 Example contd. Decision tree learned from the 12 examples: substantially simpler than the "true" tree.

30 Perceptrons: X = x1·w1 + x2·w2; Y = Ystep(X) (step activation).

31 Perceptrons. How does a perceptron learn? A perceptron has initial (often random) weights, typically in the range [-0.5, 0.5]. Apply an established training dataset. Calculate the error as expected output minus actual output: error e = Yexpected - Yactual. Adjust the weights to reduce the error.

32 Perceptrons. How do we adjust a perceptron's weights to produce Yexpected? If e is positive, we need to increase Yactual (and vice versa). Use this formula: wi <- wi + Δwi, with Δwi = α × xi × e, where α is the learning rate (between 0 and 1) and e is the calculated error.

33 Perceptron Example – AND Train a perceptron to recognize logical AND Use threshold Θ = 0.2 and learning rate α = 0.1

34 Perceptron Example – AND Train a perceptron to recognize logical AND Use threshold Θ = 0.2 and learning rate α = 0.1

35 Perceptron Example – AND. Repeat until convergence, i.e. until the weights no longer change and there is no error. Use threshold Θ = 0.2 and learning rate α = 0.1.
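
A minimal sketch of the training loop described on slides 31-35, using the threshold Θ = 0.2 and learning rate α = 0.1 from the slides. The initial weights and the AND truth-table encoding are illustrative, since the slides' tables are not included in this transcript:

```python
# Perceptron trained on logical AND (slides 33-35).
theta, alpha = 0.2, 0.1
w = [0.3, -0.1]  # illustrative initial weights in [-0.5, 0.5]

training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def activate(x, w, theta):
    """Step activation: output 1 if the weighted sum reaches the threshold."""
    return 1 if x[0] * w[0] + x[1] * w[1] >= theta else 0

epoch = 0
while True:
    epoch += 1
    total_error = 0
    for x, y_expected in training_set:
        y_actual = activate(x, w, theta)
        e = y_expected - y_actual          # error = expected - actual
        total_error += abs(e)
        # Delta rule: w_i <- w_i + alpha * x_i * e (rounded to avoid
        # floating-point drift right at the threshold boundary)
        w = [round(w[i] + alpha * x[i] * e, 2) for i in range(2)]
    if total_error == 0:                   # converged: an error-free epoch
        break

print(f"converged after {epoch} epochs, weights = {w}")
# With these starting weights: converged after 5 epochs, weights = [0.1, 0.1]
```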

