Final Exam: Thursday, May 10
Bayesian reasoning
If event E occurs, then the probability that event H will occur is p(H|E):
IF E (evidence) is true THEN H (hypothesis) is true with probability p
Bayesian reasoning example: Cancer and Test
P(C) = 0.01, P(¬C) = 0.99
P(+|C) = 0.9, P(−|C) = 0.1
P(+|¬C) = 0.2, P(−|¬C) = 0.8
P(C|+) = ?
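The posterior asked for on this slide follows directly from Bayes' rule; a minimal worked sketch (variable names are mine):

```python
# Bayes' rule for the cancer/test example:
# P(C|+) = P(+|C)P(C) / [P(+|C)P(C) + P(+|~C)P(~C)]
p_c = 0.01         # prior P(C)
p_not_c = 0.99     # P(~C)
p_pos_c = 0.9      # likelihood P(+|C)
p_pos_not_c = 0.2  # false-positive rate P(+|~C)

numerator = p_pos_c * p_c
evidence = numerator + p_pos_not_c * p_not_c
p_c_pos = numerator / evidence
print(round(p_c_pos, 4))  # 0.0435
```

Note that despite the positive test, the posterior stays low because the prior P(C) is small.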
Bayesian reasoning with multiple hypotheses and evidences
Expand the Bayesian rule to work with multiple hypotheses (H1 ... Hm) and evidences (E1 ... En), assuming conditional independence among the evidences E1 ... En.
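The formula from the original slide is not reproduced in the text; under the stated conditional-independence assumption it commonly takes the form:

```latex
p(H_i \mid E_1 \ldots E_n) =
\frac{p(E_1 \mid H_i)\, p(E_2 \mid H_i) \cdots p(E_n \mid H_i)\, p(H_i)}
     {\sum_{k=1}^{m} p(E_1 \mid H_k)\, p(E_2 \mid H_k) \cdots p(E_n \mid H_k)\, p(H_k)}
```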
Bayesian reasoning example: expert data (prior and conditional probabilities supplied by the expert)
Bayesian reasoning example: the user observes evidences E3, E1, E2
Bayesian reasoning example: the user observes E2, and the expert system computes the posterior probabilities.
Propagation of CFs
For a single-antecedent rule: cf(H, E) = cf(E) × cf(R), where cf(E) is the certainty factor of the evidence and cf(R) is the certainty factor of the rule.
Single-antecedent rule example
IF patient has toothache THEN problem is cavity {cf 0.3}
Patient has toothache {cf 0.9}
What is cf(cavity, toothache)?
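With the single-antecedent propagation rule cf(H, E) = cf(E) × cf(R), the answer is a one-line computation:

```python
# Single-antecedent CF propagation: cf(H, E) = cf(E) * cf(R)
cf_evidence = 0.9  # "patient has toothache" {cf 0.9}
cf_rule = 0.3      # IF toothache THEN cavity {cf 0.3}

cf_cavity = cf_evidence * cf_rule
print(round(cf_cavity, 2))  # 0.27
```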
Propagation of CFs (multiple antecedents)
For conjunctive rules: IF <E1> AND ... AND <En> THEN <H> {cf}
For two evidences E1 and E2: cf(E1 AND E2) = min(cf(E1), cf(E2))
Propagation of CFs (multiple antecedents)
For disjunctive rules: IF <E1> OR ... OR <En> THEN <H> {cf}
For two evidences E1 and E2: cf(E1 OR E2) = max(cf(E1), cf(E2))
Exercise
IF (P1 AND P2) OR P3 THEN C1 {cf 0.7} AND C2 {cf 0.3}
Assume cf(P1) = 0.6, cf(P2) = 0.4, cf(P3) = 0.2.
What are cf(C1) and cf(C2)?
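Applying the min/max combination rules from the previous slides and then single-antecedent propagation gives a short worked solution:

```python
# CF exercise: IF (P1 AND P2) OR P3 THEN C1 {0.7} AND C2 {0.3}
cf_p1, cf_p2, cf_p3 = 0.6, 0.4, 0.2

cf_and = min(cf_p1, cf_p2)          # conjunction: min(0.6, 0.4) = 0.4
cf_antecedent = max(cf_and, cf_p3)  # disjunction: max(0.4, 0.2) = 0.4

cf_c1 = cf_antecedent * 0.7  # 0.28
cf_c2 = cf_antecedent * 0.3  # 0.12
```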
Defining fuzzy sets with fit-vectors
A fuzzy set A can be defined by a fit-vector: a list of membership/value pairs, with membership interpolated between the listed points. So, for example:
Tall men = (0/180, 1/190)
Short men = (1/160, 0/170)
Average men = (0/165, 1/175, 0/185)
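A sketch of how a fit-vector can be evaluated at an arbitrary value, assuming linear interpolation between the listed anchor points (the function name is mine):

```python
# A fit-vector is a list of (membership, value) anchor points sorted by value.
def membership(fit_vector, x):
    if x <= fit_vector[0][1]:
        return fit_vector[0][0]   # clamp below the first anchor
    if x >= fit_vector[-1][1]:
        return fit_vector[-1][0]  # clamp above the last anchor
    for (m0, v0), (m1, v1) in zip(fit_vector, fit_vector[1:]):
        if v0 <= x <= v1:
            # linear interpolation between neighbouring anchors
            return m0 + (m1 - m0) * (x - v0) / (v1 - v0)

tall = [(0, 180), (1, 190)]    # Tall men = (0/180, 1/190)
print(membership(tall, 185))   # 0.5
```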
Qualifiers & Hedges
What about linguistic values with qualifiers, e.g. very tall, extremely short, etc.?
Hedges are qualifying terms that modify the shape of fuzzy sets, e.g. very, somewhat, quite, slightly, extremely, etc.
Representing Hedges
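The figure from this slide is not reproduced here. A common convention is to represent hedges as powers of the membership degree, e.g. very A(x) = A(x)², extremely A(x) = A(x)³, and somewhat A(x) = A(x)^0.5; a sketch under that assumption:

```python
# Hedges as powers of the membership degree (a common convention):
very = lambda mu: mu ** 2        # concentration
extremely = lambda mu: mu ** 3   # stronger concentration
somewhat = lambda mu: mu ** 0.5  # dilation

tall_men = {180: 0.0, 182: 0.25, 185: 0.5, 187: 0.75, 190: 1.0}
very_tall_men = {x: round(very(mu), 2) for x, mu in tall_men.items()}
print(very_tall_men)  # {180: 0.0, 182: 0.06, 185: 0.25, 187: 0.56, 190: 1.0}
```

Note that squaring the "tall men" memberships reproduces the "very tall men" set used on the containment slide.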
Crisp Set Operations
Fuzzy Set Operations: Complement
To what degree do elements not belong to this set? ¬A(x) = 1 − A(x)
tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190};
NOT tall men = {1/180, 0.75/182, 0.5/185, 0.25/187, 0/190};
Fuzzy Set Operations: Containment
Which sets belong to other sets? Each element of the fuzzy subset has membership no greater than in the containing set.
tall men = {0/180, 0.25/182, 0.5/185, 0.75/187, 1/190};
very tall men = {0/180, 0.06/182, 0.25/185, 0.56/187, 1/190};
Fuzzy Set Operations: Intersection
To what degree is the element in both sets? (A ∩ B)(x) = min[A(x), B(x)]
tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190};
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190};
tall men ∩ average men = {0/165, 0/175, 0/180, 0.25/182, 0/185, 0/190};
or, keeping only the nonzero region: tall men ∩ average men = {0/180, 0.25/182, 0/185};
(A ∩ B)(x) = min[A(x), B(x)]
Fuzzy Set Operations: Union
To what degree is the element in either or both sets? (A ∪ B)(x) = max[A(x), B(x)]
tall men = {0/165, 0/175, 0/180, 0.25/182, 0.5/185, 1/190};
average men = {0/165, 1/175, 0.5/180, 0.25/182, 0/185, 0/190};
tall men ∪ average men = {0/165, 1/175, 0.5/180, 0.25/182, 0.5/185, 1/190};
(A ∪ B)(x) = max[A(x), B(x)]
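All three operations on the tall/average men sets can be checked with a few dictionary comprehensions (representing each fuzzy set as a value → membership map):

```python
# Fuzzy complement, intersection, and union on the slide's example sets.
tall = {165: 0.0, 175: 0.0, 180: 0.0, 182: 0.25, 185: 0.5, 190: 1.0}
average = {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.0, 190: 0.0}

complement = {x: 1 - mu for x, mu in tall.items()}          # NOT tall
intersection = {x: min(tall[x], average[x]) for x in tall}  # min rule
union = {x: max(tall[x], average[x]) for x in tall}         # max rule

print(intersection)  # {165: 0.0, 175: 0.0, 180: 0.0, 182: 0.25, 185: 0.0, 190: 0.0}
print(union)         # {165: 0.0, 175: 1.0, 180: 0.5, 182: 0.25, 185: 0.5, 190: 1.0}
```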
Choosing the Best Attribute: Binary Classification
Want a formal measure that returns a maximum value when an attribute makes a perfect split and a minimum when it makes no distinction.
Information theory (Shannon and Weaver, 1949).
Entropy: a measure of the uncertainty of a random variable.
A coin that always comes up heads → 0 bits
A flip of a fair coin (heads or tails) → 1 bit
The roll of a fair four-sided die → 2 bits
Information gain: the expected reduction in entropy caused by partitioning the examples according to this attribute.
Formula for Entropy: H(p1, ..., pn) = −Σi pi log2 pi
Examples:
Suppose we have a collection of 10 examples, 5 positive, 5 negative:
H(1/2, 1/2) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1 bit
Suppose we have a collection of 100 examples, 1 positive and 99 negative:
H(1/100, 99/100) = −0.01 log2(0.01) − 0.99 log2(0.99) ≈ 0.08 bits
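Both slide examples can be verified with a direct implementation of the entropy formula:

```python
from math import log2

def entropy(probs):
    # H = -sum(p * log2(p)), with 0 * log2(0) taken as 0
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))              # 1.0
print(round(entropy([0.01, 0.99]), 2))  # 0.08
```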
Information gain
Information gain (from the attribute test) = difference between the original information requirement and the new requirement:
Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A), where Remainder(A) = Σi (pi + ni)/(p + n) · I(pi/(pi+ni), ni/(pi+ni))
Choose the attribute with the largest IG.
Information gain
For the whole training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
Consider the attributes Patrons and Type (and the others too): Patrons has the highest IG of all attributes, so it is chosen by the DTL algorithm as the root.
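The per-value positive/negative counts below are taken from the standard 12-example restaurant data set this slide appears to use (Patrons splits into None 0+/2−, Some 4+/0−, Full 2+/4−; Type splits evenly), so treat them as an assumption; with them, the gains work out as:

```python
from math import log2

def H(p, n):
    # entropy of a (positive, negative) count pair
    total = p + n
    return -sum(q * log2(q) for q in (p / total, n / total) if q > 0)

def gain(splits, p=6, n=6):
    # splits: list of (positive, negative) counts per attribute value
    remainder = sum((pi + ni) / (p + n) * H(pi, ni) for pi, ni in splits if pi + ni)
    return H(p, n) - remainder

patrons = [(0, 2), (4, 0), (2, 4)]        # None, Some, Full (assumed counts)
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]  # French, Italian, Thai, Burger

print(round(gain(patrons), 3))  # 0.541
print(round(gain(type_), 3))    # 0.0
```

Patrons gains about 0.541 bits while Type gains nothing, which is why Patrons is chosen as the root.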
Example contd. The decision tree learned from the 12 examples is substantially simpler than the "true" tree.
Perceptrons
The perceptron computes the weighted sum X = x1·w1 + x2·w2 and passes it through a step activation: Y = step(X), which outputs 1 when X reaches the threshold θ and 0 otherwise.
Perceptrons
How does a perceptron learn?
A perceptron has initial (often random) weights, typically in the range [−0.5, 0.5].
Apply an established training dataset.
Calculate the error as expected output minus actual output: e = Yexpected − Yactual
Adjust the weights to reduce the error.
Perceptrons
How do we adjust a perceptron's weights to produce Yexpected?
If e is positive, we need to increase Yactual (and vice versa).
Use this formula: wi ← wi + Δwi, where Δwi = α × xi × e,
α is the learning rate (between 0 and 1) and e is the calculated error.
Perceptron Example – AND Train a perceptron to recognize logical AND Use threshold Θ = 0.2 and learning rate α = 0.1
Perceptron Example – AND
Repeat until convergence, i.e. the final weights do not change and there is no error.
Use threshold Θ = 0.2 and learning rate α = 0.1.
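The whole training loop can be sketched in a few lines. The initial weights w1 = 0.3 and w2 = −0.1 are an assumption (the slides' worked table is not reproduced here); with them, training converges to w1 = w2 = 0.1:

```python
def step(x, theta=0.2):
    # step activation; rounding guards against float noise at the threshold
    return 1 if round(x, 9) >= theta else 0

def train_and(w1=0.3, w2=-0.1, alpha=0.1, theta=0.2):
    # truth table for logical AND: (x1, x2, desired output)
    data = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    while True:
        converged = True
        for x1, x2, desired in data:
            actual = step(x1 * w1 + x2 * w2, theta)
            e = desired - actual
            if e != 0:
                converged = False
                w1 += alpha * x1 * e  # delta rule: dw_i = alpha * x_i * e
                w2 += alpha * x2 * e
        if converged:  # an epoch with no weight change and no error
            return w1, w2

w1, w2 = train_and()
print(round(w1, 1), round(w2, 1))  # 0.1 0.1
```

After convergence, only the (1, 1) input reaches the threshold 0.2, so the perceptron computes AND.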