Probability and Information Copyright, 1996 © Dale Carnegie & Associates, Inc. A brief review (Chapter 13)

1 Probability and Information Copyright, 1996 © Dale Carnegie & Associates, Inc. A brief review (Chapter 13)

2 Probability
Fall 04 CSE 471, 598, CBS 598 by H. Liu
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance - how wonderful it is!
Probability is a degree of belief in the truth of a sentence: 1 means true, 0 means false, and 0 < P < 1 means an intermediate degree of belief in the truth of the sentence.
Degree of truth (fuzzy logic) vs. degree of belief (probability).

3 All probability statements must indicate the evidence with respect to which the probability is being assessed.
Prior (unconditional) probability: assessed before any evidence is obtained.
Posterior (conditional) probability: assessed after evidence is obtained.

4 Basic probability notation
Prior probability
Proposition: P(Sunny)
Random variable: P(Weather = Sunny)
Each random variable has a domain, e.g. for Weather: Sunny, Cloudy, Rain, Snow
Probability distribution: P(Weather) = a vector with one probability per domain value (the values are not preserved in this transcript)
A random variable is not a number; a number can be obtained by observing a random variable.
A random variable can be continuous or discrete.

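5 Conditional probability
Definition: P(A|B) = P(A ∧ B) / P(B), for P(B) > 0
Product rule: P(A ∧ B) = P(A|B) P(B)
Probabilistic inference does not work like logical inference: P(A|B) applies only when B is all the evidence available. With additional evidence C, the relevant quantity is P(A|B ∧ C).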

6 The axioms of probability
All probabilities are between 0 and 1.
Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0.
The probability of a disjunction: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Exercise: derive the rule of negation from P(a ∨ ¬a).
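
The negation exercise can be worked directly from the disjunction axiom. A short derivation, assuming only the axioms listed on this slide (a ∨ ¬a is valid, so it has probability 1; a ∧ ¬a is unsatisfiable, so it has probability 0):

```latex
\begin{align*}
P(a \lor \lnot a) &= P(a) + P(\lnot a) - P(a \land \lnot a) \\
1 &= P(a) + P(\lnot a) - 0 \\
P(\lnot a) &= 1 - P(a)
\end{align*}
```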

7 The joint probability distribution
The joint distribution completely specifies one’s probability assignments to all propositions in the domain.
A probabilistic model consists of a set of random variables (X1, …, Xn).
An atomic event is an assignment of particular values to all the variables.
Marginalization rule for random variables Y and Z: P(Y) = Σ_z P(Y, z), summing over all values z of Z.
Let’s see an example next.

8 Joint probability
An example with two Boolean variables, Cavity and Toothache.
Observation: the atomic events are mutually exclusive and collectively exhaustive.
What are:
P(Cavity) =
P(Cavity ∨ Toothache) =
P(Cavity | Toothache) =
P(Cavity ∧ Toothache) =
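
The four quantities asked for on this slide can all be read off a joint table by summing atomic events. The sketch below shows one way to do that in Python. The four joint probabilities are illustrative assumptions, since the slide's own table is not preserved in this transcript, and the helper `prob` is a hypothetical name.

```python
# Joint distribution over two Boolean variables, keyed by (cavity, toothache).
# ASSUMPTION: these four numbers are illustrative only; the slide's table
# is not available in the transcript.
joint = {
    (True, True): 0.04,
    (True, False): 0.06,
    (False, True): 0.01,
    (False, False): 0.89,
}

def prob(event):
    """Sum the joint entries for which event(cavity, toothache) is true."""
    return sum(p for (c, t), p in joint.items() if event(c, t))

# Atomic events are mutually exclusive and collectively exhaustive,
# so their probabilities must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9

p_cavity = prob(lambda c, t: c)                       # marginalize out Toothache
p_cavity_or_toothache = prob(lambda c, t: c or t)     # disjunction
p_cavity_and_toothache = prob(lambda c, t: c and t)   # conjunction
p_toothache = prob(lambda c, t: t)

# Conditional probability from its definition: P(A|B) = P(A ^ B) / P(B).
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache

print(round(p_cavity, 4))
print(round(p_cavity_or_toothache, 4))
print(round(p_cavity_given_toothache, 4))
print(round(p_cavity_and_toothache, 4))
```

With these assumed numbers, the disjunction can also be checked against the axiom P(A ∨ B) = P(A) + P(B) - P(A ∧ B).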

9 Bayes’ rule
Deriving the rule via the product rule: P(B|A) = P(A|B) P(B) / P(A)
P(A) can be viewed as a normalization factor that makes P(B|A) + P(¬B|A) = 1:
P(A) = P(A|B) P(B) + P(A|¬B) P(¬B)
A more general case, for random variables: P(X|Y) = P(Y|X) P(X) / P(Y)
Bayes’ rule conditionalized on evidence E: P(X|Y,E) = P(Y|X,E) P(X|E) / P(Y|E)
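
The derivation mentioned on this slide comes from writing the product rule for the same conjunction in both orders. A sketch, assuming P(A) > 0:

```latex
\begin{align*}
P(A \land B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
P(B \mid A) &= \frac{P(A \mid B)\,P(B)}{P(A)} \\
P(A) &= P(A \land B) + P(A \land \lnot B)
      = P(A \mid B)\,P(B) + P(A \mid \lnot B)\,P(\lnot B)
\end{align*}
```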

10 Applying Bayes’ rule
The disease meningitis (m) causes a stiff neck (s) 50% of the time: P(s|m) = 0.5
A patient has m with probability 1/50,000: P(m) = 1/50,000
A patient has s with probability 1/20: P(s) = 1/20
Question: what is the probability of having m given that one has s, i.e. P(m|s)?
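
The question can be answered by plugging the three given numbers into Bayes' rule. A minimal sketch using exact fractions (the variable names are chosen here for illustration):

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)   # P(s|m): meningitis causes a stiff neck 50% of the time
p_m = Fraction(1, 50000)       # P(m): prior probability of meningitis
p_s = Fraction(1, 20)          # P(s): prior probability of a stiff neck

# Bayes' rule: P(m|s) = P(s|m) P(m) / P(s)
p_m_given_s = p_s_given_m * p_m / p_s

print(p_m_given_s)          # 1/5000
print(float(p_m_given_s))   # 0.0002
```

Even with a stiff neck, meningitis remains very unlikely, because its prior is so much smaller than the prior of a stiff neck.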

11 Independence
Independent events A, B: P(B|A) = P(B), P(A|B) = P(A), P(A,B) = P(A) P(B)
Conditional independence: P(X|Y,Z) = P(X|Z), i.e. given Z, X and Y are independent.
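
The product form P(A,B) = P(A) P(B) gives a direct numerical test for independence on a joint table. The sketch below is one way to write that test. Both joint tables and the function name `is_independent` are hypothetical and chosen only for illustration.

```python
from itertools import product

# ASSUMPTION: hypothetical marginals for two Boolean events A and B.
p_a = {True: 0.3, False: 0.7}
p_b = {True: 0.6, False: 0.4}

# Build a joint in which A and B are independent by construction.
independent = {(a, b): p_a[a] * p_b[b]
               for a, b in product([True, False], repeat=2)}

# A joint in which the two events are strongly coupled (also hypothetical).
dependent = {(True, True): 0.4, (True, False): 0.1,
             (False, True): 0.1, (False, False): 0.4}

def is_independent(joint, tol=1e-9):
    """Check P(A=a, B=b) == P(A=a) P(B=b) for every pair of values."""
    marg_a = {a: sum(p for (x, _), p in joint.items() if x == a)
              for a in (True, False)}
    marg_b = {b: sum(p for (_, y), p in joint.items() if y == b)
              for b in (True, False)}
    return all(abs(joint[a, b] - marg_a[a] * marg_b[b]) <= tol
               for (a, b) in joint)

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```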

12 Entropy
Entropy measures the homogeneity (purity) of a set of examples.
Or, as information content: the less you know about the outcome beforehand, the more information you gain by observing it.
With two classes (P, N) in S, with p and n instances respectively, let t = p + n. View [p, n] as the class distribution of S.
Entropy(S) = - (p/t) log2(p/t) - (n/t) log2(n/t)
E.g., p = 9, n = 5: Entropy(S) = Entropy([9,5]) = - (9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
E.g., Entropy([14,0]) = 0 (taking 0 log2 0 = 0); Entropy([7,7]) = 1
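
The three example values on this slide can be reproduced with a few lines of Python. This is a minimal sketch of the formula as stated. The function name `entropy` and the convention of skipping zero counts (treating 0 log2 0 as 0) are choices made here.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    # Zero counts are skipped: x * log2(x) tends to 0 as x -> 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94
print(entropy([14, 0]))            # 0.0
print(entropy([7, 7]))             # 1.0
```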

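13 Entropy curve
For p/(p+n) between 0 and 1, the 2-class entropy is:
0 when p/(p+n) is 0
1 when p/(p+n) is 0.5
0 when p/(p+n) is 1
monotonically increasing between 0 and 0.5
monotonically decreasing between 0.5 and 1
When the data is pure, the class of every example is already known, so no bits are needed to send it; at p/(p+n) = 0.5, 1 bit per example is needed.
[Figure: entropy curve over p/(p+n), rising from 0 to its maximum of 1 at 0.5 and falling back to 0 at 1]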
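
The shape of the curve can be checked by sampling the two-class entropy at a few values of q = p/(p+n). The sketch below prints a rough text plot. The function name `binary_entropy` and the sampling grid are assumptions made for illustration.

```python
import math

def binary_entropy(q):
    """Two-class entropy in bits, where q = p/(p+n) is the fraction of positives."""
    if q in (0.0, 1.0):
        return 0.0  # pure set: by convention 0 * log2(0) = 0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# Sample q at 0.0, 0.1, ..., 1.0.
qs = [i / 10 for i in range(11)]
values = [binary_entropy(q) for q in qs]

# Crude text plot: one row per sample, bar length proportional to entropy.
for q, h in zip(qs, values):
    print(f"{q:.1f}  {h:.3f}  {'#' * round(h * 40)}")
```

The printed bars grow up to q = 0.5 and shrink afterwards, matching the properties listed on the slide.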

