Probability and Information
A brief review (Chapter 13)
Probability
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance - how wonderful it is!
Probability as degree of belief in the truth of a sentence: 1 - true, 0 - false, 0 < P < 1 - intermediate degrees of belief in the truth of the sentence.
Degree of truth (fuzzy logic) vs. degree of belief (probability).
All probability statements must indicate the evidence with respect to which the probability is being assessed.
Before the evidence is obtained: prior or unconditional probability.
After the evidence is obtained: posterior or conditional probability.
Basic probability notation
Prior probability
Proposition: P(Sunny)
Random variable: P(Weather = Sunny)
Each random variable has a domain, e.g., Sunny, Cloudy, Rain, Snow for Weather
Probability distribution: P(Weather) = a vector of probabilities, one for each value in the domain
A random variable is not a number; a number can be obtained by observing a random variable.
A random variable can be continuous or discrete.
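As a minimal sketch of the notation (the numbers below are made up for illustration, not taken from the slides), a discrete distribution such as P(Weather) can be represented in Python as a mapping from each value in the domain to its probability, with the entries summing to 1:

    # Hypothetical distribution over the Weather domain
    weather_dist = {"Sunny": 0.7, "Cloudy": 0.2, "Rain": 0.08, "Snow": 0.02}
    assert abs(sum(weather_dist.values()) - 1.0) < 1e-9   # probabilities must sum to 1
    print(weather_dist["Sunny"])                           # P(Weather = Sunny)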
Conditional Probability
Definition: P(A|B) = P(A^B) / P(B)
Product rule: P(A^B) = P(A|B) P(B)
Probabilistic inference does not work like logical inference.
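A small sketch of the definition in Python, using a made-up joint distribution over two Boolean propositions A and B: P(A|B) is the probability mass where both hold, divided by the mass where B holds.

    # Hypothetical joint distribution over Boolean propositions A and B
    joint = {(True, True): 0.12, (True, False): 0.18,
             (False, True): 0.28, (False, False): 0.42}
    p_b = sum(p for (a, b), p in joint.items() if b)   # P(B) = 0.40
    p_a_and_b = joint[(True, True)]                    # P(A^B) = 0.12
    p_a_given_b = p_a_and_b / p_b                      # P(A|B) = P(A^B)/P(B)
    print(p_a_given_b)                                 # 0.3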
The axioms of probability
All probabilities are between 0 and 1.
Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0.
The probability of a disjunction: P(AvB) = P(A) + P(B) - P(A^B)
Ex: deriving the rule of negation from P(a v !a), as worked out below.
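Filling in the derivation the example calls for: a v !a is valid and a ^ !a is unsatisfiable, so the axioms give

    1 = P(a v !a) = P(a) + P(!a) - P(a ^ !a) = P(a) + P(!a)

and therefore P(!a) = 1 - P(a).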
The joint probability distribution
The joint distribution completely specifies one's probability assignments to all propositions in the domain.
A probabilistic model consists of a set of random variables (X1, ..., Xn); an atomic event is an assignment of particular values to all the variables.
Marginalization rule for random variables Y and Z: P(Y) = Σ_z P(Y, z), summing over all values z of Z.
Let's see an example next.
Joint Probability
An example with two Boolean variables, Cavity and Toothache.
Observations: the atomic events are mutually exclusive and collectively exhaustive.
What are
P(Cavity) = ?
P(Cavity v Toothache) = ?
P(Cavity | Toothache) = ?
P(Cavity ^ Toothache) = ?
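A sketch with stand-in numbers (hypothetical, not the slide's own table) showing how each query is answered from a full joint over Cavity and Toothache: marginalize for P(Cavity), sum the relevant cells for the disjunction, and divide for the conditional.

    # Hypothetical joint table (stand-in values)
    joint = {("cavity", "toothache"): 0.04, ("cavity", "no_toothache"): 0.06,
             ("no_cavity", "toothache"): 0.01, ("no_cavity", "no_toothache"): 0.89}

    p_cavity = sum(p for (c, t), p in joint.items() if c == "cavity")              # 0.10
    p_cavity_or_tooth = sum(p for (c, t), p in joint.items()
                            if c == "cavity" or t == "toothache")                  # 0.11
    p_cavity_and_tooth = joint[("cavity", "toothache")]                            # 0.04
    p_toothache = sum(p for (c, t), p in joint.items() if t == "toothache")        # 0.05
    p_cavity_given_tooth = p_cavity_and_tooth / p_toothache                        # 0.8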
Bayes' rule
Deriving the rule via the product rule: P(A^B) = P(A|B) P(B) = P(B|A) P(A), hence
P(B|A) = P(A|B) P(B) / P(A)
P(A) can be viewed as a normalization factor that makes P(B|A) + P(!B|A) = 1, where
P(A) = P(A|B) P(B) + P(A|!B) P(!B)
A more general case is P(X|Y) = P(Y|X) P(X) / P(Y)
Bayes' rule conditionalized on evidence E: P(X|Y,E) = P(Y|X,E) P(X|E) / P(Y|E)
Applying Bayes' rule
The disease meningitis (m) causes a stiff neck (s) with a chance of 50%: P(s|m) = 0.5
One has m with probability 1/50,000: P(m) = 1/50,000
One has s with probability 1/20: P(s) = 1/20
Question: what is the probability of having m given that one has s, i.e., P(m|s)?
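A quick sketch of the calculation with the numbers above, applying Bayes' rule directly:

    p_s_given_m = 0.5
    p_m = 1 / 50000
    p_s = 1 / 20
    p_m_given_s = p_s_given_m * p_m / p_s   # Bayes' rule
    print(p_m_given_s)                      # 0.0002, i.e., 1 in 5,000

So even given a stiff neck, meningitis remains very unlikely because its prior is so small.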
Independence
Independent events A, B: P(B|A) = P(B), P(A|B) = P(A), P(A,B) = P(A) P(B)
Conditional independence: P(X|Y,Z) = P(X|Z), i.e., given Z, X and Y are independent.
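A minimal numeric check of independence, reusing the hypothetical joint from the conditional-probability sketch above: A and B are independent exactly when the joint entry equals the product of the marginals.

    joint = {(True, True): 0.12, (True, False): 0.18,
             (False, True): 0.28, (False, False): 0.42}
    p_a = sum(p for (a, b), p in joint.items() if a)      # P(A) = 0.30
    p_b = sum(p for (a, b), p in joint.items() if b)      # P(B) = 0.40
    # Independence test: P(A,B) == P(A) * P(B)
    print(abs(joint[(True, True)] - p_a * p_b) < 1e-12)   # True for these numbers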
Entropy
Entropy measures the homogeneity/purity of a set of examples.
Equivalently, it measures information content: the less you already know, the more information an answer gives you.
With two classes (P, N) in S, with p and n instances respectively, let t = p + n. View [p, n] as the class distribution of S.
Entropy(S) = - (p/t) log2(p/t) - (n/t) log2(n/t)
E.g., p = 9, n = 5: Entropy(S) = Entropy([9,5]) = - (9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940
E.g., Entropy([14,0]) = 0; Entropy([7,7]) = 1
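A short sketch of the two-class entropy formula in Python (the function name is my own; the 0 * log2(0) term is treated as 0):

    import math

    def entropy2(p, n):
        """Entropy of a set with p instances of one class and n of the other."""
        t = p + n
        total = 0.0
        for count in (p, n):
            if count:                        # skip empty classes: 0 * log2(0) -> 0
                frac = count / t
                total -= frac * math.log2(frac)
        return total

    print(entropy2(9, 5))    # ~0.940
    print(entropy2(14, 0))   # 0.0
    print(entropy2(7, 7))    # 1.0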
Entropy curve
For p/(p+n) between 0 and 1, the 2-class entropy is
0 when p/(p+n) is 0
1 when p/(p+n) is 0.5
0 when p/(p+n) is 1
monotonically increasing between 0 and 0.5, and monotonically decreasing between 0.5 and 1.
When the data is pure, no bits need to be sent to report an example's class; at a 50/50 split, a full bit per example is needed.
[Figure: entropy curve rising from 0 at p/(p+n) = 0 to 1 at p/(p+n) = 0.5, then falling back to 0 at p/(p+n) = 1]