1 Intro to Probability Zhi Wei
2 Outline Basic concepts in probability theory. Random variable and probability distribution. Bayes' rule.
3 Introduction Probability is the study of randomness and uncertainty. In the early days, probability was associated with games of chance (gambling).
4 Simple Games Involving Probability Game: A fair die is rolled. If the result is 2, 3, or 4, you win $1; if it is 5, you win $2; but if it is 1 or 6, you lose $3. Should you play this game?
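A quick way to decide is to compute the expected winnings per play; a minimal R sketch (the payoff vector simply encodes the rules above):
payoff <- c(-3, 1, 1, 1, 2, -3)             # payoffs for die faces 1..6
mean(payoff)                                # expected winnings per play: -1/6, about -$0.17
mean(sample(payoff, 1e5, replace = TRUE))   # Monte Carlo check
Since the expected winnings are negative, you should not play this game.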
5 Random Experiment A random experiment is a process whose outcome is uncertain. Examples: tossing a coin once or several times; picking a card or cards from a deck; measuring the temperature of patients; ...
6 Events & Sample Spaces Sample Space: the sample space is the set of all possible outcomes. Simple Events: the individual outcomes are called simple events. Event: an event is any subset of the whole sample space.
7 Example Experiment: Toss a coin 3 times. Sample space = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Examples of events include A = {at least two heads} = {HHH, HHT, HTH, THH} and B = {exactly two tails} = {HTT, THT, TTH}.
8 Basic Concepts (from Set Theory) A ∪ B, the union of two events A and B, is the event consisting of all outcomes that are either in A or in B or in both events. A ∩ B (also written AB), the intersection of two events A and B, is the event consisting of all outcomes that are in both events. A^c, the complement of an event A, is the set of all outcomes in Ω that are not in A. A - B, the set difference, is the event consisting of all outcomes that are in A but not in B. When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.
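These operations have direct counterparts in R for events represented as vectors of outcomes; a small sketch using the three-coin-toss events from the previous slide:
A <- c("HHH", "HHT", "HTH", "THH")   # at least two heads
B <- c("HTT", "THT", "TTH")          # exactly two tails
union(A, B)       # A ∪ B
intersect(A, B)   # A ∩ B: empty, so A and B are disjoint
setdiff(A, B)     # A - B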
9 Example Experiment: toss a coin 10 times and the number of heads is observed. Let A = {0, 2, 4, 6, 8, 10}, B = {1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}. A ∪ B = {0, 1, …, 10} = Ω. A ∩ B contains no outcomes, so A and B are mutually exclusive. C^c = {6, 7, 8, 9, 10}, A ∩ C = {0, 2, 4}, A - C = {6, 8, 10}.
10 Rules Commutative Laws: A ∪ B = B ∪ A, A ∩ B = B ∩ A. Associative Laws: (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C). Distributive Laws: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C), (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C). DeMorgan's Laws: (A ∪ B)^c = A^c ∩ B^c, (A ∩ B)^c = A^c ∪ B^c.
11 Venn Diagram (figure: two overlapping sets A and B with their intersection A ∩ B)
12 Probability A probability is a number assigned to each subset (event) of a sample space Ω. Probability assignments satisfy the following rules:
13 Axioms of Probability For any event A, 0 ≤ P(A) ≤ 1. P(Ω) = 1. If A_1, A_2, …, A_n is a partition of A, then P(A) = P(A_1) + P(A_2) + … + P(A_n). (A_1, A_2, …, A_n is called a partition of A if A_1 ∪ A_2 ∪ … ∪ A_n = A and A_1, A_2, …, A_n are mutually exclusive.)
14 Properties of Probability For any event A, P(A^c) = 1 - P(A). P(A - B) = P(A) - P(A ∩ B). If B ⊆ A, then P(A - B) = P(A) - P(B). For any two events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For three events A, B, and C, P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(A ∩ C) - P(B ∩ C) + P(A ∩ B ∩ C).
15 Example In a certain population, 10% of the people are rich, 5% are famous, and 3% are both rich and famous. A person is randomly selected from this population. What is the chance that the person is not rich? rich but not famous? either rich or famous?
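Worked answers, using the properties above: P(not rich) = 1 - 0.10 = 0.90; P(rich but not famous) = P(rich) - P(rich and famous) = 0.10 - 0.03 = 0.07; P(rich or famous) = 0.10 + 0.05 - 0.03 = 0.12.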
16 Joint Probability For events A and B, joint probability Pr(AB) stands for the probability that both events happen. Example: A={HT}, B={HT, TH}, what is the joint probability Pr(AB)?
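Assuming the experiment here is two tosses of a fair coin (so Ω = {HH, HT, TH, TT}), A ∩ B = {HT}, and Pr(AB) = Pr({HT}) = 1/4.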
17 Independence Two events A and B are independent if Pr(AB) = Pr(A)Pr(B).
18-19 Independence Two events A and B are independent if Pr(AB) = Pr(A)Pr(B).
Example 1: Drug test. (table of success/failure counts for women and men omitted) A = {patient is a woman}, B = {drug fails}. Are A and B independent?
Example 2: Toss a coin 3 times; Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. A = {having both T and H}, B = {at most one T}. Are A and B independent? How about tossing 2 times? (See the R check below.)
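A quick R check of Example 2 by enumerating the 8 equally likely outcomes (a sketch, not part of the original slides):
outcomes <- c("HHH","HHT","HTH","HTT","THH","THT","TTH","TTT")  # 8 equally likely outcomes
nT <- nchar(gsub("[^T]", "", outcomes))   # number of tails in each outcome
A <- nT >= 1 & nT <= 2                    # event A: both T and H appear
B <- nT <= 1                              # event B: at most one T
mean(A & B)                               # Pr(AB) = 3/8
mean(A) * mean(B)                         # Pr(A)Pr(B) = (6/8)(4/8) = 3/8, so A and B are independent
With 2 tosses, the same check gives Pr(AB) = 1/2 but Pr(A)Pr(B) = (2/4)(3/4) = 3/8, so A and B are not independent.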
20 Independence If A and B are independent, then so are A and B^c, A^c and B, and A^c and B^c. Consider the experiment of tossing a coin twice. Example I: A = {HT, HH}, B = {HT}. Is event A independent of event B? Example II: A = {HT}, B = {TH}. Is event A independent of event B? Disjoint ≠ independent. If A is independent of B, and B is independent of C, will A be independent of C?
21-23 Conditioning If A and B are events with Pr(A) > 0, the conditional probability of B given A is Pr(B|A) = Pr(AB) / Pr(A).
Example: Drug test. (table of success/failure counts for women and men omitted) A = {patient is a woman}, B = {drug fails}. Pr(B|A) = ? Pr(A|B) = ?
If A is independent of B, what is the relationship between Pr(A|B) and Pr(A)? (They are equal: conditioning on B gives no information about A.)
24 Which Drug is Better?
25 Simpson's Paradox: View I (table of success/failure counts under Drug I and Drug II omitted) A = {using Drug I}, B = {using Drug II}, C = {drug succeeds}. Pr(C|A) ~ 10%, Pr(C|B) ~ 50%. Drug II appears better than Drug I.
26-28 Simpson's Paradox: View II With the same events A = {using Drug I}, B = {using Drug II}, C = {drug succeeds}:
Female patients: Pr(C|A) ~ 10%, Pr(C|B) ~ 5%.
Male patients: Pr(C|A) ~ 100%, Pr(C|B) ~ 50%.
Within each group, Drug I is better than Drug II.
29 Conditional Independence Events A and B are conditionally independent given C if Pr(AB|C) = Pr(A|C)Pr(B|C). A set of events {A_i} is conditionally independent given C if Pr(A_1 A_2 … A_n | C) = Pr(A_1|C) Pr(A_2|C) … Pr(A_n|C).
30 Conditional Independence (cont'd) Example: There are three events A, B, C with Pr(A) = Pr(B) = Pr(C) = 1/5, Pr(AC) = Pr(BC) = 1/25, Pr(AB) = 1/10, Pr(ABC) = 1/125. Are A and B independent? Are A and B conditionally independent given C? Since Pr(AB) = 1/10 but Pr(A)Pr(B) = 1/25, A and B are not independent. However, Pr(AB|C) = Pr(ABC)/Pr(C) = (1/125)/(1/5) = 1/25, and Pr(A|C)Pr(B|C) = (1/5)(1/5) = 1/25, so A and B are conditionally independent given C. Independence and conditional independence do not imply each other.
31 Outline Basic concepts in probability theory. Random variable and probability distribution. Bayes' rule.
32 Random Variable and Distribution A random variable X is a numerical outcome of a random experiment. The distribution of a random variable is the collection of possible outcomes along with their probabilities: in the categorical (discrete) case, the probabilities Pr(X = x) for each possible value x; in the numerical (continuous) case, a density function f(x) (see next slide).
33 Random Variables: Distributions Cumulative distribution function (CDF): F(x) = Pr(X ≤ x). Probability mass function (PMF, discrete case): p(x) = Pr(X = x). Probability density function (PDF, continuous case): f(x), with Pr(a ≤ X ≤ b) given by the integral of f(x) over [a, b].
34 Random Variable: Example Let S be the set of all sequences of three rolls of a die. Let X be the sum of the number of dots on the three rolls. What are the possible values for X? Pr(X = 5) = ?, Pr(X = 10) = ?
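The possible values for X are 3 through 18, and both probabilities can be found by enumerating all 6^3 = 216 equally likely outcomes; a minimal R sketch:
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)  # all 216 outcomes
X <- rowSums(rolls)                                 # sum of dots on the three rolls
mean(X == 5)    # Pr(X = 5)  = 6/216  = 1/36
mean(X == 10)   # Pr(X = 10) = 27/216 = 1/8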
35 Expectation value Division of the stakes problem: Henry and Tony play a game. They toss a fair coin; if it comes up heads, Henry wins the round; tails, Tony wins. They contribute equally to a prize pot of $100 and agree in advance that the first player to win 3 rounds will collect the entire prize. However, the game is interrupted for some reason after 3 rounds, with 2 heads and 1 tail so far. How should they divide the pot fairly? a) It seems unfair to divide the pot equally, since Henry has won 2 out of 3 rounds. Then how about Henry getting 2/3 of the $100? b) Other thoughts? Let X be what Henry would win if the game were not interrupted; X takes the value 0 or 100, with probabilities to be determined.
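A fair split is the expected value of X. Henry loses the pot only if the next two tosses are both tails, so Pr(X = 0) = 1/4 and Pr(X = 100) = 3/4, giving E[X] = 0 × (1/4) + 100 × (3/4) = $75 for Henry and $25 for Tony.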
36 Expectation Definition: the expectation of a random variable X is E[X] = Σ x·Pr(X = x) in the discrete case, and E[X] = ∫ x·f(x) dx in the continuous case. Properties: Summation: for any n ≥ 1 and any constants k_1, …, k_n, E[k_1·X_1 + … + k_n·X_n] = k_1·E[X_1] + … + k_n·E[X_n]. Product: if X_1, X_2, …, X_n are independent, E[X_1·X_2·…·X_n] = E[X_1]·E[X_2]·…·E[X_n].
37 Expectation: Example Let S be the set of all sequences of three rolls of a die. Let X be the sum of the number of dots on the three rolls. What is E(X)? Now let X be the product of the number of dots on the three rolls. What is E(X)?
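Both answers follow from the properties on the previous slide: by linearity E[sum] = 3 × 3.5 = 10.5, and by independence E[product] = 3.5^3 = 42.875. An R check by enumeration:
rolls <- expand.grid(d1 = 1:6, d2 = 1:6, d3 = 1:6)   # all 216 outcomes
mean(rowSums(rolls))                  # E[sum] = 10.5
mean(rolls$d1 * rolls$d2 * rolls$d3)  # E[product] = 42.875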
38 Variance Definition: the variance of a random variable X is the expectation of (X - E[X])^2: Var(X) = E[(X - E[X])^2]. Properties: for any constant C, Var(CX) = C^2·Var(X). If X_1, X_2, …, X_n are independent, Var(X_1 + … + X_n) = Var(X_1) + … + Var(X_n).
39 Bernoulli Distribution The outcome of an experiment can be either success (i.e., 1) or failure (i.e., 0). Pr(X = 1) = p, Pr(X = 0) = 1 - p; equivalently, p(x) = p^x (1 - p)^(1 - x) for x in {0, 1}. E[X] = p, Var(X) = p(1 - p). Using sample() to generate Bernoulli samples:
> n = 10; p = 1/4
> sample(0:1, size = n, replace = TRUE, prob = c(1 - p, p))
(prints a random sequence of ten 0s and 1s)
40 Binomial Distribution n draws from a Bernoulli distribution: X_i ~ Bernoulli(p), X = X_1 + … + X_n, X ~ Bin(n, p). The random variable X stands for the number of successful experiments: n = the number of trials, x = the number of successes, p = the probability of success. Pr(X = x) = C(n, x) p^x (1 - p)^(n - x). E[X] = np, Var(X) = np(1 - p).
41 The binomial distribution in R dbinom(x, size, prob) Try 7 times, equally likely to succeed or fail:
> dbinom(3, 7, 0.5)
[1] 0.2734375
> barplot(dbinom(0:7, 7, 0.5), names.arg = 0:7)
42 What if p ≠ 0.5?
> barplot(dbinom(0:7, 7, 0.1), names.arg = 0:7)
43 Which distribution has greater variance? var = n·p·(1 - p): for p = 0.5, var = 7 × 0.5 × 0.5 = 7 × 0.25 = 1.75; for p = 0.1, var = 7 × 0.1 × 0.9 = 7 × 0.09 = 0.63. So the p = 0.5 distribution has the greater variance.
44 Briefly comparing an experiment to a distribution
nExpr = 1000
tosses = 7
y = rep(0, nExpr)
for (i in 1:nExpr) {
  x = sample(c("H", "T"), tosses, replace = T)   # toss 7 coins
  y[i] = sum(x == "H")                           # count the heads
}
hist(y, breaks = -0.5:7.5)                       # result of 1000 trials
lines(0:7, dbinom(0:7, 7, 0.5) * nExpr)          # theoretical distribution
points(0:7, dbinom(0:7, 7, 0.5) * nExpr)
(figure: histogram of y with the theoretical distribution overlaid)
45 Cumulative distribution P(X = x) vs P(X ≤ x):
> barplot(dbinom(0:7, 7, 0.5), names.arg = 0:7)
> barplot(pbinom(0:7, 7, 0.5), names.arg = 0:7)
46 Cumulative distribution (figure: the probability distribution P(X = x) next to the cumulative distribution P(X ≤ x))
47 Example: surfers on a website Your site has a lot of visitors, 45% of whom are female. You've created a new section on gardening. Out of the first 100 visitors, 55 are female. What is the probability that this many or more of the visitors are female? P(X ≥ 55) = 1 - P(X ≤ 54) = 1 - pbinom(54, 100, 0.45)
48 Another way to calculate cumulative probabilities ?pbinom
P(X ≤ x) = pbinom(x, size, prob, lower.tail = T)
P(X > x) = pbinom(x, size, prob, lower.tail = F)
> 1 - pbinom(54, 100, 0.45)
> pbinom(54, 100, 0.45, lower.tail = F)
(both return the same value, just under 3%)
49 Female surfers visiting a section of a website (figure: binomial probabilities for n = 100, p = 0.45; what is the area under the curve for X ≥ 55?)
50 Cumulative distribution The upper tail P(X ≥ 55) = 1 - pbinom(54, 100, 0.45) is less than 3%.
51 Plots of Binomial Distribution
52 Another discrete distribution: hypergeometric Randomly draw n elements without replacement from a set of N elements, r of which are S's (successes) and (N - r) of which are F's (failures). The hypergeometric random variable x is the number of S's in the draw of n elements: Pr(x) = C(r, x)·C(N - r, n - x) / C(N, n).
53 Hypergeometric example: fortune cookies There are N = 20 fortune cookies; r = 18 have a fortune and N - r = 2 are empty. What is the probability that out of n = 5 cookies, x = 5 have a fortune (that is, we don't notice that some cookies are empty)?
> dhyper(5, 18, 2, 5)
[1] 0.5526316
So there is a greater than 50% chance that we won't notice. (The same calculation underlies Gene Set Enrichment Analysis.)
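The same value can be read off the hypergeometric formula directly; a one-line check in R:
choose(18, 5) * choose(2, 0) / choose(20, 5)   # = 21/38 ~ 0.553, same as dhyper(5, 18, 2, 5)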
54 Hypergeometric and binomial When the population N is (very) big, whether one samples with or without replacement is pretty much the same. (figure: number of full cookies out of 5 drawn from 100 cookies, 10 of which are empty; binomial vs hypergeometric probabilities)
55 Code aside
> x = 1:5
> y1 = dhyper(1:5, 90, 10, 5)   # hypergeometric probability
> y2 = dbinom(1:5, 5, 0.9)      # binomial probability
> tmp = as.matrix(t(cbind(y1, y2)))
> barplot(tmp, beside = T, names.arg = x)
56 Poisson distribution The number of events in a given interval, e.g. the number of light bulbs burning out in a building in a year, or the number of people arriving in a queue per minute. λ = mean number of events in a given interval: Pr(X = x) = e^(-λ)·λ^x / x!. E[X] = λ, Var(X) = λ.
57 Example: Poisson distribution You got a box of 1,000 widgets. The manufacturer says that the failure rate is 5 per box on average. Your box contains 10 defective widgets. What are the odds?
> ppois(9, 5, lower.tail = F)
[1] 0.03183
Only about 3%: maybe the manufacturer is not quite honest, or the distribution is not Poisson?
58 Poisson approximation to binomial If n is large (e.g. > 100) and n·p is moderate (p should be small, e.g. n·p < 10), the Poisson with λ = n·p is a good approximation to the binomial. (figure: binomial vs Poisson probabilities)
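A quick numerical look at how close the two are (a sketch with n = 1000 and p = 0.005, so λ = 5):
n <- 1000; p <- 0.005; lambda <- n * p
round(dbinom(0:10, n, p), 4)    # exact binomial probabilities
round(dpois(0:10, lambda), 4)   # Poisson approximation: nearly identical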
59 Plots of Poisson Distribution
60 Normal (Gaussian) Distribution The normal distribution (aka the "bell curve") fits many biological data well, e.g. height and weight. It serves as an approximation to the binomial, hypergeometric, and Poisson distributions because of the Central Limit Theorem, and it is well studied.
61 Normal (Gaussian) Distribution X ~ N(μ, σ²): E[X] = μ, Var(X) = σ². If X_1 ~ N(μ_1, σ_1²), X_2 ~ N(μ_2, σ_2²), and X_1, X_2 are independent, what are the distributions of X = X_1 + X_2 and X = X_1 - X_2?
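The standard answers are X_1 + X_2 ~ N(μ_1 + μ_2, σ_1² + σ_2²) and X_1 - X_2 ~ N(μ_1 - μ_2, σ_1² + σ_2²). A quick simulation check in R (the parameter values are arbitrary):
x1 <- rnorm(1e5, mean = 1, sd = 2)    # X1 ~ N(1, 4)
x2 <- rnorm(1e5, mean = 3, sd = 1)    # X2 ~ N(3, 1)
c(mean(x1 + x2), var(x1 + x2))        # approximately (4, 5)
c(mean(x1 - x2), var(x1 - x2))        # approximately (-2, 5)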
62 Sampling from a normal distribution
x <- rnorm(1000)
h <- hist(x, plot = F)
ylim <- range(0, h$density, dnorm(0))
hist(x, freq = F, ylim = ylim)
curve(dnorm(x), add = T)
63 Normal Approximation based on the Central Limit Theorem Central Limit Theorem: if the x_i are i.i.d. with mean μ and variance σ², then when n is large, (x_1 + … + x_n)/n ~ N(μ, σ²/n), or (x_1 + … + x_n) ~ N(nμ, nσ²). Example: A population is evenly divided on an issue (p = 0.5). For a random sample of size 1000, what is the probability of having ≥ 550 in favor of it? n = 1000, x_i ~ Bernoulli(p = 0.5), i.e. E(x_i) = p, V(x_i) = p(1 - p). Exactly, (x_1 + … + x_n) ~ Binomial(n = 1000, p = 0.5), so Pr((x_1 + … + x_n) ≥ 550) = 1 - pbinom(549, 1000, 0.5). Normal approximation: (x_1 + … + x_n) ~ N(np, np(1 - p)) = N(500, 250), so Pr((x_1 + … + x_n) ≥ 550) ≈ 1 - pnorm(550, mean = 500, sd = sqrt(250)).
64 d, p, q, and r functions in R In R, a set of functions has been implemented for almost every known distribution: r<dist>(n, ...) draws n random samples; d<dist>(x, ...) gives the density at x; p<dist>(x, ...) gives the cumulative distribution function up to x; q<dist>(p, ...) gives the inverse CDF (quantile function). Possible distributions: binom, pois, hyper, norm, beta, chisq, f, gamma, t, unif, etc. You can find other characteristics of the distributions there as well.
65 Example: Uniform Distribution The uniform distribution on [a, b] has two parameters. The family name is unif; in R, the parameters are named min and max.
> dunif(x = 1, min = 0, max = 3)
[1] 0.3333333
> punif(q = 2, min = 0, max = 3)
[1] 0.6666667
> qunif(p = 0.5, min = 0, max = 3)
[1] 1.5
> runif(n = 5, min = 0, max = 3)
(prints five random values in [0, 3])
66 Lab Exercise Using R for Introductory Statistics. Page 39: 2.4; Page 54: 2.16, 2.23, 2.35, 2.26; Page 66: 2.30, 2.32, , 2.39, 2.41, 2.42,
67 Outline Basic concepts in probability theory. Random variable and probability distribution. Bayes' rule.
68 Bayes' Rule Given two events A and B, suppose that Pr(A) > 0. Then Pr(B|A) = Pr(A|B)·Pr(B) / Pr(A). Example: R: it is a rainy day; W: the grass is wet. Pr(R) = 0.8. (table of Pr(W|R) values omitted) Pr(R|W) = ?
69 Bayes' Rule R: it rains; W: the grass is wet. (figure: R pointing to W) Information: Pr(W|R). Inference: Pr(R|W).
70 Bayes' Rule More generally, for a hypothesis H and evidence E (here H: the weather is rainy; E: the grass is wet): the information is Pr(E|H), the inference is Pr(H|E). Pr(H|E) = Pr(E|H)·Pr(H) / Pr(E), i.e. posterior ∝ likelihood × prior.
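A small numeric illustration in R. The conditional-probability table on the slide is not recoverable, so the two Pr(W|·) values below are hypothetical; only Pr(R) = 0.8 comes from the slide:
p_R    <- 0.8    # prior Pr(R), from the slide
p_W_R  <- 0.9    # Pr(W | R): hypothetical value
p_W_nR <- 0.2    # Pr(W | not R): hypothetical value
p_W <- p_W_R * p_R + p_W_nR * (1 - p_R)   # total probability: Pr(W) = 0.76
p_W_R * p_R / p_W                         # Bayes' rule: Pr(R | W) ~ 0.947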
71-73 Summation (Integration) out tip Suppose that B_1, B_2, …, B_k form a partition of Ω: B_1 ∪ B_2 ∪ … ∪ B_k = Ω and B_1, B_2, …, B_k are mutually exclusive. Suppose that Pr(B_i) > 0 and Pr(A) > 0. Then Pr(A) = Pr(A|B_1)·Pr(B_1) + … + Pr(A|B_k)·Pr(B_k) (summing, or integrating, out the B_i), and by Bayes' rule Pr(B_i|A) = Pr(A|B_i)·Pr(B_i) / [Pr(A|B_1)·Pr(B_1) + … + Pr(A|B_k)·Pr(B_k)]. Key: the joint distribution!
74-76 Application of Bayes' Rule: A More Complicated Example R: it rains; W: the grass is wet; U: people bring umbrellas. (figure: graphical model with R pointing to both W and U; tables of Pr(W|R) and Pr(U|R) values omitted) W and U are assumed conditionally independent given R and given ¬R: Pr(UW|R) = Pr(U|R)·Pr(W|R) and Pr(UW|¬R) = Pr(U|¬R)·Pr(W|¬R). Pr(R) = 0.8. Pr(U|W) = ?
77 Acknowledgments Peter N. Belhumeur: for some of the slides adapted or modified from his lecture slides at Columbia University Rong Jin: for some of the slides adapted or modified from his lecture slides at Michigan State University Jeff Solka: for some of the slides adapted or modified from his lecture slides at George Mason University Brian Healy: for some of the slides adapted or modified from his lecture slides at Harvard University