Published by Kristopher Reed. Modified over 9 years ago.
1
Chapter 10: Probability
Probability of an event, ‘Pr(event)’: the relative frequency with which the event would be observed over an infinite number of repetitions. Pr is always between 0 and 1.
Equally likely model of probability: the probability of any one of n equally likely events occurring is 1/n.
Example: The probability of obtaining heads in a fair coin toss: Pr(heads) = 1/2 = 0.5
2
A probability distribution is a list of probabilities (expected relative frequencies) for all possible outcomes. The probabilities in a probability distribution add up to 1. Probability distributions are often represented as bar graphs.
[Figure: Probability distribution for a ‘fair’ coin; bar graph of Probability (0 to 1) for Heads and Tails.]
3
[Figure: Probability distribution for the roll of one ‘fair’ six-sided die; bar graph of Probability for the outcomes 1 through 6, each at 0.1667.]
4
Two rules of probability
Addition: the probability that any one of several events occurs is the sum of their probabilities, as long as the events are mutually exclusive. Think of this as the ‘or’ rule.
Multiplication: the probability of several events occurring together is the product of their separate probabilities, as long as the events are independent. Think of this as the ‘and’ rule.
5
Example: When drawing from a standard deck of 52 cards, each card has equal probability. So the probability of drawing, say, the Ace of clubs is Pr(Ace of clubs) = 1/52.
Example: What is the probability of drawing any club? There are 13 clubs, and each has a probability of 1/52 of being drawn. Since these outcomes are mutually exclusive, we can use the additive rule: Pr(Ace of clubs or 2 of clubs or … or King of clubs) = Pr(Ace of clubs) + Pr(2 of clubs) + … + Pr(King of clubs) = 1/52 + 1/52 + … + 1/52 = 13/52 = 0.25.
6
Example: What is the probability of drawing a club twice in a row (if we put the card back after each draw)? These two events are independent, so we can use the multiplication rule (any club and any club). The answer is: Pr(club) x Pr(club) = 1/4 x 1/4 = 1/16 = 0.0625. (Notice that we actually used both the additive rule, from the last example, and the multiplicative rule.)
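The two card examples can be checked by brute force. The sketch below is only an illustration: it builds a 52-card deck (the names `RANKS`, `SUITS` and `deck` are made up here), applies the addition rule to get Pr(club), then the multiplication rule for two draws with replacement.

```python
from fractions import Fraction
from itertools import product

# Hypothetical representation of a standard deck: 13 ranks x 4 suits.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

# Equally likely model: every card has probability 1/52.
p_card = Fraction(1, len(deck))

# Addition rule: the 13 clubs are mutually exclusive outcomes.
p_club = sum(p_card for rank, suit in deck if suit == "clubs")

# Multiplication rule: two independent draws (card replaced in between).
p_two_clubs = p_club * p_club

print(p_club, p_two_clubs, float(p_two_clubs))
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared directly with 13/52 and 1/16.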
7
Example: rolling dice
If you roll two dice, what is the probability of obtaining a sum of seven? The table shows all of the 6x6 = 36 possible outcomes. Note that a (2,5) is different from a (5,2). Each outcome has a probability of 1/36. Of these 36, 6 will result in a total of 7. So Pr(sum = 7) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 6/36 = 0.1667
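The table of 36 outcomes can be rebuilt in a few lines. This is a minimal sketch, assuming two fair six-sided dice; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two dice; (2, 5) and (5, 2) are counted separately.
rolls = list(product(range(1, 7), repeat=2))

# Outcomes whose faces add up to seven.
sevens = [roll for roll in rolls if sum(roll) == 7]

# Addition rule over equally likely, mutually exclusive outcomes.
p_seven = Fraction(len(sevens), len(rolls))

print(sevens)
print(p_seven, round(float(p_seven), 4))
```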
8
Example: What is the probability of getting 8 heads in a row?
Answer: Each of the eight coin flips is independent, so the probability of eight heads is ½ x ½ x ½ x ½ x ½ x ½ x ½ x ½ = .0039
Example: What is the probability of getting the following sequence over eight coin flips? H H T H T T H H
Answer: Again each event is independent, and each has a probability of ½, so the answer is the same as for the previous question, Pr = .0039
Example: Suppose that the probability of winning the lottery is one in 2 million. What is the probability of winning the lottery four times?
Answer: (1/2000000) x (1/2000000) x (1/2000000) x (1/2000000) = 1/16,000,000,000,000,000,000,000,000, or one in 16 septillion.
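The multiplication rule for repeated independent events is just a power, which is easy to check exactly. The sketch below assumes, as the slide does, that each lottery draw is independent with a one-in-2-million chance of winning.

```python
from fractions import Fraction

# Eight independent fair coin flips, each with probability 1/2.
p_eight_heads = Fraction(1, 2) ** 8

# Four independent lottery wins at one in 2 million each (slide's assumption).
p_four_wins = Fraction(1, 2_000_000) ** 4

print(p_eight_heads, round(float(p_eight_heads), 4))
print(p_four_wins.denominator)  # the "one in ..." figure
```

The denominator printed on the last line is 16 followed by 24 zeros, which is 16 septillion.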
9
Luckiest woman in the world? In July of 2010, Joan Ginther won the Texas lottery for the fourth time, earning her a total of $20.4 million. The probability of this happening is Pr = 1/18,000,000,000,000,000,000,000,000, or one in 18 septillion. To put this in perspective, there are one septillion stars in the universe, and one septillion grains of sand on Earth. Joan Ginther has a PhD in math from Stanford, specializing in statistics.
10
Example: Back to coin flipping… Suppose we flip a coin three times. What is the probability of getting exactly two heads? A coin flip is an example of a dichotomous observation. The distribution of the probability of outcomes from repeated dichotomous observations is called a binomial probability distribution.
Answer: We need to determine all possible outcomes, and then determine which ones correspond to our ‘event’. There are eight possible outcomes, all equally likely:
HHH HHT HTH HTT THH THT TTH TTT
Three of these correspond to two heads. So the probability of our event is: Pr(two heads) = 1/8 + 1/8 + 1/8 = 3/8 = 0.3750
11
Example: Suppose you have a 5-question true-false test and you guess at every answer. What is the probability of getting 3 or more correct?
Answer: With 5 binary decisions, there are $2^5 = 32$ possible outcomes. The possible outcomes (0 = incorrect, 1 = correct) are:
00000, 00001, 00010, 00011, 00100, 00101, 00110, 00111,
01000, 01001, 01010, 01011, 01100, 01101, 01110, 01111,
10000, 10001, 10010, 10011, 10100, 10101, 10110, 10111,
11000, 11001, 11010, 11011, 11100, 11101, 11110, 11111
16 of these 32 outcomes have 3 or more 1’s, so the answer is 16/32 = 0.5
Yuk. There has to be an easier way!
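Listing the outcomes by hand is tedious, but a short enumeration does the same counting. This sketch assumes each guess is right or wrong with equal probability, so all 32 outcomes are equally likely.

```python
from itertools import product

# Every sequence of 5 answers, coded 0 = incorrect, 1 = correct.
outcomes = list(product([0, 1], repeat=5))

# Keep the sequences with three or more correct answers.
at_least_three = [o for o in outcomes if sum(o) >= 3]

p_at_least_three = len(at_least_three) / len(outcomes)
print(len(outcomes), len(at_least_three), p_at_least_three)
```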
12
If a binary event gives either a 0 or a 1 as an outcome, the probability of getting a given total number of ‘1’s in a sequence of N binary events can be calculated by expanding $(P+Q)^N$, where P is the probability of a ‘1’ and Q is the probability of a ‘0’ (Q = 1-P).
Example: For three coin flips,
$(P+Q)^3 = P^3 + 3P^2Q + 3PQ^2 + Q^3 = (.5)^3 + 3(.5)^2(.5) + 3(.5)(.5)^2 + (.5)^3 = .125 + .375 + .375 + .125$
Taking a ‘1’ to be heads, the four terms are the probabilities of 3 heads/0 tails, 2 heads/1 tail, 1 head/2 tails and 0 heads/3 tails.
Still yuk. There still has to be an easier way!
13
Table B gives you the binomial probability distributions. You don’t need to know this, but the probability of obtaining k ‘1’s out of n samples is calculated using the formula:
$\Pr(k) = \frac{n!}{k!\,(n-k)!} P^k Q^{n-k}$
where n! means ‘n factorial’, P is the probability of a ‘1’, and Q = 1-P.
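For readers who would rather compute than look up, the standard binomial formula can be written as a small function. This is a sketch, not a replacement for Table B; the function name `binomial_pmf` is made up for illustration, and `math.comb(n, k)` supplies the factorial term n!/(k!(n-k)!).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k '1's in n independent binary events,
    where p is the probability of a '1' on each event."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Three fair coin flips: probabilities of 0, 1, 2 and 3 heads.
three_flips = [binomial_pmf(k, 3, 0.5) for k in range(4)]
print(three_flips)
```

The four values match the terms of the expansion of (P+Q) cubed with P = Q = .5.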
14
Here’s how to use Table B:
Same example: Suppose you have a 5-question true-false test and you guess at every answer. What is the probability of getting 3 or more correct?
Answer: This is a binomial problem with N = 5, P = .5 and Q = .5. Looking at Table B (page 515), we want to sum up the probabilities of obtaining 3, 4 or 5 correct. Look at the rows for N = 5, and the column for P (or Q) = 0.5. We add up the values: Pr(3 or more correct) = Pr(3 correct) + Pr(4 correct) + Pr(5 correct) = 0.3125 + 0.1562 + 0.0312 = 0.5
15
Example: What is the probability of getting 4 or more correct out of 7 questions when guessing on a multiple choice exam where each question has four alternatives?
Answer: Now the probability of getting a correct answer by guessing is 0.25. So this is a binomial problem with N = 7, P = .25 and Q = .75. Looking at Table B (page 515), we want to sum up the probabilities of obtaining 4, 5, 6 or 7 correct. Look at the rows for N = 7, and the column for P = 0.25. We add up the values: Pr(4 or more correct) = Pr(4 correct) + Pr(5 correct) + Pr(6 correct) + Pr(7 correct) = 0.0577 + 0.0115 + 0.0013 + 0.0001 = 0.0706
16
Same example: What is the probability of getting 4 or more correct out of 7 questions when guessing on a multiple choice exam where each question has four alternatives?
Answer: Table_B_cumulative gives the ‘cumulative binomial probability distribution’, which is the probability of obtaining k or more ‘1’s. This is the same as Table B, except that for each value of n, the probabilities start at 1 for k = 0 and go down. Looking at the rows for N = 7, k = 4 and the column for P = 0.25, we get Pr(4 or more correct) = 0.0706. The same value can be found using ‘Table_B_cumulative.xlsx’.
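The cumulative table is just a running sum of the individual binomial probabilities from k upward. A minimal sketch of that sum, with a hypothetical helper name `binomial_tail`, reproduces both test-guessing examples:

```python
from math import comb

def binomial_tail(k, n, p):
    """Probability of k or more '1's in n independent binary events."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 5 true-false questions, 3 or more correct by guessing.
p_true_false = binomial_tail(3, 5, 0.5)

# 7 four-alternative questions, 4 or more correct by guessing.
p_multiple_choice = binomial_tail(4, 7, 0.25)

print(round(p_true_false, 4), round(p_multiple_choice, 4))
```

As with the table, `binomial_tail(0, n, p)` is 1 for any n, and the value shrinks as k grows.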
17
Example: In 2013 the Seahawks won 13 of their 16 games.
Question 1: What is the probability of winning exactly 13 of 16 games by chance? By chance, we mean that Pr(win) = Pr(lose) = 0.5. The probability of winning k out of n games comes from a binomial probability distribution. Here, k = 13, n = 16 and P = .5. Looking at Table B, Pr(win = 13) = .0085
Question 2: What is the probability of winning 13 or more by chance? Pr(win ≥ 13) = Pr(win = 13) + Pr(win = 14) + Pr(win = 15) + Pr(win = 16). Or, we can use Table B cumulative and see that Pr(win ≥ 13) = .0105
In other words, the likelihood of an average team having a season like the Seahawks in 2013 is low.
18
Example: In 2011 (and 2010) the Seahawks lost 9 of their 16 games.
Question 1: What is the probability of losing exactly 9 of 16 games by chance? By chance, we mean that Pr(win) = Pr(lose) = 0.5. The probability of losing k out of n games comes from a binomial probability distribution. Here, k = 9, n = 16 and P = .5. Looking at Table B, Pr(lose = 9) = .1746
Question 2: What is the probability of losing 9 or more by chance? Pr(lose ≥ 9) = Pr(lose = 9) + Pr(lose = 10) + … + Pr(lose = 16). Or, we can use Table B cumulative and see that Pr(lose ≥ 9) = .4018
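The Seahawks questions can also be answered without the tables. The sketch below assumes a 16-game season in which every game is a 50/50 coin flip; the helper names are illustrative. Exact results can differ in the last digit from values read off a table, since table entries are rounded before they are added.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_tail(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

n_games, p_chance = 16, 0.5

p_win_13 = binomial_pmf(13, n_games, p_chance)        # exactly 13 wins
p_win_13_plus = binomial_tail(13, n_games, p_chance)  # 13 or more wins
p_lose_9 = binomial_pmf(9, n_games, p_chance)         # exactly 9 losses
p_lose_9_plus = binomial_tail(9, n_games, p_chance)   # 9 or more losses

for label, value in [("Pr(win = 13)", p_win_13),
                     ("Pr(win >= 13)", p_win_13_plus),
                     ("Pr(lose = 9)", p_lose_9),
                     ("Pr(lose >= 9)", p_lose_9_plus)]:
    print(f"{label}: {value:.4f}")
```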
19
Plotting the probabilities from binomial distributions as a frequency histogram: Here are some examples for N = 5 and N = 15, and P = .9, P = .5, and P = .2. Notice how a frequency histogram can be considered to be a probability distribution. Notice also how bell-shaped the distributions are, especially for large N and a P near 0.5. For very large N, a normal distribution accurately describes this shape.
20
Amazing fact: For large values of n (usually 20 or more), the binomial probability distribution can be approximated by a normal distribution with mean
$\mu = nP$
and standard deviation
$\sigma = \sqrt{nPQ}$
22
Example: Suppose you had a biased coin that came up ‘heads’ with a probability of Pr(heads) = .55. If you flip the coin 100 times, what is the probability of obtaining 65 or more ‘heads’?
[Figure: Binomial probability distribution Pr(k) for n = 100, P = 0.55, with k from 0 to 100.]
Answer: The number of ‘heads’ will be closely approximated by a normal distribution with a mean and standard deviation of:
$\mu = nP = (100)(.55) = 55$
$\sigma = \sqrt{nPQ} = \sqrt{(100)(.55)(.45)} = 4.97$
We can find the probability of 65 or more heads by finding the area under the normal curve, using Table A, Column C. Technically, we should use a value of X = 64.5 so that the area includes the entire bin for 65:
$z = (64.5 - 55)/4.97 = 1.91$
Pr(z > 1.91) = 0.0281
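The normal approximation with the half-unit adjustment can be sketched in code. The helper names below are hypothetical; the area above z is computed from the complementary error function `math.erfc` in place of a Table A lookup, so it uses the unrounded z and may differ slightly from the table value.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    """Area under the standard normal curve above z."""
    return 0.5 * erfc(z / sqrt(2))

def binomial_tail_normal(k, n, p):
    """Normal approximation to Pr(k or more successes in n trials).
    Uses X = k - 0.5 so the whole bin for k is included."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return z, normal_upper_tail(z)

z, prob = binomial_tail_normal(65, 100, 0.55)
print(f"z = {z:.2f}, Pr(65 or more heads) is about {prob:.4f}")
```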
23
Example: There were 162 games in the 2011 baseball season. The Mariners lost 87 of those games (they lost 101 two and three years ago and 95 last year; we’re getting better!). What is the probability that they could lose 87 games or more by chance?
Answer: If this happened by chance, then the probability of a win or loss for each game would be Pr(win) = Pr(loss) = 0.5. The number of losses by chance therefore has a binomial probability distribution with P = 0.5 and n = 162. We are asking for Pr(L ≥ 87). Using the normal approximation, the number of losses will be normally distributed with:
$\mu = nP = (162)(.5) = 81$
$\sigma = \sqrt{nPQ} = \sqrt{(162)(.5)(.5)} = 6.36$
[Figure: Normal distribution of Games lost, axis from 51 to 111.]
24
Continuing the Mariners example (87 or more losses in 162 games): converting to z, we need to include the ‘bin’ that has 87, so we want to know Pr(X > 86.5):
$z = (86.5 - 81)/6.36 = 0.86$
Looking up z = 0.86 in Table A, Column C tells us that the probability of losing 87 or more games is 0.1949 (or about 20%).
[Figure: Normal distribution of Games lost, axis from 51 to 111.]
25
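Example: In 2009 the Mariners won 85 and lost 77 games. What is the probability of them winning 85 or more games by chance?
By chance, the number of wins is approximately normal with mean (162)(.5) = 81 and standard deviation $\sqrt{(162)(.5)(.5)} = 6.36$. Converting to z, we need to include the ‘bin’ that has 85, so we want to know Pr(X > 84.5):
$z = (84.5 - 81)/6.36 = 0.55$
Table A, Column C shows that Pr(z > 0.55) = 0.2912
[Figure: Normal distribution of Games won, axis from 51 to 111.]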
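With only 162 games, the exact binomial sum is still cheap to compute, so the normal approximation can be checked directly. This sketch assumes every game is a 50/50 coin flip; the function names are illustrative, and the approximate values use the unrounded z, so they differ slightly from values looked up in Table A at a rounded z.

```python
from math import comb, erfc, sqrt

N_GAMES = 162  # games in a season, as on the slides

def exact_tail(k, n):
    """Exact Pr(k or more successes in n fair coin flips)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

def normal_tail(k, n):
    """Normal approximation to the same probability, using X = k - 0.5."""
    mean, sd = n * 0.5, sqrt(n * 0.25)
    z = (k - 0.5 - mean) / sd
    return 0.5 * erfc(z / sqrt(2))

for k in (87, 85):
    print(f"{k} or more: exact {exact_tail(k, N_GAMES):.4f}, "
          f"normal approximation {normal_tail(k, N_GAMES):.4f}")
```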
26
If you were to flip a coin six million times, what is the probability that the total number of heads will be within plus or minus 500 of three million?
27
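[Figure: Normal distribution of the count, plotted as the difference from 3 million, axis from -4000 to 4000.]
Answer: The number of heads is approximately normal with mean nP = (6,000,000)(.5) = 3,000,000 and standard deviation $\sqrt{nPQ} = \sqrt{(6{,}000{,}000)(.5)(.5)} = 1224.74$. Including the full bins at either end, z = 500.5/1224.74 = 0.4087. Use Table A, column 2: Pr(between 0 and 0.41) = .1591, so the area between -0.4087 and 0.4087 is 2 x .1591 = .3182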
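The same calculation can be sketched without the table. The area between -z and z under the standard normal curve equals erf(z/√2), which `math.erf` provides; the variable names are illustrative, and the result uses the unrounded z rather than the table's 0.41.

```python
from math import erf, sqrt

n_flips = 6_000_000
p_heads = 0.5
window = 500  # plus or minus this many heads around the mean

mean = n_flips * p_heads
sd = sqrt(n_flips * p_heads * (1 - p_heads))

# Add 0.5 so the outermost bins (mean - 500 and mean + 500) are fully included.
z = (window + 0.5) / sd

# Area under the standard normal curve between -z and +z.
p_within = erf(z / sqrt(2))

print(f"sd = {sd:.2f}, z = {z:.4f}, Pr(within +/-{window}) is about {p_within:.4f}")
```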
28
In the 2000 presidential election between Al Gore and George W. Bush, approximately six million votes were cast in the state of Florida. The results were so close that a series of recounts was made, all of which fell within 500 votes of a tie.
[Figure: Normal distribution of votes, plotted as the difference from 3 million, axis from -4000 to 4000.]
Pr(2,999,500 < votes < 3,000,500) = .3182
If there really was no preference between the candidates (and there were only two), then there actually is a reasonable chance (about 30%) of getting within plus or minus 500 votes of a tie. Developing a counting system that is that accurate is nearly impossible. Our system really isn’t set up to deal with close votes like this.
29
Suppose that the probability that an individual voter in the state of Colorado will vote for Obama over Romney is Pr(Obama) = .5001 (or 50.01%). Suppose also that 3 million voters show up at the polls on November 4th. What is the probability that Obama will win Colorado (and its 9 electoral college votes)?
The number of Obama votes is approximately normal with mean nP = (3,000,000)(.5001) = 1,500,300 and standard deviation $\sqrt{nPQ} = 866.0254$. Converting to z: z = (1499999.5 - 1500300)/866.0254 = -.3470
Table A, Column C shows that Pr(z > -.3470) = .6357, or about 63.6%.
Try this for Pr(Obama) = .501. z will be off the chart and Obama is sure to win Colorado.
[Figure: Normal distribution of Votes minus 1.5 million, axis from -1800 to 2400.]
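The suggestion to try other values of Pr(Obama) is easy to follow in code. This sketch evaluates the z expression from the slide, with the cutoff at 1,499,999.5 votes out of 3 million, and computes the area above z with `math.erfc` instead of Table A. The function name `p_win` is made up for illustration.

```python
from math import erfc, sqrt

def p_win(p_vote, n_voters=3_000_000):
    """Normal approximation to the probability that a candidate with
    per-voter probability p_vote gets at least half of n_voters votes.
    The cutoff n_voters/2 - 0.5 follows the slide's z calculation."""
    mean = n_voters * p_vote
    sd = sqrt(n_voters * p_vote * (1 - p_vote))
    z = (n_voters / 2 - 0.5 - mean) / sd
    return z, 0.5 * erfc(z / sqrt(2))

for p_vote in (0.5001, 0.501):
    z, prob = p_win(p_vote)
    print(f"Pr(Obama) = {p_vote}: z = {z:.4f}, Pr(win) is about {prob:.4f}")
```

A shift of one tenth of a percentage point in individual preference moves the result from roughly a coin flip to a near certainty, because the standard deviation is under a thousand votes.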
30
The ‘birthday problem’ (or paradox) concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. Let’s look at the birthdays of the students in this class as a picture:
[Figure: Birth month (Jan through Dec) plotted against student number (10 through 90).]
31
Here’s today: nobody has a birthday today.
P = Pr(an individual has a birthday today) = 1/365 = .0027
Pr(two or more out of 96 have a birthday today) = .0073 (Table B Excel spreadsheet)
32
But if we consider all days of the year, there are five days on which two of you share birthdays! (Last year we had sixteen days, and the year before there were six days.)
Shared birthdays (date, number of students): 02/12, 2; 08/05, 2; 08/28, 2; 10/23, 2; 11/21, 2
[Figure: Birth month (Jan through Dec) plotted against student number (10 through 90), with the shared birthdays marked.]
33
It’s possible (but beyond the scope of this class) to calculate the probability that 2 or more out of n people in a class will share a birthday. For a class of our size, the probability is very, very, very close to 1.0. For a class size of 23 (about the size of our sections), the probability is close to 0.5.
http://en.wikipedia.org/wiki/Birthday_problem
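For the curious, the usual way to get this number is to compute the probability that all n birthdays are different and subtract it from 1. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name is made up, and the class size of 96 is taken from the earlier slide.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for i in range(n):
        # Person i+1 must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(f"n = 23: {p_shared_birthday(23):.4f}")
print(f"n = 96: {p_shared_birthday(96):.7f}")
```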
34
Margin of error
Suppose you survey the preference of 500 likely voters in a race that is very close to a tie.
1) What is the expected mean and standard deviation of the number of votes expected for one of the candidates? n = 500, and we’ll assume that P = 0.5. So the mean and standard deviation of the number of votes are:
$\mu = nP = (500)(.5) = 250$
$\sigma = \sqrt{nPQ} = \sqrt{(500)(.5)(.5)} = 11.18$
35
Margin of error
2) What is the expected percent of votes for one candidate, and the standard deviation of that percent? We need to convert ‘Votes’ to ‘% of Votes’ by dividing the ‘Votes’ distribution by n and multiplying by 100 (to get percent):
mean = (250/500)(100) = 50%
standard deviation = (11.18/500)(100) = 2.24%
36
Margin of error
3) What range of percent covers the middle 95% of the area under the curve?
[Figure: Normal distribution of the percent of votes, axis from 42 to 58, with the middle 95% marked and 2.5% in each tail.]
We need to find the z value for which the area above z is .025. Table A, Column 3: z = 1.96
$x_1 = -z\sigma + \mu = -(1.96)(2.24) + 50 = 45.61$
$x_2 = z\sigma + \mu = (1.96)(2.24) + 50 = 54.39$
95% of the polls will fall between 45.61% and 54.39%.
37
Margin of error
Suppose you survey the preference of 500 likely voters in a race that is very close to a tie. 95% of the polls will fall between 45.61% and 54.39%.
[Figure: Normal distribution of Percent above 50%, axis from -8 to 8, with the middle 95% marked and 2.5% in each tail.]
In other words, 95% of the polls fall within plus or minus 4.39% of 50%. We call 4.39% the margin of error for this poll. Unless stated otherwise, the margin of error corresponds to the range within which 95% of repeated polls should fall.
38
Margin of error
The margin of error depends only on the sample size: z is always 1.96 for 95%, and we assume P = Q = 0.5. The margin of error, in percent, is
$1.96 \times 100\sqrt{PQ/n} = 98/\sqrt{n}$
[Figure: Margin of error (%) as a function of sample size (n), for n from 0 to 5000.]
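The relationship between sample size and margin of error can be tabulated with a short function. This is a sketch assuming a race near a tie (P = 0.5) and the 95% value z = 1.96; the function name is hypothetical. It uses the unrounded standard deviation, so the result for n = 500 comes out a little below the 4.39% obtained from the rounded value 2.24.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error in percent for a poll of n voters.
    p = 0.5 and z = 1.96 (95% range) are the slide's assumptions."""
    return z * 100 * sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2000, 5000):
    print(f"n = {n:5d}: margin of error is about {margin_of_error(n):.2f}%")
```

Because the margin shrinks with the square root of n, quadrupling the sample size only halves the margin of error.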