
1 Data Analysis and Statistical Software I (323-21-403) Quarter: Autumn 02/03
Daniela Stan, PhD
Course homepage:
Office hours (no appointment needed):
M, 3:00pm - 3:45pm at LOOP, CST 471
W, 3:00pm - 3:45pm at LOOP, CST 471
2/17/2019 Daniela Stan - CSC323

2 Outline
Chapter 4: Probability – The Study of Randomness
Random Variables
Means and Variances of Random Variables

3 The Law of Averages
A fair coin lands heads with chance 50%, that is, P(heads) = 0.5. If we toss a coin many times, say 1,000 times, we would expect to get 1,000/2 = 500 heads. In practice we rarely get exactly 500. We will most likely see, for example, 503 heads, 498 heads, 510 heads or 490 heads. This is because of chance variability:
Chance error = number of heads – half the number of tosses

4 Examples
For instance, in 1,000 tosses we get 550 heads. The chance error is 550 – 500 = 50 heads (in absolute terms). Relative to the number of tosses, the chance error is 50/1,000 = 0.05, or 5% of the number of tosses.
Suppose 4 coins were tossed 1,600 times each, and the chance error (number of heads – half the number of tosses) was plotted against the number of tosses.

5 Examples (cont.)
[Figure: number of heads minus half the number of tosses, plotted against the number of tosses]
After 400 tosses, the chance errors for the four coins were:
10 = 210 – 200
–8 = 192 – 200
–12 = 188 – 200
3 = 203 – 200
After 1,600 tosses, the chance errors for the four coins were:
30 = 830 – 800
–26 = 774 – 800
–14 = 786 – 800
8 = 808 – 800

6 Examples (cont.)
[Figure: percentage of heads – 50%, plotted against the number of tosses]
For the same 4 coins, here are the chance errors expressed as a percentage of the number of tosses:
If a coin is tossed 400 times, the percentage of heads is 50% give or take 4%.
If a coin is tossed 1,600 times, the percentage of heads is 50% give or take 2%.

7 As the number of tosses increases, the chance error (= the difference between the number of heads and half the number of tosses) gets bigger.
[Figure: number of heads minus half the number of tosses, plotted against the number of tosses]
However, if we consider percentages of the number of tosses: as the number of tosses goes up, the difference between the percentage of heads and 50% tends to get smaller.
[Figure: percentage of heads – 50%, plotted against the number of tosses]
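The behaviour described on this slide can be tried out with a short simulation. This is only a sketch: it assumes a simulated fair coin from Python's `random` module, and the function name `chance_errors`, the seed and the checkpoints are illustrative choices, not part of the course material.

```python
import random

def chance_errors(checkpoints, seed=0):
    """Toss one simulated fair coin and report the chance error at each checkpoint.

    Returns a list of (tosses, heads, absolute_error, percent_error) tuples.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    results = []
    heads = 0
    tosses = 0
    for target in sorted(checkpoints):
        while tosses < target:
            heads += rng.random() < 0.5  # True counts as one head
            tosses += 1
        error = heads - tosses / 2       # number of heads - half the number of tosses
        percent = 100 * error / tosses   # percentage of heads - 50%
        results.append((tosses, heads, error, percent))
    return results

if __name__ == "__main__":
    for tosses, heads, error, percent in chance_errors([100, 10_000, 1_000_000]):
        print(f"{tosses:>9} tosses: {heads:>7} heads, "
              f"chance error {error:+9.1f}, {percent:+.3f}% of tosses")
```

With different seeds the absolute chance error will typically be larger at the later checkpoints, while the percentage column shrinks toward 0.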

8 The law of averages
The law of averages is about the chance error. It says that the chance error is likely to be large in absolute terms, but small relative to the number of times the process is repeated (e.g. the number of tosses).
A coin lands heads 550 times in 1,000 tosses. The chance error is
550 – 500 = 50 in absolute terms
50 / 1,000 = 5% of the number of tosses
A coin lands heads 499,000 times in 1,000,000 tosses. The chance error is
499,000 – 500,000 = –1,000, that is 1,000 in absolute terms
1,000 / 1,000,000 = 0.1% of the number of tosses
As the number of tosses gets larger, the percentage of heads gets closer to 50%.

9 The law of averages (cont.)
“The longer a random process is repeated under the same conditions, the closer the observed proportion of each outcome occurrence is to the actual probability of occurrence.”
A convenient way of representing a random phenomenon is through a random variable. We can associate a variable with each random process. The values of such a variable are the possible outcomes of the random process.

10 Random variable
For instance, X = number of heads in 4 tosses of a coin. These are the 16 possible outcomes, grouped by the value of X:
X=0: TTTT
X=1: HTTT THTT TTHT TTTH
X=2: HHTT HTHT HTTH THHT THTH TTHH
X=3: HHHT HHTH HTHH THHH
X=4: HHHH
A probability value can be associated with each value of X.

11 If X=0, then no heads come up; the probability is 1/16.
If X=1, then only one head comes up; the probability is 4/16.
If X=2, then 2 heads come up; the probability is 6/16.
If X=3, then 3 heads come up; the probability is 4/16.
If X=4, then 4 heads come up; the probability is 1/16.
X            0       1     2      3     4
Probability  0.0625  0.25  0.375  0.25  0.0625
The outcome of 4 tosses of a coin changes from one repetition to the next, so the value of X (number of heads in 4 tosses) changes accordingly.
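The probability table for X can be checked by listing all 16 equally likely outcomes and counting heads. The sketch below uses only the Python standard library; the function name `heads_distribution` is an illustrative choice, not something from the course.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Probability table of X = number of heads in n_tosses tosses of a fair coin."""
    outcomes = list(product("HT", repeat=n_tosses))   # all equally likely sequences
    counts = Counter(seq.count("H") for seq in outcomes)
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

if __name__ == "__main__":
    for x, p in heads_distribution(4).items():
        print(f"X={x}: {p} = {float(p)}")
```

Exact fractions are used so that 4/16 and 6/16 are compared without rounding; `float(p)` gives the decimal form used in the table.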

12 Probability Histograms
The probability table associated with a random process or a random variable can be displayed as a probability histogram. For example, the probability histogram of the number of heads in 4 tosses of a coin is displayed below:
X            0       1     2      3     4
Probability  0.0625  0.25  0.375  0.25  0.0625
[Figure: probability histogram of X; vertical axis: Chance (%), ticks at 10, 20, 30, 40]
A probability histogram represents chance (or probability), NOT data. The total area under the histogram is 100% chance (or probability = 1).

13 The law of averages: If the process is repeated many times, empirical histograms will converge to probability histograms.
Suppose we toss 4 coins 20 times, 100 times and 1,000 times, and each time we count the number of heads. The empirical histograms show the observed proportions of each number of heads.
[Figure: empirical histograms for 20, 100 and 1,000 repetitions, next to the probability histogram; vertical axes: Percent (Chance (%) for the probability histogram), ticks at 10, 20, 30, 40; bar labels visible in the original: 7/20, 7/20, 3/20, 4/20, 1/20]
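A rough way to see this convergence is to simulate the repetitions and compare the observed proportions with the probability table. This sketch assumes simulated fair coins; `empirical_proportions` and the seed are hypothetical names and values chosen for illustration.

```python
import random
from collections import Counter

# Probability table for the number of heads in 4 tosses of a fair coin.
PROBABILITY = {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}

def empirical_proportions(repetitions, seed=0):
    """Toss 4 simulated fair coins `repetitions` times; return observed proportions of X."""
    rng = random.Random(seed)
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(4)) for _ in range(repetitions)
    )
    return {x: counts[x] / repetitions for x in range(5)}

if __name__ == "__main__":
    for reps in (20, 100, 1_000):
        observed = empirical_proportions(reps)
        largest_gap = max(abs(observed[x] - PROBABILITY[x]) for x in range(5))
        print(reps, observed, f"largest gap = {largest_gap:.3f}")
```

The largest gap between an observed proportion and its probability tends to shrink as the number of repetitions grows, although any single run can deviate.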

14 Expected value and standard error
A chance process is observed. It produces a number, then another and another… For instance, the number of heads in 100 tosses; the expected value for the number of heads is 50.
You might get 57, which is 7 heads above the expected value of 50.
Toss a coin 100 times again; you might get 55, which is 5 heads above 50.
Again… you might get 48, which is 2 heads below 50… and so on.
The numbers delivered by the process vary around the expected value, the amount off being similar in size to the standard error. The standard error is a measure of the chance error. We will now define these two quantities.

15 The Expected Value
The number of heads in 4 tosses of a coin would be a number around 2. In statistical terms: The expected value of the number of heads in 4 tosses of a coin is 2. It is calculated by multiplying each possible value by its probability, then adding all the products.
X            0       1     2      3     4
Probability  0.0625  0.25  0.375  0.25  0.0625
Expected value of X = 0*0.0625 + 1*0.25 + 2*0.375 + 3*0.25 + 4*0.0625 = 0 + 0.25 + 0.75 + 0.75 + 0.25 = 2
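The same calculation as a minimal Python sketch. Representing the probability table as a dictionary and the name `expected_value` are assumptions made here for illustration.

```python
def expected_value(table):
    """Multiply each possible value by its probability, then add the products."""
    return sum(x * p for x, p in table.items())

# Probability table for X = number of heads in 4 tosses of a fair coin.
heads_in_4_tosses = {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}

print(expected_value(heads_in_4_tosses))  # 2.0
```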

16 Expected value & Probability Histogram
The expected value is the mean (average) of the probability histogram.
Ex: The expected value of the number of heads in 4 tosses of a coin is 2.
[Figure: probability histogram of X with the expected value μX = 2 marked; vertical axis: Chance (%), ticks at 10, 20, 30, 40]

17 Expected value and expected payoff in gambling
A game is fair if the expected value of the net gain equals 0: on average, players neither win nor lose.
Keno: In the game Keno, there are 80 balls, numbered 1 to 80. On each play, the casino chooses 20 balls at random. Suppose you bet $1 on 17 in each Keno play. When you win, the casino gives you your dollar back and 3 dollars more. When you lose, the casino keeps your dollar. The bet pays 3 to 1. Is the bet fair?
Create the random variable X:
X     Event                                    Probability
+3    You win, 17 is among the 20 balls        20/80 = 0.25
–1    You lose, 17 is not among the 20 balls   60/80 = 0.75
The expected value of X is –1*0.75 + 3*0.25 = 0. The game is fair!

18 Example: American Roulette
The roulette wheel has 38 slots, numbered 0, 00, and 1-36.
A straight bet: bet on a single number – pays 35 to 1.
A color bet: bet either on red or black – pays 1 to 1 (lose if 0 or 00 comes up).
A 4-number bet: bet on the four numbers in a square – pays 8 to 1.
These are just a few of the possible bets. What is the expected payoff? Which bet would you expect to pay better?

19 A straight bet: bet on a single number – pays 35 to 1.
The random variable X that represents the bet on 17 is:
X     Event                         Probability
35    Win, 17 comes out             1/38
–1    Lose, 17 does not come out    37/38
The expected payoff is the expected value μX = 35*1/38 + (–1)*37/38 = –2/38 = –0.0526. In the long run, you will lose about 5 cents for each dollar you bet.
A color bet: bet either on red or black – pays 1 to 1 (lose if 0 or 00 comes up).
The random variable X that represents the bet on red is:
X     Event                                      Probability
1     Win, a red number comes out                18/38
–1    Lose, a black number, 0 or 00 comes out    20/38
The expected payoff is the expected value μX = 1*18/38 + (–1)*20/38 = –2/38 = –0.0526. In the long run, you will lose about 5 cents for each dollar you bet.

20 A 4-number bet: bet on the four numbers in a square – pays 8 to 1.
The bet X on {1,2,4,5}:
X     Event                                Probability
8     Win, if 1, 2, 4 or 5 comes out       4/38
–1    Lose, they do not come out           34/38
The expected payoff is μX = 8*4/38 + (–1)*34/38 = –2/38 = –0.0526.
In American roulette, all of these bets have the same expected payoff!! In the long run, you will lose about 5 cents for each dollar you bet.
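The three expected payoffs can be reproduced with exact fractions. This is a sketch under the assumptions stated on the slides (38 slots, a $1 stake, 18 winning slots for a color bet); the helper name `expected_payoff` is illustrative.

```python
from fractions import Fraction

SLOTS = 38  # 0, 00 and 1-36

def expected_payoff(payout, winning_slots):
    """Expected net gain of a $1 bet that pays `payout` to 1 on `winning_slots` slots."""
    p_win = Fraction(winning_slots, SLOTS)
    return payout * p_win + (-1) * (1 - p_win)

bets = {
    "straight (pays 35 to 1)": (35, 1),
    "color (pays 1 to 1)": (1, 18),
    "4-number (pays 8 to 1)": (8, 4),
}

for name, (payout, winning_slots) in bets.items():
    mu = expected_payoff(payout, winning_slots)
    print(f"{name}: {mu} = {float(mu):.4f}")
```

Each bet comes out to –2/38 (printed in lowest terms as -1/19), which is the –0.0526 quoted above.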

21 A measure of the chance error: the standard error
The actual number of heads will be off the expected value by some amount. How big is that amount on average?
The number of heads in 4 tosses of a coin will be: number of heads X = expected value + chance error
So if we observe 3 heads, the chance error is +1; if we observe 1 head, the chance error is –1. The chance error is measured by the standard error.
The standard error is calculated by the following 4 steps:
1. Calculate the deviation of each value of X from the expected value μX: x1 – μX, x2 – μX, …, xn – μX.
2. Square the deviations and multiply each squared deviation by its probability.
3. Add all the products.
4. Take the square root.

22 The expected value is μX = 2
X                               0               1             2           3             4
Probability                     0.0625          0.25          0.375       0.25          0.0625
Deviations X – μX               0–2 = –2        1–2 = –1      2–2 = 0     3–2 = 1       4–2 = 2
Squared deviations              (–2)² = 4       (–1)² = 1     (0)² = 0    (1)² = 1      (2)² = 4
Products = Dev² * Probability   4*0.0625=0.25   1*0.25=0.25   0*0.375=0   1*0.25=0.25   4*0.0625=0.25
Step 3: Sum the products: 0.25 + 0.25 + 0 + 0.25 + 0.25 = 1
Step 4: Take the square root. The standard error of X is σX = √1 = 1.
The observed number of heads in 4 tosses of a coin is likely to be around 2, give or take 1.
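The four steps translate directly into a few lines of Python. This is a sketch: the dictionary representation of the probability table and the name `standard_error` are assumptions made for illustration.

```python
from math import sqrt

def standard_error(table):
    """Standard error of a random variable given as {value: probability}."""
    mu = sum(x * p for x, p in table.items())                  # expected value
    products = [(x - mu) ** 2 * p for x, p in table.items()]   # steps 1 and 2
    return sqrt(sum(products))                                 # steps 3 and 4

# Probability table for X = number of heads in 4 tosses of a fair coin.
heads_in_4_tosses = {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}

print(standard_error(heads_in_4_tosses))  # 1.0
```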

23 Standard error and probability histograms
The standard error measures the spread of the probability histogram.
The standard error of the number of heads in 4 tosses of a coin is 1.
[Figure: probability histogram of X with μX = 2 marked and one standard error on either side: μX – 1 s.e. = 2 – 1 = 1 and μX + 1 s.e. = 2 + 1 = 3; vertical axis: Chance (%), ticks at 10, 20, 30, 40]
Remark: Observed values are rarely more than 2 or 3 standard errors away from the expected value!!

24 The standard errors of the bets for the American roulette.
A straight bet: The bet X on 17
X     Event                         Probability
35    Win, 17 comes out             1/38
–1    Lose, 17 does not come out    37/38
The expected payoff is the expected value μX = –0.0526.
The standard error is S.E. = sqrt[(35–(–0.0526))²*1/38 + (–1–(–0.0526))²*37/38] = 5.763
A color bet: The bet X on red
X     Event                                      Probability
1     Win, a red number comes out                18/38
–1    Lose, a black number, 0 or 00 comes out    20/38
The expected payoff is the expected value μX = –0.0526.
The standard error is S.E. = sqrt[(1–(–0.0526))²*18/38 + (–1–(–0.0526))²*20/38] = sqrt(0.997) = 0.999

25 The standard error is measuring the risk in each bet!!!
A 4-number bet: The bet X on {1,2,4,5}
X     Event                                Probability
8     Win, if 1, 2, 4 or 5 comes out       4/38
–1    Lose, they do not come out           34/38
The expected payoff is μX = –0.0526.
The standard error is S.E. = sqrt[(8–(–0.0526))²*4/38 + (–1–(–0.0526))²*34/38] = 2.762
Thus the three bets have the same expected payoff, but different standard errors.
The straight bet has the highest standard error; it is the riskiest bet!! If you bet one dollar on the same number many times, on average your gain will be a value in (μX – 2 S.E., μX + 2 S.E.), that is (–$11.58, $11.47). You can lose or win up to 11 times your bet!!
In the color bet, the standard error is small. If you bet one dollar on red many times, on average your gain will be a value in (–$2.05, $1.94). You can lose or win up to 2 times your bet!
In the 4-number bet, if you bet one dollar many times, on average your gain will be a value in (–$5.58, $5.47). You can lose or win up to 5 times your bet!
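The standard errors and the two-standard-error ranges for the three roulette bets can be recomputed as follows. This is a sketch assuming a $1 stake and 38 slots; `payoff_summary` is a hypothetical helper name.

```python
from math import sqrt

def payoff_summary(payout, winning_slots, slots=38):
    """Expected payoff, standard error and (mu - 2 S.E., mu + 2 S.E.) for a $1 bet."""
    p_win = winning_slots / slots
    mu = payout * p_win - (1 - p_win)
    # Squared deviations of the two outcomes (win: +payout, lose: -1), weighted by probability.
    se = sqrt((payout - mu) ** 2 * p_win + (-1 - mu) ** 2 * (1 - p_win))
    return mu, se, (mu - 2 * se, mu + 2 * se)

for name, payout, winning_slots in [("straight", 35, 1), ("color", 1, 18), ("4-number", 8, 4)]:
    mu, se, (low, high) = payoff_summary(payout, winning_slots)
    print(f"{name:>9}: mu = {mu:.4f}, S.E. = {se:.3f}, range = ({low:.2f}, {high:.2f})")
```

All three bets share the same expected payoff, so the comparison of risk rests entirely on the S.E. column.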

26 Remarks on random processes
An observed value should be somewhere around the expected value; the difference is chance error. The likely size of the chance error is the standard error.
Observed values are rarely more than two or three standard errors away from the expected value.
The standard deviation is defined for a list of numbers. The standard error is defined for random processes and measures the chance error. (Subtle difference)
The standard error “makes more sense” if the probability histogram of the random variable is bell-shaped (similar to the normal distribution).

27 Mathematical Expressions
Given a random variable X with probability table
X            x1  x2  x3  x4  …  xk
Probability  p1  p2  p3  p4  …  pk
The expected value is $\mu_X = \sum_{i=1}^{k} x_i p_i$
The standard error is $\sigma_X = \sqrt{\sum_{i=1}^{k} (x_i - \mu_X)^2 \, p_i}$

