STATISTICS AND PROBABILITY

Slides:



Advertisements
Similar presentations
Economics 105: Statistics Any questions? Go over GH2 Student Information Sheet.
Advertisements

1 Chapter 7 Probability Basics. 2 Chapter 7 Introduction to Probability Basics Learning Objectives –Probability Theory and Concepts –Use of probability.
Week 51 Theorem For g: R  R If X is a discrete random variable then If X is a continuous random variable Proof: We proof it for the discrete case. Let.
Chapter 16: Random Variables
Chris Morgan, MATH G160 February 3, 2012 Lecture 11 Chapter 5.3: Expectation (Mean) and Variance 1.
Variance Fall 2003, Math 115B. Basic Idea Tables of values and graphs of the p.m.f.’s of the finite random variables, X and Y, are given in the sheet.
Lecture 4 Rescaling, Sum and difference of random variables: simple algebra for mean and standard deviation (X+Y) 2 =X 2 + Y XY E (X+Y) 2 = EX 2.
Albert Gatt Corpora and Statistical Methods. Probability distributions Part 2.
Application of Random Variables
The Binomial Distribution
Statistical Experiment A statistical experiment or observation is any process by which an measurements are obtained.
COMP 170 L2 L18: Random Variables: Independence and Variance Page 1.
1 Lecture 4. 2 Random Variables (Discrete) Real-valued functions defined on a sample space are random vars. determined by outcome of experiment, we can.
Lectures prepared by: Elchanan Mossel elena Shvets Introduction to probability Stat 134 FAll 2005 Berkeley Follows Jim Pitman’s book: Probability Section.
GrowingKnowing.com © Expected value Expected value is a weighted mean Example You put your data in categories by product You build a frequency.
1 Chapter 16 Random Variables. 2 Expected Value: Center A random variable assumes a value based on the outcome of a random event.  We use a capital letter,
The Standard Deviation of a Discrete Random Variable Lecture 24 Section Fri, Oct 20, 2006.
Mean and Standard Deviation of Discrete Random Variables.
Probability theory Petter Mostad Sample space The set of possible outcomes you consider for the problem you look at You subdivide into different.
STA347 - week 51 More on Distribution Function The distribution of a random variable X can be determined directly from its cumulative distribution function.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 16 Random Variables.
Probability theory Tron Anders Moger September 5th 2007.
12/7/20151 Probability Introduction to Probability, Conditional Probability and Random Variables.
1 Week of July Week 2. 3 “oil” = oil is present “+” = a test for oil is positive “ - ” = a test for oil is negative oil no oil + + false negative.
Lesson Discrete Random Variables. Objectives Distinguish between discrete and continuous random variables Identify discrete probability distributions.
Copyright © 2011 Pearson Education, Inc. Random Variables Chapter 9.
Random Variables. Numerical Outcomes Consider associating a numerical value with each sample point in a sample space. (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 9 Random Variables.
Review Know properties of Random Variables
The Variance of a Random Variable Lecture 35 Section Fri, Mar 26, 2004.
Stat 13, Thu 4/19/ Hand in HW2! 1. Resistance. 2. n-1 in sample sd formula, and parameters and statistics. 3. Probability basic terminology. 4. Probability.
Mean and Standard Deviation Lecture 23 Section Fri, Mar 3, 2006.
7.2 Means & Variances of Random Variables AP Statistics.
Probability Distributions. Constructing a Probability Distribution Definition: Consists of the values a random variable can assume and the corresponding.
Week 61 Poisson Processes Model for times of occurrences (“arrivals”) of rare phenomena where λ – average number of arrivals per time period. X – number.
4-1 Business Finance (MGT 232) Lecture Risk and Return.
Copyright © 2009 Pearson Education, Inc. Chapter 16 Random Variables.
Discrete Probability Distributions
Probabilistic Cash Flow Analysis
Ch 9 實習.
Introduction to Probability Distributions
Discrete Random Variables
Chapter 4 Using Probability and Probability Distributions
Chapter 16 Random Variables.
STATISTICS AND PROBABILITY
Corpora and Statistical Methods
Mean and Standard Deviation
Another Population Parameter of Frequent Interest: the Population Mean µ
Chapter 16 Random Variables
Combining Random Variables
Discrete Probability Distributions
Week 8 Chapter 14. Random Variables.
Chapter 3: Getting the Hang of Statistics
Probability Key Questions
Mean and Standard Deviation
Mean and Standard Deviation
Chebychev, Hoffding, Chernoff
Mean and Standard Deviation
Chapter 3: Getting the Hang of Statistics
AP Statistics Chapter 16 Notes.
Financial Econometrics Fin. 505
Professor John Canny Fall 2004
Combining Random Variables
Mean and Standard Deviation
Discrete Random Variables and Probability Distributions
Ch 9 實習.
CS 160: Lecture 16 Professor John Canny 8/2/2019.
Forecasting Plays an important role in many industries
The Mean Variance Standard Deviation and Z-Scores
Presentation transcript:

STATISTICS AND PROBABILITY Raoul LePage Professor STATISTICS AND PROBABILITY www.stt.msu.edu/~lepage click on STT315_Fall 06 Week of 9-6-06.

2-75 ( "or both" is redundant ), suggested exercises ans in text, don't submit 2-61, 2-63, 2-65, 2-67, 2-69, 2-73, 2-75 ( "or both" is redundant ), 2-77 ( i.e. P( B up | A up ) ), 3-1, 3-5, 3-11, 3-15, 3-17, 3-23 and s.d., 3-25 Week 2.

TREE DIAGRAM “oil” = oil is present “+” = a test for oil is positive “-” = a test for oil is negative + oil - false negative false positive + no oil -

P(oil) = 0.3 P(+ | oil) = 0.9 P(+ | no oil) = 0.4 TREE DIAGRAM CONVENTIONS P(oil) = 0.3 P(+ | oil) = 0.9 P(+ | no oil) = 0.4 P(oil +) = (0.3) (0.9) = 0.27 P(+ | oil) = 0.9 + P(oil) = 0.3 oil - + no oil -

P(oil) = 0.3 P(+ | oil) = 0.9 P(+ | no oil) = 0.4 TOTAL OF BRANCHES = 1 P(oil) = 0.3 P(+ | oil) = 0.9 P(+ | no oil) = 0.4 sum of unconditional probabilities is one 0.3 oil 0.7 no oil

P(oil) = 0.3 P(+ | oil) = 0.9 P(- | oil) = 0.1 P(+ | no oil) = 0.4 TOTAL OF CONDITIONAL BRANCHES = 1 P(oil) = 0.3 P(+ | oil) = 0.9 P(- | oil) = 0.1 P(+ | no oil) = 0.4 sum of conditional probabilities is one 0.9 + 0.3 0.1 - oil 0.7 + no oil -

P(oil) = 0.3 P(+ | oil) = 0.9 P(+ | no oil) = 0.4 COMPLETE TREE P(oil) = 0.3 P(+ | oil) = 0.9 P(+ | no oil) = 0.4 0.27 oil+ unconditional 0.9 + 0.3 0.1 oil - 0.03 oil- 0.7 0.28 oil+ 0.4 + no oil 0.6 - 0.42 oil- conditional outcomes

- - + oil + no oil VENN DIAGRAM S oil + 0.03 0.27 0.28 0.42 0.27 oil+ 0.03 0.27 0.28 0.42 0.27 oil+ 0.9 + 0.3 0.1 - oil- oil 0.03 0.7 oil+ 0.28 0.4 + no oil - 0.6 0.42 oil-

+ oil + no oil TOTAL PROBABILITY P(+) = P(oil+) + P(no oil+) 0.55 = 0.27 + 0.28 0.27 oil+ 0.9 + 0.3 oil 0.7 oil+ 0.28 0.4 + no oil Oil contributes 0.27 to the total P(+) = 0.55.

P(oil | +) = P(oil+) / P(+) BAYES FORMULA S oil + 0.03 0.27 0.28 0.42 0.27 oil+ P(oil | +) = P(oil+) / P(+) = 0.27 / (0.27 + 0.28) = 0.4909.. oil+ 0.28 Oil contributes 0.27 of the total P(+) = 0.27+0.28.

- - + + MEDICAL TEST 0.98 0.01 disease 0.02 0.03 0.99 no disease 0.97 The test for this infrequent disease seems to be reliable having only 3% false positives and 2% false negatives. What if we test positive?

- - + + MEDICAL TEST 0.0098 0.01 0.98 disease 0.02 0.0002 0.0297 0.03 0.99 no disease 0.97 - 0.9603 We need to calculate P(diseased | +), the conditional probability that we have this disease GIVEN we’ve tested positive for it.

- - + + CALCULATING OUR CHANCES OF HAVING THE DISEASE IF + only 25% ! 0.0098 0.01 disease 0.98 + 0.02 - 0.0002 0.0297 0.03 + 0.99 no disease 0.97 - 0.9603 P(+) = 0.0098 + 0.0297 = 0.0395 P(disease | +) = P(disease+) / P(+) = .0098 / 0.0395 = 0.248. only 25% !

- - + + FALSE POSITIVE PARADOX 0.0098 0.98 0.01 disease 0.02 0.0002 one may overwhelm a good test by failing to screen 0.0098 0.01 disease 0.98 + 0.02 - 0.0002 0.0297 0.99 no disease 0.03 + 0.97 - 0.9603 EVEN FOR THIS ACCURATE TEST: P(diseased | +) is only around 25% because the non-diseased group is so predominant that most positives come from it.

- - + + FALSE POSITIVE PARADOX rare ! 0.00098 0.98 0.001 disease 0.02 one may overwhelm a good test by failing to screen 0.00098 0.001 disease 0.98 + rare ! 0.02 - 0.00002 0.02997 0.999 no disease 0.03 + 0.97 - 0.996003 WHEN THE DISEASE IS TRULY RARE: P(diseased | +) is a mere 3.2% because the huge non-diseased group has completely over-whelmed the test, which no longer has value

IMPLICATIONS OF THE PARADOX FOR MEDICAL PRACTICE: Good diagnostic tests will be of little use if the system is over-whelmed by lots of healthy people taking the test. Screen patients first. FOR BUSINESS: Good sales people capably focus their efforts on likely buyers, leading to increased sales. They can be rendered ineffective by feeding them too many false leads, as with massive un-targeted sales promotions.

RANDOM VARIABLE boats/month P(fewer than 3.7) = .4 P(4 to 7) = .55 probability 2 0.2 3 0.2 4 0.3 5 0.1 6 0.1 7 0.05 8 0.05 total 1 RANDOM VARIABLE (3-17 of text) boats/month P(fewer than 3.7) = .4 P(4 to 7) = .55

P(oil) = 0.3 oil no oil OIL DRILLING EXAMPLE Cost to drill 130 Reward for oil 400 net return “just drill” -130 + 400 = 270 drill oil drill no oil -130 + 000 = -130 0.3 oil 0.7 no oil A random variable is just a numerical function over the outcomes of a probability experiment.

EXPECTATION Definition of E X E X = sum of value times probability x p(x). Key properties E(a X + b) = a E(X) + b E(X + Y) = E(X) + E(Y) (always, if such exist) a. E(sum of 13 dice) = 13 E(one die) = 13(3.5). b. E(0.82 Ford US + Ford Germany - 20M) = 0.82 E(Ford US) + E(Ford Germany) - 20M regardless of any possible dependence.

total of 2 dice E ( total ) is just twice the 3.5 avg for one die (3-15) of text probability product 2 1/36 2/36 3 2/36 6/36 4 3/36 12/36 5 4/36 20/36 6 5/36 30/36 7 6/36 42/36 8 5/36 40/36 9 4/36 36/36 10 3/36 30/36 11 2/36 22/36 12 1/36 12/36 sum 1 252/36 = 7 E ( total ) is just twice the 3.5 avg for one die E(total)

E(number of boats this month) probability product 2 0.2 0.4 3 0.2 0.6 4 0.3 1.2 5 0.1 0.5 6 0.1 0.6 7 0.05 0.35 8 0.05 0.4 total 1 4.05 (3-17 of text) boats/month we avg 4.05 boats per month E(number of boats this month)

EXPECTATION IN THE OIL EXAMPLE Expected return from policy “just drill” is the probability weighted average (NET) return E(NET) = (0.3) (270) + (0.7) (-130) = 81 - 91 = -10. net return from policy“just drill.” -130 + 400 = 270 drill oil drill no-oil -130 + 0 = -130 just drill 0.3 oil 0.7 no oil E(X) = -10

OIL EXAMPLE WITH A "TEST FOR OIL" A test costing 20 is available. This test has: P(test + | oil) = 0.9 P(test + | no-oil) = 0.4. “costs” TEST 20 DRILL 130 OIL 400 0.27 0.9 + 0.3 0.1 oil - 0.03 0.28 0.7 0.4 + no oil 0.6 - 0.42 Is it worth 20 to test first?

EXPECTED RETURN IF WE "TEST FIRST" net return prob prod oil+ = -20 -130 + 400 = 250 0.27 67.5 oil- = -20 - 0 + 0 = - 20 .03 - 0.6 no oil+ = -20 -130 + 0 = -150 .28 - 42.0 no oil- = -20 - 0 + 0 = - 20 .42 - 8.4 total 1.00 16.5 drill only if the test is + E(NET) = .27 (250) - .03 (20) - .28 (150) - .42 (20) = 16.5 (for the “test first” policy). This average return is much preferred over the E(NET) = -10 of the “just drill” policy.

Variance and s.d. of boats/month (3-17) of text x p(x) x p(x) x2 p(x) (x-4.05)2 p(x) 2 0.2 0.4 0.8 0.8405 3 0.2 0.6 1.8 0.2205 4 0.3 1.2 4.8 0.0005 5 0.1 0.5 2.5 0.09025 6 0.1 0.6 3.6 0.38025 7 0.05 0.35 2.45 0.435125 8 0.05 0.4 3.2 0.780125 total 1.00 4.05 19.15 2.7475 quantity E X E X2 E (X - E X)2 terminology mean mean of squares variance = mean of sq dev s.d. = root(2.7474) = root(19.15 - 4.052) = 1.6576

VARIANCE AND STANDARD DEVIATION Var(X) =def E (X - E X)2 =comp E (X2) - (E X)2 i.e. Var(X) is the expected square deviation of r.v. X from its own expectation. Caution: The computing formula (right above), although perfectly accurate mathematically, is sensitive to rounding errors. Key properties: Var(a X + b) = a2 Var(X) (b has no effect). sd(a X + b) = |a| sd(X). VAR(X + Y) = Var(X) + VAR(Y) if X ind of Y.

EXPECTATION AND INDEPENDENCE Random variables X, Y are INDEPENDENT if p(x, y) = p(x) p(y) for all possible values x, y. If random variables X, Y are INDEPENDENT E (X Y) = (E X) (E Y) echoing the above. Var( X + Y ) = Var( X ) + Var( Y ).

PRICE RELATIVES EXPECTED RETURN Venture one returns random variable X per $1 investment. This X is termed the “price relative.” This random X may in turn be reinvested in venture two which returns random variable Y per $1 investment. The return from $1 invested at the outset is the product random variable XY. If INDEPENDENT, E( X Y ) = (E X) (E Y). EXPECTED RETURN

PARADOX OF GROWTH WE AVERAGE 14% PER PERIOD EXAMPLE: x p(x) x p(x) 0.8 0.3 0.24 $1 X 1.2 0.5 0.60 1.5 0.2 0.30 E(X) = 1.14 BUT YOU WILL NOT EARN 14%. Simply put, the average is not a reliable guide to real returns in the case of exponential growth. WE AVERAGE 14% PER PERIOD

EXPECTATION governs SUMS but sums are in the exponent EXAMPLE: x p(x) Log[x] p(x) 0.8 0.3 -0.029073 $1 X 1.2 0.5 0.039591 1.5 0.2 0.035218 E Log10[X] = 0.105311 100.105311.. = 1.11106.. With INDEPENDENT plays your RANDOM return will compound at 11.1% not 14%. (more about this later in the course)

THREE RANDOM EVOLUTIONS COMPARING 1.14^n WITH THREE RANDOM EVOLUTIONS you can see that 14% exceeds reality