STATISTICS AND PROBABILITY

Raoul LePage, Professor, STATISTICS AND PROBABILITY. Click on STT315_Fall 06. Week of

SUGGESTED EXERCISES (Week 2)
Answers are in the text; do not submit. 2-61, 2-63, 2-65, 2-67, 2-69, 2-73, 2-75 ("or both" is redundant), 2-77 (i.e. P(B up | A up)), 3-1, 3-5, 3-11, 3-15, 3-17, 3-23 and s.d., 3-25.

TREE DIAGRAM
"oil" = oil is present; "+" = a test for oil is positive; "-" = a test for oil is negative.
Branches:
oil, then +
oil, then - (false negative)
no oil, then + (false positive)
no oil, then -

TOTAL OF BRANCHES = 1
Given: P(oil) = 0.3, P(+ | oil) = 0.9, P(+ | no oil) = 0.4.
The unconditional probabilities at the first split sum to one: 0.3 (oil) + 0.7 (no oil) = 1.

COMPLETE TREE
Given: P(oil) = 0.3, P(+ | oil) = 0.9, P(+ | no oil) = 0.4.
unconditional    conditional    outcome
0.3 oil          0.9 +          0.27 oil+
0.3 oil          0.1 -          0.03 oil-
0.7 no oil       0.4 +          0.28 no oil+
0.7 no oil       0.6 -          0.42 no oil-
Each outcome probability is the product of the probabilities along its branch, e.g. 0.3 x 0.9 = 0.27.

VENN DIAGRAM
The sample space S splits into four regions carrying the same probabilities as the leaves of the tree: oil+ 0.27, oil- 0.03, no oil+ 0.28, no oil- 0.42.

TOTAL PROBABILITY
P(+) = P(oil+) + P(no oil+) = 0.27 + 0.28 = 0.55.
Oil contributes 0.27 to the total P(+) = 0.55.

BAYES FORMULA
P(oil | +) = P(oil+) / P(+) = 0.27 / (0.27 + 0.28) = 0.27 / 0.55 ≈ 0.4909.
Oil contributes 0.27 of the total P(+) = 0.55.
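The tree, total-probability and Bayes steps above can be checked numerically. What follows is a minimal Python sketch that assumes only the three inputs given on the slides; the variable names are illustrative and not part of the course material.

```python
# Inputs quoted on the slides.
p_oil = 0.3               # P(oil)
p_pos_given_oil = 0.9     # P(+ | oil)
p_pos_given_no_oil = 0.4  # P(+ | no oil)

# Joint probabilities: multiply along each branch of the tree.
joint = {
    ("oil", "+"): p_oil * p_pos_given_oil,
    ("oil", "-"): p_oil * (1 - p_pos_given_oil),
    ("no oil", "+"): (1 - p_oil) * p_pos_given_no_oil,
    ("no oil", "-"): (1 - p_oil) * (1 - p_pos_given_no_oil),
}

# Total probability: P(+) = P(oil+) + P(no oil+).
p_pos = joint[("oil", "+")] + joint[("no oil", "+")]

# Bayes formula: P(oil | +) = P(oil+) / P(+).
p_oil_given_pos = joint[("oil", "+")] / p_pos

print(round(p_pos, 4), round(p_oil_given_pos, 4))
```

A positive test raises the chance of oil from 0.3 to roughly 0.49, which is still short of even odds.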

MEDICAL TEST
P(disease) = 0.01, P(no disease) = 0.99.
P(+ | disease) = 0.98, P(- | disease) = 0.02.
P(+ | no disease) = 0.03, P(- | no disease) = 0.97.
The test for this infrequent disease seems to be reliable, having only 3% false positives and 2% false negatives. What if we test positive?

MEDICAL TEST
disease+       0.01 x 0.98 = 0.0098
disease-       0.01 x 0.02 = 0.0002
no disease+    0.99 x 0.03 = 0.0297
no disease-    0.99 x 0.97 = 0.9603
We need to calculate P(diseased | +), the conditional probability that we have this disease GIVEN we've tested positive for it.

CALCULATING OUR CHANCES OF HAVING THE DISEASE IF +
P(+) = P(disease+) + P(no disease+) = 0.0098 + 0.0297 = 0.0395.
P(disease | +) = P(disease+) / P(+) = 0.0098 / 0.0395 ≈ 0.248, only 25%!

FALSE POSITIVE PARADOX
One may overwhelm a good test by failing to screen.
With P(disease) = 0.01: disease+ 0.0098, disease- 0.0002, no disease+ 0.0297, no disease- 0.9603.
EVEN FOR THIS ACCURATE TEST: P(diseased | +) is only around 25%, because the non-diseased group is so predominant that most positives come from it.

FALSE POSITIVE PARADOX (rare disease)
One may overwhelm a good test by failing to screen.
With P(disease) = 0.001 (rare!): disease+ = 0.001 x 0.98 = 0.00098 and no disease+ = 0.999 x 0.03 = 0.02997, so P(+) = 0.03095.
WHEN THE DISEASE IS TRULY RARE: P(diseased | +) = 0.00098 / 0.03095 is a mere 3.2%, because the huge non-diseased group has completely overwhelmed the test, which no longer has value.
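The two cases above differ only in how common the disease is. A small Python sketch, assuming the slide's error rates (98% of diseased test positive, 3% of non-diseased test positive), makes the dependence on prevalence explicit; the function name `posterior` and its parameter names are illustrative.

```python
def posterior(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | +) by Bayes formula."""
    true_pos = prevalence * sensitivity                  # P(disease+)
    false_pos = (1 - prevalence) * false_positive_rate   # P(no disease+)
    return true_pos / (true_pos + false_pos)

# The two prevalences used on the slides.
for prevalence in (0.01, 0.001):
    print(prevalence, round(posterior(prevalence, 0.98, 0.03), 3))
```

Screening raises the prevalence among those tested, which is exactly the quantity that drives P(disease | +) up.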

IMPLICATIONS OF THE PARADOX
FOR MEDICAL PRACTICE: Good diagnostic tests will be of little use if the system is overwhelmed by lots of healthy people taking the test. Screen patients first.
FOR BUSINESS: Good sales people capably focus their efforts on likely buyers, leading to increased sales. They can be rendered ineffective by feeding them too many false leads, as with massive un-targeted sales promotions.

RANDOM VARIABLE (3-17 of text)
[Table: boats/month, probability, total]
P(fewer than 3.7) = .4
P(4 to 7) = .55

OIL DRILLING EXAMPLE
P(oil) = 0.3. Cost to drill: 130. Reward for oil: 400.
Net return from the policy "just drill":
drill, oil: 400 - 130 = 270 (probability 0.3)
drill, no oil: -130 (probability 0.7)
A random variable is just a numerical function over the outcomes of a probability experiment.

EXPECTATION
Definition of E X: E X = sum of value times probability, i.e. the sum over x of x p(x).
Key properties:
E(a X + b) = a E(X) + b
E(X + Y) = E(X) + E(Y) (always, if such exist)
Examples:
a. E(sum of 13 dice) = 13 E(one die) = 13(3.5) = 45.5.
b. E(0.82 Ford US + Ford Germany - 20M) = 0.82 E(Ford US) + E(Ford Germany) - 20M, regardless of any possible dependence.

TOTAL OF 2 DICE (3-15 of text)
total    probability    product
2        1/36           2/36
3        2/36           6/36
4        3/36           12/36
5        4/36           20/36
6        5/36           30/36
7        6/36           42/36
8        5/36           40/36
9        4/36           36/36
10       3/36           30/36
11       2/36           22/36
12       1/36           12/36
sum      36/36          252/36 = 7
E(total) is just twice the 3.5 avg for one die.
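The claim that E(total) is twice the one-die average can be confirmed by enumerating all 36 equally likely rolls. This is a minimal Python sketch using exact fractions; the names `pmf`, `e_total` and `e_one` are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Build the distribution of the total of two fair dice by enumeration.
pmf = {}
for a, b in product(faces, repeat=2):
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)

# Expectation = sum of value times probability.
e_total = sum(total * p for total, p in pmf.items())
e_one = Fraction(sum(faces), 6)  # 3.5 for a single die

print(e_total, 2 * e_one)
```

Both printed values are 7, in line with E(X + Y) = E(X) + E(Y).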

E(number of boats this month) (3-17 of text)
[Table: boats/month, probability, product, total]
We avg 4.05 boats per month: E(number of boats this month) = 4.05.

EXPECTATION IN THE OIL EXAMPLE
Expected return from the policy "just drill" is the probability-weighted average NET return:
E(NET) = (0.3)(270) + (0.7)(-130) = 81 - 91 = -10.
Net returns: drill, oil = 270 (probability 0.3); drill, no oil = -130 (probability 0.7).

OIL EXAMPLE WITH A "TEST FOR OIL"
A test costing 20 is available. This test has P(test + | oil) = 0.9 and P(test + | no-oil) = 0.4.
"Costs": TEST 20, DRILL 130. Reward for OIL: 400.
Joint probabilities: oil+ 0.27, oil- 0.03, no oil+ 0.28, no oil- 0.42.
Is it worth 20 to test first?

EXPECTED RETURN IF WE "TEST FIRST"
Policy: drill only if the test is +.
outcome     net return               prob    prod
oil+        400 - 130 - 20 = 250     0.27    67.5
oil-        -20                      0.03    -0.6
no oil+     -130 - 20 = -150         0.28    -42.0
no oil-     -20                      0.42    -8.4
total                                1.00    16.5
E(NET) = 0.27(250) + 0.03(-20) + 0.28(-150) + 0.42(-20) = 16.5 (for the "test first" policy). This average return is much preferred over the E(NET) = -10 of the "just drill" policy.
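The comparison of the two policies can be reproduced in a few lines. This Python sketch assumes the joint probabilities and costs quoted on the slides; the helper names `net_just_drill` and `net_test_first` are illustrative.

```python
# Joint probabilities of (ground truth, test result) from the tree.
joint = {
    ("oil", "+"): 0.27, ("oil", "-"): 0.03,
    ("no oil", "+"): 0.28, ("no oil", "-"): 0.42,
}
TEST_COST, DRILL_COST, OIL_REWARD = 20, 130, 400

def net_just_drill(ground):
    """Always drill, never test."""
    return (OIL_REWARD if ground == "oil" else 0) - DRILL_COST

def net_test_first(ground, result):
    """Pay for the test, then drill only if the test is positive."""
    net = -TEST_COST
    if result == "+":
        net -= DRILL_COST
        if ground == "oil":
            net += OIL_REWARD
    return net

# Expected net return = sum of net return times probability.
e_just_drill = sum(p * net_just_drill(g) for (g, r), p in joint.items())
e_test_first = sum(p * net_test_first(g, r) for (g, r), p in joint.items())

print(round(e_just_drill, 2), round(e_test_first, 2))
```

The gap between the two expectations, 26.5, is what the test is worth on average before its cost of 20 is subtracted, so testing pays here.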

VARIANCE AND S.D. OF BOATS/MONTH (3-17 of text)
[Table columns: x, p(x), x p(x), x^2 p(x), (x - 4.05)^2 p(x), with a total row]
quantity       E X     E X^2              E (X - E X)^2
terminology    mean    mean of squares    variance = mean of sq dev
s.d. = root(2.7474) ≈ 1.6575

VARIANCE AND STANDARD DEVIATION
Definition: Var(X) = E (X - E X)^2.
Computing formula: Var(X) = E (X^2) - (E X)^2.
That is, Var(X) is the expected square deviation of r.v. X from its own expectation.
Caution: The computing formula, although perfectly accurate mathematically, is sensitive to rounding errors.
Key properties:
Var(a X + b) = a^2 Var(X) (b has no effect).
sd(a X + b) = |a| sd(X).
Var(X + Y) = Var(X) + Var(Y) if X is independent of Y.
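The definition, the computing formula and the scaling property can be compared side by side. In the Python sketch below the pmf is an assumption for illustration only: the boats/month table did not survive, so these values were chosen merely to agree with the figures quoted earlier (P(fewer than 3.7) = .4, P(4 to 7) = .55, mean 4.05). The helper `expect` is likewise illustrative.

```python
import math

# ASSUMED pmf for boats/month (illustrative; not taken from the slides).
pmf = {2: 0.20, 3: 0.20, 4: 0.30, 5: 0.10, 6: 0.10, 7: 0.05, 8: 0.05}

def expect(g):
    """E g(X): sum of g(x) times p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)
var_def = expect(lambda x: (x - mean) ** 2)    # definition
var_comp = expect(lambda x: x * x) - mean ** 2  # computing formula
sd = math.sqrt(var_def)

# Scaling property: Var(aX + b) = a^2 Var(X); b has no effect.
a, b = 3, 10
mean_ab = expect(lambda x: a * x + b)
var_ab = expect(lambda x: (a * x + b - mean_ab) ** 2)

print(round(mean, 4), round(var_def, 4), round(var_comp, 4), round(sd, 4))
print(round(var_ab, 4), round(a * a * var_def, 4))
```

With this assumed pmf the variance comes out near the 2.7474 quoted for the text's example; small differences in the last digit are the kind of rounding effect the caution above refers to.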

EXPECTATION AND INDEPENDENCE
Random variables X, Y are INDEPENDENT if p(x, y) = p(x) p(y) for all possible values x, y.
If random variables X, Y are INDEPENDENT, then:
E (X Y) = (E X) (E Y), echoing the product rule above.
Var(X + Y) = Var(X) + Var(Y).
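The product rule for expectations depends on independence. The Python sketch below uses two fair dice as a stand-in example (the dice are chosen for illustration, not taken from this slide): it checks the rule for independent dice and shows it failing when Y is forced to equal X.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # p(x) for one fair die

e_x = sum(x * p for x in faces)  # E X = 7/2

# Independent dice: the joint pmf factors as p(x) p(y).
e_xy_independent = sum(x * y * p * p for x, y in product(faces, repeat=2))

# Fully dependent case: Y = X, so X Y = X^2.
e_xy_dependent = sum(x * x * p for x in faces)

print(e_xy_independent, e_x * e_x, e_xy_dependent)
```

The first two values agree (49/4), while the dependent case gives 91/6, so E(X Y) = (E X)(E Y) cannot be assumed without independence.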

PRICE RELATIVES: EXPECTED RETURN
Venture one returns random variable X per $1 investment. This X is termed the "price relative."
This random X may in turn be reinvested in venture two, which returns random variable Y per $1 investment.
The return from $1 invested at the outset is the product random variable XY.
If INDEPENDENT, E(X Y) = (E X) (E Y).

PARADOX OF GROWTH
EXAMPLE: [Table: x, p(x), x p(x) for the return $X per $1]
E(X) = 1.14, so WE AVERAGE 14% PER PERIOD, BUT YOU WILL NOT EARN 14%.
Simply put, the average is not a reliable guide to real returns in the case of exponential growth.

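EXPECTATION governs SUMS but sums are in the exponent
EXAMPLE: [Table: x, p(x), Log10[x] p(x) for the return $X per $1]
After n plays, $1 becomes X1 X2 ... Xn = 10^(Log10[X1] + ... + Log10[Xn]), and the sum in the exponent averages E Log10[X] per play.
With INDEPENDENT plays your RANDOM return will compound at 11.1%, not 14% (more about this later in the course).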
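The gap between the 14% average and the 11.1% compounding rate can be reproduced directly. In the Python sketch below the distribution of X is an assumption: the slide's table did not survive, so these three values and probabilities were picked only because they give the quoted E(X) = 1.14 and a compounding rate of about 11.1%.

```python
import math

# ASSUMED distribution of the price relative X (illustrative only).
pmf = {0.8: 0.3, 1.2: 0.5, 1.5: 0.2}

# Arithmetic average return per $1: E X.
mean_return = sum(x * p for x, p in pmf.items())

# Expectation of the exponent: E Log10[X].
e_log10 = sum(p * math.log10(x) for x, p in pmf.items())

# Long-run compounding factor per play: 10 ** E Log10[X].
compound_factor = 10 ** e_log10

print(round(mean_return, 4), round(e_log10, 4), round(compound_factor, 4))
```

The compounding factor is the probability-weighted geometric mean of X, which never exceeds the arithmetic mean E X.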

THREE RANDOM EVOLUTIONS
[Figure: 1.14^n compared with three random evolutions]
You can see that 14% exceeds reality.
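A comparison of this kind can be simulated. The Python sketch below is not the author's figure: it assumes an illustrative distribution for X (values 0.8, 1.2, 1.5 with probabilities 0.3, 0.5, 0.2, chosen so that E(X) = 1.14), a fixed seed, and 10,000 plays per evolution. It works with logs because the compounded wealth itself would overflow a float.

```python
import math
import random

# ASSUMED distribution of X per $1 (illustrative; gives E X = 1.14).
values = [0.8, 1.2, 1.5]
weights = [0.3, 0.5, 0.2]

def realized_rate(rng, n):
    """Per-play growth rate actually realized over n independent plays."""
    plays = rng.choices(values, weights=weights, k=n)
    # Average the exponent: (1/n) * sum of Log10[X_i].
    mean_log10 = sum(math.log10(x) for x in plays) / n
    return 10 ** mean_log10 - 1

rng = random.Random(0)  # fixed seed so the run is repeatable
rates = [realized_rate(rng, 10_000) for _ in range(3)]

for r in rates:
    print(f"realized {r:.3%} per play vs 14% average")
```

Each simulated evolution compounds at close to 11% per play, below the 14% suggested by E(X).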
