Published by Shanon Bryant. Modified over 6 years ago.
1
4: Probability
Part A: Concepts & binomial distributions
Part B: Normal distributions
11/13/2018 Unit 4: Intro to probability Biostat
2
Definitions
Random variable: a numerical quantity that takes on different values depending on chance
Population: the set of all possible values for a random variable
Event: an outcome or set of outcomes for a random variable
Probability: the proportion of times an event occurs in the population; the (long-run) expected proportion
3
Probability (definition #1)
The probability of an event is its relative frequency (proportion) in the population.
Example: Let A = selecting a female at random from an HIV+ population.
There are 600 people in the population, of whom 159 are female.
Therefore, Pr(A) = 159 ÷ 600 = 0.265
4
Probability (definition #2)
The probability of an event is its expected proportion when the process is repeated again and again under the same conditions.
Example: Select 100 individuals at random; 24 are female.
Pr(A) ≈ 24 ÷ 100 = 0.24
This is only an estimate (unless n is very, very big).
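The first two definitions can be compared in a short simulation. This is only an illustrative sketch: the function names and the fixed seed are assumptions, not part of the course material, and draws are made with replacement for simplicity.

```python
import random

POPULATION_SIZE = 600  # people in the HIV+ population (from the example)
FEMALES = 159          # females in that population (from the example)


def exact_probability():
    """Definition #1: relative frequency in the whole population."""
    return FEMALES / POPULATION_SIZE


def estimated_probability(n, seed=0):
    """Definition #2: proportion of females among n random draws.

    Draws are made with replacement (an assumption for simplicity), so each
    draw selects a female with probability FEMALES / POPULATION_SIZE.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    hits = sum(1 for _ in range(n) if rng.randrange(POPULATION_SIZE) < FEMALES)
    return hits / n


if __name__ == "__main__":
    print("exact:", exact_probability())
    for n in (100, 10_000, 100_000):
        print(n, estimated_probability(n))
```

With small n the estimate can wander noticeably from 0.265; it tends to settle near the population proportion as n grows.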
5
Probability (definition #3)
The probability of an event is a quantifiable level of belief between 0 and 1.
Probability  Verbal expression
0.00         Never
0.05         Seldom
0.20         Infrequent
0.50         As often as not
0.80         Very frequent
0.95         Highly likely
1.00         Always
Example: Prior experience suggests a quarter of the population is female. Therefore, Pr(A) ≈ 0.25
6
Some rules of probability
7
Types of random variables
Discrete random variables have a finite set of possible outcomes, e.g., number of females in a sample of size n (0, 1, 2, …, n). We cover binomial random variables.
Continuous random variables have a continuum of possible outcomes, e.g., average body weight (lbs) in a sample (160, 160.5, …). We cover Normal random variables.
There are other random variable families, but only binomial and Normal RVs are covered for now.
8
Binomial distributions
Most popular type of discrete RV
Based on the Bernoulli trial: a random event characterized by “success” or “failure”
Examples: coin flip (heads or tails); survival (yes or no)
9
Binomial random variables
Binomial random variable: the random number of successes in n independent Bernoulli trials
A family of distributions identified by two parameters:
  n = number of trials
  p = probability of success for each trial
Notation: X~b(n, p)
  X = random variable
  ~ = “distributed as”
  b(n, p) = binomial RV with parameters n and p
10
“Four patients” example
A treatment is successful 75% of the time.
We treat 4 patients.
X = random number of successes, which varies (0, 1, 2, 3, or 4) according to a binomial distribution
X~b(4, 0.75)
11
Binomial formula
The probability of i successes is
Pr(X = i) = nCi · p^i · q^(n–i)
where
  nCi = the binomial coefficient (next slide)
  p = probability of success for each trial
  q = probability of failure = 1 – p
12
Binomial coefficient (“choose function”)
nCi = n! / (i! · (n – i)!)
where ! is the factorial function: x! = x · (x – 1) · (x – 2) · … · 1
Example: 4! = 4 · 3 · 2 · 1 = 24
By definition, 1! = 1 and 0! = 1
nCi = the number of ways to choose i items out of n
Example: “4 choose 2”: 4C2 = 4! / (2! · 2!) = 24 / 4 = 6
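The choose function can be sketched directly from its factorial definition. The helper name `choose` is an assumption made for illustration; Python's standard library already offers `math.comb`, which computes the same quantity.

```python
from math import comb, factorial


def choose(n, i):
    """Number of ways to choose i items out of n: n! / (i! * (n - i)!)."""
    return factorial(n) // (factorial(i) * factorial(n - i))


if __name__ == "__main__":
    print(factorial(4))                # 4! = 4 * 3 * 2 * 1
    print(choose(4, 2))                # "4 choose 2"
    print(choose(4, 2) == comb(4, 2))  # agrees with the built-in
```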
13
“Four patients” example
n = 4 and p = 0.75 (so q = 1 – 0.75 = 0.25)
Question: What is the probability of 0 successes? (i = 0)
Pr(X = 0) = nCi · p^i · q^(n–i) = 4C0 · 0.75^0 · 0.25^(4–0) = 1 · 1 · 0.0039 = 0.0039
14
X~b(4, 0.75), continued
Pr(X = 1) = 4C1 · 0.75^1 · 0.25^(4–1) = 4 · 0.75 · 0.0156 = 0.0469
Pr(X = 2) = 4C2 · 0.75^2 · 0.25^(4–2) = 6 · 0.5625 · 0.0625 = 0.2109
(Do not demonstrate all calculations. Students should prove to themselves that they can derive and interpret these values.)
15
X~b(4, 0.75), continued
Pr(X = 3) = 4C3 · 0.75^3 · 0.25^(4–3) = 4 · 0.4219 · 0.25 = 0.4219
Pr(X = 4) = 4C4 · 0.75^4 · 0.25^(4–4) = 1 · 0.3164 · 1 = 0.3164
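The hand calculations for the “four patients” example can be checked with a few lines of code. This is a minimal sketch of the binomial formula; the function name `binomial_pmf` is an assumption chosen for illustration.

```python
from math import comb


def binomial_pmf(i, n, p):
    """Pr(X = i) for X ~ b(n, p): nCi * p^i * q^(n - i), with q = 1 - p."""
    q = 1 - p
    return comb(n, i) * p**i * q ** (n - i)


if __name__ == "__main__":
    n, p = 4, 0.75  # the "four patients" example
    for i in range(n + 1):
        print(i, round(binomial_pmf(i, n, p), 4))
```

The five probabilities should sum to 1, which is a quick sanity check on any hand-derived table.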
16
The distribution X~b(4, 0.75)
Probability table for X~b(4, 0.75)
Successes  Probability
0          0.0039
1          0.0469
2          0.2109
3          0.4219
4          0.3164
[Figure: probability curve for X~b(4, 0.75)]
17
Area under the curve (AUC) concept
The area under a probability curve (AUC) = probability! Get it?
Example: Pr(X = 2) = .2109
18
Cumulative probability (left tail)
Cumulative probability = Pr(X ≤ i) = probability of a value less than or equal to i
Illustrative example: X~b(4, .75)
Pr(X ≤ 0) = Pr(X = 0) = .0039
Pr(X ≤ 1) = Pr(X ≤ 0) + Pr(X = 1) = .0039 + .0469 = .0508
Pr(X ≤ 2) = Pr(X ≤ 1) + Pr(X = 2) = .0508 + .2109 = .2617
Pr(X ≤ 3) = Pr(X ≤ 2) + Pr(X = 3) = .2617 + .4219 = .6836
Pr(X ≤ 4) = Pr(X ≤ 3) + Pr(X = 4) = .6836 + .3164 = 1.0000
19
X~b(4, 0.75)
i  Probability function Pr(X = i)  Cumulative probability Pr(X ≤ i)
0  0.0039                          0.0039
1  0.0469                          0.0508
2  0.2109                          0.2617
3  0.4219                          0.6836
4  0.3164                          1.0000
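The cumulative column is just a running sum of the probability column. A small sketch, assuming the hypothetical helper names `binomial_pmf` and `binomial_cdf_table`:

```python
from itertools import accumulate
from math import comb


def binomial_pmf(i, n, p):
    """Pr(X = i) for X ~ b(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binomial_cdf_table(n, p):
    """Return [(i, Pr(X = i), Pr(X <= i)), ...] for i = 0..n."""
    pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
    cdf = list(accumulate(pmf))  # running sum of the probabilities
    return list(zip(range(n + 1), pmf, cdf))


if __name__ == "__main__":
    for i, prob, cum in binomial_cdf_table(4, 0.75):
        print(f"{i}  {prob:.4f}  {cum:.4f}")
```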
20
Cumulative probability
Left tail = cumulative probability
Area under the shaded bars in the left tail sums to .2617, i.e., Pr(X ≤ 2) = .2617
Area under “curve” = probability
Bring it on!
21
Reasoning
Use the probability model to reason about chance.
I hypothesize p = 0.75, but observe only 2 successes. Should I doubt my hypothesis?
ANS: No. When p = 0.75, you’ll see 2 or fewer successes about 26% of the time (not that unusual).
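The left-tail probability behind this reasoning can be computed directly. The sketch below uses an assumed helper name, `left_tail`, and deliberately applies no formal cutoff for “unusual”, since the slide does not give one.

```python
from math import comb


def left_tail(k, n, p):
    """Pr(X <= k) for X ~ b(n, p): sum of Pr(X = i) for i = 0..k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # Hypothesis: p = 0.75 with n = 4 patients; observed: 2 successes.
    tail = left_tail(2, 4, 0.75)
    print(f"Pr(X <= 2) = {tail:.4f}")
```

A result this large says that 2 or fewer successes is a fairly ordinary outcome under the hypothesis, so the observation alone gives little reason to doubt p = 0.75.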
22
StaTable probability calculator
Link on course homepage
Three versions: Java (browser), Windows, Palm
Probability and cumulative probability
23
Intro to Probability, Part B
The Normal distributions
24
The Normal distributions
Most popular continuous model
Recognized by de Moivre (1667–1754)
Extended by Laplace (1749–1827)
“How’s my hair?” “Looks good.”
25
Probability density function (curve)
Example: vocabulary scores of 947 seventh graders
The smooth curve drawn over the histogram is a model of the actual distribution
The mathematical model is the Normal probability density function (pdf)
26
Area under curve
The area under the curve (AUC) concept applies
The shaded bars (left tail) represent scores ≤ 6.0 = 30.3% of scores
Pr(X ≤ 6) = 0.303
27
Areas under curve (cont.)
Now translate this to the area under the curve (AUC)
The scale of the Y-axis is adjusted so the total AUC = 1
The AUC to the left of 6.0 (shaded) = 0.293
Therefore, the AUC “models” the proportional area in the bars of the histogram, i.e., the probabilities of the associated ranges
28
Density Curves
29
Normal distributions
Normal distributions = a family of distributions with common characteristics
Normal distributions have two parameters:
  Mean µ locates the center of the curve
  Standard deviation σ quantifies spread (at points of inflection)
[Figure: Normal curve; arrows indicate points of inflection]
30
68-95-99.7 rule for Normal RVs
68% of AUC falls within 1 standard deviation of the mean (µ ± σ)
95% falls within 2 standard deviations (µ ± 2σ)
99.7% falls within 3 standard deviations (µ ± 3σ)
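The rule's percentages are rounded values of Standard Normal areas, which can be checked with Python's standard library. This is a minimal sketch; the helper name `area_within` is an assumption for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mean 0, standard deviation 1


def area_within(k):
    """AUC within k standard deviations of the mean: Pr(-k <= Z <= k)."""
    return Z.cdf(k) - Z.cdf(-k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(area_within(k), 4))
```

Because every Normal distribution standardizes to Z, the same three areas apply for any µ and σ.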
31
Illustrative example: WAIS
Wechsler Adult Intelligence Scale (WAIS) scores vary according to a Normal distribution with μ = 100 and σ = 15
32
Another example (male height)
Adult male height is approximately Normal with µ = 70.0 inches and σ = 2.8 inches (NHANES, 1980)
Shorthand: X ~ N(70, 2.8)
Therefore:
  68% of heights = µ ± σ = 70.0 ± 2.8 = 67.2 to 72.8
  95% of heights = µ ± 2σ = 70.0 ± 2(2.8) = 64.4 to 75.6
  99.7% of heights = µ ± 3σ = 70.0 ± 3(2.8) = 61.6 to 78.4
33
Another example (male height)
What proportion of men are less than 72.8 inches tall? (Note: 72.8 is one σ above μ)
By the 68-95-99.7 rule, 68% of heights fall within one σ of μ (between −1 and +1 on the standardized scale), leaving 16% in each tail.
Therefore, 16% + 68% = 84% of men are less than 72.8 inches tall.
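The male-height intervals and the 84% figure can be reproduced as a rough check. The sketch below assumes the slide's values (µ = 70.0, σ = 2.8); the helper names `interval` and `proportion_below` are hypothetical.

```python
from statistics import NormalDist

MU, SIGMA = 70.0, 2.8  # adult male height in inches (from the example)


def interval(k, mu=MU, sigma=SIGMA):
    """Return (mu - k*sigma, mu + k*sigma), rounded to one decimal place."""
    return (round(mu - k * sigma, 1), round(mu + k * sigma, 1))


def proportion_below(x, mu=MU, sigma=SIGMA):
    """Pr(X < x) for X ~ N(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(x)


if __name__ == "__main__":
    for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%")):
        print(pct, interval(k))
    print(round(proportion_below(72.8), 4))  # close to the rule's 84%
```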
34
Male Height Example
What proportion of men are less than 68 inches tall?
68 does not fall on a ±σ marker. To determine the AUC, we must first standardize the value.
[Figure: Normal curve of height with the area to the left of 68 marked “?”; µ = 70]
35
Standardized value = z score
To standardize a value, simply subtract μ and divide by σ: z = (x – μ) / σ
This is now a z-score
The z-score tells you the number of standard deviations the value falls from μ
36
Example: Standardize a male height of 68”
Recall X ~ N(70, 2.8)
z = (x – μ) / σ = (68 – 70) / 2.8 = −0.71
Therefore, the value 68 is 0.71 standard deviations below the mean of the distribution
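Standardizing is a one-line calculation. A minimal sketch, with `z_score` as an assumed function name:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from mu (negative = below)."""
    return (x - mu) / sigma


if __name__ == "__main__":
    z = z_score(68, 70, 2.8)  # male height of 68 inches, X ~ N(70, 2.8)
    print(round(z, 2))
```

Rounding to two decimals matches the precision of a printed Z table.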
37
Men’s Height (NHANES, 1980)
What proportion of men are less than 68 inches tall? = What proportion of a Standard Normal (Z) curve is less than −0.71?
[Figure: the same curve on two scales: height values (68, 70) and standardized values]
You can now look up the AUC in a Standard Normal “Z” table.
38
Using the Standard Normal table
z      .00    .01    .02
−0.8   .2119  .2090  .2061
−0.7   .2420  .2389  .2358
−0.6   .2743  .2709  .2676
Pr(Z ≤ −0.71) = .2389
39
Summary (finding Normal probabilities)
Draw curve w/ landmarks
Shade area
Standardize value(s)
Use Z table to find appropriate AUC
[Figure: Normal curve with height values and standardized values; shaded left-tail area = .2389]
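The standardize-then-look-up steps can be mirrored in code, with the Standard Normal cumulative function standing in for the printed Z table. This is a sketch under stated assumptions: the function name `left_tail_area` and its `table_precision` flag are hypothetical, and the flag rounds z to two decimals to imitate a table lookup.

```python
from statistics import NormalDist


def left_tail_area(x, mu, sigma, table_precision=True):
    """Pr(X <= x) for X ~ N(mu, sigma), found by standardizing first."""
    z = (x - mu) / sigma  # standardize the value
    if table_precision:
        z = round(z, 2)   # printed Z tables index z to two decimals
    return NormalDist().cdf(z)  # AUC to the left of z on the Z curve


if __name__ == "__main__":
    print(round(left_tail_area(68, 70, 2.8), 4))
    print(round(left_tail_area(68, 70, 2.8, table_precision=False), 4))
```

The two results differ slightly in the third decimal place because the table approach rounds z before the lookup.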
40
Right “tail”
What proportion of men are greater than 68” tall?
“Greater than” means look at the right “tail”
Area in right tail = 1 – (area in left tail) = 1 – .2389 = .7611
Therefore, 76.11% of men are taller than 68 inches.
[Figure: Normal curve with height values (68, 70) and standardized values; left-tail area = .2389]
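The complement rule for the right tail can be sketched as follows. The function name `right_tail_area` is an assumption; it works from the unrounded z, so its answer differs a little from the table-based .7611.

```python
from statistics import NormalDist


def right_tail_area(x, mu, sigma):
    """Pr(X > x) = 1 - Pr(X <= x) for X ~ N(mu, sigma)."""
    return 1 - NormalDist(mu, sigma).cdf(x)


if __name__ == "__main__":
    # Table-style answer: complement of the left-tail area at z = -0.71
    print(round(1 - NormalDist().cdf(-0.71), 4))
    # Same question without rounding z first
    print(round(right_tail_area(68, 70, 2.8), 4))
```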
41
Z percentiles
zp = the z score with cumulative probability p
What is the 50th percentile on Z? ANS: z.5 = 0
What is the 2.5th percentile on Z? ANS: z.025 ≈ −2 (more precisely, −1.96)
What is the 97.5th percentile on Z? ANS: z.975 ≈ 2 (more precisely, 1.96)
42
Finding Z percentile in the table
Look up the closest entry in the table
Find the corresponding z score
e.g., What is the 1st percentile on Z? z.01 = −2.33 (closest cumulative proportion is .0099)
z      .02    .03    .04
−2.3   .0102  .0099  .0096
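Reading a percentile from the table is the inverse of a cumulative-probability lookup. A minimal sketch using the standard library's inverse cumulative function; the wrapper name `z_percentile` is an assumption.

```python
from statistics import NormalDist


def z_percentile(p):
    """z score with cumulative probability p (0 < p < 1) on the Standard Normal."""
    return NormalDist().inv_cdf(p)


if __name__ == "__main__":
    for p in (0.01, 0.025, 0.50, 0.975):
        print(p, round(z_percentile(p), 2))
```

The computed values are not restricted to the table's two-decimal grid, so they may differ from a “closest entry” lookup in the last digit.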
43
Unstandardizing a value
How tall must a man be to place in the lower 10% for men aged 18 to 24?
[Figure: Normal curve of height values with left-tail area .10 shaded; cutoff height marked “?”]
44
Table A: Standard Normal Table
Use Table A
Look up the closest proportion in the table
Find the corresponding standardized score
Solve for X (“un-standardize” the score)
45
Table A: Standard Normal Proportion
z      .07    .08    .09
−1.3   .0853  .0838  .0823
−1.2   .1020  .1003  .0985
−1.1   .1210  .1190  .1170
Pr(Z < −1.28) = .1003
46
Men’s Height Example (NHANES, 1980)
How tall must a man be to place in the lower 10% for men aged 18 to 24?
[Figure: Normal curve with left-tail area .10 shaded, shown on height values (cutoff marked “?”) and standardized values]
47
Observed Value for a Standardized Score
“Unstandardize” the z-score to find the associated x: x = μ + zσ
48
Observed Value for a Standardized Score
x = μ + zσ = 70 + (−1.28)(2.8) = 70 + (−3.58) = 66.42
A man would have to be approximately 66.42 inches tall or less to place in the lower 10% of the population
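The un-standardizing step can be sketched in code and compared with a direct inverse lookup on the height distribution. The function name `unstandardize` is an assumption for illustration; µ = 70, σ = 2.8, and z = −1.28 come from the example.

```python
from statistics import NormalDist


def unstandardize(z, mu, sigma):
    """Observed value for a standardized score: x = mu + z * sigma."""
    return mu + z * sigma


if __name__ == "__main__":
    # Table-based route: z for the lower 10% is about -1.28
    print(round(unstandardize(-1.28, 70, 2.8), 2))
    # Direct route: 10th percentile of N(70, 2.8) without rounding z
    print(round(NormalDist(70, 2.8).inv_cdf(0.10), 2))
```

The two routes agree to within about a hundredth of an inch; the small gap comes from rounding z to two decimals in the table.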