4: Probability
  Part A: Concepts & binomial distributions
  Part B: Normal distributions
  11/13/2018 Unit 4: Intro to probability Biostat
Definitions
  Random variable: a numerical quantity that takes on different values depending on chance
  Population: the set of all possible values for a random variable
  Event: an outcome or set of outcomes for a random variable
  Probability: the proportion of times an event occurs in the population; the (long-run) expected proportion
Probability (definition #1)
  The probability of an event is its relative frequency (proportion) in the population.
  Example: Let A ≡ selecting a female at random from an HIV+ population.
  There are 600 people in the population, of whom 159 are female.
  Therefore, Pr(A) = 159 ÷ 600 = 0.265
Probability (definition #2)
  The probability of an event is its expected proportion when the process is repeated again and again under the same conditions.
  Example: Select 100 individuals at random; 24 are female.
  Pr(A) ≈ 24 ÷ 100 = 0.24
  This is only an estimate (unless n is very, very big).
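The long-run idea in definition #2 can be illustrated with a small simulation. This is a sketch that is not part of the original slides: it assumes the population from definition #1 (159 females among 600 people) and draws individuals with replacement. The function name `estimate_pr_female` and the seed are illustrative choices.

```python
import random

TRUE_PROPORTION = 159 / 600  # population proportion from definition #1 (0.265)


def estimate_pr_female(n, rng):
    """Estimate Pr(A) as the proportion of females among n random draws."""
    females = sum(1 for _ in range(n) if rng.random() < TRUE_PROPORTION)
    return females / n


rng = random.Random(2018)  # fixed, arbitrary seed so the run is repeatable
small_estimate = estimate_pr_female(100, rng)      # a rough estimate
large_estimate = estimate_pr_female(200_000, rng)  # much closer to 0.265

print(f"n = 100:     estimate = {small_estimate:.3f}")
print(f"n = 200,000: estimate = {large_estimate:.3f}")
print(f"population:  Pr(A)    = {TRUE_PROPORTION:.3f}")
```

With a small sample the estimate can miss the population value by several percentage points; with a very large sample it settles near 0.265.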
Probability (definition #3)
  The probability of an event is a quantifiable level of belief between 0 and 1.
  Probability   Verbal expression
  0.00          Never
  0.05          Seldom
  0.20          Infrequent
  0.50          As often as not
  0.80          Very frequent
  0.95          Highly likely
  1.00          Always
  Example: Prior experience suggests a quarter of the population is female. Therefore, Pr(A) ≈ 0.25
Some rules of probability
Types of random variables
  Discrete random variables have a finite set of possible outcomes, e.g., number of females in a sample of size n (0, 1, 2, …, n). We cover binomial random variables.
  Continuous random variables have a continuum of possible outcomes, e.g., average body weight (lbs) in a sample (160, 160.5, 160.75, 160.825, …). We cover Normal random variables.
  There are other random variable families, but only binomial and Normal RVs are covered for now.
Binomial distributions
  Most popular type of discrete RV
  Based on the Bernoulli trial: a random event characterized by “success” or “failure”
  Examples: coin flip (heads or tails); survival (yes or no)
Binomial random variables
  Binomial random variable: the random number of successes in n independent Bernoulli trials
  A family of distributions identified by two parameters:
    n = number of trials
    p = probability of success for each trial
  Notation: X ~ b(n, p)
    X = random variable
    ~ = “distributed as”
    b(n, p) = binomial RV with parameters n and p
“Four patients” example
  A treatment is successful 75% of the time.
  We treat 4 patients.
  X = random number of successes, which varies (0, 1, 2, 3, or 4) according to the binomial distribution X ~ b(4, 0.75)
Binomial formula
  The probability of i successes is
  $\Pr(X = i) = {}_nC_i \, p^i q^{n-i}$
  where
    ${}_nC_i$ = the binomial coefficient (next slide)
    p = probability of success for each trial
    q = probability of failure = 1 – p
Binomial coefficient (“choose function”)
  ${}_nC_i = \dfrac{n!}{i!\,(n-i)!}$
  where ! is the factorial function: x! = x · (x – 1) · (x – 2) · … · 1
  Example: 4! = 4 · 3 · 2 · 1 = 24
  By definition, 1! = 1 and 0! = 1
  ${}_nC_i$ = the number of ways to choose i items out of n
  Example: “4 choose 2”: ${}_4C_2 = \dfrac{4!}{2!\,(4-2)!} = \dfrac{24}{2 \cdot 2} = 6$
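The choose function can be checked in a few lines of Python. This is a minimal sketch: the helper name `choose` is illustrative, and it uses the factorial form n! / (i! (n − i)!) alongside the standard-library `math.comb` (Python 3.8+), which computes the same quantity.

```python
import math


def choose(n, i):
    """Binomial coefficient nCi = n! / (i! (n - i)!)."""
    return math.factorial(n) // (math.factorial(i) * math.factorial(n - i))


four_factorial = math.factorial(4)  # 4 * 3 * 2 * 1
four_choose_two = choose(4, 2)      # 24 / (2 * 2)

# math.comb computes the same coefficient directly.
assert four_choose_two == math.comb(4, 2)

print(four_factorial, four_choose_two)  # prints: 24 6
```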
“Four patients” example
  n = 4 and p = 0.75 (so q = 1 – 0.75 = 0.25)
  Question: What is the probability of 0 successes?
  i = 0
  $\Pr(X = 0) = {}_nC_i \, p^i q^{n-i} = {}_4C_0 \cdot 0.75^0 \cdot 0.25^{4-0} = 1 \cdot 1 \cdot 0.0039 = 0.0039$
X ~ b(4, 0.75), continued
  $\Pr(X = 1) = {}_4C_1 \cdot 0.75^1 \cdot 0.25^{4-1} = 4 \cdot 0.75 \cdot 0.0156 = 0.0469$
  $\Pr(X = 2) = {}_4C_2 \cdot 0.75^2 \cdot 0.25^{4-2} = 6 \cdot 0.5625 \cdot 0.0625 = 0.2109$
  (Do not demonstrate all calculations. Students should prove to themselves that they can derive and interpret these values.)
X ~ b(4, 0.75), continued
  $\Pr(X = 3) = {}_4C_3 \cdot 0.75^3 \cdot 0.25^{4-3} = 4 \cdot 0.4219 \cdot 0.25 = 0.4219$
  $\Pr(X = 4) = {}_4C_4 \cdot 0.75^4 \cdot 0.25^{4-4} = 1 \cdot 0.3164 \cdot 1 = 0.3164$
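The five hand calculations for the “four patients” example can be reproduced with a short function. This is a sketch under the slide's own setup (n = 4, p = 0.75); the function name `binomial_pmf` is an illustrative choice, not something from the course materials.

```python
import math


def binomial_pmf(i, n, p):
    """Pr(X = i) for X ~ b(n, p): nCi * p^i * q^(n - i), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, i) * p**i * q ** (n - i)


n, p = 4, 0.75
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

for i, prob in enumerate(pmf):
    print(f"Pr(X = {i}) = {prob:.4f}")

# The probabilities of all possible outcomes must sum to 1.
total = sum(pmf)
```

Rounded to four decimals, the values are 0.0039, 0.0469, 0.2109, 0.4219, and 0.3164.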
The distribution X ~ b(4, 0.75)
  Probability table for X ~ b(4, 0.75)
  Successes   Probability
  0           0.0039
  1           0.0469
  2           0.2109
  3           0.4219
  4           0.3164
  [Figure: probability curve for X ~ b(4, 0.75)]
Area under the curve (AUC) concept
  The area under a probability curve (AUC) = probability! Get it?
  Example: Pr(X = 2) = .2109
Cumulative probability (left tail)
  Cumulative probability = Pr(X ≤ i) = probability of a value less than or equal to i
  Illustrative example: X ~ b(4, .75)
  Pr(X ≤ 0) = Pr(X = 0) = .0039
  Pr(X ≤ 1) = Pr(X ≤ 0) + Pr(X = 1) = .0039 + .0469 = 0.0508
  Pr(X ≤ 2) = Pr(X ≤ 1) + Pr(X = 2) = .0508 + .2109 = 0.2617
  Pr(X ≤ 3) = Pr(X ≤ 2) + Pr(X = 3) = .2617 + .4219 = 0.6836
  Pr(X ≤ 4) = Pr(X ≤ 3) + Pr(X = 4) = .6836 + .3164 = 1.0000
X ~ b(4, 0.75)
  i   Probability function Pr(X = i)   Cumulative probability Pr(X ≤ i)
  0   0.0039                           0.0039
  1   0.0469                           0.0508
  2   0.2109                           0.2617
  3   0.4219                           0.6836
  4   0.3164                           1.0000
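The cumulative column is just a running sum of the probability column. The sketch below assumes the same X ~ b(4, 0.75) setup and uses `itertools.accumulate` for the running sum; the variable names are illustrative.

```python
import math
from itertools import accumulate

n, p = 4, 0.75

# Probability function: Pr(X = i) for i = 0, 1, ..., n
pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]

# Cumulative probability: Pr(X <= i) is the running sum of the pmf
cdf = list(accumulate(pmf))

for i, (prob, cum) in enumerate(zip(pmf, cdf)):
    print(f"i = {i}: Pr(X = i) = {prob:.4f}   Pr(X <= i) = {cum:.4f}")

# Left-tail probability of seeing 2 or fewer successes
pr_two_or_fewer = cdf[2]
```

Here `pr_two_or_fewer` is the left-tail area Pr(X ≤ 2), about 0.2617.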
Cumulative probability
  Left tail = cumulative probability
  The area under the shaded bars in the left tail sums to 0.2617, i.e., Pr(X ≤ 2) = 0.2617
  Area under “curve” = probability
Reasoning
  Use the probability model to reason about chance.
  I hypothesize p = 0.75, but observe only 2 successes in 4 trials. Should I doubt my hypothesis?
  ANS: No. When p = 0.75, you’ll see 2 or fewer successes about 26% of the time (Pr(X ≤ 2) = 0.2617), which is not that unusual.
StaTable probability calculator
  Link on course homepage
  Three versions: Java (browser), Windows, Palm
  Probability; cumulative probability
Intro to Probability, Part B
  The Normal distributions
The Normal distributions
  Most popular continuous model
  Recognized by de Moivre (1667–1754)
  Extended by Laplace (1749–1827)
Probability density function (curve)
  Example: vocabulary scores of 947 seventh graders
  The smooth curve drawn over the histogram is a model of the actual distribution
  The mathematical model is the Normal probability density function (pdf)
Area under curve
  The area under the curve (AUC) concept applies
  The shaded bars (left tail) represent scores ≤ 6.0, which make up 30.3% of scores
  Pr(X ≤ 6) = 0.303
Areas under curve (cont.)
  Now translate this to the area under the curve (AUC)
  The scale of the Y-axis is adjusted so the total AUC = 1
  The AUC to the left of 6.0 (shaded) = 0.293
  Therefore, the AUC “models” the proportional area in the bars of the histogram, i.e., the probabilities of the associated ranges
Density Curves
Normal distributions
  Normal distributions = a family of distributions with common characteristics
  Normal distributions have two parameters:
    Mean μ locates the center of the curve
    Standard deviation σ quantifies spread (at points of inflection)
  [Figure: Normal curve; arrows indicate points of inflection]
68-95-99.7 rule for Normal RVs
  68% of the AUC falls within 1 standard deviation of the mean (μ ± σ)
  95% falls within 2 standard deviations (μ ± 2σ)
  99.7% falls within 3 standard deviations (μ ± 3σ)
Illustrative example: WAIS
  Wechsler adult intelligence scores (WAIS) vary according to a Normal distribution with μ = 100 and σ = 15
Another example (male height)
  Adult male height is approximately Normal with μ = 70.0 inches and σ = 2.8 inches (NHANES, 1980)
  Shorthand: X ~ N(70, 2.8)
  Therefore:
    68% of heights fall within μ ± σ = 70.0 ± 2.8 = 67.2 to 72.8
    95% of heights fall within μ ± 2σ = 70.0 ± 2(2.8) = 64.4 to 75.6
    99.7% of heights fall within μ ± 3σ = 70.0 ± 3(2.8) = 61.6 to 78.4
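The 68-95-99.7 intervals for male height can be checked against a Normal model. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative. The rule's percentages are rounded, so the computed areas differ slightly in the later decimals.

```python
from statistics import NormalDist

height = NormalDist(mu=70.0, sigma=2.8)  # X ~ N(70, 2.8) from the slide

rule = {}
for k in (1, 2, 3):
    low = height.mean - k * height.stdev
    high = height.mean + k * height.stdev
    auc = height.cdf(high) - height.cdf(low)  # area between low and high
    rule[k] = (low, high, auc)
    print(f"within {k} sd: {low:.1f} to {high:.1f} inches, AUC = {auc:.4f}")
```

The computed areas are roughly 0.683, 0.954, and 0.997, which the rule rounds to 68%, 95%, and 99.7%.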
Another example (male height)
  What proportion of men are less than 72.8 inches tall? (Note: 72.8 is one σ above μ)
  By the 68-95-99.7 rule, 68% of heights fall between −1σ and +1σ, leaving 16% in each tail
  Therefore, 16% + 68% = 84% of men are less than 72.8 inches tall
  [Figure: Normal curve of height centered at 70 with 72.8 marked; 68% central area, 16% in each tail]
Male Height Example
  What proportion of men are less than 68 inches tall?
  68 does not fall on a ±σ marker. To determine the AUC, we must first standardize the value.
  [Figure: Normal curve of height centered at 70; area to the left of 68 marked “?”]
Standardized value = z score
  To standardize a value, simply subtract μ and divide by σ:
  $z = \dfrac{x - \mu}{\sigma}$
  This is now a z-score
  The z-score tells you the number of standard deviations the value falls from μ
Example: Standardize a male height of 68”
  Recall X ~ N(70, 2.8)
  $z = \dfrac{x - \mu}{\sigma} = \dfrac{68 - 70}{2.8} = -0.71$
  Therefore, the value 68 is 0.71 standard deviations below the mean of the distribution
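Standardizing is a one-line calculation: subtract μ and divide by σ. The sketch below applies it to the 68-inch example; the function name `z_score` is an illustrative choice.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


z = z_score(68, mu=70, sigma=2.8)
print(round(z, 2))  # prints: -0.71
```

The negative sign indicates that 68 inches lies below the mean.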
Men’s Height (NHANES, 1980)
  What proportion of men are less than 68 inches tall?
  = What proportion of a Standard z curve is less than –0.71?
  You can now look up the AUC in a Standard Normal “Z” table.
  [Figure: height axis (68, 70) aligned with standardized axis (−0.71, 0); area to the left marked “?”]
Using the Standard Normal table
  z       .00     .01     .02
  −0.8    .2119   .2090   .2061
  −0.7    .2420   .2389   .2358
  −0.6    .2743   .2709   .2676
  Pr(Z ≤ −0.71) = .2389
Summary (finding Normal probabilities)
  Draw curve w/ landmarks
  Shade area
  Standardize value(s)
  Use Z table to find appropriate AUC
  [Figure: height axis (68, 70) aligned with standardized axis (−0.71, 0); shaded left-tail AUC = .2389]
Right “tail”
  What proportion of men are greater than 68” tall?
  “Greater than” means look at the right “tail”
  Area in right tail = 1 – (area in left tail)
  1 – .2389 = .7611
  Therefore, 76.11% of men are greater than 68 inches tall.
  [Figure: height axis (68, 70) aligned with standardized axis (−0.71, 0); left tail = .2389, right tail = .7611]
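The Z-table lookup and the right-tail subtraction can be reproduced in software. This sketch uses `statistics.NormalDist` (Python 3.8+) as a stand-in for the printed table, and it assumes the z-score is rounded to −0.71 as on the slides. Using the unrounded z-score would shift the answers slightly.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal: mean 0, standard deviation 1

z = -0.71                   # standardized height of 68 inches, rounded as in the table
left_tail = Z.cdf(z)        # Pr(Z <= -0.71): proportion shorter than 68 inches
right_tail = 1 - left_tail  # Pr(Z > -0.71): proportion taller than 68 inches

print(f"left tail  = {left_tail:.4f}")
print(f"right tail = {right_tail:.4f}")
```

To four decimals these agree with the table values .2389 and .7611.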
Z percentiles
  $z_p$ = the z score with cumulative probability p
  What is the 50th percentile on Z? ANS: $z_{.5} = 0$
  What is the 2.5th percentile on Z? ANS: $z_{.025} \approx -2$ (by the 68-95-99.7 rule; more precisely, −1.96)
  What is the 97.5th percentile on Z? ANS: $z_{.975} \approx 2$ (more precisely, 1.96)
Finding Z percentile in the table
  Look up the closest entry in the table
  Find the corresponding z score
  e.g., What is the 1st percentile on Z?
  z       .02     .03     .04
  −2.3    .0102   .0099   .0096
  The closest cumulative proportion is .0099, so z.01 = −2.33
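Percentiles on Z can also be computed with an inverse cumulative function instead of a reverse table lookup. This sketch uses `NormalDist.inv_cdf` from the Python standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # Standard Normal

z_50 = Z.inv_cdf(0.50)    # 50th percentile
z_025 = Z.inv_cdf(0.025)  # 2.5th percentile
z_975 = Z.inv_cdf(0.975)  # 97.5th percentile
z_01 = Z.inv_cdf(0.01)    # 1st percentile

for label, value in [("z.5", z_50), ("z.025", z_025),
                     ("z.975", z_975), ("z.01", z_01)]:
    print(f"{label} = {value:.2f}")
```

The 2.5th and 97.5th percentiles come out near ∓1.96, which the 68-95-99.7 rule rounds to ∓2, and the 1st percentile rounds to −2.33.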
Unstandardizing a value
  How tall must a man be to place in the lower 10% for men aged 18 to 24?
  [Figure: Normal curve of height centered at 70; left-tail area .10 with unknown cutoff “?”]
Table A: Standard Normal Table
  Use Table A
  Look up the closest proportion in the table
  Find the corresponding standardized score
  Solve for X (“un-standardize” the score)
Table A: Standard Normal Proportion
  z       .07     .08     .09
  −1.3    .0853   .0838   .0823
  −1.2    .1020   .1003   .0985
  −1.1    .1210   .1190   .1170
  Pr(Z < −1.28) = .1003
Men’s Height Example (NHANES, 1980)
  How tall must a man be to place in the lower 10% for men aged 18 to 24?
  [Figure: height axis (?, 70) aligned with standardized axis (−1.28, 0); left-tail area .10]
Observed Value for a Standardized Score
  “Unstandardize” the z-score to find the associated x:
  $x = \mu + z\sigma$
Observed Value for a Standardized Score
  x = μ + zσ
    = 70 + (−1.28)(2.8)
    = 70 + (−3.58) = 66.42
  A man would have to be approximately 66.42 inches tall or less to place in the lower 10% of the population
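The un-standardizing step can be checked two ways: with the table value z = −1.28, as on the slide, and with an inverse cumulative function applied directly to the height distribution. This is a sketch using `statistics.NormalDist` (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 70.0, 2.8  # X ~ N(70, 2.8)

# Table approach: closest table entry to .10 is z = -1.28
z_table = -1.28
x_table = mu + z_table * sigma

# Direct approach: 10th percentile of the height distribution itself
x_exact = NormalDist(mu, sigma).inv_cdf(0.10)

print(f"table z:   x = {x_table:.2f} inches")
print(f"exact 10%: x = {x_exact:.2f} inches")
```

The two answers differ by about 0.01 inch because the table z-score is rounded to two decimals.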