Biostat. 200 Review slides Week 1-3
Recap: Probability
Basic Probability
1) Complement: P(A) = 1 - P(Ā)
2) Intersection: P(A ∩ B)
3) Union: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
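As a quick check of the complement and union rules, the sketch below enumerates one roll of a fair six-sided die. The die and the events A (even roll) and B (roll greater than 3) are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # assumed event: roll is even
B = {4, 5, 6}   # assumed event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

p_complement = 1 - prob(outcomes - A)            # P(A) = 1 - P(not A)
p_union_direct = prob(A | B)                     # count outcomes in A or B
p_union_rule = prob(A) + prob(B) - prob(A & B)   # addition rule

print(p_complement, p_union_direct, p_union_rule)  # 1/2 2/3 2/3
```

Subtracting P(A ∩ B) avoids counting the outcomes 4 and 6 twice, which is why the rule matches the direct count.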
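Basic Probability
4) Mutual exclusivity (mutually exclusive events with nonzero probability are still dependent)
– Mutually exclusive events cannot occur together: P(A ∩ B) = 0
– The additive rule therefore simplifies: P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = P(A) + P(B)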
Basic Probability
5) Conditional probability: the probability that event B will occur given that event A has occurred, written P(B|A)
Use the multiplicative rule
– P(A ∩ B) = P(A) P(B|A)
– P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0
Applies to
– Relative risks
– Odds ratios
Basic Probability
6) Independence
Note that independence ≠ mutual exclusivity!
If A and B are independent:
– P(B | A) = P(B | Ā) = P(B)
– P(A | B) = P(A | B̄) = P(A)
– P(A ∩ B) = P(A) P(B) (multiplicative rule)
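To see why independence and mutual exclusivity differ, the sketch below enumerates two rolls of a fair die. The dice and the events A, B, C and D are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two rolls of a fair die (36 equally likely pairs).
space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event when all pairs are equally likely."""
    return Fraction(len(event), len(space))

def cond(b, a):
    """Conditional probability P(B | A) = P(A and B) / P(A)."""
    return prob(a & b) / prob(a)

A = {s for s in space if s[0] % 2 == 0}   # first roll is even
B = {s for s in space if s[1] >= 5}       # second roll is 5 or 6
C = {s for s in space if s[0] == 1}       # first roll is 1
D = {s for s in space if s[0] == 2}       # first roll is 2

# A and B are independent: conditioning on A (or not A) leaves P(B) unchanged.
print(cond(B, A), cond(B, space - A), prob(B))   # 1/3 1/3 1/3
print(prob(A & B) == prob(A) * prob(B))          # True

# C and D are mutually exclusive, so they are dependent: knowing C rules out D.
print(prob(C & D), prob(C) * prob(D))            # 0 1/36
print(cond(D, C), prob(D))                       # 0 1/6
```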
Probability Distributions
– Discrete distributions
– Continuous distributions
Discrete Variables
For discrete variables, the probability distribution describes the probability of each possible value.
Discrete distributions
Bernoulli distribution
– If a variable can take on one of two values with a constant probability p, then it is a Bernoulli random variable
– Outcomes are either 0 or 1
– It is the theoretical building block for describing the distribution of more than one trial
Discrete Distributions
Binomial distribution: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
With:
– p is the probability of “success” in each “trial”
– n is the number of “trials”
– n and p are the parameters of the binomial distribution (they summarize the distribution)
– x is the number of “successes” (outcomes)
Note that Stata and Table A.1 use the symbol k for x
Binomial Distributions
Assumes
– Fixed number of trials n, each with one of two mutually exclusive outcomes
– Independent outcomes of the n trials
– Constant probability of success p for each trial
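These assumptions are what make the binomial a sum of independent Bernoulli trials. As a rough sketch, the code below enumerates every sequence of n trials, adds up the sequence probabilities by number of successes, and compares the result with the binomial formula. The values n=5 and p=0.15 are borrowed from the worked example in these slides; the variable names are illustrative.

```python
from itertools import product
from math import comb, isclose

n, p = 5, 0.15   # values from the slides' worked example

# Enumerate every sequence of n independent Bernoulli(p) trials (1 = success).
pmf_by_enumeration = [0.0] * (n + 1)
for seq in product((0, 1), repeat=n):
    k = sum(seq)                                       # successes in this sequence
    pmf_by_enumeration[k] += p**k * (1 - p)**(n - k)   # independence: multiply

# Binomial formula: C(n, k) counts the sequences that contain k successes.
pmf_by_formula = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(k, round(pmf_by_enumeration[k], 4), round(pmf_by_formula[k], 4))
```

The two columns agree because each of the C(n, k) sequences with k successes has the same probability, which holds only when p is constant and the trials are independent.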
Binomial Distribution
What is the probability of exactly 2 cases of disease in a sample of n=5 where p=0.15?
How to calculate the probability?
1) Use the binomial formula
– $P(X=2) = \binom{5}{2} (0.15)^2 (1-0.15)^{5-2}$
– In Stata, comb(n,k) gives the binomial coefficient: display comb(5,2) returns 10
– $(10)(0.15)^2(1-0.15)^{5-2} = (10)(0.0225)(0.614) = 0.138$
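The same hand calculation can be cross-checked outside Stata. This is a minimal Python sketch; binomial_pmf is a hypothetical helper name, and math.comb plays the role of Stata's comb().

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(X = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(comb(5, 2))                            # 10
print(round(binomial_pmf(5, 2, 0.15), 4))    # 0.1382
```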
Binomial Distribution
What is the probability of exactly 2 cases of disease in a sample of n=5 where p=0.15?
How to calculate the probability?
2) Use Table A.1
– Table A.1 gives you P(X=k)
– Look up p=.15, n=5, k=2: answer = .1382
Binomial Distribution
What is the probability of exactly 2 cases of disease in a sample of n=5 where p=0.15?
How to calculate the probability?
3) Use Stata: binomialp(n,k,p)
display binomialp(5,2,.15)
Binomial Distribution
What is the probability of 1 or more cases of disease in a sample of n=5 where p=0.15?
1) Use the binomial formula
– P(X≥1) = 1 - P(X=0)
– P(X=0): di comb(5,0)*0.15^0*0.85^5 returns 0.4437
– So 1 - P(X=0) = 1 - 0.4437 = 0.5563
Binomial Distribution
What is the probability of 1 or more cases of disease in a sample of n=5 where p=0.15?
2) Use Table A.1
– P(X≥1) = 1 - P(X=0)
– Looking up P(X=0) for p=.15, n=5, k=0 we get .4437
– So 1 - P(X=0) = 1 - .4437 = .5563
Binomial Distribution
What is the probability of 1 or more cases of disease in a sample of n=5 where p=0.15?
How to calculate the probability?
3) Use Stata: binomialtail(n,k,p) gives P(X≥k)
display binomialtail(5,1,.15)
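The "1 or more" probability can be reached two ways: through the complement, or by summing P(X=k) for k = 1 to 5. A short Python sketch shows that the two agree (binomial_pmf is a hypothetical helper, not a Stata function).

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(X = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.15
p_zero = binomial_pmf(n, 0, p)                   # P(X = 0) = 0.85^5
by_complement = 1 - p_zero                       # 1 - P(X = 0)
by_direct_sum = sum(binomial_pmf(n, k, p) for k in range(1, n + 1))

print(round(p_zero, 4), round(by_complement, 4), round(by_direct_sum, 4))
# 0.4437 0.5563 0.5563
```

The complement route needs one term instead of five, which is why it is the usual choice for "at least one" questions.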
Binomial Distribution
Binomial mean = np
Binomial variance = np(1-p)
– Variance is largest when p=0.5, smaller when p is closer to 0 or 1
– The distribution is symmetric when p=0.5
– The distribution for 1-p is the mirror image of the distribution for p (e.g. the distribution for p=0.05 is the mirror image of the one for p=0.95)
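The mean, variance and mirror-image properties can be checked numerically from the probabilities themselves. The sketch below reuses n=5 and p=0.15 from the earlier example; binomial_pmf is a hypothetical helper.

```python
from math import comb, isclose

def binomial_pmf(n, k, p):
    """P(X = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.15
pmf = [binomial_pmf(n, k, p) for k in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
print(round(mean, 4), round(variance, 4))   # 0.75 0.6375

# Mirror image: P(X = k) under p equals P(X = n - k) under 1 - p.
mirrored = [binomial_pmf(n, n - k, 1 - p) for k in range(n + 1)]
print(all(isclose(a, b) for a, b in zip(pmf, mirrored)))   # True
```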
Continuous distributions
– Normal distribution
Continuous Distribution
For continuous variables, the distribution describes the probability of a range of values.
Normal distribution
The probability density function is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
μ is the mean and σ is the standard deviation of a normally distributed random variable
– They are the parameters of the normal distribution
– π is the constant that is approximately 3.14159
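The normal density, f(x) = exp(-(x-μ)²/(2σ²)) / (σ√(2π)), can be evaluated directly. This sketch codes the formula by hand and compares it with Python's statistics.NormalDist; the function name normal_pdf and the values μ=100, σ=15 are assumptions for illustration.

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Height of the standard normal curve at its mean: 1 / sqrt(2 * pi).
print(round(normal_pdf(0, 0, 1), 4))   # 0.3989

# Hypothetical parameters, checked against the standard-library density.
mu, sigma, x = 100.0, 15.0, 110.0
print(isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x)))   # True
```

The density is a curve height, not a probability; probabilities come from areas under the curve over a range of values.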
The Standard Normal Distribution
μ and σ can take on an infinite number of values, so one standard curve is used, with
– μ = 0
– σ = 1 (and variance σ² = 1)
Denoted N(0,1)
The Standard Normal Distribution
If X is a normally distributed random variable with mean μ and standard deviation σ, then Z = (X - μ)/σ is a standard normal random variable.
That is, a normally distributed random variable with its mean subtracted off, divided by its standard deviation, is a normal random variable with mean = 0 and standard deviation = 1.
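Standardizing means a probability for any normal X can be read off the standard normal curve. The sketch below uses hypothetical values (μ=120, σ=10, x=135) and Python's statistics.NormalDist to show that P(X ≤ x) equals P(Z ≤ z) for z = (x - μ)/σ.

```python
from statistics import NormalDist

mu, sigma = 120.0, 10.0   # hypothetical mean and standard deviation
x = 135.0                 # hypothetical observed value

z = (x - mu) / sigma                       # standardize: subtract mean, divide by SD
p_via_x = NormalDist(mu, sigma).cdf(x)     # P(X <= x) on the original scale
p_via_z = NormalDist().cdf(z)              # P(Z <= z) on the N(0,1) scale

print(z, round(p_via_z, 4))   # 1.5 0.9332
```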
For Z ~ N(0,1)
P(Z≥0) = 0.5
Zero is the mean and median for a standard normal distribution
For Z ~ N(0,1)
P(Z≥1.96) = 0.025
The probability of observing a value of 1.96 or greater is 0.025
P(µ-2σ ≤ Z ≤ µ+2σ)
Remember µ=0 and σ=1, so this is P(-2 ≤ Z ≤ 2) = 0.954
Therefore, approximately 95.4% of the area of the standard normal is within 2 SD of the mean
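The same area calculation extends to 1 and 3 standard deviations. A brief Python sketch, using statistics.NormalDist as a stand-in for the table lookup:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean: P(-k <= Z <= k).
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```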
Stata will calculate standard normal probabilities for you
In Stata, the left portion of the curve, P(Z<z), is calculated for you:
display normal(1.96)
If you want the right-hand portion of the curve, P(Z>z), subtract your answer from 1:
display 1-normal(1.96)
If you want the middle, P(-z<Z<z):
display normal(1.96)-normal(-1.96)
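For readers checking results outside Stata, the left, right and middle areas can be reproduced in Python. This is a sketch in which NormalDist().cdf is assumed to play the role of Stata's normal().

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal
z = 1.96

left = Z.cdf(z)                  # P(Z < z), like normal(1.96)
right = 1 - Z.cdf(z)             # P(Z > z), like 1-normal(1.96)
middle = Z.cdf(z) - Z.cdf(-z)    # P(-z < Z < z), like normal(1.96)-normal(-1.96)

print(round(left, 4), round(right, 4), round(middle, 4))   # 0.975 0.025 0.95
```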
Finding z values for probabilities in Stata
To get the z value for P(Z<z) = p, use display invnormal(p)
To get the z value for P(Z>z) = p, use display invnormal(1-p)
E.g. what is the z value for P(Z≤z) = 0.025?
display invnormal(0.025) returns approximately -1.96
E.g. what is the z value for P(Z>z) = 0.025?
display invnormal(1-.025) returns approximately 1.96
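Going from a probability back to a z value is the inverse of the cumulative probability. In the Python sketch below, NormalDist().inv_cdf is assumed to correspond to Stata's invnormal(), and the two tails are shown to be mirror images.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal
p = 0.025

z_lower = Z.inv_cdf(p)       # z with P(Z < z) = p, like invnormal(0.025)
z_upper = Z.inv_cdf(1 - p)   # z with P(Z > z) = p, like invnormal(1-.025)

print(round(z_lower, 2), round(z_upper, 2))   # -1.96 1.96
```

Because the curve is symmetric about zero, the lower-tail z is just the negative of the upper-tail z.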
Finding z values for probabilities using Table A.3
To get the z value for P(Z>z) = p: find p in the table and read the corresponding z
To get the z value for P(Z<z) = p: find p in the table and use -1 × the corresponding z
E.g. what is the z value for P(Z≤z) = 0.025?
For p=0.025 the table value is 1.96, so the answer is -1.96
E.g. what is the z value for P(Z>z) = 0.025?
For p=0.025 the table value is 1.96, so the answer is 1.96