Published by Kelley Parker. Modified over 9 years ago.
1
Introduction to Biostatistics (ZJU 2008)
Wenjiang Fu, Ph.D., Associate Professor
Division of Biostatistics, Department of Epidemiology
Michigan State University, East Lansing, Michigan 48824, USA
Email: fuw@msu.edu
WWW: http://www.msu.edu/~fuw
2
Chapter 3 Probability Theory
Probability of an event: $\Pr(A)$.
Sample space: the collection of all possible outcomes.
Examples:
Tossing a coin: {H}, {T}. Tossing two coins: {HH}, {HT}, {TH}, {TT}.
Rolling a die: {1}, {2}, ..., {6}. Blood pressure: {L}, {H}, {N}.
Sex: {M}, {F}. Race: {white}, {black}, {Asian}, ...
How to obtain the probability (proportion) of male students in the class:
1. Count {M} and {F}: $p$ = N(male) / N(total).
2. Sampling method. Assume 40 students in total. Take a random sample of size 40 with replacement and calculate the proportion of males. Repeat this procedure many times, say B = 200 times, giving proportions $p_1, \dots, p_{200}$. Take their average, $p_0 = (p_1 + \dots + p_{200})/200$. This number $p_0$ will be close to the true proportion $p$ in the class.
This method is called bootstrap sampling, invented by Efron.
Reference: Efron and Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.
3
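Bootstrap R program
###### Male student proportion in class by bootstrap sampling
x <- c(rep(1, 25), rep(0, 15))   # 25 males (1) among 40 students
x
p <- sum(x) / length(x)          # true proportion in the class
B <- 200                         # number of bootstrap samples
pp <- rep(0, B)                  # storage for the B bootstrap proportions
for (i in 1:B) {
  y <- sample(x, replace = TRUE) # resample 40 students with replacement
  pp[i] <- sum(y) / length(y)    # proportion of males in this resample
}
p0 <- mean(pp)                   # average of the bootstrap proportions
plot(1:B, pp)
abline(h = p0, col = 'red')      # mark the bootstrap average
title(main = 'Bootstrap Approximation')
p
p0
Answer (one run; p0 varies with the random draws): p = 0.625, p0 = 0.6188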
4
Bootstrap
5
Events
An event is any set of outcomes of interest.
Examples: heart attack, diarrhea, accident, ...
Probability of an event, $\Pr(A)$: the relative frequency of this event over an indefinitely large (or infinite) number of trials.
Example 1: Student aged 19-20 in the classroom. Take a student, check the age (1 = yes, 0 = no), and put the student back into the pool. Record the process: 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, ... So far the relative frequency is 7/15; with more trials it settles around a certain number (the probability).
Example 2: Student with systolic BP 95-105 mm Hg (1 = systolic BP 95-105; 0 = otherwise): 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, ...
Example 3: Student has had the hepatitis B vaccine (1 = vaccinated; 0 = not yet).
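The relative-frequency idea in Example 1 can be sketched in R. This is only an illustration using the 15 recorded values from the slide; the variable names (`draws`, `running_freq`, `final_freq`) are made up for this sketch.

```r
# Recorded indicator sequence from Example 1 (1 = aged 19-20, 0 = otherwise)
draws <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1)

# Relative frequency after each trial: cumulative number of 1s / number of trials
running_freq <- cumsum(draws) / seq_along(draws)

# Relative frequency after all 15 trials (7 ones out of 15)
final_freq <- running_freq[length(running_freq)]

print(round(running_freq, 3))
print(final_freq)
```

With more trials, the running relative frequency would be expected to fluctuate less and settle near the underlying probability.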
6
Properties of probability
(1) $0 \le \Pr(A) \le 1$.
(2) If A and B cannot occur at the same time, then $\Pr(A \text{ or } B) = \Pr(A) + \Pr(B)$.
Mutually exclusive: two events that cannot both occur at the same time.
Example: $A = \{\text{DBP} > 90\}$, $B = \{\text{DBP} < 75\}$.
Notation: {.} denotes an event description.
Venn diagram: named after John Venn (1834-1923) of England.
7
Venn Diagram
John Venn popularized the idea of Venn diagrams. He lived from 1834 to 1923 in England. He was a priest and taught at Gonville and Caius College, Cambridge. He wrote several books, including two on logic.
8
Venn diagram
(Figure, three panels: A and B mutually exclusive; A and B not mutually exclusive, with overlap A∩B; A and its complement.)
9
Event operations
Union $A \cup B$: either A or B occurs, or both occur.
Intersection $A \cap B$: both A and B occur simultaneously.
Complement $\bar{A}$: A does not occur.
Example: $A = \{\text{SBP} \ge 70\}$, $B = \{\text{SBP} \le 90\}$. Then $A \cup B = \{\text{SBP} \ge 70 \text{ or SBP} \le 90\}$ = {all SBP readings}, $A \cap B = \{70 \le \text{SBP} \le 90\}$, and $\bar{A} = \{\text{SBP} < 70\}$.
Independence
Two events are independent if the outcome of one does not affect the other's; equivalently, the probability of the intersection satisfies $\Pr(A \cap B) = \Pr(A) \times \Pr(B)$.
Q: If two events are mutually exclusive, are they independent?
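One way to reason about the question above, offered as a sketch rather than as the lecturer's own answer: compare the two definitions directly.

```latex
A \cap B = \emptyset \;\Longrightarrow\; \Pr(A \cap B) = 0,
\qquad\text{so}\qquad
\Pr(A \cap B) = \Pr(A)\,\Pr(B) \iff \Pr(A) = 0 \ \text{ or } \ \Pr(B) = 0 .
```

So two mutually exclusive events that each have positive probability cannot be independent: knowing that one occurred tells us the other did not.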
10
Probability
Example: A = {father has hypertension}, $\Pr(A) = 0.6$; B = {mother has hypertension}, $\Pr(B) = 0.5$.
1) If $\Pr(A \cap B) = 0.3$, are A and B independent?
2) How about if $\Pr(A \cap B) = 0.4$?
In 1), $\Pr(A \cap B) = 0.3 = 0.6 \times 0.5 = \Pr(A)\Pr(B)$. Conclusion: A and B are independent.
In 2), $\Pr(A \cap B) = 0.4 > 0.6 \times 0.5 = \Pr(A)\Pr(B)$. Conclusion: A and B are not independent.
Causes of hypertension are very complex, but high salt intake contributes to hypertension. Couples commonly share the same kind of food at home, so their hypertension status may be highly correlated.
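The independence check in this example can be sketched in R. The helper `is_independent` and its tolerance argument are assumptions made for this illustration, and the joint probability 0.45 in the second call is a hypothetical value, not one from the slide.

```r
# TRUE when the joint probability equals the product of the marginals
# (compared with a small tolerance because of floating-point arithmetic)
is_independent <- function(p_a, p_b, p_ab, tol = 1e-9) {
  abs(p_ab - p_a * p_b) < tol
}

p_a <- 0.6  # Pr(A): father has hypertension
p_b <- 0.5  # Pr(B): mother has hypertension

# Case 1 from the slide: Pr(A and B) = 0.3
case1 <- is_independent(p_a, p_b, 0.3)

# Hypothetical joint probability above the product 0.3 (positive dependence)
case_dependent <- is_independent(p_a, p_b, 0.45)

print(case1)
print(case_dependent)
```

Any joint probability used here has to be feasible, that is, no larger than the smaller of the two marginals.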
11
Probability
Example: Students A and B take an exam. A = {student A passes}, B = {student B passes}. $\Pr(A) = 0.7$, $\Pr(B) = 0.8$, $\Pr(A \cap B) = ?$ If A and B are independent, $\Pr(A \cap B) = 0.7 \times 0.8 = 0.56$. How about if the two students compete for the same award?
Example: TB (tuberculosis) tests. A = {"+" by skin test}, B = {"+" by X-ray test}. Ideally, B is used to confirm A. If $\Pr(A) = 0.3$, $\Pr(B) = 0.5$ and $\Pr(A \cap B) = 0.15$, then A and B are independent.
12
Probability Laws
Multiplication law: If $A_1, A_2, \dots, A_k$ are mutually independent events, then $\Pr(A_1 \cap A_2 \cap \dots \cap A_k) = \Pr(A_1) \times \Pr(A_2) \times \dots \times \Pr(A_k)$.
Addition law: If A and B are any two events, then $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.
(Figure: Venn diagram for the addition law, with $A \cup B$ split into 3 mutually exclusive events.)
13
Probability Laws
Addition law for more than 2 events: If A, B and C are three events, then
$\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C)$.
(Figure: Venn diagram for the addition law with three events A, B and C.)
14
Probability
Example 3.1 - 3.13: A family with flu: mother, father, 2 kids.
$A_1$ = {mother has flu}, $A_2$ = {father has flu}, $A_3$ = {1st kid has flu}, $A_4$ = {2nd kid has flu}, B = {at least one kid has flu}, C = {at least one parent has flu}, D = {at least one person has flu}.
Given: $\Pr(A_1) = .1$, $\Pr(A_2) = .1$, Pr(both parents have flu) = .02, Pr(each kid has flu) = .2, Pr(at least one kid has flu) = .3.
Q: B = ? C = ? D = ? Are $A_1$, $A_2$ independent? Are $A_3$, $A_4$ independent? Pr(B) = ? {no kid has flu} = ? {no person has flu} = ?
A: $\Pr(A_3 \cap A_4) = ?$ Addition law: $\Pr(A_3 \cup A_4) = \Pr(A_3) + \Pr(A_4) - \Pr(A_3 \cap A_4)$, so
$\Pr(A_3 \cap A_4) = \Pr(A_3) + \Pr(A_4) - \Pr(A_3 \cup A_4) = .2 + .2 - .3 = .1$.
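The flu example can be worked through numerically. The following R sketch uses only the probabilities given on the slide; the variable names are made up for this illustration. It does not compute Pr(D), since that needs information about how parents' and kids' flu status are related, which the slide does not give.

```r
# Probabilities given on the slide
p_a1 <- 0.1            # mother has flu
p_a2 <- 0.1            # father has flu
p_both_parents <- 0.02 # Pr(A1 and A2)
p_a3 <- 0.2            # 1st kid has flu
p_a4 <- 0.2            # 2nd kid has flu
p_b  <- 0.3            # Pr(B) = Pr(A3 or A4), at least one kid has flu

# Addition law rearranged: Pr(A3 and A4) = Pr(A3) + Pr(A4) - Pr(A3 or A4)
p_both_kids <- p_a3 + p_a4 - p_b

# C = A1 or A2 (at least one parent has flu), by the addition law
p_c <- p_a1 + p_a2 - p_both_parents

# Complement: no kid has flu
p_no_kid <- 1 - p_b

# Independence checks: compare each joint probability with the product of marginals
parents_indep <- isTRUE(all.equal(p_both_parents, p_a1 * p_a2))  # 0.02 vs 0.01
kids_indep    <- isTRUE(all.equal(p_both_kids, p_a3 * p_a4))     # 0.10 vs 0.04

print(c(p_both_kids = p_both_kids, p_c = p_c, p_no_kid = p_no_kid))
print(c(parents_indep = parents_indep, kids_indep = kids_indep))
```

In both pairs the joint probability exceeds the product of the marginals, so under these numbers neither the parents' nor the kids' flu events are independent.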
15
Conditional Probability
Example. TB tests: skin test vs. X-ray test. A = {"+" by skin test}, B = {"+" by X-ray}.
Ideally $A \subseteq B$ and $A \supseteq B$: B confirms A, or A and B function the same (extremely dependent).
Opposite: the skin test is independent of the X-ray test, i.e. A and B do not depend on each other.
In practice (not ideal): we need some kind of dependence between A and B, and we want to know the probability of B given A.
16
Conditional Probability
Conditional probability of B given A: $\Pr(B \mid A) = \Pr(B \cap A) / \Pr(A)$.
Interpretation: the relative frequency of B among the occasions on which A occurs.
Rules:
(1) If A and B are independent, $\Pr(B \mid A) = \Pr(B) = \Pr(B \mid \bar{A})$.
(2) If A and B are dependent, $\Pr(B \mid A) \ne \Pr(B) \ne \Pr(B \mid \bar{A})$, because $\Pr(A \cap B) \ne \Pr(A)\Pr(B)$.
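As a small illustration of the definition and of rule (1), the following R sketch reuses the TB skin-test numbers given earlier in the deck (Pr(A) = 0.3, Pr(B) = 0.5, Pr(A and B) = 0.15). The helper name `cond_prob` is an assumption made for this sketch.

```r
# Conditional probability Pr(B | A) = Pr(A and B) / Pr(A); requires Pr(A) > 0
cond_prob <- function(p_ab, p_a) {
  stopifnot(p_a > 0)
  p_ab / p_a
}

p_a  <- 0.3   # Pr(A): "+" by skin test
p_b  <- 0.5   # Pr(B): "+" by X-ray test
p_ab <- 0.15  # Pr(A and B)

p_b_given_a <- cond_prob(p_ab, p_a)

# Pr(B | not A) from Pr(B and not A) = Pr(B) - Pr(A and B)
p_b_given_not_a <- cond_prob(p_b - p_ab, 1 - p_a)

print(c(p_b_given_a = p_b_given_a, p_b = p_b, p_b_given_not_a = p_b_given_not_a))
```

All three values coincide here, which is what rule (1) says should happen when A and B are independent.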
17
Applications of Conditional Probability
Relative risk or risk ratio (RR) of B given A: $RR = \Pr(B \mid A) / \Pr(B \mid \bar{A})$.
Example. TB: $\Pr(B \mid A) = 0.01$, $\Pr(B \mid \bar{A}) = 0.0001$, so $RR = \Pr(B \mid A) / \Pr(B \mid \bar{A}) = 0.01 / 0.0001 = 100$.
Interpretation: people with A are 100 times as likely to have TB as people with $\bar{A}$.
But if A and B are independent, $\Pr(B \mid A) = \Pr(B) = \Pr(B \mid \bar{A})$, so $RR = 1$.
Interpretation: equally likely to have TB regardless of A or $\bar{A}$. Not a good screening test.
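The relative-risk calculation above can be sketched in R. The function name `relative_risk` is an assumption made for this illustration; the inputs are the TB values from the slide.

```r
# Relative risk: Pr(B | A) divided by Pr(B | not A); the denominator must be positive
relative_risk <- function(p_b_given_a, p_b_given_not_a) {
  stopifnot(p_b_given_not_a > 0)
  p_b_given_a / p_b_given_not_a
}

# TB example from the slide
rr_tb <- relative_risk(0.01, 0.0001)

# Under independence both conditional probabilities are equal, so RR = 1
rr_indep <- relative_risk(0.01, 0.01)

print(rr_tb)
print(rr_indep)
```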
18
Applications of Conditional Probability
Example: Smoking and lung cancer. A = {smoking}, B = {lung cancer}. $RR = \Pr(B \mid A) / \Pr(B \mid \bar{A}) = 0.08 / 0.0001 = 800$.
Interpretation: smokers are 800 times as likely to develop lung cancer as non-smokers.
Screening tests A, B: A = {"+" by screening}, B = {"+" by confirmation}.
$\Pr(B) = \Pr(B \cap A) + \Pr(B \cap \bar{A})$.
From conditional probability, $\Pr(B \mid A) = \Pr(B \cap A) / \Pr(A)$ and $\Pr(B \mid \bar{A}) = \Pr(B \cap \bar{A}) / \Pr(\bar{A})$, so
$\Pr(B) = \Pr(B \mid A)\Pr(A) + \Pr(B \mid \bar{A})\Pr(\bar{A})$.
19
Applications of Conditional Probability
Example: A group of people had a screening test for TB and 5% got positive results. Assume 85% of the (+) will be confirmed by X-ray test, and 0.5% of the (-) will also be diagnosed by X-ray as TB patients. What proportion of this group of people had TB?
A = {(+) by screening}, B = {diagnosed by X-ray}. We want to know $\Pr(B)$.
Answer: $\Pr(B) = \Pr(B \mid A)\Pr(A) + \Pr(B \mid \bar{A})\Pr(\bar{A})$.
$\Pr(A) = .05$, $\Pr(B \mid A) = .85$, $\Pr(B \mid \bar{A}) = .005$, $\Pr(\bar{A}) = 1 - \Pr(A) = 1 - .05 = .95$.
$\Pr(B) = .85 \times .05 + .005 \times .95 = .0425 + .00475 = .04725$.
$RR = \Pr(B \mid A) / \Pr(B \mid \bar{A}) = .85 / .005 = 170$.
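The arithmetic in this example can be checked with a few lines of R. This is a sketch using the slide's numbers; the variable names are made up for the illustration.

```r
p_a             <- 0.05   # Pr(A): positive on the screening test
p_b_given_a     <- 0.85   # Pr(B | A): confirmed by X-ray among screen positives
p_b_given_not_a <- 0.005  # Pr(B | not A): diagnosed by X-ray among screen negatives

# Pr(B) = Pr(B | A) Pr(A) + Pr(B | not A) Pr(not A)
p_b <- p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Relative risk of TB for screen positives versus screen negatives
rr <- p_b_given_a / p_b_given_not_a

print(p_b)
print(rr)
```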
20
Applications of Probability
Exhaustive events: $A_1, \dots, A_k$ are exhaustive events if at least one must occur.
$A_1, \dots, A_k$ are mutually exclusive and exhaustive events if one and only one event must occur (no two events occur at the same time).
Example: BP. $A = \{\text{SBP} \le 70\}$, $B = \{70 < \text{SBP} \le 110\}$, $C = \{\text{SBP} > 110\}$. These 3 events are mutually exclusive and exhaustive.
Complement events: A and $\bar{A}$.
21
Total Probability Rule
$A_1, \dots, A_k$ are k mutually exclusive and exhaustive events. B is any event. Then
$\Pr(B) = \sum_{i=1}^{k} \Pr(B \mid A_i)\Pr(A_i)$.
Note that $\Pr(B) = \sum_{i=1}^{k} \Pr(B \cap A_i)$.
The total probability is a weighted average of the conditional probabilities $\Pr(B \mid A_i)$ with weights $\Pr(A_i)$.
Example 3.22: A five-year study of cataract. Population: 5000 people aged 60 and up.
Q: What percentage of the population will have cataract? How many cases of cataract can be expected?
Age group    Population percentage    Cataract
60-64        45%                      2.4%
65-69        28%                      4.6%
70-74        20%                      8.8%
75+          7%                       15.3%
22
Examples using Total Probability Rule
Answer: Define events: B = {develop cataract}, $A_1$ = {in age group 60-64}, $A_2$ = {in age group 65-69}, $A_3$ = {in age group 70-74}, $A_4$ = {age 75+}.
$\Pr(B) = \sum_{i=1}^{4} \Pr(B \mid A_i)\Pr(A_i)$
Pr(Ai): 45%, 28%, 20%, 7%
Pr(B|Ai): 2.4%, 4.6%, 8.8%, 15.3%
Pr(B) = 2.4% x 45% + ... + 15.3% x 7% = 0.052 = 5.2%
The number of expected cases = 5000 x 5.2% = 260.
Example: TB screening. A = {(+) by skin test}, B = {(+) by X-ray test}. 1000 people went for TB screening and 50 got positive results. A further X-ray test confirms 40 of the 50 positives. Assume 0.2% of those with a negative skin test are TB patients missed by the test. How many TB patients are there in this group of 1000 people?
$\Pr(B) = \Pr(B \mid A)\Pr(A) + \Pr(B \mid \bar{A})\Pr(\bar{A})$ = (40/50) x (50/1000) + 0.2% x (1000 - 50)/1000 = 0.04 + 0.0019 = 0.0419
Number of TB patients = 0.0419 x 1000 = 41.9, i.e. about 42.
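The cataract calculation above is a weighted average, which maps naturally onto vectors. The R sketch below uses the percentages from the age-group table; the variable names are assumptions made for this illustration.

```r
age_group       <- c("60-64", "65-69", "70-74", "75+")
p_age           <- c(0.45, 0.28, 0.20, 0.07)      # Pr(Ai): share of population
p_cat_given_age <- c(0.024, 0.046, 0.088, 0.153)  # Pr(B | Ai): cataract rate

# The age groups are mutually exclusive and exhaustive, so the weights sum to 1
stopifnot(isTRUE(all.equal(sum(p_age), 1)))

# Total probability rule: Pr(B) = sum over i of Pr(B | Ai) * Pr(Ai)
p_cataract <- sum(p_cat_given_age * p_age)

# Expected number of cases in a population of 5000
expected_cases <- 5000 * p_cataract

print(data.frame(age_group, p_age, p_cat_given_age,
                 contribution = p_cat_given_age * p_age))
print(p_cataract)
print(expected_cases)
```

The unrounded total is 0.05199, which rounds to the slide's 5.2% and to about 260 expected cases.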
23
Bayes' theorem
Thomas Bayes (1702 ~ 1761): mathematician who first used probability inductively and established a mathematical basis for probability inference (a means of calculating, from the number of times an event has not occurred, the probability that it will occur in future trials).
24
Bayes' rule and screening test
Screening test: T. A = {T+}; confirmation: B = {disease}.
Q: How good is the screening test?
1) Can it give a "+" to a subject with disease? Sensitive
2) Can it give a "-" to a subject with no disease? Specific
3) Can it predict disease accurately? Predictive
Predictive value positive (PV+) = Pr(dis. | T+)
Predictive value negative (PV-) = Pr(no dis. | T-)
Sensitivity = Pr(T+ | dis.)
Specificity = Pr(T- | no dis.)
False positive = {T+ with no disease}; false positive rate = Pr(T+ | no dis.)
False negative = {T- with disease}; false negative rate = Pr(T- | dis.)
25
Bayes' rule and screening test
Example: Smoking vs. lung cancer: using smoking for screening gives too many false positives.
Example: Family history of breast cancer vs. breast cancer: using family history for screening gives too many false negatives.
$\Pr(B)$: probability of disease in the reference population.
Bayes' Rule:
$PV^{+} = \Pr(B \mid A) = \dfrac{\text{sensitivity} \times x}{\text{sensitivity} \times x + (1 - \text{specificity}) \times (1 - x)}$
$PV^{-} = \Pr(\bar{B} \mid \bar{A}) = \dfrac{\text{specificity} \times (1 - x)}{\text{specificity} \times (1 - x) + (1 - \text{sensitivity}) \times x}$
where $x = \Pr(B)$ = prevalence of disease.
26
Facts about Screening Tests
Some important points:
Predictive values depend on 1) sensitivity, 2) specificity, and 3) prevalence of the disease.
Sensitivity and specificity remain the same with the test, but the prevalence of the disease changes from population to population and from time to time.
So $\Pr(A \mid B)$ and $\Pr(\bar{A} \mid \bar{B})$ remain the same with the same test, while the predictive values $\Pr(B \mid A)$ and $\Pr(\bar{B} \mid \bar{A})$ change with $\Pr(B)$.
False positive rate: $\Pr(A \mid \bar{B}) = 1 - \Pr(\bar{A} \mid \bar{B}) = 1 - \text{specificity}$.
False negative rate: $\Pr(\bar{A} \mid B) = 1 - \Pr(A \mid B) = 1 - \text{sensitivity}$.
So the false positive rate and the false negative rate remain the same with the test when the population changes.
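To see how the predictive values move with prevalence while the test itself stays fixed, here is a minimal R sketch based on the standard Bayes' rule expressions for PV+ and PV-. The function names `pv_pos` and `pv_neg` are made up for this illustration, and the sensitivity (0.90), specificity (0.95) and prevalence grid are hypothetical values, not taken from the slides.

```r
# PV+ = sens * x / (sens * x + (1 - spec) * (1 - x)), with x = prevalence
pv_pos <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}

# PV- = spec * (1 - x) / (spec * (1 - x) + (1 - sens) * x)
pv_neg <- function(sens, spec, prev) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

sens <- 0.90                       # hypothetical sensitivity
spec <- 0.95                       # hypothetical specificity
prev <- c(0.001, 0.01, 0.1, 0.5)   # hypothetical prevalences

result <- data.frame(
  prevalence = prev,
  pv_pos     = pv_pos(sens, spec, prev),
  pv_neg     = pv_neg(sens, spec, prev)
)
print(round(result, 4))
```

With the test held fixed, PV+ rises and PV- falls as the disease becomes more common, which is why the same test can perform very differently in a screening population and in a clinic population.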
27
Calculation for Screening Test
Example:
                    Disease +    Disease -    Total
Screening test +    80           20           100
Screening test -    10           890          900
Total               90           910          1000
Sensitivity = Pr(T+|D+) = 80/90 = .889
Specificity = Pr(T-|D-) = 890/910 = .978
Pr(D+) = 90/1000 = .09
PV+ = Pr(D+|T+) = 80/100 = .80
PV- = Pr(D-|T-) = 890/900 = .989
Pf+ = Pr(T+|D-) = 20/910 = .022
Pf- = Pr(T-|D+) = 10/90 = .111
28
Calculation for Screening Test (same example and table as the previous slide)
Pf- = 1 - sensitivity = 1 - .889 = .111
29
Calculation for Screening Test (same example and table as the previous slide)
Pf+ = 1 - specificity = 1 - .978 = .022
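The screening-test quantities above can be recomputed directly from the 2 x 2 table. The following R sketch assumes the table layout shown in the example (rows = test result, columns = disease status); the object names are made up for this illustration. The last step checks that Bayes' rule reproduces the PV+ obtained by direct counting.

```r
# 2 x 2 table from the example: rows are test results, columns are disease status
tab <- matrix(c(80,  20,
                10, 890),
              nrow = 2, byrow = TRUE,
              dimnames = list(test = c("T+", "T-"), disease = c("D+", "D-")))

sens <- tab["T+", "D+"] / sum(tab[, "D+"])   # 80 / 90
spec <- tab["T-", "D-"] / sum(tab[, "D-"])   # 890 / 910
prev <- sum(tab[, "D+"]) / sum(tab)          # 90 / 1000
ppv  <- tab["T+", "D+"] / sum(tab["T+", ])   # 80 / 100
npv  <- tab["T-", "D-"] / sum(tab["T-", ])   # 890 / 900
fpr  <- 1 - spec                             # false positive rate, 20 / 910
fnr  <- 1 - sens                             # false negative rate, 10 / 90

# Bayes' rule gives the same PV+ from sensitivity, specificity and prevalence
ppv_bayes <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(round(c(sens = sens, spec = spec, prev = prev, ppv = ppv,
              npv = npv, fpr = fpr, fnr = fnr, ppv_bayes = ppv_bayes), 3))
```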
30
Generalized Bayes' Rule
Let $B_1, \dots, B_k$ be a set of mutually exclusive and exhaustive disease states, i.e. at least one occurs, but no two occur at the same time. Let A be symptoms of the disease. Then
$\Pr(B_i \mid A) = \dfrac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^{k} \Pr(A \mid B_j)\Pr(B_j)}$, where $\Pr(A) = \sum_{j=1}^{k} \Pr(A \mid B_j)\Pr(B_j)$.
Example. Suppose A = {chronic cough, lung biopsy}.
Disease: $B_1$ = {normal}, $B_2$ = {lung cancer}, $B_3$ = {sarcoidosis}.
$\Pr(A \mid B_1) = .001$, $\Pr(A \mid B_2) = .9$, $\Pr(A \mid B_3) = .9$.
For a 60-year-old non-smoker: $\Pr(B_1) = .99$, $\Pr(B_2) = .001$, $\Pr(B_3) = .009$.
Q: For a 60-year-old non-smoker with symptom A, what disease would you diagnose?
31
Generalized Bayes' Rule
A: For a 60-year-old non-smoker: $\Pr(B_1) = .99$, $\Pr(B_2) = .001$, $\Pr(B_3) = .009$.
$\Pr(A) = \sum_{j=1}^{3} \Pr(A \mid B_j)\Pr(B_j) = .001 \times .99 + .9 \times .001 + .9 \times .009 = .00999$
$\Pr(B_1 \mid A) = \Pr(A \mid B_1)\Pr(B_1) / \Pr(A) = .001 \times .99 / .00999 = .099$
$\Pr(B_2 \mid A) = \Pr(A \mid B_2)\Pr(B_2) / \Pr(A) = .9 \times .001 / .00999 = .090$
$\Pr(B_3 \mid A) = \Pr(A \mid B_3)\Pr(B_3) / \Pr(A) = .9 \times .009 / .00999 = .811$
With these conditional probabilities, we diagnose this person as having sarcoidosis, with probability p = 0.811.
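The posterior probabilities above can be reproduced with named vectors in R. This is a sketch using the slide's priors and likelihoods; the vector and variable names are assumptions made for the illustration.

```r
# Prior probabilities of each disease state for a 60-year-old non-smoker
prior <- c(normal = 0.99, lung_cancer = 0.001, sarcoidosis = 0.009)

# Likelihood of the symptoms A under each disease state, Pr(A | Bi)
lik <- c(normal = 0.001, lung_cancer = 0.9, sarcoidosis = 0.9)

# Total probability of the symptoms: Pr(A) = sum of Pr(A | Bj) * Pr(Bj)
p_a <- sum(lik * prior)

# Generalized Bayes' rule: Pr(Bi | A) = Pr(A | Bi) * Pr(Bi) / Pr(A)
posterior <- lik * prior / p_a

# Disease state with the largest posterior probability
most_likely <- names(which.max(posterior))

print(p_a)
print(round(posterior, 3))
print(most_likely)
```

Even though lung cancer and sarcoidosis explain the symptoms equally well, the larger prior for sarcoidosis makes it the more probable diagnosis.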
32
Receiver Operating Characteristic (ROC) Curve
(Figure: Blood Pressure Test; labels: Normal, Hypertensive, Clinical criterion A.)
33
Disease Prevalence and Incidence
Prevalence = Pr(having a disease at one time point) = total number of cases of disease at one time point / total population at the same time.
Prevalence is an instantaneous measure, at one time point only.
Cumulative incidence, or incidence:
Incidence = Pr(developing a new case of disease over a period of time).
Expressed as a rate, it has a unit: per person-year, or (person-year)^-1. It takes the time of exposure into consideration, within one period of time, e.g. 1 yr.
Example. The prevalence of hepatitis A is .01. How many cases of hepatitis A are expected in a community of 1000 people?
Number of cases = prevalence x population = .01 x 1000 = 10.
Breast cancer incidence is 5 per $10^5$ person-years. How many new cases are expected over the next 5 years in a city with a population of $10^5$?
Number of new cases = incidence x duration x population = 5 / ($10^5$ person-years) x 5 years x $10^5$ people = 25 cases.
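The two expected-count calculations above differ in their units, which a short R sketch can make explicit. The variable names are made up for this illustration; the numbers are the slide's.

```r
# Prevalence: a proportion at one time point, so cases = prevalence * population
prevalence <- 0.01
community  <- 1000
expected_prevalent_cases <- prevalence * community

# Incidence rate: new cases per person-year, so it must be multiplied by
# both the follow-up time (years) and the population (persons)
incidence_rate  <- 5 / 1e5   # 5 per 100,000 person-years
years           <- 5
city_population <- 1e5
expected_new_cases <- incidence_rate * years * city_population

print(expected_prevalent_cases)
print(expected_new_cases)
```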