Cohort and longitudinal studies: statistics


1 Cohort and longitudinal studies: statistics
Kath Bennett 25th June 2007

2 Outline
Statistical methods
Prevalence, incidence
Rates, standardised rates, case fatality rate
Relative rates/risks
Attributable risk and attributable risk fraction
Regression and beta coefficients

3 Measuring disease/drug frequency
Prevalence is the proportion of a population that are cases at one time.
Prevalence = (no. of existing cases with disease / on drug) / (total population in same time period)
e.g. birth defect prevalence rate = (no. of births with given abnormality) / (total number of live births)

4 Measuring disease/drug frequency
Incidence is the rate at which new cases occur in a population during a specified period.
Incidence = (number of new cases / starting on therapy in period) / (population at risk in given period)
e.g. Bacteremia among OC users: in a study of 482 women using OC, 27 developed bacteremia.
Incidence = (27/482) × 100 = 5.6%

5 Inter-relation between incidence and prevalence
Prevalence = incidence × average duration

6 Rates commonly used in epidemiology
Crude rates: for the entire population, e.g. total deaths/population.
Category-specific rates (e.g. age-specific, gender-specific): e.g. cancer rates by age category, 35-44, 45-54.
Age-adjusted or standardised rates: allow appropriate comparison when differing populations are being studied, and reduce distortions in the comparison due to age differences between populations.

7 Rates
Used to quantify the risk in a population.
Numerator is the number of events.
Denominator is either the population at risk, or the person-years at risk or of exposure.
Usually presented per 1000 persons or per 1000 person-years at risk.

8 Rates
Provided the disease/event occurs randomly and independently, it is reasonable to assume that the number of events follows a Poisson distribution.
Estimated standard error of rate: SE(rate) = √(number of events) / population at risk
Approx 95% CI for number of events (E):
EL = (1.96/2 - √E)²
EU = (1.96/2 + √(E+1))²
Approx 95% CI for rate: lower CI = EL/Pop, upper CI = EU/Pop

9 Rates
The number of deaths from CHD in Scottish men in 1995 was 1080; total population of men 612,955.
Rate of CHD deaths per 1000 = (1080/612955) × 1000 = 1.76
95% CI for number of deaths:
EL = (1.96/2 - √1080)² ≈ 1016.5
EU = (1.96/2 + √1081)² ≈ 1146.4
95% CI for rate per 1000 = (1016.5/612955 × 1000, 1146.4/612955 × 1000) = (1.66, 1.87)
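The CHD calculation can be checked with a few lines of Python. This is a minimal sketch, assuming the square-root form of the interval, EL = (1.96/2 - √E)² and EU = (1.96/2 + √(E+1))², which reproduces the (1.66, 1.87) quoted on the slide. The function name `rate_with_ci` and its parameters are illustrative, not from the presentation.

```python
import math

def rate_with_ci(events, population, per=1000, z=1.96):
    """Rate per `per` persons with an approximate 95% CI (Poisson assumption)."""
    rate = events / population * per
    # Under a Poisson model the variance of the event count equals the count.
    se = math.sqrt(events) / population * per
    lower_events = (z / 2 - math.sqrt(events)) ** 2
    upper_events = (z / 2 + math.sqrt(events + 1)) ** 2
    lower = lower_events / population * per
    upper = upper_events / population * per
    return rate, se, lower, upper

# CHD deaths in Scottish men, 1995 (figures from the slide)
rate, se, lower, upper = rate_with_ci(1080, 612955)
print(f"rate = {rate:.2f} per 1000, 95% CI ({lower:.2f}, {upper:.2f})")
# rate = 1.76 per 1000, 95% CI (1.66, 1.87)
```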

10 Case fatality rate
Usually expressed as the percentage of persons diagnosed as having a specified disease who die as a result of that illness within a given period.
Sometimes 30-day or 1-year case-fatality rates are quoted.
The case-fatality rate is not the same as the mortality rate.
It is also referred to as the fatality rate or case-fatality ratio.

11 Relative rate
Relative rate is the ratio of 2 rates, i.e. (rate in group 1)/(rate in group 2).
Also known as the rate ratio.
Example: the rate for CHD in men was calculated as 1.76 per 1000 in 1995, and 0.483 per 1000 in women.
Relative rate = 1.76/0.483 ≈ 3.65

12 Mortality from lung cancer by 5-year age-bands

13 Standardised rates
Direct method
Requires a standard population to which age-specific rates are applied.
Multiply the age-specific rates in populations A and B by the standard population, then compare.
Indirect method
Used if age-specific rates for the populations are not available.
Based on applying age-specific rates for the standard population to the population of interest to determine the 'expected' number of deaths.
Standardised Mortality Ratio (SMR) = Total observed deaths / Total expected deaths
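The direct method amounts to a weighted average of age-specific rates, with the standard population as weights. The sketch below uses made-up age bands, rates and standard population purely for illustration; none of these numbers come from the presentation, and `directly_standardised_rate` is a hypothetical helper name.

```python
def directly_standardised_rate(age_specific_rates, standard_population):
    """Apply a population's age-specific rates to a common standard population."""
    expected_events = sum(
        rate * size for rate, size in zip(age_specific_rates, standard_population)
    )
    return expected_events / sum(standard_population)

# Hypothetical data: three age bands (35-44, 45-54, 55-64)
standard_pop = [50_000, 30_000, 20_000]
rates_a = [0.001, 0.003, 0.008]  # deaths per person per year in population A
rates_b = [0.002, 0.003, 0.006]  # deaths per person per year in population B

dsr_a = directly_standardised_rate(rates_a, standard_pop)
dsr_b = directly_standardised_rate(rates_b, standard_pop)
print(f"A: {dsr_a * 1000:.2f} per 1000, B: {dsr_b * 1000:.2f} per 1000")
# A: 3.00 per 1000, B: 3.10 per 1000
```

Because both populations are weighted by the same standard age structure, any remaining difference between the two figures is not due to differing age distributions.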

14 Direct standardisation – raw data

15 Direct standardisation

16 Indirect method

17 Indirect method
Observed 1781 deaths
SMR = 1781/2032.5 = 0.876
SMR > 1: more deaths than expected
SMR < 1: fewer deaths than expected
Sometimes multiplied by 100.
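As a rough sketch of the indirect method, the expected deaths are obtained by applying the standard population's age-specific rates to the study population's age structure. The age-specific inputs below are invented for illustration (the slide's own table is not reproduced in this transcript); only the final check uses the slide's figures of 1781 observed and 2032.5 expected deaths. The function name `smr` is an assumption.

```python
def smr(observed_deaths, standard_rates, study_population):
    """Standardised mortality ratio: observed deaths / expected deaths."""
    expected = sum(
        rate * size for rate, size in zip(standard_rates, study_population)
    )
    return observed_deaths / expected, expected

# Hypothetical data: standard rates per person-year and study population sizes
standard_rates = [0.001, 0.004, 0.010]
study_pop = [20_000, 15_000, 5_000]

ratio, expected = smr(120, standard_rates, study_pop)
print(f"expected = {expected:.0f}, SMR = {ratio:.3f}")  # expected = 130, SMR = 0.923

# Figures quoted on the slide
slide_smr = 1781 / 2032.5
print(f"slide SMR = {slide_smr:.3f}")  # slide SMR = 0.876
```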

18 Examples of cohort studies
Framingham study. Cohort of 5,209 men and women, aged between 30 and 62 in 1948, from the town of Framingham, Massachusetts, followed to examine CVD development, returning every two years.
British doctors study. Cohort of male doctors started in 1951 to examine smoking and related mortality (Doll and Hill).
Million Women Study. Cohort of 1 million women asked about HRT use and followed up for cancer incidence.

19 Data from a cohort study are expressed as:
All persons free of disease at the start; follow-up to see whether they develop disease over time.

                              Develop disease   Do not develop disease   TOTAL
Exposed to risk factor               a                    b               a+b
Not exposed to risk factor           c                    d               c+d
TOTAL                               a+c                  b+d            a+b+c+d

Incidence in exposed (IE) = a/(a+b)
Incidence in unexposed (IO) = c/(c+d)

20 Data from cohort studies are analyzed in terms of:
1. Relative risk (RR) = Incidence rate in the exposed group (IE) / Incidence rate in the non-exposed group (IO)
Relative risks significantly higher than 1 imply that the factor under study is associated with an increased risk of disease.
Relative risks significantly lower than 1 imply that the factor is associated with a decreased risk of disease.
The magnitude of the relative risk indicates the strength of the association.

21 Attributable Risk
Attributable risk implies that not all of the disease incidence is due to the exposure, as some nonexposed individuals may develop the disease.
IO (incidence in nonexposed group) = "background incidence"
IE (incidence in exposed group) = "background incidence" + incidence due to the exposure
Therefore, the incidence in the exposed group which is attributable to the exposure can be calculated by subtracting:
Attributable risk (AR): AR = IE - IO

22 Attributable risk percent:
The percentage of the total incidence in the exposed group which is attributable to the exposure can be calculated by:
AR% = (IE - IO)/IE × 100
Short cut: AR% = (RR - 1)/RR × 100

23 The Population is not all exposed
How many cases per 1000 population are attributable to the exposure?
Do we know the incidence in the total population (It)?
It = Ie × Pe + Io × (1 - Pe), where Pe is the prevalence of the exposure
Population Attributable Risk: PAR = It - Io
Short cut: PAR = AR × Pe

24 The Population
What proportion of the risk in the total population is attributable?
Population Attributable Risk % (PAR%, PAF)
The denominator is the risk in the total population (It).
The numerator is the "extra" risk (PAR), which is It - Io.
PAR% = (It - Io)/It
PAR% = Pe(RR - 1)/[Pe(RR - 1) + 1]
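The two expressions for PAR% are equivalent. A short derivation, using the slide's own symbols (I_t, I_e, I_o, P_e, RR) and the definitions I_t = I_e P_e + I_o (1 - P_e) and RR = I_e / I_o, might run as follows:

```latex
\begin{align*}
I_t &= I_e P_e + I_o (1 - P_e)
     = I_o \left[ RR \cdot P_e + 1 - P_e \right]
     = I_o \left[ P_e (RR - 1) + 1 \right] \\
I_t - I_o &= I_o \, P_e (RR - 1) \\
\mathrm{PAR\%} &= \frac{I_t - I_o}{I_t}
     = \frac{I_o \, P_e (RR - 1)}{I_o \left[ P_e (RR - 1) + 1 \right]}
     = \frac{P_e (RR - 1)}{P_e (RR - 1) + 1}
\end{align*}
```

The second line also gives the short cut for PAR, since I_o P_e (RR - 1) = P_e (I_e - I_o) = AR × P_e.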

25 Example: Cohort study of smoking and coronary heart disease
Cohort of initially healthy people

               Develop CHD   Don't develop CHD   Total
Smokers             84              2916          3000
Non-smokers         87              4913          5000

Incidence in smokers = 84/3000 = 28.0 per 1,000/year
Incidence in non-smokers = 87/5000 = 17.4 per 1,000/year
RR = Incidence in smokers / Incidence in non-smokers = (28.0/1000/yr) / (17.4/1000/yr) = 1.6

26 Example: Attributable risk
AR (incidence in exposed group attributable to the exposure)
= Incidence in smokers - Incidence in non-smokers
= 28.0 - 17.4 = 10.6/1000/year
AR% (% of total incidence in exposed group attributable to exposure)
= (Incidence in smokers - Incidence in non-smokers) / Incidence in smokers × 100
= (28.0 - 17.4)/28.0 × 100 = 10.6/28.0 × 100 = 37.9%
37.9% of the morbidity from CHD among smokers may be attributable to smoking.

27 Example: Population AR (PAR)
Population Attributable Risk: PAR = It - Io = AR × Pe
Suppose the prevalence of smoking in the population is 40%.
We can calculate It = Ie × Pe + Io × (1 - Pe) = 21.6/1000
PAR = 21.6/1000 - 17.4/1000 = 4.2/1000
Also = AR × Pe = (10.6/1000) × 0.4 = 4.2/1000
Population Attributable Risk %:
PAR% = (It - Io)/It = (21.6 - 17.4)/21.6 = 19.4%
Or PAR% = Pe(RR - 1)/[Pe(RR - 1) + 1] = 0.4 × (1.6 - 1)/[0.4 × (1.6 - 1) + 1] = 19.4%
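The smoking and CHD example can be reproduced end to end from the 2×2 counts. This is a sketch only: the helper `cohort_measures` is hypothetical, and the "no CHD" counts (2916 and 4913) are derived here from the slide's totals of 3000 and 5000 rather than quoted from it.

```python
def cohort_measures(a, b, c, d, exposure_prevalence):
    """RR, AR, AR%, It, PAR and PAR% from a 2x2 cohort table.

    a, b: exposed who do / do not develop disease
    c, d: unexposed who do / do not develop disease
    """
    ie = a / (a + b)                      # incidence in exposed
    io = c / (c + d)                      # incidence in unexposed
    pe = exposure_prevalence
    it = ie * pe + io * (1 - pe)          # incidence in total population
    return {
        "ie": ie,
        "io": io,
        "rr": ie / io,
        "ar": ie - io,
        "ar_pct": (ie - io) / ie * 100,
        "it": it,
        "par": it - io,
        "par_pct": (it - io) / it * 100,
    }

# Smokers: 84 of 3000 develop CHD; non-smokers: 87 of 5000; 40% smoke
m = cohort_measures(84, 3000 - 84, 87, 5000 - 87, exposure_prevalence=0.4)
print(f"RR = {m['rr']:.1f}")                       # RR = 1.6
print(f"AR = {m['ar'] * 1000:.1f} per 1000")       # AR = 10.6 per 1000
print(f"AR% = {m['ar_pct']:.1f}")                  # AR% = 37.9
print(f"It = {m['it'] * 1000:.1f} per 1000")       # It = 21.6 per 1000
print(f"PAR% = {m['par_pct']:.1f}")                # PAR% = 19.6
```

Carrying full precision gives PAR% ≈ 19.6 rather than the 19.4 shown on the slide; the slide's figure comes from rounding It to 21.6, PAR to 4.2 and RR to 1.6 before dividing.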

28 Methods to account for varying lengths of follow-up
Since participants may enter or leave the study at various times due to death, emigration or loss to follow-up, the time of observation is usually not uniform. This is accounted for by three main methods:
1. Person-years of observation
2. Life-table method
3. Survival analysis, Cox proportional hazards
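The first of these methods, person-years of observation, can be sketched as follows. Each participant contributes only the time they were actually observed, and the rate is events divided by the summed follow-up. The follow-up times and event flags below are invented for illustration, and `incidence_rate` is a hypothetical helper.

```python
def incidence_rate(follow_up, per=1000):
    """Events per `per` person-years from (years_observed, had_event) pairs."""
    person_years = sum(years for years, _ in follow_up)
    events = sum(1 for _, had_event in follow_up if had_event)
    return events / person_years * per, events, person_years

# Hypothetical cohort: (years under observation, developed disease?)
cohort = [
    (5.0, True),    # developed disease after 5 years
    (2.5, False),   # lost to follow-up after 2.5 years
    (10.0, False),  # completed the full 10 years disease-free
    (7.5, True),    # developed disease after 7.5 years
]

rate, events, person_years = incidence_rate(cohort)
print(f"{events} events in {person_years} person-years "
      f"= {rate:.0f} per 1000 person-years")
# 2 events in 25.0 person-years = 80 per 1000 person-years
```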

29 Regression

30 Is the linear relationship reasonable?

31 Regression In broad terms, regression can be thought of as a statistical model which is used to help us get the ‘best guess’. Formally, we assume that there is an underlying linear relationship between the variables, and our observations lie scattered about that line. The actual value of ‘Y’ is ‘scattered’ about the expected or predicted value of Y.

32 Linear Model Illustrated
Line is ‘best’ fit in statistical sense. Scatter is called residual variability.

33 Regression equations
The equation of a line can be written as E[Y] = a + bX.
a = intercept (where the line cuts the Y axis)
b = slope of the line (can be +ve or -ve)
Software fits the 'best guess' for a and b.
Regression Analysis: Leaving versus Mock
The regression equation is Leaving = … + … Mock
Predictor   Coef   SE   T   P
Constant
Mock
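What the software does to obtain a and b can be illustrated with ordinary least squares written out by hand. The Mock and Leaving scores below are invented, since the fitted values from the slide are not preserved in this transcript; the function name `fit_line` is likewise an assumption.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for E[Y] = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical exam scores
mock = [40, 50, 60, 70, 80]
leaving = [45, 52, 66, 71, 86]

a, b = fit_line(mock, leaving)
print(f"Leaving = {a:.2f} + {b:.2f} Mock")  # Leaving = 3.40 + 1.01 Mock

# Residuals: the scatter of each observation about the fitted line
residuals = [yi - (a + b * xi) for xi, yi in zip(mock, leaving)]
```

With an intercept in the model the residuals sum to zero, which is one quick sanity check on a fitted line.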

34 More questions
Is there an effect of gender? That is, can one consider different lines for red and blue?
Should these lines be parallel or not?
What about 'adjusting' for other covariates such as age or 'intelligence'?
More than one variable = multiple linear regression
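The parallel versus non-parallel question can be made concrete with a small sketch. Fitting a separate line per group gives each its own intercept and slope; forcing the lines to be parallel gives each group its own intercept but one pooled slope. The data and group labels are hypothetical, and the helper names are illustrative only.

```python
def _centred_sums(x, y):
    """Means plus centred sums of squares and cross-products for one group."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return mean_x, mean_y, sxx, sxy

def separate_lines(groups):
    """Own intercept and own slope per group (non-parallel lines)."""
    fits = {}
    for name, (x, y) in groups.items():
        mean_x, mean_y, sxx, sxy = _centred_sums(x, y)
        slope = sxy / sxx
        fits[name] = (mean_y - slope * mean_x, slope)
    return fits

def parallel_lines(groups):
    """Own intercept per group, one common (pooled within-group) slope."""
    stats = {name: _centred_sums(x, y) for name, (x, y) in groups.items()}
    slope = sum(s[3] for s in stats.values()) / sum(s[2] for s in stats.values())
    return {name: (s[1] - slope * s[0], slope) for name, s in stats.items()}

# Hypothetical scores by gender: (Mock, Leaving)
groups = {
    "female": ([40, 50, 60, 70], [50, 58, 72, 80]),
    "male": ([40, 50, 60, 70], [42, 54, 60, 72]),
}

separate = separate_lines(groups)
parallel = parallel_lines(groups)
for name in groups:
    print(name, "separate:", separate[name], "parallel:", parallel[name])
```

In this made-up data the separate slopes (about 1.04 and 0.96) are close, so the parallel model with a common slope of 1.0 and intercepts differing by 8 points summarises the gender effect in a single number. Whether parallel lines are adequate in real data is a modelling question that this sketch does not test.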

