Published by Lee Perry. Modified over 9 years ago.
1
Primer on Statistics for Interventional Cardiologists. Giuseppe Sangiorgi, MD; Pierfrancesco Agostoni, MD; Giuseppe Biondi-Zoccai, MD
2
What you will learn
– Introduction
– Basics
– Descriptive statistics
– Probability distributions
– Inferential statistics
– Finding differences in mean between two groups
– Finding differences in mean between more than 2 groups
– Linear regression and correlation for bivariate analysis
– Analysis of categorical data (contingency tables)
– Analysis of time-to-event data (survival analysis)
– Advanced statistics at a glance
– Conclusions and take home messages
4
What you will learn: Descriptive statistics
– frequency distributions
– contingency tables
– measures of location: mean, median, mode
– measures of dispersion: variance, standard deviation, range, interquartile range
– coefficient of variation
– graphical presentation: histogram, box-plot, scatter plot
– correlation
6
Cardiology Counting and displaying data
After we have collected our data, we need to display them (tables, graphs and figures).
Raw enumeration (e.g. lesion length by visual estimation in patients treated in the ENDEAVOR II trial: 14-27 mm) …
7
Cardiology example Tabular display
8
Cardiology example DELAYED RRISC, JACC 2007 Tabular display
10
Types of variables
CATEGORY: nominal (categories); ordinal (ordered categories, ranks)
QUANTITY: discrete (counting); continuous (measuring)
11
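Cardiology Counting and displaying data: create a database!
Patient ID | Diabetes (nominal) | AHA/ACC type (ordinal) | Lesion length, mm (continuous)
1 | Y | A | 18
2 | N | B1 | 24
3 | N | A | 17
4 | N | C | 25
5 | Y | B2 | 23
6 | N | A | 15
7 | N | A | 16
8 | Y | B2 | 18
9 | N | B1 | 21
10 | Y | B2 | 19
11 | N | B1 | 14
12 | Y | C | 22
13 | N | C | 27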
12
Cardiology Frequency distribution
A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity, showing the number of times each value appears.
Diabetes (n=13): Yes 5; No 8
AHA/ACC type (n=13): A 4; B1 3; B2 3; C 3
13
Cardiology Frequency distribution
A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity, showing the number of times each value appears.
Diabetes (n=13): Yes 5 (38.5%); No 8 (61.5%)
AHA/ACC type (n=13): A 4 (30.8%); B1 3 (23.1%); B2 3 (23.1%); C 3 (23.1%)
This introduces the concept of percentage or rate.
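The counts and percentages above can be reproduced with a short Python sketch (Python 3, standard library only). The two lists are transcribed from the 13-patient example database shown earlier; `frequency_table` is an illustrative helper name, not part of any library.

```python
from collections import Counter

# Transcribed from the 13-patient example database (patient IDs 1 to 13, in order)
diabetes = ["Y", "N", "N", "N", "Y", "N", "N", "Y", "N", "Y", "N", "Y", "N"]
aha_acc_type = ["A", "B1", "A", "C", "B2", "A", "A", "B2", "B1", "B2", "B1", "C", "C"]


def frequency_table(values):
    """Return (value, count, percentage) rows sorted by value; percentages to 1 decimal."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, round(100 * count / n, 1)) for value, count in sorted(counts.items())]


for row in frequency_table(diabetes):
    print(row)
for row in frequency_table(aha_acc_type):
    print(row)
```

Rounding (rather than truncating) gives 30.8% for 4/13; with small samples the rounded percentages need not add up to exactly 100%.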
14
Cardiology Frequency distribution ENDEAVOR III, JACC 2006
15
Cardiology Frequency distribution
This simple tabulation has drawbacks. When a variable can take continuous rather than discrete values, or when the number of possible values is too large, building the table is cumbersome, if not impossible.
Lesion length in mm (n=13): 14: 1 (7.7%); 15: 1 (7.7%); 16: 1 (7.7%); 17: 1 (7.7%); 18: 2 (15.4%); 19: 1 (7.7%); 21: 1 (7.7%); 22: 1 (7.7%); 23: 1 (7.7%); 24: 1 (7.7%); 25: 1 (7.7%); 27: 1 (7.7%)
16
Cardiology Frequency distribution
A slightly different tabulation scheme based on ranges of values can be a solution in such cases.
Lesion length (n=13): 14-20 mm: 7 (53.8%); 21-27 mm: 6 (46.2%)
However, better solutions are coming later…
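A minimal Python sketch of grouping values into ranges, assuming the lesion lengths transcribed from the example database; `grouped_frequency` is an illustrative name chosen here.

```python
# Lesion lengths (mm) of the 13 example patients, patient IDs 1 to 13 in order
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def grouped_frequency(values, bins):
    """Count values in each closed range (low, high) and give the percentage of all values."""
    n = len(values)
    rows = []
    for low, high in bins:
        count = sum(1 for v in values if low <= v <= high)
        rows.append((f"{low}-{high} mm", count, round(100 * count / n, 1)))
    return rows


for row in grouped_frequency(lesion_length, [(14, 20), (21, 27)]):
    print(row)
```

Closed whole-millimetre ranges match the slide because these lengths are integers; a value such as 20.5 mm would fall in neither group, so for truly continuous measurements half-open intervals (e.g. 14 ≤ x < 21) are safer.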
18
Cardiology Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables. Counting and displaying data
Diabetes (n=13): Yes 5 (38.5%); No 8 (61.5%)
AHA/ACC type (n=13): A 4 (30.8%); B1 3 (23.1%); B2 3 (23.1%); C 3 (23.1%)
19
Cardiology Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables. Counting and displaying data
Diabetes \ AHA/ACC type | A | B1 | B2 | C | Total
No | 3 | 3 | 0 | 2 | 8
Yes | 1 | 0 | 3 | 1 | 5
Total | 4 | 3 | 3 | 3 | 13
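The cross-tabulation above can be sketched in Python from the same transcribed patient data. `contingency_table` is an illustrative helper, not a library function; it only builds the table and does not test whether the groups differ.

```python
from collections import Counter

# Transcribed from the 13-patient example database (patient IDs 1 to 13, in order)
diabetes = ["Y", "N", "N", "N", "Y", "N", "N", "Y", "N", "Y", "N", "Y", "N"]
aha_acc_type = ["A", "B1", "A", "C", "B2", "A", "A", "B2", "B1", "B2", "B1", "C", "C"]


def contingency_table(row_var, col_var):
    """Cross-tabulate two categorical variables, adding row and column totals."""
    pair_counts = Counter(zip(row_var, col_var))  # missing pairs count as 0
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    table = {}
    for r in row_levels:
        table[r] = {c: pair_counts[(r, c)] for c in col_levels}
        table[r]["Total"] = sum(table[r][c] for c in col_levels)
    table["Total"] = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
    table["Total"]["Total"] = len(row_var)
    return table


for label, counts in contingency_table(diabetes, aha_acc_type).items():
    print(label, counts)
```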
20
Cardiology Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables. Counting and displaying data
Is there a difference between diabetics and non-diabetics in the distribution of AHA/ACC lesion types? The answer will follow…
22
Cardiology Measures of central tendency: rationale
We need to describe the kind of values that we have (e.g. lesion length by visual estimation in patients treated in the ENDEAVOR II trial: 14-27 mm). Raw enumeration …
23
Cardiology Mean (arithmetic)
Characteristics:
– summarises information well
– discards a lot of information (e.g. about dispersion)
Assumptions:
– data are not skewed: skewness distorts the mean, and outliers can make the mean very different
– data are measured on a measurement scale: you cannot compute the mean of a categorical measure, so an "average" stent diameter may be meaningless
24
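Cardiology Mean (arithmetic)
Lesion length in mm (n=13): 14: 1 (7.7%); 15: 1 (7.7%); 16: 1 (7.7%); 17: 1 (7.7%); 18: 2 (15.4%); 19: 1 (7.7%); 21: 1 (7.7%); 22: 1 (7.7%); 23: 1 (7.7%); 24: 1 (7.7%); 25: 1 (7.7%); 27: 1 (7.7%)
$\text{Mean} = \frac{14+15+16+17+18+18+19+21+22+23+24+25+27}{13} = \frac{259}{13} = 19.92$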
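The same arithmetic in a short Python sketch, using the lesion lengths transcribed from the example database; `arithmetic_mean` is an illustrative name, cross-checked against the standard `statistics.mean`.

```python
import statistics

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def arithmetic_mean(values):
    """Sum of the values divided by their number."""
    return sum(values) / len(values)


mean_length = arithmetic_mean(lesion_length)
print(round(mean_length, 2))  # 19.92
# The standard library gives the same result
assert abs(mean_length - statistics.mean(lesion_length)) < 1e-9
```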
25
Cardiology TAPAS, Lancet 2008 Mean (arithmetic)
26
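Cardiology Median
What is it?
– The one in the middle: place the values in order; the median is the central one
Definition:
– The value with as many observations below it as above it (with an even number of values, the mean of the two central ones)
Used for:
– Ordinal data
– Skewed data / outliers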
27
Cardiology Median (continuous variable)
Lesion length (mm) by patient ID: 1: 18; 2: 24; 3: 17; 4: 25; 5: 23; 6: 15; 7: 16; 8: 18; 9: 21; 10: 19; 11: 14; 12: 22; 13: 27
28
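Cardiology Median (continuous variable)
Lesion length (mm) by patient ID: 1: 18; 2: 24; 3: 17; 4: 25; 5: 23; 6: 15; 7: 16; 8: 18; 9: 21; 10: 19; 11: 14; 12: 22; 13: 27
Ordered (patient ID in brackets): 14 (11); 15 (6); 16 (7); 17 (3); 18 (1); 18 (8); 19 (10); 21 (9); 22 (12); 23 (5); 24 (2); 25 (4); 27 (13)
Median = 19 mm (the 7th of the 13 ordered values)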
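A minimal Python sketch of "place values in order and take the central one", assuming the transcribed lesion lengths; the hand-written `median` helper is illustrative and agrees with the standard `statistics.median`.

```python
import statistics

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median(lesion_length))  # 19
print(statistics.median(lesion_length))  # 19, standard-library equivalent
```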
29
Cardiology Mode
What is it?
Definition:
– The most common value
Used (rarely) for:
– Discrete, non-interval data, e.g. stent length, stent diameter, …
– MicroDriver is only available in 2.25, 2.50 and 2.75 mm diameters, so reporting the mean diameter is meaningless
30
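Cardiology Mode (continuous variable)
Lesion length (mm) by patient ID: 1: 18; 2: 24; 3: 17; 4: 25; 5: 23; 6: 15; 7: 16; 8: 18; 9: 21; 10: 19; 11: 14; 12: 22; 13: 27
Lesion length in mm (n=13): 14: 1 (7.7%); 15: 1 (7.7%); 16: 1 (7.7%); 17: 1 (7.7%); 18: 2 (15.4%); 19: 1 (7.7%); 21: 1 (7.7%); 22: 1 (7.7%); 23: 1 (7.7%); 24: 1 (7.7%); 25: 1 (7.7%); 27: 1 (7.7%)
Mode = 18 mm (the only value appearing twice)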
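In Python 3.8+ the standard `statistics.multimode` returns every most-frequent value, so ties are reported rather than hidden. The second call uses a hypothetical set of stent diameters (not from the slides) to show a tie.

```python
from statistics import multimode

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

print(multimode(lesion_length))  # [18]

# Hypothetical stent diameters (mm): two values tie as most frequent
print(multimode([2.25, 2.50, 2.50, 2.75, 2.75]))  # [2.5, 2.75]
```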
31
Cardiology Comparing measures of central tendency
Mean is usually best
– If it works
– Useful properties (with standard deviation [SD])
– But…
Lesion length (mm) | Driver | Endeavor
Lesion 1 | 17 | 21
Lesion 2 | 19 | 21
Lesion 3 | 19 | 21
Lesion 4 | 17 | 21
Lesion 5 | 18 | 6
Mean | 18 | 18
Median | 18 | 21
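A short Python check of the table above, using the lesion lengths transcribed from the slide and the standard `statistics` module.

```python
from statistics import mean, median

# Lesion lengths (mm) transcribed from the Driver/Endeavor table on the slide
driver = [17, 19, 19, 17, 18]
endeavor = [21, 21, 21, 21, 6]

for name, values in (("Driver", driver), ("Endeavor", endeavor)):
    print(name, "mean:", mean(values), "median:", median(values))
```

A single short Endeavor lesion (6 mm) pulls the mean down to 18, the same as Driver, while the median stays at 21; this is why the median is preferred when outliers are present.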
32
Cardiology Comparing measures of central tendency
It also depends on the underlying distribution… Symmetric (and unimodal)? mean = median = mode
[Figure: frequency by value]
33
Cardiology Comparing measures of central tendency
It also depends on the underlying distribution… Asymmetric? mean ≠ median ≠ mode
[Figure: frequency of the number of Endeavor stents implanted per patient, with mode, median and mean marked]
34
Cardiology Agostoni et al, AJC 2007 Median
36
Cardiology Measures of dispersion: rationale
Central tendency doesn’t tell us everything: we need to know about the spread, or dispersion, of the scores.
Is there a difference? And if yes, how big is it? We can only tell if we know the dispersion of the data.
Group | Late loss (mm)
Endeavor | 0.61
Driver | 1.03
ENDEAVOR II, Circulation 2006
37
Cardiology Measures of dispersion: examples
[Figure: frequency distributions of late loss (0 to 1.50 mm) for Driver and Endeavor]
40
Cardiology Gaussian, normal or “parametric” distribution Shape of distribution
41
Cardiology Non-normal, right-skewed Departing from normality
42
Cardiology Non-normal, left-skewed Departing from normality
[Figure: frequency by value]
43
Cardiology Departing from normality: outliers
[Figure: frequency by value, with outliers]
44
Cardiology Measures of dispersion: types
Standard deviation (SD)
– Used with the mean
– Parametric tests
Range
– First to last value
– Not commonly used
Interquartile range
– Used with the median
– 25th (1/4) to 75th (3/4) percentile
– Non-parametric tests
45
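Cardiology Standard deviation and variance
Standard deviation (SD):
– approximates the population σ as N increases
Advantages:
– together with the mean, enables a powerful synthesis:
mean ± 1 SD: about 68% of the data
mean ± 2 SD: about 95% of the data (exactly 95% corresponds to ± 1.96 SD)
mean ± 3 SD: about 99.7% of the data (99% corresponds to ± 2.58 SD)
Disadvantages:
– the percentages above rely on the assumption of a normal distribution
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$, where the quantity under the square root is the variance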
46
Cardiology Standard deviation
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$
Lesion length (mm) by patient ID: 1: 18; 2: 24; 3: 17; 4: 25; 5: 23; 6: 15; 7: 16; 8: 18; 9: 21; 10: 19; 11: 14; 12: 22; 13: 27
Mean = 19.92
$\text{Variance} = \frac{(18 - 19.92)^2 + (24 - 19.92)^2 + (17 - 19.92)^2 + \dots + (27 - 19.92)^2}{12} = 16.58$
$SD = \sqrt{16.58} = 4.07$
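The calculation above as a Python sketch, assuming the transcribed lesion lengths. `sample_variance` is an illustrative helper that divides by N − 1 as in the formula; the standard `statistics.variance` and `statistics.stdev` use the same denominator (`statistics.pvariance` divides by N instead).

```python
import math
import statistics

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def sample_variance(values):
    """Sum of squared deviations from the mean divided by N - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


variance = sample_variance(lesion_length)
sd = math.sqrt(variance)
print(round(variance, 2), round(sd, 2))  # 16.58 4.07

# Cross-check against the standard library (also N - 1)
assert math.isclose(variance, statistics.variance(lesion_length))
assert math.isclose(sd, statistics.stdev(lesion_length))
```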
47
Cardiology Mean ± standard deviation: about 68% of the values lie between −1 SD and +1 SD around the mean
[Figure: normal frequency curve]
48
Cardiology Mean ± standard deviation: about 95% of the values lie between −2 SD and +2 SD around the mean
[Figure: normal frequency curve]
49
Cardiology Mean ± standard deviation: about 99.7% of the values lie between −3 SD and +3 SD around the mean
[Figure: normal frequency curve]
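The coverage figures for mean ± 1, 2 and 3 SD can be checked with the standard `statistics.NormalDist` class (Python 3.8+). `coverage` is an illustrative helper; `inv_cdf` gives the multipliers for exactly 95% and 99% central coverage.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Share of a normal distribution lying within mean ± k SD."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * coverage(k):.1f}%")

# Multipliers giving exactly 95% and 99% central coverage
print(round(standard_normal.inv_cdf(0.975), 2))  # 1.96
print(round(standard_normal.inv_cdf(0.995), 2))  # 2.58
```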
50
Cardiology TAPAS, Lancet 2008 Standard deviation
51
Cardiology TAPAS, NEJM 2008 Standard deviation
52
Cardiology TAPAS, NEJM 2008 Why not mean ± SD?
53
Cardiology Testing normality assumptions: rules of thumb
1. Refer to previous data or analyses (e.g. landmark articles, large databases)
2. Inspect tables and graphs (e.g. outliers, histograms)
3. Check rough equality of mean, median and mode
4. Perform ad hoc statistical tests:
– Levene’s test for equality of variances
– Kolmogorov-Smirnov test
– …
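Rule 4 can be illustrated with a minimal, standard-library sketch of the Kolmogorov-Smirnov statistic D: the largest vertical gap between the empirical cumulative distribution of the lesion lengths and a normal curve with the same mean and SD. This is a sketch under stated assumptions, not a full test: because the mean and SD are estimated from the same 13 values, the classical Kolmogorov-Smirnov critical values do not apply (the Lilliefors correction addresses this), so the p-value should come from a statistics package. `ks_statistic_vs_normal` is an illustrative name.

```python
from statistics import NormalDist, mean, stdev

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def ks_statistic_vs_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the same data."""
    fitted = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = fitted.cdf(x)
        # The empirical CDF steps from (i - 1)/n to i/n at x: check both sides of the step
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(f"D = {ks_statistic_vs_normal(lesion_length):.3f}")
```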
54
Cardiology Range
Lesion length in mm (n=13): 14: 1 (7.7%); 15: 1 (7.7%); 16: 1 (7.7%); 17: 1 (7.7%); 18: 2 (15.4%); 19: 1 (7.7%); 21: 1 (7.7%); 22: 1 (7.7%); 23: 1 (7.7%); 24: 1 (7.7%); 25: 1 (7.7%); 27: 1 (7.7%)
First to last value: Range = 14-27 mm, or Range = 27 − 14 = 13 mm
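In Python the range needs only `min` and `max`. It depends on the two extreme values alone, so a single outlier changes it, one reason it is not commonly used.

```python
# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

shortest, longest = min(lesion_length), max(lesion_length)
print(f"Range = {shortest}-{longest} mm, or Range = {longest - shortest} mm")
```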
55
Cardiology Range RRISC, JACC 2006
56
Cardiology Interquartile range
Ordered lesion lengths in mm (patient ID in brackets): 14 (11); 15 (6); 16 (7); 17 (3); 18 (1); 18 (8); 19 (10); 21 (9); 22 (12); 23 (5); 24 (2); 25 (4); 27 (13)
25th to 75th percentile, or 1st to 3rd quartile: Q1 = 16.5, median = 19, Q3 = 23.5
Interquartile range = 16.5-23.5 mm
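Quartiles can be computed with the standard `statistics.quantiles` function (Python 3.8+). Several quartile conventions exist; the default `method="exclusive"` happens to reproduce the values above (16.5 and 23.5), while `method="inclusive"` gives different quartiles for the same data, so the method should be stated when reporting an IQR.

```python
from statistics import quantiles

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

# Default method="exclusive"
q1, q2, q3 = quantiles(lesion_length, n=4)
print(q1, q2, q3)  # 16.5 19.0 23.5
print("IQR width:", q3 - q1)  # 7.0

# Another common convention gives different quartiles for the same data
print(quantiles(lesion_length, n=4, method="inclusive"))  # [17.0, 19.0, 23.0]
```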
57
Cardiology Agostoni et al, AJC 2007 Interquartile range
58
Cardiology
60
Reporting data
If parametric (normally distributed data): mean and standard deviation, written as mean ± SD or mean (SD), e.g. Age (y): 63 ± 13, or Age (y): 63 (13)
If non-parametric: median and interquartile range, written as median [IQR], e.g. NIH vol (mm³): 1.3 [0–13.1]
Mode and range are less commonly used
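A small Python sketch of the two reporting formats, applied to the example lesion lengths. The helper names are illustrative; quartiles come from `statistics.quantiles` with its default exclusive method, and one decimal place is an arbitrary choice.

```python
from statistics import mean, median, quantiles, stdev

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def report_parametric(values, decimals=1):
    """Format as 'mean ± SD'."""
    return f"{mean(values):.{decimals}f} ± {stdev(values):.{decimals}f}"


def report_nonparametric(values, decimals=1):
    """Format as 'median [Q1–Q3]'."""
    q1, _, q3 = quantiles(values, n=4)
    return f"{median(values):.{decimals}f} [{q1:.{decimals}f}–{q3:.{decimals}f}]"


print("Lesion length (mm):", report_parametric(lesion_length))     # 19.9 ± 4.1
print("Lesion length (mm):", report_nonparametric(lesion_length))  # 19.0 [16.5–23.5]
```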
62
Coefficient of Variation
The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean: $CV = SD / \bar{x}$.
It is only defined for a non-zero mean, and is most useful for variables that are always positive. The coefficient of variation should only be computed for continuous data.
A given standard deviation indicates a high or low degree of variability only in relation to the mean value: dividing the standard deviation by the mean makes it easier to get an idea of the variability in a distribution.
63
Coefficient of Variation
Advantages
– The CV is a dimensionless number
– The CV is particularly useful when comparing dispersion in datasets with markedly different means or different units of measurement
– Distributions with CV < 1 are considered low-variance, and those with CV > 1 high-variance
Disadvantages
– When the mean is near zero, the CV is sensitive to small changes in the mean, limiting its usefulness
– Unlike the standard deviation, it cannot be used to construct confidence intervals for the mean
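A minimal Python sketch of the CV for the example lesion lengths, using the sample SD (N − 1) from `statistics.stdev`; `coefficient_of_variation` is an illustrative name. Rescaling millimetres to centimetres leaves the result unchanged, which is what "dimensionless" means in practice.

```python
from statistics import mean, stdev

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def coefficient_of_variation(values):
    """SD divided by the mean; undefined when the mean is zero."""
    m = mean(values)
    if m == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


cv = coefficient_of_variation(lesion_length)
print(f"CV = {cv:.3f} ({100 * cv:.1f}%)")  # CV = 0.204 (20.4%)

# Same lengths in cm: the CV does not change because it has no unit
print(f"{coefficient_of_variation([x / 10 for x in lesion_length]):.3f}")  # 0.204
```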
65
Cardiology Histograms (ENDEAVOR II, Circulation 2006): very good for categorical variables (e.g. no/yes)
66
Cardiology Histograms Not so good for continuous variables, but…
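For a continuous variable, a histogram first groups the values into bins of equal width. The sketch below prints a text histogram of the example lesion lengths; the 3-mm bin width and the helper name `text_histogram` are arbitrary choices for illustration, and a different bin width can change the apparent shape.

```python
# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def text_histogram(values, start, width, n_bins):
    """Print bins [start + i*width, start + (i+1)*width) as rows of '#'; return the counts.

    Values outside the binned range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    for i, count in enumerate(counts):
        low = start + i * width
        print(f"[{low}, {low + width}) mm | {'#' * count}")
    return counts


counts = text_histogram(lesion_length, start=14, width=3, n_bins=5)
```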
67
Cardiology example both restenotic and non-restenotic SES Agostoni et al, AJC 2007 Histograms
68
Cardiology example Shape of distributions: non-restenotic SES, Agostoni et al, AJC 2007
69
Cardiology Box (& whiskers) plots
70
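Cardiology Box (& whiskers) plots
– Box: from Q1 to Q3 (the interquartile range), with a line at the median (Q2)
– Upper whisker: to the maximum (Q4), or to the largest value within Q3 + 1.5 × IQR if some values lie beyond it
– Lower whisker: to the minimum (Q0), or to the smallest value within Q1 − 1.5 × IQR if some values lie beyond it
– Values beyond these limits are plotted individually as outliers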
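The numbers behind a box plot can be sketched in Python with the 1.5 × IQR rule described above. `box_plot_summary` is an illustrative helper; quartiles come from `statistics.quantiles` (default exclusive method), and plotting software using another quartile convention may draw slightly different boxes. The extra 60 mm lesion in the last call is hypothetical, added only to show an outlier.

```python
from statistics import median, quantiles

# Lesion lengths (mm) of the 13 example patients
lesion_length = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]


def box_plot_summary(values):
    """Quartiles, whisker ends (1.5 x IQR rule) and outliers for a box plot."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


print(box_plot_summary(lesion_length))
# A hypothetical extra 60 mm lesion lies beyond Q3 + 1.5 x IQR
print(box_plot_summary(lesion_length + [60])["outliers"])  # [60]
```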
71
Cardiology Box (& whiskers) plots Margheri, Biondi Zoccai, et al, AJC 2008
72
Cardiology Scatter plots
A scatter plot is a type of display using Cartesian coordinates to show the values of two variables for a set of data. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
Usually it is used with two continuous variables, to assess visually the degree of correlation between them.
But it can also be used with one categorical variable and one continuous variable (mainly if the sample size is small).
73
Cardiology Scatter plots Abbate, Biondi Zoccai, et al, Circulation 2002
74
Cardiology Scatter plots Mintz, et al, AJC 2005
75
Cardiology Agostoni, et al, IJC 2007 Scatter plots
76
Thank you for your attention
For any correspondence: gbiondizoccai@gmail.com
For further slides on these topics feel free to visit the metcardio.org website: http://www.metcardio.org/slides.html