Primer on Statistics for Interventional Cardiologists Giuseppe Sangiorgi, MD Pierfrancesco Agostoni, MD Giuseppe Biondi-Zoccai, MD.

1 Primer on Statistics for Interventional Cardiologists Giuseppe Sangiorgi, MD Pierfrancesco Agostoni, MD Giuseppe Biondi-Zoccai, MD

2 What you will learn
Introduction
Basics
Descriptive statistics
Probability distributions
Inferential statistics
Finding differences in mean between two groups
Finding differences in mean between more than 2 groups
Linear regression and correlation for bivariate analysis
Analysis of categorical data (contingency tables)
Analysis of time-to-event data (survival analysis)
Advanced statistics at a glance
Conclusions and take home messages

4 What you will learn
Descriptive statistics
– frequency distributions
– contingency tables
– measures of location: mean, median, mode
– measures of dispersion: variance, standard deviation, range, interquartile range
– coefficient of variation
– graphical presentation: histogram, box-plot, scatter plot
– correlation

6 Cardiology Counting and displaying data
After we have collected our data, we need to display them (tables, graphics and figures)
Raw enumeration (eg lesion length by visual estimation in patients treated in Endeavor II trial: 14-27 mm) …

7 Cardiology example Tabular display

8 Cardiology example DELAYED RRISC, JACC 2007 Tabular display

9 Cardiology example DELAYED RRISC, JACC 2007 Tabular display

10 Types of variables
Variables
– CATEGORY
  – nominal
  – ordinal: ordered categories, ranks
– QUANTITY
  – discrete: counting
  – continuous: measuring

11 Cardiology Counting and displaying data
Create a database!

Variable type:  Nominal    Ordinal        Continuous
Patient ID      Diabetes   AHA/ACC type   Lesion length
1               Y          A              18
2               N          B1             24
3               N          A              17
4               N          C              25
5               Y          B2             23
6               N          A              15
7               N          A              16
8               Y          B2             18
9               N          B1             21
10              Y          B2             19
11              N          B1             14
12              Y          C              22
13              N          C              27

12 Cardiology Frequency distribution
A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity, showing the number of times each value appears.

Diabetes (n=13)
Yes   5
No    8

AHA/ACC type (n=13)
A     4
B1    3
B2    3
C     3

13 Cardiology Frequency distribution
A frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity, showing the number of times each value appears.

Diabetes (n=13)
Yes   5   38.5%
No    8   61.5%

AHA/ACC type (n=13)
A     4   30.8%
B1    3   23.1%
B2    3   23.1%
C     3   23.1%

This introduces the concept of percentage or rate.
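The counts and percentages in these two tables can be reproduced with a few lines of code. This is a minimal sketch in Python (the slides name no software); the variable and function names are illustrative, and the data are the 13 patients from the database slide. Note that 4/13 rounds to 30.8%.

```python
from collections import Counter

# Diabetes status and AHA/ACC lesion type for patients 1 to 13 (from the database slide)
diabetes = ["Y", "N", "N", "N", "Y", "N", "N", "Y", "N", "Y", "N", "Y", "N"]
lesion_type = ["A", "B1", "A", "C", "B2", "A", "A", "B2", "B1", "B2", "B1", "C", "C"]

def frequency_table(values):
    """Return {value: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, round(100 * c / n, 1)) for v, c in sorted(counts.items())}

diabetes_freq = frequency_table(diabetes)
type_freq = frequency_table(lesion_type)

for name, table in (("Diabetes", diabetes_freq), ("AHA/ACC type", type_freq)):
    print(f"{name} (n={sum(c for c, _ in table.values())})")
    for value, (count, pct) in table.items():
        print(f"  {value:<3} {count:>2}  {pct}%")
```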

14 Cardiology Frequency distribution ENDEAVOR III, JACC 2006

15 Cardiology Frequency distribution
This simple tabulation has drawbacks. When a variable can take continuous values instead of discrete values, or when the number of possible values is too large, the table construction is cumbersome, if not impossible.

Lesion length (n=13)
14   1    7.7%
15   1    7.7%
16   1    7.7%
17   1    7.7%
18   2   15.4%
19   1    7.7%
21   1    7.7%
22   1    7.7%
23   1    7.7%
24   1    7.7%
25   1    7.7%
27   1    7.7%

16 Cardiology Frequency distribution
A slightly different tabulation scheme based on the range of values can be a solution in such cases.

Lesion length (n=13)
14-20 mm   7   53.8%
21-27 mm   6   46.2%

However better solutions are coming later…
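Grouping a continuous variable into ranges is easy to automate. The sketch below, in Python with illustrative names, counts the 13 lesion lengths in the two inclusive ranges used on the slide.

```python
# Lesion length (mm) for patients 1 to 13 (from the database slide)
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

# Inclusive ranges taken from the slide
bins = [(14, 20), (21, 27)]

def binned_frequencies(values, bins):
    """Count how many values fall in each inclusive (low, high) range."""
    n = len(values)
    table = {}
    for low, high in bins:
        count = sum(low <= v <= high for v in values)
        table[f"{low}-{high} mm"] = (count, round(100 * count / n, 1))
    return table

binned = binned_frequencies(lesion_length_mm, bins)
for label, (count, pct) in binned.items():
    print(f"{label}: {count} ({pct}%)")
```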

18 Cardiology Counting and displaying data
Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables.

Diabetes (n=13)
Yes   5   38.5%
No    8   61.5%

AHA/ACC type (n=13)
A     4   30.8%
B1    3   23.1%
B2    3   23.1%
C     3   23.1%

19 Cardiology Counting and displaying data
Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables.

            AHA/ACC type
DIABETES    A    B1   B2   C    Total
no          3    3    0    2    8
yes         1    0    3    1    5
Total       4    3    3    3    13
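The cross-tabulation of diabetes by AHA/ACC type can be rebuilt from the patient-level database by counting each (diabetes, type) pair. A minimal Python sketch with illustrative names:

```python
from collections import Counter

# Diabetes status and AHA/ACC lesion type for patients 1 to 13 (from the database slide)
diabetes = ["Y", "N", "N", "N", "Y", "N", "N", "Y", "N", "Y", "N", "Y", "N"]
lesion_type = ["A", "B1", "A", "C", "B2", "A", "A", "B2", "B1", "B2", "B1", "C", "C"]

types = ["A", "B1", "B2", "C"]
cells = Counter(zip(diabetes, lesion_type))  # counts per (diabetes, type) pair

# One row of cell counts per diabetes status; Counter returns 0 for missing pairs
rows = {status: [cells[(status, t)] for t in types] for status in ("N", "Y")}
row_totals = {status: sum(counts) for status, counts in rows.items()}
column_totals = [sum(rows[s][i] for s in rows) for i in range(len(types))]

print("DIABETES", *types, "Total")
for status, counts in rows.items():
    print(status, *counts, row_totals[status])
print("Total", *column_totals, sum(column_totals))
```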

20 Cardiology Counting and displaying data
Contingency tables are used to record and analyse the relationship between two (or more) variables, most usually categorical variables.
Is there a difference between diabetics and non-diabetics in the rate of AHA/ACC type lesions? The answer will follow…

22 Cardiology Measures of central tendency: rationale
We need to describe the kind of values that we have (eg lesion length by visual estimation in patients treated in Endeavor II trial: 14-27 mm)
Raw enumeration …

23 Cardiology Mean (arithmetic)
Characteristics:
– summarises information well
– discards a lot of information (dispersion??)
Assumptions:
– data are not skewed
  – skew distorts the mean
  – outliers make the mean very different
– measured on measurement scale
  – cannot find mean of a categorical measure
  – ‘average’ stent diameter may be meaningless

24 Cardiology Mean (arithmetic)
Lesion length (n=13): 14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27
Mean = (14+15+16+17+18+18+19+21+22+23+24+25+27) / 13 = 19.92
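The same arithmetic can be checked in code. A short Python sketch (names are illustrative): the 13 values sum to 259, and 259 / 13 rounds to 19.92.

```python
from statistics import mean

# Lesion length (mm) for the 13 patients
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

total = sum(lesion_length_mm)               # numerator of the mean
mean_length = total / len(lesion_length_mm)  # divide by the number of patients

print(f"sum = {total}, n = {len(lesion_length_mm)}, mean = {mean_length:.2f}")
# The standard library gives the same value
print(f"statistics.mean = {mean(lesion_length_mm):.2f}")
```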

25 Cardiology TAPAS, Lancet 2008 Mean (arithmetic)

26 Cardiology Median
What is it?
– The one in the middle
– Place values in order
– Median is central
Definition:
– The value that splits the ordered data in half: as many values lie below it as above it
Used for:
– Ordinal data
– Skewed data / outliers

27 Cardiology Median
Variable type: continuous
Patient ID      1   2   3   4   5   6   7   8   9   10  11  12  13
Lesion length   18  24  17  25  23  15  16  18  21  19  14  22  27

28 Cardiology Median
Variable type: continuous

As collected
Patient ID      1   2   3   4   5   6   7   8   9   10  11  12  13
Lesion length   18  24  17  25  23  15  16  18  21  19  14  22  27

Sorted by lesion length
Patient ID      11  6   7   3   1   8   10  9   12  5   2   4   13
Lesion length   14  15  16  17  18  18  19  21  22  23  24  25  27

The middle (7th of 13) value is the median: 19
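The "place values in order, take the one in the middle" recipe can be sketched as follows in Python. The function name is illustrative; for an even number of values the sketch averages the two central values, which is the usual convention but is not stated on the slides.

```python
def median_of(values):
    """Sort the values and return the central one (mean of the two central ones if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Lesion length (mm) for patients 1 to 13, in collection order
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

print(sorted(lesion_length_mm))
print("median =", median_of(lesion_length_mm))
```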

29 Cardiology Mode
What is it?
Definition:
– The most common value
Used (rarely) for:
– Discrete non interval data
– E.g. stent length, stent diameter…
– MicroDriver is only available in diameters of 2.25, 2.50 and 2.75, so reporting the mean diameter is meaningless

30 Cardiology Mode
Variable type: continuous
Patient ID      1   2   3   4   5   6   7   8   9   10  11  12  13
Lesion length   18  24  17  25  23  15  16  18  21  19  14  22  27

Lesion length (n=13): 14, 15, 16, 17, 19, 21, 22, 23, 24, 25 and 27 each appear once (7.7%); 18 appears twice (15.4%) and is therefore the mode.
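Finding the most common value is a counting exercise. A minimal Python sketch using the standard library (`multimode` returns every value tied for the highest count, so it also copes with data that have more than one mode):

```python
from collections import Counter
from statistics import multimode

# Lesion length (mm) for the 13 patients
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

counts = Counter(lesion_length_mm)    # how often each value appears
modes = multimode(lesion_length_mm)   # all values with the highest count

print("most frequent value(s):", modes, "appearing", counts[modes[0]], "times")
```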

31 Cardiology Comparing measures of central tendency
Mean is usually best
– If it works
– Useful properties (with standard deviation [SD])
– But…

Lesion length
          Driver   Endeavor
          17       21
          19       21
          19       21
          17       21
          18       6
Mean      18       18
Median    18       21
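The Driver and Endeavor columns make the point that a single outlier (the value 6) can pull the mean while leaving the median untouched. A short Python sketch with the slide's ten values (variable names are illustrative):

```python
from statistics import mean, median

# Lesion lengths from the slide; the Endeavor column contains one outlier (6)
driver = [17, 19, 19, 17, 18]
endeavor = [21, 21, 21, 21, 6]

summary = {
    name: {"mean": mean(values), "median": median(values)}
    for name, values in (("Driver", driver), ("Endeavor", endeavor))
}

for name, stats in summary.items():
    print(f"{name}: mean = {stats['mean']}, median = {stats['median']}")
```

Both groups share a mean of 18, yet the medians are 18 and 21: the median describes the typical Endeavor lesion better here.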

32 Cardiology Comparing measures of central tendency
It also depends on the underlying distribution…
Symmetric? mean = median = mode
[Figure: symmetric distribution, frequency vs value]

33 Cardiology Comparing measures of central tendency
It also depends on the underlying distribution…
Asymmetric? mean ≠ median ≠ mode
[Figure: histogram of number of Endeavor implanted per patient (0 to 9) vs frequency (0 to 30), with mode, median and mean marked]

34 Cardiology Agostoni et al, AJC 2007 Median

36 Cardiology Measures of dispersion: rationale
Central tendency doesn’t tell us everything
– We need to know about the spread, or dispersion of the scores

Group      Late loss (mm)
Endeavor   0.61
Driver     1.03
ENDEAVOR II, Circulation 2006

Is there a difference? And if yes, how big is it? We can only tell if we know data dispersion

37 Cardiology Measures of dispersion: examples
[Figure: frequency distributions of late loss (0 to 1.50) for Driver and Endeavor]

38 Cardiology Measures of dispersion: examples
[Figure: frequency distributions of late loss (0 to 1.50) for Driver and Endeavor]

39 Cardiology Measures of dispersion: examples
[Figure: frequency distributions of late loss (0 to 1.50) for Driver and Endeavor]

40 Cardiology Shape of distribution
Gaussian, normal or “parametric” distribution

41 Cardiology Departing from normality
Non-normal, right-skewed

42 Cardiology Departing from normality
Non-normal, left-skewed
[Figure: frequency vs value]

43 Cardiology Departing from normality
Outliers
[Figure: histogram, frequency (0 to 20) vs value]

44 Cardiology Measures of dispersion: types
Standard deviation (SD)
– Used with mean
– Parametric tests
Range
– First to last value
– Not commonly used
Interquartile range
– Used with median
– 25th (1/4) to 75th (3/4) percentile
– Non-parametric tests

45 Cardiology Standard deviation
Standard deviation (SD):
– approximates population σ as N increases
Advantages:
– with mean enables powerful synthesis
  mean ± 1 SD: 68% of data
  mean ± 2 SD: 95% of data (exactly 1.96 SD)
  mean ± 3 SD: 99% of data (exactly 2.58 SD; ± 3 SD covers 99.7%)
Disadvantages:
– is based on normal assumptions

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{N - 1} \qquad SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$
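The coverage figures for a normal distribution can be verified numerically. The sketch below uses Python's `statistics.NormalDist` (a standard normal, mean 0 and SD 1) and a small illustrative helper, `coverage`, that returns the share of data within ± k SD of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def coverage(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2, 2.58, 3):
    print(f"mean ± {k} SD covers {100 * coverage(k):.1f}% of data")

# Going the other way: the multiplier that covers exactly 95% and 99%
z95 = std_normal.inv_cdf(0.975)
z99 = std_normal.inv_cdf(0.995)
print(f"95% -> ±{z95:.2f} SD, 99% -> ±{z99:.2f} SD")
```

These percentages hold only when the data are approximately normal, which is the disadvantage noted on the slide.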

46 Cardiology Standard deviation
Variable type: continuous
Patient ID      1   2   3   4   5   6   7   8   9   10  11  12  13
Lesion length   18  24  17  25  23  15  16  18  21  19  14  22  27
Mean = 19.92

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$

Variance = [(18−19.92)² + (24−19.92)² + (17−19.92)² + … + (27−19.92)²] / 12 = 16.58
SD = √16.58 = 4.07
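The worked example can be reproduced step by step. A minimal Python sketch (names are illustrative) that follows the formula directly, dividing the sum of squared deviations by N − 1 = 12, and then cross-checks against the standard library:

```python
from math import sqrt
from statistics import stdev, variance

# Lesion length (mm) for patients 1 to 13
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

n = len(lesion_length_mm)
mean_length = sum(lesion_length_mm) / n

# Sum of squared deviations from the mean
sum_sq_dev = sum((x - mean_length) ** 2 for x in lesion_length_mm)

sample_variance = sum_sq_dev / (n - 1)  # N - 1 in the denominator (sample variance)
sample_sd = sqrt(sample_variance)

print(f"mean = {mean_length:.2f}")
print(f"variance = {sample_variance:.2f}")
print(f"SD = {sample_sd:.2f}")

# statistics.variance and statistics.stdev use the same N - 1 definition
assert abs(variance(lesion_length_mm) - sample_variance) < 1e-9
assert abs(stdev(lesion_length_mm) - sample_sd) < 1e-9
```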

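47 Cardiology Mean ± Standard deviation
[Figure: normal curve, frequency vs value; 68% of data lie within mean ± 1 SD]

48 Cardiology Mean ± Standard deviation
[Figure: normal curve, frequency vs value; 95% of data lie within mean ± 2 SD]

49 Cardiology Mean ± Standard deviation
[Figure: normal curve, frequency vs value; more than 99% of data lie within mean ± 3 SD]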

50 Cardiology TAPAS, Lancet 2008 Standard deviation

51 Cardiology TAPAS, NEJM 2008 Standard deviation

52 Cardiology TAPAS, NEJM 2008 Why not mean ± SD?

53 Cardiology Testing normality assumptions
Rules of thumb
1. Refer to previous data or analyses (eg landmark articles, large databases)
2. Inspect tables and graphs (eg outliers, histograms)
3. Check rough equality of mean, median, mode
4. Perform ad hoc statistical tests
   – Levene’s test for equality of variances
   – Kolmogorov-Smirnov test
   – …
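Rule of thumb 3 is easy to apply in code. The sketch below, in Python with illustrative names, prints mean, median and mode side by side for the lesion length data and expresses the mean-median gap in SD units. That last figure is only an informal yardstick assumed here for illustration; it is not one of the formal tests listed under point 4.

```python
from statistics import mean, median, multimode, stdev

# Lesion length (mm) for the 13 patients
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

m = mean(lesion_length_mm)
med = median(lesion_length_mm)
modes = multimode(lesion_length_mm)
sd = stdev(lesion_length_mm)

# Informal yardstick (an assumption of this sketch): mean-median gap in SD units
gap_in_sd = (m - med) / sd

print(f"mean = {m:.2f}, median = {med}, mode(s) = {modes}")
print(f"(mean - median) / SD = {gap_in_sd:.2f}")
```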

54 Cardiology Range
Lesion length (n=13): 14, 15, 16, 17, 18, 18, 19, 21, 22, 23, 24, 25, 27
First to last value
Range = 14 to 27, or Range = 27 − 14 = 13

55 Cardiology Range RRISC, JACC 2006

56 Cardiology Interquartile range
Variable type: continuous
Patient ID      11  6   7   3   1   8   10  9   12  5   2   4   13
Lesion length   14  15  16  17  18  18  19  21  22  23  24  25  27

25th to 75th percentile, or 1st to 3rd quartile
Q1 = 16.5 (between 16 and 17), Median = 19, Q3 = 23.5 (between 23 and 24)
Interquartile range = 16.5 – 23.5
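Quartiles can be computed in several slightly different ways, and software packages do not all agree. In Python's standard library, the "exclusive" method of `statistics.quantiles` reproduces the values on this slide, while the "inclusive" method gives a slightly narrower interval for the same data. The sketch below shows both (names are illustrative).

```python
from statistics import quantiles

# Lesion length (mm) for the 13 patients
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

# "exclusive" interpolates at positions (n + 1) * p in the sorted data
q1, q2, q3 = quantiles(lesion_length_mm, n=4, method="exclusive")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR width = {q3 - q1}")

# "inclusive" is another common convention and gives different quartiles
q1_inc, _, q3_inc = quantiles(lesion_length_mm, n=4, method="inclusive")
print(f"inclusive method: Q1 = {q1_inc}, Q3 = {q3_inc}")
```

When reporting an interquartile range, it helps to state which software or method produced it.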

57 Cardiology Agostoni et al, AJC 2007 Interquartile range

60 Reporting data
If parametric: Mean and Standard Deviation
– Mean ± SD, eg Age (y): 63 ± 13
– Mean (SD), eg Age (y): 63 (13)
If non-parametric: Median and InterQuartile Range
– Median [IQR], eg NIH vol (mm³): 1.3 [0–13.1]
Mode and Range less commonly used
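The two reporting styles can be generated directly from raw data. The helper names below are hypothetical and the sketch is in Python; it uses the 13 lesion lengths from the earlier slides as input, and the "exclusive" quartile method, which matches the interquartile range slide.

```python
from statistics import mean, quantiles, stdev

def report_parametric(values, digits=1):
    """Format as 'mean ± SD'."""
    return f"{mean(values):.{digits}f} ± {stdev(values):.{digits}f}"

def report_nonparametric(values, digits=1):
    """Format as 'median [Q1-Q3]'."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return f"{q2:.{digits}f} [{q1:.{digits}f}-{q3:.{digits}f}]"

# Lesion length (mm) for the 13 patients
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

parametric = report_parametric(lesion_length_mm)
nonparametric = report_nonparametric(lesion_length_mm)
print("Lesion length (mm):", parametric)
print("Lesion length (mm):", nonparametric)
```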

62 Coefficient of Variation
The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean.
This is only defined for non-zero mean, and is most useful for variables that are always positive. The coefficient of variation should only be computed for continuous data.
A given standard deviation indicates a high or low degree of variability only in relation to the mean value.
It is easier to get an idea of variability in a distribution by dividing the standard deviation by the mean.

63 Coefficient of Variation
Advantages
– The CV is a dimensionless number
– The CV is particularly useful when comparing dispersion in datasets with markedly different means or different units of measurement
– Distributions with CV > 1 are considered high-variance
Disadvantages
– When the mean is near zero, the CV is sensitive to small changes in the mean, limiting its usefulness
– Unlike the standard deviation, it cannot be used to construct confidence intervals for the mean
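A minimal Python sketch of the CV, applied to the lesion length data from the earlier slides (the function name is illustrative). It refuses a zero mean, as the definition requires, and shows the dimensionless property by recomputing the CV after converting millimetres to centimetres.

```python
from math import isclose
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Ratio of the sample SD to the mean; undefined when the mean is zero."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(values) / m

# Lesion length for the 13 patients, in mm and converted to cm
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]
lesion_length_cm = [x / 10 for x in lesion_length_mm]

cv_mm = coefficient_of_variation(lesion_length_mm)
cv_cm = coefficient_of_variation(lesion_length_cm)

print(f"CV (mm) = {cv_mm:.3f}")
print(f"CV (cm) = {cv_cm:.3f}")  # same value: the unit cancels out
```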

65 Cardiology Histograms
Very good for categorical variables
[Figure: bars for the categories “no” and “yes”; ENDEAVOR II, Circulation 2006]

66 Cardiology Histograms Not so good for continuous variables, but…

67 Cardiology example both restenotic and non-restenotic SES Agostoni et al, AJC 2007 Histograms

68 Cardiology example non-restenotic SES Agostoni et al, AJC 2007 shape of distribution Shape of distributions

69 Cardiology Box (& whiskers) plots

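70 Cardiology Box (& whiskers) plots
[Figure: annotated box plot]
– Whisker end: Max (Q4) or Q3 + 1.5 × IQR
– Box edge: Q3
– Line in box: Median (Q2)
– Box edge: Q1
– Whisker end: Min (Q0) or Q1 − 1.5 × IQR
– Box length: interquartile range (IQR)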
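The numbers behind a box plot can be computed without drawing anything. The Python sketch below uses the lesion length data from the earlier slides and illustrative names. It assumes the common Tukey convention, which the slide's labels suggest but do not spell out: each whisker stops at the most extreme observation that still lies within 1.5 × IQR of the box, and anything beyond is flagged as an outlier.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, 1.5 x IQR fences, whisker ends and outliers (Tukey convention)."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Lesion length (mm) for the 13 patients
lesion_length_mm = [18, 24, 17, 25, 23, 15, 16, 18, 21, 19, 14, 22, 27]

box = box_plot_summary(lesion_length_mm)
for key, value in box.items():
    print(f"{key}: {value}")
```

For these data no value falls outside the fences, so the whiskers simply run from the minimum (14) to the maximum (27).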

71 Cardiology Box (& whiskers) plots Margheri, Biondi Zoccai, et al, AJC 2008

72 Cardiology Scatter plots
A scatter plot is a type of display using Cartesian coordinates to display values for two variables for a set of data.
The data are displayed as a collection of points, with the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
Usually it is done with 2 continuous variables to visually assess the degree of correlation between them.
It can also be used with one categorical variable and one continuous variable (mainly if sample size is small).

73 Cardiology Scatter plots Abbate, Biondi Zoccai, et al, Circulation 2002

74 Cardiology Scatter plots Mintz, et al, AJC 2005

75 Cardiology Agostoni, et al, IJC 2007 Scatter plots

76 Thank you for your attention
For any correspondence: gbiondizoccai@gmail.com
For further slides on these topics feel free to visit the metcardio.org website: http://www.metcardio.org/slides.html

