Advanced Statistics for Interventional Cardiologists
What you will learn
Introduction
Basics of multivariable statistical modeling
Advanced linear regression methods
Hands-on session: linear regression
Bayesian methods
Logistic regression and generalized linear model
Resampling methods
Meta-analysis
Hands-on session: logistic regression and meta-analysis
Multifactor analysis of variance
Cox proportional hazards analysis
Hands-on session: Cox proportional hazards analysis
Propensity analysis
Most popular statistical packages
Conclusions and take home messages
Schedule: 1st day and 2nd day
Focus on different programs
Complex, all-purpose statistical packages are not always necessary for statistical analyses: R, S, S-Plus, SAS, SPSS, Stata, Statistica, WinBUGS …
Leaner and less expensive programs can sometimes be effective, and are available as free trials (e.g. for 30 days): StatsDirect, JMP, Minitab, StatXact, LogXact, …
However, if you wish to become more competent in advanced statistical analysis for clinical cardiovascular research, the best choice is to progressively familiarize yourself with one or two complex, all-purpose statistical packages.
R
R is a programming language and software environment for statistical computing and graphics. It is an implementation of the S programming language with lexical scoping semantics. R is widely used for statistical software development and data analysis. Its source code is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface, though several graphical user interfaces are available.
Pros: flexibility and programming capabilities (e.g. for bootstrap), sophisticated graphical capabilities
Cons: complex, user-unfriendly interface
Price: free
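The bootstrap named above as an example of programming flexibility can be sketched in a few lines. The following is a minimal illustration of a percentile bootstrap confidence interval for a mean, written in Python rather than R purely as a neutral example. The function name bootstrap_ci, its parameters, and the ejection fraction values are all assumptions made for illustration and do not come from the slides.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n observations with replacement from the original sample
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical left ventricular ejection fraction values (%), invented for this example
lvef = [55, 60, 42, 38, 65, 50, 47, 58, 35, 62]
low, high = bootstrap_ci(lvef)
print(f"mean = {statistics.fmean(lvef):.1f}, 95% bootstrap CI = ({low:.1f}, {high:.1f})")
```

The percentile method is only one of several bootstrap intervals; it is used here because it is the simplest to read.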
S and S-Plus
S-PLUS is a commercial package sold by TIBCO Software Inc. with a focus on exploratory data analysis, graphics and statistical modeling. It is an implementation of the S programming language. It features object-oriented programming capabilities and advanced analytical algorithms (e.g. for robust regression, repeated measurements, …).
Pros: flexibility and programming capabilities (e.g. for bootstrap), user-friendly graphical user interface
Cons: complex matrix programming environment
Price: €€€€-€€
S and S-Plus: regression with S-Plus
Two routes: Menu, Programming
Programming: call lm to model stack.loss as a linear function of three predictors and store the fitted lm object:
    > stack.lm <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.)
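To show what a call such as lm does for a model with three predictors, here is a minimal sketch of ordinary least squares via the normal equations, written in Python with the standard library only. It is not how S-Plus implements lm. The function name fit_ols and the small data set are hypothetical, and the sketch assumes the predictors are not collinear.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples, y: list of responses.
    Returns [intercept, b1, b2, ...]. Assumes X'X is invertible.
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(xi[j] * xi[k] for xi in X) for k in range(p)]
        + [sum(xi[j] * yi for xi, yi in zip(X, y))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 + 0.5*x3
predictors = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
              (1, 0, 1), (0, 1, 1), (2, 1, 3), (0, 0, 0)]
response = [1 + 2 * a - 3 * b + 0.5 * c for a, b, c in predictors]
coefficients = fit_ols(predictors, response)
print([round(b, 6) for b in coefficients])
```

Statistical packages generally rely on more numerically stable decompositions than the normal equations; the version above is chosen only because it is short.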
SAS
SAS (originally Statistical Analysis System, 1968) is an integrated suite of platform-independent software modules provided by SAS Institute (founded in 1976 by Jim Goodnight and colleagues). The functionality of the system is very complete and built around four major tasks: data access, data management, data analysis and data presentation.
Applications of the SAS system include: statistical analysis, data mining, forecasting; report writing and graphics; operations research and quality improvement; applications development; data warehousing (extract, transform, load).
Pros: very complete tool for data analysis, flexibility and programming capabilities (e.g. for Bayesian, bootstrap, conditional, or meta-analyses), handles large volumes of data
Cons: complex programming environment, labyrinth of modules and interfaces, very expensive
Price: €€€€-€€€€
SAS Analyst Application
SAS Enterprise Guide
SAS Enterprise Miner Predictive Modeling Workbench
SAS Programming
ANCOVA model
…
proc mixed data=name;
  class gregion trtl;
  model varY = gregion trtl varX / ddfm=kr;
  lsmeans trtl / pdiff=all;
  estimate 'Difference CZP' trtl -1 1 / cl alpha=0.05;
run;
…
Frequency table
…
proc freq data=AAA(where = (charge > 100));
  table charge;
run;
…
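For readers unfamiliar with SAS, the frequency table step above (keep the records with charge > 100, then tabulate charge) can be sketched in Python as follows. This is only an analogue of the idea, not a reproduction of proc freq output. The function name frequency_table and the charge values are hypothetical.

```python
from collections import Counter


def frequency_table(values, minimum):
    """Tabulate values strictly greater than `minimum`.

    Returns rows of (value, count, percent, cumulative count), sorted by value.
    """
    kept = [v for v in values if v > minimum]  # analogue of where = (charge > 100)
    counts = Counter(kept)
    total = len(kept)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / total, cumulative))
    return rows


# Hypothetical charge values, invented for this example
charges = [80, 120, 150, 120, 300, 95, 150, 120]
for value, count, percent, cumulative in frequency_table(charges, 100):
    print(f"{value:>6} {count:>5} {percent:>7.1f} {cumulative:>5}")
```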
JMP Statistical Discovery Software
JMP is a software package first developed by John Sall, co-founder of SAS, to perform simple and complex statistical analyses. It dynamically links statistics with graphics to interactively explore, understand, and visualize data: you can click on any point in a graph and see the corresponding data point highlighted in the data table and in other graphs.
JMP provides a comprehensive set of statistical tools as well as design of experiments and statistical quality control in a single package. It allows for custom programming and script development via JSL, originally known as "John's Scripting Language". An add-on, JMP Genomics, comes with over 100 analytic procedures to facilitate the treatment of data involving genetics, microarrays or proteomics.
Pros: very intuitive, lean package for design and analysis in research
Cons: less complete and less flexible than the complete SAS system
Price: €€€€
JMP screenshots
Statistica
STATISTICA is a powerful statistics and analytics software package developed by StatSoft, Inc. It provides a wide selection of data analysis, data management, data mining, and data visualization procedures. Features of the software include basic and multivariate statistical analysis, quality control modules and a collection of data mining techniques.
Pros: extensive range of methods, user-friendly graphical interface, has been called "the king of graphics"
Cons: limited flexibility and programming capabilities, can feel like a labyrinth
Price: €€€€
SPSS
SPSS (originally Statistical Package for the Social Sciences) is a computer program for statistical analysis, first released in 1968 and now distributed by SPSS Inc. SPSS is among the most widely used programs for statistical analysis in social science. It is used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations and others.
Pros: extensive range of tests and procedures, user-friendly graphical interface
Cons: limited flexibility and programming capabilities
Price: €€€€
Stata
Stata (name formed by blending "statistics" and "data") is a general-purpose statistical software package created in 1985 by StataCorp. Stata's full range of capabilities includes data management, statistical analysis, graphics generation, simulations, and custom programming. Most meta-analysis tools were first developed for Stata, so this package offers one of the most extensive libraries of statistical tools for systematic reviewers.
Pros: flexibility and programming capabilities (e.g. for bootstrap or meta-analyses), sophisticated graphical capabilities
Cons: relatively complex interface
Price: €€€€-€€€€
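As an illustration of the kind of calculation a meta-analysis tool performs, here is a minimal sketch of fixed-effect (inverse-variance) pooling in Python. It is not Stata code and makes no claim about how any Stata command is implemented. The function name pooled_fixed_effect and the three study estimates are hypothetical, and the estimates are assumed to be log odds ratios with known standard errors.

```python
import math


def pooled_fixed_effect(estimates, std_errors, z=1.96):
    """Fixed-effect inverse-variance pooling.

    Each study is weighted by 1 / SE^2. Returns (pooled estimate, CI lower, CI upper),
    using a normal approximation for the confidence interval.
    """
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se


# Hypothetical log odds ratios and standard errors from three invented trials
log_or = [-0.35, -0.10, -0.52]
se = [0.20, 0.15, 0.30]
estimate, lower, upper = pooled_fixed_effect(log_or, se)
print(f"pooled OR = {math.exp(estimate):.2f} "
      f"(95% CI {math.exp(lower):.2f} to {math.exp(upper):.2f})")
```

A random-effects model would add a between-study variance term to each weight; the fixed-effect version is shown because it is the shortest to follow.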
WinBUGS and OpenBUGS
WinBUGS (Windows-based Bayesian inference Using Gibbs Sampling) is statistical software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods, developed by the MRC Biostatistics Unit at the University of Cambridge, UK. It is based on the BUGS (Bayesian inference Using Gibbs Sampling) project (started in …). OpenBUGS is the open source variant of WinBUGS.
Pros: flexibility and programming capabilities
Cons: complex interface
Price: free
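Gibbs sampling, the MCMC technique that gives BUGS its name, draws each variable in turn from its distribution conditional on the current values of the others. The sketch below shows the idea on a textbook toy case, a standard bivariate normal with correlation rho, where both conditionals are known normals. It is written in Python, is not BUGS code, and the function names and settings are assumptions made for illustration.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_draws=20000, burn_in=1000, seed=1):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Uses the full conditionals x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1 - rho ** 2)
    x, y = 0.0, 0.0  # arbitrary starting values
    draws = []
    for i in range(n_draws + burn_in):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if i >= burn_in:  # discard the warm-up iterations
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in draws)
    sxx = sum((x - mean_x) ** 2 for x, _ in draws)
    syy = sum((y - mean_y) ** 2 for _, y in draws)
    return sxy / math.sqrt(sxx * syy)


draws = gibbs_bivariate_normal(rho=0.6)
print(f"sample correlation = {sample_correlation(draws):.3f} (target 0.6)")
```

In real Bayesian models the conditionals are derived from the prior and the likelihood, which is the step that packages such as WinBUGS automate.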
And many more … Extensive overview of functionality and cost of statistical software packages:
And now, we’re almost ready to go…
For further slides on these topics please feel free to visit the metcardio.org website: