Introduction to STATA for Clinical Researchers Jay Bhattacharya August 2007.


What is STATA?
A general purpose statistical analysis package used by
  – epidemiologists, demographers, clinical researchers, social scientists, many others
Tool to graphically display data
  – Good for data exploration
  – Also good for publishing in journals

Why STATA?
Easy to learn
Powerful
It will help you produce papers

Anatomy of a Clinical Research Project
Collect (the data)
Clean
Explore
Analyze
Submit (for publication)
Revise

Collect the Data
STATA is good for analyzing
  – large secondary databases
  – smaller home-grown data
Store the data as a relational database (or maybe as a spreadsheet)
  – It’s easy to convert to STATA format from SAS and other formats

Clean the Data
Merge in other sources of data
  – STATA does merges of all types, including match merge, table lookup, and more complicated merging
Recode variables
Hunt for outliers
Apply inclusion/exclusion criteria
Treat missing values consistently
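
The cleaning steps above can be sketched in a few lines. This is only an illustration in Python, not STATA syntax; the records, the site-to-region lookup table, the -9 missing-value code, and the age and weight cutoffs are all hypothetical values chosen for the example.

```python
# Illustrative sketch only: hypothetical records, not real study data.
patients = [
    {"id": 1, "age": 34, "weight_kg": 70.0, "site": "A"},
    {"id": 2, "age": 17, "weight_kg": 55.0, "site": "B"},
    {"id": 3, "age": 52, "weight_kg": -9.0, "site": "A"},   # -9 assumed to be a missing-value code
    {"id": 4, "age": 45, "weight_kg": 700.0, "site": "C"},  # implausible value, likely a data-entry error
]
site_region = {"A": "West", "B": "East"}  # hypothetical lookup table; site "C" has no match


def clean(records, lookup, min_age=18, max_weight=300.0):
    """Merge, recode, flag outliers, and apply an inclusion criterion."""
    cleaned = []
    for rec in records:
        row = dict(rec)  # copy so the raw data stay untouched
        # Table-lookup merge: attach the region, or None when the key is unmatched.
        row["region"] = lookup.get(row["site"])
        # Treat missing values consistently: recode the sentinel to None.
        if row["weight_kg"] is not None and row["weight_kg"] < 0:
            row["weight_kg"] = None
        # Hunt for outliers: flag them for review rather than dropping them silently.
        row["weight_outlier"] = row["weight_kg"] is not None and row["weight_kg"] > max_weight
        # Inclusion criterion: keep adults only.
        if row["age"] >= min_age:
            cleaned.append(row)
    return cleaned


result = clean(patients, site_region)
for row in result:
    print(row)
```

Flagging outliers and unmatched merge keys, instead of deleting them, keeps every cleaning decision visible when the script is re-run later.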

Explore the Data
Make a data codebook
Examine univariate statistics
  – mean, standard deviation, percentiles
Explore bivariate relationships
  – correlations, conditional means, etc.
Examine the data graphically
  – STATA has powerful graphics capabilities (with a simple GUI)
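
As a rough illustration of the univariate and bivariate summaries listed above, here is a minimal Python sketch using only the standard library. It is not STATA output; the height and weight values are made up for the example, and the helper names `summarize` and `pearson` are hypothetical.

```python
import statistics

# Hypothetical measurements, for illustration only.
height_cm = [150.0, 160.0, 165.0, 170.0, 175.0, 180.0, 190.0]
weight_kg = [50.0, 58.0, 63.0, 68.0, 74.0, 80.0, 92.0]


def summarize(values):
    """Univariate summary: mean, sample standard deviation, and quartiles."""
    p25, p50, p75 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


print(summarize(height_cm))
print(summarize(weight_kg))
print(pearson(height_cm, weight_kg))
```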

Analyze the Data
STATA is a powerful all-purpose statistical package with most common statistical computations built in
STATA is extensible for uncommon statistical computations
  – You can share the tools you develop with the rest of the STATA community
  – Built-in and user-written commands have a common interface
  – The STATA community is vibrant and helpful

Built-In Commands
Linear models (ANOVA, regressions)
Nonlinear models (logit, Poisson regression)
Failure time models (KM curves, Cox models)
Time-series models
R-like matrix processing tools
Bootstrap
Robust statistics
  – Standard error corrections for clustering
  – Accounting for complex survey design
Powerful and easy-to-use macro language to automate commands

Submit for Publication
With STATA, you can make a wide variety of publishable-quality graphs
You can automatically generate tables of results that are easy to edit in your favorite word processor
  – These are commands added to STATA by the user community
  – LaTeX support

Revise
STATA has a nice, intuitive GUI for interactive data exploration
  – Don’t use it too much!
STATA commands can be stored in a text (.do) file, edited, and re-run

An Example
Body mass index is weight (kg) divided by height (m) squared
Why squared?
  – Presumably to make BMI independent of height: BMI should mean the same thing for a short man and a tall woman
But does it?
  – And is the triceps skinfold test height independent?
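
One way to make the "why squared?" question concrete is sketched below, in Python rather than STATA. If weight scales as height raised to some power p, then dividing weight by height to the power p removes the dependence on height, and p can be estimated as the slope of a regression of log weight on log height. The data here are constructed (not measured) so that BMI is exactly 22 for everyone, which means the slope should come out at 2; with real survey data the estimated slope is the quantity of interest. The function name `loglog_slope` is hypothetical.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def loglog_slope(heights_m, weights_kg):
    """OLS slope of log(weight) on log(height): the exponent p in weight ~ height**p."""
    xs = [math.log(h) for h in heights_m]
    ys = [math.log(w) for w in weights_kg]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Constructed example: weights chosen so that BMI equals 22 at every height.
heights = [1.50, 1.60, 1.70, 1.80, 1.90]
weights = [22.0 * h ** 2 for h in heights]

slope = loglog_slope(heights, weights)
bmis = [bmi(w, h) for w, h in zip(weights, heights)]
print(slope)
print(bmis)
```

A slope near 2 in real data would support squaring height; a slope clearly different from 2 would suggest that BMI still varies systematically with height.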

NHANES data
National Health and Nutrition Examination Survey (NHANES)
  – edition
Publicly available version can be downloaded from the National Center for Health Statistics
  – Includes anthropometric measurements
  – Plus lots of other covariates

Comparing SAS and STATA
Pro:
  – STATA is easier to learn and at least as powerful
  – STATA is substantially cheaper
  – STATA tends to be faster
  – STATA has better help facilities
Con:
  – “Live” data management and report generation are easier with SAS
  – Simple analyses with datasets larger than memory are possible with SAS