Adrián de la Garza Jeremy Green 27 March 2009


Intermediate STATA

0. Introduction

Getting Help
* STATA Help: just type help in STATA's main Command window.
* STATA listserv: http://www.stata.com/statalist/
* UCLA Stat Computing: http://www.ats.ucla.edu/stat/stata/
* Yale StatLab consultants, online help and FAQs: http://statlab.stat.yale.edu/help/
* Manuals are also available at SSL and Yale StatLab.

Today's Workshop
1. Programming/Project Management Tips
2. Data Management
3. Analyzing Data
   - Graphs
   - Statistical Analysis
Latest version: STATA v. 10. Commands throughout this presentation refer to this version, although most are backwards-compatible.

1. Programming/Project Management Tips

Using DO files (1/2)
* DO files let you run a whole program at once or only selected portions of it.
* AVOID making changes to your original data interactively in the STATA Command window. Use DO files instead.
* Use DO files to make changes to your data and to run your statistical and graphical analyses.
* Keep track of your progress.

Using DO files (2/2)
* Keep your DO files organized: it helps to create a main DO file from which you run other DO files that perform smaller tasks on your data.
* Write lots of comments in your DO files to record what each command or section does. This will help you remember what you did months ago.
* To open a DO file, use the FILE menu or the DO-file button.
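A main DO file of the kind described above might look like the following sketch. The file names (main_log, clean_data.do, make_graphs.do, run_models.do) are hypothetical placeholders, not files from this workshop.

```stata
* main.do: hypothetical master DO file (all file names are placeholders)
version 10                  // interpret commands as Stata 10 would
clear
set more off                // do not pause output at each screenful

capture log close           // close any log left open by an earlier run
log using main_log, replace text

do clean_data.do            // hypothetical: builds the analysis dataset
do make_graphs.do           // hypothetical: draws the graphs
do run_models.do            // hypothetical: estimates the models

log close
```

Running everything from one file keeps the order of the steps explicit, and the version line helps the program behave the same way under later releases.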

Log files
Syntax
Open log file:
    log using filename [, append replace [text|smcl] name(logname)]
Close log, temporarily suspend logging, or resume logging:
    log {close|off|on} [logname]
Examples
    . log using mylog
    . log close
    . log using mylog, append
    . log using "filename containing spaces"

2. Data Management

Managing Your Data
* Back up all master data files (CD, USB drive, network)
* Keep a detailed codebook
   - Describes each variable and its values
   - Adding variables, cases, computing new variables
* Keep a roadmap
   - Keep a log of all analyses with what you have done
   - Save syntax files

Inspecting Your Data (1/3)
    cd "C:\Documents and Settings\Adrian\My Documents\stata files"
    clear
    set mem 80m
    log using "C:\Documents and Settings\Adrian\My Documents\stata files\logs\mylog"
    sysuse census
    browse
    list state region pop if _n <= 3        /* shows first 3 obs */
    l state region pop if _N - _n <= 2      /* shows last 3 obs */
    l state region pop in 1/3               /* shows first 3 obs */
    l state region pop in -3/l              /* shows last 3 obs */

Inspecting Your Data (2/3)
    generate agesq = medage^2       /* creates variable equal to medage squared */
    sum pop                         /* shows summary stats for pop */
    scalar popmean = r(mean)        /* saves mean of pop to scalar popmean */
    /* create variable equal to 1 when pop > popmean and 0 otherwise */
    g dummy = 0
    replace dummy = 1 if pop > popmean
    /* how many states have population higher than average? */
    count if dummy == 1
    /* how many states NOT IN THE SOUTH have pop > popmean? */
    count if dummy == 1 & region != 3

Inspecting Your Data (3/3)
    describe
    label list                  /* shows all labels attached to dataset */
    label list cenreg           /* shows label cenreg attached to variable region */
    sum pop
    browse
    /* summarize population by region */
    sum pop if region == "NE"   /* this gives an error since region is not a string */
    sum pop if region == 1      /* this does work */

Calculate mean population by region
Method 1
    sum pop if region == 1
    sum pop if region == 2
    sum pop if region == 3
    sum pop if region == 4
Downside: We have to type the sum command for each individual region. If the dataset contained population data by city and we had to compute means for each of the 50 states, typing the sum command 50 times would be very painful!

Calculate mean population by region
Method 2
    bysort region: sum pop
Downside: This method shows the population means by region, as we wanted, but it also shows a bunch of other stats we may not care about. Also, the means are not readily available if we want to use them in further calculations.

Calculate mean population by region
Method 3
    table region, c(mean pop)
This method is great for presentation purposes: it shows exactly the information we want.
Downside: The information is still not readily available if we want to store the population means by region for further analyses.

Calculate mean population by region
Method 4
    sysuse census, clear
    collapse (mean) pop, by(region)
The collapse command replaces the dataset in memory with a dataset of means, standard deviations, or other summary stats. In our case, the new dataset contains population means by region.
Downside: All variables other than the collapsed variable (pop) and the grouping variable (region) are NOT collapsed and hence disappear from the dataset. Can we make any further analyses without the rest of the variables?

Calculate mean population by region
Method 5
    sysuse census, clear
    by region, sort: egen meanpop = mean(pop)
Downside: Do we really want an additional variable in the dataset that contains information on population means by region, a number that is repeated for each observation (state) within the same region? In very large datasets, one additional variable may lead to memory constraints. Use scalars?
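One possible way to follow up on the "use scalars" idea is to loop over the region codes and store each mean in its own scalar. This is only a sketch: the scalar names meanpop1 to meanpop4 are made up for illustration, and the loop assumes region takes the values 1 to 4, as it does in the census dataset.

```stata
sysuse census, clear

* store the mean of pop for each region in a separate scalar
forvalues r = 1/4 {
    quietly summarize pop if region == `r'
    scalar meanpop`r' = r(mean)     // hypothetical names: meanpop1 ... meanpop4
}

scalar list                         // show all scalars currently defined
display meanpop3                    // mean population of region 3 (South)
```

The dataset itself is left unchanged, and the four means stay available for later calculations in the same session.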

Reshaping Data
    sysuse bplong, clear
    br
Suppose we want to take the difference in bp before and after treatment. The difference is difficult to calculate if the data are organized in long format, so we convert to wide format.
    reshape wide bp, i(patient sex agegrp) j(when)
    g bpdiff = bp2 - bp1
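For comparison, the same difference can be computed without leaving long format. The sketch below relies on subscripts being interpreted within each by-group, so bp[1] and bp[2] are the before and after readings of one patient; it assumes every patient has exactly two rows, with when coded 1 for before and 2 for after.

```stata
sysuse bplong, clear

* sort rows by patient and, within patient, by when (1 = before, 2 = after)
* under by, bp[1] and bp[2] refer to rows within the current patient
bysort patient (when): generate bpdiff = bp[2] - bp[1]

list patient when bp bpdiff in 1/4
```

Here bpdiff is repeated on both rows of each patient, which may or may not be convenient; reshaping to wide, as on the slide, gives one row per patient instead.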

Value Labels (1/2)
    g gender = sex
    br
Why do gender and sex look different? Because of value labels.
Why use value labels?
* They save space (e.g., "0" instead of "male" for each obs.)
* They are more informative to the researcher (e.g., what region is 3?)
* Regressions, lists, tables… display labels instead of values
    table sex, c(mean bp1 mean bp2)
    table gender, c(mean bp1 mean bp2)

Value Labels (2/2)
    label value gender sex      /* note that sex refers to label, not var */
    br patient sex gender
    label value gender          /* detaches sex label from gender variable */
    br pat sex gend
    label define genderlbl 0 "man" 1 "woman"
    label value gender genderlbl
What do the following commands do?
    label define genderlbl 2 "na", add
    label define genderlbl 0 "Man" 1 "Woman" 2 "NA", modify

Dummy Variables (1/3)
Suppose we want to create dummy variables for each of the 4 regions in the census dataset:
    g dum1 = 0
    replace dum1 = 1 if region == 1
    …
What problems may these commands lead to?

Dummy Variables (2/3)
To create four dummies, we need to type those two commands four times. More importantly, the previous method generates 0s even where region is missing.
    tab region, g(d)
This second method tabulates the variable region, showing a list of the four regions, and creates 4 separate dummies that are left missing wherever region is missing.
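If a single hand-made dummy is still wanted, the missing-value problem can be avoided by generating it only where region is observed. A minimal sketch follows; the names dum1 and d1 to d4 are just the ones used for illustration on these slides.

```stata
sysuse census, clear

* 1 if region == 1, 0 for other regions, missing where region is missing
generate dum1 = (region == 1) if !missing(region)

* the tabulate approach creates d1 ... d4 in one step
tabulate region, generate(d)

* the two approaches should agree for region 1
assert dum1 == d1
```

The census dataset has no missing regions, so the difference between the two-command method and this one only shows up in data with gaps.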

Dummy Variables (3/3)
One more command that will be useful in regressions:
    xi i.region, noomit
This third alternative yields the same results as the tab method described in the previous slide.

Merging Data (1/4)
    sysuse census, clear
    keep state-popurban
    sort state                          /* both master and using data must be sorted */
    save census1, replace
    sysuse census, clear                /* reload: medage-divorce were dropped by the keep above */
    keep state region medage-divorce    /* note region is kept in both */
    sort state
    save census2, replace
    use census1, clear
    merge state using census2           /* remember: both files must be sorted */
    table _merge                        /* _merge keeps track of how good merge was */

Merging Data (2/4)
Important! If a non-merging variable (e.g. region) is in both files, the data in the master file are kept, while the data in the using file are lost.
    use census1, clear
    l state region in 1
    replace region = 2 in 1
    sort state
    merge state using census2
    table _merge
    l state region in 1         /* region data in master file is kept */

Merging Data (3/4)
Now suppose that each of the two datasets contains information about only SOME (non-overlapping) of the 50 states. Do we lose information after merging the two datasets?
    use census2, clear
    drop in 3/6
    sort state
    save, replace
    use census1, clear
    drop in 22/23
    merge state using census2
    table _merge
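One way to explore the question is to list the unmatched observations after the merge. The sketch below rebuilds two small partial files so that it runs on its own; part1 and part2 are hypothetical file names, and the merge uses the Stata 10 syntax of this presentation. In that syntax _merge equals 1 for observations found only in the master file, 2 for those found only in the using file, and 3 for those found in both.

```stata
* build two partial files with different states removed
sysuse census, clear
keep state pop
drop in 22/23
sort state
save part1, replace             // hypothetical file name

sysuse census, clear
keep state medage
drop in 3/6
sort state
save part2, replace             // hypothetical file name

* merge and inspect the states that did not match
use part1, clear
merge state using part2
tabulate _merge
list state pop medage _merge if _merge != 3
```

The unmatched states stay in the merged dataset, with missing values for the variables that came from the file in which they do not appear.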

Merging Data (4/4)
Finally, it's important to note that, if a variable has value labels attached in both datasets, the labels attached to variables in the master dataset prevail. This may cause serious trouble, for example, when we are merging datasets from surveys taken in different years and for which the possible values in the answers may mean different things.
* Example 1: Change in scale (1 to 4 in 1980; 1 to 5 in 1990).
* Example 2: Omitted country in second survey, but all countries, sorted in alphabetical order, are assigned consecutive values.

Other Data Management Issues
* Use Stat/Transfer software to convert Excel, SAS, SPSS, … files into STATA format.
* Use the compress command to make your dataset as small as possible and use less memory.
* Some very large datasets won't open in STATA due to STATA's memory limitations. In this case, open only a subset of the dataset, leaving out the variables/observations that don't interest you, and try again:
    use varlist using filename
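The compress and partial-load steps above could be tried on a small file as follows. This is a sketch only: census_small is a hypothetical file name, and the population cutoff is arbitrary.

```stata
sysuse census, clear
compress                        // store each variable in the smallest type that fits
save census_small, replace      // hypothetical file name

* look at the variable list without loading the data
describe using census_small

* load only two variables
use state pop using census_small, clear

* load two variables for a subset of observations only
use state pop if pop > 5000000 using census_small, clear
```

With a file that is truly too large to open, describe using is a convenient first step, since it shows the variable names needed for the varlist.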

3. Analyzing Data

Analyzing Data: Make a List
* Dependent variable(s) (response, outcome, criterion)
* Independent variables (explanatory or predictor variables)
* Treatment variable
* Covariates / confounding variables
* Categorical and continuous variables
* Remember: types of variables determine the statistics we use
* Time period
* Scope and type of analysis

Analyzing Data: Graphs (1/2)
Draw a histogram:
    sysuse auto, clear
    histogram price
Create a scatter plot:
    scatter price mpg
Draw line of best fit (linear regression):
    twoway lfit price mpg
Put two graphs together:
    twoway scatter price mpg || lfit price mpg

Analyzing Data: Graphs (2/2)
Type help graph to:
* create other graphs (pie and bar charts, box plots, etc.);
* adjust graph settings (change labels, axes, colors…)
An easier (although less customizable) option is to use the GRAPH menu.

Analyzing Data: Statistical Analysis (1/2)
* Correlation: quantify relationships between variables
* Regression: predict dependent variable from independent variable(s)
* Group differences: t-test & ANOVA
* Chi-square for categorical and frequency data
* Significance v. effect size
* More complex models
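As a rough illustration of the group-difference and chi-square items, the commands below run each test on the auto dataset that ships with STATA. The choice of variables is only an assumption made for the example, not an analysis from this workshop.

```stata
sysuse auto, clear

* t-test: does mean price differ between domestic and foreign cars?
ttest price, by(foreign)

* one-way ANOVA: does mean price differ across repair-record categories?
oneway price rep78, tabulate

* chi-square test of association between two categorical variables
tabulate rep78 foreign, chi2
```

Some cells of the rep78 by foreign table are small, so the chi-square result here is for demonstrating syntax rather than for inference.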

Analyzing Data: Statistical Analysis (2/2)
cor var1 var2 gives the basic (Pearson) correlation between two variables.
    cor price mpg
regress var1 var2 regresses var1 on var2: it estimates how much var1 changes, on average, with a one-unit change in var2.
    reg price mpg
Useful textbook for more on stats for social sciences: Agresti, Alan, and Barbara Finlay (2008): Statistical Methods for the Social Sciences, Prentice Hall, 4th edition.
Textbook examples with STATA: http://www.ats.ucla.edu/stat/examples/smss/
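After a regression, the estimates stay available for further calculations. The following sketch shows a few common follow-up steps; price_hat and resid are hypothetical names for the new variables.

```stata
sysuse auto, clear
regress price mpg

display _b[mpg]                 // estimated slope on mpg
display _b[_cons]               // estimated intercept
display e(r2)                   // R-squared of the regression

predict price_hat               // fitted values (linear prediction)
predict resid, residuals        // residuals: price minus fitted value
summarize price price_hat resid
```

The stored results are overwritten by the next estimation command, so copy anything needed later into a scalar or variable first.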

Thank you!!