Stata Introduction
Sociology 229A, Class 2
Copyright © 2008 by Evan Schofer
Do not copy or distribute without permission
Stata Intro
Stata is awesome because:
It is powerful
  –It does nearly everything that you’d need
  –And, it is being improved all the time
It is fast
It is extensible
  –People can write “add-ons” that you can download
Programming is simpler/cleaner than SPSS or SAS
  –Though it has its quirks…
It is a common standard… datasets are increasingly available in Stata format
Stata Quirks
Stata is case sensitive
  Variable “age” is not the same as “AGE”
Stata syntax is elegant, but potentially cryptic
  Math operators are reminiscent of a computer language
  –They work great, but there is a learning curve
  Made worse by the fact that you can abbreviate most Stata commands
  –“generate” becomes “gen”, “summarize” becomes “sum”
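A minimal sketch of both quirks. It assumes Stata's bundled example dataset (auto.dta, loaded with sysuse), which is not part of the lecture; the variables price2 and price3 are made up for illustration.

```stata
* Load the example data that ships with Stata (an assumption, not the class data)
sysuse auto, clear

* Variable names are case sensitive: "price" exists, "PRICE" does not
capture confirm variable PRICE
* A nonzero return code means Stata found no variable named PRICE
display _rc

* The full command name and its abbreviation do the same thing
summarize price
sum price

* Same for generate and gen (price2 and price3 are hypothetical names)
generate price2 = price * 2
gen price3 = price * 3
```

Abbreviations save typing, but spelling commands out in a do-file can make it easier for collaborators to read.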
Stata Advice
Learn to write programs the old way…
  What Stata calls “do” files
  –Open up the “do file” editor and write commands
  –And, save the commands into a “do file”
  –Run commands by selecting them and clicking “do”
Don’t use “point-and-click” menus to do analyses
  –Why? When doing research, you need a clear record
  –E.g., of how you coded all variables
  –Because you may collaborate, or need to review things later
You need to be able to easily change your decisions
  –Re-code variables, etc…
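A do-file along these lines might look like the sketch below. The file names (analysis.do, analysis.log), the bundled auto.dta example data, the variable "expensive", and the 6000 cutoff are all assumptions chosen for illustration, not the homework file.

```stata
* analysis.do (hypothetical file name): a minimal do-file sketch
clear all
capture log close
* Keep a plain-text record of everything that is run
log using "analysis.log", replace text

* Example data bundled with Stata (an assumption, not the class data)
sysuse auto, clear

* Every coding decision is written down, so it can be changed and re-run
generate expensive = 0
replace expensive = 1 if price > 6000 & !missing(price)

tabulate expensive
regress mpg weight expensive

log close
```

To change the coding decision, edit the cutoff in the replace line and run the whole file again with "do analysis.do"; the log then documents exactly what was done.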
Stata Basics
use – open a Stata data file
    use "C:\Users\schofer\Documents\GSS2006subset.dta"
generate – compute a new variable
    gen x=0
    gen logGDPPerCapita = ln(GDP/Population)
  Note: gen will not over-write data
  –This is to prevent errors…
  –Instead, use “replace”
    replace x=5
    replace male = 1 if gender == 2
egen – extensions to “gen”
  Fancy computations like computing variable means, counts, etc.
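The gen / replace / egen distinction can be tried out with a short sketch. It assumes Stata's bundled auto.dta rather than the GSS file, and the new variable names (logprice, heavy, meanprice, n_by_origin) are hypothetical.

```stata
* Example data bundled with Stata (an assumption, not the GSS subset)
sysuse auto, clear

* generate creates a new variable
generate logprice = ln(price)

* generate refuses to overwrite an existing variable;
* capture traps the error so the do-file keeps running
capture generate logprice = 0
display _rc

* replace changes an existing variable, here only for some cases
generate heavy = 0
replace heavy = 1 if weight > 3000 & !missing(weight)

* egen computes across observations: an overall mean and a per-group count
egen meanprice = mean(price)
egen n_by_origin = count(price), by(foreign)
```

The !missing() check matters because Stata treats missing values as larger than any number, so "weight > 3000" alone would also be true for cases with missing weight.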
Stata Basics
If/then commands: Stata is the opposite of SPSS
    SPSS: If x=1 Y=2
    Stata: replace y=2 if x==1
Stata is clearer in 2 ways:
  –By specifying the computation first
  –By using double-equals (==) to indicate comparison
  –Single-equals assigns a value, double-equals compares…
Note: Stata accepts an “if” qualifier on most commands
  Including in a regression
    reg salary jobhrs educ if male == 1
  –The “if” goes before any comma; only options follow the comma
  –Conduct an analysis looking only at males
  –Easier than the cumbersome “select” or “filter” in SPSS
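A short sketch of the “if” qualifier on several commands, again assuming Stata's bundled auto.dta; the variable "cheap" and the 5000 cutoff are made up for illustration.

```stata
* Example data bundled with Stata (an assumption, not the class data)
sysuse auto, clear

* Single = assigns a value; double == tests a condition
generate cheap = 0
replace cheap = 1 if price < 5000

* The if qualifier follows the variable list
summarize mpg if foreign == 1
regress mpg weight if foreign == 1

* When a command has options, if comes before the comma and options after it
regress mpg weight if foreign == 1, robust
```

Each of these commands uses only the foreign cars, with no separate “select cases” step to switch on and off.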
Stata Basics
sum – “summarize” = descriptive statistics
  Mean, SD, etc.
    sum age educ maritalstatus
tab – “tabulate” = frequency table or crosstab
  Specify 2 variables to make a crosstab
    tab educ
    tab sex maritalstatus
reg – “regress” = linear (OLS) regression with one or more independent variables
    reg depvar indep1 indep2 indep3
    reg salary jobhrs educ
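The three commands can be run end to end on Stata's bundled auto.dta. This is a sketch under that assumption, so the variables differ from the GSS examples above.

```stata
* Example data bundled with Stata (an assumption, not the GSS subset)
sysuse auto, clear

* Descriptive statistics: N, mean, SD, min, max
summarize price mpg weight

* One variable gives a frequency table; two give a crosstab
tabulate rep78
tabulate foreign rep78

* Dependent variable first, then the independent variables
regress price mpg weight
```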
Stata Basics Let’s review a simple do-file together From the homework assignment…