Introduction to STATA Before you get frustrated, imagine processing data by hand and think dearly of STATA.
Workshop Outline Downloading STATA How STATA thinks? Using commands Importing data from Excel Tracking your work Do files Logs Generating New Variables Running OLS regressions Drawing a scatterplot with line of best fit Regression Tests Manipulating your data Copying results over to Word Saving your data and work
Thinking in STATA STATA is a model for working with data, similar to a word processor. You work with a copy of your data that is loaded into memory; the copy on disk does not change unless you explicitly replace the file. STATA is connected both to the web and to your folders. STATA uses commands. STATA can save several different file types:
.do files: text files with your commands, for future reference and editing
.log files: text files with your output, for future reference and printing
.dta files: data files in Stata format
.gph files: graph files in Stata format
.ado files: programs in Stata
Command Window Data Summary Command Summary Command Results, main place to monitor your work
Commands
Syntax: command varlist [if exp] [in range]
varlist: list of variables
in range: observation numbers written as first#/last#, e.g. 1/10
if exp: a qualifier such as >5 (greater than five) or ==20 (equal to twenty)
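The syntax above can be tried out with a short sketch. It assumes the example data set auto that ships with Stata (loaded with sysuse auto); the variables price and mpg come from that data set, and your own data may use different names.

```stata
* Load the example data set that ships with Stata (an assumption for this sketch)
sysuse auto, clear

* "in range": list two variables for observations 1 through 10 only
list make price in 1/10

* "if exp": summarize price only for cars with mpg greater than 25
summarize price if mpg > 25

* Count the observations that satisfy a condition
count if mpg > 25
```

The if qualifier selects observations by value and the in qualifier selects them by position, so the two can be combined in a single command.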
Most Common Commands
Category: Stata Commands
Getting online help: search, findit, help
Operating system interface: pwd, cd, sysdir, mkdir, rmdir, dir, erase, copy, type
Using and saving data from disk: use, save, append, merge, compress
Inputting data into Stata: input, edit, infile, infix, insheet
The Internet and updating Stata: update, net, ado, news
Basic data reporting: describe, codebook, list, browse, count, inspect, summarize, table, tabulate
Data manipulation: generate, replace, egen, rename, drop, keep, sort, encode, decode, order, by, reshape, collapse
Formatting: format, label
Keeping track of your work: log, notes
Convenience: display
Getting Help Stata will provide information when an error occurs Just click on the blue error message to get more information A viewer will pop up with a reason for the error Search To search for the appropriate command type “help” into your command window. Still cannot find your answer… use Google Forums Blogs Electronic manuals
Working with Directories Stata is interactively connected to your folders. You can directly pull or save files from anywhere on your computer.
pwd: tells you what directory you are currently working in
use filename: opens a Stata data file saved in that directory
save filename: saves a file in Stata format
save filename, replace: overwrites the existing data set
mkdir: makes a new directory (a new folder)
cd: changes your directory
You can get to my directory by typing “cd C:\users\cbenson\workshops”
* IN GENERAL, DO NOT SAVE IN THE STATA DIRECTORY. Save your work files elsewhere, like your H drive.
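A minimal sketch of the directory commands above. The folder name workshop_files and the file name mycars are hypothetical, and the example data set auto is assumed to stand in for your own data.

```stata
* Show the current working directory
pwd

* Make a new folder (capture ignores the error if it already exists)
capture mkdir workshop_files

* Move into the new folder
cd workshop_files

* Load the example data and save it in Stata format
sysuse auto, clear
save mycars, replace

* Re-open the saved file from the current directory
use mycars, clear
```

The replace option is what allows save to overwrite an existing file; without it, Stata stops with an error if mycars.dta is already there.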
Importing Data from Excel Copy and paste In Excel, copy your full data set Open your data editor by clicking “data” then “data editor” Click on the first cell, and then “paste” Use first row as “variable names” Save as a “.dta” file
Clearing Data: “clear” removes any data that you are working on from memory. Unless you have saved the data, none of the changes you made will affect the data set on disk. This is important to do before you import new data. Dictionaries: you can specify how you want to import data (search “dictionaries” to learn more).
Tracking your work Logs: keep track of all your commands and results. Do Files: keep your commands and allow you to re-execute your work.
Logs Saves your results window Create a log by clicking on the notebook (no pencil), or by typing “log using filename” this will save in the current directory. Suspend a log by typing “log off” Re-open a log by typing “log on” Close a log by typing “log close” Add to a closed log by typing “log using filename, append”
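The log commands above fit together roughly as follows. This is a sketch only: the file name workshop_log is hypothetical, the text option (a plain-text .log file) is a choice made for this example, and the auto data set is assumed.

```stata
* Close any log that is already open (ignored if none is open)
capture log close

* Start a plain-text log in the current directory
log using workshop_log, text replace

sysuse auto, clear
summarize price

* Suspend the log: the next command is not recorded
log off
describe

* Resume and then close the log
log on
log close

* Add more output to the closed log
log using workshop_log, text append
tabulate foreign
log close
```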
Do Files You’ll want do files for your thesis and class assignments! Do files allow you to keep your commands so that you can re-run your work at a later date. They are very helpful for generating new variables, multi-step data manipulation, and tedious repetitive commands. To start a do-file, click on the notebook with a pencil button, or go to “Window > Do-file Editor > New Do-file”.
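As an illustration, a very small do-file might look like the sketch below. The contents are hypothetical and assume the auto example data; saved as, say, workshop.do, it could be re-run at any time by typing do workshop.

```stata
* workshop.do: a hypothetical do-file that can be re-run from start to finish
clear
sysuse auto

* Generate a new variable
generate lnprice = ln(price)

* Run a regression using the new variable
regress lnprice mpg weight
```

Starting the file with clear means it runs the same way regardless of what data happened to be in memory beforehand.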
Data Reporting
describe: basic information on variables
summarize: basic descriptive statistics
codebook: descriptive statistics, lots of information
list: shows the data in spreadsheet form
label: creates variable labels and value labels
table: frequency table
tabulate: table of frequencies
inspect: displays a simple summary of the data’s attributes
count: counts observations satisfying specified conditions
q: stops STATA in whatever it is running
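A quick tour of the reporting commands, sketched on the auto example data (an assumption; substitute your own variable names).

```stata
sysuse auto, clear

* Basic information on every variable
describe

* Descriptive statistics for selected variables
summarize price mpg weight

* Detailed information on one variable
codebook foreign

* The first five observations in spreadsheet form
list make price mpg in 1/5

* Frequency table and a quick summary of one variable
tabulate foreign
inspect mpg

* Number of cars priced above 10,000
count if price > 10000
```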
Generating New Variables To generate a new variable go to “Data > Create or change data > Create new variable”. You’ll get a screen like the one on the side. Type in the expression that you want to generate. Alternatively, you could type the command “generate newvarname = expression”.
Exercise 1 1. Generate a variable named lnprice = ln(price) 2. Generate a variable that is an indicator variable for domestic cars (there are additional ways to go about this, I’ve included one below): generate domestic=0, then replace domestic=1 if foreign=="Domestic" 3. generate fuelefficient=1 if mpg>25. Remember that Stata commands are case sensitive and must be typed in lowercase.
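One possible solution to Exercise 1 is sketched below. It assumes the auto data set as shipped with Stata, where foreign is a numeric variable (0 = Domestic, 1 = Foreign) with value labels, so the comparison uses foreign == 0. If your copy was pasted in from Excel and foreign is text, the condition foreign == "Domestic" applies instead.

```stata
sysuse auto, clear

* 1. Natural log of price
generate lnprice = ln(price)

* 2. Indicator for domestic cars (foreign is numeric 0/1 in the shipped data)
generate domestic = 0
replace domestic = 1 if foreign == 0

* 3. Indicator for fuel-efficient cars; cars at or below 25 mpg are left missing
generate fuelefficient = 1 if mpg > 25 & !missing(mpg)

* Check the results
tabulate domestic
tabulate fuelefficient, missing
```

The !missing(mpg) guard matters in general because Stata treats missing values as larger than any number, so mpg > 25 alone would also be true for a missing mpg.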
A Scatterplot with Best Fit Line For the scatterplot only, type: graph twoway scatter price weight For the best fit line only, type: graph twoway lfit price weight To draw a scatterplot with best fit line, type: graph twoway (lfit price weight) (scatter price weight) Remember: the dependent variable goes on the “y” axis and the independent variable on the “x” axis. In the command, list the dependent (y) variable first and the independent (x) variable second.
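The combined graph can be sketched as follows, again assuming the auto example data. The graph name fitplot and the axis titles are illustrative choices, not part of the original slides.

```stata
sysuse auto, clear

* price is the dependent (y) variable, weight the independent (x) variable
graph twoway (scatter price weight) (lfit price weight), ytitle("Price") xtitle("Weight") name(fitplot, replace)
```

Each pair of parentheses holds one plot, and Stata draws them on the same axes in the order listed.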
Exercise 2 Draw a scatterplot with best fit line
A Scatterplot with Best Fit Line and Confidence Interval Confidence interval: a range of values so defined that there is a specified probability that the value of a parameter lies within it. Scatterplot with CI: Calculates the prediction for yvar from a linear regression of yvar on xvar and plots the resulting line, along with a confidence interval Type twoway lfitci price weight
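Note that twoway lfitci on its own draws the fitted line and its confidence band but not the data points. A sketch of one way to overlay the scatterplot, assuming the auto example data (the graph name fitci is hypothetical):

```stata
sysuse auto, clear

* Fitted line with confidence interval, plus the scatterplot on top
twoway (lfitci price weight) (scatter price weight), name(fitci, replace)
```

Listing lfitci first keeps the shaded band behind the points so they stay visible.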
Exercise 3 Draw a scatterplot with best fit line and confidence interval
Running OLS Regressions To run a basic OLS regression, go to “Statistics > Linear models and related > Linear regression”. You’ll end up with a window like the one on the right. Insert your dependent variable and independent variables from the two drop-down menus. Alternatively, you can also type: “regress dependent_variable independent_variable_names”
OLS Continued: The shortcut (ish) Using your command window: regress depvar indepvars [if] [in] [weight] [, options]
Exercise 4 Run a model using several variables in your data set. Example: “regress price mpg headroom trunk weight”
Econometric Tests and Corrections Heteroskedasticity Normality Multicollinearity and high correlation Serial correlation/autocorrelation
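A few variations on the regress syntax, sketched on the auto example data (an assumption; the variable choices are only for illustration).

```stata
sysuse auto, clear

* Basic regression: price on several independent variables
regress price mpg headroom trunk weight

* Same idea restricted with "if": domestic cars only
regress price mpg weight if foreign == 0

* Restricted with "in": the first 50 observations only
regress price mpg weight in 1/50
```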
Testing for Heteroskedasticity (1) The null hypothesis is that the error terms have constant variance (homoskedasticity). If you do have heteroskedasticity, your standard errors are not reliable. To test for heteroskedasticity: directly after your regression, use the command “imtest, white” (“estat imtest, white” in newer versions of Stata), which shows the White test for heteroskedasticity.
Correcting Heteroskedasticity If you find that you have heteroskedasticity (your p-value is small, for example less than 0.1, so you reject the null hypothesis of constant variance), then you can run your regression with robust standard errors: regress price mpg headroom trunk, robust
Testing for Heteroskedasticity (2) You can also look at the residuals of your regression to see if their spread is not constant. Commands -- predict resid, r creates the residuals and saves them as “resid” -- scatter resid dependent_variable graphs the residuals against the dependent variable (older versions of Stata used “plot”) -- rvfplot graphs the residuals against the fitted values
Test for Skewness of Residuals Run a Skewness/Kurtosis test to check whether the residuals are normally distributed -- predict resid, r -- sktest resid calculates the skewness/kurtosis test for normality
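The test-then-correct sequence might look like the sketch below, assuming the auto example data and the model from the slide. It uses estat imtest, white, the current form of the White test command.

```stata
sysuse auto, clear

* Ordinary regression
regress price mpg headroom trunk

* White test: a small p-value rejects the null of constant error variance
estat imtest, white

* Re-run with heteroskedasticity-robust standard errors
regress price mpg headroom trunk, robust
```

The coefficients are identical in both regressions; only the standard errors, t statistics, and confidence intervals change.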
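A sketch of the residual checks, assuming the auto example data. The option residuals is spelled out in full here, and rvfplot is used as one common way to eyeball the spread of the residuals.

```stata
sysuse auto, clear
regress price mpg headroom trunk

* Save the residuals in a new variable called resid
predict resid, residuals

* Residuals against fitted values: look for a fan or funnel shape
rvfplot, yline(0)

* Skewness/kurtosis test: the null hypothesis is that resid is normally distributed
sktest resid
```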
Detecting Multicollinearity To check if you have multicollinearity, you will run a correlation matrix and see if you have a high rho between two variables. correl varlist runs a correlation matrix of all the variables specified Typically rhos greater than 0.6 should be looked at with caution.
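For instance, a correlation matrix for a few candidate regressors can be produced as sketched below (auto example data assumed).

```stata
sysuse auto, clear

* Correlation matrix of the independent variables under consideration
correlate mpg headroom trunk weight
```

In this data set mpg and weight are strongly negatively correlated, so by the rule of thumb above that pair deserves a closer look before both go into the same model.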
Detecting Serial Correlation Autocorrelation is common in time-series data sets. To test for serial correlation you want to use a Durbin-Watson test. For the Durbin-Watson test you need to time-set your data. -- tsset time_variable tells Stata your data is a time series (for panel data, use xtset panel_variable time_variable) -- dwstat finds the Durbin-Watson statistic after a regression (“estat dwatson” in newer versions of Stata)
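The auto data is not a time series, so this sketch assumes a different example data set that ships with Stata, uslifeexp (annual U.S. life expectancy, with variables year and le). The regression itself is only there to have something to test.

```stata
* Annual time-series example data (an assumption for this sketch)
sysuse uslifeexp, clear

* Tell Stata that year is the time variable
tsset year

* A simple regression of life expectancy on a time trend
regress le year

* Durbin-Watson statistic (the newer name for dwstat)
estat dwatson
```

A Durbin-Watson statistic near 2 suggests little first-order serial correlation, while values toward 0 or 4 suggest positive or negative serial correlation respectively.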
Other Data Manipulation
rename: rename a variable -- rename old_name new_name
drop: delete a variable or observations
keep: keep a variable or observations
replace: replace the contents of an existing variable
sort: sort observations in ascending order of a variable
gsort: sort observations in ascending or descending order
encode: change a string variable to numeric
decode: change a numeric variable to a string
by: runs a command separately for each group of observations
mvdecode: changes occurrences of numlist to a missing value code
mvencode: changes missing values to specified numbers
Getting Help help command: command information. search keyword: searches all sources. search keyword, net: only searches the internet. findit keyword: searches unofficial sites as well. You can also google any problem you are having and you’ll likely pull up a Stata forum at stata.com
Neatly Putting Results into Word You want your results to be easily read in a Word document. The easiest and quickest way to copy your results into a Word document is to 1. Highlight the portion you want 2. Right click on the highlighted portion 3. Click copy as picture 4. Paste (Ctrl+V) into a Word document
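Several of these commands are sketched together below on the auto example data (an assumption). The new names repair_record, origin, and origin_code are hypothetical.

```stata
sysuse auto, clear

* Rename a variable, drop another, and keep only some observations
rename rep78 repair_record
drop trunk
keep if price < 10000

* decode: numeric variable with value labels -> string variable
decode foreign, generate(origin)

* encode: string variable -> numeric variable with value labels
encode origin, generate(origin_code)

* by: run a command separately for each group (the sort option sorts first)
by foreign, sort: summarize price

* gsort: sort observations in descending order of mpg
gsort -mpg
```

Note that drop and keep only change the copy of the data in memory; the file on disk is untouched until you save.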
Practice—copy as picture and paste You should end up with something that looks pretty—like this…
Saving your Data and Work To save your work, you want to close your work log. To save your data, you want to go to File, Save as, and name your .dta file. Please note that “saving” will only save the data, not your commands or log.
Conclusion This was a brief introduction to Stata. We covered the basics of opening stata, importing data, generating new variables, running a basic regression and discussed common problems and fixes, and saving your work in stata and word. The best advice for each of you is to go play around with STATA and have fun. If you need or want help, I’m happy to help you.
Questions? If you have additional questions at a later date, please stop by Palmer 118