Introduction to STATA Before you get frustrated, imagine processing data by hand and think dearly of STATA.


1 Introduction to STATA Before you get frustrated, imagine processing data by hand and think dearly of STATA.

2 Workshop Outline
Downloading STATA
How STATA thinks
Using commands
Importing data from Excel
Tracking your work: do files and logs
Generating new variables
Running OLS regressions
Drawing a scatterplot with line of best fit
Regression tests
Manipulating your data
Copying results over to Word
Saving your data and work

3 Thinking in STATA
STATA is a model for working with data, similar to a word processor: you work with a copy of your data that is loaded into memory, and the copy on disk does not change unless you explicitly replace the file.
STATA is connected to both the web and your folders.
STATA uses commands.
STATA can save several different file types:
.do files: text files with your commands, for future reference and editing
.log files: text files with your output, for future reference and printing
.dta files: data files in Stata format
.gph files: graph files in Stata format
.ado files: programs in Stata

4 Command Window; Data Summary; Command Summary; Command Results (the main place to monitor your work)

5 Commands
Syntax: command varlist [if exp] [in range]
varlist: a list of variables
if exp: an expression set with a qualifier, like >5 meaning greater than five, or ==20 meaning equal to twenty
in range: observation numbers written as beginning #/end #, for example 1/10
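
As a rough illustration of the syntax above, the sketch below applies the if and in qualifiers to the auto example dataset that ships with Stata. Using that dataset is an assumption; the slides do not name their data, although the variable names in later slides match it.

```stata
* Load the bundled example dataset (an assumption, see above)
sysuse auto, clear

* command varlist: list two variables for every observation
list price mpg

* in range: restrict the command to observations 1 through 10
list price mpg in 1/10

* if exp: restrict the command to observations with price above 5000
list price mpg if price > 5000

* Both qualifiers can be combined in one command
summarize price if mpg == 20 in 1/50
```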

6 Most Common Commands
Category: Stata commands
Getting online help: search, findit, help
Operating system interface: pwd, cd, sysdir, mkdir, rmdir, dir, erase, copy, type
Using and saving data from disk: use, save, append, merge, compress
Inputting data into Stata: input, edit, infile, infix, insheet
The Internet and updating Stata: update, net, ado, news
Basic data reporting: describe, codebook, list, browse, count, inspect, summarize, table, tabulate
Data manipulation: generate, replace, egen, rename, drop, keep, sort, encode, decode, order, by, reshape, collapse
Formatting: format, label
Keeping track of your work: log, notes
Convenience: display

7 Getting Help
Stata will provide information when an error occurs. Just click on the blue error message to get more information; a viewer will pop up with a reason for the error.
Search: to search for the appropriate command, type “help” into your command window.
Still cannot find your answer? Use Google, forums, blogs, and the electronic manuals.

8 Working with Directories
Stata is interactively connected to your folders. You can directly pull or save files from anywhere on your computer.
pwd: tells you what directory you are currently working in
use filename: opens any file saved in that directory
save filename: saves a file in Stata format
save filename, replace: overwrites the dataset
mkdir: makes a new directory (a new folder)
cd: changes your directory
You can get to my directory by typing “cd C:users\cbenson\workshops”
In general, DO NOT SAVE IN THE STATA DIRECTORY. Save your work files elsewhere, like your H drive.
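
The directory commands above might be combined as in the following sketch. The folder name stata_workshop and the file name auto_copy.dta are hypothetical, and the auto dataset is used only because it ships with Stata.

```stata
* Show the current working directory
pwd

* Create a folder for this workshop ("stata_workshop" is a hypothetical name).
* capture suppresses the error Stata raises if the folder already exists.
capture mkdir "stata_workshop"

* Move into the new folder
cd "stata_workshop"

* Load an example dataset and save a copy here in Stata format
sysuse auto, clear
save "auto_copy.dta", replace

* Re-open the saved file from the current directory
use "auto_copy.dta", clear
```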

9 Importing Data from Excel
Copy and paste:
1. In Excel, copy your full data set
2. Open your data editor by clicking “data” then “data editor”
3. Click on the first cell, and then “paste”
4. Use the first row as “variable names”
5. Save as a “.dta” file
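
Copy and paste is hard to reproduce later, so a typed alternative can be useful. The sketch below uses the import excel command, which is available in Stata 12 and later. It is not part of the original slides, and mydata.xlsx is a hypothetical file assumed to sit in the current working directory.

```stata
* Read the first worksheet of a hypothetical Excel file.
* firstrow treats the first row as variable names;
* clear drops whatever data is currently in memory.
import excel using "mydata.xlsx", firstrow clear

* Check what was imported, then save it in Stata format
describe
save "mydata.dta", replace
```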

10 Clearing Data
clear: removes any data that you might be working on from memory. Unless you have saved the data, none of the changes you made will affect the data set on disk. This is important to do before you import new data.
Dictionaries: you can specify how you want to import data (search “dictionaries” to learn more).

11 Tracking your work
Logs: keep track of all your commands and results.
Do files: keep your commands and allow you to re-execute your work.

12 Logs
A log saves your results window.
Create a log by clicking on the notebook (no pencil), or by typing “log using filename”. The log is saved in the current directory.
Suspend a log by typing “log off”
Re-open a log by typing “log on”
Close a log by typing “log close”
Add to a closed log by typing “log using filename, append”
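
A full log session built from the commands above could look like the sketch below. The log name session1 is hypothetical, and the text option (not mentioned on the slide) is added so the log is written as a plain-text .log file.

```stata
* Close any log left open from earlier work (capture ignores the error if none is open)
capture log close

* Start a new plain-text log; "session1" is a hypothetical name
log using "session1", text replace

sysuse auto, clear
summarize price

* Temporarily stop recording; the next command is not written to the log
log off
describe

* Resume recording
log on
tabulate foreign

log close

* Later: reopen the same log and add to the end of it
log using "session1", text append
count
log close
```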

13 Do Files
You’ll want do files for your thesis and class assignments! Do files allow you to keep your commands so that you can re-run your work at a later date. They are very helpful for generating new variables, multi-step data manipulation, and tedious repetitive commands.
To start a do-file, click on the notebook with a pencil button, or go to Window > Do-file Editor > New Do-file.
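
For orientation, a small do-file might be laid out as follows. This is only a sketch: the file name workshop.do, the log name, and the choice of the bundled auto dataset are assumptions for illustration.

```stata
* workshop.do (hypothetical file name); run it by typing:  do workshop.do

* Start from a clean state and make sure no log is already open
clear all
capture log close

* Record the output of this run in a plain-text log
log using "workshop", text replace

* Load data, create a variable, and run a regression
sysuse auto, clear
generate lnprice = ln(price)
regress lnprice mpg weight

* Close the log so the output is saved
log close
```

Because every step is written down, re-running the file reproduces the same variables and results from the original data.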

14 Data Reporting
describe: basic information on variables
summarize: basic descriptive statistics
codebook: descriptive statistics, lots of information
list: spreadsheet form
label: create variable labels and values
table: frequency table
q: stops STATA in whatever it is running
inspect: displays a simple summary of the data’s attributes
tabulate: table of frequencies
count: counts observations satisfying specified conditions
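
The reporting commands can be tried in sequence, as in this sketch. It assumes the auto example dataset bundled with Stata; the label text in the last command is invented for illustration.

```stata
sysuse auto, clear

* Variable names, storage types, and labels
describe

* Mean, standard deviation, minimum, and maximum
summarize price mpg weight

* Detailed information about one variable
codebook foreign

* First five observations in spreadsheet form
list make price mpg in 1/5

* Frequency table of a categorical variable
tabulate foreign

* Number of observations satisfying a condition
count if mpg > 25

* Quick summary of a variable's values, including missing ones
inspect rep78

* Attach a descriptive label to a variable (label text is an example)
label variable price "Price in US dollars"
```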

15 Generating New Variables
To generate a new variable, go to Data > Create or change data > New variable. A dialog window opens; type in the expression that you want to generate.
Alternatively, you could type the command “generate newvarname = expression”

16 Exercise 1
1. Generate a variable named lnprice = ln(price)
2. Generate a variable that is an indicator variable for domestic cars (there are additional ways to go about this, I’ve included one below). If foreign is stored as a number with value labels, where 0 is labeled “Domestic”, compare against the number rather than the label text:
generate domestic=0
replace domestic=1 if foreign==0
3. Generate fuelefficient=1 if mpg>25
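
One possible solution to this exercise is sketched below, assuming the bundled auto dataset. In that dataset foreign is a numeric variable (0 labeled Domestic, 1 labeled Foreign), so the comparison is made against 0 rather than against a string; comparing it with a string would stop with a type mismatch error.

```stata
sysuse auto, clear

* 1. Natural log of price
generate lnprice = ln(price)

* 2. Indicator for domestic cars: foreign == 0 means Domestic in auto.dta
generate domestic = 0
replace domestic = 1 if foreign == 0

* 3. Indicator for fuel-efficient cars, coded 1/0 instead of 1/missing.
*    The !missing() guard keeps the result missing when mpg is missing,
*    because Stata treats missing values as larger than any number.
generate fuelefficient = (mpg > 25) if !missing(mpg)

* Check the two new indicators against each other
tabulate domestic fuelefficient
```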

17 A Scatterplot with Best Fit Line
Scatterplot only: graph twoway scatter price weight
Best fit line only: graph twoway lfit price weight
Scatterplot with best fit line: graph twoway (lfit price weight) (scatter price weight)
Remember: the dependent variable goes on the “y” axis and the independent variable on the “x” axis. List the dependent variable first in the command, then the independent variable.
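
To see how variable order controls the axes, the two commands below draw the same data both ways. This is a sketch that assumes the bundled auto dataset.

```stata
sysuse auto, clear

* price is listed first, so it is the dependent variable on the y axis
twoway (scatter price weight) (lfit price weight)

* Reversing the order puts weight on the y axis and price on the x axis
twoway (scatter weight price) (lfit weight price)
```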

18 Exercise 2 Draw a scatterplot with best fit line

19 A Scatterplot with Best Fit Line and Confidence Interval Confidence interval: a range of values so defined that there is a specified probability that the value of a parameter lies within it. Scatterplot with CI: Calculates the prediction for yvar from a linear regression of yvar on xvar and plots the resulting line, along with a confidence interval Type twoway lfitci price weight
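
On its own, twoway lfitci draws the fitted line and its confidence band but not the data points. A sketch of overlaying the points, again assuming the bundled auto dataset:

```stata
sysuse auto, clear

* Fitted line with the default 95% confidence band, plus the data points on top
twoway (lfitci price weight) (scatter price weight)

* The same plot with a 90% confidence band
twoway (lfitci price weight, level(90)) (scatter price weight)
```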

20 Exercise 3 Draw a scatterplot with best fit line and confidence interval

21 Running OLS Regressions
To run a basic OLS regression, go to Statistics > Linear models and related > Linear regression. A dialog window opens. Insert your dependent variable and independent variables from the two drop-down menus.
Alternatively, you can also type: “regress dependent_variable independent_variable_names”

22 OLS Continued: The shortcut (ish)
Using your command window:
regress depvar indepvars [if] [in] [weight] [, options]
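
The optional parts of the regress syntax can be illustrated as follows. The sketch assumes the bundled auto dataset, and the particular conditions and options are chosen only as examples.

```stata
sysuse auto, clear

* Basic form: dependent variable first, then the independent variables
regress price mpg weight

* [if]: use only domestic cars (foreign == 0 in auto.dta)
regress price mpg weight if foreign == 0

* [in]: use only the first 50 observations
regress price mpg weight in 1/50

* [, options]: report 90% confidence intervals instead of 95%
regress price mpg weight, level(90)
```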

23 Exercise 4 Run a model using several variables in your data set. Example: “regress price mpg headroom trunk weight”

24 Econometric Tests and Corrections
Heteroskedasticity
Normality
Multicollinearity and high correlation
Serial correlation/autocorrelation

25 Testing for Heteroskedasticity (1)
The null hypothesis is that the error terms have constant variance (homoskedasticity).
If you do have heteroskedasticity, your standard errors are not reliable.
To test for heteroskedasticity, directly after your regression use the command imtest, white (written estat imtest, white in current versions of Stata), which shows the White test for heteroskedasticity.

26 Correcting Heteroskedasticity
If you find that you have heteroskedasticity (the test’s p-value is below your significance level, for example less than 0.1), then you can run your regression with robust standard errors.
regress price mpg headroom trunk, robust
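
The test-then-correct workflow might look like the sketch below. It assumes the bundled auto dataset and uses the estat form of the postestimation commands found in current Stata versions. The Breusch-Pagan test (estat hettest) is not mentioned on the slides and is included only as a second check. For both tests the null hypothesis is constant error variance, so a small p-value is evidence of heteroskedasticity.

```stata
sysuse auto, clear

* Fit the model with ordinary standard errors
regress price mpg headroom trunk

* White's test for heteroskedasticity
estat imtest, white

* Breusch-Pagan / Cook-Weisberg test as an additional check
estat hettest

* If the tests reject constant variance, refit with robust standard errors.
* vce(robust) is the current spelling of the robust option.
regress price mpg headroom trunk, vce(robust)
```

The coefficients are identical in both regressions; only the standard errors, t statistics, and confidence intervals change.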

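27 Testing for Heteroskedasticity (2)
You can also look at the residuals of your regression to see whether their spread stays roughly constant.
Commands:
predict resid, residuals: creates the residuals and saves them as “resid”
plot resid dependent_variable: graphs the residuals against the dependent variable (plot is an older command; in current versions of Stata, use scatter, or rvfplot to graph residuals against fitted values)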

28 Test for Skewness of Residuals
Run a Skewness/Kurtosis Test:
predict resid, residuals
sktest resid: tests the residuals for normality based on skewness and kurtosis
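
The residual checks from the last two slides can be run together, as in this sketch. It assumes the bundled auto dataset; the histogram is an extra visual check that does not appear on the slides.

```stata
sysuse auto, clear
regress price mpg headroom trunk

* Save the residuals in a new variable named resid
predict resid, residuals

* Residuals against fitted values: a fan or funnel shape suggests heteroskedasticity
rvfplot, yline(0)

* Skewness/kurtosis test; the null hypothesis is that resid is normally distributed
sktest resid

* Histogram of the residuals with a normal curve overlaid
histogram resid, normal
```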

29 Detecting Multicollinearity
To check if you have multicollinearity, you will run a correlation matrix and see if you have a high rho between two variables.
correl varlist: runs a correlation matrix of all the variables specified
Typically, rhos greater than 0.6 should be looked at with caution.
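
A correlation matrix only shows pairwise relationships, so it can be paired with variance inflation factors after the regression. The sketch below assumes the bundled auto dataset; pwcorr and estat vif are additions that the slides do not cover.

```stata
sysuse auto, clear

* Correlation matrix of the candidate regressors
correlate mpg headroom trunk weight

* Pairwise correlations with significance levels
pwcorr mpg headroom trunk weight, sig

* Variance inflation factors must be requested after a regression
regress price mpg headroom trunk weight
estat vif
```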

30 Detecting Serial Correlation
Autocorrelation is common in time-series data sets.
To test for serial correlation, you can use a Durbin-Watson test. For the Durbin-Watson test you need to time-set your data.
tsset time_variable: tells Stata your data is a time series (for panel data, xtset panel_variable time_variable declares the panel structure instead)
dwstat: finds the Durbin-Watson statistic (written estat dwatson in current versions of Stata)
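
Since the auto data is not a time series, this sketch switches to uslifeexp, another example dataset that ships with Stata and contains yearly US life expectancy. The choice of dataset and the simple trend regression are assumptions made purely for illustration.

```stata
* Yearly time series: le is life expectancy, year is the time variable
sysuse uslifeexp, clear

* Declare the data to be a time series indexed by year
tsset year

* A simple trend regression, used here only to have residuals to test
regress le year

* Durbin-Watson statistic; values near 2 suggest little first-order autocorrelation
estat dwatson
```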

31 Other Data Manipulation
rename: renames a variable (rename old_name new_name)
drop: deletes variables or observations
keep: keeps variables or observations
replace: replaces the values of an existing variable
sort: sorts observations in ascending order
gsort: sorts observations in ascending or descending order
encode: changes a string variable to a numeric one
decode: changes a numeric variable to a string
by: runs a command separately for each group of observations
mvdecode: changes occurrences of numlist to a missing value code
mvencode: changes missing values to specified numbers
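
The commands in this list can be tried one after another on the bundled auto dataset, as sketched below. The new variable names, the price cutoff, and the -9 missing code are all made up for illustration.

```stata
sysuse auto, clear

* Rename a variable, drop another, and keep a subset of observations
rename mpg miles_per_gallon
drop headroom
keep if price < 10000

* Sort ascending, then sort descending with gsort
sort price
gsort -price

* decode: numeric with value labels -> string; encode: string -> labeled numeric
decode foreign, generate(origin)
encode origin, generate(origin_code)

* by: repeat a command for each group (bysort sorts the data first)
bysort foreign: summarize price

* Artificial example: pretend -9 was entered as a "missing" code for trunk
replace trunk = -9 in 1
mvdecode trunk, mv(-9)

* And the reverse: turn missing values back into -9
mvencode trunk, mv(-9)
```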

32 Getting Help
help command: command information
search keyword: searches all sources
search keyword, net: only searches the internet
findit keyword: searches unofficial sites as well
You can also google any problem you are having and you’ll likely pull up a Stata forum at stata.com
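
Typed into the command window, the help commands look like this. The keyword is only an example, and the last two commands need an internet connection.

```stata
* Open the help file for a command you already know by name
help regress

* Search by keyword when you do not know the command name
search heteroskedasticity

* Restrict the keyword search to material available over the internet
search heteroskedasticity, net

* Search official and user-written sources together
findit heteroskedasticity
```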

33 Neatly Putting Results into Word
You want your results to be easily read in a Word document. The easiest and quickest way to copy your results into a Word document is to:
1. Highlight the portion you want
2. Right click on the highlighted portion
3. Click copy as picture
4. Paste (Ctrl+V) into a Word document

34 Practice—copy as picture and paste You should end up with something that looks pretty—like this…

35 Saving your Data and Work
To save your work, you want to close your work log.
To save your data, you want to go to File, Save as, and name your .dta file.
Please note that “saving” will only save the data, not your commands or log.

36 Conclusion This was a brief introduction to Stata. We covered the basics of opening stata, importing data, generating new variables, running a basic regression and discussed common problems and fixes, and saving your work in stata and word. The best advice for each of you is to go play around with STATA and have fun. If you need or want help, I’m happy to help you.

37 Questions? If you have additional questions at a later date, please stop by Palmer 118

