An Introduction to Stata for Economists Part II: Data Analysis Kerry L. Papps

1. Overview: Do-files; Summary statistics; Correlation; Linear regression; Generating predicted values and hypothesis testing; Instrumental variables and other estimators; Panel data capabilities; Panel estimators

2. Overview (cont.): Writing loops; Graphs

3. Comment on notation used Consider the following syntax description: list [varlist] [in range] Text in typewriter-style font (here, list and in) should be typed exactly as it appears (although there are possibilities for abbreviation). Italicised text (here, varlist and range) should be replaced by desired variable names etc. Square brackets (i.e. []) enclose optional parts of a command (do not actually type the brackets).

4. Comment on notation used (cont.) For example, an actual Stata command might be: list name occupation This notation is consistent with notation in Stata Help menu and manuals.

5. Do-files Do-files allow commands to be saved and executed in “batch” form. We will use the Stata do-file editor to write do-files. To open the do-file editor, click Window > Do-File Editor or click the Do-file Editor button on the toolbar. You can also use WordPad or Notepad: save as “Text Document” with extension “.do” (instead of “.txt”). This allows larger files than the do-file editor.

6. Do-files (cont.) Note: a blank line must be included at the end of a WordPad do-file (otherwise the last line will not run). To run a do-file from within the do-file editor, either select Tools > Do or click the corresponding toolbar button. If you highlight certain lines of code, only those commands will run. To run a do-file from the main Stata window, either select File > Do or type: do dofilename

7. Do-files (cont.) Can “comment out” lines by preceding with * or by enclosing text within /* and */. Can save the contents of the Review window as a do-file by right-clicking on window and selecting “Save All...”.
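As a minimal sketch of what such a do-file might look like (the file name example.do and the use of Stata's bundled auto dataset are assumptions for illustration, not part of the original slides):

```stata
* example.do: a hypothetical do-file; names and contents are illustrative
clear all
* Load an example dataset that ships with Stata
sysuse auto

/* Block comments can span
   several lines */
summarize price mpg

* The next command is commented out, so it does not run
* describe
```

Saved as example.do in the working directory, this could be run with: do example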

8. Univariate summary statistics tabstat produces a table of summary statistics: tabstat varlist [, statistics(statlist)] Example: tabstat age educ, stats(mean sd semean n) Here semean is the standard error of the mean. summarize displays a variety of univariate summary statistics (number of non-missing observations, mean, standard deviation, minimum, maximum): summarize [varlist]
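The two commands can be tried on any dataset. The sketch below assumes Stata's bundled auto dataset and its variables price, mpg and foreign, none of which appear in the original slides:

```stata
sysuse auto, clear
* Table of selected statistics for two variables
tabstat price mpg, statistics(mean sd semean n)
* The same idea, split by a grouping variable
tabstat price mpg, statistics(mean median min max) by(foreign)
* summarize stores its results in r(); detail adds percentiles
summarize price, detail
display "Mean price = " r(mean) ", median = " r(p50)
```

The by() option of tabstat and the detail option of summarize are standard options that the slides do not cover.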

9. Multivariate summary statistics table displays table of statistics: table rowvar [colvar] [, contents(clist varname)] clist can be freq, mean, sum etc. rowvar and colvar may be numeric or string variables. Example: table sex educ, c(mean age median inc)

10. Multivariate summary statistics (cont.) One “super-column” and up to 4 “super-rows” are also allowed. Missing values are excluded from tables by default. To include them as a group, use the missing option with table.
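The contents() syntax above is the one used by Stata 16 and earlier. As a hedged sketch, assuming Stata 17 or later (where table was redesigned) and the bundled auto dataset, the equivalent tables might be written as follows:

```stata
sysuse auto, clear
* Stata 17+: mean price and median mpg by foreign (rows) and rep78 (columns)
table (foreign) (rep78), statistic(mean price) statistic(median mpg) nototals
* Include missing values of rep78 as their own category
table (foreign) (rep78), statistic(mean price) missing
* Older contents() syntax from the slides, run under version control
version 16: table foreign rep78, contents(mean price median mpg)
```

In Stata 16 or earlier only the contents() form is available, and it should be typed without the version prefix.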

EXERCISE 1 11. Generating simple statistics Open the do-file editor in Stata. Run all your solutions to the exercises from here. Open nlswork.dta from the internet as follows: webuse nlswork Type summarize to look at the summary statistics for all variables in the dataset. Generate a wage variable, which exponentiates ln_wage: gen wage=exp(ln_wage)

EXERCISE 1 (cont.) 12. Generating simple statistics Restrict summarize to hours and wage and perform it separately for non-married and married (i.e. msp==0 and 1). Use tabstat to report the mean, median, minimum and maximum for hours and wage. Report the mean and median of wage by age (along the rows) and race (across the columns): table age race, c(mean wage median wage)

13. Sets of dummy variables Dummy variables take the values 0 and 1 only. Large sets of dummy variables can be created with: tab varname, gen(dummyname) When using large numbers of dummies in regressions, it is useful to name them with a pattern, e.g. id1, id2… Then id* can be used to refer to all variables whose names begin with id.
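A short sketch of this pattern, assuming the bundled auto dataset and using rep as the dummy stub (both are illustrative choices). It also shows one pitfall of the wildcard: it matches every variable with that prefix, not only the dummies.

```stata
sysuse auto, clear
* rep78 takes the values 1-5; create one dummy per category
tabulate rep78, generate(rep)
* rep* expands to rep1 ... rep5, but also matches rep78 itself
describe rep*
* A variable range selects only the dummies
summarize rep1-rep5
* Omit one category (rep1) to avoid perfect collinearity with the constant
regress price mpg rep2-rep5
```

Choosing a stub that no existing variable shares avoids the ambiguity altogether.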

14. Correlation To obtain the correlation between a set of variables, type: correlate [varlist] [[weight]] [, covariance] covariance option displays the covariances rather than the correlation coefficients. pwcorr displays all the pairwise correlation coefficients between the variables in varlist: pwcorr [varlist] [[weight]] [, sig]

15. Correlation (cont.) The sig option adds a line to each row of the matrix reporting the significance level of each correlation coefficient. The difference between correlate and pwcorr is that the former performs listwise deletion of missing observations while the latter performs pairwise deletion. To display the estimated variance-covariance matrix of the coefficient estimates after a regression command use: estat vce

16. Correlation (cont.) (This matrix can also be displayed using Stata’s matrix commands, which we will not cover in this course.)
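The difference between listwise and pairwise deletion is easiest to see on data with missing values. The sketch below assumes the bundled auto dataset, in which rep78 is missing for a few cars:

```stata
sysuse auto, clear
* Listwise deletion: only observations complete on all three variables are used
correlate price mpg rep78
* Pairwise deletion: each coefficient uses all observations available for that pair
pwcorr price mpg rep78, sig obs
* Covariances instead of correlation coefficients
correlate price mpg, covariance
* Variance-covariance matrix of the coefficient estimates after a regression
regress price mpg weight
estat vce
```

The obs option of pwcorr prints the number of observations behind each coefficient, which makes the pairwise deletion visible.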

17. Linear regression To perform a linear regression of depvar on varlist, type: regress depvar varlist [[weight]] [if exp] [, noconstant robust] depvar is the dependent variable. varlist is the set of independent variables (regressors). By default Stata includes a constant. The noconstant option excludes it.

18. Linear regression (cont.) robust specifies that Stata report the Huber-White standard errors (which account for heteroskedasticity). Weights are often used, e.g. when data are group averages, as in: regress inflation unemplrate year [aweight=pop] This is weighted least squares (a special case of GLS). Note that here year allows for a linear time trend.
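The main regress options can be compared side by side. This is a sketch only, assuming the bundled auto dataset; the choice of regressors is arbitrary:

```stata
sysuse auto, clear
* OLS with a constant (the default)
regress price mpg weight
* Same model with Huber-White (heteroskedasticity-robust) standard errors
regress price mpg weight, robust
* Suppress the constant
regress price mpg weight, noconstant
* Restrict the estimation sample with an if condition
regress price mpg weight if foreign == 0
```

The coefficient estimates in the first two regressions are identical; only the standard errors differ.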

19. Post-estimation commands After all estimation commands (e.g. regress, logit) several predicted values can be computed using predict. predict refers to the most recent model estimated. predict yhat, xb creates a new variable yhat equal to the predicted values of the dependent variable. predict res, residual creates a new variable res equal to the residuals.

20. Post-estimation commands (cont.) Linear hypotheses can be tested (e.g. t-test or F-test) after estimating a model by using test. test varlist tests that the coefficients corresponding to every element in varlist jointly equal zero. test eqlist tests the restrictions in eqlist, e.g.: test sex==3 The option accumulate allows a hypothesis to be tested jointly with the previously tested hypotheses.

21. Post-estimation commands (cont.) Example:
regress lnw sex race school age
test sex race
test school == age, accum
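A runnable version of the same workflow, assuming the bundled auto dataset in place of the wage data (the variable names below belong to that dataset, and the hypotheses tested are purely illustrative):

```stata
sysuse auto, clear
regress price mpg weight foreign
* Fitted values and residuals from the most recent model
predict yhat, xb
predict res, residuals
* With a constant in the model, OLS residuals average to zero
summarize res
* Joint test that the coefficients on mpg and weight are both zero
test mpg weight
* Test one linear restriction, then test it jointly with a second one
test mpg = weight
test foreign = 0, accumulate
```

After regress, test reports F statistics; for a single restriction the F statistic is the square of the corresponding t statistic.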

EXERCISE 2 22. Linear regression Compute the correlation between wage and grade. Is it significant at the 1% level? Generate a variable called age2 that is equal to the square of age (the power operator in Stata is ^, so the square of age is age^2). Create a set of race dummies with: tab race, gen(race) Regress ln_wage on: age, age2, race2, race3, msp, grade, tenure, c_city.

EXERCISE 2 (cont.) 23. Linear regression Display the covariance matrix from this regression. Use predict to generate a variable res containing the residuals from the equation. Use summarize to confirm that the mean of the residuals is zero. Rerun the regression and report Huber-White standard errors.

24. Additional estimators Instrumental variables: ivregress 2sls depvar exogvars (endogvars=ivvars) Both exogvars and ivvars are used as instruments for endogvars. For example: ivregress 2sls price inc pop (qty=cost) Logit: logit depvar indepvars

25. Additional estimators (cont.) Probit: probit depvar indepvars Ordered probit: oprobit depvar indepvars Tobit: tobit depvar indepvars, ll(cutoff) For example, tobit could be used to estimate labour supply: tobit hrs educ age child, ll(0)
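The estimators above share the same basic syntax, as the following sketch shows. It assumes the bundled auto dataset; the variable choices only demonstrate the syntax and are not meant as a credible identification strategy or economic model.

```stata
sysuse auto, clear
* 2SLS: mpg treated as endogenous, displacement as the excluded instrument
ivregress 2sls price weight (mpg = displacement)
* First-stage diagnostics for instrument strength
estat firststage
* Probit and logit for a binary outcome
probit foreign mpg weight
logit foreign mpg weight
* Ordered probit for an ordered categorical outcome
oprobit rep78 mpg weight
* Tobit with mpg treated as censored from below at 17
tobit mpg weight, ll(17)
```

estat firststage is a standard post-estimation command for ivregress that the slides do not mention.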

EXERCISE 3 26. IV and probit Repeat the regression from Exercise 2 using ivregress 2sls and instrument for tenure using union and south. Compare the results with those from Exercise 2. Estimate a probit model for union with the following regressors: age, age2, race2, race3, msp, grade, c_city, south.

27. Panel data manipulation Panel data generally refer to the repeated observation of a set of fixed entities at fixed intervals of time (also known as longitudinal data). Stata is particularly good at arranging and analysing panel data. Stata refers to two panel data formats: Wide form: useful for display purposes and often the form in which data are obtained. Long form: needed for regressions etc.

28. Panel data manipulation (cont.) Example of wide form (id is the i variable; inc2008, inc2009 and inc2010 are the xij variables). Note the naming convention for inc.
id sex inc2008 inc2009 inc2010
1 5000 5500 6000
2 2000 2200 3300
3 3000 1000

29. Panel data manipulation (cont.) Example of long form (id is the i variable, year is the j variable and inc is the xij variable).
id year sex inc
1 2008 5000 2009 5500 2010 6000
2 2000 2200 3300
3 3000 1000

30. Panel data manipulation (cont.) To change from long to wide form, type: reshape wide varlist, i(ivarname) j(jvarname) varlist is the list of variables to be converted from long to wide form. i(ivarname) specifies the variable(s) whose unique values denote the spatial unit. j(jvarname) specifies the variable whose unique values denote the time period.

31. Panel data manipulation (cont.) To change from wide to long form, type: reshape long stublist, i(ivarname) j(jvarname) stublist is the “word” part of the names of variables to be converted from wide to long form, e.g. “inc” above. It is important to name variables in this format, i.e. word description followed by year.

32. Panel data manipulation (cont.) To move between the above example datasets use:
reshape long inc, i(id) j(year)
reshape wide inc, i(id) j(year)
These steps “undo” each other.
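The round trip can be reproduced on a small dataset typed in by hand. The values below, including those for sex, are hypothetical and chosen only to mirror the layout of the example tables:

```stata
clear
* Hypothetical wide-form data: one row per id, one inc column per year
input id sex inc2008 inc2009 inc2010
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end
* Wide to long: inc is the stub, year is created from the numeric suffix
reshape long inc, i(id) j(year)
list, sepby(id)
* Long back to wide: restores the original layout
reshape wide inc, i(id) j(year)
list
```

Because sex does not vary within id, reshape carries it along without it being named in the command.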

33. Lags You can “declare” the data to be in panel form, with the tsset command: tsset panelvar timevar For example: tsset country year After using tsset, a lag can be created with: gen lagname = L.varname Similarly, L2.varname gives the second lag.
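A minimal sketch of declaring a panel and creating lags, using a hypothetical two-unit panel typed in by hand (the variable names and values are illustrative):

```stata
clear
* Hypothetical long-form panel: 2 units observed over 3 years
input id year inc
1 2008 5000
1 2009 5500
1 2010 6000
2 2008 2000
2 2009 2200
2 2010 3300
end
tsset id year
* First lag and first difference, both computed within each id
generate lag_inc = L.inc
generate d_inc = D.inc
list, sepby(id)
```

The lag is missing in each unit's first year, so values never carry over from one id to the next. D. is the analogous first-difference operator, which the slides do not cover.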

34. Panel estimators Panel data estimation: xtreg depvar indepvars [, re fe i(panelvar)] i(panelvar) specifies the variable corresponding to an independent unit (e.g. country). This can be omitted if the data have been tsset. re and fe specify how we wish to treat the time-invariant error term (random effects vs fixed effects).

35. Panel estimators (cont.) An alternative to fe is to regress depvar on a set of dummy variables for each panel unit. You should either drop one dummy or use the noconstant option to avoid the dummy variable trap, although Stata automatically drops regressors when they are perfectly collinear. To perform a Hausman test of fixed vs random effects, first run each estimator and save the estimates, then use the hausman command:

36. Panel estimators (cont.)
xtreg depvar indepvars, fe
estimates store fe_name
xtreg depvar indepvars, re
estimates store re_name
hausman fe_name re_name
You must list fe_name before re_name in the hausman command.
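The full sequence might look as follows. This sketch assumes the Grunfeld investment data available through webuse (which needs an internet connection) and its variables company, year, invest, mvalue and kstock; the stored-estimate names are arbitrary.

```stata
webuse grunfeld, clear
xtset company year
* Fixed effects (within) estimator
xtreg invest mvalue kstock, fe
estimates store fixed
* Random effects (GLS) estimator
xtreg invest mvalue kstock, re
estimates store random
* Hausman test: the estimator that is consistent under both hypotheses (FE) goes first
hausman fixed random
```

A small p-value rejects the random effects assumption in favour of fixed effects.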

EXERCISE 4 37. Manipulating a panel Declare the data to be a panel using tsset, noting that idcode is the panel variable and year is the time variable. Generate a new variable lwage equal to the lag of wage and confirm that this contains the correct values by listing some data (use the break button): list idcode year wage lwage Save the file as “NLS data” in a folder of your choice.

EXERCISE 4 (cont.) 38. Manipulating a panel Using the same regressors from the regress command in Exercise 2, run a fixed effects regression for ln_wage using xtreg. Note that all time invariant variables are dropped. Store the estimates as fixed. Run a random effects regression and store the estimates as random. Perform a Hausman test of random vs fixed effects. Which is preferred?

EXERCISE 4 (cont.) 39. Manipulating a panel Drop all variables other than idcode, year and wage using the keep command (quicker than using drop). Use the reshape wide command to rearrange the data so that the first column represents each person (idcode) and the other columns contain wage for a particular year. Return the data to long form (change wide to long in the command).

EXERCISE 4 (cont.) 40. Manipulating a panel Do not save the new dataset.

41. Writing loops The foreach command allows one to repeat a sequence of commands over a set of variables:
foreach name of varlist varlist {
    Stata commands referring to `name'
}
Stata sequentially sets name equal to each element in varlist and executes the commands enclosed in braces. name should be enclosed within the characters ` (left single quote, or backtick) and ' (apostrophe) when referred to within the braces.

42. Writing loops (cont.) name can be any word and is an example of a “local macro”. For example:
foreach var of varlist age educ inc {
    gen l`var'=log(`var')
    drop `var'
}
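A runnable variant of this loop, assuming the bundled auto dataset and its variables price, mpg and weight. The ln_ prefix is an illustrative choice, and the originals are kept rather than dropped:

```stata
sysuse auto, clear
* Create the log of each variable; `v' is replaced by each name in turn
foreach v of varlist price mpg weight {
    generate ln_`v' = log(`v')
    label variable ln_`v' "Log of `v'"
}
summarize ln_price ln_mpg ln_weight
```

The opening brace must sit on the same line as foreach and the closing brace on a line of its own.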

EXERCISE 5 43. Using loops in regression Open “NLS data” and rerun the fixed effects regression from Exercise 4. Use foreach with varlist to loop over all the regressors and report their t-statistics (using test). Use foreach with varlist to create a loop that renames each variable by adding “68” to the end of the existing name.

44. Graphs To obtain a basic histogram of varname, type: histogram varname, discrete freq To display a scatterplot of two (or more) variables, type: scatter varlist [[weight]] weight determines the diameter of the markers used in the scatterplot.

45. Graphs (cont.) There are options for (among other things): adding a title (title); altering the scale of the axes (xscale, yscale); specifying what axis labels to use (xlabel, ylabel); changing the markers used (msymbol); changing the connecting lines (connect).

46. Graphs (cont.) Particularly useful is mlabel(varname), which labels each marker in the scatterplot with the corresponding value of varname. Example: scatter gdp unemplrate, mlabel(country)

47. Graphs (cont.) Graphs are not saved by log files (they appear in separate windows). To save a graph, select File > Save Graph. To insert a graph in a Word document etc., select Edit > Copy and then paste it into the document. The pasted graph can be resized but is not interactive (unlike Excel charts etc.).
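Graphs can also be created and saved from a do-file rather than through the menus. The sketch below assumes the bundled auto dataset; the graph file name mpg_weight is hypothetical.

```stata
sysuse auto, clear
* Histogram of a discrete variable, with frequencies on the y-axis
histogram rep78, discrete frequency
* Scatterplot of foreign cars with a title, axis labels and labelled points
scatter mpg weight if foreign == 1, mlabel(make) ///
    title("Mileage and weight, foreign cars") ///
    xlabel(1500(500)3500) ylabel(10(10)40)
* Save the current graph in Stata's .gph format
graph save mpg_weight, replace
```

The /// line continuation only works inside a do-file, so the scatter command should be run from there rather than typed into the Command window.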