Introduction to STATA Before you get frustrated, imagine processing data by hand and think dearly of STATA.

Workshop Outline
- Downloading STATA
- How STATA thinks
- Using commands
- Importing data from Excel
- Tracking your work: do files and logs
- Generating new variables
- Running OLS regressions
- Drawing a scatterplot with line of best fit
- Regression tests
- Manipulating your data
- Copying results over to Word
- Saving your data and work

Thinking in STATA
STATA is a model for working with data, similar to a word processor.
- You work with a copy of your data that is loaded into the computer's memory. The copy on the disk does not change unless you explicitly replace the file.
- STATA is connected to both the web and your folders.
- STATA uses commands.
- STATA can save several different file types:
  .do files: text files with your commands, for future reference and editing
  .log files: text files with your output, for future reference and printing
  .dta files: data files in Stata format
  .gph files: graph files in Stata format
  .ado files: programs in Stata

[Screenshot of the Stata interface, with labels: Command window; Data summary; Command summary; Command results (the main place to monitor your work)]

Commands
Syntax: command varlist if exp in range
- varlist: list of variables
- in range: observation numbers, written first#/last#. Ex: 1/10
- if exp: an "if" expression. Set it with a qualifier like >5, meaning greater than five, or ==20, meaning equal to twenty
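The syntax pieces above can be tried one at a time. This is a minimal sketch that assumes the auto example dataset shipped with Stata (the exercises later in the workshop use the same variable names); the particular variables and cut-offs are chosen only for illustration.

```stata
* Load the example dataset that ships with Stata (an assumption for this sketch)
sysuse auto, clear

* varlist: summarize only the variables listed
summarize price mpg

* in range: list the first ten observations
list make price in 1/10

* if exp: keep only cars with mpg greater than 25 in the calculation
summarize price if mpg > 25

* == tests for equality (a single = assigns a value)
count if rep78 == 4
```

The if and in qualifiers can be combined on one command, for example `list make price if mpg > 25 in 1/20`.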

Most Common Commands

Category                           Stata commands
Getting online help                search, findit, help
Operating system interface         pwd, cd, sysdir, mkdir, rmdir, dir, erase, copy, type
Using and saving data from disk    use, save, append, merge, compress
Inputting data into Stata          input, edit, infile, infix, insheet
The Internet and updating Stata    update, net, ado, news
Basic data reporting               describe, codebook, list, browse, count, inspect, summarize, table, tabulate
Data manipulation                  generate, replace, egen, rename, drop, keep, sort, encode, decode, order, by, reshape, collapse
Formatting                         format, label
Keeping track of your work         log, notes
Convenience                        display

Getting Help
- Stata will provide information when an error occurs.
  Just click on the blue error message to get more information.
  A viewer will pop up with a reason for the error.
- Search: to search for the appropriate command, type "help" into your command window.
- Still cannot find your answer? Use Google: forums, blogs, electronic manuals.

Working with Directories
Stata is interactively connected to your folders. You can directly pull or save files from anywhere on your computer.
- pwd: tells you what directory you are currently working in
- use filename: opens any file saved in that directory
- save filename: saves a file in Stata format
- save filename, replace: overwrites the dataset
- mkdir: makes a new directory (a new folder)
- cd: changes your directory
You can get to my directory by typing "cd C:users\cbenson\workshops"
* In general, DO NOT SAVE IN THE STATA DIRECTORY. Save your work files elsewhere, like your H drive.
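The directory commands above fit together roughly as follows. This is a sketch only: the folder name stata_workshop and the file name mycars are hypothetical, and the auto example dataset is used just to have something to save.

```stata
* Show the current working directory
pwd

* Make a new folder inside it and move there
* (hypothetical folder name; mkdir reports an error if the folder already exists)
mkdir stata_workshop
cd stata_workshop

* Load some data, then save it in Stata format as mycars.dta (hypothetical name)
sysuse auto, clear
save mycars

* Saving under the same name again requires the replace option
save mycars, replace

* Re-open the saved file from the current directory
use mycars, clear
```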

Importing Data from Excel
Copy and paste:
- In Excel, copy your full data set
- Open your data editor by clicking "data", then "data editor"
- Click on the first cell, and then "paste"
- Use the first row as "variable names"
- Save as a ".dta" file
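As an alternative to copy and paste, the same import can be typed as a command. This sketch assumes Stata 12 or later, where import excel is available; the workbook name mydata.xlsx and the sheet name Sheet1 are hypothetical placeholders for your own file.

```stata
* Read a worksheet, treating the first row as variable names
* (hypothetical file and sheet names; replace them with your own)
import excel using "mydata.xlsx", sheet("Sheet1") firstrow clear

* Check what was read in
describe

* Save the result as a .dta file
save mydata, replace
```

The clear option drops whatever data is currently in memory before the import, so save any unsaved changes first.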

Clearing Data
- clear: removes any data that you might be working on from memory. Unless you have saved the data, none of the changes you made will affect the data set on disk. This is important to do before you import new data.
- Dictionaries: can specify how you want to import data (search "dictionaries" to learn more)

Tracking your work
- Logs: keep track of all your commands and results.
- Do files: keep your commands and allow you to re-execute your work.

Logs
A log saves your results window.
- Create a log by clicking on the notebook button (the one without a pencil), or by typing "log using filename". This will save in the current directory.
- Suspend a log by typing "log off"
- Re-open a log by typing "log on"
- Close a log by typing "log close"
- Add to a closed log by typing "log using filename, append"
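Put together, a logging session might look like the sketch below. The log name workshop_log is hypothetical, and the text option (which writes a plain-text .log file) and the auto example dataset are assumptions added for illustration.

```stata
* Start a plain-text log in the current directory (hypothetical file name)
log using workshop_log, text replace

sysuse auto, clear
summarize price mpg

* Suspend logging: the next command is not recorded
log off
describe

* Resume logging, then close the log
log on
count
log close

* Later, add to the same closed log
log using workshop_log, text append
tabulate foreign
log close
```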

Do Files
You'll want do files for your thesis and class assignments! Do files allow you to keep your commands so that you can re-run your work at a later date. They are very helpful for generating new variables, multi-step data manipulation, and tedious repetitive commands.
To start a do-file, click on the notebook-with-a-pencil button, or go to "window", "do file editor", "new do file".
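A short do-file could look like the following sketch. The file name workshop.do, the log name, and the choice of commands are all hypothetical; the point is only that the file holds the same commands you would otherwise type one by one.

```stata
* workshop.do (hypothetical file name)
clear all
set more off

* Close any log left open by an earlier run, then start a fresh one
capture log close
log using workshop, text replace

* Load data, create a variable, run a regression
sysuse auto, clear
generate lnprice = ln(price)
regress lnprice mpg weight

log close
```

After saving the file, typing `do workshop` in the command window runs every line in order.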

Data Reporting
- describe: basic information on variables
- summarize: basic descriptive statistics
- codebook: descriptive statistics, lots of information
- list: spreadsheet form
- label: create variable labels and value labels
- table: frequency table
- q: stops STATA in whatever it is running
- inspect: displays a simple summary of the data's attributes
- tabulate: table of frequencies
- count: counts observations satisfying specified conditions
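To see what each reporting command prints, they can be run on any dataset. The sketch below assumes the auto example dataset shipped with Stata; the variables passed to each command are picked only for illustration.

```stata
sysuse auto, clear

* Variable names, storage types, and labels
describe

* Means, standard deviations, minimum and maximum
summarize price mpg weight

* Detailed information on one variable
codebook foreign

* Spreadsheet-style listing of the first five observations
list make price mpg in 1/5

* Frequency tables
tabulate foreign
table foreign

* Quick summary of one variable's attributes
inspect rep78

* Number of observations meeting a condition
count if mpg > 25
```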

Generating New Variables
To generate a new variable, go to "data", "create or change data", "new variable". You'll get a screen like the one on the side. Type in the expression that you want to generate.
Alternatively, you could type the command "generate newvariablename = expression".

Exercise 1
1. Generate a variable named lnprice = ln(price)
2. Generate a variable that is an indicator variable for domestic cars (there are additional ways to go about this; I've included one below)
   generate domestic=0
   replace domestic=1 if foreign=="Domestic"
3. generate fuelefficient=1 if mpg>25
Note that Stata commands are case-sensitive and must be typed in lowercase.
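One possible way through Exercise 1 is sketched below, assuming the auto example dataset shipped with Stata rather than a copy imported from Excel. In that dataset foreign is a numeric variable (0 labeled Domestic, 1 labeled Foreign), so the comparison uses the number; if your foreign variable is a string, the quoted comparison from the exercise applies instead.

```stata
sysuse auto, clear

* 1. Natural log of price
generate lnprice = ln(price)

* 2. Indicator for domestic cars
*    (foreign is numeric here: 0 = Domestic, 1 = Foreign)
generate domestic = 0
replace domestic = 1 if foreign == 0

* 3. Indicator for fuel-efficient cars: a true/false expression evaluates
*    to 1 or 0, and the if qualifier leaves cars with missing mpg as missing
generate fuelefficient = (mpg > 25) if !missing(mpg)

* Check the result
tabulate domestic fuelefficient
```

Writing step 3 as in the exercise (`generate fuelefficient=1 if mpg>25`) also runs, but it leaves every other car as missing rather than 0.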

A Scatterplot with Best Fit Line
- For the scatterplot only, type: graph twoway scatter price weight
- For the best fit line only, type: graph twoway lfit price weight
- To draw a scatterplot with best fit line, type: graph twoway (lfit price weight) (scatter price weight)
Remember: the dependent variable goes on the "y" axis and the independent variable on the "x" axis. The order of the variables in the command depends on which one you choose as the dependent variable; the dependent variable is listed first.

Exercise 2 Draw a scatterplot with best fit line

A Scatterplot with Best Fit Line and Confidence Interval
Confidence interval: a range of values so defined that there is a specified probability that the value of a parameter lies within it.
Scatterplot with CI: calculates the prediction for yvar from a linear regression of yvar on xvar and plots the resulting line, along with a confidence interval.
Type: twoway lfitci price weight
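On its own, twoway lfitci draws the fitted line and its confidence band but not the data points. A sketch of overlaying the scatterplot, assuming the auto example dataset, is:

```stata
sysuse auto, clear

* Fitted line with confidence band first, then the points on top of it
twoway (lfitci price weight) (scatter price weight)
```

Listing lfitci first keeps the shaded band behind the points so they remain visible.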

Exercise 3 Draw a scatterplot with best fit line and confidence interval

Running OLS Regressions
To run a basic OLS regression, go to "statistics", "linear models and related", "linear regression". You'll end up with a window like the one on the right. Insert your dependent variable and independent variables from the two drop-down menus.
Alternatively, you can type: "regress dependentvariable independentvariablenames"

OLS Continued: The shortcut (ish)
Using your command window:
regress depvar indepvars [if] [in] [weight] [,options]
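The optional parts of the regress syntax can be filled in as in the sketch below. It assumes the auto example dataset; the model and the subsets are arbitrary choices made only to show where each piece goes.

```stata
sysuse auto, clear

* Basic regression: dependent variable first, then independent variables
regress price mpg weight

* [if]: domestic cars only (foreign is 0 for domestic cars in this dataset)
regress price mpg weight if foreign == 0

* [in]: first 50 observations only
regress price mpg weight in 1/50

* [, options]: report 90% confidence intervals instead of 95%
regress price mpg weight, level(90)
```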

Exercise 4
Run a model using several variables in your data set.
Example: "regress price mpg headroom trunk weight"

Econometric Tests and Corrections
- Heteroskedasticity
- Normality
- Multicollinearity and high correlation
- Serial correlation/autocorrelation

Testing for Heteroskedasticity (1)
The null hypothesis is that the error terms have constant variance (homoskedasticity).
If you do have heteroskedasticity, your standard errors are not reliable.
To test for heteroskedasticity:
-- Directly after your regression, use the command imtest, white: shows the White test for heteroskedasticity (in current versions of Stata this is written estat imtest, white)

Correcting Heteroskedasticity
If you find that you have heteroskedasticity (the p-value of the test is less than 0.1, so you reject the null hypothesis of constant variance), then you can run your regression with robust standard errors.
regress price mpg headroom trunk, robust
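The test-then-correct sequence can be sketched as follows, assuming the auto example dataset and the estat form of the postestimation commands used in current Stata versions. The Breusch-Pagan test (estat hettest) is not part of the original slides and is included only as a second check.

```stata
sysuse auto, clear

* Fit the model with ordinary standard errors
regress price mpg headroom trunk

* White test; the null hypothesis is homoskedasticity
estat imtest, white

* Breusch-Pagan / Cook-Weisberg test, same null hypothesis
estat hettest

* If the tests reject, refit with heteroskedasticity-robust standard errors
regress price mpg headroom trunk, vce(robust)
```

The coefficients are identical in the two regressions; only the standard errors, t statistics, and confidence intervals change.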

Testing for Heteroskedasticity (2)
You can also look at the residuals of your regression to see whether their spread is constant.
Commands:
-- predict resid, residuals: creates the residuals and saves them as "resid"
-- plot resid dependent_variable: graphs the residuals against the dependent variable
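A graphical check along these lines is sketched below, assuming the auto example dataset. It plots residuals against fitted values, which is a common variant of the plot described above rather than the exact command from the slide; the variable names resid and fitted are arbitrary.

```stata
sysuse auto, clear
regress price mpg headroom trunk

* Save residuals and fitted values as new variables
predict resid, residuals
predict fitted, xb

* Residuals against fitted values, with a reference line at zero
scatter resid fitted, yline(0)

* Built-in shortcut that draws the same kind of plot after regress
rvfplot, yline(0)
```

A fan or funnel shape in the plot suggests that the error variance changes with the fitted values.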

Test for Skewness of Residuals
Run a skewness/kurtosis test:
-- predict resid, residuals
-- sktest resid: calculates skewness/kurtosis
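A sketch of the normality check on residuals, assuming the auto example dataset; the histogram is an extra visual aid that is not part of the original slide.

```stata
sysuse auto, clear
regress price mpg headroom trunk

* Save the residuals
predict resid, residuals

* Skewness/kurtosis test; the null hypothesis is that resid is normally distributed
sktest resid

* Optional visual check: histogram with a normal density overlaid
histogram resid, normal
```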

Detecting Multicollinearity
To check whether you have multicollinearity, run a correlation matrix and see if you have a high rho between two variables.
-- correl varlist: runs a correlation matrix of all the variables specified
Typically, rhos greater than 0.6 should be looked at with caution.
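The correlation check can be sketched as below, assuming the auto example dataset. The variance inflation factors from estat vif are not mentioned in the slides; they are added here as a complementary check that looks at each regressor against all the others at once.

```stata
sysuse auto, clear

* Pairwise correlations among the candidate regressors
correlate mpg headroom trunk weight

* Variance inflation factors, computed after the regression
regress price mpg headroom trunk weight
estat vif
```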

Detecting Serial Correlation
Autocorrelation is common in time-series data sets.
To test for serial correlation, you want to use a Durbin-Watson test.
For the Durbin-Watson test you need to time-set your data.
-- tsset time_variable: tells Stata your data is a time series (for panel data, use xtset panel_variable time_variable)
-- dwstat: finds the Durbin-Watson statistic (in current versions of Stata this is written estat dwatson)
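A sketch of the full sequence is shown below. It assumes the sp500 example dataset shipped with Stata, and it creates a simple observation counter t as the time variable because the trading dates in that file have gaps; both the regression and the counter are illustrative choices only.

```stata
sysuse sp500, clear

* Create a gap-free time index and declare the data to be a time series
generate t = _n
tsset t

* Regress, then request the Durbin-Watson statistic
regress close volume
estat dwatson
```

A statistic near 2 suggests little first-order serial correlation, while values well below 2 point to positive serial correlation.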

Other Data Manipulation
- rename: rename a variable (rename old_name new_name)
- drop: delete a variable or observations
- keep: keep a variable or observations
- replace: change the contents of an existing variable
- sort: sort observations in ascending order of the listed variables
- gsort: sort observations in ascending or descending order
- encode: change a string variable to numeric
- decode: change a numeric variable to a string
- by: runs a command separately for each group of observations
- mvdecode: changes occurrences of numlist to a missing value code
- mvencode: changes missing values to specified numbers
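Several of these commands are shown in action in the sketch below, which assumes the auto example dataset. The new variable names (milespergallon, origin, origin_code) and the missing-value code -99 are hypothetical choices for illustration.

```stata
sysuse auto, clear

* rename, drop, keep
rename mpg milespergallon
drop trunk
keep if price < 10000

* sort ascending; gsort with a minus sign sorts descending
sort price
gsort -price

* decode: labeled numeric to string; encode: string to labeled numeric
decode foreign, generate(origin)
encode origin, generate(origin_code)

* by: run a command separately for each group (bysort sorts first)
bysort foreign: summarize price

* mvencode: missing values to -99; mvdecode: -99 back to missing
mvencode rep78, mv(-99)
mvdecode rep78, mv(-99)
```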

Getting Help
- help command: command information
- search keyword: searches all sources
- search keyword, net: only searches the internet
- findit keyword: searches unofficial sites as well
You can also google any problem you are having, and you'll likely pull up a Stata forum at stata.com
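For example, the help commands might be used like this; the keywords are arbitrary, and the last two lines need an internet connection.

```stata
* Open the help file for a known command
help regress

* Search Stata's keyword database for a topic
search heteroskedasticity

* Search internet sources only
search heteroskedasticity, net

* Search official and user-written sources together
findit heteroskedasticity
```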

Neatly Putting Results into Word
You want your results to be easily read in a Word document. The easiest and quickest way to copy your results into a Word document is to:
1. Highlight the portion you want
2. Right click on the highlighted portion
3. Click "copy as picture"
4. Paste (Ctrl+V) into a Word document

Practice—copy as picture and paste You should end up with something that looks pretty—like this…

Saving your Data and Work
To save your work, you want to close your work log.
To save your data, you want to go to file, save as, and name your .dta file.
Please note that "saving" will only save the data, not your commands or log.

Conclusion
This was a brief introduction to Stata. We covered the basics of opening Stata, importing data, generating new variables, and running a basic regression; discussed common problems and fixes; and saved our work in Stata and Word. The best advice for each of you is to go play around with STATA and have fun. If you need or want help, I'm happy to help you.

Questions? If you have additional questions at a later date, please stop by Palmer 118