STATA TUTORIAL: LAB 1

1. STATA windows
• The command window
• The viewer/results window
• The review of commands window
• The variable window

2. Working with STATA
A. Opening Data
B. Using a “log” file
C. Useful Commands
D. Using a “do” file

A. Opening Data
• The data browser shows you your data
• Check it frequently, especially after commands you are unsure about

A. Opening your data
• If your data is in STATA format, then either:
  • Go to “File” > “Open”, browse to the location where the data is stored, and double-click the file
  • In the Command window type: use “Fill In Correct Path Name\filename.dta”
• Practice with “Wage1.dta”

A. Opening your data - Data editor/browser
• The data editor/data browser shows you your data
• Go to “Window” > “Data Editor”
• Or click on the “Data Editor” or “Data Browser” icon (editor: you can modify data by typing in a cell, like Excel; browser: locked, so you can’t make changes)
• It is good to look at the data when you load it or after running commands, so that you understand its structure.

A. Opening your data - Variable Window
• Now that you have data loaded, the variables included in the data are listed in the variable window.
  • Name: name of the variable
  • Label: description of what the variable is
  • Type/Format: how STATA stores and displays the variable
• Click on a variable and it appears in the command window.

A. Opening your data - What do the variables look like?
• wage educ exper tenure nonwhite female married numdep smsa northcen south west construc ndurman trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq
• What values do they take?
  • wage...tenure, numdep are actual numbers
  • nonwhite...servocc take values of 0 or 1: qualitative measures of some personal characteristics
  • lwage...tenursq are transformations of other variables (ln, square)

A. Opening your data (advanced)
• If your data is a comma-delimited file: insheet using “filename.txt” (Stata 13 and later also offer “import delimited” for the same job)
• If your data is a raw data file:
  • It must have a dictionary file and you must use the “infile” command
  • infile using “dictionaryname.dct”
  • The dictionary file will refer to data that has a “.dat” or “.raw” extension

B. The “log” file
• The log file is an “output file”
• It creates and saves a log of all the actions performed by STATA and all the results
• How to open/close?
  • Go to “File” > “Log” > “Begin”
  • Go to “File” > “Log” > “Close”
• How to view it later?
  • Go to “File” > “Log” > “View”, and search for your filename, keeping in mind that it has the extension “.log”
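The menu steps above also have command equivalents, which is convenient inside a do-file. A minimal sketch, assuming a hypothetical log name (lab1_session.log) written to the current working directory:

```stata
* Hypothetical file name; the log is written to the current working directory.
* Start a plain-text log, overwriting any earlier log with the same name.
log using "lab1_session.log", text replace

* Anything run from here on is recorded together with its output.
display "Lab 1 log started"

* Stop recording and close the file.
log close

* Print the saved log in the results window.
type "lab1_session.log"
```

The “replace” option is a choice made for this sketch; without it STATA refuses to overwrite an existing log file.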

C. Useful Commands
• “describe”:
  • STATA will list all the variables, their labels and types, and tell you the number of observations
• Two types of variables:
  1. Numerical
  2. String (usually appear in red in the data browser)
• You can convert a string variable to numerical using the “destring” command, e.g. “destring var1, replace” or “destring var1, force replace”
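To see what the “force” option changes, here is a small sketch on made-up data. The variable name var1 follows the slide; the three values are invented for illustration only.

```stata
* Toy data: var1 holds numbers stored as text (values are made up).
clear
input str5 var1
"12"
"7.5"
"n/a"
end

* var1 is listed with a string storage type (str5)
describe

* Without "force", destring would leave var1 untouched because "n/a" is not a number.
* With "force", non-numeric entries are converted to missing (.).
destring var1, replace force

* var1 is now numeric; the third observation is missing
describe
list
```

Because “force” silently discards anything it cannot read as a number, it is worth browsing the data before and after.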

C. Useful Commands
• “summarize, sum, summ”
  • tells STATA to compute summary statistics (mean, standard deviation, and so forth) for all variables
  • useful for identifying outliers and getting an idea of your data
  • e.g. summarize (will do all variables)
  • e.g. summarize wage educ (just does wage and educ; note, no “,” between variables)

C. Useful Commands
• How many observations are there?
• What is the average value of wage?
• What are the min and max of tenure?

C. Useful Commands
• “tabulate, tab”
  • Shows the frequency and percent of each value of the variable in the dataset
  • e.g. tabulate tenure
  • e.g. tab wage (long list; press the space bar to display all of it)
  • e.g. tab educ female (gives education by gender)

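C. Useful Commands
• “generate, gen”
  • Creates a new variable
  • gen weeklywage=wage*40
  • tab weeklywage
  • gen prevexper=exper-tenure
  • gen lwage=ln(wage) fails because lwage already exists in the data; use a new name instead: gen newlwage=ln(wage)
  • gen expersp=exper*exper or gen expersp=(exper)^2 (for the same reason, the existing name expersq cannot be reused)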

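C. Useful Commands
• The “if” qualifier allows you to use only a portion of the observations
  • tab wage if female==1
  • sum exper if educ>=13
  • gen expermomwkid=exper-1 if female==1
  • or, with a second condition (run one version or the other, since gen cannot create the same variable twice): gen expermomwkid=exper-1 if female==1 & numdep!=0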
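A short sketch of how “gen” and “if” interact, using three made-up observations with the same variable names as Wage1.dta. The values are illustrative only.

```stata
* Toy data: three made-up observations, variable names as in Wage1.dta.
clear
input wage exper tenure female numdep
3.10  2 0 1 2
6.00 22 2 1 0
5.30  7 2 0 1
end

* New variable computed for every observation
gen prevexper = exper - tenure

* New variable computed only where both conditions hold;
* all other observations are set to missing (.)
gen expermomwkid = exper - 1 if female == 1 & numdep != 0

* Summary statistics restricted to women
summarize wage if female == 1

list
```

Only the first observation satisfies both conditions, so expermomwkid is filled in for that row and missing elsewhere.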

C. Useful Commands
• “reg”: reg [dependent variable] [independent variable(s)]
• reg wage educ
  • An increase in education of 1 unit (year) is predicted to increase hourly wage by $0.54
  • R sq=0.1648
  • When educ=0, wage is predicted to be -$0.90.

C. SLR Wage regression
• An increase in education of 1 unit (year) is predicted to increase hourly wage by $0.54
• An increase of 6 years: 6*$0.54=$3.24
• R sq=0.1648; variation in education explains 16.5% of the variation in wages
• When educ=0, wage is predicted to be -$0.90.
• Variance of estimator is
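The interpretation above can be written out as arithmetic. The coefficients -0.90 and 0.54 come from the slides; the choice of educ = 12 in the last line is only an illustrative value.

```latex
\begin{align*}
\widehat{\mathit{wage}} &= -0.90 + 0.54\,\mathit{educ} \\
\Delta\widehat{\mathit{wage}} &= 0.54 \times 6 = 3.24
  && \text{(six more years of education)} \\
\widehat{\mathit{wage}}\,\big|_{\mathit{educ}=12} &= -0.90 + 0.54 \times 12 = 5.58
  && \text{(illustrative value of educ)}
\end{align*}
```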

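C. Reading the output table
• SSTotal: the total variability of wage around its mean.
• SSResidual: the sum of squared residuals, i.e. the variation the model does not explain.
• SSModel (aka SSE, the explained sum of squares): the variation explained by the model.
• Observe that SSModel = SSTotal - SSResidual.
• Note that SSModel / SSTotal is equal to 0.1648, the value of R-Square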
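In symbols, with y_i the observed wage, \hat{y}_i the fitted value and \bar{y} the sample mean, the three sums of squares and R-Square are related as follows (standard OLS identities for a regression that includes a constant):

```latex
\begin{align*}
\text{SSTotal}    &= \sum_{i=1}^{n} (y_i - \bar{y})^2, &
\text{SSModel}    &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, &
\text{SSResidual} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\[4pt]
\text{SSTotal} &= \text{SSModel} + \text{SSResidual}, &
R^2 &= \frac{\text{SSModel}}{\text{SSTotal}}
     = 1 - \frac{\text{SSResidual}}{\text{SSTotal}}
\end{align*}
```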

C. Reading the output table
• Coefficients:
  • wagePredicted = -0.90 + 0.54*educ
• Statistics (Ch. 4)
  • t and P>|t|: these columns provide the t-value and 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0.
  • [95% Conf. Interval]: this shows a 95% confidence interval for the coefficient (the coefficient is not statistically significant at the 5% level if the confidence interval includes 0)

C. Reading the output table
• After the regression, type:
  • predict wagehat, xb
    gives the predicted value of wage for each observation, given that observation’s value of education
  • predict uhat, resid
    gives the portion of wage that is not explained by the independent variable(s)
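The two “predict” commands can be tried on a small made-up dataset. The six observations below are invented so the sketch runs without Wage1.dta, and the estimates will not match the slides.

```stata
* Toy data with made-up values, only to keep the example self-contained.
clear
input wage educ
3.1 11
3.2 12
6.0 8
5.3 12
8.8 16
11.3 18
end

regress wage educ

* Fitted values: _b[_cons] + _b[educ]*educ for each observation
predict wagehat, xb

* Residuals: wage minus the fitted value
predict uhat, resid

* With a constant in the model, OLS residuals average to zero
summarize uhat

list wage educ wagehat uhat
```

In the listing, wagehat plus uhat reproduces wage for every row, which is a quick check that both variables were created from the same regression.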

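C. Useful Commands
• “replace”: replace a value with a new one
  • replace wage=4 if wage<4
• “drop”: drop an entire variable or just some observations
  • drop prevexper
  • drop if educ<=8
• “keep”: keep only the listed variables or the observations that meet a condition
  • keep wage educ
  • keep if educ>=8
• Be careful with these commands! They change the data in memory, and there is no undo.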
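One way to experiment safely with these commands is to wrap them in “preserve” and “restore”, which take a temporary snapshot of the data in memory. This is an addition to the slides, sketched here on four made-up observations:

```stata
* Toy data with made-up values.
clear
input wage educ
3 8
5 12
10 16
2.5 6
end

* Take a snapshot of the data in memory
preserve

* Destructive edits: floor wage at 4, then drop low-education observations
replace wage = 4 if wage < 4
drop if educ <= 8
count

* Bring back the snapshot: all four original observations return
restore
count
```

The first “count” reports 2 observations and the second reports 4, since the rows with educ of 8 and 6 are dropped and then restored.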

C. Operators
• <   less than
• >   greater than
• <=  less than or equal to
• >=  greater than or equal to
• ==  equal to
• != or ~=  not equal to
• &   and
• |   or

E. The “do” file
• A text file in which you can type and store all your commands.
• Helpful for keeping a record of the commands you run in case you want to re-run them later.
• How to open/save a do file?
  • Go to “Window” > “Do-File Editor”
  • Or click on “New Do-File Editor”
  • Save the do file (.do)
  • To open a saved do file, open a new do-file and search for where you saved it.

E. The “do” file:
• Comments in your do file: /* */
• STATA ignores text enclosed in /* */ and any line that starts with * (it does not execute them)
• These lines can be used to describe what the commands are doing, or to write yourself notes.
/*the following command summarizes the variable wage*/
sum wage

E. The “do” file
• From the STATA do-file editor:
  • click “do” for STATA to execute all commands
  • highlight some lines and click “do” to execute only the highlighted command lines
  • click “run” for STATA to execute all commands without showing results in the viewer/results window
• All the commands in a do-file can also be typed into the command window and run from there, but a do-file is helpful if you want to do the same thing over and over.

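E. The “do” file
• Each command must have its own line
• Stata will not run two commands written on one line:
sum wage sum educ
• But it will run them on separate lines:
sum wage
sum educ
• Or as a single command that lists both variables:
sum wage educ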

F. Save your data
• Saving in Stata format:
  • save “Type in correct path name\file name.dta”
  • Or go to “File” > “Save” or “Save As”

G. Other Commands
• Increasing memory, variables
  • “set memory 200m” (needed only in Stata 11 and earlier; later versions allocate memory automatically)
  • “set maxvar 400”
• Clear the file
  • “clear”
• For long commands
  • #delimit ;
  • tells STATA that each STATA command ends with a semicolon instead of a line break
  • Do not forget the “;”, and write it even after comment lines that start with *.
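A sketch of how “#delimit ;” looks in practice. It is meant to be saved and run as a do-file, because #delimit is not available in the command window. The data values are made up.

```stata
* Run this from a do-file; #delimit cannot be typed in the command window.
* Toy data with made-up values (entered before switching the delimiter).
clear
input wage educ
3.1 11
5.3 12
8.8 16
end

#delimit ;

* comment lines also need a closing semicolon in this mode ;
summarize wage
    educ ;

#delimit cr

* Back to the default: one command per line
summarize wage
```

“#delimit cr” switches back to line breaks, so the rest of the do-file behaves as usual.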

G. Other Commands
• sort
  • e.g. sort educ
  • e.g. sort educ female
  • by educ: summarize wage (note: you must sort by educ before you can use by educ)
• Graphs
  • twoway (scatter wage educ)
  • histogram wage
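The sort-then-by pattern can be tried on a few made-up observations. The sketch also shows “bysort”, a standard STATA prefix that combines the two steps; it is not mentioned in the slides.

```stata
* Toy data with made-up values.
clear
input wage educ female
3.1 11 1
5.3 12 0
3.2 12 1
8.8 16 0
6.0 12 0
end

* by requires the data to be sorted on the by-variable first
sort educ
by educ: summarize wage

* bysort sorts and runs the by-group command in one step
bysort educ female: summarize wage
```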

H. MLR Wage Regression
• Including other covariates doesn’t change the estimated coefficient on educ by much.
• R sq increases
• Variables have the expected signs: wages are higher for workers who have more experience, are married or have a family (probably because they are very devoted workers), and live in a metropolitan area.
• Women generally get paid less than men.