STATA TUTORIAL: LAB 1
1. STATA windows
    The Command window
    The Viewer/Results window
    The Review window (the history of commands)
    The Variables window
2. Working with STATA
    A. Opening data
    B. Using a “log” file
    C. Useful commands
    E. Using a “do” file
    F. Saving your data
    G. Other commands
    H. MLR wage regression
A. Opening Data
The data browser shows you your data. Check it frequently, especially after running commands you are unsure about.
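A. Opening your data
If your data is in STATA format (.dta), either:
    Go to “File” > “Open”, browse to the folder where the data is stored, and double-click the file, or
    In the Command window type: use "Fill In Correct Path Name\filename.dta"
    (if the data currently in memory have unsaved changes, add , clear at the end to discard them)
Practice with “Wage1.dta”.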
A. Opening your data: Data Editor/Browser
The Data Editor and the Data Browser show you your data. Open them by going to “Window” > “Data Editor”, by clicking the “Data Editor” or “Data Browser” icon, or by typing edit or browse in the Command window.
    Editor: you can modify the data by typing in a cell, as in Excel.
    Browser: locked, so you cannot make changes.
It is good practice to look at the data right after loading them and after running commands, so that you understand the structure of the data.
A. Opening your data: Variables window
Once the data are loaded, the variables they contain are listed in the Variables window:
    Name: the name of the variable
    Label: a description of what the variable is
    Type/Format: how STATA stores and displays the variable
Click on a variable and its name appears in the Command window.
A. Opening your data: What do the variables look like?
Variables in Wage1.dta: wage educ exper tenure nonwhite female married numdep smsa northcen south west construc ndurman trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq
What values do they take?
    wage through tenure, and numdep, are actual numbers.
    nonwhite through servocc (except numdep) take only the values 0 or 1: qualitative (dummy) measures of personal characteristics.
    lwage through tenursq are transformations of other variables (the natural log of wage, and the squares of exper and tenure).
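A quick way to confirm descriptions like these inside STATA, sketched on a made-up mini-dataset (not Wage1.dta): “assert” stops with an error if any observation breaks the stated rule, so it checks that a dummy really is 0/1 and that a log variable really is ln(wage).

```stata
* Made-up values for illustration only (not Wage1.dta)
clear
input wage educ female
3.50 10 1
4.20 12 1
5.00 12 0
6.10 14 0
end

* A 0/1 dummy should contain no other values
assert inlist(female, 0, 1)
tabulate female

* lwage is a transformation: build it and check that exp(lwage) gives back wage
generate lwage = ln(wage)
assert abs(exp(lwage) - wage) < 1e-5
summarize wage educ lwage
```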
A. Opening your data (advanced)
If your data is a comma-delimited file:
    insheet using "filename.txt"
    (in Stata 13 and later, import delimited using "filename.txt" does the same job)
If your data is a raw data file, it must have a dictionary file, and you must use the “infile” command:
    infile using "dictionaryname.dct"
The dictionary file refers to the data file, which has a “.dat” or “.raw” extension.
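A minimal sketch of reading a comma-delimited file, assuming Stata 13 or later. To keep it self-contained it first writes a tiny CSV with “file write”; the file name lab1_demo.csv and its contents are hypothetical.

```stata
* Write a small comma-delimited file (hypothetical name and made-up values)
clear
file open fh using "lab1_demo.csv", write text replace
file write fh "wage,educ,female" _n
file write fh "3.50,10,1" _n
file write fh "4.20,12,1" _n
file write fh "5.00,12,0" _n
file close fh

* Read it in; varnames(1) takes the variable names from the first row
import delimited using "lab1_demo.csv", varnames(1) clear
describe
list
```

On older versions, insheet using "lab1_demo.csv", comma clear plays the same role.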
B. The “log” file
The log file is an output file: while it is open, it records all the commands STATA executes and all their results.
How to open/close it?
    Go to “File” > “Log” > “Begin”
    Go to “File” > “Log” > “Close”
    (from the Command window: log using "filename.log" and log close)
How to view it later?
    Go to “File” > “Log” > “View” and search for your file name, keeping in mind it has the extension “.log” (a plain-text log; STATA's default formatted log uses “.smcl”).
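The menu steps above have command equivalents; this sketch opens a plain-text log, runs a few commands on made-up data, and closes it. The file name lab1_demo.log is hypothetical, and “capture log close” simply avoids an error if a log was already open.

```stata
capture log close                       // close any log left open (no error if none)
log using "lab1_demo.log", replace text // same as File > Log > Begin

clear
input wage educ
3.50 10
4.20 12
5.00 12
end
summarize wage educ                     // this output is also written to the log

log close                               // same as File > Log > Close
* To read it later (same as File > Log > View):
* view "lab1_demo.log"
```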
C. Useful Commands
“describe”: STATA lists all the variables with their labels and storage types, and reports the number of observations.
Two types of variables:
    1. Numeric
    2. String (usually shown in red in the Data Browser)
You can convert a string variable to numeric with the “destring” command, e.g.:
    destring var1, replace
    destring var1, replace force   (force turns entries that are not numbers into missing values)
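A small sketch of why “force” matters, using a hypothetical string variable var1 with made-up entries. Without “force”, destring reports that var1 contains nonnumeric characters and leaves it unchanged; with “force”, the entry "n/a" becomes a missing value.

```stata
* Hypothetical string variable: numbers stored as text, plus one non-number
clear
input str5 var1
"1200"
"950"
"n/a"
end
describe var1                    // storage type str5: a string variable

destring var1, replace           // not converted: "n/a" is not a number
destring var1, replace force     // converted; "n/a" becomes missing (.)
describe var1                    // now numeric
list
```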
C. Useful Commands
“summarize” (abbreviated “sum” or “summ”) tells STATA to compute summary statistics (number of observations, mean, standard deviation, minimum and maximum). It is useful for spotting outliers and getting a feel for your data. E.g.:
    summarize              (all variables)
    summarize wage educ    (just wage and educ; note there is no “,” between variables)
C. Useful Commands
How many observations are there?
What is the average value of wage?
What are the minimum and maximum of tenure?
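Besides printing a table, “summarize” leaves its results in r(), which is handy for answering questions like these inside a do-file. A sketch on made-up data; the values are not those of Wage1.dta.

```stata
* Made-up values for illustration only
clear
input wage educ tenure
3.50 10 0
4.20 12 1
5.00 12 4
6.10 14 3
end

summarize                      // all variables
summarize wage educ            // just these two; no comma between names

* r(N), r(mean), r(min) and r(max) hold the numbers from the last summarize
quietly summarize wage
display "observations = " r(N) "   mean wage = " r(mean)
quietly summarize tenure
display "min tenure = " r(min) "   max tenure = " r(max)
```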
C. Useful Commands
“tabulate” (abbreviated “tab”) shows the frequency and percent of each value of a variable in the dataset. E.g.:
    tabulate tenure
    tab wage            (a long list; when --more-- appears, press the space bar to display the rest)
    tab educ female     (a two-way table: education by gender)
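A sketch of one-way and two-way tables on made-up data. After “tabulate”, r(N) holds the number of observations and r(r) the number of rows (distinct values); a two-way table also stores r(c), the number of columns.

```stata
* Made-up values for illustration only
clear
input educ female
10 1
12 1
12 0
14 0
12 1
end

tabulate educ                    // frequency and percent of each value
display "distinct values of educ: " r(r)
tabulate educ female             // two-way table: education by gender
```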
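C. Useful Commands
“generate” (abbreviated “gen”) creates a new variable:
    gen weeklywage=wage*40        (weekly wage, assuming a 40-hour week)
    tab weeklywage
    gen prevexper=exper-tenure    (experience before the current employer)
    gen lwage=ln(wage) fails because lwage already exists in Wage1.dta and gen cannot overwrite a variable; use a new name instead: gen newlwage=ln(wage)
    gen newexpersq=exper*exper, or equivalently gen newexpersq=(exper)^2 (expersq also already exists)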
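A sketch of the “generate” examples on made-up data, including what happens when a name is reused. The variable names newlwage and newexpersq are just illustrative choices.

```stata
* Made-up values for illustration only
clear
input wage exper tenure
3.50  2 0
4.20  5 1
5.00 10 4
end

generate weeklywage = wage*40          // assumes a 40-hour week
generate prevexper  = exper - tenure   // experience before the current employer
generate newlwage   = ln(wage)
generate newexpersq = exper^2          // same result as exper*exper

* generate never overwrites; this would stop with "variable newlwage already defined":
* generate newlwage = ln(wage)
list
```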
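C. Useful Commands
The “if” qualifier restricts a command to a portion of the observations:
    tab wage if female==1
    sum exper if educ>=13
    gen expermomwkid=exper-1 if female==1
Observations that do not meet the condition get a missing value (.) in the new variable. To narrow the condition to women with at least one dependent, first drop the variable (gen cannot overwrite it) and then generate it again:
    drop expermomwkid
    gen expermomwkid=exper-1 if female==1 & numdep!=0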
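A sketch on made-up data showing that observations failing an “if” condition receive a missing value. Here only two of the five observations are women with at least one dependent, so the other three end up missing.

```stata
* Made-up values for illustration only
clear
input wage educ exper female numdep
3.50 10  2 1 0
4.20 12  5 1 2
5.00 12 10 0 1
6.10 14  8 0 0
4.80 13 15 1 1
end

tabulate wage if female == 1
summarize exper if educ >= 13

* Defined only for women with dependents; everyone else gets .
generate expermomwkid = exper - 1 if female == 1 & numdep != 0
list female numdep exper expermomwkid
count if missing(expermomwkid)
```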
C. Useful Commands
“reg” (regress): reg dependentvariable independentvariable(s)
    reg wage educ
An increase in education by 1 unit (year) is predicted to increase the hourly wage by $0.54.
R sq = 0.1648
When educ=0, wage is predicted to be -$0.90.
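C. SLR Wage regression
An increase in education by 1 unit (year) is predicted to increase the hourly wage by $0.54; an increase by 6 years, by 6*$0.54=$3.24.
R sq=0.1648: variation in education explains about 16.5% of the variation in wages.
When educ=0, wage is predicted to be -$0.90 (an extrapolation far from typical education levels; a negative wage has no real meaning).
Variance of the estimator: square the standard error (Std. Err.) reported for educ.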
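The interpretation above can be computed from what “regress” stores: _b[educ] is the slope, _b[_cons] the intercept, _se[educ] the standard error, and e(r2) the R-squared. A sketch on made-up data, so its numbers will not match the Wage1.dta output.

```stata
* Made-up values for illustration only (not Wage1.dta)
clear
input wage educ
3.50 10
4.20 12
5.00 12
6.10 14
7.40 16
4.80 12
9.00 18
3.90 11
end

regress wage educ

display "effect of 1 more year:   " _b[educ]
display "effect of 6 more years:  " 6*_b[educ]
display "predicted wage, educ=0:  " _b[_cons]
display "R-squared:               " e(r2)
display "variance of slope est.:  " _se[educ]^2
```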
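C. Reading the output table
SSTotal: the total variability of wage around its mean, $SST=\sum_{i=1}^{n}(y_i-\bar{y})^2$.
SSResidual: the sum of squared residuals, $SSR=\sum_{i=1}^{n}\hat{u}_i^2$.
SSModel (aka SSE, the explained sum of squares): $SSE=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2$.
Observe that SSModel = SSTotal - SSResidual. Note that SSModel/SSTotal is equal to $R^2$, the value of R-squared.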
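These identities can be checked numerically. After “regress”, e(mss) is SSModel and e(rss) is SSResidual; SSTotal is rebuilt here from the sample variance reported by “summarize”. A sketch on made-up data.

```stata
* Made-up values for illustration only
clear
input wage educ
3.50 10
4.20 12
5.00 12
6.10 14
7.40 16
4.80 12
9.00 18
3.90 11
end

quietly regress wage educ
quietly summarize wage                 // r-class: leaves the regression's e() intact
scalar sst = r(Var)*(r(N) - 1)         // SSTotal = sum of (wage - mean wage)^2

display "SSTotal         = " sst
display "SSModel         = " e(mss)
display "SSResidual      = " e(rss)
display "SSModel/SSTotal = " e(mss)/sst "   R-squared = " e(r2)
```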
C. Reading the output table
Coefficients: predicted wage = -0.90 + 0.54*educ
Statistics (Ch. 4):
    t and P>|t|: these columns give the t-value and the two-tailed p-value used to test the null hypothesis that the coefficient (parameter) is 0.
    [95% Conf. Interval]: a 95% confidence interval for the coefficient. If the interval includes 0, the coefficient is not statistically significant at the 5% level (two-sided test).
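A sketch of where the t, P>|t| and confidence-interval columns come from: t = coefficient / standard error, the p-value uses the t distribution with e(df_r) residual degrees of freedom, and the 95% interval is the coefficient plus or minus the critical value times the standard error. Made-up data; compare the printed values with the table from the final “regress”.

```stata
* Made-up values for illustration only
clear
input wage educ
3.50 10
4.20 12
5.00 12
6.10 14
7.40 16
4.80 12
9.00 18
3.90 11
end

quietly regress wage educ
scalar t_educ = _b[educ]/_se[educ]                  // t-value
scalar p_educ = 2*ttail(e(df_r), abs(t_educ))       // two-tailed p-value
scalar tcrit  = invttail(e(df_r), 0.025)            // critical value for 95%
scalar ci_lo  = _b[educ] - tcrit*_se[educ]
scalar ci_hi  = _b[educ] + tcrit*_se[educ]
display "t = " t_educ "   p = " p_educ "   95% CI: [" ci_lo ", " ci_hi "]"

regress wage educ        // same numbers appear in the t, P>|t| and CI columns
```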
C. Reading the output table
After the regression, type:
    predict wagehat, xb
This gives the predicted value of wage for each observation, given that observation's value of education.
    predict uhat, resid
This gives the residual: the portion of wage that is not explained by the independent variable(s), uhat = wage - wagehat.
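A sketch on made-up data confirming the two pieces add back up to wage, and that OLS residuals average to (essentially) zero when the regression includes a constant.

```stata
* Made-up values for illustration only
clear
input wage educ
3.50 10
4.20 12
5.00 12
6.10 14
7.40 16
4.80 12
9.00 18
3.90 11
end

quietly regress wage educ
predict wagehat, xb        // fitted value for each observation's educ
predict uhat, resid        // wage - wagehat: the part not explained by educ
list wage educ wagehat uhat
summarize uhat             // mean is zero up to rounding
```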
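C. Useful Commands
“replace”: replaces values with new ones
    replace wage=4 if wage<4
“drop”: drops an entire variable, or just some observations
    drop prevexper
    drop if educ<=8
“keep”: keeps only the listed variables, or only the observations that meet a condition
    keep wage educ
    keep if educ>=8
Be careful with these commands!! There is no undo, so keep an unmodified copy of your original data file.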
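A sketch on made-up data showing which commands act on values, variables (columns), and observations (rows). Note that STATA is case-sensitive, so “keep” must be lowercase.

```stata
* Made-up values for illustration only
clear
input wage educ exper tenure
3.00  8  2 0
3.80 12  5 1
5.00 12 10 4
6.10 14  8 3
3.50  6 20 9
end
generate prevexper = exper - tenure

replace wage = 4 if wage < 4    // values: 3 wages raised to 4
drop prevexper                  // a whole variable (column)
drop if educ <= 8               // observations (rows): 2 dropped
keep wage educ                  // only these variables remain
keep if educ >= 8               // only these rows remain (none dropped here)
list
```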
C. Operators
    <          less than
    >          greater than
    <=         less than or equal to
    >=         greater than or equal to
    ==         equal to (a single = assigns a value, as in gen x=...)
    != or ~=   not equal to
    &          and
    |          or
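A sketch of the operators inside “if”, using “count” on made-up data so each condition's result is easy to see.

```stata
* Made-up values for illustration only
clear
input educ female married
10 1 0
12 1 1
16 0 1
12 0 0
end

count if educ == 12                    // equality test uses ==
count if female != 1                   // not equal (~= works too)
count if female == 1 & married == 1    // and
count if educ >= 16 | married == 0     // or
```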
E. The “do” file
A do-file is a text file in which you can type and store all your commands. It is helpful to keep a file of the commands you ran in case you want to re-run them later.
How to open/save a do-file?
    Go to “Window” > “Do-file Editor”, or click the “New Do-file Editor” icon (or type doedit in the Command window)
    Save the do-file with the extension .do
    To open a saved do-file, open the Do-file Editor and search for where you saved it.
E. The “do” file: Comments in your do-file
STATA ignores (does not execute) text placed between /* and */, text on a line that starts with *, and text after //. Use comments to describe what the commands are doing or to write notes to yourself. E.g.:
    /*the following command summarizes the variable wage*/
    sum wage
E. The “do” file
From the STATA Do-file Editor:
    click “Do” for STATA to execute all commands
    highlight some lines and click “Do” to execute only the highlighted command lines
    click “Run” for STATA to execute all commands without showing the results in the Results window
All the commands in a do-file can also be typed into the Command window and run from there, but a do-file is helpful when you want to do the same thing over and over.
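A sketch of what a small lab do-file might look like, combining a log, comments in all three styles, and a few commands. The file names lab1_demo.do and lab1_demo_do.log are hypothetical, and the data are made up rather than loaded from Wage1.dta.

```stata
/* lab1_demo.do: a hypothetical example of a small do-file.
   A block comment like this one can span several lines. */
capture log close                  // close any log left open
log using "lab1_demo_do.log", replace text

* A line that starts with a star is a comment
clear
input wage educ
3.50 10
4.20 12
5.00 12
end

/*the following command summarizes the variables wage and educ*/
summarize wage educ
regress wage educ                  // text after two slashes is ignored too

log close
```

Saved as lab1_demo.do, it can be executed with the “Do” button or by typing do lab1_demo.do in the Command window.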
E. The “do” file
Each command must have its own line. STATA will not run:
    sum wage sum educ
But it will run:
    sum wage
    sum educ
or
    sum wage educ
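A sketch of why the one-line version fails: STATA reads it as a single summarize command over the variables wage, sum and educ, and there is no variable called sum. “capture noisily” displays the error without halting the do-file. The last lines show /// for continuing one long command onto the next line of a do-file.

```stata
* Made-up values for illustration only
clear
input wage educ
3.50 10
4.20 12
end

* Wrong: two commands on one line (error: variable sum not found)
capture noisily sum wage sum educ

* Right: one command per line...
sum wage
sum educ
* ...or one command listing both variables
sum wage educ

* A long command can continue on the next line with ///
sum wage ///
    educ
```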
F. Save your data
Saving in STATA format:
    save "Type in correct path name\file name.dta"
or go to “File” > “Save” or “Save As”.
If the file already exists, add , replace at the end to overwrite it.
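A round-trip sketch: save made-up data in STATA format, clear memory, and open the file again with “use”. The file name lab1_demo.dta is hypothetical; without a path, the file goes to the current working directory (type pwd to see which one).

```stata
* Made-up values for illustration only
clear
input wage educ
3.50 10
4.20 12
end

save "lab1_demo.dta", replace      // replace overwrites an existing file
clear                              // memory is now empty
use "lab1_demo.dta", clear         // reopen the saved data
describe
```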
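G. Other Commands
Increasing memory and the maximum number of variables:
    set memory 200m    (needed only in Stata 11 and earlier; from Stata 12 on, memory is managed automatically)
    set maxvar 400
Clear the data in memory:
    clear
For long commands:
    #delimit ;
This tells STATA that each command ends with a semicolon instead of a line break. Do not forget the “;”, and write it even at the end of comment lines that start with *. Type #delimit cr to return to line breaks. (#delimit works only inside do-files.)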
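A sketch of #delimit inside a do-file on made-up data: in semicolon mode one command can spread over several lines, and a star comment also runs until its semicolon.

```stata
* Made-up values for illustration only
clear
input wage educ
3.50 10
4.20 12
5.00 12
end

#delimit ;
* in this mode a star comment also ends at the semicolon;
summarize wage
          educ ;
#delimit cr
* back to the default: each line break ends a command
summarize wage
```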
G. Other Commands
sort, e.g.:
    sort educ
    sort educ female
    by educ: summarize wage    (the data must be sorted by educ before you can use by educ; bysort educ: summarize wage sorts and runs in one step)
Graphs:
    twoway (scatter wage educ)
    histogram wage
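A sketch of “sort”, “by” and “bysort” on made-up data. The last line shows that “by” also works with “generate”: inside each educ group, _N is the number of observations in that group (n_educ is an illustrative name).

```stata
* Made-up values for illustration only
clear
input wage educ female
3.50 12 1
6.10 14 0
4.20 12 0
7.40 14 1
5.00 10 0
end

sort educ
by educ: summarize wage               // requires data sorted by educ
bysort educ female: summarize wage    // sorts and runs in one step

* Data are still sorted by educ, so by educ: works here
by educ: generate n_educ = _N         // group size for each educ value
list educ female wage n_educ
```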
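H. MLR Wage Regression
Including other covariates does not change the estimated coefficient on educ by much, and R sq increases (adding regressors never lowers R-squared on the same sample).
The variables have the expected signs: a higher wage is predicted for workers who have more experience, are married or have a family (perhaps because they are very devoted workers), and live in a metropolitan area. Holding the other variables fixed, women are predicted to earn less than men.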
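A sketch comparing a simple and a multiple regression on made-up data. The covariates used here (exper, female) are an illustrative choice, not the exact specification from the slide, and the numbers will not match Wage1.dta. Because both regressions use the same observations, the MLR R-squared cannot be smaller than the SLR one.

```stata
* Made-up values for illustration only
clear
input wage educ exper female
3.50 10  2 1
4.20 12  5 1
5.00 12 10 0
6.10 14  8 0
7.40 16 12 0
4.80 12 15 1
9.00 18 20 0
3.90 11  1 1
end

quietly regress wage educ                 // SLR
scalar r2_slr = e(r2)
scalar b_slr  = _b[educ]

quietly regress wage educ exper female    // MLR
display "coefficient on educ: SLR " b_slr "   MLR " _b[educ]
display "R-squared:           SLR " r2_slr "   MLR " e(r2)
```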