Published by Kevin Walker. Modified over 9 years ago.
1
STATA: An Introduction Into the Basics
Prof. Dr. Herbert Brücker, University of Bamberg
Seminar “Migration and the Labour Market”
May 31, 2012
2
Repetition of last meeting (I/III)
1. The Borjas (QJE 2003) model
The regression equation uses the following terms:
- $y_{ijt}$ = wage or unemployment rate
- $p_{ijt}$ = migration share
- $s_i$ = vector of education dummies (i = 1, 2, 3, 4)
- $x_j$ = vector of work experience dummies (j = 1, 2, …, 8)
- $\pi_t$ = vector of time dummies (t = 1, 2, …, 25)
- $s_i \times x_j$ = vector of education-experience dummies
- $s_i \times \pi_t$ = vector of education-time dummies
- $x_j \times \pi_t$ = vector of experience-time dummies
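The equation itself did not survive in this transcript. A sketch of how the listed terms fit together, assuming the notation of Borjas (2003); the coefficient symbol $\theta$ and the error term $\varphi_{ijt}$ are assumptions and do not appear on the slide:

```latex
y_{ijt} = \theta \, p_{ijt} + s_i + x_j + \pi_t
          + (s_i \times x_j) + (s_i \times \pi_t) + (x_j \times \pi_t)
          + \varphi_{ijt}
```

Here $\theta$ would measure the effect of the migration share on the outcome within an education-experience cell.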
3
Repetition of last meeting (II/III)
The linear regression model
- The univariate model
- The multivariate model
Econometric basics
- endogenous (LHS) and exogenous (RHS) variables
- regression coefficients/parameters
- error (disturbance) term
- assumptions of the linear regression model
- ordinary least squares estimator
Interpretation of regression output
- coefficient
- standard error
- t-statistic
- p-value
4
Repetition of last meeting (III/III)
Regression diagnostics
- R-squared and adjusted R-squared
- F-statistic
- Degrees of freedom
5
Contents of Today’s Meeting
1. The STATA Software Package
2. The Structure of STATA: Three Files
3. Getting Started
4. The STATA Menus
5. The General Structure of STATA
6. Working with DO FILES
7. Describe Your Data
8. Running Regressions
6
1. The STATA software package
An image of the STATA DVD is available in the campus network under “\\software\campliz” or “\\software.uni-bamberg.de\campliz”.
Then: Start -> Run (“Ausführen”)
Then: insert your licence details:
- Serial number: ….
- Code: ….
- Authorisation key: ….
7
2. Structure of STATA: Three files
1. The DATA file (.dta), where you have your data. You can view your data with the DATA BROWSER and edit your data with the DATA EDITOR.
2. The DO file (.do), where you write, run and save the commands of any session. It is very useful (i) to organise your data set, (ii) to see what you have done in the last session, (iii) to replicate what you have done in the last session, and (iv) to exchange work with your collaborators. You write and run your commands with the DO FILE EDITOR.
8
2. Structure of STATA: Three files (continued)
3. The LOG file (.log), which records the commands and output of your session. Once logging is switched on (with the log using command), everything is written to the log file as you work. It is not often used, but useful if something goes wrong.
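A minimal sketch of how a log file can be opened and closed from a DO file. The file name session.log is hypothetical, and the example uses the auto data set that ships with STATA instead of the seminar data:

```stata
* Record a session in a plain-text log file (session.log is a made-up name)
capture log close                       // close a log that may still be open
log using "session.log", text replace   // start logging, overwrite old file
sysuse auto, clear                      // example data shipped with STATA
summarize price                         // this output ends up in the log
log close                               // stop logging
```

The capture prefix only suppresses the error that log close would raise when no log is open.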
9
3. Getting started: the empty STATA window
10
3. Getting started: the empty STATA window
- The main window: shows commands, output and messages which arrive during your session
- The command window: here you can type your commands
- The variables window: shows the variables of your dataset
- The review window: reports your previous commands
11
3. Getting started: the windows after data loading
- Variables window: list of variables
- Review window: reports commands (one in this case)
- Main window: reports the results of commands
12
3. Getting started
In principle, you can start your STATA session by (i) loading your data set and (ii) typing your commands in the command window. It is, however, recommended to use the DO FILE EDITOR right from the beginning. But let’s look at the STATA menus first.
13
4. The STATA Menus
For viewing your data and changing your data by hand you need the DATA BROWSER and the DATA EDITOR. For starting and running your DO files you need the DO FILE EDITOR. The other menus are not relevant at the beginning.
Screenshot labels: the data path, the data editor, the data browser, the do file editor, the variables manager, the help menu
14
4. The STATA Menus: The DATA EDITOR/BROWSER
The difference between the data browser and the data editor is that you can change data in the editor, whereas you can only view them in the browser.
15
4. The STATA Menus: The DATA EDITOR/BROWSER
You have two types of variables: NUMERICAL variables (black) and so-called STRING variables (blue), e.g. text. STATA can identify STRING variables, but you cannot do numerical operations with them.
Screenshot labels: STRING variable, NUMERICAL variable
16
4. The STATA Menus: The DATA EDITOR/BROWSER
HINT: You can transfer data, e.g. from an EXCEL file into a STATA file, by copy and paste (Ctrl+C, Ctrl+V) in the data editor, and vice versa. But you have to make sure that your EXCEL is run in English, otherwise your data might be read as STRING variables by STATA. Of course there are many other ways to transfer data from EXCEL to STATA.
17
5. The Grammar of STATA
General structure of STATA commands:
    [prefix :] command [varlist] [if] [in] [weight] [, options]
18
5. General structure of STATA
We will concentrate on:
    [prefix :] command [varlist] [if] [in] [weight] [, options]
19
5. General structure of STATA
We will concentrate on:
    [prefix :] command [varlist] [if] [in] [weight] [, options]
What do you want to do?
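To make the grammar concrete, here is a small sketch that fills in most of the slots. It uses the auto data set shipped with STATA, so the variable names (foreign, price, mpg, make) are not from the seminar data:

```stata
sysuse auto, clear

* prefix       : command   varlist  if          , options
bysort foreign : summarize price    if mpg > 20 , detail

* command varlist     in
list      make price  in 1/5
```

The first command summarizes price separately for each value of foreign, restricted to cars with mpg above 20. The second lists two variables for the first five observations only.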
20
5. General structure of STATA
There are two types of variables (data):
- numerical variables, e.g.: 0, 1, 501, 0.5, -12 etc.
- string variables, e.g.: no voc train, male, female etc.
How to deal with the data types:
- Numerical variables: you can do all mathematical operations, e.g. var1 + var2, var1/var2, var1*var2 etc.
- String variables: you have to use quotation marks for identification, e.g.
    replace var1 = 1 if sex == "female"
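A short sketch of the difference between the two types, using a tiny made-up data set (the variables sex and wage and their values are invented for illustration):

```stata
clear
input str6 sex wage
"female" 12.5
"male"   11.0
"female" 14.0
end

* String variable: compare against a value in quotation marks
gen female = 0
replace female = 1 if sex == "female"

* Numerical variable: arithmetic works directly
gen wage_double = wage * 2

list
```

Trying something like gen x = sex * 2 would stop with a type mismatch error, because sex is a string.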
21
6. Working with DO FILES
- The standard approach is to start your work with a DO FILE.
- Click on the DO FILE editor button after starting STATA.
- Load an existing DO FILE or start a new one.
- Start the DO FILE with a command to load your data, e.g.
    use "path\data.dta", clear
  or, more specifically, with
    use "C:\Users\Herbert\Documents\STATA\Wagecurve\DE.dta", clear
22
Open your DO FILE editor
After starting STATA, click on the DO FILE editor button.
Screenshot label: the do file editor
23
What does a DO FILE look like?
- Commands
- Descriptions of what you have done, marked with stars (*)
24
The DO FILE menu
- Clicking this button runs the entire DO FILE (not recommended).
- Clicking this button runs a selection of marked commands (recommended).
Note: STATA stops the DO FILE execution after the first mistake in your commands. That makes it advisable to proceed step by step.
25
6. Step 1: Loading your data
    use "C:\Users\Herbert\Documents\STATA\Wagecurve\DE.dta", clear
- The use command loads the data.
- The "path\DE.dta" part tells STATA where to find the data and the name of the data file (e.g. DE.dta).
- The clear option after the comma clears the memory, which is needed if you have used other data sets before.
- Push the “Execute Selection (DO)” button to run the selected command(s).
- You can also run the DO FILE with the “Execute Selection Quietly (RUN)” button, which executes the commands without displaying their output.
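The load step can be tried without the seminar data. The sketch below saves a copy of the auto data set shipped with STATA to the current working directory and loads it again; the file name auto_copy.dta is hypothetical:

```stata
sysuse auto, clear                 // load an example data set shipped with STATA
save "auto_copy.dta", replace      // write it to the current working directory

use "auto_copy.dta", clear         // load it back; clear drops data in memory
describe, short                    // quick check: number of observations and variables
```

Without the clear option, use refuses to load a new file when the data in memory have unsaved changes.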
26
Loading your data (I/II)
1. Write the command
    use "path\XXX.dta", clear
2. Mark the line and run the command by clicking the execution button.
27
Loading your data (II/II)
28
6. Step 2: Manipulating your data (I/VI)
It is useful to save only a basic data set and generate the variables you need at the beginning of each session. That saves storage space (recommended in the case of large data sets).
Generating DUMMY variables
- Use the gen command, e.g.
    gen D_ed1 = 0
  This creates a variable consisting only of zeros.
- Then use the replace command, e.g.
    replace D_ed1 = 1 if ed1 == 1
  This replaces the zeros with 1 if the variable ed1 has a value of 1.
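The two-step pattern can be checked on a toy data set. In this sketch the categorical variable ed and its values are invented; the one-line alternative at the end is a common shortcut and not taken from the slides:

```stata
clear
input ed
1
2
1
3
end

* Two steps, as described above
gen D_ed1 = 0
replace D_ed1 = 1 if ed == 1

* One-line alternative: a logical expression evaluates to 1 (true) or 0 (false)
gen D_ed1_alt = (ed == 1)

list
```

Both variables contain the same values here. Note that both approaches code missing values of ed as 0, which may or may not be what you want.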
29
Generating Dummy Variables: DO FILE commands
30
Generating Dummy Variables: STATA main window
31
6. Step 2: Manipulating your data (II/VI)
Another example for generating dummy variables:
- Use the gen command, e.g.
    gen year_1 = 0
  This creates a variable consisting only of zeros.
- Then use the replace command, e.g.
    replace year_1 = 1 if year == 1991
  This replaces the zeros with 1 if the year variable has a value of 1991.
Note: The STATA syntax requires a double equals sign (==) after an if qualifier to test for a value.
32
6. Step 2: Manipulating your data (III/VI)
Creating series of dummy variables if it is too cumbersome to create them individually, e.g. in the case of interaction dummies.
Syntax:
    forvalues i = 1/3 {
        forvalues j = 1/4 {
            gen D_ed`i'_ex`j' = D_ed`i' * D_ex`j'
        }
    }
i.e. for each value i = 1, 2, 3 and each value j = 1, 2, 3, 4 you generate an interaction dummy by multiplying the dummy variables for education and experience. The name D_ed`i'_ex`j' of the new variable is illustrative. Take care of the {} and of the quotes around i and j: a backtick (`) opens and a straight apostrophe (') closes.
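A self-contained sketch of the nested loop on invented data. The 12 observations are constructed so that every education-experience combination occurs exactly once; the variable names ed, ex and D_ed1_ex1 etc. are assumptions for illustration:

```stata
clear
set obs 12
gen ed = mod(_n - 1, 3) + 1      // education group 1, 2, 3
gen ex = mod(_n - 1, 4) + 1      // experience group 1, 2, 3, 4

* Base dummies for education and experience
forvalues i = 1/3 {
    gen D_ed`i' = (ed == `i')
}
forvalues j = 1/4 {
    gen D_ex`j' = (ex == `j')
}

* Interaction dummies: 3 x 4 = 12 new variables
forvalues i = 1/3 {
    forvalues j = 1/4 {
        gen D_ed`i'_ex`j' = D_ed`i' * D_ex`j'
    }
}

describe D_ed*_ex*
```

The opening brace has to be on the same line as forvalues, and the closing brace on a line of its own.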
33
Generating Dummy Variables: Advanced techniques
36
6. Step 2: Manipulating your data (IV/VI)
Transforming variables into log variables
Syntax:
    gen ln_wijt = ln(wijt)
Using the gen command again, you can transform the wage variable wijt into the natural logarithm of the wage by applying the ln() function.
37
Transforming data
38
6. Step 2: Manipulating your data (V/VI)
Useful operators and functions in STATA:
    +        add
    -        subtract
    *        multiply
    /        divide
    ln()     transform into natural log
    exp()    transform into exponential value
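The operators and functions can be tried on a few invented wage values. This is only a sketch; wijt stands in for the wage variable of the seminar data:

```stata
clear
set obs 3
gen wijt = _n * 10               // made-up wages: 10, 20, 30

gen w_plus  = wijt + 5           // add
gen w_minus = wijt - 5           // subtract
gen w_times = wijt * 2           // multiply
gen w_ratio = wijt / 2           // divide
gen ln_wijt = ln(wijt)           // natural log
gen w_back  = exp(ln_wijt)       // exponential, undoes the log

list
```

Up to rounding in the stored values, w_back reproduces wijt, since exp() is the inverse of ln().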
39
6. Step 2: Manipulating your data (VI/VI)
Control what you have done
- Check your variables for mistakes in the browse mode of the data set.
- You can delete a wrong variable by using the drop command, e.g.
    drop ln_wijt
  which simply drops your variable from the data set. Then you can create the correct one.
40
6. Step 3: Organize your data with globals
- It is inconvenient to work with many variables, e.g. 200 dummy variables, because typing them by hand is cumbersome.
- You can define globals, which comprise many variables.
- Syntax:
    glo [name of global] [list of variables]
    glo D_i D_ed1 D_ed2 D_ed3
  i.e. the global D_i consists of the variables D_ed1, D_ed2 and D_ed3.
- If you want to use the global later you have to type $[globalname], i.e. $D_i
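A brief sketch of defining and using a global, based on the auto data set shipped with STATA. The global name controls and the chosen variables are illustrative and not part of the seminar data:

```stata
sysuse auto, clear

* Define a global that holds a list of variable names
global controls mpg weight length

* Use the global by prefixing its name with $
summarize $controls
regress price $controls
```

STATA simply substitutes the stored text for $controls before running the command, so the last line is identical to typing regress price mpg weight length.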
41
Creating globals
42
7. Describe your data (I/II)
- Any econometric analysis requires as a first step that you provide descriptive statistics to the reader. This helps to understand what is going on.
- This can easily be done with the sum command (short for summarize):
    sum [variable name(s)]
    sum LHijt LFijt wijt ln_wijt
- The sum command creates a table with the basic descriptive statistics, i.e. observations, mean, standard deviation, minimum, maximum.
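As a sketch, the same command on the auto data set shipped with STATA (the variables price, mpg and weight belong to that example data set, not to the seminar data):

```stata
sysuse auto, clear

* Basic table: observations, mean, standard deviation, minimum, maximum
summarize price mpg weight

* The detail option adds percentiles, skewness and kurtosis
summarize price, detail

* Results of the last summarize are stored and can be reused
display "Mean price: " r(mean)
```

The stored results r(mean), r(sd), r(min), r(max) and r(N) always refer to the last variable summarized.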
43
Summary statistics
45
7. Describe your data (II/II)
Present your data graphically
- It is usually helpful to present the main information/variables in your data set graphically.
- There are many graphical commands; use the Graphics menu.
- The simplest way is to show the development of your variable(s) over time.
- Syntax:
    graph twoway line [variable1] [variable2] if …
    graph twoway line wijt year if ed == 1 & ex == 1
- This produces a two-dimensional graph with the wage on the vertical axis and the year on the horizontal axis for education group 1 and experience group 1.
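A minimal sketch of such a line graph on invented data. The wage series is made up, and the axis titles are optional additions that are not on the slide:

```stata
clear
set obs 10
gen year = 1990 + _n              // 1991 to 2000
gen wijt = 10 + 0.3 * _n          // made-up wage series
gen ed = 1                        // all observations in education group 1
gen ex = 1                        // all observations in experience group 1

* First variable goes on the vertical axis, second on the horizontal axis
graph twoway line wijt year if ed == 1 & ex == 1, ///
    ytitle("Wage") xtitle("Year")
```

The three slashes continue a command on the next line inside a DO file.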
46
Making a graph
47
Graph of mean wage in education 1 and experience 1 group
48
Graph of migration rate in edu 1 and exp 1 group
49
8. Running regressions
The standard OLS regression command in STATA is regress.
Syntax:
    regress depvar [list of indepvars] [if] [, options]
    regress ln_wijt m_ijt $D_i $D_j $D_t
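A runnable sketch of the regress syntax, using the auto data set shipped with STATA in place of the seminar data. The variable ln_price is created here for illustration:

```stata
sysuse auto, clear
gen ln_price = ln(price)

* depvar first, then the independent variables
regress ln_price mpg weight

* With an if qualifier and an option (robust standard errors)
regress ln_price mpg weight if foreign == 0, robust
```

The second regression uses only the domestic cars (foreign == 0).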
50
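8. Running Regressions
Recall: What is a linear regression model?
The general econometric model:
    $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$
- $y_i$: the dependent (or endogenous) variable
- $x_{1i}, \dots, x_{ki}$: exogenous variables, explaining the dependent variable
- $\beta_0$: constant, or the y-axis intercept (the value of y if all x = 0)
- $\beta_1, \beta_2, \dots, \beta_k$: regression coefficients or parameters of the regression
- $\varepsilon_i$: error (disturbance) term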
51
Running a regression model
Screenshot labels: regression command, dependent variable, independent variables, globals (!)
52
Running a Regression: Output
53
How to interpret the output of a regression
1. Observations
2. Fit of the model
3. F-test
4. R-squared
5. Adjusted R-squared
6. Root mean squared error (Root MSE)
Further labels on the output:
- analysis of significance levels
- variance of the model
- β0 and β1 (the estimated coefficients)
- degrees of freedom
- 95% confidence interval
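The numbers in the output table can also be retrieved after estimation, which helps to see what each entry refers to. A sketch using the auto data set shipped with STATA (not the seminar data):

```stata
sysuse auto, clear
regress price mpg weight

display "Observations:          " e(N)
display "F statistic:           " e(F)
display "R-squared:             " e(r2)
display "Adjusted R-squared:    " e(r2_a)
display "Root MSE:              " e(rmse)
display "Residual deg. freedom: " e(df_r)
display "Coefficient on mpg:    " _b[mpg]
display "Std. error on mpg:     " _se[mpg]
display "t-statistic on mpg:    " _b[mpg] / _se[mpg]
```

The t-statistic is simply the coefficient divided by its standard error, and the residual degrees of freedom are the number of observations minus the number of estimated parameters (here 74 - 3).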
54
8. Running Regressions: Panel Models
- Very often you use panel models, i.e. models which have a group and a time series dimension.
- There exist special estimators for this, e.g. fixed or random effects models.
- A fixed effects model is a model where you have a fixed (constant) effect for each individual/group. This is equivalent to a dummy variable for each group.
- A random effects model is a model where you have a random effect for each individual/group, which is based on assumptions on the distribution of individual effects.
55
8. Running Regressions: Panel Models
Preparation for panel models:
- For running panel models STATA needs to identify the group (individual) and the time series dimension.
- Therefore you need an index for each group and an index for each time period.
- Then use the tsset command to organize your dataset as a panel data set.
- Syntax:
    tsset index year
  where index is the group/individual index and year the time index.
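A sketch of the preparation step on an invented panel with 3 groups and 4 years each. The construction of index and year is made up for illustration:

```stata
clear
set obs 12
gen index = ceil(_n / 4)                 // group index: 1, 2, 3
gen year  = 1990 + mod(_n - 1, 4) + 1    // 1991 to 1994 within each group

* Each group-year combination must occur only once
isid index year

* Declare the panel structure: group index first, time index second
tsset index year
```

STATA also offers the xtset command for declaring panel data, which takes the same two arguments.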
56
Preparation: Running the tsset command
57
8. Running Regressions: Panel Models
- Then you can use panel estimators, e.g. the xtreg estimator.
- Syntax:
    xtreg depvar [list of indepvars] [if] [, options]
    xtreg ln_wijt m_ijt, fe
- In the example we run a simple fixed effects panel regression, which is equivalent to including a dummy variable for each group (in this case each education-experience group).
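The equivalence between the fixed effects estimator and group dummies can be checked on simulated data. Everything in this sketch is invented: the data-generating process, the seed and the number of groups and years are assumptions for illustration only:

```stata
clear
set seed 12345
set obs 60
gen index = ceil(_n / 10)                 // 6 groups
gen year  = 1990 + mod(_n - 1, 10) + 1    // 10 years per group
gen m_ijt = runiform()                    // made-up migration share
gen ln_wijt = 0.5 * index - 0.2 * m_ijt + rnormal(0, 0.1)

tsset index year

* Fixed effects (within) estimator
xtreg ln_wijt m_ijt, fe
scalar b_fe = _b[m_ijt]

* OLS with one dummy per group (factor-variable notation i.index)
regress ln_wijt m_ijt i.index

* Both approaches give the same slope coefficient
assert abs(_b[m_ijt] - b_fe) < 1e-5
```

The slope on m_ijt is identical in both regressions; xtreg just avoids estimating and displaying the group dummies explicitly.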
58
Running a Panel Regression: command
59
Running a Panel Regression: Output
60
Next Meeting: June 14, 2012 Room RZ 00/006