1 Topics
Introduction to Stata
– Files / directories
– Stata syntax
– Useful commands / functions
Logistic regression analysis with Stata
– Estimation
– Goodness of fit
– Coefficients
– Checking assumptions

2 Introduction to Stata Note: we did this interactively for the larger part …

3 Stata file types
.ado – programs that add commands to Stata
.do – batch files that execute a set of Stata commands
.dta – data file in Stata's format
.log – output saved as plain text by the log using command

4 The working directory
The working directory is the default directory for any file operations, such as using and saving data or logging output.
cd "d:\my work\"

5 Saving output to log files
Syntax for the log command:
log using [filename], replace text
To close a log file:
log close

6 Using and saving datasets
Load a Stata dataset:
use d:\myproject\data.dta, clear
Save:
save d:\myproject\data, replace
Using change directory:
cd d:\myproject
use data, clear
save data, replace
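The use / save round trip can be tried without touching your own files. This is a minimal sketch, assuming the auto dataset that ships with Stata; the macro name mycopy is hypothetical.

```stata
* Sketch only: auto is Stata's bundled example dataset, mycopy is a made-up name
sysuse auto, clear          // load an example dataset that ships with Stata
tempfile mycopy             // reserve a temporary file name (removed when the do-file ends)
save "`mycopy'"             // save the data in memory to that file
clear                       // empty the memory
use "`mycopy'", clear       // load the saved copy again
describe, short             // check that observations and variables are back
```

Run it from a do-file, because local macros such as `mycopy' only live for the duration of one run.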

7 Entering data
Data in other formats
– You can use SPSS to convert data (read in the data, then save it as a file in another format, for instance Stata's .dta format)
– You can use the infile and insheet commands to import data in ASCII format
Entering data by hand
– Type edit or just click on the data-editor button
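Importing a text file can be sketched as a round trip: write a comma-separated file, then read it back. This assumes Stata 13 or later, where import delimited takes over the role of insheet; the file name auto_example.csv is hypothetical.

```stata
* Sketch only; export/import delimited need Stata 13 or later.
* The file name auto_example.csv is hypothetical.
sysuse auto, clear
local csvfile "`c(tmpdir)'/auto_example.csv"
export delimited using "`csvfile'", replace    // write a comma-separated text file
import delimited using "`csvfile'", clear      // read it back (older Stata: insheet using ...)
describe, short                                // check what was imported
erase "`csvfile'"                              // tidy up the temporary file
```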

8 Do-files
You can create a text file that contains a series of commands. It is the equivalent of SPSS syntax (but way easier to memorize).
Use the do-file editor to work with do-files.

9 Adding comments in do-files
– // or * denote comments that Stata should ignore
– Stata ignores whatever follows /// and treats the next line as a continuation
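A small illustration of the three comment markers, assuming the bundled auto dataset. Note that // and /// are meant for do-files and need a space in front of them; typed interactively they do not work.

```stata
* A whole-line comment starts with an asterisk
sysuse auto, clear          // a comment at the end of a line
summarize price mpg ///
    weight length           // the /// above joins these two lines into one command
```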

10 A recommended template for do-files
capture log close              // if a log file is open, close it; otherwise disregard
set more off                   // don't pause when output scrolls off the page
cd d:\myproject                // change directory to your working directory
log using myfile, replace text // log results to file myfile.log
… here you put the rest of your Stata commands …
log close                      // close the log file

11 Serious data analysis
Ensure replicability: use do + log files
Document your do-files
– What is obvious today is baffling in six months
Keep a research log
– A diary that includes a description of every program you run
Develop a system for naming files

12 Serious data analysis
New variables should be given new names
Use variable labels and notes
Double-check every new variable
ARCHIVE

13 Stata syntax examples

14 Stata syntax example
regress y x1 x2 if x3<20, cluster(x4)
1. regress = command
   – What action do you want to perform?
2. y x1 x2 = names of variables, files or other objects
   – On what things is the command performed?
3. if x3<20 = qualifier on observations
   – On which observations should the command be performed?
4. , cluster(x4) = options
   – What special things should be done in executing the command?

15 More examples
tabulate smoking race if agemother>30, row
More elaborate if-statements:
sum agemother if smoking==1 & weightmother<100

16 Elements used for logical statements
Operator  Definition                Example
==        is equal in value to      if male == 1
!=        not equal in value to     if male != 1
>         greater than              if age > 20
>=        greater than or equal to  if age >= 21
<         less than                 if age < 66
<=        less than or equal to     if age <= 65
&         and                       if age == 21 & male == 1
|         or                        if age <… | age >= 65
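The operators can be tried out on any dataset. A minimal sketch, assuming the bundled auto dataset (its variables foreign, mpg, price and rep78 stand in for the slides' male and age):

```stata
* Sketch only: variable names come from the auto dataset, not from the slides
sysuse auto, clear
count if foreign == 1                        // equal to
count if foreign != 1                        // not equal to
summarize price if mpg >= 25 & foreign == 1  // and: both conditions must hold
count if mpg < 15 | mpg >= 35                // or: at least one condition holds
tabulate rep78 foreign if price > 5000, row  // qualifier combined with an option
```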

17 Missing values
Automatically excluded when Stata fits models (same as in SPSS); they are stored as the largest positive values.
Beware!!
– The expression "age>65" can thus also include missing values (these are also larger than 65)
– To be sure, type: "age>65 & age!=."
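The trap is easy to see on a variable that really has missing values. A sketch, assuming the bundled auto dataset, where rep78 is missing for a few cars:

```stata
* Sketch only: rep78 in the auto dataset plays the role of age
sysuse auto, clear
count if rep78 > 4                    // also counts the cars with rep78 missing
count if rep78 > 4 & rep78 != .       // the fix from the slide
count if rep78 > 4 & !missing(rep78)  // same idea; also covers extended missings .a to .z
count if missing(rep78)               // how many values are missing at all
```

The missing() function is a slightly safer habit than != . because it also catches Stata's extended missing codes.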

18 Selecting observations
drop [variable list]
keep [variable list]
drop if age<65
Note: they are then gone forever. This is not SPSS's [filter] command.
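If you only need a temporary selection, one possible approach (not from the slides) is to wrap drop / keep in preserve and restore. A sketch, assuming the bundled auto dataset:

```stata
* Sketch only: preserve/restore is one way to undo a drop within a do-file
sysuse auto, clear
preserve                      // take a snapshot of the data in memory
keep make price mpg foreign   // keep only these variables
drop if mpg < 20              // drop observations
count                         // the reduced dataset
restore                       // bring the snapshot back
count                         // all observations and variables are here again
```

For a single command, an if qualifier usually does the same job without dropping anything.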

19 Creating new variables
Generating new variables:
generate age2 = age*age
(for more complicated functions there is also the command "egen", as we will see later)
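To give an idea of what egen adds to generate, here is a minimal sketch, assuming the bundled auto dataset; the new variable names are made up for the example.

```stata
* Sketch only: mpg2, mean_mpg and std_price are hypothetical variable names
sysuse auto, clear
generate mpg2 = mpg*mpg                  // plain arithmetic: generate is enough
egen mean_mpg = mean(mpg), by(foreign)   // group mean, repeated on every row of the group
egen std_price = std(price)              // standardized price (mean 0, sd 1)
summarize mpg2 mean_mpg std_price
```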

20 Useful functions
Function  Definition       Example
+         addition         gen y = a+b
-         subtraction      gen y = a-b
/         division         gen density = population/area
*         multiplication   gen y = a*b
^         take to a power  gen y = a^3
ln        natural log      gen lnwage = ln(wage)
exp       exponential      gen y = exp(b)
sqrt      square root      gen agesqrt = sqrt(age)

21 Replace command
replace has the same syntax as generate but is used to change values of a variable that already exists.
gen age_dum = .
replace age_dum = 0 if age < 5
replace age_dum = 1 if age >= 5 & age != .
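The same dummy can also be built in a single line, and assert is a convenient way to double-check a new variable. A sketch, assuming the bundled auto dataset; heavy, heavy2 and the cut-off of 3000 are made up for the example.

```stata
* Sketch only: heavy, heavy2 and the cut-off 3000 are hypothetical
sysuse auto, clear
generate heavy = .
replace heavy = 0 if weight < 3000
replace heavy = 1 if weight >= 3000 & !missing(weight)   // do not code missings as 1
* One-line alternative: a true/false expression evaluates to 1/0
generate heavy2 = weight >= 3000 if !missing(weight)
assert heavy == heavy2                                   // stops with an error if they differ
tabulate heavy, missing
```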

22 Recode
Change values of existing variables
– Change 1 to 2 and 3 to 4 in origvar, and call the new variable myvar1:
  recode origvar (1=2)(3=4), gen(myvar1)
– Change 1's to missing in origvar, and call the new variable myvar2:
  recode origvar (1=.), gen(myvar2)
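A runnable variant of the recode examples, assuming the bundled auto dataset; the grouping of rep78 into three categories and the name rep3 are made up for the example.

```stata
* Sketch only: rep3 and its three categories are hypothetical
sysuse auto, clear
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)   // collapse five values into three
tabulate rep78 rep3, missing                               // cross-check old against new
```

Cross-tabulating the old and the new variable is a quick way to double-check the recode, including what happened to the missing values.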

23 Logistic regression

24 Logistic regression We use a set of data collected by the state of California from 1200 high schools measuring academic achievement. Our dependent variable is called hiqual. Our predictor variable will be a continuous variable called avg_ed, which is a measure of the average education (ranging from 1 to 5) of the parents of the students in the participating high schools.

25 OLS in Stata

26

27 Logistic regression in Stata

28

29 Multiple predictors
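The output shown on these slides did not survive in the transcript. With the slides' data the commands would presumably look like logit hiqual avg_ed; since that dataset is not at hand, the sketch below uses the bundled auto dataset instead, with foreign as the binary outcome.

```stata
* Sketch only: auto stands in for the California school data used in the slides
sysuse auto, clear
regress foreign mpg          // OLS on a 0/1 outcome (linear probability model), for comparison
logit foreign mpg            // logistic regression with one predictor
logit foreign mpg weight     // multiple predictors: just add them to the variable list
```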

30 MODEL FIT Consider model fit using: 1)The likelihood ratio test 2)The pseudo-R2 (proportional change in log-likelihood) 3)The classification table

31 Model fit: the likelihood ratio test

32 Model fit: LR test

33 Pseudo R2: proportional change in LL

34 A second measure of fit: the classification Table

35 Classification table for the model with the predictors
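The three fit measures can be reproduced along the following lines. This is a sketch, assuming the bundled auto dataset rather than the slides' data; the stored-estimate names null and full are made up.

```stata
* Sketch only: auto dataset; null and full are hypothetical names for stored results
sysuse auto, clear
logit foreign                         // intercept-only model
estimates store null
logit foreign mpg weight              // model with the predictors
estimates store full
* 2) pseudo-R2: proportional change in the log-likelihood
display "pseudo-R2 = " 1 - e(ll)/e(ll_0)
* 1) likelihood ratio test of the predictors against the intercept-only model
lrtest null full
* 3) classification table (default cut-off: predicted probability of 0.5)
estat classification
```

The LR chi-square and the pseudo-R2 are also printed in the header of the logit output, so the explicit steps mainly show where those numbers come from.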

36 Interpreting coefficients

37 Interpreting coefficients: significance
z = coefficient / standard error, e.g. -16.29 = -12.05/0.74
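The z statistic can be recomputed by hand from the stored coefficient and standard error. A sketch, assuming the bundled auto dataset rather than the slides' data:

```stata
* Sketch only: auto dataset, not the data behind the numbers on the slide
sysuse auto, clear
logit foreign mpg weight
display "z for mpg = " _b[mpg]/_se[mpg]   // coefficient divided by its standard error
test mpg                                  // Wald test; its chi2(1) equals z squared
```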

38 Interpretation of coefficients: direction

39

40 Interpretation of coefficients: magnitude

41 Interpretation of coefficients: Magnitude
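Two common ways to express magnitude are odds ratios and predicted probabilities. A sketch, assuming the bundled auto dataset and Stata 11 or later for margins; the mpg values 15, 25 and 35 are arbitrary.

```stata
* Sketch only: auto dataset; the mpg values in at() are arbitrary examples
sysuse auto, clear
logit foreign mpg weight
display "odds ratio for mpg = " exp(_b[mpg])   // exponentiated coefficient
logit, or                                      // replay the results as odds ratios
margins, at(mpg = (15 25 35)) atmeans          // predicted probabilities, weight held at its mean
```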

42 Assumptions and outliers

43 The link test (sort of equivalent to linearity assumption in MR)
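A sketch of the link test, assuming the bundled auto dataset. linktest refits the model on the linear prediction (_hat) and its square (_hatsq); a significant _hatsq hints at a misspecified model.

```stata
* Sketch only: auto dataset, not the slides' data
sysuse auto, clear
logit foreign mpg weight
linktest                    // look at the p-value of _hatsq
```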

44 Multicollinearity (here we cheat a little)
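The slide's output is lost, so what the "cheat" was is an assumption here. One common shortcut is to request variance inflation factors after an OLS regression with the same predictors, since collinearity concerns the predictors only. A sketch on the bundled auto dataset:

```stata
* Sketch only: assumes the "cheat" is running OLS to get VIFs; auto dataset
sysuse auto, clear
correlate mpg weight           // pairwise correlation between the predictors
regress foreign mpg weight     // same predictors, linear model
estat vif                      // variance inflation factors
```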

45 Influential observations: check the residuals

46 Have a closer look at the outlier residual
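Residuals and influence measures are available through predict after logit. A sketch, assuming the bundled auto dataset; the variable names phat, rstd and db and the cut-off of 2 are made up for the example.

```stata
* Sketch only: phat, rstd, db and the cut-off 2 are hypothetical choices
sysuse auto, clear
logit foreign mpg weight
predict phat, pr            // predicted probability
predict rstd, rstandard     // standardized Pearson residual
predict db, dbeta           // Pregibon's influence statistic
list make foreign phat rstd db if abs(rstd) > 2 & !missing(rstd)
```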

47

48 And this helps a little (but not much)

49 Assumptions (continued): The model should fit equally well everywhere

50 Goodness of fit: Hosmer & Lemeshow
[Formula not recovered; surviving label: average probability in the j-th group]
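For reference, the usual textbook form of the Hosmer & Lemeshow statistic is given below. The notation is an assumption, not taken from the slide: observations are sorted by predicted probability and split into g groups, o_j is the observed number of successes in group j, n_j the number of observations in it, and \bar{\pi}_j the average predicted probability in the j-th group.

```latex
\hat{C} = \sum_{j=1}^{g} \frac{\left(o_j - n_j \bar{\pi}_j\right)^2}{n_j \bar{\pi}_j \left(1 - \bar{\pi}_j\right)}
```

Under a well-fitting model the statistic is approximately chi-square distributed with g - 2 degrees of freedom, so a small p-value signals poor fit.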

51 First logistic regression

52 Then postestimation command
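The two steps can be sketched as follows, assuming the bundled auto dataset and the conventional choice of 10 groups:

```stata
* Sketch only: auto dataset; 10 groups is the conventional choice, not taken from the slide
sysuse auto, clear
logit foreign mpg weight        // first the logistic regression
estat gof, group(10) table      // then the Hosmer & Lemeshow test with its table of groups
```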

53 Including interaction term helps...

54 ... as you can see here Ok now
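An interaction term can be added with factor-variable notation (Stata 11 or later). A sketch, assuming the bundled auto dataset; whether the interaction helps there is not implied, and the stored-estimate names main and inter are made up.

```stata
* Sketch only: auto dataset; main and inter are hypothetical names for stored results
sysuse auto, clear
logit foreign mpg weight
estimates store main
logit foreign c.mpg##c.weight   // main effects plus the mpg x weight interaction
estimates store inter
lrtest main inter               // does the interaction improve the fit?
estat gof, group(10)            // Hosmer & Lemeshow test for the model with the interaction
```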

55 To do Perform a logistic regression analysis (check interaction effects as well!)

