Today
Introduction to Stata
– Files / directories
– Stata syntax
– Useful commands / functions
Logistic regression analysis with Stata
– Estimation
– GOF (goodness of fit)
– Coefficients
– Checking assumptions

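Stata file types
– .ado: programs that add commands to Stata
– .do: batch files that execute a set of Stata commands
– .dta: data files in Stata's format
– .log: output saved as plain text by the log using command (with the text option; by default log using writes Stata's own SMCL format, with extension .smcl)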

The working directory
The working directory is the default directory for any file operations, such as using and saving data or logging output. Change it with cd, and put the path in quotes when it contains spaces:
cd "d:\my work\"
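A minimal sketch of checking and changing the working directory. Because d:\my work will not exist on every machine, the example uses Stata's temporary-files directory, c(tmpdir), purely as a stand-in for your own project folder.

```stata
* Display the current working directory
pwd

* Change to another directory; quotes are needed when the path contains spaces.
* c(tmpdir) is used here only as a stand-in for a folder such as "d:\my work".
cd "`c(tmpdir)'"

* c(pwd) holds the current working directory as a string
display "Now working in: `c(pwd)'"
```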

Saving output to log files
Syntax for the log command
– log using filename [, append replace [smcl|text]]
To close a log file
– log close
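A short sketch of a typical logging session. The log name mysession is hypothetical, and the example data come from auto.dta, which ships with Stata.

```stata
* Close any log that is already open; capture suppresses the error if none is
capture log close

* Start a plain-text log, mysession.log, in the working directory,
* overwriting it if it already exists
log using mysession, replace text

sysuse auto, clear
summarize price mpg

* Stop logging; the output above is now stored in mysession.log
log close
```

Use append instead of replace to add new output to the end of an existing log.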

Using and saving datasets
Load a Stata dataset
– use d:\myproject\data.dta, clear
Save
– save d:\myproject\data, replace
Using cd (change directory)
– cd d:\myproject
– use data, clear
– save data, replace
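The same steps as a runnable sketch, using the auto.dta example that ships with Stata (loaded with sysuse) instead of the slide's d:\myproject\data; the file name myauto is hypothetical.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Save a copy in the working directory; the .dta extension is added automatically
save myauto, replace

* Reload it; clear discards whatever data are currently in memory
use myauto, clear
describe, short
```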

Entering data
Data in other formats
– You can use SPSS to convert data
– You can use the infile and insheet commands to import data in ASCII format
Entering data by hand
– Type edit or just click on the Data Editor button
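A sketch of two more ways to get data in. input lets you type a small dataset directly in a do-file; for delimited ASCII files, current Stata versions provide import delimited, which supersedes insheet. The file people.csv and its variables are hypothetical.

```stata
clear
* Type a small dataset directly into a do-file
input id age male
1 34 1
2 27 0
3 65 1
end

* Write it out as comma-separated text, then read it back in
export delimited using people.csv, replace
import delimited using people.csv, clear
list
```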

Do-files
You can create a text file that contains a series of commands
Use the Do-file Editor (type doedit) to work with do-files, and run a do-file with do filename
Example I

Adding comments
* (at the start of a line) or // denote comments Stata should ignore
Stata ignores whatever follows /// and treats the next line as a continuation
Example II
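A small do-file fragment showing each comment style, using the auto.dta example data. Run it as a do-file; the /// continuation is meant for do-files rather than commands typed interactively.

```stata
* A line that starts with an asterisk is a comment
// So is a line that starts with two slashes

sysuse auto, clear   // text after // (preceded by a space) is ignored

/* Block comments can
   span several lines */

* Three slashes join this line with the next one
regress price mpg ///
    weight
```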

A recommended structure
// if a log file is open, close it
capture log close
// don't pause when output scrolls off the page
set more off
// change directory to your working directory
cd d:\myproject
// log results to file myfile.log
log using myfile, replace text
// * myfile.do - written 7 Feb 2010 to illustrate do-files
// your commands here
// close the log file
log close

Serious data analysis
Ensure replicability: use do-files together with log files
Document your do-files
– What is obvious today is baffling in six months
Keep a research log
– A diary that includes a description of every program you run
Develop a system for naming files

Serious data analysis
New variables should be given new names
Use labels and notes
Double-check every new variable
Archive
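A sketch of these habits applied to one derived variable, using the auto.dta example data; the name price_k and its label text are hypothetical.

```stata
sysuse auto, clear

* Give the derived variable a new name instead of overwriting price
generate double price_k = price/1000

* Document it with a variable label and a note
label variable price_k "Price (thousands of dollars)"
notes price_k: price divided by 1000

* Double-check the new variable against its source; assert stops the do-file if false
assert abs(price_k - price/1000) < 1e-9
summarize price price_k
notes list price_k
```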

The Stata syntax
regress y x1 x2 if x3 < 20, cluster(x4)
1. regress = command
– What action do you want to perform?
2. y x1 x2 = names of variables, files or other objects
– On what things is the command performed?
3. if x3 < 20 = qualifier on observations
– On which observations should the command be performed?
4. , cluster(x4) = options
– What special things should be done in executing the command?
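The same four parts in a command that runs on the auto.dta example data. Here price, mpg, weight, length and rep78 stand in for y, x1, x2, x3 and x4, and vce(cluster ...) is the current documented spelling of the clustering option. rep78 has only five values, so it is used purely to show the syntax, not as a sensible clustering variable.

```stata
sysuse auto, clear

* command   varlist           if qualifier       , options
regress     price mpg weight  if length < 200    , vce(cluster rep78)
```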

Examples
tabulate smoking race if agemother > 30, row
Example of the if qualifier
– sum agemother if smoking == 1 & weightmother < 100
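The variables on this slide come from a dataset that is not included here, so this sketch reproduces the same two patterns with the auto.dta example data.

```stata
sysuse auto, clear

* Cross-tabulation restricted to cars with mpg above 20, with row percentages
tabulate rep78 foreign if mpg > 20, row

* Summary statistics for foreign cars lighter than 3,000 lbs
summarize price if foreign == 1 & weight < 3000
```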

Elements used for logical statements
Operator   Definition                  Example
==         Equal to                    if male == 1
!=         Not equal to                if male != 1
>          Greater than                if age > 20
>=         Greater than or equal to    if age >= 21
<          Less than                   if age < 66
<=         Less than or equal to       if age <= 65
&          And                         if age == 21 & male == 1
|          Or                          if age < 21 | age >= 65
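A sketch combining these operators in count commands on the auto.dta example data. It also uses ! (not), which the table does not list; by De Morgan's law, the last two counts must agree.

```stata
sysuse auto, clear

* & (and) and | (or) combine conditions
count if foreign == 1 & mpg >= 25
count if mpg < 15 | mpg > 30

* ! (not) negates a condition; this gives the same count as the line above
count if !(mpg >= 15 & mpg <= 30)
```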

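Missing values
Missing values are automatically excluded when Stata fits models. They are stored as values larger than any nonmissing number.
Beware
– The expression age > 65 therefore also selects observations where age is missing
– To be sure, type: age > 65 & age != . (or age > 65 & !missing(age), which also excludes the extended missing values .a to .z)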
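A sketch of the pitfall using the auto.dta example data, where rep78 has some missing values. The first count includes them; the second excludes them.

```stata
sysuse auto, clear

* Missing counts as larger than any number, so missing rep78 is included here
count if rep78 > 3

* Exclude missing values explicitly
count if rep78 > 3 & !missing(rep78)

* The difference between the two counts is the number of missing values
count if missing(rep78)
```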

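Selecting variables and observations
drop varlist (removes the listed variables)
keep varlist (keeps only the listed variables)
drop if age < 65 (removes the observations that meet the condition; keep if does the opposite)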
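A sketch of both kinds of selection on the auto.dta example data: keep with a variable list works on columns, drop if works on rows.

```stata
sysuse auto, clear

* Variables: keep only the ones you need (drop varlist does the opposite)
keep make price mpg foreign

* Observations: drop the foreign cars (keep if does the opposite)
drop if foreign == 1

describe, short
```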

Creating new variables
generate command
– generate age2 = age * age
– generate newvar = expression; for the functions you can use in an expression, see help functions
– Sometimes the egen command is a useful alternative, for instance:
– egen meanage = mean(age)
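A sketch of the difference between generate and egen on the auto.dta example data: generate works within each observation, while egen functions such as mean() work across observations, optionally within groups defined by bysort.

```stata
sysuse auto, clear

* generate works observation by observation
generate mpg2 = mpg * mpg

* egen can work across observations: the overall mean of mpg in every row
egen meanmpg = mean(mpg)

* ... or the mean within groups, here separately for domestic and foreign cars
bysort foreign: egen meanmpg_origin = mean(mpg)

list make mpg mpg2 meanmpg meanmpg_origin in 1/5
```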

Useful functions
Function   Definition          Example
+          Addition            gen y = a + b
-          Subtraction         gen y = a - b
/          Division            gen density = population/area
*          Multiplication      gen y = a * b
^          Raise to a power    gen y = a^3
ln()       Natural log         gen lnwage = ln(wage)
exp()      Exponential         gen y = exp(b)
sqrt()     Square root         gen agesqrt = sqrt(age)
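A sketch using several of these operators and functions on the auto.dta example data. The new variables are stored as double so that the round trip through ln() and exp() is accurate to many digits.

```stata
sysuse auto, clear

generate double lnprice         = ln(price)
generate double price_back      = exp(lnprice)    // exp() undoes ln()
generate double sqrtweight      = sqrt(weight)
generate double weight_cubed    = weight^3
generate double mpg_per_1000lbs = mpg / (weight/1000)
```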

Replace command
replace has the same syntax as generate but is used to change the values of a variable that already exists:
gen age_dum = .
replace age_dum = 0 if age < 5
replace age_dum = 1 if age >= 5 & !missing(age)
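The same generate-then-replace pattern on the auto.dta example data; the 3,000 lb cut-off and the name heavy are arbitrary choices for illustration.

```stata
sysuse auto, clear

* Create the new variable as missing, then fill it in with replace
generate heavy = .
replace heavy = 0 if weight < 3000
replace heavy = 1 if weight >= 3000 & !missing(weight)

* Check the result, including any remaining missing values
tabulate heavy, missing
```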

Recode
Change values of existing variables
– Change 1 to 2 and 3 to 4: recode origvar (1=2)(3=4), gen(myvar1)
– Change missing values to 1: recode origvar (.=1), gen(myvar2)
– gen() must name a new variable; omit it to change origvar itself
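A sketch of recode on the auto.dta example data, always writing to a new variable; the names rep3 and rep78_0 are hypothetical. Values not covered by a rule, such as missing rep78 in the first command, are copied unchanged.

```stata
sysuse auto, clear

* Collapse the five repair-record categories into three, in a new variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* Turn missing into an explicit code, again in a new variable
recode rep78 (. = 0), generate(rep78_0)

tabulate rep78 rep3, missing
```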

Logistic regression
Let's use a set of data collected by the state of California from 1200 high schools, measuring academic achievement. Our dependent variable is called hiqual. Our predictor variable will be avg_ed, a continuous measure of the average education (ranging from 1 to 5) of the parents of the students in the participating high schools.

OLS in Stata

Logistic regression in Stata

Multiple predictors
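apilog.dta is not bundled with Stata, so this sketch uses the auto.dta example data as a stand-in: foreign (0/1) plays the role of hiqual, and mpg and weight play the role of the predictors. It fits OLS and logistic regression to the same binary outcome and then adds a second predictor.

```stata
sysuse auto, clear

* OLS on a 0/1 outcome (a linear probability model)
regress foreign mpg
predict double p_ols, xb

* Logistic regression with one predictor
logit foreign mpg
predict double p_logit, pr

* OLS predictions are not restricted to [0, 1]; logit predictions always are
summarize p_ols p_logit

* Logistic regression with multiple predictors
logit foreign mpg weight
```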

Model fit: the likelihood ratio test

Model fit: LR test
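A sketch of where the model LR chi-square comes from, using the auto.dta stand-in model: it is twice the difference between the log likelihood of the fitted model, e(ll), and that of the constant-only model, e(ll_0), with e(df_m) degrees of freedom.

```stata
sysuse auto, clear
quietly logit foreign mpg weight

* LR chi2 = 2 * (LL of the model - LL of the constant-only model)
scalar lr = 2 * (e(ll) - e(ll_0))
display "LR chi2(" e(df_m) ") = " lr
display "p-value = " chi2tail(e(df_m), lr)
```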

Pseudo R2: proportional change in LL
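A sketch computing the pseudo R2 that logit reports (McFadden's), using the auto.dta stand-in model: one minus the ratio of the model log likelihood to the constant-only log likelihood.

```stata
sysuse auto, clear
quietly logit foreign mpg weight

* Proportional reduction in the log likelihood relative to the constant-only model
scalar r2_mcf = 1 - e(ll)/e(ll_0)
display "Pseudo R2 = " r2_mcf
```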

Classification Table
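A sketch of the classification table for the auto.dta stand-in model. estat classification uses a cut-off of 0.5 by default (changeable with cutoff()); the manual version below applies the same rule so you can see what is being counted.

```stata
sysuse auto, clear
quietly logit foreign mpg weight

* Classification table at the default cut-off of 0.5
estat classification

* The same rule by hand: predict 1 when the predicted probability is at least 0.5
predict double phat, pr
generate byte yhat = phat >= 0.5
tabulate foreign yhat
```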

Interpreting coefficients: significance
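A sketch of where the significance of a single coefficient comes from, using the auto.dta stand-in model: the z statistic in the output is the coefficient divided by its standard error, and the Wald test of the same hypothesis reports chi2 = z squared.

```stata
sysuse auto, clear
logit foreign mpg weight

* z statistic for weight: coefficient divided by its standard error
display "z = " _b[weight]/_se[weight]

* Wald test that the coefficient of weight is zero
test weight
```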

Comparing models

After fitting the full model and storing its estimates, estimate the nested model

Likelihood ratio test
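A sketch of comparing nested models with estimates store and lrtest, using auto.dta as stand-in data. Both models must be fitted on the same observations; here none of the variables has missing values. The LR statistic equals twice the difference in log likelihoods.

```stata
sysuse auto, clear

* Full model, stored under the name full
quietly logit foreign mpg weight length
estimates store full
scalar ll_full = e(ll)

* Nested model without weight and length
quietly logit foreign mpg
estimates store nested
scalar ll_nested = e(ll)

* Likelihood ratio test; df = number of dropped parameters (2)
lrtest full nested
```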

Interpretation of coefficients: direction

Interpretation of coefficients: Magnitude
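A sketch of interpreting direction and magnitude with the auto.dta stand-in model. The sign of a coefficient gives the direction of its effect on the log odds; exponentiating it gives an odds ratio, the multiplicative change in the odds for a one-unit increase. logit, or redisplays the same model as odds ratios, and logistic fits the same model and reports odds ratios directly.

```stata
sysuse auto, clear
logit foreign mpg weight
scalar b_weight = _b[weight]

* Odds ratios for a one-unit (1 lb) and a 100-unit (100 lb) increase in weight
display "Odds ratio per lb     = " exp(_b[weight])
display "Odds ratio per 100 lb = " exp(100*_b[weight])

* Redisplay as odds ratios, or fit the same model with logistic
logit, or
logistic foreign mpg weight
```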

The assumptions of logistic regression
– The true conditional probabilities are a logistic function of the independent variables
– No important variables are omitted
– No extraneous variables are included
– The independent variables are measured without error
– The observations are independent
– The independent variables are not linear combinations of each other

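Hosmer & Lemeshow test
– Divides the sample into subgroups (usually 10, based on deciles of the predicted probability) and checks whether the observed and predicted numbers of events are about equal within each group
– The test should not be significant (a non-significant result indicates no evidence of a difference between observed and predicted)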

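Goodness of fit: Hosmer & Lemeshow
$$\hat{C} = \sum_{j=1}^{g} \frac{(o_j - n_j\bar{\pi}_j)^2}{n_j\bar{\pi}_j(1-\bar{\pi}_j)}$$
where $g$ is the number of groups, $n_j$ the number of observations in the $j$th group, $o_j$ the observed number of events in the $j$th group, and $\bar{\pi}_j$ the average predicted probability in the $j$th group. If the model fits, $\hat{C}$ approximately follows a $\chi^2$ distribution with $g-2$ degrees of freedom.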
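A sketch of running the Hosmer & Lemeshow test in Stata on the auto.dta stand-in model. estat gof with group(10) forms the groups from deciles of the predicted probability, and the table option lists the observed and expected counts per group.

```stata
sysuse auto, clear
quietly logit foreign mpg weight

* Hosmer-Lemeshow test with 10 groups; table shows observed and expected counts
estat gof, group(10) table
```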

First logistic regression

Then postestimation command

Specification error

Including interaction term helps

Ok now
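A sketch of the sequence above on the auto.dta stand-in data, assuming the postestimation command meant is linktest (the slides do not name it). linktest refits the outcome on the linear prediction (_hat) and its square (_hatsq); a significant _hatsq suggests a specification error. Whether the interaction helps with these data is not claimed here.

```stata
sysuse auto, clear

* First logistic regression, then the link test
logit foreign mpg weight
linktest

* One possible respecification: add an interaction with factor-variable notation
logit foreign c.mpg##c.weight
linktest
```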

Multicollinearity
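A sketch of two common multicollinearity checks on the auto.dta stand-in predictors. estat vif is available after regress rather than after logit; because variance inflation factors depend only on the predictors, fitting regress with the same right-hand side gives the relevant values.

```stata
sysuse auto, clear

* Pairwise correlations among the predictors
correlate mpg weight length

* Variance inflation factors from a linear regression with the same predictors
quietly regress foreign mpg weight length
estat vif
```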

Influential observations
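A sketch of influence diagnostics after logit on the auto.dta stand-in model. predict offers Pregibon's delta-beta (dbeta), the change in the Pearson chi-square (dx2) and leverage (hat); Stata computes them per covariate pattern.

```stata
sysuse auto, clear
quietly logit foreign mpg weight

predict double dbeta, dbeta        // Pregibon's delta-beta influence statistic
predict double dx2, dx2            // change in Pearson chi2 when the pattern is dropped
predict double leverage, hat       // leverage (diagonal of the hat matrix)

* List the five observations with the largest influence
gsort -dbeta
list make foreign dbeta dx2 leverage in 1/5
```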

To do
– Perform a logistic regression analysis
– Use apilog.dta
– Awards = dependent variable