Topics
Introduction to Stata
– Files / directories
– Stata syntax
– Useful commands / functions
Logistic regression analysis with Stata
– Estimation
– Goodness of fit
– Coefficients
– Checking assumptions

Introduction to Stata
Note: we did this interactively for the larger part …

Stata file types
.ado – Programs that add commands to Stata
.do – Batch files that execute a set of Stata commands
.dta – Data file in Stata's format
.log – Output saved as plain text by the log using command

The working directory
The working directory is the default directory for any file operations, such as using and saving data or logging output.
cd "d:\my work\"

Saving output to log files
Syntax for the log command:
log using [filename], replace text
To close a log file:
log close

Using and saving datasets
Load a Stata dataset:
use d:\myproject\data.dta, clear
Save:
save d:\myproject\data, replace
Using change directory:
cd d:\myproject
use data, clear
save data, replace
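
The load/save cycle above can be tried without any project files. The following is a minimal sketch, assuming the auto example dataset that ships with Stata; the file name auto_copy is hypothetical. Note that // comments only work inside do-files, not at the interactive command line.

```stata
* Sketch: load a dataset, save a copy, and reload it from the working directory
sysuse auto, clear          // example dataset shipped with Stata
pwd                         // show the current working directory
save auto_copy, replace     // writes auto_copy.dta (hypothetical file name)
use auto_copy, clear        // reload the copy; clear discards the data in memory
describe, short             // confirm what was loaded
```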

Entering data
Data in other formats
– You can use SPSS to convert data (read in the file, or save it as a data file in another format, for instance Stata's .dta format)
– You can use the infile and insheet commands to import data in ASCII format
Entering data by hand
– Type edit or just click on the data-editor button
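
As a hedged illustration of importing text data: since Stata 13, import delimited supersedes insheet (which still works). The sketch below first writes a CSV so that it is self-contained; the file name auto_copy.csv is hypothetical.

```stata
* Sketch: round-trip a dataset through a comma-separated text file
sysuse auto, clear
export delimited using auto_copy.csv, replace   // create a CSV to import
import delimited using auto_copy.csv, clear     // read it back into memory
list make price mpg in 1/5                      // inspect the first rows
```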

Do-files
You can create a text file that contains a series of commands. It is the equivalent of SPSS syntax (but way easier to memorize).
Use the do-file editor to work with do-files.

Adding comments in do-files
// or * denote comments that Stata should ignore.
Stata ignores whatever follows /// and treats the next line as a continuation.
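
A small sketch of the comment styles, assuming the auto example dataset. It is meant to be run from a do-file, since // and /// are not recognized at the interactive command line.

```stata
* A full-line comment starts with an asterisk
sysuse auto, clear              // a trailing comment after a command
summarize price mpg weight ///  the command continues on the next line
    if foreign == 1
/* A block comment
   can span several lines */
```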

A recommended template for do-files
capture log close                //if a log file is open, close it; otherwise disregard
set more off                     //don't pause when output scrolls off the page
cd d:\myproject                  //change directory to your working directory
log using myfile, replace text   //log results to file myfile.log
… here you put the rest of your Stata commands …
log close                        //close the log file

Serious data analysis
Ensure replicability: use do-files and log files
Document your do-files
– What is obvious today is baffling in six months
Keep a research log
– A diary that includes a description of every program you run
Develop a system for naming files

Serious data analysis
New variables should be given new names
Use variable labels and notes
Double-check every new variable
ARCHIVE

Stata syntax examples

Stata syntax example
regress y x1 x2 if x3<20, cluster(x4)
1. regress = Command
– What action do you want performed?
2. y x1 x2 = Names of variables, files or other objects
– On what things is the command performed?
3. if x3<20 = Qualifier on observations
– On which observations should the command be performed?
4. , cluster(x4) = Options
– What special things should be done in executing the command?

More examples
tabulate smoking race if agemother>30, row
More elaborate if-statements:
sum agemother if smoking==1 & weightmother<100
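
The dataset behind these examples is not part of the transcript, so here is a runnable stand-in, assuming the lbw (low birth weight) dataset from Stata's online examples. Its variable names (smoke, race, age, lwt) differ from the ones on the slide, and webuse needs an internet connection.

```stata
* Sketch: the same command / variables / if-qualifier / options pattern
webuse lbw, clear
tabulate smoke race if age > 30, row       // row percentages, mothers over 30
summarize age if smoke == 1 & lwt < 100    // two conditions joined with &
```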

Elements used for logical statements
Operator   Definition                  Example
==         is equal in value to        if male == 1
!=         not equal in value to       if male != 1
>          greater than                if age > 20
>=         greater than or equal to    if age >= 21
<          less than                   if age < 66
<=         less than or equal to       if age <= 65
&          and                         if age == 21 & male == 1
|          or                          if age <… | age >= 65

Missing values
Automatically excluded when Stata fits models (same as in SPSS); they are stored as the largest positive values.
Beware!!
– The expression "age>65" can thus also include missing values (these are also larger than 65)
– To be sure, type: "age>65 & age!=."
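
The pitfall can be seen directly in the auto example dataset, where rep78 has some missing values (an illustrative stand-in for age). In this sketch the first count is larger than the other two by the number of missing observations. Testing with < . or missing() also covers the extended missing values .a to .z, which != . does not.

```stata
* Sketch: missing values count as larger than any number
sysuse auto, clear
count if rep78 > 4                      // also counts cars with missing rep78
count if rep78 > 4 & rep78 < .          // excludes every kind of missing value
count if rep78 > 4 & !missing(rep78)    // equivalent, more readable
```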

Selecting observations
drop [variable list]
keep [variable list]
drop if age<65
Note: dropped variables and observations are removed from the data in memory, and from the file as well once you save over it. This is not SPSS's [filter] command.
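
If a selection is only needed temporarily, one option is to wrap it in preserve and restore. This is a sketch using the auto example dataset, not something shown on the slides.

```stata
* Sketch: temporary selection that can be undone
sysuse auto, clear
preserve                 // keep a snapshot of the data in memory
keep if foreign == 1     // work on a subset
summarize price
restore                  // bring back the full dataset
count                    // all observations are available again
```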

Creating new variables
Generating new variables:
generate age2 = age*age
(For more complicated functions there is also the command "egen", as we will see later.)
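
To give an idea of what egen adds to generate, here is a minimal sketch with the auto example dataset. The new variable names are hypothetical.

```stata
* Sketch: generate works row by row, egen can summarize across rows
sysuse auto, clear
generate weight2 = weight*weight               // same pattern as age2 above
egen mean_price = mean(price)                  // overall mean, repeated in every row
egen mean_price_f = mean(price), by(foreign)   // mean within each group
list make price mean_price mean_price_f in 1/5
```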

Useful functions
Function   Definition        Example
+          Addition          gen y = a+b
-          Subtraction       gen y = a-b
/          Division          gen density = population/area
*          Multiplication    gen y = a*b
^          Take to a power   gen y = a^3
ln         Natural log       gen lnwage = ln(wage)
exp        Exponential       gen y = exp(b)
sqrt       Square root       gen agesqrt = sqrt(age)

Replace command
replace has the same syntax as generate, but is used to change values of a variable that already exists.
gen age_dum = .
replace age_dum = 0 if age < 5
replace age_dum = 1 if age >= 5 & age != .
(The last condition keeps age_dum missing when age is missing.)

Recode
Change values of existing variables
– Change 1 to 2 and 3 to 4 in origvar, and call the new variable myvar1:
recode origvar (1=2)(3=4), gen(myvar1)
– Change 1's to missing in origvar, and call the new variable myvar2:
recode origvar (1=.), gen(myvar2)
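
In line with the advice to double-check every new variable, a recode can be verified by cross-tabulating the old and new variable. A sketch with the auto example dataset, where rep78 stands in for origvar:

```stata
* Sketch: recode into a new variable and check the result
sysuse auto, clear
recode rep78 (1=2) (3=4), gen(rep78_new)
tabulate rep78 rep78_new, missing    // old values in rows, new values in columns
```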

Logistic regression

Logistic regression We use a set of data collected by the state of California from 1200 high schools measuring academic achievement. Our dependent variable is called hiqual. Our predictor variable will be a continuous variable called avg_ed, which is a measure of the average education (ranging from 1 to 5) of the parents of the students in the participating high schools.

OLS in Stata

Logistic regression in Stata

Multiple predictors
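
The output for these three slides did not survive in the transcript. The sketch below shows what the commands might look like. It uses the lbw dataset from Stata's online examples as a stand-in for the California school data, with low as the binary outcome in place of hiqual, so the variable names are assumptions.

```stata
* Sketch: OLS versus logistic regression on a 0/1 outcome
webuse lbw, clear
regress low age              // OLS on a binary outcome (linear probability model)
logit low age                // logistic regression, one predictor
logit low age lwt smoke      // logistic regression, multiple predictors
```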

MODEL FIT
Consider model fit using:
1) The likelihood ratio test
2) The pseudo-R2 (proportional change in log-likelihood)
3) The classification table

Model fit: the likelihood ratio test

Model fit: LR test
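
A likelihood ratio test compares two nested models fitted on the same observations. A minimal sketch, again with the lbw stand-in data; the stored names m1 and m2 are arbitrary.

```stata
* Sketch: likelihood ratio test of a smaller against a larger model
webuse lbw, clear
logit low age
estimates store m1           // restricted model
logit low age lwt smoke
estimates store m2           // full model
lrtest m1 m2                 // H0: the added coefficients are all zero
```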

Pseudo R2: proportional change in LL
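
The pseudo-R2 that Stata prints is McFadden's: one minus the ratio of the model log-likelihood to the log-likelihood of the constant-only model. The sketch below recomputes it from stored results, using the lbw stand-in data.

```stata
* Sketch: pseudo-R2 as proportional change in log-likelihood
webuse lbw, clear
logit low age lwt smoke
display "Pseudo R2 by hand: " 1 - e(ll)/e(ll_0)   // e(ll_0) is the constant-only model
display "Pseudo R2 stored:  " e(r2_p)
```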

A second measure of fit: the classification table

Classification table for the model with the predictors
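
After logit, a classification table can be requested with estat classification. A sketch with the lbw stand-in data; the second cutoff is only an illustration.

```stata
* Sketch: classification table after a logistic regression
webuse lbw, clear
logit low age lwt smoke
estat classification                 // predicted positive if Pr >= 0.5 (default)
estat classification, cutoff(0.3)    // same table with a lower cutoff
```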

Interpreting coefficients

Interpreting coefficients: significance = /0.74
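
The z statistic in the logit output is the coefficient divided by its standard error. The sketch below reproduces it by hand for one predictor of the lbw stand-in data. The numbers from the slide itself are not recoverable, so none are shown here.

```stata
* Sketch: Wald test of a single coefficient
webuse lbw, clear
logit low age lwt smoke
display "z for smoke = " _b[smoke]/_se[smoke]   // matches the z column in the output
test smoke                                      // Wald chi2, equal to z squared
```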

Interpretation of coefficients: direction

Interpretation of coefficients: magnitude

Interpretation of coefficients: Magnitude
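
Magnitude is usually easier to read on the odds-ratio or probability scale than on the log-odds scale. A sketch of both, with the lbw stand-in data:

```stata
* Sketch: from log-odds to odds ratios and predicted probabilities
webuse lbw, clear
logit low age lwt smoke
logit, or                        // redisplay the same model as odds ratios
display exp(_b[smoke])           // odds ratio for smoke, computed by hand
margins, at(smoke=(0 1))         // average predicted probability at smoke = 0 and 1
```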

Assumptions and outliers

The link test (sort of equivalent to the linearity assumption in multiple regression)
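
A sketch of the link test with the lbw stand-in data. In a well-specified model the linear prediction _hat is significant and its square _hatsq is not.

```stata
* Sketch: specification check with linktest
webuse lbw, clear
logit low age lwt smoke
linktest                 // refits the outcome on _hat and _hatsq
```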

Multicollinearity (here we cheat a little)
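
The slide does not say what the cheat is. One common workaround, which may be what is meant, is to fit the same predictors with regress purely to obtain variance inflation factors, since estat vif is not available after logit. A sketch with the lbw stand-in data:

```stata
* Sketch: VIFs from an auxiliary linear regression (assumed workaround)
webuse lbw, clear
regress low age lwt smoke    // only used to get at the predictors' collinearity
estat vif                    // variance inflation factors
```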

Influential observations: check the residuals

Have a closer look at the outlier residual
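
Residuals and influence statistics are available through predict after logit. The sketch below, on the lbw stand-in data, lists the observations with the largest influence; the variable names rstd and db are hypothetical.

```stata
* Sketch: standardized residuals and influence after logit
webuse lbw, clear
logit low age lwt smoke
predict rstd, rstandard      // standardized Pearson residuals
predict db, dbeta            // Pregibon's influence statistic
gsort -db                    // most influential observations first
list id low age lwt smoke rstd db in 1/5
```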

And this helps a little (but not much)

Assumptions (continued): The model should fit equally well everywhere

Goodness of fit: Hosmer & Lemeshow
(Formula annotation: average probability in the j-th group)
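
The formula on this slide was lost in extraction. For reference, the standard form of the Hosmer and Lemeshow statistic is given below, with symbols chosen here rather than taken from the slide: g groups formed from the ordered predicted probabilities, n_j observations and o_j observed events in group j, and \bar{\pi}_j the average predicted probability in group j.

```latex
\hat{C} = \sum_{j=1}^{g} \frac{\left(o_j - n_j \bar{\pi}_j\right)^2}{n_j \bar{\pi}_j \left(1 - \bar{\pi}_j\right)}
```

Under the null hypothesis that the model fits, the statistic is approximately chi-squared distributed with g − 2 degrees of freedom, so a small p-value signals poor fit.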

First logistic regression

Then postestimation command
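
The two steps might look as follows, sketched with the lbw stand-in data. The choice of 10 groups is the conventional one, not taken from the slide.

```stata
* Sketch: Hosmer-Lemeshow goodness-of-fit test
webuse lbw, clear
logit low age lwt smoke          // first the logistic regression
estat gof, group(10) table       // then the test, with the table of groups
```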

Including interaction term helps...

... as you can see here (OK now)
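
An interaction can be added with factor-variable notation and then checked with a likelihood ratio test and the goodness-of-fit test. This is a sketch on the lbw stand-in data. Which interaction the slides used is not recoverable, so age by smoke is an arbitrary choice.

```stata
* Sketch: adding and testing an interaction term
webuse lbw, clear
logit low c.age i.smoke lwt
estimates store main                 // main effects only
logit low c.age##i.smoke lwt         // ## adds main effects plus the interaction
estimates store inter
lrtest main inter                    // does the interaction improve the fit?
estat gof, group(10)                 // goodness of fit for the interaction model
```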

To do
Perform a logistic regression analysis (check interaction effects as well!)