L3: BIG STATA CONCEPTS Getting started with Stata Angela Ambroz May 2015.

Slides:



Advertisements
Similar presentations
Stata as a Data Entry Management Tool
Advertisements

Research Methods Lecture 3 More STATA Ian Walker Room S2.109   Slides available at:
Felicity Clemens 18 May 2005 Data cleaning: hints and tips Felicity Clemens Stata Users’ Group meeting London, 17 & 18 th May 2005.
FLIPPING THE CLASSROOM: ADVENTURES IN STUDENTS’ SELF DIRECTED STUDY ERI TOMITA AND JULIE DEVINE.
Computing for Research I Spring 2011 Primary Instructor: Elizabeth Garrett-Mayer Stata Programming February 28.
Computing for Research I Spring 2013 Primary Instructor: Elizabeth Garrett-Mayer Stata Programming February 21.
ANOVA: PART II. Last week  Introduced to a new test:  One-Way ANOVA  ANOVA’s are used to minimize family-wise error:  If the ANOVA is statistically.
The World’s Fastest Crash Course in Statistics Or, What You Need to Know to Answer Your Research Question 13 November 2006.
1 Creating and Tweaking Data HRP223 – 2010 October 24, 2011 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This.
1 An Introduction to IBM SPSS PSY450 Experimental Psychology Dr. Dwight Hennessy.
Introduction to Statistical Computing in Clinical Research Biostatistics 212 Course director: Mark Pletcher Teaching Assistant: Lee Zane.
Division Example 2x - 3y + 4z = 10 x + 6y - 3z = 4 -5x + y + 2z = 3 A*X = B where A = B = >> X = A\B X =
CSC 160 Computer Programming for Non-Majors Lecture #7: Variables Revisited Prof. Adam M. Wittenstein
Introduction to a Programming Environment
Slide 1. Slide 2 Administrivia Nate's office hours are Wed, 2-4, in 329 Soda! TA Clint will be handing out a paper survey in class sometime this week.
Games and Simulations O-O Programming in Java The Walker School
Simple Python Loops Sec 9-7 Web Design.
Kids Computer Club House
STATA User Group September 2007 Shuk-Li Man and Hannah Evans.
L1: INTRODUCTION Getting started with Stata Angela Ambroz May 2015.
TECHNOLOGY TOOLS BY ALEXA BELTZ & MICKAELA PEREZ.
L2: BECOMING SELF- SUFFICIENT IN STATA Getting started with Stata Angela Ambroz May 2015.
Stata 12 Merging Guide Nathan Favero Texas A&M University October 19, 2012.
An Introduction to Textual Programming
Making a Timer in Alice.
SAS Workshop Lecture 1 Lecturer: Annie N. Simpson, MSc.
Project organisation in Stata Adrian Spoerri and Marcel Zwahlen Department of Social and Preventive Medicine University of Berne, Switzerland Research.
Scottish Social Survey Network: Master Class 1 Data Analysis with Stata Dr Vernon Gayle and Dr Paul Lambert 23 rd January 2008, University of Stirling.
CS161 Topic #21 CS161 Introduction to Computer Science Topic #2.
Discussion of Assignment 9 1 CSE 2312 Computer Organization and Assembly Language Programming Vassilis Athitsos University of Texas at Arlington.
Experiences with multiple propensity score matching Jan Hagemejer & Joanna Tyrowicz University of Warsaw & National Bank of Poland.
Key Data Management Tasks in Stata
STATA Mini Course Fall 2015 Jane Leber Herr Littauer 113 1Stata Mini Course – Spring 2015.
Introduction to STATA for Clinical Researchers Jay Bhattacharya August 2007.
Chapter 7 File I/O 1. File, Record & Field 2 The file is just a chunk of disk space set aside for data and given a name. The computer has no idea what.
1 Lab 2 and Merging Data (with SQL) HRP223 – 2009 October 19, 2009 Copyright © Leland Stanford Junior University. All rights reserved. Warning:
Diagnostic Pathfinder for Instructors. Diagnostic Pathfinder Local File vs. Database Normal operations Expert operations Admin operations.
Introduction to Statistical Computing in Clinical Research Biostatistics 212.
Getting Started With Stata Session 1 Jim Anthony John Troost Department of Epidemiology Michigan State University.
BMTRY 789 Lecture 11: Debugging Readings – Chapter 10 (3 rd Ed) from “The Little SAS Book” Lab Problems – None Homework Due – None Final Project Presentations.
11/25/2015Slide 1 Scripts are short programs that repeat sequences of SPSS commands. SPSS includes a computer language called Sax Basic for the creation.
Getting Started with Stata 2/11/2010 Tom Tomberlin Nealia Khan Learning Technologies Center Harvard Graduate School of Education.
STATA for S-052 M. Shane Tutwiler Your Friendly S-040 Lecturer William Johnston IT Services Harvard Graduate School of Education.
Updated on: September 4, 2010 CIS67 Foundations for Creating Web Pages Professor Al Fichera.
Comparison of different output options from Stata
Coach Michele’s Group Coaching July 5, Copyright (c) Michele Caron, 2011 Today’s Topic Success and Productivity – The Power of No.
Introduction to Computer Programming - Project 2 Intro to Digital Technology.
1) the value you want to look up Vlookups are really easy......just remember you need 4 things: 2) The table range you want to look up to 3) The column.
R Workshop #2 Basic Data Analysis. What we did last week: Understand the basics of how R works Generated objects (vectors, matrices, etc.) Read in data.
Stata Review Session Economics 1018 Abby Williamson and Hongyi Li November 17, 2006.
: LSS1 Longitudinal Studies Seminars: Longitudinal Analyses Using STATA Stirling University, Data and Variable Management Paul Lambert.
Today’s Agenda ML Development Workflow –Emacs –Using use –The REPL More ML –Shadowing Variables –Debugging Tips –Boolean Operations –Comparison Operations.
Ec 2390: Section 1 Useful STATA commands Jack Willis September 14th, 2015.
King Faisal University جامعة الملك فيصل Deanship of E-Learning and Distance Education عمادة التعلم الإلكتروني والتعليم عن بعد [ ] 1 جامعة الملك فيصل عمادة.
Data Screening. What is it? Data screening is very important to make sure you’ve met all your assumptions, outliers, and error problems. Each type of.
Welcome to Introduction to Psychology! Let’s share a bit about where we are all from…
Mannheim Research Institute for the Economics of Aging SHARE Data Cleaning General rules and procedures Stephanie Stuck MEA Antwerp.
Econ 326 Prof. Mariana Carrera Lab Session X [DATE]
Introduction to Python
Econometrics 704 Emilio Cuilty
ECONOMETRICS ii – spring 2018
How to Run a Java Program
Introduction to Stata Spring 2017.
STATA User Group September 2007
Introduction to TouchDevelop
How to Run a Java Program
Stata Basic Course Lab 4.
Stata Basic Course Lab 2.
Ordinary Least Square estimator using STATA
Presentation transcript:

L3: BIG STATA CONCEPTS Getting started with Stata Angela Ambroz May 2015

Today Homework review and questions How do we work with multiple datasets? What is a loop? How do we loop? What are locals, and why are they helpful? Which commands can I use for some inferential statistics?

Review Homework solution video will be up this weekend: Lecture 2’s key messages: Organizing a.do file Importing data (use file.dta, clear) Cleaning Basic summary statistics (tab q1) Questions?

Big concepts in Stata First-level benefit of Stata: replicable.do files, full suite of statistical tests But the real power: 1. Combining datasets (merge, append) 2. Loops (foreach) 3. Macros Understanding these concepts will make you work faster and more reliably

Multiple datasets Sometimes your data is spread out across 2+ files Stata has two commands for combining datasets: 1. append – More people (going long) 2. merge – Same people, more questions (going wide) The character “Data” (an android) from Star Trek: The Next Generation.

The append command When you use append, you are adding a dataset to the bottom of another one idnameagegenderurban_ rural 001Elvis42Maleurban 002Youdi19Malerural 003Nellin22Femalerural idnameagegenderurban_ rural 675Miriam50Femaleurban 918Angela41Femalerural file1.dta file2.dta

The append command use file1.dta, clear append using file2.dta idnameagegenderurban_ rural 001Elvis42Maleurban 002Youdi19Malerural 003Nellin22Femalerural 675Miriam50Femaleurban 918Angela41Femalerural file1.dta file2.dta

The merge command idnameagegenderurban_ rural 001 Elvis 42Maleurban 002 Youdi 19Malerural 003 Nellin 22Femalerural idnameasset_q uint educati on 421 Angela Youdi Natalie 5 3 file1.dtafile2.dta When you use merge, you are combining datasets about the same people/households/observations.

The merge command idnameagegenderurban_ rural 001 Elvis 42Maleurban 002 Youdi 19Malerural 003 Nellin 22Femalerural idnameasset_q uint educati on 421 Angela Youdi Natalie 5 3 file1.dtafile2.dta merge 1:1 id using file4.dta

The merge command idnameagegenderurban_ rural 001 Elvis 42Maleurban 002 Youdi 19Malerural 003 Nellin 22Femalerural idnameasset_q uint educati on 421 Angela Youdi Natalie 5 3 file1.dtafile2.dta merge 1:1 id using file4.dta

The merge command idnameagegenderurban_ru ral asset_qu int educatio n _merge 001 Elvis 42Maleurban Youdi 19Malerural Nellin 22Femalerural Natalie Angela file1.dta merged with file4.dta (Note the gaps!)

The merge command When you merge, you combine two datasets about the same people. You’re adding more questions/variables. idnameagegenderurban_ rural 001Elvis42Maleurban 002Youdi19Malerural 003Nellin22Femalerural

The merge command use file1.dta, clear merge 1:1 id using file2.dta idnameagegenderurban_ rural asset_ quint educat ion 001Elvis42Maleurban35 002Youdi19Malerural13 003Nellin22Femalerural12 file1.dta file2.dta

The merge command To merge, you need to use a unique identifier. (You can check if a variable is a unique identifier by using the command isid.) Here, id is a unique identifier. idnameagegenderurban_ rural asset_ quint educat ion 001Elvis42Maleurban35 002Youdi19Malerural13 003Nellin22Femalerural12

The merge command GOOD identifiers: long, complicated numbers BAD identifiers: strings (e.g. names!) Always review your merge report:

The merge command GOOD identifiers: long, complicated numbers BAD identifiers: strings (e.g. names!) Always review your merge report: _merge is an automatically- created variable that tells you which dataset each observation comes from

The merge command idnameagegenderurban_ru ral asset_qu int educatio n _merge 001 Elvis 42Maleurban Youdi 19Malerural Nellin 22Femalerural Natalie Angela file1.dta merged with file2.dta (Note the gaps!)

Big Stata concept #2: Loops This is a loop. Dizzee Rascal, “I Don’t Need a Reason” (2013)

Loops A loop: Repeating the same action over and over… and over… again. This is what computers were made for. The “mind blown” Internet GIF meme. This is another loop.

Loops: An example Suppose you have 10 variables (q1, q2,...,q10) that all need to be cleaned. You want to replace a value with. (missing) whenever you see -888 (no response in the survey). replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 this is one way to do it. this is technically correct. IT WILL WORK.

Loops: An example Suppose you have 10 variables (q1, q2,...,q10) that all need to be cleaned. You want to replace a value with. (missing) whenever you see -888 (no response in the survey). foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 } this is the LOOP way. it is better.

Dissecting the loop Are you copy+pasting? Then you should use a loop. Which part of the command repeats? Which changes? replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Dissecting the loop Are you copy+pasting? Then you should use a loop. Which part of the command repeats? Which changes? replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Dissecting the loop Tell Stata that you’re starting a loop: foreach Tell Stata to use a placeholder: i Replace every placeholder (`i’) with a value from the variable list, q1 to q10 (varlist q1-q10) …and run the command on each value replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Dissecting the loop Tell Stata that you’re starting a loop: foreach Tell Stata to use a placeholder: i Replace every placeholder (`i’) with a value from the variable list, q1 to q10 (varlist q1-q10) …and run the command on each value replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Dissecting the loop Tell Stata that you’re starting a loop: foreach Tell Stata to use a placeholder: i Replace every placeholder (`i’) with a value from the variable list, q1 to q10 (varlist q1-q10) …and run the command on each value replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Dissecting the loop Tell Stata that you’re starting a loop: foreach Tell Stata to use a placeholder: i Replace every placeholder (`i’) with a value from the variable list, q1 to q10 (varlist q1-q10) …and run the command on each value replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Dissecting the loop Tell Stata that you’re starting a loop: foreach Tell Stata to use a placeholder: i Replace every placeholder (`i’) with a value from the variable list, q1 to q10 (varlist q1-q10) …and run the command on each value replace q1=. if q1==-888 replace q2=. if q2==-888 replace q3=. if q3==-888 replace q4=. if q4==-888 replace q5=. if q5==-888 replace q6=. if q6==-888 replace q7=. if q7==-888 replace q8=. if q8==-888 replace q9=. if q9==-888 replace q10=. if q10==-888 foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

You can loop almost anything Loop over the numbers from 1 to 100: forval i=1/100 Loop over a group of names: foreach i in “Angela” “Jean-Luc” “Ada” NOTE: You can call your placeholder anything you want (i, j, variable, stuff, thingie, x, etc…) NOTE: The syntax for loops is tricky, use help foreach a lot! (I still use help foreach every time I run a loop. And it’s been YEARS.)

Loop syntax: A note about quotes And please don’t forget: when using your placeholder in your command, you need to use a weird quote mark (`) and a non-weird one (’): `i’ foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 }

Loop syntax: A note about quotes And please don’t forget: when using your placeholder in your command, you need to use a weird quote mark (`) and a non-weird one (’): `i’ foreach i of varlist q1-q10 { replace `i’=. if `i’==-888 } Just one of those things.

Big Stata concept #3: Macros A convenient transition! This placeholder is actually our next big Stata concept Macros – AKA locals and globals `i’ a local! or maybe a global!

What is a macro? A macro is a placeholder for some text It only works in a.do file (not in the Command Editor) Rather than typing some text again and again, you save it as a macro Some example macros: local population_mean_age = 18 ttest sample_age = `population_mean_age’

What is a macro? A macro is a placeholder for some text It only works in a.do file (not in the Command Editor) Rather than typing some text again and again, you save it as a macro Some example macros: global my_folder “C:/MyDocuments/Stata” use “$my_folder/file1.dta”, clear

What is a macro? A macro is a placeholder for some text It only works in a.do file (not in the Command Editor) Rather than typing some text again and again, you save it as a macro Some example macros: local control_vars q1 q2 q3 q4 foreach var of local control_vars { rename `var’ `var’_control }

What is a macro? A macro is a placeholder for some text It only works in a.do file (not in the Command Editor) Rather than typing some text again and again, you save it as a macro Some example macros: summarize resp_farmer local avg_rural`r(mean)’

Two types of macros GLOBALSLOCALS

Two types of macros GLOBALS - Stata remembers them for as long as it’s open - Call them like $this LOCALS -Stata only remembers them for the duration of the.do file -Call them like `this’

Stata has its own (hidden) macros With many commands, Stata generates a list of macros See what’s saved via return list and ereturn list These are handy if you want to use them for something in a loop

Stata has its own (hidden) macros With many commands, Stata generates a list of macros See what’s saved via return list and ereturn list These are handy if you want to use them for something in a loop

Stata has its own (hidden) macros How can I see which globals are active during this session of Stata? macro list

Inferential statistics This mini-course will not cover statistics Great resources: Khan Academy, Coursera, EdX, Udacity Many built-in commands for inferential statistics: pwcorr ttest anova sampsi sampclus reg logit Use help to learn more

Some examples Say you want to run a naïve OLS regression to see the effect of x on y: reg y x …with multiple control variables (x1, x2, x3): reg y x1 x2 x3 …with fixed effects for regions (region): xi: reg y x1 x2 x3 i.region OR areg y x1 x2 x3, absorb(region) …with clustering of standard errors at the school-level: reg y x, cluster(school) OR reg y x, vce(cluster school)

Homework Homework 3 – Merge some data, do some simple stats. Optional: check statistical significance with a t-test. Instructions are on the course website: Stuck? /Skype. Finished? your completed.do file to me. DEADLINE: Wednesday, 27 May 2015