L2: BECOMING SELF-SUFFICIENT IN STATA. Getting started with Stata. Angela Ambroz, May 2015.


Today
- Homework review and questions
- Writing our first .do file
- Commands, commands, commands
- Data cleaning
- The dreaded error message

Review
- Homework 1 solution video is up.
- Lecture 1's key messages:
  - .do files are your friend
  - Stata syntax is similar to programming
  - The only command you need to know is help
- Questions?

Writing our first .do file
- Today, let's walk through creating our first .do file
- We'll learn best practices on how to organize things
- We'll learn the basics of:
  1. Importing data
  2. Cleaning data
  3. Exploring data
  4. Outputting basic summary statistics
- Once this is done, the world is your oyster!

The .do file lifestyle and philosophy
- Before we begin, let's talk art and beauty
- Your .do file is both the analysis and the presentation of the analysis
- Making it clear, even beautiful, will save you lots of confusion later
- An example of something beautiful: "Conversion on the Way to Damascus", by Caravaggio (1601)
- Think: Who will read my .do file?
- Think: Will my .do file run on ANY computer, as-is?

The worst .do file I could think of
Notes on the example:
- abbreviated commands
- it breaks here!
- it overwrites the data! aaaaaaaah
- it browses the data, ugh
- omg it browses it AGAIN
Also:
- It's a giant pile of text (no spaces)
- There are ZERO informative comments
- Who wrote this?
- Why?!
- We'll never know.
- This doesn't run on my computer!

A much better .do file
- abundant comments to explain and organize
- headings, to organize the work
- friendly: sets up the user's Stata (clears previous data, closes previous logs)
- brief explanations of each substantial piece of work

Good .do file structure
- LOTS of comments
- LOTS of space
- Include information about the .do file's author, origin date, purpose, input and output data
- Make sure it can run on other people's computers
- If at all possible, do your analysis without overwriting your data

Good .do file structure
Be considerate to your two main audiences:
1. A forgetful you in the future
2. Other people with different machines
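The structure described above could look something like the following. This is a minimal sketch, not the file shown in the lecture: the header fields are placeholders to fill in, and Stata's built-in auto dataset stands in for real data so that the file runs on any machine.

```stata
*------------------------------------------------------------
* Project : (project name)
* Author  : (your name)
* Created : (date)
* Purpose : (one sentence on what this .do file does)
* Input   : (dataset that is read)
* Output  : (files that are written)
*------------------------------------------------------------

clear all                 // drop any data left in memory
capture log close         // close an open log, if any, without stopping on error
set more off              // do not pause on long output

* --- 1. Import ---------------------------------------------
sysuse auto, clear        // built-in example data (assumption: stands in for real data)

* --- 2. Explore --------------------------------------------
describe, short
```

The header answers "who wrote this, and why?" for the forgetful future you, and the setup lines mean the file starts from the same state on anyone's machine.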

Good folder structure
Organizing your folder well will also save you lots of time in the future!
- A "raw data" folder: this contains a copy of the data, as you received it from the survey company. This data is never altered.
- A "data" folder: this contains the data that you use for analysis (and may alter).
- A "logs" folder: here, you keep logs of all of your Stata sessions. Logs let you go back and see the input and output of your work.
- Your .do files: you can number them in the order that they are meant to be called. A "master" .do file can include lots of information about the project and analysis.

Sketch our .do file on paper
GOAL: Produce summary statistics on the Sauti za Wananchi constitutional round data (szw-constitution.dta). Put Angela out of a job.
TO DO
1. Import the Sauti data.
2. Make sure it's clean.
3. Find out the percentage of people that are:
   - Aware of the constitutional draft process.
   - Planning to vote for the new constitution.

Making a .do file
TO DO
1. Import the Sauti data.
2. Make sure it's clean.
3. Find out the percentage of people that are:
   - Aware of the constitutional draft process.
   - Planning to vote for the new constitution.

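1. Importing data
To import Stata datasets (.dta):
    use "file.dta", clear
To import Excel files (.xls):
    import excel "file.xls", clear
To import comma-separated files (.csv):
    insheet using "file.csv", clear
(In Stata 13 and later, import delimited replaces insheet.)
Use straight quotes (") around file names; curly quotes pasted from slides or Word will cause an error.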
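To try the three import commands without any files of your own, a round trip like the one below may help. It is only a sketch: the file names auto_example.csv and auto_example.xlsx are made up for illustration, both files are written to the current working directory, and the built-in auto dataset stands in for real data.

```stata
sysuse auto, clear                                   // built-in example dataset

* Write the example data out as CSV and Excel
export delimited using "auto_example.csv", replace
export excel using "auto_example.xlsx", firstrow(variables) replace

* Read each file back in
import delimited using "auto_example.csv", clear     // comma-separated text
describe, short

import excel using "auto_example.xlsx", firstrow clear   // first row holds variable names
describe, short
```

The firstrow option tells import excel to treat the first row of the sheet as variable names rather than as data.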

File paths
Note: You need to explicitly tell Stata in which folder to look for your file.
Quick version:
    use "C:/MyDocs/file.dta", clear
Better version:
    cd "C:/MyDocuments/"
    use "file.dta", clear
Best version:
    global myfolder "C:/MyDocuments"
    use "$myfolder/file.dta", clear
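The global-macro approach combines naturally with a raw data / data / logs folder layout and a "master" .do file. The sketch below is an illustration under stated assumptions: the project path, the folder names, and the numbered .do file names are all hypothetical and would need to exist on your machine before this runs.

```stata
* 00_master.do (hypothetical file name)
* Only the first line below needs editing when the project moves to another computer.
global project "C:/MyDocuments/myproject"    // hypothetical path
global raw     "$project/raw data"           // untouched copy of the data as received
global data    "$project/data"               // data used (and altered) for analysis
global logs    "$project/logs"               // session logs

capture log close
log using "$logs/master.log", replace text

do "$project/01_import.do"                   // hypothetical: reads from $raw, saves to $data
do "$project/02_clean.do"                    // hypothetical
do "$project/03_tables.do"                   // hypothetical

log close
```

Keeping every path relative to one global means a collaborator changes a single line instead of hunting through the whole file.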

Checking the data out
Some good commands to just get to know the data:
- browse
- describe
- describe, short
- summarize
- codebook
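These commands can be tried on Stata's built-in auto dataset. A short sketch, with the variable names price, mpg and rep78 taken from that example dataset rather than from the Sauti data:

```stata
sysuse auto, clear

describe, short            // number of observations and variables
describe                   // names, storage types and labels of every variable
summarize price mpg        // observations, mean, std. dev., min and max
summarize price, detail    // adds percentiles, skewness and kurtosis
codebook rep78             // type, range, unique values and missing count for one variable
count if missing(rep78)    // how many observations have no value for rep78
```

browse is left out here because it opens the Data Editor window and is better typed interactively than kept in a .do file.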

Cleaning the data
- 70%* of all data "analysis" is actually just data cleaning
- Data cleaning means preparing the data for analysis
- This can include:
  - Converting strings to numerics (e.g. "1,756" -> 1756)
  - Creating new variables based on old ones (e.g. avg_weekly_mobile_credit = avg_daily_mobile_credit * 7)
  - Checking for any weird observations (duplicates, all missing, etc.)
  - Deciding how to treat outliers.
  - Whatever the data needs!
- Data cleaning is more art than science.
* 80% of all statistics are made up.
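The cleaning tasks listed above can be sketched on a tiny made-up dataset. Everything typed in the input block is invented for illustration, and credit_str and id are hypothetical variable names; only the two mobile-credit variable names come from the slide.

```stata
clear
input str8 credit_str id
"1,756" 1
"320"   2
"320"   2
""      3
end

* Convert the string to a number, ignoring the thousands separator
destring credit_str, generate(avg_daily_mobile_credit) ignore(",")

* Create a new variable from an old one
generate avg_weekly_mobile_credit = avg_daily_mobile_credit * 7

* Check for weird observations
duplicates report id                          // how often each id appears
duplicates drop                               // drop rows identical on every variable
count if missing(avg_daily_mobile_credit)     // observations with no credit value

list
```

Using generate() inside destring keeps the original string variable, so the conversion can be checked against it before the string is dropped.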

Cleaning the data: An example
GOAL: Let's create a variable out of q3 that records whether a person knows the steps or not.

Cleaning the data: An example
    generate knows_process = .
    replace knows_process = 1 if q3==1
    replace knows_process = 0 if q3!=1 & q3!=.
    label def knowledge 1 "yes" 0 "no"
    label val knows_process knowledge
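The same recode can be written more compactly and checked with a cross-tabulation. This is a sketch on made-up values: it assumes, as the slide's code implies, that q3 equal to 1 means the person knows the steps, and the four q3 values typed in below are invented.

```stata
clear
input q3
1
2
3
.
end

* (q3 == 1) evaluates to 1 when true and 0 when false;
* the if condition leaves knows_process missing wherever q3 is missing
generate knows_process = (q3 == 1) if !missing(q3)

label define knowledge 1 "yes" 0 "no"
label values knows_process knowledge

* Check the new variable against the old one
tabulate q3 knows_process, missing
```

missing(q3) also catches Stata's extended missing values (.a, .b, and so on), which a comparison with q3!=. alone would not.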

Basic summary statistics
TO DO
1. Import the Sauti data.
2. Make sure it's clean.
3. Find out the percentage of people that are aware of the constitutional draft process, and how people plan to vote in the referendum.
The variables we need:
- q1: Awareness of the constitutional draft process
- q12: How people plan to vote in the referendum
There are many different ways to find frequencies: we'll use the popular tabulate.

Basic summary statistics
Input: tab q1, m
Output: [Stata results table for q1]
Reading the input:
- tab: the command. Tabulate the frequencies of each answer for variable q1
- m: an option. Tell me how many observations are missing
Reading the output:
- The variable's label
- Freq.: number of people in each category
- Percent: percentage of people in each category
- Cum.: the cumulative percentage
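To see this output without the Sauti data, the same command can be run on Stata's built-in auto dataset. A small sketch, where rep78 (a variable with some missing values in that dataset) stands in for q1:

```stata
sysuse auto, clear

tabulate rep78, missing    // long form of: tab rep78, m
tabulate rep78             // without the option, missing values are left out of the table
```

Comparing the two tables shows why the option matters: percentages are computed over all observations in the first and only over non-missing ones in the second.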

Now what?
Options to output from Stata:
- Copy + paste
- Generate an Excel file of summary statistics: tabout
- Generate a new dataset: save, outsheet, export
- Generate a Word document of estimation results: outreg2
- Create visualizations? graph bar (bar chart)
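The built-in options can be sketched as follows. The output file names are made up, the files are written to the current working directory, and the auto dataset stands in for real data. tabout and outreg2 are community-contributed commands that have to be installed first (for example with ssc install tabout), so they are not run here.

```stata
sysuse auto, clear

* Generate a new dataset
save "auto_copy.dta", replace                                        // Stata format
export delimited using "auto_copy.csv", replace                      // CSV (newer replacement for outsheet)
export excel using "auto_copy.xlsx", firstrow(variables) replace     // Excel

* Create a visualization: mean price for domestic and foreign cars
graph bar (mean) price, over(foreign)
```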

The error message :(

Sometimes Stata gives you an error. Don't panic!
Sometimes the error explains what went wrong (this is rare...)
Error-checking:
- Try to isolate your error: run each line, line by line
  - Run it from your .do file (highlight the line and CTRL+D)
  - Re-type it in the Command window
- Is there a typo somewhere in your code?
- Did you try help?
- The rubber duck technique
- Google!
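When isolating an error inside a .do file, capture can help, because it lets the file keep running and stores the result in _rc (zero means the command worked). This goes slightly beyond what the slide covers and is offered only as a sketch; the misspelled variable pricee is deliberate, and the auto dataset stands in for real data.

```stata
sysuse auto, clear

* A typo on purpose: there is no variable called pricee
capture noisily summarize pricee
display "return code: " _rc          // nonzero, so the command above failed

* Check that a variable exists before using it
capture confirm variable price
if _rc == 0 {
    summarize price
}
else {
    display as error "variable price not found"
}
```

capture noisily still prints the error message, so the failure stays visible while the rest of the file runs.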

The rubber duck technique
- Used in programming
- You explain what you're doing, out loud, to something at your desk (like a rubber duck)
- Often, as you explain, the solution reveals itself
- The duck: here to help.

Homework
- Homework 2: Write your first .do file. Instructions are in the Google form.
- Stuck? Email/Skype.
- Finished? Email your completed .do file to me. Fill out the Google form.
- DEADLINE: COB Friday, 22 May 2015