Data Analysis using Stata workshop #4 / Kristin Bott reed.edu > K.Bott / Instructional Technology Services Reed College / Portland, OR.

Where we’re headed (<60 min)
– Step zero: how to find and pilot Stata
– Step one: basic data skills
– Step two: external data
– Step three: data analysis
– Help and resources

Step zero : How to find + pilot

Accessing Stata
Where:
– Eliot 110 / PPW (social sciences), Psychology computer lab
– (mLab, faculty use)
– IRCs in the ETC
– Library, various
Note: “Keyed” software
– currently a limited number of seats for Reed users
– If you run into problems w/ access, let me know
Options for buying your own copy

Accessing Stata
On your computers: Applications > StataSE
Open Stata! … and let’s take a look.

Stata windows (annotated screenshot)
– Review: What did I do?
– Variables: What data am I working with?
– (variable) Properties: Data details
– Command: Current action is here!
– Results: What happened?
Point + click (GUI) vs command-line

Stata datasets
Pre-loaded, useful for training / learning
Type:
. sysuse dir
. set more off
. sysuse census
(1980 Census data by state)

Step one: basic data skills

Data is loaded: now what?
What sort of information do you want to know about your data?
How can we get to this information?

Some things you may want to know
– Range of data / outliers
– Missing values [How many? How coded?]
– Types of variables
– Variation of data

first-glance tools
. summarize (whole dataset)
. summarize marriage
  Observations / mean / StdDev / Min / Max
. describe (whole dataset)
. describe marriage
  var name / storage type / disp format / value label / var label
. codebook (whole dataset)
. codebook [var]
  type / range / units / unique / missing / mean / std dev / %tiles

Variable storage types
Describe or variable window shows “storage type”
Numbers: byte, int(eger), long, float, double
– vary in precision + memory they use (bytes)
Letters: string
– str1, str2, … str244 (up to str2045, plus strL, in Stata 13 and later)
Question :: Why does this matter?
You can’t find the mean of words...
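A minimal sketch of why storage type matters, using the pre-loaded auto dataset rather than the workshop data. The variables mpg_txt and mpg_num are made up for illustration: they mimic numbers that arrived as text, and destring is one common way to convert them back.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* Storage type is the second column of describe's output
describe make mpg price

* make is a string, so summarize reports 0 observations for it
summarize make

* mpg is numeric, so a mean can be computed
summarize mpg

* Hypothetical case: numbers that were read in as text
generate mpg_txt = string(mpg)
destring mpg_txt, generate(mpg_num)

* The converted copy matches the original numeric variable
assert mpg_num == mpg
```

If summarize shows 0 observations for a variable you expected to be numeric, checking its storage type with describe is a reasonable first step.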

first-glance tools
. codebook (whole dataset)
. codebook marriage
  type / range / units / unique / missing / mean / std dev / %tiles
Why is it missing? (didn’t ask / didn’t answer / doesn’t apply … other?)
How is “missing” coded?
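A short sketch of checking for missing values, assuming the pre-loaded auto dataset (its rep78 variable has a few gaps). The -9 code in the last comment is a hypothetical example of a numeric "missing" code from an outside source, not something in this dataset.

```stata
sysuse auto, clear

* Numeric missing values are stored as . (or .a through .z)
count if missing(rep78)

* One-table overview of every variable that has missing values
misstable summarize

* Missing sorts above every number, so exclude it explicitly
count if rep78 > 4 & !missing(rep78)

* Hypothetical: a source file that used -9 for "didn't answer"
* mvdecode rep78, mv(-9)
```

The explicit !missing() check matters because a condition like rep78 > 4 on its own also matches the missing rows.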

Stata datasets
Pre-loaded, useful for training / learning
Type:
. sysuse dir
. sysuse auto
(1978 Automobile data)

some first-glance tools
. tabulate [var]
. tabulate foreign
  variable / frequency / percent / cumulative %
. tabulate [var1] [var2]
. tabulate foreign rep78
  what does this do?
. tabulate [var2] [var1]
  what does this do?

kbott’s first-glance toolbox
For dataset or [var] or [var1] [var2]
. summarize
. codebook
. describe
. tabulate
. inspect
. browse
. list

Basics: [browse] subsets of data
. browse if foreign == 1 (equals)
. browse if foreign ~= 1 (not equal)
. browse if foreign != 1 (not equal)
. browse if mpg > 5 & mpg < 20 (& joins multiple conditions)
. browse mpg in 1/10 (range of observations)
Can also use list to see results in the Results window
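The same if and in qualifiers work with most commands, not only browse. A minimal sketch using list and count on the auto dataset (the particular cut-offs are arbitrary):

```stata
sysuse auto, clear

* if keeps rows where the condition is true
list make mpg if foreign == 1 & mpg > 30

* in keeps rows by position (observations 1 through 10)
list make mpg in 1/10

* | means "or"; count reports how many rows match
count if foreign == 1 | mpg < 15

* Both qualifiers can be combined
list make price in 1/20 if foreign == 0
```

Because list prints to the Results window, its output is also captured in a log file, which browse is not.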

Basics: alter data
. sort var (sorts from low to high)
. drop var (drop variable, keep rest)
. keep var (keep variable, drop rest)
. replace var (replace values of an existing variable)
. generate var (generate new variable)
. egen var (extended generate)
. clear (clears data from memory, does not erase the file on disk)
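One way these commands fit together, sketched on the auto dataset. The new variable names wt_tons and mean_mpg are illustrative only.

```stata
sysuse auto, clear

* sort: lowest mpg first
sort mpg

* generate: new variable computed row by row
generate wt_tons = weight / 2000

* replace: overwrite values of an existing variable
replace wt_tons = round(wt_tons, 0.1)

* egen: functions that work across rows, here the overall mean
egen mean_mpg = mean(mpg)

* drop removes the named variables; keep removes everything else
drop trunk turn
keep make mpg wt_tons mean_mpg foreign

describe
```

None of this touches the file on disk; the changes live in memory until you save.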

Example: population growth / West
. sysuse census
. browse
. gen pred = 1.05*pop if region=="West"
generates a new variable pred based on population and region data
what happened? what did we do wrong? what do we do now?
type mismatch, eh? how is region coded? how can we fix this?

Example: population growth / West
. sysuse census
. gen pred = 1.05*pop if region==4
. replace pred = pop if region!=4
. br region state pop pred
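A sketch of how you might find out that West is stored as 4, and a second way to write the condition. The variable pred2 is added here only for comparison; the value label's name is looked up rather than assumed.

```stata
sysuse census, clear

* region is numeric with a value label attached
describe region

* Look up the name of that value label, then show its codes
local lbl : value label region
label list `lbl'
tabulate region, nolabel

* Option 1: compare against the numeric code
generate pred = 1.05*pop if region == 4
replace pred = pop if region != 4

* Option 2: let Stata translate the label text into its code
generate pred2 = 1.05*pop if region == "West":`lbl'
replace pred2 = pop if region != "West":`lbl'

* Both approaches give the same result
assert pred == pred2
```

Option 2 reads more clearly in a do file, at the cost of depending on the exact label text.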

Step two: external data

Accessing external data
Stata datafiles = *.dta
StatTransfer can help transform other formats
– downloads.reed.edu
– Key Client
Bringing data into Stata
– insheet: for spreadsheets saved as *.csv
– import excel: for *.xls / *.xlsx files
– Also accessible through menus (may be easier)

Example: Final exam scores, Math 141
Question: Does spending longer on your exam affect your final grade?
Data via Albert Kim
File is in the folder you downloaded
File > Import > Excel > import first row as variable names
Check your working directory!
. pwd
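The menu steps above have command-line equivalents. The sketch below does not use the workshop's exam file; it writes two throwaway files from the auto dataset (auto_demo.csv and auto_demo.xlsx are made-up names) and reads them back. It assumes Stata 13 or later, where import delimited is the newer counterpart to insheet.

```stata
* Where will Stata look for files?
pwd

* Build small demo files from the auto data
sysuse auto, clear
export delimited using "auto_demo.csv", replace
export excel using "auto_demo.xlsx", firstrow(variables) replace

* Comma-separated text
import delimited using "auto_demo.csv", clear
describe, short

* Excel, using the first row as variable names
import excel using "auto_demo.xlsx", firstrow clear
describe, short
```

If Stata reports that a file is not found, compare the output of pwd with the folder the file is actually in, and use cd to move there.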

Step three: data analysis + visualization

Visualize: Test score data
Grades
. hist grade, freq
. hist grade, frac
. hist grade
. hist grade, bin(5)
. hist grade, bin(20)
. hist grade, norm bin(10)

Visualize: Test time data
Distribution of time for exam
. histogram time
. histogram time if major == "Biology"
. histogram time if major == "Economics"

Visualize: test score + time data
. scatter grade time
. twoway lfit grade time
. scatter grade time || lfit grade time
Do you believe that result? What else do you want to know?
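The same plots can be tried on the pre-loaded auto dataset if the exam file is not at hand. In this sketch mpg and weight stand in for grade and time.

```stata
sysuse auto, clear

* Histogram with frequencies, 10 bins and a normal curve overlaid
histogram mpg, frequency bin(10) normal

* Histogram for a subset of rows
histogram mpg if foreign == 1, bin(5)

* Scatter plot with a fitted line drawn on top; || separates layers
twoway scatter mpg weight || lfit mpg weight
```

Each new graph replaces the previous one in the Graph window unless you give it a name with the name() option.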

Analyze: test score data
. correlate grade time
. regress grade time
. ttest grade time
What is the problem here?

Analyze: test score data (t-test)
By major, is there a significant difference in the amount of time taken for exams?
. ttest time, by(major)

Analyze: test score data (t-test)
Is there a significant difference between the amount of time that bio majors + econ majors take for their exams?
. ttest time if major=="Biology" | major=="Economics", by(major)
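The by() option of ttest needs exactly two groups, which is why the command above restricts the data to two majors first. A sketch of the same pattern on the auto dataset, where foreign has two groups and rep78 has five:

```stata
sysuse auto, clear

* foreign has exactly two groups, so by() works directly
ttest mpg, by(foreign)

* rep78 has five groups; restrict to two of them first
ttest mpg if rep78 == 3 | rep78 == 4, by(rep78)

* One-sample form: is mean mpg different from 20?
ttest mpg == 20
```

The value 20 in the last command is an arbitrary comparison point chosen for illustration.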

Analyze: test score data (ANOVA)
Within majors, does amount of time spent on an exam significantly affect the grade (outcome) of exam?
. bysort major: anova grade time
. bysort major: oneway grade time
Note: many ANOVA permutations; see Psych Stata page, UCLA resources, and/or K.Bott for more info

Analyze: test score data (regression)
. regress grade time
Within majors, does amount of time spent on an exam significantly affect the grade (outcome) of exam?
. bysort major: regress grade time
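A sketch of the pooled versus by-group pattern on the auto dataset, with foreign standing in for major. The by prefix requires data sorted on the group variable; bysort does the sort and the repeat in one step.

```stata
sysuse auto, clear

* Separate regression for each group (Domestic, then Foreign)
bysort foreign: regress mpg weight

* Pooled regression: one line fitted to all cars
regress mpg weight
```

Comparing the slope on weight across the three sets of output is a quick way to see whether the groups behave differently.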

Help + additional resources

Additional packages / tools
Need a tool that isn’t there? Find it!
. findit outreg
. ssc install outreg

Stata lab notebook: Log files
Example: click log button on menu (GUI)
Watch results window
. log using [filename]
. log off / log on (Suspend / Resume in the GUI)
. log close
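A minimal logging session, assuming you want a plain-text log. The file name wkshp_demo.log is made up for this sketch.

```stata
* Close any log left open by an earlier session (no error if none)
capture log close

* Start a plain-text log, overwriting any file of the same name
log using "wkshp_demo.log", text replace

sysuse auto, clear
summarize mpg

* Pause logging: output from describe is not written to the file
log off
describe

* Resume logging, then finish
log on
summarize price
log close
```

The log lands in the working directory unless you give a full path, so pwd tells you where to look for it.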

Key to collaboration: Do files
– Save you time on repetitious tasks
– Minimize errors
– Store your data analysis process
. doedit or via GUI

Do file: example
* clear data in memory
clear
* silence extra output
set more off
capture log close
log using "wkshp_log.log", replace
sysuse auto
generate gpm = 1/mpg
label var gpm "Gallons per mile"
sort foreign
twoway (scatter gpm weight), by(foreign, total)
regress gpm weight foreign
* press do button
* run button will only run one command at a time
* save do file

Help!
Stata Help Menu
– contents
– search
– command
At the command line
– help
– search
– findit
External (to Stata) resources

External (to Stata) Resources
Our Stata page: reed.edu/cis/help/software/stata-resources
Links to:
– UCLA Stata site
– Psychology Stata site
– these slides
K.Bott x6642, ETC 225
blogs.reed.edu/ed-tech
No set office hours: get in touch to meet

Feedback, please: Questions?