STATA for S-052 M. Shane Tutwiler Your Friendly S-040 Lecturer William Johnston IT Services Harvard Graduate School of Education.

2 Getting the files The do-file used in this workshop and all of the data files are in the Stata Help tab of the course iSite. –Download SATdata.csv, auto.dta, and Stata for S-052.do and save them to a new folder called Stata_Workshop on your desktop or on a USB drive.

3 Contact Information Office: Gutman 324 Email: –stathelp@gse.harvard.edu Want to set up a consultation? –hgse.service-now.com/ess/research.do Want to learn more on your own? –itservices.gse.harvard.edu/its/services/research-online-resources/stata

4 Agenda: Overview I. Overview of Stata II. Getting Started III. ‘Do’ files IV. Basic data cleaning V. Basic data management VI. Beginning analysis VII. Questions

5 Getting Help in Stata There are many pathways to getting help in Stata: the help, search, and findit commands (for example, help regress); the Help menu; searching online with a web browser; or setting up an appointment (stathelp@gse.harvard.edu)!

6 Some notes A word about programming in and using Stata: Stata is case sensitive, so Myvar is different from myvar. All Stata commands are lowercase. The logical operators are & (and), | (or), and ! (not). Assignment is =; testing whether two values are equal is ==. Missing numeric values are stored as values larger than any nonmissing number and are displayed as a period (.); missing string values are blank ("").
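A short sketch of these rules, using auto.dta (one of the workshop files; it also ships with Stata and loads with sysuse auto). The variable name is_five is hypothetical. The last two counts show why missing values matter: because a missing value (.) is larger than any number, a condition such as rep78 > 4 also picks up the 5 cars whose rep78 is missing unless they are excluded explicitly.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* = assigns a value; == tests whether two values are equal
generate is_five = 0
replace is_five = 1 if rep78 == 5

* & (and), | (or), and ! (not) combine conditions
count if foreign == 1 & rep78 == 5
count if rep78 == 1 | rep78 == 2
count if !(foreign == 1)

* Missing (.) is larger than every number, so this count
* includes the 5 cars whose rep78 is missing
count if rep78 > 4

* Exclude missing values explicitly to count only the real 5s
count if rep78 > 4 & !missing(rep78)
```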

7 How to Begin a Session Specify your directory –cd "_______" Begin using a log file –log using "______.log" Open your data and look at it –insheet using "SATdata.csv", comma –browse –describe
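The opening steps of a do-file might look like the sketch below. It assumes the Stata_Workshop folder sits inside the current working directory (the real path on your desktop or USB drive will differ), uses a hypothetical log name, session.log, and loads auto.dta with sysuse so it runs without the course files. Note that Stata needs straight double quotes ("), not the curly quotes that slide software often inserts.

```stata
* Hypothetical working folder (name taken from the workshop instructions);
* create it if it does not already exist, then make it the working directory
capture mkdir "Stata_Workshop"
cd "Stata_Workshop"

* Close any log left open, then start a new plain-text log
capture log close
log using "session.log", text replace

* Open data and look at it (with the workshop files you would instead run:
* insheet using "SATdata.csv", comma)
sysuse auto, clear
describe

* browse opens the Data Browser window, so it is left out of batch runs
* browse

log close
```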

8 Anatomy of a Stata Command Stata commands follow a pattern: [prefix:] command [varlist] [if] [in] [weight] [, options] For example: bysort region: summarize expense, detail; mean csat if income >= 30000 & region != .; list state in 1/10, nolabel
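The same three example patterns, sketched on auto.dta so they run without SATdata.csv. Here foreign, price, mpg, weight, and make stand in for region, expense, csat, income, and state; the cutoff of 3000 is arbitrary.

```stata
sysuse auto, clear

* prefix: bysort sorts the data and repeats the command for each group
bysort foreign: summarize price, detail

* command + varlist + if: mean mpg for heavier cars with a known rep78
mean mpg if weight >= 3000 & rep78 != .

* command + varlist + in + options: first 10 rows, showing codes not labels
list make foreign in 1/10, nolabel
```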

9 Getting Started: Opening Data Stata-formatted data (.dta): use "filename"; comma-separated values: insheet using "filename", comma; tab-delimited values: insheet using "filename", tab; data on the web: webuse filename for Stata's example datasets, or use "URL" for a .dta file at a web address; flat files: create a dictionary {beyond the scope of this workshop}
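A round-trip sketch: it writes a .dta copy and a comma-separated copy of auto.dta (the file names auto_copy.dta and auto_small.csv are hypothetical), then reads each back with the matching command. insheet still works in current Stata, although from Stata 13 on it is superseded by import delimited.

```stata
* Start from Stata's built-in copy of auto.dta
sysuse auto, clear

* Write a Stata-format copy and a comma-separated copy to the working folder
save "auto_copy.dta", replace
outsheet make price mpg using "auto_small.csv", comma replace

* Stata-format data: use
use "auto_copy.dta", clear

* Comma-separated data: insheet (import delimited in Stata 13 and later)
insheet using "auto_small.csv", comma clear
describe
```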

10 Looking at Data Look at your data: did it import correctly? How are the variables measured? What kinds of variables do we have? Editor: edit; Browser: browse; other commands: codebook, describe
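A quick inspection sketch on auto.dta. Since edit and browse open interactive windows, it uses list to print a few rows instead, and finishes by confirming the missing-value count that codebook reports for rep78.

```stata
sysuse auto, clear

* Variable names, storage types, and labels
describe

* Detailed summary of one variable, including how many values are missing
codebook rep78

* A quick look at the first five observations in the Results window
list make price rep78 in 1/5

* Confirm the missing count that codebook reports
count if missing(rep78)
```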

11 Examining Data There are several ways to look at your data in Stata. How would we describe the distribution of our data? Graphs of distribution: histograms (histogram) and scatterplots (scatter). Charts and tables of frequency and distribution: frequency tables (table) and cross-tabs (tabulate).
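One command of each kind, sketched on auto.dta (mpg, price, and weight are continuous; rep78 and foreign are categorical). The cross-tab counts only the 69 cars with a nonmissing rep78.

```stata
sysuse auto, clear

* Distribution of one continuous variable
histogram mpg, frequency

* Relationship between two continuous variables
scatter price weight

* Frequency table of a categorical variable
table foreign

* Cross-tabulation of two categorical variables (missing rep78 is left out)
tabulate rep78 foreign
```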

12 Basic Data Operations, Part 1 Generating a new variable: gen newvarname = expression Subsetting: keep varlist or drop varlist (variables); keep if expression or drop if expression (observations) Joining two datasets: merge Note: this is covered in detail in the Data Management Workshop!
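A sketch that chains these operations on auto.dta: generate a variable, subset variables and observations, then join two pieces back together with merge. The names price_k and prices are hypothetical, and merge 1:1 works here because make uniquely identifies each car. After dropping the 22 foreign cars, 52 rows match and the 22 using-only rows come back with _merge == 2.

```stata
* Start from the auto data that ships with Stata
sysuse auto, clear

* Generate: price in thousands of dollars
generate price_k = price / 1000

* Subset variables, then save this piece to a temporary file
keep make price_k
tempfile prices
save `prices'

* Reload, keep a different set of variables, and subset observations
sysuse auto, clear
keep make mpg foreign
drop if foreign == 1          // keep domestic cars only

* Join the two pieces one-to-one on the car's name
merge 1:1 make using `prices'
tabulate _merge
```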

13 Basic Data Operations, Part 2 Labeling To label a variable: label variable varname "label text" To label values: label define labelname 1 "high" 0 "low", then label values varname labelname Renaming: rename varname1 varname2 Replacing values of an already generated variable: replace varname = expression
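A labeling and renaming sketch on auto.dta. The label set origin_lbl, the new name miles_per_gallon, and price_k are hypothetical. Value labels take straight double quotes; the single curly quotes that slide software tends to produce will not work.

```stata
sysuse auto, clear

* Variable label: a descriptive string attached to the variable
label variable mpg "Fuel economy (miles per gallon)"

* Value labels: define a set of labels, then attach it to a variable
label define origin_lbl 0 "US-made" 1 "Imported"
label values foreign origin_lbl

* Renaming keeps the variable's label and values
rename mpg miles_per_gallon

* Replacing values of an existing variable
generate price_k = price / 1000
replace price_k = round(price_k, 0.1)
```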

14 Apply Your Knowledge Using the SATdata dataset, generate a dichotomous variable called hi_score from the csat variable, where 1 indicates a score greater than 922 and 0 a score less than or equal to 922. Label its values as 0 = low and 1 = high.
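The same pattern, sketched on auto.dta rather than SATdata: rep78 stands in for csat, a hypothetical cutoff of 3 stands in for 922, and hi_rep and hilo are made-up names. The if !missing() qualifier matters because a missing value is larger than any number, so without it missing rep78 would be coded 1.

```stata
sysuse auto, clear

* 1 if rep78 is above the cutoff, 0 otherwise; missing rep78 stays missing
generate hi_rep = (rep78 > 3) if !missing(rep78)

* Attach 0 = low, 1 = high value labels
label define hilo 0 "low" 1 "high"
label values hi_rep hilo

* Check the new variable against the original one
tabulate rep78 hi_rep, missing
```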

15 Agenda I. Overview of Stata II. Getting Started III. ‘Do’ files IV. Basic data cleaning V. Basic data management VI. Beginning analysis VII. Questions

16 Beginning Analysis Useful commands: Looking at distributions: table, histogram, summarize. Testing the normality assumption: sktest, ladder, gladder. Beginning to look at relationships: tabulate, pwcorr, ttest, anova.
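A sketch of several of these commands on auto.dta, with mpg as the outcome and foreign (52 domestic, 22 foreign cars) as the grouping variable. ladder tries a set of power transformations of mpg and tests each for normality; gladder draws histograms of the same transformations and is omitted here only to keep the output short.

```stata
sysuse auto, clear

* Looking at distributions
summarize mpg, detail

* Testing normality: skewness/kurtosis test, then the ladder of powers
sktest mpg
ladder mpg

* Beginning to look at relationships: compare group means
anova mpg foreign
ttest mpg, by(foreign)
```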

17 Apply Your Knowledge Generate a histogram of the expense variable. Generate a two-way table to see whether the distribution of expense is the same or different across the values of your newly created hi_score variable. If you have time, test whether there is a significant correlation between states' SAT scores (csat) and the average amount of money each state spends on education (expense).
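A sketch of the same steps on auto.dta, with price standing in for expense, foreign for hi_score, and mpg for csat. Because a cross-tab of a continuous variable lists every distinct value, this swaps in tabulate with the summarize() option, which reports the mean and standard deviation of price within each group. pwcorr with sig and obs adds the p-value and the number of observations.

```stata
sysuse auto, clear

* Histogram of a continuous variable
histogram price

* Compare the distribution of price across the two groups of a 0/1 variable
tabulate foreign, summarize(price)

* Pairwise correlation with its p-value (sig) and observation count (obs)
pwcorr price mpg, sig obs
```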

18 Building Regression Models Linear regression: regress depvar indepvar1 indepvar2 … Logistic regression: logit depvar indepvar1 indepvar2 …
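Both templates filled in with auto.dta variables: a continuous outcome (mpg) for regress and a 0/1 outcome (foreign) for logit. The predictors chosen here are only illustrative.

```stata
sysuse auto, clear

* Linear regression: continuous outcome mpg on weight and foreign
regress mpg weight foreign

* Logistic regression: binary outcome foreign (0/1) on price and mpg
logit foreign price mpg
```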

19 Apply Your Knowledge Generate two scatterplots: one of the relationship between expense and csat, and one of expense and hi_score. Depending on the relationship you see (linear or not) and the type of outcome, run the appropriate regression to test the effect of expense on either csat or hi_score.
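A sketch of the plot-then-model workflow on auto.dta, with weight standing in for expense, mpg for csat, and foreign for hi_score; p_foreign is a hypothetical name. A binary outcome's points fall only at 0 and 1, which is one reason a logistic model, whose predicted probabilities stay between 0 and 1, suits it better than a straight line.

```stata
sysuse auto, clear

* Continuous outcome: scatterplot with a fitted straight line overlaid
twoway (scatter mpg weight) (lfit mpg weight)

* Binary outcome: points sit at 0 and 1, so jitter them to see overlap
scatter foreign weight, jitter(5)

* A linear model for the continuous outcome
regress mpg weight

* A logistic model for the binary outcome, then predicted probabilities
logit foreign weight
predict p_foreign, pr
summarize p_foreign
```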

20 Saving Data, Code, and Output Saving your newly transformed data: save "pathname\filename.dta", or for a comma-separated copy, outsheet using "pathname\filename.csv", comma Saving your code: SAVE YOUR DO-FILE!!!!! Saving your output: create a log file with log using "pathname\filename", then close it with log close (!!!!) so the log is completely written. Saving graphs: graph save "pathname\filename.gph"
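A sketch that saves each kind of output to the current working directory. All file names (saving_demo.log, auto_modified.dta, auto_modified.csv, price_hist.gph) are hypothetical, and the replace options let the do-file be rerun without errors about existing files.

```stata
* Open a plain-text log for the whole session
capture log close
log using "saving_demo.log", text replace

* Transform the data
sysuse auto, clear
generate price_k = price / 1000

* Save the transformed data as .dta and as a comma-separated file
save "auto_modified.dta", replace
outsheet using "auto_modified.csv", comma replace

* Save a graph in Stata's .gph format
histogram price_k
graph save "price_hist.gph", replace

* Close the log so the file is completely written
log close
```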

21 Agenda: Overview I. Overview of Stata II. Getting Started III. ‘Do’ files IV. Basic data cleaning V. Basic data management VI. Beginning analysis VII. Questions

22 Thanks! Questions? Gutman Library, room 323a StatHelp@gse.harvard.edu http://itservices.gse.harvard.edu/its/services/research
