Teaching with Stata Peter A. Lachenbruch & Alan C. Acock Oregon State University

WCSUG Presentation2 First Course Requirement—Data Entry I want a first course to be able to do the things I want students to do: –Enter and edit data--must be “want to know topic” –Students can do a small survey to get data on topics of interest to them. Voter poll Attitudes toward diversity issues on campus Beliefs about regulating the internet –Learn how to create a codebook, use codebook and codebook, compact Where possible use “real” data

WCSUG Presentation3 First Course Requirement—Data Management Balance statistical content with proper data management content—hard decision Storing original dataset and creating a working dataset Keeping a record of every data modification they make using do-file –Menu system is an aid –Do-files are the requirement Missing values--distinguish types Variable names, labels, and value labels
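The workflow on this slide (keep the original dataset untouched, record every change in a do-file, distinguish types of missing values, label everything) might look like the following minimal do-file sketch. The file names, the variables income and voted, and the codes -8 and -9 are hypothetical and stand in for whatever the class survey uses.

```stata
* Sketch only: file names, variable names, and the codes -8/-9 are hypothetical.
capture log close
log using "survey_cleaning.log", replace text   // record of every modification

use "survey_original.dta", clear                // the original file is never overwritten

* Distinguish types of missing values with extended missing codes
replace income = .a if income == -8             // .a = refused
replace income = .b if income == -9             // .b = don't know

* Variable label and value labels
label variable voted "Voted in last election"
label define yesno 0 "No" 1 "Yes"
label values voted yesno

save "survey_working.dta", replace              // all analysis uses the working copy
log close
```

Rerunning the do-file from the original data reproduces the working dataset exactly, which is the point of making do-files the requirement and the menus only an aid.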

WCSUG Presentation4 First Course Requirements: Data Management Transformations – log, sqrt, exp Logical editing – beware of logical transformations when missing values are present (gen y = x < 10 sets y to 0 when x is missing, because Stata treats “.” as larger than any number) Appending –Append student generated datasets Merging –Merging two waves of data
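A small sketch of the missing-value trap in logical transformations, using a three-observation toy dataset invented for illustration. The variable names y_naive and y_safe are not from the slides.

```stata
* Toy data: the third observation is missing
clear
input x
5
15
.
end

generate y_naive = x < 10                  // missing x is "not < 10", so y_naive is 0
generate y_safe  = x < 10 if !missing(x)   // missing x stays missing in y_safe
list
```

The if !missing(x) qualifier is one common way to keep missing values missing; students can compare the two new columns in the listing.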

WCSUG Presentation5 First Course Requirements: Data Management Constructing Measures –When to use egen newvar = rowtotal(var1 var2 var3) –When to use egen newvar = rowmean(var1 var2 var3) –When to use misschk command, what it does Suppose the variable category is 0 or 1 If there are missing values in category, there is a difference between –gen y = 1 if category –gen y = 1 if (category==1) –gen y = 1 if (category>0) –The first and third will give scores of 1 for missing values, because “.” is nonzero and larger than any number. The second leaves y missing when category is missing - BEWARE
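The two points on this slide can be tried on a toy dataset. This is a sketch under assumptions: the items q1 to q3 and the variables y1 to y4, total, and avg are made-up names, and the data are invented.

```stata
* Toy data: category is 0, 1, or missing; q1-q3 are hypothetical scale items
clear
input category q1 q2 q3
0 4 5 3
1 2 . 4
. . . .
end

generate y1 = 1 if category            // true for any nonzero value, including "."
generate y2 = 1 if category == 1       // true only when category is exactly 1
generate y3 = 1 if category > 0        // "." is larger than any number, so true
generate y4 = (category == 1) if !missing(category)   // 0/1 indicator, missing kept missing

egen total = rowtotal(q1 q2 q3)        // missing items are treated as 0
egen avg   = rowmean(q1 q2 q3)         // mean of the non-missing items only

list
```

In the second row, rowtotal() sums the two answered items while rowmean() averages them, which is usually the deciding factor between the two when some items are unanswered.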

WCSUG Presentation6 First Course Requirements— Data Management edit command, insheet input, infile (csv files) gen newvar = ln(oldvar) Rarely use replace oldvar = sqrt(oldvar) – only when correcting an error – don’t replace data merge ptid assessment using file, update (need for data to be sorted)

WCSUG Presentation7 First Course Requirement (2) –Data presentation, numerical summary measures – summarize, detail; list; browse; edit; describe; codebook; codebook, compact –Graphic presentation--bar chart, histogram, box plot seem minimum –Probability computations – binomial, binomialtail, chi2, chi2tail, F, Ftail, normal – use of the inverse functions for these.

WCSUG Presentation8 Examples summarize sp, detail; list sp; describe s*; codebook s*. display binomial(10,3,0.1) for cumulative, or display Binomial(10,3,.1) for reverse cumulative. Note that disp 1-binomial(10,2,.1) gives the same result as the reverse cumulative (as does binomialtail(10,3,.1)). display normal(1.2). gen y = invnormal(uniform())*5+20
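The probability functions listed above can be explored directly with display. The sketch below uses the current function names (binomialtail(), rnormal()) rather than the older Binomial() and uniform(); the seed and sample size are arbitrary choices.

```stata
display binomial(10, 3, 0.1)         // P(X <= 3) for n = 10, p = 0.1
display binomialtail(10, 3, 0.1)     // P(X >= 3)
display 1 - binomial(10, 2, 0.1)     // same quantity as the line above
display normal(1.2)                  // P(Z <= 1.2)
display invnormal(0.975)             // inverse: 97.5th percentile of Z
display chi2tail(1, 3.84)            // upper tail of chi-squared with 1 df
display invchi2tail(1, 0.05)         // inverse of the upper tail
display invttail(20, 0.025)          // upper 2.5% point of t with 20 df

* Simulate 100 draws from a normal distribution with mean 20 and sd 5
clear
set seed 12345
set obs 100
generate y = rnormal(20, 5)
summarize y
```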

WCSUG Presentation9 First Course Requirement (3) Confidence intervals –Binomial – ci: ci variable, binomial –Normal – ci: ci variable –Poisson – ci: ci variable, poisson Percentiles –summarize, detail –centile price, c(10(10)90)

WCSUG Presentation10 Examples cii 20 4 –cii 20 4, agresti. Sometimes we want to use the Agresti formulation; the exact interval is usually preferable. ci varname, level(99). summarize weakness, detail –Can use su weakn,d (i.e. abbreviate commands, options and variables). centile weakness, c(20,40,60,80) –Or centile weakness, c(20(20)80)
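A runnable version of these interval and percentile commands, sketched on the auto dataset that ships with Stata (an assumption; the slides use their own variables). The commands are written in the older ci syntax the slides use. As far as I know, Stata 14.1 and later document the equivalents as ci means, ci proportions, and cii proportions 20 4, agresti, so adjust to the installed version.

```stata
sysuse auto, clear

ci mpg, level(99)            // normal-theory 99% interval for a mean
ci foreign, binomial         // exact binomial interval for a proportion
cii 20 4                     // immediate form: 4 successes in 20 trials, exact
cii 20 4, agresti            // Agresti-Coull interval for the same counts

summarize price, detail      // output includes selected percentiles
centile price, centile(10(10)90)   // deciles with confidence intervals
```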

WCSUG Presentation11 First Course Requirements (4) Hypothesis Testing: –Normal r.v.s One sample (including paired data) - Two sample - ttest K samples – ANOVA –Binomial variables One sample – proportion Two samples – tabulate, chi2

WCSUG Presentation 12 Examples ttest sp = 120 [one-sample]; ttest spmen = spfem [paired]; ttest spmen = spfem, unpaired unequal welch; ttest sp, by(sex) [unequal welch etc.]. Also immediate form – see help. anova sp agegrp
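The same tests can be tried on bpwide, a small blood-pressure dataset shipped with Stata; using it here is an assumption, as are the null value of 150 and the summary numbers passed to ttesti.

```stata
sysuse bpwide, clear

ttest bp_before == 150                      // one-sample
ttest bp_before == bp_after                 // paired
ttest bp_before == bp_after, unpaired unequal welch   // columns treated as independent samples
ttest bp_before, by(sex) unequal            // two groups defined by sex
ttesti 120 156.45 11.39 150                 // immediate form: n, mean, sd, null value (made-up numbers)
anova bp_before agegrp                      // one-way ANOVA across age groups
```

Running the paired and unpaired versions on the same two columns is a quick way to show students how much the pairing matters.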

WCSUG Presentation13 Examples bitest success = 0.8 [one sample binomial]; tabulate success group, chi2 row col; prtest success, by(group) [two sample binomial]
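A sketch of the binomial tests on the auto dataset shipped with Stata. The grouping variable heavy and the null proportion 0.3 are invented for illustration.

```stata
sysuse auto, clear
generate byte heavy = weight > 3000 if !missing(weight)   // hypothetical two-level group

bitest foreign == 0.3                 // one-sample exact binomial test
tabulate foreign heavy, chi2 row col  // two-way table with chi-squared test
prtest foreign, by(heavy)             // two-sample test of proportions
```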

WCSUG Presentation14 First Course Requirements (5) Hypothesis Testing (cont.) –Power considerations – sampsi (or spreadsheet – nice exercise for some good ones) –Nonparametric methods – sign, signrank, ranksum Contingency tables – tabulate, epitab

WCSUG Presentation15 Examples sampsi , p(0.8) r(2) sd1(15.34) sd2(18.23); ranksum sp, by(survive); signrank before = after. When should we supplement Stata with other software, such as G*Power 3, which is free and more flexible than sampsi, or with other software such as PASS or nQuery Advisor?
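A sketch of a sample-size calculation and the nonparametric tests. The two means passed to sampsi (132.86 and 127.44) are illustrative values chosen here, not recovered from the slide, and bpwide is a dataset shipped with Stata used as a stand-in.

```stata
* Sample size for comparing two means; the means are illustrative assumptions
sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)

* In Stata 13 and later the power command covers the same calculation, e.g.:
* power twomeans 132.86 127.44, sd1(15.34) sd2(18.23) nratio(2) power(0.8)

sysuse bpwide, clear
signtest bp_before = bp_after     // sign test for paired data
signrank bp_before = bp_after     // Wilcoxon signed-rank test
ranksum bp_before, by(sex)        // Wilcoxon rank-sum (Mann-Whitney) test
```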

WCSUG Presentation16 First Course Requirements (6) Simple linear regression – regress, rvfplot, other diagnostics Correlation – corr, spearman, ktau – I tend not to use corr because of the sensitivity to the normality assumption for tests and confidence intervals. Only pwcorr, and not corr, provides a test of significance

WCSUG Presentation17 Examples regress mpg weight rvfplot Stata’s “type a little, get a little” very different from other packages correlate mpg weight or pwcorr mpg weight (especially when you have more than 2 variables – can specify sig and obs—Note that these only work with pwcorr) spearman mpg weight – would be nice to have Stata produce a Spearman correlation matrix
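These commands run as written on the auto dataset that ships with Stata. The sketch below adds price as a third variable (a choice made here) to show where pwcorr becomes useful.

```stata
sysuse auto, clear

regress mpg weight
rvfplot, yline(0)                   // residuals versus fitted values

correlate mpg weight price          // correlations only, listwise deletion
pwcorr mpg weight price, sig obs    // adds p-values and pairwise sample sizes

spearman mpg weight                 // rank correlation
ktau mpg weight                     // Kendall's tau
```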

WCSUG Presentation18 Examples It’s easy to use permutation tests.
permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
(running ttest on estimation sample)
Monte Carlo permutation results Number of obs = 97
command: ttest ald7, by(anyhcq)
t: r(t)
permute var: anyhcq
T | T(obs) c n p=c/n SE(p) [95% Conf. Interval]
t |
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}
One can do similar things with the bootstrap. These are easy to use and intuitive for students.
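A sketch of the same idea on the auto dataset shipped with Stata, since the data behind the slide are not available. The number of replications, the seed, and the name diff are arbitrary choices.

```stata
sysuse auto, clear

* Permutation test: shuffle the group labels and recompute the t statistic
permute foreign t = r(t), reps(1000) seed(12345): ttest mpg, by(foreign)

* Bootstrap: resample observations and recompute the difference in group means
bootstrap diff = (r(mu_1) - r(mu_2)), reps(1000) seed(12345): ttest mpg, by(foreign)
```

Both prefixes wrap an ordinary command and collect a stored result, here r(t) and the group means that ttest leaves behind.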

WCSUG Presentation19 Use of Stata in the Classroom Use Stata sparingly –It’s not easy to follow commands typed or used from menus – students will get confused –Have handouts of what you do – make spacing large enough that students can annotate – even if only to write nasty things about the instructor –Balancing coverage of Stata, e.g. data management with coverage of Statistics is a constant issue –Remember – it’s a course in statistics, not in Stata

WCSUG Presentation20 Data Sets Place data sets on a LAN or common drive or available for copying to flash drive or CD Use real data –Not too many variables –May have missing values – but should not affect main analyses – unless you want to demonstrate the problems with missing values

WCSUG Presentation21 In the Classroom Using CD rather than flash drive is better(?) –Many desktops have USB port located inconveniently (darn you Dell!) –Sometimes newer PCs have USB port on monitor, and laptops usually have an easy slot for the flash drive –Light level in the room should allow students to read easily –Days of dim projectors are over

WCSUG Presentation22 In the Classroom (2) Enlarge the Stata font by using right mouse button –I have found that 14 point is pretty good –Be careful about wraparound of output – if needed, reduce point size temporarily –Don’t ever use red on blue font –See what I mean? It’s more difficult to read Show how to move and fix windows

WCSUG Presentation23 In the Classroom (2) Optimizing visibility with projector –Use rich color background –Edit > Preferences > General preferences. Blue background option good but it relies on red for errors, green for standard text, and doesn’t bold fonts. –Custom may be better because you can make fonts bold and pick colors that do not disadvantage students who are colorblind.

WCSUG Presentation24 Virtual Lab A server supporting 30 simultaneous sessions of Stata is remarkably inexpensive. A department can require students to have laptops or provide a cart with enough laptops Because laptops are really “dumb” terminals with server, the laptops can be cheap and not updated very often Any room becomes a lab Students should have 24/7 access to the server

WCSUG Presentation25 Handouts and Data Sets Have handouts of your lecture notes Have handouts of your data analysis demonstrations –Include commands as well as output! Data sets –On line – LAN or CD or Floppy disk --Lots of laptops don’t have floppy drives any more, flash drives are inexpensive Include –Student generated datasets –Datasets with large Ns and relatively few variables

WCSUG Presentation26 Emphasis in Course Lectures devoted to statistics Labs to learning Stata and working on homework and discussion Proper printing of output –Don’t split output between two pages if possible (at least, find a good break point) –Always use a monospaced font (such as Courier New)

WCSUG Presentation27 Some Final Issues Multiple testing can distort inference (i.e. doing 100 tests all but guarantees some significant results – but they may be meaningless) – Worry about this Controlling the digits in the output. Use outreg, estout, esttab
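The multiple-testing point is easy to quantify in class. The sketch assumes 100 independent tests at the 5% level with every null hypothesis true; these conditions are chosen here for illustration. The last lines show one built-in way to control displayed digits, the cformat() option, alongside the community-contributed outreg, estout, and esttab named on the slide.

```stata
* 100 independent tests at the 5% level, all null hypotheses true (assumed)
display 100 * 0.05            // expected number of false positives: 5
display 1 - 0.95^100          // probability of at least one false positive

* Controlling digits in regression output with a display format
sysuse auto, clear
regress mpg weight, cformat(%6.3f)
```

The second display shows that at least one spurious "significant" result is nearly certain, though not literally guaranteed.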

WCSUG Presentation28 The End