Teaching with Stata Peter A. Lachenbruch & Alan C. Acock Oregon State University


1 Teaching with Stata Peter A. Lachenbruch & Alan C. Acock Oregon State University

2 First Course Requirement: Data Entry (WCSUG Presentation)
I want a first course to prepare students to do the things I want them to do:
  – Enter and edit data; it must be a “want to know” topic
  – Students can do a small survey to get data on topics of interest to them:
      Voter poll
      Attitudes toward diversity issues on campus
      Beliefs about regulating the internet
  – Learn how to create a codebook; use codebook and codebook, compact
Where possible, use “real” data

3 First Course Requirement: Data Management
Balance statistical content with proper data management content (a hard decision)
Store the original dataset and create a working dataset
Keep a record of every data modification in a do-file
  – The menu system is an aid
  – Do-files are the requirement
Missing values: distinguish types
Variable names, labels, and value labels
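A minimal do-file sketch of the habits listed on this slide. The file names (survey_raw, survey_work) and the codes -9 and -8 are hypothetical, invented only for illustration; the point is that the original file is never overwritten and every change is recorded.

```stata
* Build a tiny stand-in for the "original" dataset (hypothetical values)
clear
input id age voted income
1 34 1 52000
2 51 0 -9
3 27 1 -8
end
save survey_raw, replace

* Working session: log everything, modify only the working copy
capture log close
log using survey_work.log, replace text
use survey_raw, clear
label variable voted "Voted in last election"
label define yesno 0 "No" 1 "Yes"
label values voted yesno

* Distinguish types of missing data with extended missing values
replace income = .a if income == -9      // .a: refused (assumed coding)
replace income = .b if income == -8      // .b: not asked (assumed coding)
label define incmiss .a "Refused" .b "Not asked"
label values income incmiss

save survey_work, replace                // survey_raw.dta is left untouched
log close
```

Rerunning the do-file from the top reproduces the working dataset exactly, which is the reason to require do-files rather than menu clicks.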

4 First Course Requirements: Data Management
Transformations: log, sqrt, exp
Logical editing: beware of logical expressions when missing values are present (gen y = x < 10 codes a missing x, “.”, as 0)
Appending
  – Append student-generated datasets
Merging
  – Merge two waves of data
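A small sketch of the logical-editing trap, using three made-up values. Stata treats a missing value as larger than any number, so the comparison x < 10 is false (0) for a missing x unless missing values are excluded explicitly.

```stata
clear
input x
3
12
.
end

gen y_naive = x < 10                  // missing x becomes 0
gen y_safe  = x < 10 if !missing(x)   // missing x stays missing
list
```

The variable names y_naive and y_safe are only illustrative.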

5 First Course Requirements: Data Management
Constructing measures
  – When to use egen newvar = rowtotal(var1 var2 var3)
  – When to use egen newvar = rowmean(var1 var2 var3)
  – When to use the misschk command, and what it does
Suppose the variable category is 0 or 1. If category has missing values, these three commands differ:
  – gen y = 1 if category
  – gen y = 1 if (category==1)
  – gen y = 1 if (category>0)
  – The first and third give a score of 1 for missing values, because Stata treats missing as larger than any number (and so as nonzero). The second leaves y missing for them. BEWARE
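The differences on this slide can be checked on a three-row toy dataset (values made up for illustration). The sketch also shows how rowtotal and rowmean treat missing items: rowtotal counts them as 0, while rowmean averages only the nonmissing items.

```stata
clear
input category var1 var2 var3
0 1 2 3
1 4 . 6
. . . .
end

gen y1 = 1 if category          // true for 1 and for missing
gen y2 = 1 if category == 1     // true for 1 only
gen y3 = 1 if category > 0      // true for 1 and for missing

egen tot3 = rowtotal(var1 var2 var3)   // all-missing row gives 0
egen avg3 = rowmean(var1 var2 var3)    // all-missing row gives missing
list
```

A total of 0 for a respondent who answered nothing is rarely what is wanted, which is one reason to look at the pattern of missing values before building a scale.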

6 First Course Requirements: Data Management
edit command; insheet, input, infile (csv files)
gen newvar = ln(oldvar)
Rarely use replace oldvar = sqrt(oldvar): only when correcting an error; don’t replace data
merge ptid assessment using file, update (the data need to be sorted)
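A sketch of an update merge on made-up data. It assumes Stata 11 or later, where the syntax is merge 1:1 and sorting is handled automatically; the merge command on the slide uses the older syntax. The temporary file names are illustrative.

```stata
* Master file: patient 1's second assessment has a missing score
clear
input ptid assessment score
1 1 10
1 2 .
2 1 7
end
tempfile master
save `master'

* Using file: supplies the missing score and one new record
clear
input ptid assessment score
1 2 12
2 2 9
end
tempfile updates
save `updates'

use `master', clear
merge 1:1 ptid assessment using `updates', update
sort ptid assessment
list
```

With the update option, values in the using file fill in missing values in the master file but do not overwrite nonmissing ones; the _merge variable records what happened to each row.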

7 First Course Requirement (2)
  – Data presentation, numerical summary measures: summarize, detail; list; browse; edit; describe; codebook; codebook, compact
  – Graphic presentation: bar chart, histogram, and box plot seem the minimum
  – Probability computations: binomial, binomialtail, chi2, chi2tail, F, Ftail, normal, and the inverse functions for these

8 Examples
summarize sp, detail; list sp; describe s*; codebook s*
display binomial(10,3,0.1) for cumulative, or display Binomial(10,3,.1) for reverse cumulative
Note: disp 1-binomial(10,2,.1) gives the same result as the reverse cumulative (also binomialtail(10,3,.1))
display normal(1.2)
gen y = invnormal(uniform())*5+20
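A short sketch of the probability functions and their inverses. It assumes a release of Stata recent enough to have runiform(); on older releases uniform() plays the same role. The seed and sample size are arbitrary.

```stata
display binomial(10, 3, 0.1)          // P(X <= 3)
display binomialtail(10, 3, 0.1)      // P(X >= 3)
display 1 - binomial(10, 2, 0.1)      // also P(X >= 3)

display normal(1.2)                   // standard normal cdf at 1.2
display invnormal(0.975)              // upper 2.5% point of the standard normal
display chi2tail(1, 3.84)             // upper tail of chi-squared with 1 df
display invchi2tail(1, 0.05)          // 5% critical value, 1 df

* Simulate 1,000 draws from a normal with mean 20 and sd 5
clear
set seed 12345
set obs 1000
gen y = invnormal(runiform())*5 + 20
summarize y
```

Applying the inverse cdf to uniform draws is a handy way to show students why the inverse functions matter.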

9 First Course Requirement (3)
Confidence intervals
  – Binomial: ci variable, binomial
  – Normal: ci variable
  – Poisson: ci variable, poisson
Percentiles
  – summarize, d
  – centile price, c(10(10)90)

10 Examples
cii 20 4
  – cii 20 4, agresti
Sometimes we want to use the Agresti formulation. The exact interval is usually preferable
ci varname, level(99)
summarize weakness, detail
  – Can use su weakn,d (i.e., abbreviate commands, options, and variables)
centile weakness, c(20,40,60,80)
  – Or centile weakness, c(20(20)80)

11 First Course Requirements (4)
Hypothesis testing:
  – Normal r.v.s
      One sample (including paired data) and two samples: ttest
      K samples: ANOVA
  – Binomial variables
      One sample: proportion
      Two samples: tabulate, chi2

12 Examples
ttest sp = 120 [one-sample]
ttest spmen = spfem [paired]
ttest spmen = spfem, unpaired unequal welch
ttest sp, by(sex) [unequal welch etc.]
Also immediate form: see help
anova sp agegrp

13 Examples
bitest success = 0.8 [one-sample binomial]
tabulate success group, chi2 row col
prtest success, by(group) [two-sample binomial]
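The datasets behind the examples on these slides are not part of the transcript, so here is a sketch that runs the same kinds of tests on auto.dta, which ships with Stata. The indicator highmpg and the cutoffs 20, 25, and 0.3 are made up for illustration.

```stata
sysuse auto, clear

* Normal-theory tests
ttest mpg == 20                    // one sample
ttest mpg, by(foreign) unequal     // two samples, unequal variances
anova mpg rep78                    // K samples

* Binomial tests on a constructed 0/1 variable
gen byte highmpg = mpg > 25        // mpg has no missing values in auto.dta
bitest highmpg == 0.3              // one-sample binomial
tabulate highmpg foreign, chi2 row col
prtest highmpg, by(foreign)        // two-sample test of proportions
```

Using a shipped dataset means students can reproduce a handout exactly, wherever they have Stata.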

14 First Course Requirements (5)
Hypothesis testing (cont.)
  – Power considerations: sampsi (or a spreadsheet, a nice exercise for some of the good ones)
  – Nonparametric methods: signtest, signrank, ranksum
Contingency tables: tabulate, epitab

15 Examples
sampsi 132.86 127.44, p(0.8) r(2) sd1(15.34) sd2(18.23)
ranksum sp, by(survive)
signrank before = after
When should we supplement Stata with other software, such as G*Power 3 (free and more flexible than sampsi), PASS, or nQuery Advisor?

16 First Course Requirements (6)
Simple linear regression: regress, rvfplot, other diagnostics
Correlation: corr, spearman, ktau
  – I tend not to use corr because its tests and confidence intervals are sensitive to the normality assumption
  – Only pwcorr, not corr, provides a test of significance

17 Examples
regress mpg weight
rvfplot
Stata’s “type a little, get a little” is very different from other packages
correlate mpg weight or pwcorr mpg weight (especially when you have more than 2 variables; you can specify sig and obs, but note that these work only with pwcorr)
spearman mpg weight: it would be nice to have Stata produce a Spearman correlation matrix
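The commands on this slide, put together as a short runnable sequence on auto.dta. Adding length as a third variable and drawing a reference line at zero are choices made here for illustration, not part of the slide.

```stata
sysuse auto, clear

regress mpg weight
rvfplot, yline(0)                  // residuals versus fitted values

* Pairwise correlations with significance levels and Ns
pwcorr mpg weight length, sig obs

* Rank-based alternatives
spearman mpg weight
ktau mpg weight
```

Each command prints only what was asked for, so students see the residual plot and the correlations as separate steps.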

18 Examples
It’s easy to use permutation tests.
permute anyhcq t=r(t): ttest ald7 if adult==1 & assnum==1, by(anyhcq)
(running ttest on estimation sample)

Monte Carlo permutation results                 Number of obs = 97
      command:  ttest ald7, by(anyhcq)
            t:  r(t)
  permute var:  anyhcq

---------------------------------------------------------------------------
T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
-------------+-------------------------------------------------------------
           t |   1.648305      13     100  0.1300  0.0336   .071073   .2120407
---------------------------------------------------------------------------
Note: confidence interval is with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}

One can do similar things with the bootstrap
These are easy to use and intuitive for students
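A sketch of the bootstrap counterpart, run on auto.dta because the dataset used on the slide is not available. The statistic (the difference in mean mpg between domestic and foreign cars), the seed, and the 200 replications are all choices made here for illustration.

```stata
sysuse auto, clear
set seed 2718

* Bootstrap the difference in group means returned by ttest
bootstrap diff = (r(mu_1) - r(mu_2)), reps(200) nodots: ///
    ttest mpg, by(foreign)

* Percentile confidence interval from the bootstrap distribution
estat bootstrap, percentile
```

The prefix syntax is the same as for permute: name the statistic, then give the command after the colon.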

19 Use of Stata in the Classroom
Use Stata sparingly
  – Commands typed or chosen from menus are not easy to follow; students will get confused
  – Have handouts of what you do, with spacing large enough that students can annotate, even if only to write nasty things about the instructor
  – Balancing coverage of Stata (e.g., data management) with coverage of statistics is a constant issue
  – Remember: it’s a course in statistics, not in Stata

20 Data Sets
Place data sets on a LAN or common drive, or make them available for copying to a flash drive or CD
Use real data
  – Not too many variables
  – May have missing values, but they should not affect the main analyses unless you want to demonstrate the problems with missing values

21 In the Classroom
Using a CD rather than a flash drive is better(?)
  – Many desktops have the USB port located inconveniently (darn you, Dell!)
  – Sometimes newer PCs have a USB port on the monitor, and laptops usually have an easy slot for the flash drive
  – Light level in the room should allow students to read easily
  – Days of dim projectors are over

22 In the Classroom (2)
Enlarge the Stata font by using the right mouse button
  – I have found that 14 point is pretty good
  – Be careful about wraparound of output; if needed, reduce the point size temporarily
  – Don’t ever use a red font on blue
  – See what I mean? It’s more difficult to read
Show how to move and fix windows

23 In the Classroom (3)
Optimizing visibility with a projector
  – Use a rich color background
  – Edit > Preferences > General preferences. The blue background option is good, but it relies on red for errors and green for standard text, and it doesn’t bold fonts.
  – Custom may be better because you can make fonts bold and pick colors that do not disadvantage students who are colorblind.

24 Virtual Lab
A server supporting 30 simultaneous sessions of Stata is remarkably inexpensive.
A department can require students to have laptops or provide a cart with enough laptops
Because the laptops are really “dumb” terminals for the server, they can be cheap and need not be updated very often
Any room becomes a lab
Students should have 24/7 access to the server

25 Handouts and Data Sets
Have handouts of your lecture notes
Have handouts of your data analysis demonstrations
  – Include commands as well as output!
Data sets
  – Online: LAN, CD, or floppy disk
  – Lots of laptops don’t have floppy drives any more; flash drives are inexpensive
Include
  – Student-generated datasets
  – Datasets with large Ns and relatively few variables

26 Emphasis in Course
Lectures devoted to statistics
Labs devoted to learning Stata, working on homework, and discussion
Proper printing of output
  – Don’t split output between two pages if possible (at least, find a good break point)
  – Always use a monospaced font (such as Courier New)

27 Some Final Issues
Multiple testing can distort inference (e.g., doing 100 tests at the 5% level virtually guarantees some significant results, but they may be meaningless). Worry about this
Controlling the digits in the output: use outreg, estout, esttab
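The multiple-testing point can be made concrete with three lines of arithmetic. This sketch assumes 100 independent tests, each at the 5% level, with every null hypothesis true; real tests are seldom independent, so treat the numbers as a rough guide.

```stata
display 1 - 0.95^100      // chance of at least one "significant" result
display 100 * 0.05        // expected number of false positives
display 0.05 / 100        // Bonferroni per-test level for overall 5%
```

Students can vary the number of tests to see how quickly the first quantity approaches 1.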

28 The End
