Published by Augustus McKenzie. Modified over 9 years ago
Ann Arbor ASA “Up and Running” Series: Intro Stata
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association November 29, 2011
Ann Arbor ASA (Up and Running): Stata Intro
Agenda
- Why Stata?
- The Stata interface
- The Stata mindset
  - data
  - logging
  - issuing commands via menus
  - understanding command syntax
- Data management
- Descriptive statistics and estimation
- Graphing
- Adding user-written commands
- .do files

Why Stata?
- General-purpose, cross-platform package like R or SAS
- Command-line interface combined with point-and-click menus
- Intuitive, standardized command syntax that is well documented with formulas, examples, and references
- Many advanced user-written commands
- Easy to write your own code that is pretty fast
- Excellent corporate tech support and user community

Which Stata: MP, SE, IC or Small
- Stata is not sold in pieces: every flavor has the same commands
- Most flavors are available for 32- and 64-bit Windows, Mac, and Unix/Linux platforms
- Stata/IC (Intercooled) can handle up to 2,047 variables
- Stata/SE (Special Edition) can handle up to 32,767 variables; it also allows longer string variables and larger matrices
- Stata/MP has the same limits as SE, but is faster on multicore and multiprocessor computers
- Small Stata is intended for students and is limited to datasets with at most 99 variables and 1,200 observations
- All of these versions can read each other's files, within their size limits

The Stata Interface
- Results window: all output appears here, except graphs, which open in a separate window. Note that output is not automatically saved to a file
- Command window: enter commands here interactively
- Variables window: lists all variables in the current dataset. Clicking a variable sends its name to the Command window
- Review window: lists previously issued commands, which can be reissued by clicking on them
- Buttons: shortcuts for many common commands such as log, browse, and edit
- Menus: convenient for learning Stata command syntax, but time-consuming
- The look and feel is customizable

Lab 1A
Use the Stata File menu to open the example dataset, auto.dta

The Stata Mindset
- Data
- Logging
- Issuing commands from menus
- Understanding command syntax

Data
- Stata reads an entire dataset into memory. This is a fundamental difference from other stat packages such as SAS and SPSS
- Only one dataset can be in memory at a time in a Stata session
- This is why there are different flavors of Stata (MP, SE, IC, Small), with different size limits

Reading data into memory
- Use the menus: File, Open
- Use the Command window:
    use "C:\...\sample.dta", clear
    use sample.dta, clear
  (the second form looks in the current working directory)
- Use the File Open button
- All methods produce the same result

Saving data
- Use the menus: File, Save (or Save As…)
- Use the Command window:
    save "C:\...\sample.dta" [, replace]
  (replace is needed to overwrite an existing file)
- Use the Save button
- All methods produce the same result
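
A minimal do-file sketch of the save/use round trip, assuming Stata 12 or later. It loads the auto.dta example dataset that ships with Stata via sysuse; myauto.dta is a hypothetical file name written to the current working directory:

```stata
* Load the example dataset shipped with Stata; clear discards data in memory
sysuse auto, clear

* Save a copy; replace allows overwriting a file left by an earlier run
save myauto.dta, replace

* Read it back: use replaces the data in memory with the file's contents
use myauto.dta, clear
describe, short
```

Without the clear option, use refuses to run if the data in memory have changed since they were last saved.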

Logging
- Stata does not automatically write output to a file! You can do this by opening a log file at the start of your analysis and closing it at the end
- Use the menus: File, Log
- Use the Command window:
    log using "C:\...\analysis1.log"
- Use the Log button
- All methods produce the same result
- Logs can be created, replaced, suspended, resumed, and appended
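
A do-file sketch of a logged session, assuming the log is written to the current working directory under the slide's name, analysis1.log. The text option writes plain text rather than Stata's SMCL format; to add to an existing log instead of overwriting it, use append in place of replace:

```stata
* Close any log left open by an earlier run (capture ignores the error if none)
capture log close

* Open a plain-text log; replace overwrites analysis1.log from an earlier run
log using analysis1.log, replace text

sysuse auto, clear
summarize price mpg

* Suspend logging, run something you do not want recorded, then resume
log off
describe
log on

* Close the log so the file is complete
log close
```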

Lab 1B
Use the Stata menus to:
- change the color scheme
- change the working directory to your desktop lab folder
- start a log file called “labs.log” in your desktop lab folder
- save the example auto.dta dataset to your desktop lab folder

Issuing Commands from Menus
Menus are great for:
- familiarizing yourself with Stata's capabilities, both big picture and command-specific
- getting context-sensitive help
- learning Stata command syntax
The downside:
- time-consuming, especially for repetitive tasks
- not all functionality is available through the menus!

Lab 2
To get a codebook for the auto.dta dataset, use the following menu path: Data, Describe data, Describe data contents (codebook)
You will see the codebook dialog. Inspect it closely…

Anatomy of a Dialog Box
- The Stata command (keyword) that will be submitted
- Multiple tabs
- Submit and close dialog
- Submit and leave dialog open
- Help, Reset, and Copy Command

Anatomy of a Dialog Box
- Use if/in to filter rows
  - specify a logical condition (if)
  - specify row numbers (in)

Anatomy of a Dialog Box
- Command options are available on additional tabs

Understanding Command Syntax
- The general syntax for all Stata commands is:
    [prefix:] cmdname [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]
- Elements in square brackets are optional for some commands
- Sometimes cmdname is all that is required, for example, codebook or describe
- In the help files, the underlined portion of cmdname is the shortest abbreviation Stata accepts for the command
- Stata is case sensitive
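
To see several syntax elements in one command, here is a do-file sketch using auto.dta (loaded with sysuse). Using car weight as an analytic weight is purely illustrative:

```stata
sysuse auto, clear

* cmdname  varlist    if exp          in range  weight           , options
summarize  price mpg  if foreign == 0  in 1/60   [aweight=weight]  , detail

* su is the shortest abbreviation of summarize that Stata accepts
su price
```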

Understanding Command Syntax: cmdname
- cmdname is Stata's keyword for a command
- Examples: generate, replace, drop, regress, logistic, logit, scatter, graph bar, graph box
- Enter cmdname exactly as indicated, taking care to use the proper case (usually lower case for commands)

Understanding Command Syntax: varlist
- You can apply the command to particular variables by specifying a varlist
- Order of variables matters; you can use a hyphen to indicate a series of variables in dataset order, as in:
    codebook x1-x20
- Use wildcard notation for shorthand, such as:
    codebook x*
- Use _all to apply the command to all variables
- Remember that Stata is case sensitive! The variables gender and Gender are two different things to Stata
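
A do-file sketch using auto.dta, whose first four variables are make, price, mpg, and rep78; in that dataset the wildcard t* happens to match trunk and turn:

```stata
sysuse auto, clear

* Hyphen: every variable from make through rep78, in dataset order
describe make-rep78

* Wildcard: every variable whose name starts with t
summarize t*

* _all: every variable in the dataset
codebook _all, compact
```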

Understanding Command Syntax: =exp
- exp is short for expression
- exp is used by data management commands such as generate and replace
- For example, to create a constant variable x equal to 1, use:
    generate x=1
- You can also use operators and functions this way:
    gen x2 = x^2
    gen x_sq = x*x
    gen logx=ln(x)
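
A do-file sketch of generate and replace on auto.dta; the new variable names are hypothetical. generate refuses to create a variable that already exists, which is why the last step uses replace:

```stata
sysuse auto, clear

* A constant variable
generate one = 1

* Operators and functions inside the expression
generate mpg_sq = mpg^2
generate ln_price = ln(price)

* replace changes the values of an existing variable
replace one = 0 if foreign == 1
```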

Understanding Command Syntax: if/in exp
- Without an if or in qualifier, commands apply to all observations in the dataset
- To filter observations, use the if exp clause:
    codebook if (x==2 & z>=3) | w==2
  Note the parentheses!
- Also note the difference between = and == (assignment and equality test, respectively):
    gen x=1 if y==2
    list if gender=="F"
- Relational and logical operators in Stata are == (equal to), != (not equal to), > (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to), & (and), | (or)
- Use in range to refer to particular row numbers in the dataset:
    list in 1/10

A Brief but Critical Detour: Missing Data
- While we are talking about selecting cases using an if exp clause, it is important to note that Stata treats a missing numeric value as larger than any nonmissing number
- Stata represents missing numeric values with a dot (.)
- Keep this in mind when filtering cases based on a numeric variable:
    replace hieduc = 1 if x>3            (potential problem: also true when x is missing)
    replace hieduc = 1 if x>3 & x<.      (playing it safe)
    replace hieduc = 1 if x>3 & x!=.     (playing it safe)
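
A do-file sketch on auto.dta, where rep78 has a few missing values. It also shows a subtlety behind the last line above: x!=. excludes only the system missing value, while x<. (or !missing(x)) also excludes the extended missing codes .a through .z. The variable rep_x is hypothetical:

```stata
sysuse auto, clear

* rep78 has missing values, and missing sorts above every number
count if rep78 > 3                   // also counts the missing values
count if rep78 > 3 & rep78 < .       // nonmissing values only
count if rep78 > 3 & !missing(rep78) // same result, spelled out

* Extended missing codes (.a to .z) are larger than . as well
generate rep_x = rep78
replace rep_x = .a if rep78 == 1
count if rep_x > 3 & rep_x != .      // still counts the .a values
count if rep_x > 3 & rep_x < .       // excludes both . and .a
```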

Understanding Command Syntax: weight
- Most Stata commands can deal with weighted data, where the weight is a variable in the dataset
- Specify the type of weight and the weight variable in square brackets, as in:
    summarize x [iweight=weightvarname]
- Four types of weights:
  - Frequency weights (fweight), for replicated data
  - Probability weights (pweight), for observations sampled with unequal probability of selection
  - Analytic weights (aweight), for data containing averages, where each average is weighted by the number of observations used to compute it
  - Importance weights (iweight), whose meaning is defined by the specific command
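
A do-file sketch contrasting two weight types on auto.dta. The frequency weight copies is hypothetical (every row pretends to stand for two identical cars), and using car weight as an analytic weight is purely illustrative:

```stata
sysuse auto, clear

* Frequency weights are replication counts and must be integers
generate copies = 2
summarize price [fweight=copies]
* Same mean as the unweighted summary, but the reported N doubles

* Analytic weight, for illustration only
summarize price [aweight=weight]
```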

Understanding Command Syntax: using filename
- Some commands read data from external files, or write to files
- These commands contain a using clause, in which the path and filename appear
- Merging two datasets together is an example:
    use "C:\…\master_data.dta", clear
    merge 1:1 id using "C:\...\using_data.dta"
  This performs a 1:1 match on the key variable id (merge adds new variables). One-to-many (1:m) and many-to-one (m:1) merges are also possible
- Similarly, to stack datasets (append adds new observations):
    use "C:\..\one.dta", clear
    append using "C:\...\two.dta"
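
A self-contained do-file sketch that builds its own master and using files from auto.dta (make identifies each car uniquely, so it can serve as the key). The files prices.dta, mileage.dta, and rest.dta are hypothetical names written to the working directory:

```stata
* Split auto.dta into two files that share the key variable make
sysuse auto, clear
keep make price
save prices.dta, replace
sysuse auto, clear
keep make mpg
save mileage.dta, replace

* 1:1 merge on make: adds mpg to the price data; _merge==3 marks matches
use prices.dta, clear
merge 1:1 make using mileage.dta
tabulate _merge

* append: stack rows 11-74 underneath rows 1-10
sysuse auto, clear
keep in 11/74
save rest.dta, replace
sysuse auto, clear
keep in 1/10
append using rest.dta
```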

Understanding Command Syntax: prefix:
- Prefix commands operate on other Stata commands. One common prefix is bysort:
    bysort gender: summarize wage
- The bysort prefix sorts the data by gender and then runs summarize separately for each gender
- The bysort prefix is also very handy for data management, for example to attach a group-level summary to every observation:
    bysort gender: egen avg_wage = mean(wage)
- Not all commands permit the use of all, or even any, prefixes
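
The same patterns on auto.dta, as a do-file sketch with foreign playing the role of gender and price the role of wage; avg_price and n_group are hypothetical names:

```stata
sysuse auto, clear

* Separate summaries for domestic (foreign==0) and foreign (foreign==1) cars
bysort foreign: summarize price

* Attach each group's mean price to every car in that group
bysort foreign: egen avg_price = mean(price)

* Under by:, _N is the number of observations in the current group
bysort foreign: generate n_group = _N
```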

Understanding Command Syntax: Where to get HELP
- If you know the name of a command, enter help cmdname
- If you don't know it, enter findit word1 [word2]…
  This queries a keyword database and some of the official internet sources (such as Stata FAQs and Stata Journal articles)
- Google, or call Stata Technical Services (really!)
- Statalist archives
- CSCAR Stata support, if you are affiliated with U-M as a grad student, staff member, or faculty member

Lab 3
Enter the appropriate commands in the Command window (no menus!):
- open the auto.dta dataset, clearing out what is in memory
- describe the dataset
- get the codebook for the first 5 variables in the dataset
- list the first 10 observations
- try out the browse command
- browse the cases where price is greater than 5000 (but not missing)
- summarize the price variable where foreign==0 (for domestic cars)
- use the bysort prefix to summarize the price variable by levels of the foreign variable

Data Management Commands
We've already seen quite a few:
    use, save               //open/save data
    browse, list            //view data
    codebook, describe      //10,000 ft view
    gen, egen, replace      //create/replace vars
    merge, append           //merge/stack datasets
Next up:
- importing
- exporting
- aggregating
- keeping/dropping

Data Management Commands: importing files
- use reads Stata-format (.dta) datasets
- For data created in another software package:
  - save the data in an Excel file, then use the import excel command (new with Stata 12)
  - save the data in a comma-separated values (.csv) file or another delimited file, then use the insheet command
  - use the other package to save the data in .dta format (SPSS 17+ and SAS 9.2 can do this)
  - use StatTransfer to convert the file to .dta
- .dta, delimited, and .csv files are the simplest file types to get into Stata
- Stata will also import data in other formats, but it's not always straightforward
- To import a .csv file:
    insheet using "C:\...\new_data.csv", comma clear
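
A do-file sketch that first writes a tiny hypothetical CSV file with Stata's low-level file commands (so the example is self-contained), then imports it with insheet as above. In Stata 13 and later, import delimited is the recommended replacement for insheet. The file new_data.csv is written to the working directory:

```stata
* Write a two-row CSV file
tempname fh
file open `fh' using new_data.csv, write replace text
file write `fh' "id,gender,wage" _n
file write `fh' "1,M,500" _n
file write `fh' "4,F,505" _n
file close `fh'

* Import it; names takes the variable names from the first row
insheet using new_data.csv, comma names clear
list
```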

Data Management Commands: exporting files
- save saves the data in .dta format
- To make the data usable by other software packages:
  - export the data to a comma-separated values (.csv) file or another delimited file using outsheet
  - use the other package to open the .dta file and save it in another format (SPSS 17+ and SAS 9.2 can do this)
  - use StatTransfer to convert the file from .dta to something else
- To export data to a .csv file:
    outsheet using "C:\...\out_data.csv", comma
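
A do-file sketch exporting selected auto.dta variables and reading them back. By default outsheet writes value labels (here Domestic/Foreign) instead of the underlying codes, hence nolabel. In Stata 13 and later, export delimited is the recommended replacement for outsheet:

```stata
sysuse auto, clear

* Comma-separated, variable names in the first row, 0/1 codes for foreign
outsheet make price mpg foreign using out_data.csv, comma nolabel replace

* Read the file back to check the round trip
insheet using out_data.csv, comma clear
describe, short
```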

Data Management Commands: aggregating files
- It is a common exercise to aggregate data, or to make a dataset of summary statistics
- Use the collapse command:
    collapse (mean) mn_wage=wage (count) count=wage, by(gender)
  to turn data like this:
    id  gender  wage
     1  M        500
     2  M        550
     3  M        490
     4  F        505
     5  F        410
  into this:
    gender  count  mn_wage
    F           2    457.5
    M           3  513.333
- Use collapse to produce counts, means, medians, percentiles, extrema, and standard deviations of your data
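
A do-file sketch that reproduces the toy example above by typing the data in with input. (count) counts the nonmissing values of wage in each group, and the collapsed data come out sorted by gender, so F is listed first:

```stata
* Enter the example data
clear
input id str1 gender wage
1 M 500
2 M 550
3 M 490
4 F 505
5 F 410
end

* One row per gender: mean wage and number of nonmissing wages
collapse (mean) mn_wage=wage (count) count=wage, by(gender)
list
```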

Data Management Commands: keep/drop
- To throw away variables, use either:
    keep varlist    (keeps only the listed variables)
    drop varlist    (drops the listed variables)
- To get rid of particular observations, add an if or in clause with no varlist:
    drop if x==3
    keep in 1/100
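
A do-file sketch on auto.dta, dropping variables first and then observations (using missing(), in line with the missing-data advice earlier):

```stata
sysuse auto, clear

* Variables: drop the ones you do not want, or keep only the ones you do
drop headroom trunk
keep make price mpg rep78 foreign

* Observations: an if or in qualifier with no varlist
drop if missing(rep78)
keep in 1/50
```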

Lab 4
- Import the “auto.csv” dataset from your desktop lab folder
- Save the file in your desktop lab folder as “auto1.dta”
- Aggregate the dataset by levels of foreign, obtaining the mean and median for price and mpg
- Drop the median price and median mpg variables
- Export the aggregated dataset to a .csv file in your desktop lab folder

Descriptive Statistics and Estimation
We've already seen summarize. Next up:
- summarizing (with detail)
- tabulating
- estimation (modeling)
- post-estimation

Descriptive Statistics and Estimation: summarizing with detail
- summarize gives descriptive statistics for numeric variables
- Use the detail option to get additional descriptive statistics (percentiles, variance, skewness, kurtosis):
    sum x1, detail
- summarize without a varlist will summarize all numeric variables in the dataset
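
A do-file sketch on auto.dta showing that summarize, detail also stores its results in r() (here r(p50) and r(p90)), so later commands can use them:

```stata
sysuse auto, clear
summarize price, detail

* Reuse stored results from summarize
display "median = " r(p50) "   90th percentile = " r(p90)
```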

Descriptive Statistics and Estimation: tabulating
- tabulate gives one- and two-way tables for categorical variables
- For two-way tables, use the chi2, row, and col options to get a chi-square test, row percentages, and column percentages:
    tab race
    tab race treatment, row
    tab race treatment, chi2 col
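
A do-file sketch on auto.dta. Note that rep78 has missing values, which tabulate leaves out of the table unless you add the missing option:

```stata
sysuse auto, clear

* One-way frequency table
tabulate foreign

* Two-way table with column percentages and a Pearson chi-square test
tabulate rep78 foreign, column chi2
```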

Descriptive Statistics and Estimation: estimation (modeling)
- Most estimation commands have the same syntax:
    cmdname yvar(list) xvarlist [, options]
- Common estimation commands are:
    regress               //OLS
    logit, logistic       //logistic
    mlogit                //multinomial
    ologit                //ordinal
    poisson               //Poisson
    xtmixed               //mixed
- Example:
    reg y x1 x2 x3
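
A do-file sketch of the common dependent-variable-then-regressors pattern on auto.dta:

```stata
sysuse auto, clear

regress price mpg foreign      // OLS
logit foreign price mpg        // logistic regression, coefficients on the log-odds scale
logistic foreign price mpg     // same model, reported as odds ratios
```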

Descriptive Statistics and Estimation: post-estimation
- After you get your estimates, you can obtain predictions:
    predict yhat1 if e(sample)
    predict yhat2
    predict resid, residuals
- Adjusting the estimated covariance matrix is straightforward:
    reg y x1 x2 x3, robust
    reg y x1 x2 x3, cluster(clustervar)
- Testing hypotheses about parameters:
    test x1=3
- Hypotheses can also be nonlinear and involve combinations of parameters (see testnl and nlcom)
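
A do-file sketch of post-estimation on auto.dta. The hypothesized values (3 and -1) are arbitrary, chosen only to show the syntax:

```stata
sysuse auto, clear
regress price mpg weight foreign

* Predictions after estimation
predict yhat if e(sample)        // fitted values for the estimation sample
predict resid, residuals         // raw residuals
predict rstud, rstudent          // studentized residuals

* Wald tests after estimation
test weight = 3                  // one linear restriction
test mpg foreign                 // joint test that both coefficients are 0
testnl _b[mpg]/_b[weight] = -1   // a nonlinear restriction

* Same point estimates with heteroskedasticity-robust standard errors
regress price mpg weight foreign, vce(robust)
```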

Lab 5
Using the auto.dta dataset:
- summarize the variables price and mpg
- tabulate the foreign variable
- regress price on mpg and foreign (OLS regression)
- save the predicted values in a new variable called yhat
- save the studentized residuals in a new variable called rstudent

Graphing
- Easily customized graphics
- Graphs can be created via menus or the command line
- Manual adjustments can be made after the graph is generated, using the Graph Editor
- Graphs can be saved in various file formats and/or pasted into documents
- Examples:
    histogram y, normal
    twoway (scatter y x) (lfit y x)
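
A do-file sketch of the two example graphs on auto.dta, saved in Stata's .gph format and exported for other programs. The file names are hypothetical, and the available export formats (PNG here) can vary by platform:

```stata
sysuse auto, clear

histogram mpg, normal

* Overlay a scatterplot and a fitted regression line
twoway (scatter price mpg) (lfit price mpg), title("Price vs. mileage")

* Save in Stata's own format, then export a copy for other programs
graph save price_mpg.gph, replace
graph export price_mpg.png, replace
```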

Lab 6
Using the auto.dta dataset:
- create a scatterplot of price on the y-axis and mpg on the x-axis
- from the Graph window, start the Graph Editor; modify the plot titles and colors
- save your graph as a Stata .gph file in your desktop lab folder
- copy the graph and paste it into a Word or PowerPoint file

Adding User-written Commands
- You can install add-on packages, which are user-written commands made publicly available
- You may run into these packages if you:
  - do a findit search
  - Google
  - go to Help, SJ and user-written programs
- Installation is usually as simple as clicking through some links
- My personal most-used add-ons: mvpatterns, gllamm

Lab 7
- Install the mvpatterns add-on package by typing findit mvpatterns, then click on the blue link starting with dm91. Follow the links to install
- Read the help file for mvpatterns
- Check the missing value patterns for the variables make through rep78
- Close your log file

.do Files
- .do files are text files that contain sequences of Stata commands (like a SAS command file or an SPSS syntax file)
- Create them using Stata's .do-file editor or any text editor:
  - copy from your Review window
  - type in the commands directly
- Saving your commands to one or more .do files is never a bad idea. But use good habits:
  - comment liberally, using * or /* */ conventions
  - specify the version of Stata used
  - use set more off to opt out of Stata's paging feature, if appropriate
- You can run the entire .do file, or just a small part of it
- Stata will stop processing if an error is encountered when commands from a .do file are submitted
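
A sketch of a do-file skeleton that follows the habits above; the file name sample_analysis.do and the analysis itself are hypothetical, and the log is written to the working directory:

```stata
* sample_analysis.do: hypothetical skeleton of a do-file
* Purpose: load data, summarize, fit one model

version 12                  // interpret commands as Stata 12 would
set more off                // do not pause output at each screenful
capture log close           // close any log left open (capture ignores the error if none)
log using sample_analysis.log, replace text

/* Load the example data and analyze it */
sysuse auto, clear
summarize price mpg
regress price mpg foreign

log close
```

Saved as sample_analysis.do, the whole file can be run with do sample_analysis.do; the do-file editor can also run just a highlighted selection.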

Lab 8
- Open the sample.do file in your desktop lab folder
- Can you describe what is happening in the .do file?
- Copy all of the commands from tonight's session into a new .do file
- Run a small section of commands
- Run the entire file

Other Misc.
- To manage variable attributes, use the Variables Manager
- Type help cmdname to find out more about these commands:
    matrix     //matrix algebra
    mata       //fancy matrix programming
    foreach    //looping command
    xt         //panel/longitudinal analysis
    st         //survival analysis
    svy        //analysis of complex survey data
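
A do-file sketch of foreach looping over three auto.dta variables and running the same commands on each:

```stata
sysuse auto, clear

* Run the same commands for each variable in a varlist
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v'" ": mean = " %9.2f r(mean)
}
```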

Additional Resources
- Stata website, FAQs
- UCLA website
- Christopher F. Baum's Stata handouts
- Stata NetCourses
- CSCAR workshops