EC501 Gabriella Conti University of Essex


1 CRASH COURSE IN STATA
EC501, Gabriella Conti, University of Essex

2 OBJECTIVE
Introduce the use of Stata for:
  Data management
  Estimation
    Cross sections
    Time series
    Panel data
  Testing and prediction

3 OVERVIEW
  What is Stata
  Stata resources
  Getting started
  Language syntax
  Storage types
  Formats
  Inputting data
  Do-files, Ado-files, Log-files
  Examining the data

4 STATA
Stata is a statistical package for managing, analyzing, and graphing data.
User-friendly:
  Command-driven language
  Interactive
Stata power:
  Which Stata: type about
  Latest version: Stata 8.2

             Stata/SE   Intercooled   Small
  maxvar       32,767         2,047      99
  matsize      11,000           800      40

5 STATA RESOURCES (1)
Stata itself:
  help [command or topic name]
  whelp [command or topic name]
  help contents
  search / net search / findit [command or topic name]
  See also Help in the pull-down menu.
Stata manuals (version 8):
  Getting Started [GS]
  User’s Guide [U]
  Reference [R]
  Cross-sectional time-series [XT]
  Time series [TS]
  Graphics [G]
  ...and lots more...
Online: the Stata website, the FAQs, Statalist, and the data sets used in the manuals.
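
As a hedged illustration of the help commands listed above, the lines below could be typed in the Command window. The topics chosen (regress, panel data, outreg) are only examples, and net search and findit assume an internet connection.

```stata
* Look up the syntax and options of a specific command
help regress

* Keyword search of the local index (manual entries, FAQs, STB/Stata Journal)
search panel data

* Search material that is available over the internet
net search outreg

* Search both the local keyword database and the internet in one step
findit outreg
```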

6 STATA RESOURCES (2)
Stata Technical Bulletin [STB], now The Stata Journal. The STB describes the programs in more detail and also gives practical examples.
The Boston College Software Archive (user-written commands):
  net from [archive URL]
  ssc install [command]
Stata is web-aware!
UCLA Academic Technology Services
Other resources
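
A minimal sketch of installing a user-written command from the Boston College (SSC) archive. The package name outreg is only an example of something hosted there, and the commands assume Stata can reach the internet.

```stata
* Read the package description before installing anything
ssc describe outreg

* Download and install the package into the PLUS ado directory
ssc install outreg

* Check where the newly installed command was placed
which outreg
```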

7 GETTING STARTED (1)
Stata windows:
  Results window [Ctrl+1 or click the results icon]
  Graph window [Ctrl+2]
  Viewer window [Ctrl+3 or click the viewer icon]: help, search, net search, view
  Command window [Ctrl+4]: type commands here (use the Page Up and Page Down keys to recall past commands) and hit Return to execute the command
  Review window [Ctrl+5]: past commands appear here (click on a command and it will appear in the Command window)
  Variables window [Ctrl+6]: variables appear here (click on a variable and it will appear in the Command window, or wherever the Target in the Variables window specifies)
  Data editor [Ctrl+7, or click the data editor icon, or type edit in the Command window]
  Data browser [click the data browser icon, or type browse in the Command window]
  Do-file editor [Ctrl+8 or click the do-file editor icon]

8 GETTING STARTED (2)
Stata toolbar (icons):
  Open: open a Stata dataset.
  Save: save a Stata dataset.
  Print: print the contents of the active window.
  Log: start or stop, pause or resume a log file.
  Viewer: open the Viewer window, or bring it to the front.
  Results: open the Results window, or bring it to the front.
  Graph: open the Graph window, or bring it to the front.
  Do-file editor: open the do-file editor, or bring it to the front.
  Data editor: open the data editor, or bring it to the front.
  Data browser: open the data browser, or bring it to the front.
  More: continue when output is paused in a long listing.
  Break: stop the current task and return the system to the state it was in before you issued the command.

9 GETTING STARTED (3)
Commands interface: one of the main changes in Stata 8 is that it now has a menu toolbar (in the style of SPSS). This lets the user select an item from a pull-down menu, which opens a dialogue box in which Stata commands can be built. It is very useful for learning how to build commands with a complicated syntax (e.g. graphs). The command issued by the dialogue box is submitted as if you had typed it by hand. Therefore, if you cannot remember the syntax of a command, using the dialogue box and then checking the command in the Review window (or pressing the Page Up key) is a good way to get a reminder.

10 BASIC LANGUAGE SYNTAX
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]
Drop/keep variables or observations according to conditions: [if exp] [in range]
Logical operators to use with [if exp]: & (and), | (or), ! (not)
Relational operators that can be used in [if exp]: == (equal), != (not equal), >, >=, <, <=
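
The general syntax above can be sketched with the auto dataset that ships with Stata 8 and later. The particular variables, thresholds and the choice of regress are illustrative assumptions, not part of the original slides.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* command [varlist]
summarize price mpg

* [=exp]: create a new variable from an expression
generate lprice = ln(price)

* [if exp] with relational (==, >) and logical (&) operators
summarize price if foreign == 1 & mpg > 25

* ! negates a whole condition; | is "or"
tabulate rep78 if !(price > 10000 | mpg < 15)

* [in range]: restrict to observations 1 through 5
list make price in 1/5

* [by varlist:] repeats the command within each group
by foreign, sort: summarize mpg

* keep/drop observations according to conditions
keep if rep78 < .
drop in 1/10

* [, options] follow the comma
regress price mpg weight, robust
```

Note that != means "not equal" while a single ! negates the condition that follows it, and that rep78 < . keeps only nonmissing values because missing values sort above every number.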

11 STORAGE TYPES (1)
A number may contain a sign, an integer part, a decimal point, a fraction part, an e or E, and a signed integer exponent. Numbers may not contain commas; e.g.: the number 2,210 must be typed as 2210.
Numbers can be stored in one of five variable types: byte, int, long, float (the default), or double. The table shows the minimum and maximum values for each storage type.

  Storage type   Minimum          Maximum         Closest to 0 without being 0   Bytes
  byte           -127             100             ±1                             1
  int            -32,767          32,740          ±1                             2
  long           -2,147,483,647   2,147,483,620   ±1                             4
  float                                                                          4
  double                                                                         8
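
A short sketch of how storage types appear in practice, using the auto dataset shipped with Stata. The new variable names (highmpg, price2) are hypothetical.

```stata
sysuse auto, clear

* A 0/1 indicator fits comfortably in a byte
generate byte highmpg = mpg > 25

* Request double precision explicitly; the default would be float
generate double price2 = price^2

* The "storage type" column shows how each variable is held
describe highmpg price2

* Let Stata demote variables to the smallest type that loses no information
compress
describe
```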

12 STORAGE TYPES (2)
A string is a sequence of printable characters, and is typically enclosed in double quotes. The quotes are not considered part of the string; they merely delimit its beginning and end. The special string “”, often called the null string, is considered by Stata to be missing.
String variables often contain identifying information, such as the name of a city or state. Such strings are typically listed but not used directly in statistical analysis, although the data might be sorted on the string, or datasets might be merged on the basis of one or more string variables.
Occasionally, strings contain information that is to be used directly in the analysis, such as sex, which might be coded “male” or “female”. Stata prefers such information to be numerically encoded and stored in numeric variables. Stata’s statistical routines treat string variables as if every observation records a numeric missing value. However, Stata provides two pairs of commands for converting string variables into numeric ones (and back again): encode/decode and destring/tostring.
Strings may contain the character representation of a number, e.g.: “2.3”. You can convert such a string directly into a numeric variable using the real() function (with generate) or the destring command.
Strings are stored in string variables with storage types str1, str2, …, str80. The storage type merely sets the maximum length of the string, not its actual length; thus, “example” has length 7 whether it is stored as a str7, a str10, or even a str80. On the other hand, an attempt to assign the string “example” to a str6 would result in “exampl”. The maximum length of a string is 80 characters in Intercooled Stata or Small Stata and 244 in Stata/SE. String literals may exceed 80/244 characters, but only the first 80/244 are significant.
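
The conversions described above can be sketched on a tiny hand-typed dataset. The variable names and values are made up for illustration.

```stata
clear
input str6 sex str4 score
"male"   "2.3"
"female" "4.1"
"female" "3.0"
end

* encode: string categories become a labelled numeric variable
encode sex, generate(sex_n)

* decode goes the other way, from labelled numeric back to string
decode sex_n, generate(sex_s)

* Strings holding numbers: convert with real() ...
generate double score_r = real(score)

* ... or with destring
destring score, generate(score_n)

describe
list
```

encode assigns codes in alphabetical order of the string values, so the numeric codes carry no meaning beyond identifying the category.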

13 FORMATS (1)
The syntax for a Stata numeric format:
  first type          %      to indicate the start of the format
  then optionally     -      if you want the result left-aligned
  then optionally     0      if you want to retain leading zeros (honored only with the f format)
  then type           w      a number stating the width of the result
  then type           .
  then type           d      a number stating the number of digits to follow the decimal point
  then type either    e      for scientific notation; e.g.: 1.00e+03
               or     f      for fixed format; e.g.: 1000.0
               or     g      for general format; Stata chooses based on the number being displayed
  then optionally     c      to indicate comma format (not allowed with e)

14 FORMATS (2)
The default formats for the numeric variable types are:
  byte     %8.0g
  int      %8.0g
  long     %12.0g
  float    %9.0g
  double   %10.0g
The syntax for a string format is:
  first type          %      to indicate the start of the format
  then optionally     -      if you want the result left-aligned
  then type           w      a number indicating the width of the result
  then type           s
The default format for a string is %ws or %9s, whichever is wider.
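
A hedged example of the format syntax from the two slides above, applied with the format and display commands. The widths chosen are arbitrary.

```stata
sysuse auto, clear

* Fixed format, width 9, two decimals, with commas
format price %9.2fc

* Left-aligned string format, width 18
format make %-18s

list make price in 1/3

* The same number under different numeric formats
display %9.2f 1000
display %10.2e 1000
display %9.0g 1000

* Leading zeros are honored only with the f format
display %05.0f 42
```

Formats change only how values are displayed; the stored values and storage types are unaffected.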

15 FILE EXTENSIONS
  Data file (Stata format): filename.dta
  Do-file: filename.do
  Dictionary file: filename.dct
  Log-file: filename.smcl (only readable in Stata)
  Log-file: filename.log (text file)
  Ado-file: filename.ado

16 INPUTTING DATA (1)
Check memory: memory
Set memory: set memory
If not enough memory has been assigned to Stata, you may get the message:

  no room to add more observations
      An attempt was made to increase the number of observations beyond what is currently possible. You have the following alternatives:
      1. Store your variables more efficiently; see help compress. (Think of Stata's data area as the area of a rectangle; Stata can trade off width and length.)
      2. Drop some variables or observations; see help drop.
      3. Increase the amount of memory allocated to the data area using the set memory command; see help memory.
  r(901);
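
A sketch of how the memory commands above might be used in the Stata 8 era covered by these slides. The 50m figure is an arbitrary assumption; to my knowledge, Stata 12 and later manage memory automatically and accept set memory only for compatibility.

```stata
sysuse auto, clear

* Alternative 1 from the message: store variables more efficiently
compress

* Report how much memory is allocated and how much is in use
memory

* The allocation can only be changed when no data are in memory
clear
set memory 50m
memory
```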

17 INPUTTING DATA (2)
1a. use filename [, clear nolabel] (or click the folder icon)
  For datasets already in Stata format (*.dta).
  If filename is specified without an extension, .dta is assumed.
  clear permits the data to be loaded even if there is a dataset already in memory and even if that dataset has changed since the data were last saved.
  nolabel prevents value labels from being loaded. It is unlikely that you will ever use it.
1b. use [varlist] [if exp] [in range] using filename [, clear nolabel]
  Only a subset of the data is loaded.
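
For example, both forms of use might look as follows. The file mydata.dta and the variables id, wage and age are hypothetical.

```stata
* 1a. Load the whole dataset, discarding whatever is in memory
use mydata, clear

* 1b. Load only three variables, and only the adult observations
use id wage age if age >= 18 using mydata, clear

* 1b. Load only the first 100 observations of every variable
use in 1/100 using mydata, clear
```

Loading a subset directly can help when the full file would not fit in the memory allocated to Stata.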

18 INPUTTING DATA (3)
2. insheet [varlist] using filename [, double [no]names [ comma | tab | delimiter("char") ] clear ]
  For files created by spreadsheet or database programs (e.g. Excel).
  For text (ASCII) files where there is one observation per line and the values are separated by tabs or commas (*.csv).
  The first line of the file may or may not contain the variable names. [no]names tells Stata which is the case; specifying it only speeds insheet's processing, since insheet can determine this for itself.
  double forces Stata to store variables as doubles rather than floats.
  comma, tab, and delimiter("char") tell Stata how values are separated in the file. Specifying comma or tab only speeds insheet's processing, since insheet can determine for itself whether the separator is a tab or a comma. If values in the file are separated by semicolons, specify delimiter(";").
  clear specifies that it is okay for the new data to replace what is currently in memory.
  Bottom line: in most cases, insheet using filename is all you need.
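
A minimal sketch of insheet in use, assuming two hypothetical files, survey.csv (comma-separated) and survey_sc.txt (semicolon-separated). In Stata 13 and later, import delimited supersedes insheet.

```stata
* Comma- or tab-separated file: insheet works out the separator
* and whether the first line holds variable names
insheet using survey.csv, clear
describe

* Semicolon-separated file: the delimiter must be given explicitly
insheet using survey_sc.txt, delimiter(";") clear
describe
```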

19 INPUTTING DATA (4)
3a. infile varlist [_skip[(#)] [varlist [_skip[(#)] ...]]] using filename [if exp] [in range] [, automatic byvariable(#) clear]
  For data in either free or comma-separated-value format (unformatted ASCII (text) data).
  If filename is specified without an extension, .raw is assumed.
  The file must contain only the data, not the variable names.
  automatic causes creation of value labels from the nonnumeric data read.
  byvariable(#) specifies that the external file is organized by variables rather than by observations: all observations on the first variable appear, followed by all observations on the second variable, and so on.
  clear specifies that it is okay for the new data to replace what is currently in memory.
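
As a hedged illustration of free-format infile, assume two hypothetical files: earnings.raw with three numeric columns, and people.raw with two numeric columns followed by a name.

```stata
* Three numeric variables, read in the order they appear in the file
infile id age wage using earnings.raw, clear

* The .raw extension is assumed; _skip(1) ignores the second column
infile id _skip(1) wage using earnings, clear

* A string variable needs an explicit storage type
infile id age str20 name using people, clear
list in 1/5
```

Because the file holds no variable names, the varlist on the command line is what names the columns.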

20 INPUTTING DATA (5)
3b. infile using dfilename [if exp] [in range] [, automatic using(filename2) clear ]
  For ASCII (text) data in fixed format with a dictionary.
  A dictionary describes the contents of the file and allows reading files in fixed or free format.
  dfilename contains the dictionary. If dfilename is specified without an extension, .dct is assumed.
  using(filename2) specifies the name of the file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename or, if the dictionary specifies the name of some other file, that file is assumed to contain the data. If using(filename2) is specified, filename2 is used to obtain the data even if the dictionary itself says otherwise.
  automatic causes creation of value labels from the nonnumeric data read.
  clear specifies that it is okay for the new data to replace what is currently in memory.
  E.g.:
    dictionary using D:\DATA\LFS\RAW\OTT92.txt {
        _column(1)   year       %2f
        _column(3)   quarter    %1f
        _column(4)   region     %2f
        _column(31)  sex        %1f
        _column(32)  age        %2f
        _column(45)  education  %1f
        _column(59)  workcond   %2f
        _column(61)  workweek   %1f
        _column(62)  workday    %1f
        _column(63)  workhour   %2f
        _column(65)  usualday   %1f
    }
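
Assuming the dictionary above has been saved under the hypothetical name ott92.dct, it could be used along these lines. The second data file name, gen93.txt, is also made up.

```stata
* Read the data file named inside the dictionary (.dct is assumed)
infile using ott92, clear
describe
list in 1/5

* Reuse the same dictionary for another file with the same layout;
* using() overrides the file named in the dictionary
infile using ott92, using(gen93.txt) clear
```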

21 INPUTTING DATA (6)
4a. infix using dfilename [if exp] [in range] [, using(filename2) clear ]
4b. infix specifications using filename [if exp] [in range] [, clear]
  For data in fixed-column format.
  In the first syntax, dfilename contains the dictionary. If dfilename is specified without an extension, .dct is assumed.
  using(filename2) specifies the name of the file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename or, if the dictionary specifies the name of some other file, that file is assumed to contain the data. If using(filename2) is specified, filename2 is used to obtain the data even if the dictionary itself says otherwise.
  clear specifies that it is okay for the new data to replace what is currently in memory.
  E.g.:
    infix year 1-2 quarter 3 region 4-5 sex 31 age 32-33 education 45 workcond 59-60 workweek 61 workday 62 workhour 63-64 usualday 65 using D:\DATA\LFS\RAW\OTT92.txt

22 INPUTTING DATA (7)
5. Stat/Transfer
  Automatically converts data from another format to .dta format.
6. edit [varlist] [if exp] [in range] [, nolabel]
  edit brings up a spreadsheet-style data editor for entering new data and editing existing data.
7. input [varlist] [, automatic label ]
  input allows you to type data directly into the dataset in memory.
8. odbc load [options]
  odbc allows Stata to load data from ODBC sources. Type help odbc for more on this.

23 DO-FILES
Instead of using Stata interactively, you can use do-files. Highly recommended.
A do-file is a standard ASCII text file that contains commands.
The file should be saved with the extension .do; do filename assumes .do if no extension is given.
You can use any text editor to create do-files, or you can use the built-in do-file editor.
You can include comments using the indicators *, /* */, //, ///.
You can change the end-of-line delimiter for long lines, e.g.: #delimit ;
  Once you change the line delimiter to semicolon, all lines, even short ones, must end in semicolons.
A do-file is executed by Stata:
  when you type do filename in the Command window;
  when you click the “do current file” button in the do-file editor.
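
A minimal sketch of a do-file that uses each comment style and both ways of handling long lines. The file name example.do and the commands inside it are illustrative assumptions.

```stata
* example.do: a line starting with * is a comment
version 8
clear
set more off

/* A block comment
   can span several lines */
sysuse auto, clear

summarize price mpg    // a comment at the end of a line

* /// continues a command on the next line
regress price mpg weight ///
    foreign

#delimit ;
list make price mpg
    in 1/5 ;
#delimit cr
```

Saved as example.do in the current directory, it would be run by typing do example in the Command window; #delimit cr restores the carriage return as the end-of-line delimiter.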

24 ADO-FILES
An ado-file is an ASCII text file that contains a Stata program.
When you type a command that Stata does not know (i.e. it is not a built-in command), it looks in certain places for an ado-file of that name. If Stata finds it, Stata loads and executes it, so it appears to you as if the ado-command is just another command built into Stata.
Use the which command to determine whether a command is built in or implemented as an ado-file.
Stata looks for ado-files in seven directories. Use the command sysdir to find out where they are on your computer.
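
For instance, the two commands mentioned above could be tried as follows. The commands passed to which are arbitrary examples, and what is reported depends on the Stata version installed.

```stata
* Is this command built in, or implemented as an ado-file?
which summarize
which codebook

* List the system directories in which Stata keeps ado-files
sysdir

* Show the order in which those directories are searched
adopath
```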

25 LOG FILES
Syntax (or click the log icon):
  log
  log using filename [, append replace [ text | smcl ] ]
  log { on | off | close }
  cmdlog
  cmdlog using filename [, append replace ]
  cmdlog { on | off | close }
log allows you to make a full record of your Stata session. A log is a file containing what you type and Stata's output. It is useful to include the commands that start and stop the logging in the do-file itself.
cmdlog allows you to make a record of what you type during your Stata session. A command log contains only what you type and so is a subset of a full log. Command logs are always straight ASCII text files, which makes them easy to convert into do-files.
Full logs are recorded in one of two formats: SMCL (Stata Markup and Control Language) or text (meaning ASCII). The default is SMCL, but set logtype can change that, or you can specify the option text or smcl to state the format you wish.
log or cmdlog, typed without arguments, reports the status of logging.
log using and cmdlog using open a log file. log close and cmdlog close close the file. In between, log off and cmdlog off temporarily suspend logging, and log on and cmdlog on resume it.
append specifies that results are to be appended onto the end of an already existing file. If the file does not already exist, a new file is created.
replace specifies that filename, if it already exists, is to be overwritten, and so is an alternative to append.
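
A hedged sketch of a logged session, as it might appear at the top and bottom of a do-file. The file names session1 and commands1 are hypothetical.

```stata
* Start a full log in plain text, overwriting any earlier version
log using session1, text replace

* Keep a separate record of the commands only
cmdlog using commands1, replace

sysuse auto, clear
summarize price mpg

* Suspend the full log while producing output not worth keeping
log off
describe
log on

* Report the status of logging, then close both files
log
cmdlog close
log close
```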

26 CONTROLLING OUTPUT
--more-- may appear in your Results window when you are trying to output a long listing.
  To see the next line: press Enter.
  To see the next screen: press any other key, click on --more-- at the bottom of the Results window, or click the “go” icon.
set more off / set more on: switch the --more-- pause off or on. Very useful in do-files.
break: to interrupt a Stata command at any time, use the “break” button, or type q in the Command window.

27 NEXT TIME: LAB #1
Examining the data:
  describe
  list
  codebook
  summarize
  inspect
  tabulate
  ...and more...
Organising datasets:
  rename
  drop
  keep
  generate
  replace
  egen
  sort
  append
  merge
  ...and more...

