EC501 Gabriella Conti University of Essex


CRASH COURSE IN STATA

OBJECTIVE
Introduce the use of Stata for:
  Data management
  Estimation
    Cross sections
    Time series
    Panel data
  Testing and prediction

OVERVIEW
  What is Stata
  Stata resources
  Getting started
  Language syntax
  Storage types
  Formats
  Inputting data
  Do-files, Ado-files, Log-files
  Examining the data

STATA
Stata is a statistical package for managing, analyzing, and graphing data.
User-friendly:
  Command-driven language
  Interactive
Stata power: http://www.stata.com/capabilities/
Which Stata: about
Latest version: Stata 8.2

             Stata/SE    Intercooled    Small
  maxvar     32,767      2,047          99
  matsize    11,000      800            40

STATA RESOURCES (1)
Stata itself:
  help [command or topic name]
  whelp [command or topic name]
  help contents
  search / net search / findit [command or topic name]
  See also Help in the pull-down menu.
Stata manuals (version 8):
  Getting Started [GS]
  User's Guide [U]
  Reference [R]
  Cross-sectional time-series [XT]
  Time series [TS]
  Graphics [G]
  ...and lots more...
Stata website: http://www.stata.com
FAQs: http://www.stata.com/support/faqs
Statalist: http://www.stata.com/support/statalist
Data sets used in manuals: http://www.stata-press.com/data/

STATA RESOURCES (2)
Stata Technical Bulletin [STB], now The Stata Journal. The STB describes the programs in more detail and also gives practical examples.
The Boston College Software Archive (user-written commands):
  net from http://fmwww.bc.edu/RePEc/bocode/
  ssc install [command]
  Stata is web-aware!
UCLA Academic Technology Services: http://www.ats.ucla.edu/stat/stata
Other resources: http://www.stata.com/links/resources1.html

GETTING STARTED (1)
Stata windows:
  Results window [Ctrl+1 or click the results icon]
  Graph window [Ctrl+2]
  Viewer window [Ctrl+3 or click the viewer icon]: help, search, net search, view
  Command window [Ctrl+4]: type commands here (use the Page Up and Page Down keys to recall past commands) and hit Return to execute the command
  Review window [Ctrl+5]: past commands appear here (click on a command, and it will appear in the command window)
  Variables window [Ctrl+6]: variables appear here (click on a variable, and it will appear in the command window, or wherever the Target in the Variables window specifies)
  Data editor [Ctrl+7, or click the data editor icon, or type edit in the command window]
  Data browser [click the data browser icon or type browse in the command window]
  Do-file editor [Ctrl+8 or click the do-file editor icon]

GETTING STARTED (2)
Stata toolbar (icons):
  Open: open a Stata dataset.
  Save: save a Stata dataset.
  Print: print the contents of the active window.
  Log: start or stop, pause or resume a log file.
  Viewer: open the viewer window, or bring it to the front.
  Results: open the results window, or bring it to the front.
  Graph: open the graph window, or bring it to the front.
  Do-file editor: open the do-file editor, or bring it to the front.
  Data editor: open the data editor, or bring it to the front.
  Data browser: open the data browser, or bring it to the front.
  More: continue when paused in long output.
  Break: stop the current task. This returns the system to the state it was in before you issued the command.

GETTING STARTED (3)
Commands interface: one of the main changes in Stata 8 is that it now has a Menu toolbar (in the style of SPSS). This lets the user select an item from a pull-down menu, which opens a dialogue box in which you can build Stata commands.
It is very useful for learning how to build commands with a complicated syntax (e.g. graphs).
The command issued by the dialogue box is submitted as if you had typed it by hand. Therefore, if you cannot remember the syntax of a command, using the dialogue box and then checking the command in the Review window (or using the Page Up key) is a good way to get a reminder.

BASIC LANGUAGE SYNTAX
  [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]
Drop or keep variables or observations according to conditions: [if exp] [in range]
Logical operators to use with [if exp]: & (and), | (or), ! (not)
Relational operators that can be used in [if exp]: ==, !=, >, >=, <, <=
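As a minimal sketch of how the pieces of the syntax diagram combine, the commands below assume the auto example dataset that ships with Stata (loaded here with sysuse, which the slides do not mention). The variable name price_k is made up for illustration.

```stata
* Load the example dataset shipped with Stata (assumption: sysuse is available)
sysuse auto, clear

* [by varlist:] command [varlist]  -- data must be sorted on the by-variable
sort foreign
by foreign: summarize price mpg

* [if exp] with relational (>, >=) and logical (&) operators
list make price mpg if price > 10000 & mpg >= 15

* | (or) and != (not equal)
count if foreign == 1 | rep78 != 3

* ! (not) negates a whole expression
count if !(foreign == 1)

* [in range]: first five observations only
list make price in 1/5

* [=exp] and [, options]
generate price_k = price / 1000
summarize price_k, detail

* Drop observations and keep variables according to conditions
drop if missing(rep78)
keep make price mpg foreign
```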

STORAGE TYPES (1)
A number may contain a sign, an integer part, a decimal point, a fraction part, an e or E, and a signed integer exponent.
Numbers may not contain commas; e.g., the number 2,210 must be typed as 2210 (or 2210. or 2210.0).
Numbers can be stored in one of five variable types: byte, int, long, float (the default), or double.
The table shows the minimum and maximum values for each storage type.

  Storage type   Minimum                Maximum                Closest to 0 without being 0   Bytes
  byte           -127                   100                    ±1                             1
  int            -32,767                32,740                 ±1                             2
  long           -2,147,483,647         2,147,483,620          ±1                             4
  float          -1.70141173319*10^…    1.70141173319*10^…     ±10^…                          4
  double         -8.9884656743*10^…     +8.9884656743*10^…     ±10^…                          8

STORAGE TYPES (2)
A string is a sequence of printable characters, typically enclosed in double quotes. The quotes are not considered part of the string; they merely delimit its beginning and end. The special string "", often called the null string, is treated by Stata as a missing value.
String variables often contain identifying information, such as the name of a city or state. Such strings are typically listed but are not used directly in statistical analysis, although the data might be sorted on the string, or datasets might be merged on the basis of one or more string variables.
Occasionally, strings contain information that is to be used directly in the analysis, such as sex, which might be coded "male" or "female". Stata prefers such information to be numerically encoded and stored in numeric variables. Stata's statistical routines treat string variables as if every observation recorded a numeric missing value. However, Stata provides two pairs of commands for converting string variables into numeric ones (and back again): encode/decode and destring/tostring.
Strings may contain the character representation of a number, e.g. "2.3". You can convert it directly into a numeric variable using the real() function (with generate), or the destring command.
Strings are stored in string variables with storage types str1, str2, …, str80. The storage type merely sets the maximum length of the string, not its actual length; thus, "example" has length 7 whether it is stored as a str7, a str10, or even a str80. On the other hand, an attempt to assign the string "example" to a str6 would result in "exampl".
The maximum length of a string is 80 characters in Intercooled Stata or Small Stata and 244 in Stata/SE. String literals may exceed 80/244 characters, but only the first 80/244 are significant.
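The conversions described above can be sketched as follows. The tiny dataset and all variable names (sex, score, sex_num and so on) are invented for illustration only.

```stata
* Build a tiny illustrative dataset with two string variables
clear
input str6 sex str4 score
"male"   "2.3"
"female" "4.1"
"male"   "3.0"
end

* encode: string categories -> numeric variable with value labels
encode sex, generate(sex_num)

* decode: labelled numeric variable -> string variable again
decode sex_num, generate(sex_str)

* real(): convert the character representation of a number
generate double score_num = real(score)

* destring: the same conversion as a command
destring score, generate(score_num2)

* Inspect the resulting storage types and values
describe
list
```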

FORMATS (1)
The syntax for a Stata numeric format:
  first type % to indicate the start of the format
  then optionally type - if you want the result left-aligned
  then optionally type 0 if you want to retain leading zeros (honored only with the f format)
  then type a number w stating the width of the result
  then type .
  then type a number d stating the number of digits to follow the decimal point
  then type either
    e for scientific notation; e.g., 1.00e+03
    or f for fixed format; e.g., 1000.0
    or g for general format; Stata chooses based on the number being displayed
  then optionally type c to indicate comma format (not allowed with e)

FORMATS (2)
The default formats for the numeric variable types are:
  byte     %8.0g
  int      %8.0g
  long     %12.0g
  float    %9.0g
  double   %10.0g
The syntax for a string format is:
  first type % to indicate the start of the format
  then optionally type - if you want the result left-aligned
  then type a number indicating the width of the result
  then type s
The default format for a string is %ws or %9s, whichever is wider.
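A short sketch of the format rules in use, first with display and then attached to variables. It assumes the auto example dataset shipped with Stata; the numbers and strings passed to display are arbitrary.

```stata
* Numeric formats: %[-][0]w.d followed by e, f, or g, and optionally c
* fixed format, width 9, 2 decimals
display %9.2f 1000
* the same, left-aligned
display %-9.2f 1000
* leading zeros retained (f format only)
display %09.2f 3.14159
* scientific notation
display %9.2e 1000
* general format: Stata chooses the layout
display %9.0g 1000
* comma format
display %12.2fc 1234567.891

* String formats: %[-]ws
display %-12s "left" "|"
display %12s "right" "|"

* Formats can also be attached to variables (assumes the auto dataset)
sysuse auto, clear
format price %9.2fc
format make %-18s
list make price in 1/3
```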

FILE EXTENSIONS
  Data file (Stata format): filename.dta
  Do-file: filename.do
  Dictionary file: filename.dct
  Log-file: filename.smcl (only readable in Stata)
  Log-file: filename.log (text file)
  Ado-file: filename.ado

INPUTTING DATA (1)
Check memory: memory
Set memory: set memory
If not enough memory has been assigned to Stata, you may get the message:

  no room to add more observations
      An attempt was made to increase the number of observations beyond what is currently possible. You have the following alternatives:
      1. Store your variables more efficiently; see help compress. (Think of Stata's data area as the area of a rectangle; Stata can trade off width and length.)
      2. Drop some variables or observations; see help drop.
      3. Increase the amount of memory allocated to the data area using the set memory command; see help memory.
  r(901);
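A minimal sketch of the memory workflow, written for the Stata 8 setting of these slides. The 50m figure is an arbitrary illustration, and the auto dataset is an assumption. More recent Stata releases manage memory automatically, so set memory is only relevant to older versions.

```stata
* set memory can only be changed when no data are in memory
clear

* Allocate 50 megabytes to the data area (arbitrary illustrative value)
set memory 50m

* Report current memory allocation and usage
memory

* Load some data and store the variables as compactly as possible
sysuse auto, clear
compress
```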

INPUTTING DATA (2)
1a. use filename [, clear nolabel] (or click the folder icon)
  For datasets already in Stata format (*.dta).
  If filename is specified without an extension, .dta is assumed.
  clear permits the data to be loaded even if there is a dataset already in memory and even if that dataset has changed since the data were last saved.
  nolabel prevents value labels from being loaded. It is unlikely that you will ever use it.
1b. use [varlist] [if exp] [in range] using filename [, clear nolabel]
  Only a subset of the data is loaded.
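Both forms of use can be tried as in the sketch below. It assumes the auto example dataset shipped with Stata and writes a copy under the hypothetical name auto_copy.dta in the current directory.

```stata
* Create a .dta file to work with (auto_copy is a hypothetical name)
sysuse auto, clear
save auto_copy, replace

* 1a. Load the whole dataset; .dta is assumed, clear discards data in memory
use auto_copy, clear

* 1b. Load only some variables and only the foreign cars
use make price mpg if foreign == 1 using auto_copy, clear

* 1b. Load only the first ten observations of two variables
use make price in 1/10 using auto_copy, clear
describe
```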

INPUTTING DATA (3)
2. insheet [varlist] using filename [, double [no]names [ comma | tab | delimiter("char") ] clear ]
  For files created by spreadsheet or database programs (e.g. Excel).
  For text (ASCII) files where there is one observation per line and the values are separated by tabs or commas (*.csv).
  The first line of the file may or may not contain the variable names.
  double forces Stata to store variables as doubles rather than floats.
  comma, tab, and delimiter("char") tell Stata how values are separated in the file. Specifying comma or tab only speeds insheet processing, because Stata can determine for itself whether the character is a tab or a comma. If values in the file are separated by semicolons, specify delimiter(";").
  clear specifies that it is okay for the new data to replace what is currently in memory.
  Key point: in most cases, insheet using filename is all you need.
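A small round trip illustrates the point. The sketch assumes the auto example dataset and uses outsheet (not covered in the slides) to create a comma-separated file under the hypothetical name auto_subset.csv.

```stata
* Write a comma-separated file with variable names on the first line
sysuse auto, clear
outsheet make price mpg using auto_subset.csv, comma replace

* Usually this is all you need: Stata works out the delimiter and the names
insheet using auto_subset.csv, clear
describe

* The same import with the options spelled out
insheet using auto_subset.csv, comma names clear
```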

INPUTTING DATA (4)
3a. infile varlist [_skip[(#)] [varlist [_skip[(#)] ...]]] using filename [if exp] [in range] [, automatic byvariable(#) clear]
  For data in either free or comma-separated-value format (unformatted ASCII (text) data).
  If filename is specified without an extension, .raw is assumed.
  The file must contain only the data, not the variable names.
  automatic causes creation of value labels from the nonnumeric data read.
  byvariable(#) specifies that the external file is organized by variables rather than by observations: all observations on the first variable appear, followed by all observations on the second variable, and so on.
  clear specifies that it is okay for the new data to replace what is currently in memory.
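A hedged sketch of free-format reading: outfile (not covered in the slides) writes the hypothetical file auto_subset.raw from the auto example dataset, and infile reads it back. Because the file holds no variable names, names and string storage types have to be supplied in the varlist.

```stata
* Write an unformatted ASCII file; .raw is assumed as the extension
sysuse auto, clear
outfile make price mpg using auto_subset, replace

* Read it back: str18 declares make as a string variable
infile str18 make price mpg using auto_subset, clear
list in 1/5
```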

INPUTTING DATA (5)
3b. infile using dfilename [if exp] [in range] [, automatic using(filename2) clear ]
  For ASCII (text) data in fixed format with a dictionary. A dictionary describes the contents of the file and allows reading files in fixed or free format.
  dfilename contains the dictionary. If dfilename is specified without an extension, .dct is assumed.
  using(filename2) specifies the name of the file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename or, if the dictionary specifies the name of some other file, that file is assumed to contain the data. If using(filename2) is specified, filename2 is used to obtain the data even if the dictionary itself says otherwise.
  automatic causes creation of value labels from the nonnumeric data read.
  clear specifies that it is okay for the new data to replace what is currently in memory.
  E.g.:
    dictionary using D:\DATA\LFS\RAW\OTT92.txt {
        _column(1)  year      %2f
        _column(3)  quarter   %1f
        _column(4)  region    %2f
        _column(31) sex       %1f
        _column(32) age       %2f
        _column(45) education %1f
        _column(59) workcond  %2f
        _column(61) workweek  %1f
        _column(62) workday   %1f
        _column(63) workhour  %2f
        _column(65) usualday  %1f
    }

INPUTTING DATA (6)
4a. infix using dfilename [if exp] [in range] [, using(filename2) clear ]
4b. infix specifications using filename [if exp] [in range] [, clear]
  For data in fixed-column format.
  In the first syntax, dfilename contains the dictionary. If dfilename is specified without an extension, .dct is assumed.
  using(filename2) specifies the name of the file containing the data. If using() is not specified, the data are assumed to follow the dictionary in dfilename or, if the dictionary specifies the name of some other file, that file is assumed to contain the data. If using(filename2) is specified, filename2 is used to obtain the data even if the dictionary itself says otherwise.
  clear specifies that it is okay for the new data to replace what is currently in memory.
  E.g.:
    infix year 1-2 quarter 3 region 4-5 sex 31 age 32-33 education 45 workcond 59-60 workweek 61 workday 62 workhour 63-64 usualday 65 using D:\DATA\LFS\RAW\OTT92.txt
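To try fixed-column reading without the original data, the sketch below writes a three-line text file with the file command (not covered in the slides) and reads it with the second infix syntax. The file name lfs_demo.txt, the handle name demo, the column layout, and the values are all invented for illustration.

```stata
* Write a small fixed-column file:
* columns 1-2 year, 3 quarter, 4-5 region, 6 sex, 7-8 age
file open demo using lfs_demo.txt, write text replace
file write demo "92101125" _n
file write demo "92203241" _n
file write demo "92312133" _n
file close demo

* Read it back by column position
infix year 1-2 quarter 3 region 4-5 sex 6 age 7-8 using lfs_demo.txt, clear
list
```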

INPUTTING DATA (7)
5. Stat/Transfer: http://www.stattransfer.com/
  Automatically converts data from other formats to .dta format.
6. edit [varlist] [if exp] [in range] [, nolabel]
  edit brings up a spreadsheet-style data editor for entering new data and editing existing data.
7. input [varlist] [, automatic label ]
  input allows you to type data directly into the dataset in memory.
8. odbc load [options]
  odbc allows Stata to load data from ODBC sources. Type help odbc for more on this.

DO-FILES
Instead of using Stata interactively, you can use do-files. Highly recommended.
A do-file is a standard ASCII text file that contains commands. The filename must include the extension .do.
Stata users can use any text editor to create do-files, or they can use the built-in do-file editor.
You can include comments using the indicators *, /* */, and //. The indicator /// also starts a comment and, in addition, continues the command on the next line.
You can change the end-of-line delimiter for long lines, e.g.: #delimit ;
Once you change the line delimiter to a semicolon, all lines, even short ones, must end in semicolons.
A do-file is executed by Stata:
  when you type do filename in the command window;
  when you click the "do current file" button in the do-file editor.
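The contents of a small do-file might look like the sketch below. The file name example.do is hypothetical, the auto dataset is an assumption, and the version statement is not covered in the slides. It would be run by typing do example in the command window.

```stata
* example.do: a minimal illustrative do-file
version 8
set more off

sysuse auto, clear

/* A block comment
   can span several lines */
summarize price mpg   // a comment at the end of a line

* /// continues a long command on the next line
regress price mpg weight ///
    foreign

* Switch to ; as the end-of-line delimiter
#delimit ;
tabulate foreign rep78,
    row ;
summarize price ;

* Switch back to the carriage return as delimiter
#delimit cr
describe
```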

ADO-FILES
An ado-file is an ASCII text file that contains a Stata program.
When you type a command that Stata does not know (i.e. it is not a built-in command), it looks in certain places for an ado-file of that name. If Stata finds it, Stata loads and executes it, so it appears to you as if the ado-command is just another command built into Stata.
Use the which command to determine whether a command is built in or implemented as an ado-file.
Stata looks for ado-files in seven directories. Use the command sysdir to find out where they are on your computer.
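A quick way to explore this on your own installation is sketched below. The output depends on the Stata version and on where it is installed, and adopath is an extra command not covered in the slides.

```stata
* Is the command built in, or implemented as an ado-file?
which regress
which summarize

* Show the system directories Stata uses
sysdir

* Show the order in which Stata searches for ado-files
adopath
```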

LOG FILES
  log (or click the log icon)
  log using filename [, append replace [ text | smcl ] ]
  log { on | off | close }
  cmdlog
  cmdlog using filename [, append replace ]
  cmdlog { on | off | close }
log allows you to make a full record of your Stata session. A log is a file containing what you type and Stata's output. It is useful to include the commands that start and stop the logging in the do-file itself.
cmdlog allows you to make a record of what you type during your Stata session. A command log contains only what you type and so is a subset of a full log. Command logs are always straight ASCII text files, which makes them easy to convert into do-files.
Full logs are recorded in one of two formats: SMCL (Stata Markup and Control Language) or text (meaning ASCII). The default is SMCL, but set logtype can change that, or you can specify an option [ text | smcl ] to state the format you wish.
log or cmdlog, typed without arguments, reports the status of logging.
log using and cmdlog using open a log file. log close and cmdlog close close the file. In between, log off and cmdlog off suspend logging temporarily, and log on and cmdlog on resume it.
append specifies that results are to be appended to the end of an already existing file. If the file does not already exist, a new file is created.
replace specifies that filename, if it already exists, is to be overwritten, and so is an alternative to append.
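Logging from inside a do-file might be sketched as follows. The log name session is hypothetical, the auto dataset is an assumption, and the capture prefix (not covered in the slides) is used so that the first line does not stop the do-file when no log is open.

```stata
* Close any log left open by an earlier run
capture log close

* Start a plain-text log, overwriting an existing session.log
log using session, replace text

sysuse auto, clear
summarize price mpg

* Temporarily suspend logging: the describe output is not recorded
log off
describe
log on

* Report the status of logging, then close the file
log
log close
```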

CONTROLLING OUTPUT
--more-- may appear in your results window when you are trying to output a long listing.
  To see the next line: press Enter.
  To see the next screen: press any other key, or click on the --more-- at the bottom of the results window, or click the "go" icon.
set more off / set more on: switch the --more-- pause off or on. Very useful in do-files.
break: to interrupt a Stata command at any time, use the "break" button, or type q in the command window.

NEXT TIME: LAB #1
Examining the data:
  describe
  list
  codebook
  summarize
  inspect
  tabulate
  ...and more...
Organising datasets:
  rename
  drop
  keep
  generate
  replace
  egen
  sort
  append
  merge
  ...and more...