Stata Statistical Data Analysis Application Software

The information in this presentation has been gathered from many sources. None of the ideas in this lecture are my original work, although I may have organized them in an original way.

Outline
Introduction.
Menus vs. commands.
Window layout.
Inputting data.
Data management: variable labels, variable value labels, deriving new variables, sorting rows by columns, deleting observations/variables.
File management: do-files, log files.
Stata help.

Introduction: Why use Stata?
According to www.stata.com, Stata is a complete, integrated statistical package that provides everything you need for data analysis, data management, and graphics.
Fast, accurate, and easy to use: Stata offers a point-and-click interface, an intuitive command syntax, and online help. All analyses can be reproduced and documented for publication and review.
Broad suite of statistical capabilities: hundreds of statistical tools at your fingertips, ranging from advanced techniques such as survival models with frailty, dynamic panel data (DPD) regressions, generalized estimating equations (GEE), models with sample selection, ARCH, and estimation with complex survey samples; to linear and generalized linear models (GLM), regressions with count or binary outcomes, ANOVA/MANOVA, ARIMA, cluster analysis, standardization of rates, and case–control analysis; to basic tabulations and summary statistics.

Introduction: Why use Stata? (continued)
Complete data-management facilities: you can combine and reshape datasets, manage variables, and collect statistics across groups or replicates. You can work with byte, integer, long, float, double, and string variables. Stata also has advanced tools for managing specialized data such as survival/duration data, time-series data, panel/longitudinal data, categorical data, and survey data.
Publication-quality graphics: you can choose among existing graph styles or create your own.
Responsive and extensible: Stata is so programmable that developers and users add new features every day to respond to the growing demands of today's researchers.
Matrix programming (Mata): Mata is both an interactive environment for manipulating matrices and a full development environment that can produce compiled and optimized code.

Introduction: Why use Stata? (continued)
Cross-platform compatible: Stata is available for Windows, Macintosh, and Unix computers (including Linux). Stata datasets, programs, and other data can be shared across platforms without translation.
Complete documentation and other publications: Stata comes with a base manual and online documentation, and StataCorp publishes a quarterly journal and a quarterly newsletter.
Technical support and learning resources: free to registered users of Stata. Stata provides online training through NetCourses, and you can also take short courses sponsored by Stata or third parties in various locations.
Widely used: Stata is distributed in more than 150 countries and is used by professionals in many fields.
Affordable: Stata offers several purchase options to fit your budget.

Menus vs. Commands
Stata has a set of pull-down menus of commands, which let the user get results without needing to know the syntax.
Alternatively, typed command syntax lets the user reproduce results easily, which is convenient if your datasets are updated repeatedly.

Window Layout
Stata has 5 windows:
Command: where commands are entered. All commands and variable names are case sensitive.
Results: where results appear.
Review: where past commands are listed. Clicking a past command in the Review window brings it to the Command window, where it can be modified and re-executed.
Graph: where graphs are displayed (appears only when graphs are requested).
Variables: where the variables in the current dataset are listed.

Inputting Data
Many options:
Manually enter data into the Stata Data Editor.
Copy data into the Data Editor from another source (e.g., Excel).
Import an ASCII (text) file.
Read in an Excel spreadsheet saved as a tab- or comma-delimited text file.
Open an existing Stata data file (common file extension: .dta).
Use a conversion package (e.g., StatTransfer or DBMSCopy) to read in data from another package (e.g., a SAS data file).

Manually Input Data
Open the Data Editor by clicking the Data Editor icon (4th from right on the toolbar; it looks like a data file), or via command: edit
You can enter numbers or text (text appears in red).
Variables are automatically named var1, var2, … To rename one, double-click the top of its column to view/edit "Variable Properties" and change the name, or via command: rename oldvarname newvarname (e.g., rename var1 id).
Display formats: "g" is the general numeric format; "s" is the character string format.
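The Data Editor is interactive, so it cannot be replayed from a script. A minimal sketch of the scripted equivalent uses the input command to type rows in directly, then renames the default var1, var2, … names as described above. The values and the name column are made up for illustration.

```stata
* Scripted alternative to typing into the Data Editor (values are made up)
clear
input var1 var2 str8 var3
1 54 "Ann"
2 61 "Bob"
3 47 "Cy"
end

* Replace the automatic names, as on the slide
rename var1 id
rename var2 age
rename var3 name

* describe shows each variable's storage type and display format
* (a %g-type numeric format for id and age, a %s-type string format for name)
describe
```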

Copy Spreadsheet Data
To copy data into the Data Editor from an MS Excel spreadsheet:
Open the spreadsheet with the data.
Highlight and copy the cells of interest.
Paste into the Data Editor (via the Edit menu, right-click, toolbar icon, or keyboard shortcut) in the 1st cell (row and column) where you want the data to begin.
To save the data file: via drop-down menu, File → Save As …; via command, save pathname/datafilename.dta
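The save step can be sketched as follows. Pasted cells cannot be scripted, so a small made-up dataset stands in for them; the path in the comment is hypothetical, and tempfile is used only so the example runs on any machine.

```stata
* Stand-in for cells pasted from Excel (values are made up)
clear
input id age
1 54
2 61
end

* On your own machine you would name a real file, e.g. (hypothetical path):
*   save "C:/mydata/sample.dta", replace
* Here -tempfile- supplies a scratch filename so the example runs anywhere.
tempfile pasted
save "`pasted'", replace

* Reload to confirm the save worked
use "`pasted'", clear
describe
```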

Import ASCII File
Via drop-down menu: File → Import → Unformatted ASCII data → (add variable names)
Note: after importing data by clicking through the menus, the equivalent commands appear in the Review window.
Via command: infile id age using "C:\Documents and Settings\kaltenla\My Documents\Desktop\SampleData.dat"
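A self-contained sketch of infile: it first writes a tiny whitespace-delimited text file with Stata's file command (standing in for SampleData.dat, whose contents we do not know), then reads it back. infile assigns the listed names to the columns in the order they appear.

```stata
* Write a small whitespace-delimited text file (made-up contents)
tempfile rawtxt
file open fh using "`rawtxt'", write text replace
file write fh "1 54" _n
file write fh "2 61" _n
file write fh "3 47" _n
file close fh

* Read it back: one variable name per column, in file order
clear
infile id age using "`rawtxt'"
list
```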

Import a Spreadsheet
Via drop-down menu: File → Import → ASCII data created by a spreadsheet. Browse to find your file and click on the type of delimiter (e.g., tab, comma).
Via command:
Comma-delimited file: insheet using "C:\Documents and Settings\kaltenla\Desktop\SampleData.csv", comma
Tab-delimited file: insheet using "C:\Documents and Settings\kaltenla\Desktop\SampleData.txt", tab
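A minimal sketch of the comma-delimited case: it writes a small CSV with a header row (made-up values), then reads it with insheet as on the slide. Since Stata 13, insheet has been superseded by import delimited; that form is shown as a comment.

```stata
* Write a small comma-separated file with a header row (made-up data)
tempfile rawcsv
file open fh using "`rawcsv'", write text replace
file write fh "id,age,chol" _n
file write fh "1,54,261" _n
file write fh "2,61,302" _n
file close fh

* Slide syntax: -insheet- with the -comma- option; -clear- empties memory first
insheet using "`rawcsv'", comma clear

* Stata 13 and later equivalent:
*   import delimited using "`rawcsv'", delimiters(",") clear
list
```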

Opening an Existing Stata Datafile
Via drop-down menu: File → Open → scroll to find the data.
Via command: use "C:\Documents and Settings\kaltenla\My Documents\Work\pbc.dta", clear
E.g., the Primary Biliary Cirrhosis data set (available from the Dataset Archive on the StatLib website, http://www.stat.cmu.edu/).
The clear option empties memory before loading the requested data file. Without it, use refuses to replace data in memory that have unsaved changes. This matters because Stata holds only one dataset in memory at a time (Stata 16 later added frames for holding several).
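A sketch of why clear matters, using a small made-up dataset saved to a scratch file rather than pbc.dta. After the data in memory are changed, use without clear fails (capture records the nonzero return code instead of stopping), while use with clear succeeds.

```stata
* Save a small made-up dataset, then change it in memory
clear
input id chol
1 261
2 302
end
tempfile pbc_demo
save "`pbc_demo'"
replace chol = . in 1

* Without clear, -use- refuses to discard the changed data in memory
capture use "`pbc_demo'"
local rc_noclear = _rc
display "return code without clear: `rc_noclear'"

* With clear, memory is emptied first and the saved file is loaded
use "`pbc_demo'", clear
list
```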

Listing Data
The describe command lists the variables, labels, formats, storage types, number of observations, and the date the file was saved.
describe
The list command lists rows and columns of the data file.
list id chol album bili in 1/6
This shows only the variables id, chol, album, and bili, and only for the first 6 observations.
Suppose we are interested in the histologic stage of disease (stage) and treatment (drug) for males. list varlist if condition lists the variables in varlist, restricted to the observations satisfying condition:
list stage drug if sex==0
E.g., for males with cholesterol greater than or equal to 370:
list stage drug if sex==0 & chol >= 370
The syntax for condition is:
<   less than
<=  less than or equal to
==  equal to
>=  greater than or equal to
>   greater than
~=  not equal to (!= also works)
&   and
|   or
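The commands above can be tried on a few rows that reuse the PBC variable names. These rows are made up for illustration and are not the real PBC data. count is added at the end to show that the same if condition works outside list.

```stata
* Made-up rows with the PBC variable names used on the slide (sex: 0 = male)
clear
input id sex drug stage chol album bili
1 0 1 4 261 2.60 14.5
2 1 1 3 302 4.14  1.1
3 0 2 4 176 3.48  1.4
4 1 2 4 244 2.54  1.8
5 0 1 3 370 3.53  3.4
6 1 2 3 448 3.98  0.8
7 0 2 2 420 3.60  1.0
end

describe
list id chol album bili in 1/6
list stage drug if sex==0
list stage drug if sex==0 & chol >= 370

* The same condition used with -count-: here ids 5 and 7 qualify
count if sex==0 & chol >= 370
```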

Data Management: Variable Labels
For example, you want to label the bili column as "Bilirubin mg/dl".
Via drop-down menu: Data → Labels → Label variable
Via command: label variable creates a variable label.
label variable bili "Bilirubin mg/dl"
To remove the label, use the command without the label text:
label variable bili
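A small sketch of attaching, reading back, and removing a variable label on made-up data. The local macro lbl is a name chosen for this example, and the `: variable label` extended macro function is one way to read the label from a script.

```stata
* Made-up bilirubin values
clear
input id bili
1 14.5
2 1.1
end

label variable bili "Bilirubin mg/dl"
describe bili

* Read the label back into a local macro (lbl is an illustrative name)
local lbl : variable label bili
display "`lbl'"

* Remove it again by omitting the label text
label variable bili
```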

Data Management: Variable Value Labels
Suppose the variable sex = 0 for males and 1 for females. Whenever we list sex, we see the levels 0 and 1. We can create labels for these data values so that the output displays "male" for 0 and "female" for 1. This is very convenient when dealing with unfamiliar variables, large data sets, or variables with many levels.
It is a two-step process: create the labels, then assign them to a variable's values.

Data Management: Variable Value Labels (continued)
Create and assign variable value labels:
Via command: label define creates labels for data values.
label define sexlab 0 "male" 1 "female"
Via command: label values assigns a label to the values of a variable.
label values sex sexlab
To remove variable value labels: label values sex (detaches the labels from sex) or label drop sexlab (deletes the label definition).
Via drop-down menu: Data → Labels → Label values → Define or modify value labels
Via drop-down menu: Data → Labels → Label values → Assign value labels to variable
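The two steps can be run end to end on three made-up observations. list and tabulate then show "male"/"female" instead of 0/1. The extended macro functions at the end read back which label set is attached and what text value 0 maps to.

```stata
* Made-up data: sex = 0 for males, 1 for females
clear
input id sex
1 0
2 1
3 1
end

* Step 1: define the label set; step 2: attach it to sex
label define sexlab 0 "male" 1 "female"
label values sex sexlab
list
tabulate sex

* Read back the attached label set and the text for value 0
local vl : value label sex
local lab0 : label sexlab 0
display "`vl': 0 = `lab0'"

* To undo:  label values sex   (detach)   or   label drop sexlab   (delete)
```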

Data Management: Sorting Data
Suppose we want to sort the data by age and serum cholesterol (mg/dl). The sort command sorts the rows of a data set by one or more variables (columns).
sort age chol
Sorting is handy for listing and summarizing data, e.g.:
sort edema
by edema: summarize
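A sketch on made-up rows. The by prefix requires the data to be sorted on the by-variable first. The bysort form shown last sorts and groups in a single step. summarize is limited to chol here only to keep the output short.

```stata
* Made-up rows with age, cholesterol, and edema (0, 0.5, 1)
clear
input id age chol edema
1 58 261 1
2 56 302 0
3 70 176 0.5
4 54 244 0.5
5 38 279 0
end

* Sort rows by age, breaking ties by chol
sort age chol
list id age chol

* -by- needs the data sorted on edema first
sort edema
by edema: summarize chol

* -bysort- sorts and groups in one step
bysort edema: summarize chol
```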

Data Management: Deriving New Variables
Suppose we want to create a new variable, anyedema, for the presence of any edema. In our data set the variable edema = 0 for no edema and no diuretic therapy, 0.5 for edema present without diuretics or edema resolved by diuretics, and 1 for edema despite diuretic therapy. We want to collapse this to 0 if edema = 0 and 1 if edema is 0.5 or 1.
Use the generate and replace commands together:
Give the new variable an initial value: generate anyedema=0
Replace the initial value where needed: replace anyedema=1 if edema==.5 | edema==1
Or: replace anyedema=1 if edema>0 & !missing(edema)
(Stata treats missing values as larger than any number, so edema>0 on its own would also set anyedema to 1 for observations with missing edema.)
It is a good idea to check that the new variable was coded correctly by cross-tabulating anyedema by edema. tabulate generates a frequency distribution table:
tabulate anyedema edema
Via drop-down menu: Statistics → Summaries, tables, & tests → Tables → Two-way tables with measures of association.
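A sketch of the generate/replace recode on made-up data that deliberately includes one missing edema value. Keeping anyedema missing when edema is missing is an assumption about what you want: without that line, generate anyedema=0 would silently code it 0. The one-line alternative relies on a true/false expression evaluating to 1/0.

```stata
* Made-up edema values, including one missing (.)
clear
input id edema
1 0
2 0.5
3 1
4 0
5 .
end

* Start at 0, then set 1 for any edema (excluding missing)
generate anyedema = 0
replace anyedema = 1 if edema > 0 & !missing(edema)

* Assumption: keep missing edema missing rather than coding it 0
replace anyedema = . if missing(edema)

* One-line alternative: (edema > 0) is 1 if true, 0 if false
generate anyedema2 = (edema > 0) if !missing(edema)

* Check the recode; -missing- also shows the missing row
tabulate anyedema edema, missing
```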

Data Management: Deleting Variables/Observations
The drop command deletes specified variables:
drop bili chol drug
You can also drop a subset of observations by adding a conditional expression:
drop if sex==0
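Both uses of drop are shown below on three made-up rows: first removing columns, then removing the rows where sex==0 (the males). keep, the complement of drop, is mentioned in a comment.

```stata
* Made-up rows (sex: 0 = male, 1 = female)
clear
input id sex chol bili drug age
1 0 261 14.5 1 58
2 1 302  1.1 1 56
3 0 176  1.4 2 70
end

* Drop variables (columns)
drop bili chol drug

* Drop observations (rows) meeting a condition: here, the males
drop if sex==0

* -keep- is the complement: it retains only what you name, e.g.  keep id age
describe
list
```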

File Management: Using a Do-file
A do-file is a text (also called batch) file containing a series of commands that Stata executes in order. It is also a convenient place to compose, revise, and save Stata commands.
To create a do-file: open the Do-File Editor, enter commands, and save the file with a .do extension.
To execute a do-file: via command, do pathoffile/filename.do; via drop-down menu, File → Do …
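A minimal do-file might look like the sketch below. The filename analysis.do and the version number are hypothetical. The version line pins the command syntax to the release the file was written for, so set it to your own release. Save the lines with a .do extension and run them with do.

```stata
* analysis.do: a hypothetical minimal do-file; run it with  do analysis.do
version 14        // hypothetical: use the Stata release you wrote this for
clear all
set more off

* Made-up data entered inline so the file runs on its own
input id age
1 54
2 61
end

summarize age
```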

File Management: Log Files
A log file records (and lets you print) the executed commands and the resulting output (except for graphs). It is a good idea to open a log file as the first thing you do in a Stata session.
There are two types of log files:
Unformatted log files: plain text without formatting, simpler to use if you plan to insert and edit the output in a text editor. Common file extension: .log.
Formatted log files: "Stata Markup and Control Language" (SMCL) files, well suited to viewing and printing within Stata. Common file extension: .smcl.
To open a log file: via drop-down menu, File → Log → Begin…; via toolbar, click the 4th icon from the left (it looks like a scroll).
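The command-line way to open and close a log is log using and log close. In this sketch the filename session1.log is hypothetical. The text option requests an unformatted log, and replace overwrites an earlier log of the same name. The formatted (SMCL) variant is shown as a comment.

```stata
* Open a plain-text log (filename is hypothetical)
log using "session1.log", text replace

* Anything run now is recorded: commands and their output
clear
input id age
1 54
2 61
end
summarize age

* Close the log so the file is complete on disk
log close

* Formatted (SMCL) log instead:  log using "session1.smcl", replace
```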

Stata Online
If you are connected to the Internet, Stata can access the Stata website (www.stata.com) while it runs. New Stata programs can be downloaded from the website onto your computer.

Stata Help
Use help commandname to open a window with documentation for that command (e.g., help reg).
Via drop-down menu:
Help → Contents for a list of Stata commands in table-of-contents format.
Help → Search… for a keyword search.
Help → Stata Command… to search for specific Stata commands.
If you are connected to the Internet while running Stata, a search covers both the installed Stata software and the Stata website.

Stata Resources
Statistical Modeling for Biomedical Researchers, by W. Dupont.
An Introduction to Stata for Health Researchers, by Svend Juul.
The Stata News: a quarterly publication containing announcements of new releases and updates, NetCourse schedules, new books, Users Group meetings, new products, and other announcements of interest to Stata users.
Stata Press: publishes books about using Stata and about statistics topics for professional researchers of all disciplines.
The Stata Journal: a quarterly publication containing articles about statistics, data analysis, teaching methods, and effective use of Stata's language. http://www.stata-journal.com