1 Statistical Software Programming. STAT 6360 –Statistical Software Programming Data Input in SAS Many ways to get your data into SAS: –Through data entry.

Slides:



Advertisements
Similar presentations
The SAS ® System Additional Information on Statistical Analysis Programming.
Advertisements

The INFILE Statement Reading files into SAS from an outside source: A Very Useful Tool!
Chapter 3: Editing and Debugging SAS Programs. Some useful tips of using Program Editor Add line number: In the Command Box, type num, enter. Save SAS.
Understanding Microsoft Excel
Introduction to SAS Programming Christina L. Ughrin Statistical Software Consulting Some notes pulled from SAS Programming I: Essentials Training.
Statistics in Science  Introducing SAS ® software Acknowlegements to David Williams Caroline Brophy.
Access - Project 1 l What Is a Database? –A Collection of Data –Organized in a manner to allow: »Access »Retrieval »Use of That Data.
XP New Perspectives on Microsoft Office Excel 2003, Second Edition- Tutorial 3 1 Microsoft Office Excel 2003 Tutorial 3 – Developing a Professional- Looking.
SUNY Morrisville-Norwich Campus- Week 7 CITA 130 Advanced Computer Applications II Spring 2005 Prof. Tom Smith.
XP New Perspectives on Microsoft Office Excel 2003, Second Edition- Tutorial 11 1 Microsoft Office Excel 2003 Tutorial 11 – Importing Data Into Excel.
Entering Data in Excel. Entering numbers, text, a date, or a time n 1Click the cell where you want to enter data. n 2Type the data and press ENTER or.
Pasewark & Pasewark 1 Access Lesson 6 Integrating Access Microsoft Office 2007: Introductory.
Microsoft Office Word 2013 Expert Microsoft Office Word 2013 Expert Courseware # 3251 Lesson 4: Working with Forms.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS Essentials - Elliott & Woodward1.
Into to SAS ®. 2 List the components of a SAS program. Open an existing SAS program and run it. Objectives.
Creating SAS® Data Sets
COMPREHENSIVE Excel Tutorial 8 Developing an Excel Application.
Technology Basics Creating Worksheet Formulas. 2 Understand Formulas Equations used to calculate values in cells are called formulas. Formulas consist.
Managing Business Data Lecture 8. Summary of Previous Lecture File Systems  Purpose and Limitations Database systems  Definition, advantages over file.
11 Chapter 2: Working with Data in a Project 2.1 Introduction to Tabular Data 2.2 Accessing Local Data 2.3 Importing Text Files 2.4 Editing Tables in the.
Chapter 2: Working with Data in a Project
Introduction to Access By Mary Ann Chaney and Alicia Harkleroad.
SAS Workshop Lecture 1 Lecturer: Annie N. Simpson, MSc.
European Computer Driving Licence Syllabus version 5.0 Module 4 – Spreadsheets Chapter 22 – Functions Pass ECDL5 for Office 2007 Module 4 Spreadsheets.
Introduction to SPSS Edward A. Greenberg, PhD
Lesson No:9 MS-Word Tools, Mail Merge and working with Tables CHBT-01 Basic Micro process & Computer Operation.
4/22/2017 5:36 PM EViews Training Creating Workfiles.
Creating a Web Site to Gather Data and Conduct Research.
Microsoft Office Excel 2003 Tutorial 3 – Developing a Professional-Looking Worksheet.
1 Performing Spreadsheet What-If Analysis Applications of Spreadsheets.
Selecting, Formatting, and Printing a finished Report…….
McGraw-Hill/Irwin The Interactive Computing Series © 2002 The McGraw-Hill Companies, Inc. All rights reserved. Microsoft Excel 2002 Lesson 1 Introduction.
EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11 1.
Chapter 1: Introduction to SAS  SAS programs: A sequence of statements in a particular order  Rules for SAS statements: –Every SAS statement ends in.
BMTRY 789 Lecture 2 SAS Syntax, entering raw data, etc. Lecturer: Annie N. Simpson, MSc. Readings – Chapters 1, 2, 12, & 13 Lab Problems 1.1, 1.2, 1.3,
I OWA S TATE U NIVERSITY Department of Animal Science Getting Your Data Into SAS (Chapter 2 in the Little SAS Book) Animal Science 500 Lecture No. 3 September.
Lesson 2 Topic - Reading in data Chapter 2 (Little SAS Book)
Copyright 2002, Paradigm Publishing Inc. CHAPTER 1 BACKNEXTEND 1-1 LINKS TO OBJECTIVES Worksheet Elements Worksheet Elements Worksheet Area Elements Worksheet.
XP New Perspectives on Microsoft Office Excel 2003, Second Edition- Tutorial 1 1 Microsoft Office Excel 2003 Using Excel To Manage Data.
1 Chapter 2: Working with Data in a Project 2.1 Introduction to Tabular Data 2.2 Accessing Local Data 2.3 Accessing Remote Data 2.4 Importing Text Files.
WHAT IS A DATABASE? A DATABASE IS A COLLECTION OF DATA RELATED TO A PARTICULAR TOPIC OR PURPOSE OR TO PUT IT SIMPLY A GENERAL PURPOSE CONTAINER FOR STORING.
A Simple Guide to Using SPSS ( Statistical Package for the Social Sciences) for Windows.
LOGO Chapter II Entering Excel Formulas and Formatting Data Friday, November 20, 2015.
Lesson 1 – Microsoft Excel * The goal of this lesson is for students to successfully explore and describe the Excel window and to create a new worksheet.
Here’s another problem (see section 2.13 on page 54). A file contains two different types of records (say A’s and B’s) and we only want to read in the.
Computing with SAS Software A SAS program consists of SAS statements. 1. The DATA step consists of SAS statements that define your data and create a SAS.
Microsoft Office 2013 ®® Calculating Data with Formulas and Functions.
T U T O R I A L  2009 Pearson Education, Inc. All rights reserved Student Grades Application Introducing Two-Dimensional Arrays and RadioButton.
Chapter 2 Getting Data into SAS Directly enter data into SAS data sets –use the ViewTable window. You can define columns (variables) with the Column Attributes.
Microsoft ® Excel ® 2013 Enhanced Excel Tutorial 3 Calculating Data with Formulas and Functions.
Microsoft Access Prepared by the Academic Faculty Members of IT.
Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 3 & 4 By Tasha Chapman, Oregon Health Authority.
Understanding Microsoft Excel
Excel Tutorial 8 Developing an Excel Application
Understanding Microsoft Excel
Lesson 2 Topic - Reading raw data into SAS
Microsoft Excel A Spreadsheet Program.
Introduction to SAS®.
Instructor: Raul Cruz-Cano 7/9/2012
Chapter 2: Getting Data into SAS
Microsoft Excel 2003 Illustrated Complete
Understanding Microsoft Excel
Chapter 1: Introduction to SAS
Access Tutorial 8 Sharing, Integrating, and Analyzing Data
Microsoft Excel 2007 – Level 2
Microsoft Excel 2007 – Level 1
Understanding Microsoft Excel
Day 1: Getting Started with Microsoft Excel 2010
Presentation transcript:

1 Statistical Software Programming

STAT 6360 –Statistical Software Programming Data Input in SAS Many ways to get your data into SAS: –Through data entry via viewtable and other tools –As part of your data step via datalines. –Reading from raw data files (ascii or plain text files, delimited files) –Converting the data files of other software into SAS data files. –Reading other software’s data files directly. We will cover all of these, but concentrate on how to read raw data using the input and infile statements. 2

STAT 6360 –Statistical Software Programming Using Viewtable to Enter Data An easy way to enter small datasets is via viewtable. Click Tools → Table Editor and a blank spreadsheet will pop up in viewtable. Now you can enter data. Columns are variables, rows are observations. –Right-click column names to create variable names and set variable properties. –Then save as a SAS dataset in any available library and use as usual. 3

STAT 6360 –Statistical Software Programming Example – My Stats Grades 4 1.Open viewtable. Create two character variables in columns A and B: class and grades. Use default properties. 2.Enter the statistics classes you have taken (e.g., STAT2000, STAT4210) and the grades you received (A, B+, etc.) 3.Save as dataset grades in WORK library. 4.Close viewtable and save changes.

STAT 6360 –Statistical Software Programming Example – My Stats Grades 5 5.Run the following code: Your grades data should print: proc print data=grades; run; Entering data in viewtable can be convenient for small tasks but is impractical for big datasets. Even for small jobs, better to have a source file for your data or have it permanently recorded in your code (e.g., in datalines).

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard SAS has a proc called PROC IMPORT that will import various file types and convert them to SAS datasets. –Excel files, Access files, comma separated values (.csv files), tab and other delimited ascii files, and files in SPSS, Stata, JMP and other formats. The import wizard is a convenient front-end to PROC IMPORT that can be used to generate PROC IMPORT code and/or to run PROC IMPORT in the background if we don’t care to see that code. –We will illustrate it to import some data on nutritional information for different types of cereal. The cereal types were sampled using stratified random sampling from four shelves at Dillon’s grocery store in Manhattan, KS, in January, Shelf is the stratum. The data are in cereal_data.xls, an Excel file, which should be placed in your “Data Files” subfolder on your USB drive. 6

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard Select File → Import Data…. This will give a dialog form which you can select the type of data source you want to import. Excel is default and we’ll use that, so click Next. 7 Now browse and select cereal_data.xls from your Data Files folder and press ok. Note that there is currently a bug in SAS for Windows X64 that generates an error if you’re trying to import from the 32 bit version of MS Office. If you encounter this, go back and change the data source to Microsoft Excel Workbook on PC Files Server, click Next, Browse to the file, and click OK (leave the info in the PC Files Server box unchanged).

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard Now you can select the worksheet within the Excel workbook if it has data in multiple worksheets (use Sheet1$ in our example) and, optionally, choose from some options that fine-tune the import (don’t change these options for this example). Click Next. Now you can select a library and name your dataset. Use the WORK library and name the dataset cereal. The wizard will generate PROC IMPORT code to import the Excel file. It now asks if you want that code written to a file so you have it for future use. Let’s do that. –Browse to the SAScode subfolder on your USB, set the filename to import_cereal.sas, and click Save (in the browse dialog). Now click Finish. This will import the data, which you can now view with viewtable or by using proc print. 8

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard Take a look at the PROC IMPORT code that was generated. –Open import_cereal.sas in the editor. Here’s the code: 9 PROC IMPORT OUT= WORK.cereal DATAFILE= "MyPath\Data Files\cereal_data.xls" DBMS= EXCEL REPLACE; RANGE="Sheet1$"; GETNAMES=YES; MIXED=NO; SCANTEXT=YES; USEDATE=YES; SCANTIME=YES; RUN; Dataset to be created Source file to import Type of source file* Variable names taken from first row by default. To change, use statement GETNAMES=NO; * See SAS/ACCESS(R) 9.3 Interface to PC Files: Reference ( ) for list of DBMS choices. Overwrite output dataset if it exists. These are file-format-specific statements. In this case, EXCEL-specific statements. See for documentation.

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard Always check the log file after trying to import a file. –Should look like this in our example: Always close the Excel file before trying to import it!!! Leaving it open may result in failure to import the file or (worse) undetected strange behavior when importing the file. 10

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard PROC IMPORT can also import.csv files. cereal_data.csv is a comma-separated version of the cereal dataset. –MS Office gives such files an Excel icon, but they are plain text files! Here is cereal_data.csv in Notepad: 11

STAT 6360 –Statistical Software Programming Using PROC IMPORT & Import Wizard To import cereal_data.csv –choose the.csv file type –browse to the file’s location –don’t change any options (for now) –save the dataset as cereal2 (in WORK) –and save the PROC IMPORT code as import_cereal2.sas. This seemed to work, but look at observation 25 (in viewtable). The cereal name is incomplete. By default, the length of character variables is determined by scanning the 1 st 20 records in the file. Add guessingrows=41; statement to the PROC IMPORT code and retry. It works! 12

STAT 6360 –Statistical Software Programming Reading Data from Plain Text Files Although some plain text files can be read with PROC IMPORT (delimited files), they are more flexibly input using the INFILE and INPUT statements in a data step. Types of input – list, column, formatted, delimited, and named. We’ll cover the first 4 of these. List Input: –We used this when using datalines (pets.sas) and in pets2.sas. –Data must be separated by spaces or delimiters. –You must read all the data in each record (line of data). –All missing values must be indicated with periods. –No spaces allowed in values of character variables. –Need length statement for character variables with length>8. –Can’t handle special data types like dates, times. 13

STAT 6360 –Statistical Software Programming Example: InputExamps.sas This file illustrates many different input modes. –Example 1: we use list input to read in data on tar, nicotine, CO content for 25 cigarette brands. –Similar to pets2.sas but one complication: brand is too long (>8 characters). SAS handles this by truncating brand to length 8. –This can be fixed with an informat or by adding a length statement to tell SAS brand has up to 16 characters. Must appear before input. –Note the use of (obs=5) in PROC PRINT. This is an example of a SAS data set option. Such options appear in parentheses next to the name of a dataset to be used in a PROC. They affect the dataset as it is brought into the PROC for manipulation/analysis. In this case, we are telling PROC PRINT to use dataset cigs, but only use the first 5 observations. 14

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 2: Consumer Reports Data on 1993 Cars –26 variables, so each observation spans two lines in data file. –Two ways to handle this with input statement: 1.Insert slash (/) to tell SAS when to go to the next line. 2.Precede variables on line 1 with #1, vars on line 2 with #2, etc. –Again we used length statement because manuf, model are long. –This dataset has asterisks for missing values for some numeric variables. When SAS encounters * when expecting a number, it generates a NOTE in log window and sets the value to missing. This is ok in this example, but better to use periods in raw data file to avoid worrisome NOTE. 15

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 3: TVs and Life Expectancy – Column Input Column Input –Useful for character or standard numeric data (no dates or special formatting) aligned in columns in raw data file. –Advantages over list input: Spaces not required between values Missing values can be left blank Character values can be long and can contain spaces Can skip unwanted values in raw data file –Easy to use. After each variable on the input statement specify the column range from which to read the variable. Comes after $ if a character variable. –E.g.: input name $ 1-10 shoesize 12-15; 16

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 3: TVs and Life Expectancy – Column Input –Data from 40 largest countries on life expectancy, TVs and doctors per capita (inverted, actually) as of –Notice long country names with spaces handled without a length statement. –Blank values treated as missing (e.g., people per TV in Tanzania) 17

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 4: Tour de France Winners – Formatted Input Formatted Input - Informats Q: Numeric data can’t contain characters (other than +,-,e) so how does SAS know to read 1,000 (one thousand) as 1000, or 07/04/1776 (Independence Day) as a numeric date? A: Informats An informat is a code to tell SAS the format to use when reading in data. –SAS also has formats, which tell it how to print or output a data value. You can think of them as “outformats”. 18

STAT 6360 –Statistical Software Programming Informats The three main types and their general form: –Every character informat begins with $ –The informat part is the name of the informat and, in part determines what it does. –The w part determines the total width –The d part determines how many digits follow the decimal point in a numeric format. –The period is an essential part of any informat. 19 CharacterNumericDate $informatw.informatw.dinformatw.

STAT 6360 –Statistical Software Programming Informats Character formats do things like –Specify the length of the character value ($w.) –Trim leading blanks –Converts to UPPER CASE Numeric formats do things like –Specify width & how many decimal places (w.d) –Handle percentages, numbers with commas 20

STAT 6360 –Statistical Software Programming Informats Date and time formats –Read in dates (12/07/1941, 06-jun-44) of different form and convert them to SAS date values, which are the number of days since Jan. 01, E.g., mmddyy10. reads 12/31/1959 and assigns it a SAS date value of -1 –Read in times of different form and convert them to SAS time values, which are the number of seconds since midnight of the current day. E.g., time10. reads 00:10:07.4 and assigns it a SAS time value of –Read in times on a particular day and convert them to SAS datetime values, which are the number of seconds since midnight, Jan. 01, These conversions allow calculations on dates and times (e.g., how much time has passed between two dates). 21

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 4: Tour de France Winners – Formatted Input –In this example, we where n is a column number to tell the INPUT statement where to start reading each variable. –Then each variable is read in from that column using the informat given after the variable name on the INPUT statement. year start_town start_date mmddyy8. 22 Start from col 1 and read a number that has width 4 and no decimal places into the numeric variable year Then start in col 6 and read 18 columns into a character variable called start_town Then start at column 25 and read a date that is 8 columns wide and has format mmddyy (could be mm-dd-yy or mm/dd/yy) into the variable start_date. Treat as a SAS date value.

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 4: Tour de France Winners – Formatted Input –The @25, …) is one type of pointer control. We are controlling where SAS points to when reading in the next data value. –Another pointer control is +n where n is the number of columns forward that you want SAS to move. –E.g., year start_town $18.; –This tells SAS to start at column 1 and read four columns into year. SAS will then be pointing to column 5. Then +1 tells it to move ahead one column and point to column 6. Then it reads start_town with the informat $18. 23

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 5: Pizza Size – Delimited Input –Delimited (e.g.,.csv) data files can be read in with INFILE and INPUT statements instead of PROC IMPORT. –DLM=‘,’ tells SAS what kind of delimiter to expect Use DLM=‘05’X for tab-delimited data. –DSD option tells SAS 1.Consecutive delimiters should be interpreted as missing values. 2.Don’t read quotation marks as part of the data value 3.Treat delimiters (e.g., commas) inside quotes as part of the value, not a delimiter. –MISSOVER option tells SAS not to go to next line in input file if values of some variables not provided. Instead give those variables missing values. 24

STAT 6360 –Statistical Software Programming Example: InputExamps.sas Example 5: Pizza Size – Delimited Input –Colon (:) operator on a character informat (e.g., :$9. ) tells SAS to read 9 columns into the character variable, but stop if you reach a delimiter before all 9 columns have been read. –To see how DSD, MISSOVER, colon operator work, try reading the data in pizzasize_fake.csv with the code in inputexs.sas that is written for that purpose, but omitting MISSOVER, DSD, and the colon on the :$9. informat, each in turn. Finally, note that it is ok and sometimes convenient to mix input modes within a single input statement. –Good example of this section 2.9 of D&S. 25