Lesson 2 Topic - Reading in data. Programs 1 and 2 in course notes; Chapter 2 (Little SAS Book)



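Steps in a SAS analysis: Raw Data -> Read in Data -> Process Data (create new variables) -> Output Data (create SAS dataset) -> Analyze Data Using Statistical Procedures. Reading, processing and output are done in the DATA step; analysis is done with PROCs.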

Raw Data Sources
- You type it in the SAS program
- Text file
- Spreadsheet (Excel)
- Database (Access, Oracle)
- SAS dataset

Data in Text Files
- Delimited data: variables are separated by a special character (e.g. a comma)
- Fixed position: data are organized into columns
Text files are simple character files that you can create or view in a text editor like Notepad. They may also be created as "dumps" from spreadsheet files such as Excel.

Data delimited with spaces:
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
Note: Missing data is identified with a period.

Data delimited with commas:
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140
Note: Missing data is identified with a period.

Data delimited by commas (.csv file):
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
Note: Missing data is identified by consecutive commas.

Column Data:
C 84138 93143
D 89150 91140
A 78116100162
A       86155
C 81145 86140
Note: Missing data values are blank.

INFILE and INPUT Statements
When you write a SAS program to read in raw data, you'll use two key statements:
- The INFILE statement tells SAS where to find the data and how it is organized.
- The INPUT statement tells SAS which variables to read in.

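Program 1
* List Directed Input: Reading data values separated by spaces;
DATA bp;
INFILE DATALINES;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C 84 138 93 143
D 89 150 91 140
A 78 116 100 162
A . . 86 155
C 81 145 86 140
;
RUN;
TITLE 'Data Separated by Spaces';
PROC PRINT DATA=bp;
RUN;
Output:
Obs  clinic  dbp6  sbp6  dbpbl  sbpbl
1    C       84    138   93     143
2    D       89    150   91     140
3    A       78    116   100    162
4    A       .     .     86     155
5    C       81    145   86     140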

PARTIAL SAS LOG
1 DATA bp;
2 INFILE DATALINES;
3 INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
4 DATALINES;
NOTE: The data set WORK.BP has 5 observations and 5 variables.
NOTE: DATA statement used:
      real time 0.39 seconds
      cpu time 0.03 seconds

* List Directed Input: Reading data values separated by commas;
DATA bp;
INFILE DATALINES DLM = ',';
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,.,.,86,155
C,81,145,86,140
;
RUN;
TITLE 'Data separated by a comma';
PROC PRINT DATA=bp;
RUN;

* List Directed Input: Reading .csv files;
DATA bp;
INFILE DATALINES DLM = ',' DSD;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140
;
RUN;
TITLE 'Reading in Data using the DSD Option';
PROC PRINT DATA=bp;
RUN;
Note: With the DSD option, consecutive commas indicate missing data.
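To see why the DSD option matters for this kind of file, the sketch below reads the same record twice, once without DSD and once with it. The data set names nodsd and withdsd are made up for illustration, and MISSOVER is added in the first step (an assumption, not part of the slide) so that INPUT does not move on to another line when it runs out of values.

```sas
* Sketch: one comma-delimited record read without and with DSD;
* Data set names nodsd and withdsd are hypothetical;
DATA nodsd;
  * Without DSD, consecutive commas are treated as a single delimiter;
  INFILE DATALINES DLM=',' MISSOVER;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
A,,,86,155
;
RUN;

DATA withdsd;
  * With DSD, each pair of consecutive commas marks a missing value;
  INFILE DATALINES DLM=',' DSD;
  INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
  DATALINES;
A,,,86,155
;
RUN;

TITLE 'Without DSD: values shift left';
PROC PRINT DATA=nodsd;
RUN;

TITLE 'With DSD: missing values stay in place';
PROC PRINT DATA=withdsd;
RUN;
```

In the first listing 86 and 155 should land in dbp6 and sbp6, leaving dbpbl and sbpbl missing; in the second they should land in dbpbl and sbpbl as intended.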

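* List Directed Input: Reading data values separated by tabs (.txt files);
DATA bp;
INFILE DATALINES DLM = '09'x DSD;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
DATALINES;
C	84	138	93	143
D	89	150	91	140
A	78	116	100	162
A			86	155
C	81	145	86	140
;
RUN;
TITLE 'Reading in Data separated by a tab';
PROC PRINT DATA=bp;
RUN;
Note: The values in the data lines are separated by tab characters ('09'x); consecutive tabs indicate missing data.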

* Column Input: Data in fixed columns;
DATA bp;
INFILE DATALINES;
INPUT clinic $ 1-1 dbp6 2-4 sbp6 5-7 dbpbl 8-10 sbpbl 11-13;
DATALINES;
C 84138 93143
D 89150 91140
A 78116100162
A       86155
C 81145 86140
;
RUN;
TITLE 'Reading in Data using Column Input';
PROC PRINT DATA=bp;
RUN;
Note: Missing data is blank.

* Reading data using Pointers and Informats;
DATA bp;
INFILE DATALINES;
INPUT @1 clinic $1. @2 dbp6 3. @5 sbp6 3. @8 dbpbl 3. @11 sbpbl 3. ;
DATALINES;
C 84138 93143
D 89150 91140
A 78116100162
A       86155
C 81145 86140
;
RUN;
TITLE 'Reading in Data using Point/Informats';
PROC PRINT DATA=bp;
RUN;
Note: Informats must end with a period.

* Reading data using Informat Lists;
DATA quallife;
INFILE DATALINES;
INPUT (QL1-QL35) (1.);
DATALINES;
;
RUN;
TITLE 'Reading in Data using Informat Lists';
PROC PRINT DATA=quallife;
VAR QL1-QL35;
RUN;
Output: a listing with columns Obs and QL1 through QL35 (the data lines and printed values are not preserved in this transcript).
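Since the data lines for the informat-list slide did not survive, here is a minimal sketch of the same technique on a smaller scale. The data set name qlsmall, the five-item range QL1-QL5 and the digit values are all invented for illustration; only the (variable list) (informat list) form comes from the slide.

```sas
* Sketch: informat list applied to five one-digit items per record;
* qlsmall, QL1-QL5 and the data values are hypothetical;
DATA qlsmall;
  INFILE DATALINES;
  * The informat 1. is applied in turn to each variable in the list;
  INPUT (QL1-QL5) (1.);
  DATALINES;
12345
31242
;
RUN;

TITLE 'Informat list on a small example';
PROC PRINT DATA=qlsmall;
  VAR QL1-QL5;
RUN;
```

Each record is read one column at a time, so the first record should give QL1=1, QL2=2, and so on through QL5=5.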

Program 2
* Reading data from an external file;
DATA bp;
INFILE 'C:\SAS_Files\bp.csv' DSD FIRSTOBS = 2;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;
TITLE 'Reading in Data from an External File';
PROC PRINT DATA=bp;
RUN;
Content of bp.csv:
clinic,dbp6,sbp6,dbpbl,sbpbl
C,84,138,93,143
D,89,150,91,140
A,78,116,100,162
A,,,86,155
C,81,145,86,140

PARTIAL SAS LOG
7 DATA bp;
8 INFILE 'C:\SAS_Files\bp.csv' DSD FIRSTOBS=2 ;
9 INPUT clinic $ dbp6 sbp6 dbpbl sbpbl ;
NOTE: The infile 'C:\SAS_Files\bp.csv' is:
      File Name=C:\SAS_Files\bp.csv,
      RECFM=V,LRECL=256
NOTE: 5 records were read from the infile 'C:\SAS_Files\bp.csv'.
      The minimum record length was 10.
      The maximum record length was 16.
NOTE: The data set WORK.BP has 5 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time 0.10 seconds
      cpu time 0.01 seconds

* Reading data from an external file using a FILENAME statement;
FILENAME bpdata 'C:\SAS_Files\bp.csv';
DATA bp;
INFILE bpdata DSD FIRSTOBS = 2;
INPUT clinic $ dbp6 sbp6 dbpbl sbpbl;
RUN;
TITLE 'Reading in Data Using FILENAME';
PROC PRINT DATA=bp;
RUN;

* Using PROC IMPORT to read in data;
* Can skip data step;
* Can also try IMPORT Wizard;
PROC IMPORT DATAFILE='C:\SAS_Files\bp.csv' OUT = bp DBMS = csv REPLACE;
GETNAMES = yes;
GUESSINGROWS = 9999;
RUN;
TITLE 'Reading in Data Using PROC IMPORT';
PROC PRINT DATA=bp;
RUN;
PROC CONTENTS DATA=bp;
RUN;
Note: GETNAMES = yes uses the first row of the file for variable names.

* PC SAS can read Excel files directly;
PROC IMPORT DATAFILE='C:\SAS_Files\bp.xls' OUT = bp DBMS = xls REPLACE;
GETNAMES = yes;
RUN;
TITLE 'Reading in Data from excel';
PROC PRINT DATA=bp;
RUN;
PROC CONTENTS;
RUN;
Note: GETNAMES = yes uses the first row of the sheet for variable names.

The CONTENTS Procedure
Data Set Name  WORK.BP    Observations  5
Member Type    DATA       Variables     5
Alphabetic List of Variables and Attributes
#  Variable  Type  Len  Format   Informat
1  Clinic    Char  1    $1.      $1.
2  DBP6      Num   8    BEST12.  BEST32.
4  DBPBL     Num   8    BEST12.  BEST32.
3  SBP6      Num   8    BEST12.  BEST32.
5  SBPBL     Num   8    BEST12.  BEST32.

SOME INFILE OPTIONS
- OBS= : number of the last record to read; limits the number of observations read
- FIRSTOBS= : record at which to start reading
- MISSOVER and TRUNCOVER : used to read in data with short records
- TERMSTR= : used when reading PC files on a UNIX machine (or vice versa)
- LRECL= : needed when you have data with long records (> 256 characters, the default record length shown in the log for these examples)
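As an illustration of the short-record options, the sketch below uses MISSOVER with list input on records that have fewer values than the INPUT statement expects. The data set name scores, the variable names and the values are invented for this example.

```sas
* Sketch: MISSOVER with records that are missing trailing values;
* scores, id, test1-test3 and the data values are hypothetical;
DATA scores;
  * MISSOVER keeps INPUT on the current record and sets the;
  * variables it cannot fill to missing;
  INFILE DATALINES MISSOVER;
  INPUT id $ test1 test2 test3;
  DATALINES;
A01 90 85 77
A02 88
A03 70 65
;
RUN;

TITLE 'Short records read with MISSOVER';
PROC PRINT DATA=scores;
RUN;
```

With MISSOVER the step should produce three observations, with test2 and test3 missing for A02 and test3 missing for A03. Without it, SAS moves to the next record to look for the remaining values, so records get combined. TRUNCOVER behaves similarly and is the usual choice for column or formatted input from external files, where a record may end partway through a field.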

* Problem when reading past default logical record length;
DATA temp;
INFILE 'C:\SAS_Files\tomhs.data' OBS=6;
INPUT @… jntpain 2. ;
RUN;
TITLE 'Data not read in correctly because variable is past default LRECL of 256';
PROC PRINT;
RUN;
Output: Obs jntpain (values not preserved)
NOTE: Invalid data for jntpain in line 2
NOTE: SAS went to a new line when INPUT statement reached past the end of a line

* Add LRECL option to fix problem;
DATA temp;
INFILE 'C:\…\tomhs.data' OBS=6 LRECL=500;
INPUT @… jntpain 2. ;
RUN;
TITLE 'Data read in correctly using LRECL option';
PROC PRINT;
RUN;
Output: Obs jntpain (values not preserved)

Reading Special Data
Data value   Special feature        Informat
04/11/1982   Date                   mmddyy10.
59,365       Comma in number        comma6.
…            Long (>8) characters   $11.

* Reading special data with fixed position data;
DATA info;
INFILE DATALINES;
INPUT @… ssn $11. @… taxdate mmddyy10. @… income comma6. ;
DATALINES;
… 04/12/2001 59,…
… 03/15/2002 26,…
… 04/15/2003 44,999
;
RUN;
TITLE 'Variables with Special Formats';
PROC PRINT DATA=info;
FORMAT taxdate mmddyy10.;
RUN;
Output:
Obs  ssn  taxdate     income
1    …    04/12/2001  …
2    …    03/15/2002  …
3    …    04/15/2003  44999
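A self-contained sketch of the fixed-position approach follows. The ssn values, the second income value and the column positions @1, @13 and @24 are invented for illustration, as is the data set name info_demo; the three informats are the ones named on the slides.

```sas
* Sketch: pointers and informats for special data in fixed positions;
* info_demo, the ssn values and the column positions are hypothetical;
DATA info_demo;
  INFILE DATALINES;
  INPUT @1  ssn     $11.        /* columns 1-11: character, longer than 8 */
        @13 taxdate mmddyy10.   /* columns 13-22: date such as 04/12/2001 */
        @24 income  comma6.;    /* columns 24-29: number with a comma     */
  DATALINES;
123-45-6789 04/12/2001 59,365
987-65-4321 03/15/2002 26,100
;
RUN;

TITLE 'Special data read from fixed positions';
PROC PRINT DATA=info_demo;
  * Without a FORMAT, taxdate would print as a SAS date number;
  FORMAT taxdate mmddyy10.;
RUN;
```

The comma6. informat strips the comma on input, so income is stored as an ordinary number.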

* Reading special data with list input using colon modifier;
* DATALINES4 is used because the data lines contain semicolons;
DATA info;
INFILE DATALINES4 DLM=';' DSD;
INPUT ssn : $11. taxdate : mmddyy10. income : comma6. ;
DATALINES4;
…;04/12/2001;59,…
…;03/15/2002;26,…
…;04/15/2003;44,999
;;;;
RUN;
TITLE 'Variables with Special Formats';
PROC PRINT DATA=info;
FORMAT taxdate mmddyy10.;
RUN;
Output:
Obs  ssn  taxdate     income
1    …    04/12/2001  …
2    …    03/15/2002  …
3    …    04/15/2003  44999
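The colon modifier can be tried out with the sketch below. One caveat when the delimiter is a semicolon and the data are in-stream: an ordinary DATALINES block ends at the first line containing a semicolon, so this sketch uses DATALINES4, which ends only at a line of four semicolons. The data set name info_demo, the ssn values and the second income are invented for illustration.

```sas
* Sketch: list input with the colon modifier on semicolon-delimited data;
* info_demo and the data values are hypothetical;
DATA info_demo;
  INFILE DATALINES4 DLM=';' DSD;
  * The colon tells SAS to read up to the next delimiter and then;
  * apply the informat, rather than reading a fixed number of columns;
  INPUT ssn : $11. taxdate : mmddyy10. income : comma6.;
  DATALINES4;
123-45-6789;04/12/2001;59,365
987-65-4321;03/15/2002;26,100
;;;;
RUN;

TITLE 'Special data read with the colon modifier';
PROC PRINT DATA=info_demo;
  FORMAT taxdate mmddyy10.;
RUN;
```

When the same data sit in an external file, the DATALINES4 workaround is not needed; only the INFILE options DLM=';' and DSD carry over.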

* Using INFORMAT statement to supply input formats;
* DATALINES4 is used because the data lines contain semicolons;
DATA info;
INFILE DATALINES4 DLM=';' DSD;
INFORMAT ssn $11. taxdate mmddyy10. income comma6.;
INPUT ssn taxdate income;
DATALINES4;
…;04/12/2001;59,…
…;03/15/2002;26,…
…;04/15/2003;44,999
;;;;
RUN;
TITLE 'Variables with Special Formats';
PROC PRINT DATA=info;
FORMAT taxdate mmddyy10.;
RUN;
Output:
Obs  ssn  taxdate     income
1    …    04/12/2001  …
2    …    03/15/2002  …
3    …    04/15/2003  44999

Summary of Ways of Reading in Data
- List input: data is separated by a delimiter; must read in all variables.
- Column input: data is in fixed columns; must know where each variable starts and ends; can read in selected variables.
- Pointers and Informats: alternative to column input; most flexible; must be used for special data.
- PROC IMPORT
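The point that column input can read selected variables can be sketched as follows. The data set name bpsub is invented, and the column layout (clinic in column 1, dbpbl in columns 8-10) is an assumption about how the fixed-column blood pressure data are arranged.

```sas
* Sketch: column input reading only two of the five fields;
* bpsub and the column positions are assumptions for illustration;
DATA bpsub;
  INFILE DATALINES;
  * Columns 2-7 and 11-13 are simply never referenced;
  INPUT clinic $ 1 dbpbl 8-10;
  DATALINES;
C 84138 93143
D 89150 91140
;
RUN;

TITLE 'Selected variables read with column input';
PROC PRINT DATA=bpsub;
RUN;
```

List input cannot skip fields this way, because it reads values in order from the start of each record.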