Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)

Similar presentations


Presentation on theme: "Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)"— Presentation transcript:

1 Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)

2 Raw Data Read in Data Process Data (Create new variables) Output Data (Create SAS Dataset) Analyze Data Using Statistical Procedures Data Step PROCs

3 Raw Data Sources You type it in the SAS program Text file Spreadsheet (Excel) Database (Access, Oracle) SAS dataset

4 Data in Text Files Delimited data – variables are separated by a special character (e.g. a comma) Fixed position – data is organized into columns Text files are simple character files that you can create or view in a text editor like Notepad. They may also be created as “dumps” from spreadsheet files like excel.

5 Data delimited with spaces: C 84 138 93 143 D 89 150 91 140 A 78 116 100 162 A.. 86 155 C 81 145 86 140 Note: Missing data is identified with a period.

6 Data delimited with commas C,84,138,93,143 D,89,150,91,140 A,78,116,100,162 A,.,.,86,155 C,81,145,86,140 Note: Missing data is identified with a period.

7 Data delimited by commas (.csv file) C,84,138,93,143 D,89,150,91,140 A,78,116,100,162 A,,,86,155 C,81,145,86,140 Note: Missing data is identified by multiple commas.

8 Column Data C084138093143 D089150091140 A078116100162 A 086155 C081145086140 Note: Missing data values are blank.

9 INFILE and INPUT Statements When you write a SAS program to read in raw data, you’ll use two key statements: The INFILE statement tells SAS where to find the data and how it is organized. The INPUT statement tells SAS which variables to read-in

10 Program 1 * List Directed Input: Reading data values separated by spaces; DATA bp; INFILE DATALINES; INPUT clinic $ dbp6 sbp6 dbpbl sbpbl; DATALINES; C 84 138 93 143 D 89 150 91 140 A 78 116 100 162 A.. 86 155 C 81 145 86 140 ; RUN ; TITLE 'Data Separated by Spaces'; PROC PRINT DATA=bp; RUN; Obs clinic dbp6 sbp6 dbpbl sbpbl 1 C 84 138 93 143 2 D 89 150 91 140 3 A 78 116 100 162 4 A.. 86 155 5 C 81 145 86 140

11 PARTIAL SASLOG 1 DATA bp; 2 INFILE DATALINES; 3 INPUT clinic $ dbp6 sbp6 dbpbl sbpbl; 4 DATALINES; NOTE: The data set WORK.BP has 5 observations and 5 variables. NOTE: DATA statement used: real time 0.39 seconds cpu time 0.03 seconds

12 * List Directed Input: Reading data values separated by commas; DATA bp; INFILE DATALINES DLM = ',' ; INPUT clinic $ dbp6 sbp6 dbpbl sbpbl; DATALINES; C,84,138,93,143 D,89,150,91,140 A,78,116,100,162 A,.,.,86,155 C,81,145,86,140 ; RUN ; TITLE 'Data separated by a comma'; PROC PRINT DATA=bp; RUN;

13 * List Directed Input: Reading.csv files DATA bp; INFILE DATALINES DLM = ',' DSD ; INPUT clinic $ dbp6 sbp6 dbpbl sbpbl; DATALINES; C,84,138,93,143 D,89,150,91,140 A,78,116,100,162 A,,,86,155 C,81,145,86,140 ; TITLE 'Reading in Data using the DSD Option'; PROC PRINT DATA=bp; RUN; Consecutive commas indicate missing data

14 * List Directed Input: Reading data values separated by tabs (.txt files); DATA bp; INFILE DATALINES DLM = '09'x DSD; INPUT clinic $ dbp6 sbp6 dbpbl sbpbl; DATALINES; C8413893143 D8915091140 A78116100162 A86155 C8114586140 ; TITLE 'Reading in Data separated by a tab'; PROC PRINT DATA=bp; RUN;

15 * Column Input: Data in fixed columns. DATA bp; INFILE DATALINES ; INPUT clinic $ 1-1 dbp6 2-4 sbp6 5-7 dbpbl 8-10 sbpbl 11-13 ; DATALINES; C084138093143 D089150091140 A078116100162 A 086155 C081145086140 ; Title 'Reading in Data using Column Input'; PROC PRINT DATA=bp; Note: missing data is blank

16 * Reading data using Pointers and Informats DATA bp; INFILE DATALINES ; INPUT @1 clinic $1. @2 dbp6 3. @5 sbp6 3. @8 dbpbl 3. @11 sbpbl 3. ; DATALINES; C084138093143 D089150091140 A078116100162 A 086155 C081145086140 ; Title 'Reading in Data using Point/Informats'; PROC PRINT DATA=bp; Informats must end with a period.

17 * Reading data using Informat Lists DATA quallife; INFILE DATALINES ; INPUT (QL1-QL35) (1.) ; DATALINES; 31232242414444223544545354455342324 21112353214352552525522662566553533 21122252241333356262662366266551525 ; Title 'Reading in Data using Informat Lists'; PROC PRINT DATA=quallife; VAR QL1-QL35; RUN; O Q Q Q Q Q Q Q Q Q L L L L L L L L L L L L L L L L L L L L L L L L L L b L L L L L L L L L 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 s 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 1 3 1 2 3 2 2 4 2 4 1 4 4 4 4 2 2 3 5 4 4 5 4 5 3 5 4 4 5 5 3 4 2 3 2 4 2 2 1 1 1 2 3 5 3 2 1 4 3 5 2 5 5 2 5 2 5 5 2 2 6 6 2 5 6 6 5 5 3 5 3 3 3 2 1 1 2 2 2 5 2 2 4 1 3 3 3 3 5 6 2 6 2 6 6 2 3 6 6 2 6 6 5 5 1 5 2 5

18 Program 2 * Reading data from an external file DATA bp; INFILE ‘C:\SAS_Files\bp.csv' DSD FIRSTOBS = 2; INPUT clinic $ dbp6 sbp6 dbpbl sbpbl ; TITLE 'Reading in Data from an External File'; PROC PRINT DATA=bp; clinic,dbp6,sbp6,dbpbl,sbpbl C,84,138,93,143 D,89,150,91,140 A,78,116,100,162 A,,,86,155 C,81,145,86,140 Content of bp.csv

19 PARTIAL SAS LOG 7 DATA bp; 8 INFILE 'C:\SAS_Files\bp.csv' DSD FIRSTOBS=2 ; 9 INPUT clinic $ dbp6 sbp6 dbpbl sbpbl ; NOTE: The infile 'C:\SAS_Files\bp.csv' is: File Name=C:\SAS_Files\bp.csv, RECFM=V,LRECL=256 NOTE: 5 records were read from the infile 'C:\SAS_Files\bp.csv'. The minimum record length was 10. The maximum record length was 16. NOTE: The data set WORK.BP has 5 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.10 seconds cpu time 0.01 seconds

20 *Reading data from an external file using a FILENAME statement; FILENAME bpdata ‘C:\SAS_Files\bp.csv'; DATA bp; INFILE bpdata DSD FIRSTOBS = 2; INPUT clinic $ dbp6 sbp6 dbpbl sbpbl ; TITLE 'Reading in Data Using FILENAME'; PROC PRINT DATA=bp;

21 * Using PROC IMPORT to read in data ; * Can skip data step; * Can also try IMPORT Wizard; PROC IMPORT DATAFILE=‘C:\SAS_Files\bp.csv' OUT = bp DBMS = csv REPLACE ; GETNAMES = yes; GUESSINGROWS=9999; TITLE 'Reading in Data Using PROC IMPORT'; PROC PRINT DATA=bp; PROC CONTENTS DATA=bp; Uses first row for variable names

22 * PC SAS can read excel files directly; PROC IMPORT DATAFILE=‘C:\SAS_Files\bp.xls' OUT = bp DBMS = xls REPLACE ; GETNAMES = yes; TITLE 'Reading in Data from excel'; PROC PRINT DATA=bp; PROC CONTENTS; Uses first row for variable names

23 The CONTENTS Procedure Data Set Name WORK.BP Observations 5 Member Type DATA Variables 5 Alphabetic List of Variables and Attributes # Variable Type Len Format Informat 1 Clinic Char 1 $1. $1. 2 DBP6 Num 8 BEST12. BEST32. 4 DBPBL Num 8 BEST12. BEST32. 3 SBP6 Num 8 BEST12. BEST32. 5 SBPBL Num 8 BEST12. BEST32.

24 SOME INFILE OPTIONS OBS - limits number of observations read FIRSTOBS - start reading from this obs. MISSOVER and TRUNCOVER - used to read in data with short records TERMSTR= used when reading PC files on a UNIX machine (or vice versa) LRECL= needed when you have data with long records (> 256 characters)

25 Problem when reading past default logical record length; DATA temp; INFILE ‘C:\SAS_Files\tomhs.data' OBS=6 ; INPUT @260 jntpain 2. ; TITLE 'Data not read in correctly because variable is past default LRECL of 256'; PROC PRINT; Obs jntpain 1. 2. 3. NOTE: Invalid data for jntpain in line 2 NOTE: SAS went to a new line when INPUT statement reached past the end of a line

26 *Add LRECL option to fix problem ; DATA temp; INFILE ‘C:\…\tomhs.data' OBS=6 LRECL=500; INPUT @260 jntpain 2. ; TITLE 'Data read in correctly using LRECL option'; PROC PRINT; Obs jntpain 1 1 2 1 3 1 4 1 5 1 6 2

27 Reading Special Data 04/11/1982Date 59,365Comma in number 086-59-9054Long (>8) characters Informat 04/11/1982mmddyy10. 59,365comma6. 086-59-9054$11.

28 * Reading special data with fixed position data; DATA info; INFILE DATALINES; INPUT @1 ssn $11. @13 taxdate mmddyy10. @25 income comma6. ; DATALINES; 086-59-9054 04/12/2001 59,365 405-65-0987 03/15/2002 26,925 212-44-9054 04/15/2003 44,999 ; TITLE 'Variables with Special Formats'; PROC PRINT DATA=info; FORMAT taxdate mmddyy10.; Obs ssn taxdate income 1 086-59-9054 04/12/2001 59365 2 405-65-0987 03/15/2002 26925 3 212-44-9054 04/15/2003 44999

29 * Reading special data with list input using colon modifier; DATA info; INFILE DATALINES DLM=“;” DSD; INPUT ssn : $11. taxdate : mmddyy10. income : comma6. ; DATALINES; 086-59-9054;04/12/2001;59,365 405-65-0987;03/15/2002;26,925 212-44-9054;04/15/2003;44,999 ; TITLE 'Variables with Special Formats'; PROC PRINT DATA=info; FORMAT taxdate mmddyy10.; Obs ssn taxdate income 1 086-59-9054 04/12/2001 59365 2 405-65-0987 03/15/2002 26925 3 212-44-9054 04/15/2003 44999

30 * Using INFORMAT statement to supply input formats; DATA info; INFILE DATALINES DLM=“;” DSD; INFORMAT ssn $11. taxdate mmddyy10. income comma6.; INPUT ssn taxdate income ; DATALINES; 086-59-9054;04/12/2001;59,365 405-65-0987;03/15/2002;26,925 212-44-9054;04/15/2003;44,999 ; TITLE 'Variables with Special Formats'; PROC PRINT DATA=info; FORMAT taxdate mmddyy10.; Obs ssn taxdate income 1 086-59-9054 04/12/2001 59365 2 405-65-0987 03/15/2002 26925 3 212-44-9054 04/15/2003 44999

31 Summary of Ways of Reading in Data List input - data is separated by a delimiter; must read in all variables. Column input - data is in fixed columns;must know where each variable starts and ends; can read in selected variables Pointers and Informats - alternative to column input; most flexible; must be used for special data PROC IMPORT


Download ppt "Lesson 2 Topic - Reading in data Programs 1 and 2 in course notes –Chapter 2 (Little SAS Book)"

Similar presentations


Ads by Google