Chapter 3: Reading and Processing Data 3.1 Processing SAS Data Sets 3.2 Processing External Files

Presentation transcript:

1 Chapter 3: Reading and Processing Data 3.1 Processing SAS Data Sets 3.2 Processing External Files

2 Chapter 3: Reading and Processing Data 3.1 Processing SAS Data Sets 3.2 Processing External Files

3 Objectives
Use SAS file I/O functions to manipulate SAS data sets.
Retrieve metadata.

4 Managing SAS Data Sets
The Orion Star programmers need macros to perform the following data management tasks:
1. Test the existence of a data set.
2. Determine the number of observations in a data set.
3. Determine the age of a data set.
4. Archive a data set.
5. Create a data set for every worksheet in an Excel workbook.
They decided to use the SAS file I/O functions and metadata to accomplish these tasks.

5 Using Functions to Manipulate Files
SAS supports different ways to manipulate and obtain information about SAS files and other files. Many of these techniques require a DATA step or PROC step to be part of the SAS code. Some functions, generally used in the DATA step and SCL, permit direct access to files. These functions, when used with the macro facility, enable the same direct access without introducing additional program steps. The functions can be categorized into two groups:
SAS file I/O functions
external file functions

6 SAS File I/O Functions
Functions to access a SAS data set: EXIST, OPEN, CLOSE
Functions to access data set descriptor information: DSNAME, VARNUM, ATTRC, ATTRN
Functions to access data library information: LIBREF, PATHNAME

7 Task 1: Determine Data Set Existence
Use the EXIST function to test for the existence of a data set before progressing further into a macro program.
    %macro printds(dset);
       %if %sysfunc(exist(&dset))=0 %then %do;
          %put ERROR: Data set &dset does not exist.;
          %put ERROR- Macro will terminate now.;
          %return;
       %end;
       proc print data=&dset (obs=10) noobs;
          title "First 10 Observations from &dset";
       run;
    %mend printds;
m203d01

8 Task 1: Determine Data Set Existence
Partial SAS Log
    %printds(orion.daily_sales)
    NOTE: There were 10 observations read from the data set ORION.DAILY_SALES.
    NOTE: PROCEDURE PRINT used:
          real time 0.01 seconds
          cpu time 0.00 seconds
    29 %printds(orion.daily)
    ERROR: Data set orion.daily does not exist.
           Macro will terminate now.
m203d01

9 Task 2: Obtain Attribute Information
The Orion Star programmers found that a data set often exists but is empty. They want to verify that a data set is not empty before performing further processing. The following steps provide data set attribute information:
1. Open the data set using the OPEN function.
2. Retrieve a numeric attribute using the ATTRN function.
3. Retrieve a character attribute using the ATTRC function.
4. Close the data set using the CLOSE function.

10 Step 1: Open the SAS Data Set
The OPEN function opens a SAS data set and returns a unique numeric data set identifier. The data set identifier, a nonzero positive number, is used in most other SAS file I/O functions. The OPEN function returns 0 if the data set cannot be opened.
General form of the OPEN function:
    OPEN(data-set-name)
Partial SAS Log
    4 %let dsid=%sysfunc(open(orion.daily_sales));
    5 %put dsid=&dsid;
    dsid=1

11 Step 2: Use the ATTRN Function
The ATTRN function returns the value of a numeric attribute of a data set.
General form of the ATTRN function:
    ATTRN(data-set-identifier, attribute-name)
Selected attribute-name values and descriptions:
    CRDTE     creation date (SAS datetime value)
    MODTE     last modified date (SAS datetime value)
    NVARS     number of variables
    ISINDEX   whether a data set is indexed (0 or 1)
    NLOBS     number of non-deleted observations
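
As a minimal sketch of the ATTRN call pattern, the statements below read two numeric attributes in open code. SASHELP.CLASS is used only because it ships with SAS; it is an assumption for illustration and is not part of the course data. The optional second argument of %SYSFUNC applies a format, which renders the CRDTE datetime value in readable form.

```sas
/* Hypothetical example: SASHELP.CLASS stands in for any existing data set. */
%let dsid=%sysfunc(open(sashelp.class));

/* NVARS is returned as a plain number. */
%let nvars=%sysfunc(attrn(&dsid,nvars));

/* CRDTE is a SAS datetime value, so format it for display. */
%let created=%sysfunc(attrn(&dsid,crdte),datetime20.);

/* Always release the data set identifier. */
%let rc=%sysfunc(close(&dsid));

%put NOTE: SASHELP.CLASS has &nvars variables and was created &created..;
```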

12 Step 3: Use the ATTRC Function
The ATTRC function returns the value of a character attribute of a data set.
General form of the ATTRC function:
    ATTRC(data-set-identifier, attribute-name)
Selected attribute-name values and descriptions:
    SORTEDBY  BY variables (if data set is sorted)
    LABEL     data set label
    MEM       data set name
    LIB       current libref for the data set

13 Step 4: Close the SAS Data Set
The CLOSE function closes a SAS data set. It returns 0 if the operation was successful and a nonzero value if it was not.
General form of the CLOSE function:
    CLOSE(data-set-identifier)
Partial SAS Log
    6 %let dsidc=%sysfunc(close(&dsid));
    7 %put dsidc=&dsidc;
    dsidc=0
It is important to close all SAS data sets as soon as they are no longer needed by the application.
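
The four steps can be combined with a check on the identifier that OPEN returns, so that a failed open never reaches ATTRN or CLOSE. The following is a sketch under stated assumptions rather than course code: the macro name SAFEOBS and both test data sets are hypothetical, and SYSMSG is used to report the reason for the failure.

```sas
/* Sketch: stop early when OPEN fails instead of passing 0 to ATTRN. */
/* The macro name SAFEOBS is hypothetical.                           */
%macro safeobs(dsn);
   %local dsid nobs rc;
   %let dsid=%sysfunc(open(&dsn));
   %if &dsid=0 %then %do;
      %put ERROR: Cannot open &dsn..;
      /* SYSMSG returns the message from the last file I/O function. */
      %put ERROR- %sysfunc(sysmsg());
      %return;
   %end;
   %let nobs=%sysfunc(attrn(&dsid,nlobs));
   %let rc=%sysfunc(close(&dsid));
   %if &rc ne 0 %then %put WARNING: &dsn was not closed (rc=&rc).;
   %put NOTE: &dsn has &nobs observations.;
%mend safeobs;

%safeobs(sashelp.class)          /* assumed to exist in any SAS session */
%safeobs(work.no_such_table)     /* assumed not to exist */
```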

14 Obtaining Number of Observations
Use the NLOBS attribute to obtain the number of observations in a data set and assign this value to a macro variable.
    %macro numobs(dsn);
       %local dsid nobs dsidc;
       %let dsn=%upcase(&dsn);
       %let dsid=%sysfunc(open(&dsn));
       %let nobs=%sysfunc(attrn(&dsid,nlobs));
       %let dsidc=%sysfunc(close(&dsid));
       %if &nobs=0 %then %do;
          %put ERROR: &dsn contains 0 Observations.;
          %put ERROR- PROC PRINT will not execute.;
          %return;
       %end;
       proc print data=&dsn (obs=10) noobs;
          title "First 10 Observations";
          title2 "&dsn Contains &nobs Observations";
       run;
    %mend numobs;
m203d02

15 Obtaining Number of Observations
Partial SAS Log
    %numobs(orion.daily_sales)
    NOTE: There were 10 observations read from the data set ORION.DAILY_SALES.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds
    232 %numobs(orion.no_rows)
    ERROR: ORION.NO_ROWS contains 0 Observations.
           PROC PRINT will not execute.
m203d02

16 Obtaining Number of Observations
PROC PRINT Output
    First 10 Observations
    ORION.DAILY_SALES Contains 58 Observations
    Columns: Product_ID, Product_Name, Total_Retail_Price
    Product_Name values:
    Pro Fit Gel Gt 2030 Women's Running Shoes
    Big Guy Men's Air Terra Sebec Shoes
    Bretagne Performance Tg Men's Golf Shoes L.
    Armadillo Road Dmx Women's Running Shoes
    Hardcore Men's Street Shoes Large
    Bretagne Stabilites 2000 Goretex Shoes
    Big Guy Men's Air Deschutz Viii Shoes
    Big Guy Men's Air Terra Reach Shoes
    Lulu Men's Street Shoes
    Bretagne Stabilities Tg Men's Golf Shoes    $99.70
m203d02


18 Quiz
1. Open the program m203a01.
2. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff. What is the value of &SORTED?
    %let dsn=orion.staff;
    %let openrc=%sysfunc(open(&dsn));
    %let sorted= ;
    %let closerc=%sysfunc(close(&openrc));
    %put Data set &dsn is sorted by &sorted..;
m203a01

19 Quiz – Correct Answer
1. Open the program m203a01.
2. Add the syntax to create the macro variable SORTED that contains the SORTEDBY= attribute for the data set orion.staff. What is the value of &SORTED? Employee_ID
    %let dsn=orion.staff;
    %let openrc=%sysfunc(open(&dsn));
    %let sorted=%sysfunc(attrc(&openrc,sortedby));
    %let closerc=%sysfunc(close(&openrc));
    %put Data set &dsn is sorted by &sorted..;
m203a01

20 Task 3: Determine the Age of a SAS Data Set
The Orion Star programmers need a way to determine when to refresh a data set. They decided to use the CRDTE attribute to calculate the age of a data set.
    %macro age(dsn);
       %local dsid crdate dsidc days;
       %let dsid=%sysfunc(open(&dsn));
       %let crdate=%sysfunc(attrn(&dsid,crdte));
       %let dsidc=%sysfunc(close(&dsid));
       %let days=%sysevalf("&sysdate9"d -%sysfunc(datepart(&crdate)));
       %if &days > 0 %then %do;
          %put WARNING: &dsn is &days day(s) old. It is being recreated.;
          data &dsn;
             infile 'orders03.dat';
             input Order_ID Order_Type Order_Date : date9.;
             format Order_Date date9.;
          run;
       %end;
       %else %put NOTE: &dsn is current.;
    %mend age;
m203d03

21 Task 3: Determine the Age of a SAS Data Set
Partial SAS Log
    22 %age(orion.orders03)
    WARNING: orion.orders03 is 1 day(s) old. It is being recreated.
    NOTE: The infile 'orders03.dat' is:
          Filename=C:\workshop\orders03.dat,
          RECFM=V,LRECL=256,File Size (bytes)=2496,
          Last Modified=31Jan2008:18:09:56,
          Create Time=16Jun2008:17:09:05
    NOTE: 104 records were read from the infile 'orders03.dat'.
          The minimum record length was 22.
          The maximum record length was 22.
    NOTE: The data set ORION.ORDERS03 has 104 observations and 3 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.14 seconds
          cpu time 0.06 seconds
    23 %age(orion.orders03)
    NOTE: orion.orders03 is current.
m203d03

22 Task 4: Archive a SAS Data Set
Because many of Orion Star's macro applications refresh SAS data sets, the programmers want to archive the current data set before the data set is refreshed. They decided to concatenate today's date to the end of the data set name, using the RENAME and TODAY functions.
General form of the RENAME function:
    RENAME(old-name, new-name)
General form of the TODAY function:
    TODAY( )
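
A minimal sketch of the archive idea follows, assuming a throwaway data set WORK.DEMO that is created only for this illustration. %SYSFUNC formats the value of TODAY with DATE9. so that it can be appended to the name, and the return code of RENAME is written to the log.

```sas
/* Hypothetical data set WORK.DEMO, created only for this illustration. */
data work.demo;
   x=1;
run;

/* Build a suffix such as 07OCT2008 from the current date. */
%let stamp=%sysfunc(today(),date9.);

/* The new name is a one-level name; the member stays in the same library. */
%let rc=%sysfunc(rename(work.demo,demo_&stamp));

%put NOTE: RENAME returned &rc (0 indicates success).;
```

If the statements are submitted twice on the same day, the second RENAME is expected to return a nonzero value because the target name already exists.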

23 Task 4: Archive a SAS Data Set
    %let newname=daily_sales_%sysfunc(today(), date9.);
    %let rc=%sysfunc(rename(orion.daily_sales, &newname));
    proc contents data=orion._all_ nods;
    run;
m203d04
Partial PROC CONTENTS Output
    #  Name                   Member Type  File Size  Last Modified
    1  COUNTRY                DATA                    Jul08:23:11:48
       COUNTRY                INDEX                   Jul08:23:11:48
    2  CUSTOMER               DATA                    Jul08:22:28:42
    3  CUSTOMER_DIM           DATA                    Dec07:09:05:44
    4  CUSTOMER_TYPE          DATA                    Jul08:01:29:54
       CUSTOMER_TYPE          INDEX                   Jul08:01:29:54
    5  DAILY_SALES_07OCT2008  DATA                    Aug08:14:18:18
    6  ORDER_FACT             DATA                    Jul08:19:45:26
    7  SALES                  DATA                    Jul08:21:40:55

24 Task 5: Create Data Sets from Worksheets
The Orion Star programmers need a macro to import every worksheet in a given Excel workbook.
[Diagram: %READXLS reads the worksheets Australia$ and UnitedStates$ from Sales.xls and creates the data sets Australia and UnitedStates.]

25 Task 5: Create Data Sets from Worksheets
The programmers will use SAS session metadata that is available via PROC SQL DICTIONARY tables or Sashelp views. The metadata includes information on the following:
SAS files
external files
macro variables
system options, titles, and footnotes
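
The two access routes can be sketched as follows. The SASHELP library is queried only because it exists in every session; that choice and the output data set name WORK.TABLES_VIA_VIEW are assumptions for illustration. DICTIONARY tables are available only inside PROC SQL, whereas the Sashelp views can be read by any step. Library names are stored in uppercase.

```sas
/* Route 1: query the DICTIONARY table directly in PROC SQL. */
proc sql;
   select memname, nobs, crdate format=datetime20.
      from dictionary.tables
      where libname='SASHELP' and memtype='DATA';
quit;

/* Route 2: read the equivalent Sashelp view in a DATA step. */
/* WORK.TABLES_VIA_VIEW is a hypothetical output name.       */
data work.tables_via_view;
   set sashelp.vtable(keep=libname memname memtype nobs);
   where libname='SASHELP' and memtype='DATA';
run;
```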

26 Task 5: Create Data Sets from Worksheets
The macro will incorporate these elements:
SAS/ACCESS LIBNAME statement
sashelp.vtable
an iterative %DO loop
indirect macro variable references

27 Reading Excel Files Using the LIBNAME Statement
The SAS/ACCESS LIBNAME statement extends the LIBNAME statement to support assigning a library reference name (libref) to Microsoft Excel workbooks. This enables you to reference worksheets directly in a DATA step or SAS procedure. Each worksheet in the Excel workbook is treated as though it were a SAS data set.
    libname xlsdata 's:\workshop\c3\sales.xls';
    proc contents data=xlsdata._all_;
    run;
m203d05

28 Reading Excel Files Using the LIBNAME Statement
Partial PROC CONTENTS Output
    The CONTENTS Procedure
    Directory
       Libref         XLSDATA
       Engine         EXCEL
       Physical Name  sales.xls
       User           Admin

    #  Name           Member Type  DBMS Member Type
    1  Australia$     DATA         TABLE
    2  UnitedStates$  DATA         TABLE

29 Reading Excel Files Using the LIBNAME Statement
All worksheets are referenced with a SAS two-level name, that is, libref.data-set-name. If the worksheet name contains special characters, you must use the SAS name literal construct of "name"n.
    data australia;
       set xlsdata.'Australia$'n;
    run;
m203d05

30 Using SAS Session Metadata
Use sashelp.vtable to create a series of macro variables that contain the member names.
    data _null_;
       set sashelp.vtable end=last;
       where libname="XLSDATA";
       call symputx(cats('sheet', _n_), memname);
       if last then call symputx('n', _n_);
    run;
Partial SAS Log
    473 %put _user_;
    GLOBAL SHEET1 Australia$
    GLOBAL SHEET2 UnitedStates$
    GLOBAL N 2
m203d05


32 Quiz
Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.
    data _null_;
       set sashelp.vtable;
       where libname='ORION';
       call symputx(cats('dsn', _N_), ??????????);
    run;
    %put _user_;
m203a02

33 Quiz – Correct Answer
Open the program m203a02 and replace the question marks in the SYMPUTX routine so that it creates macro variables containing the names of all of the data sets in the ORION library.
    data _null_;
       set sashelp.vtable end=last;
       where libname='ORION';
       call symputx(cats('dsn', _N_), memname);
    run;
m203a02

34 Iterative %DO Loops (Review)
The iterative %DO statement executes a section of a macro repetitively, based on the value of an index variable.
General form of the iterative %DO statement:
    %DO index-variable=start %TO stop ;
       text
    %END;
Example:
    %macro putloop;
       %do i=1 %to &n;
          %put Sheet&i is &&sheet&i;
       %end;
    %mend putloop;
m203d05

35 Indirect Macro Variable References (Review)
The indirect reference causes a second scan of the macro variable reference.
    reference:  &&sheet&i
    1st scan:   &sheet1
    2nd scan:   Australia$
Partial Symbol Table
    Variable  Value
    I         1
    SHEET1    Australia$
    SHEET2    UnitedStates$
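
One way to watch the two scans happen is the SYMBOLGEN system option, which writes each resolution step to the log. The sketch below assigns the macro variables by hand instead of reading them from sashelp.vtable, so the values are assumptions copied from the symbol table shown on the slide.

```sas
/* Values assigned by hand for illustration; normally CALL SYMPUTX creates them. */
%let sheet1=Australia$;
%let sheet2=UnitedStates$;
%let i=2;

options symbolgen;            /* write each resolution step to the log */
%put Worksheet &i is &&sheet&i;
options nosymbolgen;          /* switch the extra log output off again */
```

With SYMBOLGEN active, the log reports that && resolves to & before the second scan resolves the rebuilt reference.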


37 Quiz
How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following %DO loop?
    %do i=1 %to &n;
       %put The values of the macro variables are __________ ;
    %end;

38 Quiz – Correct Answer
How would you use indirect references to refer to the macro variables created in m203a02 so that you can use them in the following %DO loop?
    %do i=1 %to &n;
       %put The values of the macro variables are &&dsn&i ;
    %end;

39 Processing a Data Library
Use a %DO loop to generate a DATA step and a PROC PRINT step for every worksheet in an Excel workbook.
    %macro readxls(workbook);
       libname xlsdata "&workbook";
       data _null_;
          set sashelp.vtable end=last;
          where libname="XLSDATA";
          call symputx(cats('sheet', _n_), memname,'L');
          if last then call symputx('n',_n_,'L');
       run;
       %do i=1 %to &n;
          %let len=%eval(%length(&&sheet&i)-1);
          %let dsn=%substr(&&sheet&i,1,&len);
          data work.&dsn;
             set xlsdata."&&sheet&i"n;
          run;
          proc print data=work.&dsn;
          run;
       %end;
       libname xlsdata clear;
    %mend readxls;
    %readxls(sales.xls)
m203d06

40 Processing a Data Library
In the READXLS macro (m203d06), the %LENGTH function returns the number of characters in &&SHEET&I.
    %let len=%eval(%length(&&sheet&i)-1);

41 Processing a Data Library
The %EVAL function subtracts 1 from that length to create the macro variable LEN, which is the length of the worksheet name without the trailing $.
    %let len=%eval(%length(&&sheet&i)-1);

42 Processing a Data Library
The %SUBSTR function extracts the first &LEN characters of the worksheet name, starting at position 1, and the result is stored in the macro variable DSN.
    %let dsn=%substr(&&sheet&i,1,&len);
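
The %LENGTH, %EVAL, and %SUBSTR steps can be traced in open code on a single value. This is a small worked example, assuming the worksheet name Australia$ from the earlier log; the macro variable SHEET is introduced here only for illustration.

```sas
%let sheet=Australia$;                 /* sample worksheet name with its trailing $ */
%let len=%eval(%length(&sheet)-1);     /* 10 characters minus 1 for the $ */
%let dsn=%substr(&sheet,1,&len);       /* keep positions 1 through &len */
%put len=&len dsn=&dsn;                /* writes len=9 dsn=Australia */
```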

43 Processing a Data Library
Partial SAS Log
    %readxls(sales.xls)
    NOTE: There were 63 observations read from the data set XLSDATA.'Australia$'n.
    NOTE: The data set WORK.AUSTRALIA has 63 observations and 9 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds
    NOTE: There were 63 observations read from the data set WORK.AUSTRALIA.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds
    NOTE: There were 102 observations read from the data set XLSDATA.'UnitedStates$'n.
    NOTE: The data set WORK.UNITEDSTATES has 102 observations and 9 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds
    NOTE: There were 102 observations read from the data set WORK.UNITEDSTATES.
    NOTE: PROCEDURE PRINT used (Total process time):
          real time 0.01 seconds
          cpu time 0.01 seconds
m203d06

44 Exercise
This exercise reinforces the concepts discussed previously.

45 Chapter 3: Reading and Processing Data 3.1 Processing SAS Data Sets 3.2 Processing External Files

46 Objectives
Use external file functions to examine files that are not SAS files.

47 Processing External Files
The Orion Star programmers want to reduce redundant code when reading multiple external files into SAS data sets. The applications should be able to process the files in a given directory and subdirectory in order to accomplish the following tasks:
1. Process all DAT files.
2. Import all CSV files.
3. Read every worksheet in all of the Excel workbooks.
They decided to use the external file functions to accomplish these three tasks.

48 External File Functions
Functions to access a directory: DOPEN, DNUM, DREAD, DCLOSE
Functions to access an external file: FILEEXIST and FEXIST, FILENAME, FOPEN, FCLOSE
Functions to read from or write to an external file: FREAD, FGET, FPUT and FWRITE
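
FILEEXIST and FEXIST answer the same question but take different arguments: FILEEXIST expects a physical path, and FEXIST expects a fileref. The sketch below shows both, assuming the macro name CHECKFILE (hypothetical) and the path that appears in the earlier log for orders03.dat. Leaving the macro variable FREF empty lets the FILENAME function generate a fileref and store it there.

```sas
/* Hypothetical macro name; adjust the path to a file on your system. */
%macro checkfile(path);
   %local fref rc;

   /* FILEEXIST takes a physical path. */
   %if %sysfunc(fileexist(&path))=0 %then %do;
      %put ERROR: &path does not exist.;
      %return;
   %end;

   /* FEXIST takes a fileref, so assign one first. */
   %let rc=%sysfunc(filename(fref,&path));
   %put NOTE: FEXIST returned %sysfunc(fexist(&fref)) for fileref &fref..;

   /* Calling FILENAME without a path clears the fileref. */
   %let rc=%sysfunc(filename(fref));
%mend checkfile;

%checkfile(c:\workshop\orders03.dat)
```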

49 Processing External Files
The DOPEN, DNUM, and DREAD functions enable access to all the external files found in a given directory. Use these steps for processing files from a directory:
1. Use the FILENAME function to assign a fileref to the directory.
2. Use the DOPEN function to open the directory.
3. Use the DNUM function to identify how many members are in the directory.
4. Use the DREAD function to extract each member name.
5. Process the external files.
6. Use the DCLOSE function to close the directory.
%SYSFUNC is required to execute these functions within the macro facility.

50 Steps 1 and 2: Access a Directory
For applications to extract information about a directory and its contents, it is necessary to first open the directory using the DOPEN function. If it is successful, the function returns a directory identifier.
    %macro direxist(dir);
       %local fileref rc did didc;
       %let rc=%sysfunc(filename(fileref,&dir));
       %let did=%sysfunc(dopen(&fileref));
       %if &did=0 %then %do;
          %put ERROR: Directory does not exist;
          %return;
       %end;
       %put NOTE: Directory ID is &did ;
       %let didc=%sysfunc(dclose(&did));
       %let rc=%sysfunc(filename(fileref));
    %mend direxist;
    %direxist(s:\workshop)
m203d07

51 Steps 1 and 2: Access a Directory
Partial SAS Log
    %direxist(s:\workshop)
    NOTE: Directory ID is
    %direxist(s:\bad directory)
    ERROR: Directory does not exist
m203d07

52 Steps 3 and 4: Identify Members in a Directory
To extract a list of member names, use the DNUM and DREAD functions.
    %macro dirlist(dir);
       %local fileref rc did dnum dmem memname didc;
       %let rc=%sysfunc(filename(fileref,&dir));
       %let did=%sysfunc(dopen(&fileref));
       %if &did=0 %then %do;
          %put ERROR: Directory does not exist;
          %return;
       %end;
       %let dnum=%sysfunc(dnum(&did));
       %do dmem=1 %to &dnum;
          %let memname=%sysfunc(dread(&did,&dmem));
          %put &memname;
       %end;
       %let didc=%sysfunc(dclose(&did));
       %let rc=%sysfunc(filename(fileref));
    %mend dirlist;
    %dirlist(s:\workshop)
m203d08


54 Quiz
Open the program m203d08, submit it, and investigate the log.
1. Are the extensions of the raw data files in uppercase or lowercase?
2. Are the extensions of the Excel workbooks in uppercase or lowercase?

55 Quiz – Correct Answer
Open the program m203d08, submit it, and investigate the log.
1. Are the extensions of the raw data files in uppercase or lowercase? The extension is lowercase (dat).
2. Are the extensions of the Excel workbooks in uppercase or lowercase? The extension is lowercase (xls).

56 Steps 3 and 4: Identify Members in a Directory
Partial SAS Log
    %dirlist(s:\workshop)
    age.sas
    attrc.sas
    attrn.sas
    between.sas
    C2
    C3
    C4
    C5
    charlist.sas
    club_members.sas7bdat
    country.sas7bdat
    country_lookup.sas7bdat
    customer.sas7bdat
    customer_dim.sas7bdat
    customer_type.sas7bdat
    daily_sales.sas7bdat
    daily_sales.xls
    delsql.sas
    delvars.sas
m203d

57 Task 1: Reading All DAT Files in a Directory
This demonstration illustrates reading each raw data file in a directory into a SAS data set.
m203d09
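
The demonstration program m203d09 is not reproduced in the transcript. The following is only a sketch of how the directory steps could be combined for this task, not the course solution. It assumes a hypothetical macro name READDAT, assumes that every DAT file shares the three-field layout shown earlier for orders03.dat, assumes that each file name is a valid SAS data set name, and uses the Windows path separator that appears elsewhere in the slides.

```sas
/* Sketch only: READDAT is a hypothetical name, not the course program. */
%macro readdat(dir);
   %local fileref rc did dnum dmem memname dsn didc;
   %let rc=%sysfunc(filename(fileref,&dir));
   %let did=%sysfunc(dopen(&fileref));
   %if &did=0 %then %do;
      %put ERROR: Directory does not exist;
      %return;
   %end;
   %let dnum=%sysfunc(dnum(&did));
   %do dmem=1 %to &dnum;
      %let memname=%sysfunc(dread(&did,&dmem));
      /* Compare the extension in uppercase so dat and DAT both match. */
      %if %upcase(%scan(&memname,-1,.))=DAT %then %do;
         /* Assumption: the part before the period is a valid SAS name. */
         %let dsn=%scan(&memname,1,.);
         data work.&dsn;
            infile "&dir\&memname";
            /* Assumption: same layout as orders03.dat. */
            input Order_ID Order_Type Order_Date : date9.;
            format Order_Date date9.;
         run;
      %end;
   %end;
   %let didc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(fileref));
%mend readdat;

%readdat(s:\workshop)
```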

58 Task 2: Read All Excel Workbooks
The Orion Star programmers want a single macro to import all Excel files found in a given directory.
[Diagram: %READXLS with inputs order_type.xls, customertype.xls, daily_sales.xls, OrderFact.xls, and Sales.xls, and outputs order_type, customertype, sales, orderfact, and daily_sales.]

59 Task 2: Read All Excel Workbooks
Currently the READXLS macro accepts a single workbook name as a parameter. The programmers want to enhance the macro to read all workbooks in a directory.
Partial SAS Code
    %macro readxls(dir);
       %local fileref rc did dnum dmem memname len dsn didc;
       %let rc=%sysfunc(filename(fileref,&dir));
       %let did=%sysfunc(dopen(&fileref));
       %if &did=0 %then %do;
          %put ERROR: Directory does not exist;
          %return;
       %end;
       %let dnum=%sysfunc(dnum(&did));
       %do dmem=1 %to &dnum;
          %let memname=%sysfunc(dread(&did,&dmem));
          %if %upcase(%scan(&memname,-1,.))=XLS %then %do;
m203d10

60 Task 2: Reading All Excel Files in a Directory
This demonstration illustrates reading all of the worksheets in all workbooks in a directory into SAS data sets.
m203d10

61 Task 3: Read All Excel Files in Subdirectories
To implement subdirectory recursion, use the %SCAN function to extract the second word of the member name, where the period is the delimiter.
If the second word is XLS, then read the Excel spreadsheet.
If the second word resolves to null, there is no extension, so the first word identifies a subdirectory. Therefore, call the macro again.
Partial SAS Code
    %else %if %scan(&memname,2,.)= %then %readxls(&dir\&memname);
m203d11
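
The behavior of %SCAN that this test relies on can be checked in open code. The member names below are taken from the directory listing and diagrams earlier in the section; the square brackets are added only to make an empty result visible in the log.

```sas
%put [%scan(Sales.xls,2,.)];        /* second word is xls */
%put [%scan(C3,2,.)];               /* no second word, so the brackets are empty */
%put [%scan(orders03.dat,-1,.)];    /* -1 returns the last word: dat */
%put [%scan(C3,-1,.)];              /* with no period, the last word is the whole name: C3 */
```

The test treats any member without a period as a subdirectory, so a file that has no extension, or a directory whose name contains a period, would be misclassified. That limitation is an observation about the technique, not something the slides address.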

62 Task 3: Reading All Excel Files in Subdirectories
This demonstration illustrates reading all of the worksheets in all workbooks in a directory and its subdirectories into SAS data sets.
m203d11

63 Exercise
This exercise reinforces the concepts discussed previously.