ISQS 6347, Data & Text Mining
ISQS 6339, Data Management & Business Intelligence

Data Preparation for Analytics Using SAS
Zhangxi Lin
Texas Tech University

Outline
- An overview of data preparation for analytics
- SAS Programming Essentials
    - Running SAS programs
    - Mastering fundamental concepts
    - SAS program debugging
- Make use of SAS Enterprise Guide for programming

Structure and Components of Business Intelligence

Overview: From Data Warehousing to Data Analysis
- Previous major topics in data warehousing (using SQL Server 2008)
    - Dimensional model design
    - ETL
    - Cube design and OLAP
- Data analysis topics (using SAS)
    - Data preparation
    - Analytic business questions
    - Data format and data conversion
    - Data cleansing
    - Data exploration
    - Data analysis
    - Data visualization

Components of the SAS System
- Base SAS
- Reporting and Graphics
- Data Access and Management
- User Interface
- Analytical
- Application Development
- Visualization and Discovery
- Business Solutions
- Web Enablement

SAS Programming Essentials
Find more information from

Data-driven Tasks
The functionality of the SAS System is built around four data-driven tasks common to virtually any application:
- Data access
- Data management
- Data analysis
- Data presentation

Turning Data into Information
The process of delivering meaningful information:
- 80% data-related
    - Access
    - Scrub
    - Transform
    - Manage
    - Store and retrieve
- 20% analysis

Turning Data into Information
Data -> DATA Step -> SAS Data Sets -> PROC Steps -> Information

Design of the SAS System: MultiVendor Architecture
- Platforms: PC, Workstation, Servers/Midrange, Mainframe, Super Computer
- 90% independent, 10% dependent

Design of the SAS System: MultiEngine Architecture
DATA: Teradata, SYBASE, Microsoft Excel, ORACLE, dBase, SAP, DB2

SAS Programming – Level I
- Fundamentals (ch1-3)
- Producing list reports (ch4)
- Enhancing output (ch5)
- Creating data sets (ch6)
- Data step programming (ch7)
    - Reading data
    - Creating variables
    - Conditional processing
    - Keeping and dropping variables
    - Reading Excel files
- Combining SAS data sets (ch8)
- Producing summary reports (ch9)
- SAS graphing (ch10)

Course Scenario
In this course, you work with business data from International Airlines (IA). The various kinds of data that IA maintains are listed below:
- flight data
- passenger data
- cargo data
- employee data
- revenue data

Course Scenario
The following are some tasks that you will perform:
- importing data
- creating a list of employees
- producing a frequency table of job codes
- summarizing data
- creating a report of salary information

SAS Programs
A SAS program is a sequence of steps that the user submits for execution.
- DATA steps are typically used to create SAS data sets.
- PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).
Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Report

SAS Programs
DATA step:
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;
PROC steps:
    proc print data=work.staff;
    run;

    proc means data=work.staff;
       class JobTitle;
       var Salary;
    run;

Step Boundaries
SAS steps begin with either of the following:
- DATA statement
- PROC statement
SAS detects the end of a step when it encounters one of the following:
- a RUN statement (for most steps)
- a QUIT statement (for some procedures)
- the beginning of another step (DATA statement or PROC statement)

Step Boundaries
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff;

    proc means data=work.staff;
       class JobTitle;
       var Salary;
    run;

Running a SAS Program
You can invoke SAS in the following ways:
- interactive windowing mode (SAS windowing environment)
- interactive menu-driven mode (SAS Enterprise Guide, SAS/ASSIST, SAS/AF, or SAS/EIS software)
- batch mode
- noninteractive mode

Preparation of SAS Programming
- Data sets: \SAS-Programming
- Create a user-defined library reference
    Statement: LIBNAME libref 'SAS-data-library';
    Example:   LIBNAME ia 'c:\workshop\winsas\prog1';
- Two-level SAS file names: libref.filename

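A minimal sketch of how a libref is typically used once it is assigned, assuming the example folder from this slide exists on your machine. The `_all_` keyword and the `nods` option of PROC CONTENTS list the SAS files in the library without printing each descriptor.

```sas
/* Assign the libref ia to the course folder (path taken from the slide) */
libname ia 'c:\workshop\winsas\prog1';

/* List every SAS file stored in the ia library */
proc contents data=ia._all_ nods;
run;
```

Any data set listed there can then be referenced by its two-level name, libref.filename. A one-level name such as staff is treated as work.staff, the temporary Work library.
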
SAS Programming Essentials
Demo: c02s2d1
Exercise: c02ex1

Browsing the Descriptor Portion
General form of the CONTENTS procedure:
    PROC CONTENTS DATA=SAS-data-set;
    RUN;
Example:
    proc contents data=work.staff;
    run;
c02s3d1

SAS Data Sets: Data Portion
The data portion of a SAS data set is a rectangular table of character and/or numeric data values. Variable names are part of the descriptor portion, not the data portion.

    LastName   FirstName  JobTitle   Salary      <- variable names
    TORRES     JAN        Pilot                  <- variable values
    LANGKAMM   SARAH      Mechanic
    SMITH      MICHAEL    Mechanic
    WAGSCHAL   NADJA      Pilot
    TOERMOEN   JOCHEN     Pilot

LastName, FirstName, and JobTitle hold character values; Salary holds numeric values.

SAS Variable Values
There are two types of variables:
- character: contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes. One byte equals one character.
- numeric: stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits.

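The two variable types can be seen side by side in a small sketch. The data set name work.types_demo and the variables Code and Amount are hypothetical, chosen only for illustration.

```sas
data work.types_demo;
   length Code $ 5;          /* character variable stored in 5 bytes */
   Code = 'AB-12';           /* letters, a special character, and numerals */
   Amount = 1234567890.12;   /* numeric variable, 8 bytes by default */
run;

/* The descriptor portion reports each variable's type and length */
proc contents data=work.types_demo;
run;
```

In the PROC CONTENTS output, Code should appear as Char with length 5 and Amount as Num with length 8, even though Amount holds more than 8 digits.
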
SAS Data Set and Variable Names
SAS names have these characteristics:
- can be up to 32 characters long
- can be uppercase, lowercase, or mixed-case
- are not case sensitive
- must start with a letter or underscore; subsequent characters can be letters, underscores, or numerals

Valid SAS Names
Select the valid default SAS names.
- data5mon: valid
- 5monthsdata: invalid (starts with a numeral)
- data#5: invalid (contains the special character #)
- five months data: invalid (contains blanks)
- fivemonthsdata: valid
- FiveMonthsData: valid

Missing Data Values
A value must exist for every variable for each observation. Missing values are valid values.
- A numeric missing value is displayed as a period.
- A character missing value is displayed as a blank.

    LastName   FirstName  JobTitle   Salary
    TORRES     JAN        Pilot
    LANGKAMM   SARAH      Mechanic
    SMITH      MICHAEL    Mechanic   .
    WAGSCHAL   NADJA      Pilot
    TOERMOEN   JOCHEN

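A short sketch of how missing values arise and how they are displayed. The data set name work.missing_demo and the salary figure are made up for illustration; only the names come from the slide. With list input, a lone period in the raw data stands for a missing value in either a numeric or a character field.

```sas
data work.missing_demo;
   input LastName $ FirstName $ JobTitle $ Salary;
   datalines;
SMITH MICHAEL Mechanic .
TOERMOEN JOCHEN . 65000
;
run;

/* Salary prints as a period for SMITH; JobTitle prints as a blank for TOERMOEN */
proc print data=work.missing_demo;
run;
```
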
Browsing the Data Portion
The PRINT procedure displays the data portion of a SAS data set. By default, PROC PRINT displays the following:
- all observations
- all variables
- an Obs column on the left side

Browsing the Data Portion
General form of the PRINT procedure:
    PROC PRINT DATA=SAS-data-set;
    RUN;
Example:
    proc print data=work.staff;
    run;
c02s3d1

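The PROC PRINT defaults can be overridden. The sketch below assumes work.staff has already been created by the DATA step shown earlier in the deck; it uses the NOOBS option to suppress the Obs column and a VAR statement to choose which variables appear and in what order.

```sas
proc print data=work.staff noobs;   /* NOOBS removes the Obs column */
   var LastName JobTitle Salary;    /* print only these variables, in this order */
run;
```
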
SAS Data Set Terminology
SAS documentation and text in the SAS windowing environment use the following terms interchangeably:

    SAS Data Set   SAS Table
    Variable       Column
    Observation    Row

SAS Syntax Rules
SAS statements have these characteristics:
- usually begin with an identifying keyword
- always end with a semicolon

    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff;
    run;

    proc means data=work.staff;
       class JobTitle;
       var Salary;
    run;

SAS Syntax Rules
SAS statements are free-format.
- One or more blanks or special characters can be used to separate words.
- They can begin and end in any column.
- A single statement can span multiple lines.
- Several statements can be on the same line.
Unconventional Spacing:
    data work.staff;
     infile 'raw-data-file';
    input LastName $ 1-20 FirstName $
    JobTitle $ Salary 54-59;
    run;
       proc means data=work.staff;
     class JobTitle; var Salary;run;

SAS Syntax Rules
Good spacing makes the program easier to read.
Conventional Spacing:
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff;
    run;

    proc means data=work.staff;
       class JobTitle;
       var Salary;
    run;

SAS Comments
- Type /* to begin a comment.
- Type your comment text.
- Type */ to end the comment.

    /* Create work.staff data set */
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    /* Produce listing report of work.staff */
    proc print data=work.staff;
    run;
c02s3d2

Syntax Errors
Syntax errors include the following:
- misspelled keywords
- missing or invalid punctuation
- invalid options

    daat work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff
    run;

    proc means data=work.staff average max;
       class JobTitle;
       var Salary;
    run;

Debugging a SAS Program
This demonstration illustrates how to submit a SAS program that contains errors, diagnose the errors, correct the errors, and save the corrected program.
c02s4d1.sas   userid.prog1.sascode(c02s4d1)
c02s4d2.sas   userid.prog1.sascode(c02s4d2)

Recall a Submitted Program
Program statements accumulate in a recall buffer each time you issue a SUBMIT command.
Submit Number 1:
    daat work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff
    run;

    proc means data=work.staff average max;
       class JobTitle;
       var Salary;
    run;
Submit Number 2:
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff;
    run;

    proc means data=work.staff mean max;
       class JobTitle;
       var Salary;
    run;

Recall a Submitted Program
Issue the RECALL command once to recall the most recently submitted program.
Issue RECALL once: Submit Number 2 statements are recalled.
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff;
    run;

    proc means data=work.staff mean max;
       class JobTitle;
       var Salary;
    run;

Recall a Submitted Program
Issue the RECALL command again to recall Submit Number 1 statements.
Issue RECALL again:
Submit Number 1:
    daat work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff
    run;

    proc means data=work.staff average max;
       class JobTitle;
       var Salary;
    run;
Submit Number 2:
    data work.staff;
       infile 'raw-data-file';
       input LastName $ 1-20 FirstName $
             JobTitle $ Salary 54-59;
    run;

    proc print data=work.staff;
    run;

    proc means data=work.staff mean max;
       class JobTitle;
       var Salary;
    run;

Exercise 8: Basic SAS Programming
- Define libraries IA and Out.
- Go through all SAS programs in Chapters 2-5.
- Write a SAS program to read a dataset you created yourself, or simply use Person0.txt in \\TechShare\coba\d\ISQS3358\OtherDatasets\. Output the dataset to your library Out.
- Try to apply whatever SAS features from Chapter 5 of Prog-I to generate a nice-looking report.
- Go through all exercises for Ch 2, 3, 4, 5, 6 (answer keys are available, so there is no need to submit the results).

Hands-on exercise
Write a SAS program to calculate the number of days elapsed in 2012 up to 3/3/2012.
The input is in the format date9.:
    01JAN2012 03MAR2012
Answer: 62 days

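One possible solution sketch, assuming the two input dates are 01JAN2012 and 03MAR2012. The variable names are illustrative only. SAS stores a date as a count of days, so subtracting two dates gives the number of days between them.

```sas
data _null_;
   start_date = '01JAN2012'd;                /* date literal */
   end_date   = input('03MAR2012', date9.);  /* read text with the date9. informat */
   days_passed = end_date - start_date;      /* difference in days */
   put days_passed=;                         /* writes days_passed=62 to the log */
run;
```

The result is 62 because 2012 is a leap year: 31 days in January, 29 in February, plus 2 days in March.
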
Making Use of SAS Enterprise Guide Code
- Import a text file. Example: Orders.txt
- Import an Excel file. Example: SupplyInfo.xls

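For comparison with the code that Enterprise Guide generates, a text file can also be read with a hand-written PROC IMPORT step. This is only a sketch under stated assumptions: the location of Orders.txt, the comma delimiter, the header row, and the output name work.orders are all assumed here and should be adjusted to the actual file.

```sas
proc import datafile='Orders.txt'   /* assumed location: supply the full path */
            out=work.orders         /* hypothetical output data set name */
            dbms=dlm                /* delimited text file */
            replace;
   delimiter=',';                   /* assumed delimiter */
   getnames=yes;                    /* assumed: first row holds column names */
run;

proc print data=work.orders;
run;
```
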
Learn from Examples
SAS Help Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS
“Base Usage Guide Examples”: Chapters 3, 4

Import an Excel Sheet
    proc import out=work.commrex
                datafile="C:\Lin\Shared\ISQS6339\Commrex_3358.xls"
                dbms=excel replace;
       sheet="Company";
       getnames=yes;
       mixed=no;
       scantext=yes;
       usedate=yes;
       scantime=yes;
    run;

    proc print data=work.commrex;
    run;

Excel SAS/ACCESS LIBNAME Engine
    libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_3358.xls';

    proc print data=xlsdata.New1;
    run;

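Two practical details when using the Excel LIBNAME engine, sketched under the assumption that SAS/ACCESS Interface to PC Files is installed and that the workbook contains the sheet named Company used in the PROC IMPORT example. Worksheets appear in the library with a trailing $, which is not a valid character in a SAS name, so they are referenced with a name literal. The workbook generally stays open to SAS until the libref is cleared.

```sas
libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_3358.xls';

/* A worksheet name ends in $, so write it as a name literal */
proc print data=xlsdata.'Company$'n;
run;

/* Release the workbook when finished */
libname xlsdata clear;
```

A name without the $, such as New1 above, typically refers to a named range in the workbook rather than a whole worksheet.
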
EG EX5: SAS Data Step Programming