ISQS 6347, Data & Text Mining / ISQS 6339, Data Management & Business Intelligence
Data Preparation for Analytics Using SAS
Zhangxi Lin, Texas Tech University

Outline
- An overview of data preparation for analytics
- SAS programming essentials
  - Running SAS programs
  - Mastering fundamental concepts
  - Debugging SAS programs
- Making use of SAS Enterprise Guide for programming

Structure and Components of Business Intelligence

Overview: From Data Warehousing to Data Analysis
Previous major topics in data warehousing (using SQL Server 2008):
- Dimensional model design
- ETL
- Cube design and OLAP
Data analysis topics (using SAS):
- Data preparation
  - Analytic business questions
  - Data format and data conversion
- Data cleansing
- Data exploration
- Data analysis
- Data visualization

US Car Theft
The number of U.S. motor vehicle thefts decreased by 1.9 percent from 2003 to 2004, the first decrease since […]. In 2004, the value of stolen motor vehicles was $7.6 billion, down from $8.6 billion in […]. The average value of a motor vehicle reported stolen in 2004 was $6,143, compared with $6,797 in 2003.

Theft Statistics
- Every 26 seconds, a motor vehicle is stolen in the United States.
- The odds of a vehicle being stolen were 1 in 190 in […]. The odds are highest in urban areas.
- U.S. motor vehicle thefts fell 1.9 percent in 2004 from 2003, according to the FBI's Uniform Crime Reports. In 2004, 1,237,114 motor vehicles were reported stolen.
- The West was the only region with an increase in motor vehicle thefts from 2003 to 2004, up 3.2 percent. Thefts fell 9.7 percent in the Northeast, 4.4 percent in the Midwest, and 2.9 percent in the South.
- Nationwide, the 2004 motor vehicle theft rate was 421.3 per 100,000 people, down 2.9 percent from […].
- Only 13.0 percent of thefts were cleared by arrests in […].
- Carjackings occur most frequently in urban areas. They account for only 3.0 percent of all motor vehicle thefts.
- The average comprehensive insurance premium in the U.S. rose 11.2 percent from 1999 to 2003.
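The headline figures on this slide can be cross-checked with a small DATA step. This sketch uses only numbers from the slide, and the variable names are illustrative. Because 2004 was a leap year, it counts 366 days. It should report roughly 25.6 seconds between thefts, which is close to the quoted 26, and an implied population of about 294 million.

```sas
/* Cross-check the slide's 2004 figures; 2004 is a leap year (366 days). */
data work.theft_check;
   Thefts2004        = 1237114;                  /* vehicles reported stolen in 2004 (slide) */
   RatePer100k       = 421.3;                    /* thefts per 100,000 people in 2004 (slide) */
   SecondsIn2004     = 366 * 24 * 60 * 60;
   SecondsPerTheft   = SecondsIn2004 / Thefts2004;            /* average time between thefts */
   ImpliedPopulation = Thefts2004 / RatePer100k * 100000;     /* population implied by the rate */
   format SecondsPerTheft 6.2 ImpliedPopulation comma15.;
run;

proc print data=work.theft_check noobs;
   var SecondsPerTheft ImpliedPopulation;
run;
```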

Business Question
If used Honda Accords rank first in the number of auto thefts, should the insurance premium for a Honda Accord be higher than for other car brands? Should the insurance for a used Honda be higher than for a brand-new Honda? Why?

Analytic Business Questions
- How do factors such as time, branch, promotion, and price influence the sales of a soft drink?
- Which customers have a high risk of cancellation in the next month?
- How can customers be segmented based on their purchase behavior?
- Statistics showed that an online recommendation system may increase sales by 20%, and the accuracy rate of the system is 40%. A newer algorithm can increase the accuracy rate to 50%. Should we then expect the sales increase to rise to 20% * 125% = 25%?
- Airlines are considering overbooking seats because a certain percentage of customers cancel their flights at the last minute. If the average cancellation rate is 10%, should the overbooking rate also be 10%? If a cancellation is charged 5% of the fare, how large should the penalty be when an overbooked flight is sold out?
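The average cancellation rate alone does not answer the overbooking question, and a short calculation shows why. The sketch treats each ticket holder as showing up independently with probability 0.9, which matches a 10% cancellation rate. It then computes the probability that more passengers arrive than there are seats. The 100-seat capacity, the range of tickets sold, and the independence assumption are all hypothetical. The sketch relies on the SAS CDF function for the binomial distribution.

```sas
/* Hypothetical flight with 100 seats; try selling 100 to 115 tickets. */
data work.overbooking;
   Seats    = 100;
   ShowProb = 0.90;                       /* 1 - average cancellation rate of 10% */
   do TicketsSold = 100 to 115;
      ExpectedShows = TicketsSold * ShowProb;
      /* CDF('BINOMIAL', m, p, n) = P(X <= m); the flight is oversold when X > Seats */
      ProbOversold = 1 - cdf('BINOMIAL', Seats, ShowProb, TicketsSold);
      output;
   end;
run;

proc print data=work.overbooking noobs;
   var TicketsSold ExpectedShows ProbOversold;
   format ProbOversold 6.4;
run;
```

At 110 tickets, the expected number of passengers (99) is still below the 100 seats, yet the probability of an oversold flight is far from zero. The analysis has to weigh that probability against the penalty raised in the last question.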

Analysis Process
- Select an analysis method.
- Identify the data source.
- Prepare the data (collecting, cleansing, reorganizing, extracting, transforming, loading).
- Execute the analysis.
- Interpret the analysis.
- Automate data preparation and execution of the analysis if the business question has to be answered more than once (ETL, stored procedures).
These steps can also be iterated; they are not necessarily performed in sequential order. We focus on the data preparation step.

Characteristics of Analytic Business Questions
- Analysis complexity: real analysis or reporting
- Analysis paradigm: statistics or data mining
- Data preparation paradigm: as much data as possible, or business knowledge first
- Analysis method: supervised or unsupervised analysis
- Scoring needed: yes/no
- Periodicity of analysis: one-shot or re-run
- Historic data needed: yes/no
- Data structure: one row or multiple rows per subject
- Complexity of the analysis team
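The data-structure characteristic (one row or multiple rows per subject) often determines how much restructuring the preparation step needs. This minimal sketch assumes a hypothetical customer-by-month purchase table. It uses PROC TRANSPOSE to turn the multiple-rows-per-subject layout into one row per customer. The data set and variable names are illustrative only.

```sas
/* Hypothetical long layout: several rows per customer (one per month). */
data work.purchases_long;
   input CustomerID Month $ Amount;
   datalines;
1 Jan 120
1 Feb 80
2 Jan 45
2 Mar 60
;
run;

/* PROC TRANSPOSE with a BY statement needs the data sorted by that variable. */
proc sort data=work.purchases_long;
   by CustomerID;
run;

/* Wide layout: one row per customer, one Amt_<month> column per month. */
proc transpose data=work.purchases_long
               out=work.purchases_wide(drop=_NAME_)
               prefix=Amt_;
   by CustomerID;
   id Month;
   var Amount;
run;

proc print data=work.purchases_wide noobs;
run;
```

Months a customer never purchased in become missing values in the wide table. Deciding how to treat those missing values is a data preparation decision in its own right.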

Components of the SAS System
- Reporting and Graphics
- Data Access and Management
- User Interface
- Analytical
- Base SAS
- Application Development
- Visualization and Discovery
- Business Solutions
- Web Enablement

SAS Programming Essentials
Find more information from […]

Data-driven Tasks
The functionality of the SAS System is built around four data-driven tasks common to virtually any application:
- Data access
- Data management
- Data analysis
- Data presentation

Turning Data into Information
Process of delivering meaningful information:
- 80% data-related
  - Access
  - Scrub
  - Transform
  - Manage
  - Store and retrieve
- 20% analysis

Turning Data into Information
Data -> DATA step -> SAS data sets -> PROC steps -> Information

MultiVendor Architecture Design of the SAS System
Platforms: PC; workstations, servers, and midrange systems; mainframes; supercomputers.
90% of the code is platform-independent; 10% is platform-dependent.

MultiEngine Architecture Design of the SAS System
Data sources: Teradata, SYBASE, Microsoft Excel, ORACLE, dBase, SAP, DB2

SAS Programming – Level I
- Fundamentals (Ch. 1-3)
- Producing list reports (Ch. 4)
- Enhancing output (Ch. 5)
- Creating data sets (Ch. 6)
- DATA step programming (Ch. 7)
  - Reading data
  - Creating variables
  - Conditional processing
  - Keeping and dropping variables
  - Reading Excel files
- Combining SAS data sets (Ch. 8)
- Producing summary reports (Ch. 9)
- SAS graphing (Ch. 10)

Course Scenario
In this course, you work with business data from International Airlines (IA). The various kinds of data that IA maintains are listed below:
- flight data
- passenger data
- cargo data
- employee data
- revenue data

Course Scenario
The following are some tasks that you will perform:
- importing data
- creating a list of employees
- producing a frequency table of job codes
- summarizing data
- creating a report of salary information

SAS Programs
DATA steps are typically used to create SAS data sets. PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data). A SAS program is a sequence of steps that the user submits for execution.
Raw data -> DATA step -> SAS data set -> PROC step -> Report

SAS Programs
   /* DATA step */
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   /* PROC steps */
   proc print data=work.staff;
   run;
   proc means data=work.staff;
      class JobTitle;
      var Salary;
   run;

Step Boundaries
SAS steps begin with either of the following:
- a DATA statement
- a PROC statement
SAS detects the end of a step when it encounters one of the following:
- a RUN statement (for most steps)
- a QUIT statement (for some procedures)
- the beginning of another step (a DATA statement or PROC statement)

Step Boundaries
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   proc print data=work.staff;
   proc means data=work.staff;
      class JobTitle;
      var Salary;
   run;
The PROC PRINT step has no RUN statement. SAS ends it when it encounters the PROC MEANS statement, which begins the next step.
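The QUIT boundary applies to procedures that stay active after a RUN statement. PROC SQL and PROC DATASETS are two examples. The minimal sketch below queries SASHELP.CLASS, a sample table that ships with SAS. PROC SQL executes each statement as soon as it is submitted and stays active until a QUIT statement or the start of another step.

```sas
/* PROC SQL executes each statement immediately; a RUN statement has no effect. */
proc sql;
   select Name, Age
      from sashelp.class
      where Age > 14;
quit;   /* QUIT, not RUN, ends the PROC SQL step */
```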

Running a SAS Program
You can invoke SAS in the following ways:
- interactive windowing mode (SAS windowing environment)
- interactive menu-driven mode (SAS Enterprise Guide, SAS/ASSIST, SAS/AF, or SAS/EIS software)
- batch mode
- noninteractive mode

Preparing for SAS Programming
Data sets: \SAS-Programming
Create a user-defined library reference with the LIBNAME statement:
   LIBNAME libref 'SAS-data-library';
Example:
   LIBNAME ia 'c:\workshop\winsas\prog1';
SAS file names have two levels: libref.filename
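The following sketch shows a libref and two-level names in use. It reuses the folder from the slide and assumes that folder already exists. The data set names class_copy and class_temp are hypothetical. The source data is SASHELP.CLASS, which ships with SAS. A one-level name such as class_temp is treated as WORK.class_temp, which is a temporary data set.

```sas
/* Assign the libref IA (folder from the slide; it must already exist). */
libname ia 'c:\workshop\winsas\prog1';

/* Two-level name libref.filename: a permanent data set in the IA library. */
data ia.class_copy;
   set sashelp.class;
run;

/* One-level name: SAS assumes the WORK library (deleted at the end of the session). */
data class_temp;
   set ia.class_copy;
run;

/* class_temp and work.class_temp refer to the same data set. */
proc print data=work.class_temp;
run;
```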

SAS Programming Essentials
Demo: c02s2d1
Exercise: c02ex1

Browsing the Descriptor Portion
General form of the CONTENTS procedure:
   PROC CONTENTS DATA=SAS-data-set;
   RUN;
Example (c02s3d1):
   proc contents data=work.staff;
   run;

SAS Data Sets: Data Portion
The data portion of a SAS data set is a rectangular table of character and/or numeric data values. Variable names are part of the descriptor portion, not the data portion.
   LastName   FirstName  JobTitle   Salary
   TORRES     JAN        Pilot      …
   LANGKAMM   SARAH      Mechanic   …
   SMITH      MICHAEL    Mechanic   …
   WAGSCHAL   NADJA      Pilot      …
   TOERMOEN   JOCHEN     Pilot      …
LastName, FirstName, and JobTitle hold character values; Salary holds numeric values.

SAS Variable Values
There are two types of variables:
- character: can contain any value, including letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes. One byte equals one character.
- numeric: stored as floating-point numbers in 8 bytes of storage by default. Eight bytes of floating-point storage provide space for 16 or 17 significant digits, so you are not restricted to 8 digits.
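This minimal sketch shows both storage rules. The variable names and values are hypothetical. A LENGTH statement fixes the character lengths, and a value longer than its variable's length is truncated. The numeric variable gets the default 8 bytes and still holds a 14-digit integer exactly. PROC CONTENTS lists the resulting type and length of each variable.

```sas
data work.lengths;
   length Code $ 3 Description $ 60;
   Code        = 'ABCDE';                   /* longer than 3 bytes, so stored as 'ABC' */
   Description = 'Character lengths range from 1 to 32,767 bytes';
   Amount      = 12345678901234;            /* 14 digits, held exactly in 8 bytes */
run;

/* Shows Code as Char 3, Description as Char 60, Amount as Num 8. */
proc contents data=work.lengths;
run;

/* A wide format is needed to display all 14 digits. */
proc print data=work.lengths noobs;
   format Amount 16.;
run;
```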

SAS Data Set and Variable Names
SAS names have these characteristics:
- can be up to 32 characters long
- can be uppercase, lowercase, or mixed case
- are not case sensitive
- must start with a letter or underscore; subsequent characters can be letters, underscores, or numerals

Valid SAS Names
Select the valid default SAS names:
- data5mon: valid
- 5monthsdata: not valid (starts with a numeral)
- data#5: not valid (# is not allowed)
- five months data: not valid (contains blanks)
- fivemonthsdata: valid
- FiveMonthsData: valid
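The NVALID function can check these answers. It returns 1 when a string is a valid SAS variable name and 0 otherwise. The 'V7' argument applies the rules listed above: up to 32 characters, a letter or underscore first, then letters, numerals, or underscores. This is a minimal sketch, and the variable names Candidate and IsValid are illustrative. The results are written to the log.

```sas
/* Check each quiz candidate against the V7 naming rules (1 = valid, 0 = not valid). */
data _null_;
   length Candidate $ 32;   /* long enough for every candidate string */
   do Candidate = 'data5mon', '5monthsdata', 'data#5',
                  'five months data', 'fivemonthsdata', 'FiveMonthsData';
      IsValid = nvalid(strip(Candidate), 'V7');
      put Candidate= IsValid=;
   end;
run;
```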

Missing Data Values
   LastName   FirstName  JobTitle   Salary
   TORRES     JAN        Pilot      …
   LANGKAMM   SARAH      Mechanic   …
   SMITH      MICHAEL    Mechanic   .
   WAGSCHAL   NADJA      Pilot      …
   TOERMOEN   JOCHEN                …
- A value must exist for every variable for each observation.
- Missing values are valid values.
- A numeric missing value is displayed as a period.
- A character missing value is displayed as a blank.
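This minimal sketch shows how missing values are read and detected. The two rows mirror the SMITH and TOERMOEN cases above. The salary of 50000 is a hypothetical value, because the slide's salaries are not preserved. In list input, a single period marks a missing value for both numeric and character fields. The MISSING function then tests either type of variable.

```sas
/* Hypothetical values: a period in list input marks a missing value. */
data work.missing_demo;
   input LastName :$10. JobTitle :$10. Salary;
   datalines;
SMITH Mechanic .
TOERMOEN . 50000
;
run;

/* MISSING() works for both character and numeric variables. */
data _null_;
   set work.missing_demo;
   if missing(Salary)   then put LastName= 'has a missing numeric value (shown as a period)';
   if missing(JobTitle) then put LastName= 'has a missing character value (shown as a blank)';
run;

proc print data=work.missing_demo;
run;
```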

Browsing the Data Portion
The PRINT procedure displays the data portion of a SAS data set. By default, PROC PRINT displays the following:
- all observations
- all variables
- an Obs column on the left side

Browsing the Data Portion
General form of the PRINT procedure:
   PROC PRINT DATA=SAS-data-set;
   RUN;
Example (c02s3d1):
   proc print data=work.staff;
   run;
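Each of the three PROC PRINT defaults can be overridden. This minimal sketch uses the sample table SASHELP.CLASS. The OBS= data set option limits the observations printed, the VAR statement chooses the variables, and the NOOBS option suppresses the Obs column.

```sas
/* Print only the first 5 observations and three variables, without the Obs column. */
proc print data=sashelp.class(obs=5) noobs;
   var Name Sex Age;
run;
```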

SAS Data Set Terminology
SAS documentation and text in the SAS windowing environment use the following terms interchangeably:
   SAS Data Set   =  SAS Table
   Variable       =  Column
   Observation    =  Row

SAS Syntax Rules
SAS statements have these characteristics:
- usually begin with an identifying keyword
- always end with a semicolon
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   proc print data=work.staff;
   run;
   proc means data=work.staff;
      class JobTitle;
      var Salary;
   run;

SAS Syntax Rules
SAS statements are free-format:
- One or more blanks or special characters can be used to separate words.
- They can begin and end in any column.
- A single statement can span multiple lines.
- Several statements can be on the same line.
Unconventional spacing:
      data work.staff; infile 'raw-data-file';
   input LastName $ 1-20 FirstName $
         JobTitle $ Salary 54-59;
   run;
         proc means data=work.staff;
   class JobTitle; var Salary;run;

SAS Syntax Rules
Good spacing makes the program easier to read.
Conventional spacing:
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   proc print data=work.staff;
   run;
   proc means data=work.staff;
      class JobTitle;
      var Salary;
   run;

SAS Comments (c02s3d2)
- Type /* to begin a comment.
- Type your comment text.
- Type */ to end the comment.
   /* Create work.staff data set */
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   /* Produce listing report of work.staff */
   proc print data=work.staff;
   run;

Syntax Errors
Syntax errors include the following:
- misspelled keywords
- missing or invalid punctuation
- invalid options
   daat work.staff;                          /* misspelled keyword: DAAT */
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   proc print data=work.staff                /* missing semicolon */
   run;
   proc means data=work.staff average max;   /* invalid option: AVERAGE */
      class JobTitle;
      var Salary;
   run;

Debugging a SAS Program
This demonstration illustrates how to submit a SAS program that contains errors, diagnose the errors, correct the errors, and save the corrected program.
Files: c02s4d1.sas and c02s4d2.sas (on z/OS: userid.prog1.sascode(c02s4d1) and userid.prog1.sascode(c02s4d2))

Recall a Submitted Program
Program statements accumulate in a recall buffer each time you issue a SUBMIT command.
Submit Number 1:
   daat work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   proc print data=work.staff run;
   proc means data=work.staff average max;
      class JobTitle;
      var Salary;
   run;
Submit Number 2:
   data work.staff;
      infile 'raw-data-file';
      input LastName $ 1-20 FirstName $ JobTitle $ Salary 54-59;
   run;
   proc print data=work.staff;
   run;
   proc means data=work.staff mean max;
      class JobTitle;
      var Salary;
   run;

Recall a Submitted Program
Issue the RECALL command once to recall the most recently submitted program. The Submit Number 2 statements (the corrected program) are recalled.

Recall a Submitted Program
Issue the RECALL command again to recall the Submit Number 1 statements. The editor then contains both programs: the Submit Number 1 statements followed by the Submit Number 2 statements.

Exercise 8: Basic SAS Programming
- Define the libraries IA and Out.
- Go through all SAS programs in Chapters 2-5.
- Write a SAS program to read a data set you create yourself, or simply use Person0.txt in \\TechShare\coba\d\ISQS3358\OtherDatasets\. Write the data set to your library Out.
- Try to apply whatever SAS features from Chapter 5 of Prog-I you can to generate a nice-looking report.
- Go through all exercises for Chapters 2, 3, 4, 5, and 6 (answer keys are available, so there is no need to submit the results).
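This is a possible starting point for the reading and reporting parts, offered as a sketch only. The folder assigned to OUT is hypothetical. The layout of Person0.txt is not described here, so the INPUT statement assumes blank-separated Name, Age, and Salary fields; adjust it to the actual file. The TITLE, FOOTNOTE, LABEL, and FORMAT statements are typical of the output enhancements covered in Chapter 5.

```sas
/* Hypothetical folder for the OUT library; change it to your own location. */
libname out 'c:\workshop\out';

/* Assumed layout: Name, Age, Salary separated by blanks (adjust to the real file). */
data out.person;
   infile '\\TechShare\coba\d\ISQS3358\OtherDatasets\Person0.txt';
   input Name :$20. Age Salary;
run;

title 'Person Listing';
footnote 'Source: Person0.txt';

proc print data=out.person label noobs;
   label Salary = 'Annual Salary';
   format Salary dollar12.2;
run;

/* Clear the title and footnote so they do not carry over to later output. */
title;
footnote;
```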

Hands-on Exercise
Write a SAS program to calculate the number of days that have passed in 2012 as of 3/3/2012. The input is in DATE9. format: 01JAN2012 03MAR2012
Answer: 62 days
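One possible solution, offered as a sketch: SAS stores a date as the number of days since 01JAN1960, so subtracting two date values gives the number of days between them. The INTCK function with the 'day' interval gives the same count. The variable names are illustrative.

```sas
data work.days_passed;
   input StartDate :date9. EndDate :date9.;
   DaysPassed  = EndDate - StartDate;              /* date values are day counts */
   DaysByIntck = intck('day', StartDate, EndDate); /* same result via INTCK */
   format StartDate EndDate date9.;
   datalines;
01JAN2012 03MAR2012
;
run;

proc print data=work.days_passed noobs;
run;
```

Both columns should show 62, which matches the answer on the slide: 31 days in January, 29 in February of the leap year 2012, and 2 in March.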

Making Use of SAS Enterprise Guide Code
- Import a text file (example: Orders.txt)
- Import an Excel file (example: SupplyInfo.xls)
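In a program, PROC IMPORT can approximate the code that the Enterprise Guide text-file import generates. This is a sketch only. It assumes that Orders.txt is tab-delimited with variable names in the first row, and the folder path is a placeholder.

```sas
/* Assumes a tab-delimited file with a header row; the path is a placeholder. */
proc import datafile='C:\data\Orders.txt'
            out=work.orders
            dbms=tab
            replace;
   getnames=yes;        /* take variable names from the first row */
   guessingrows=100;    /* rows scanned to decide variable types and lengths */
run;

/* Check the variable types and lengths that the import chose. */
proc contents data=work.orders;
run;
```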

Learn from Examples
SAS Help: Contents -> Learning to Use SAS -> Sample SAS Programs -> Base SAS -> "Base Usage Guide Examples", Chapters 3 and 4

Import an Excel Sheet
   proc import out=work.commrex
      datafile="C:\Lin\Shared\ISQS6339\Commrex_3358.xls"
      dbms=excel replace;
      sheet="Company";
      getnames=yes;
      mixed=no;
      scantext=yes;
      usedate=yes;
      scantime=yes;
   run;
   proc print data=work.commrex;
   run;

Excel SAS/ACCESS LIBNAME Engine
   libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_3358.xls';
   proc print data=xlsdata.New1;
   run;
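With the Excel LIBNAME engine, the workbook's worksheets and named ranges appear as tables in the library. A plain two-level name such as xlsdata.New1 refers to a named range. A worksheet appears with a trailing dollar sign and is referenced with a SAS name literal. While the libref is assigned, SAS holds the workbook open, so clearing the libref releases it. This sketch assumes the workbook contains the worksheet Company used in the PROC IMPORT example. It also assumes SAS/ACCESS Interface to PC Files is licensed.

```sas
/* Assign the libref with the EXCEL engine explicitly. */
libname xlsdata excel 'C:\Lin\Shared\ISQS6339\Commrex_3358.xls';

/* List the worksheets and named ranges the engine exposes. */
proc contents data=xlsdata._all_ nods;
run;

/* A worksheet is referenced with a name literal ending in $. */
proc print data=xlsdata.'Company$'n;
run;

/* Release the workbook so that Excel or other programs can open it. */
libname xlsdata clear;
```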

Exercise 9: SAS Data Step Programming