1 Chapter 1: Introduction 1.1: Course Logistics 1.2: Measuring Efficiencies 1.3: SAS DATA Step Processing.

Presentation transcript:

1 Chapter 1: Introduction 1.1: Course Logistics 1.2: Measuring Efficiencies 1.3: SAS DATA Step Processing

3 Objectives
- List the tasks in the SAS Programming 3 course.
- Explain the naming convention that is used for the course files.
- Compare the three levels of exercises that are used in the course.
- Describe, at a high level, how data is used and stored at Orion Star Sports & Outdoors.
- Navigate to the Help facility.

4 Tasks in the SAS Programming 3 Course
The course topics include techniques for the following data management tasks:
- compressing SAS data sets
- creating indexes for quick retrieval of subsets
- performing table lookups using arrays, hash objects, or formats
- combining data by merging, using the SQL procedure, or using multiple SET statements
- combining summary and detail data
- sorting and grouping data
- developing a program quickly

5 Resource Utilization
As programmers, you want to perform these tasks as efficiently as possible and optimize the use of the following resources:
- programmer time
- I/O
- CPU
- memory
- data storage space
- network bandwidth

6 Business Scenarios
The business scenarios are opportunities to compare multiple techniques for performing the tasks. For example:
Task: Table Lookups
Possible Techniques:
– DATA step MERGE statement
– PROC SQL joins
– Formats in PUT functions or in FORMAT statements
– DATA step arrays
– DATA step hash objects
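As one illustration of the techniques listed above, a format-based lookup with the PUT function might look like the sketch below. The format name $ctry, the country codes, and the work.lookup data set are hypothetical and are not taken from the course data.

```sas
/* Hypothetical lookup table stored as a character format. */
proc format;
   value $ctry 'AU'='Australia'
               'US'='United States'
               other='Unknown';
run;

/* Look up each code with the PUT function instead of a merge or join. */
data work.lookup;
   length Code $ 2 Country $ 20;
   input Code $;
   Country=put(Code, $ctry.);   /* the format returns the label for the code */
   datalines;
AU
US
ZZ
;
run;

proc print data=work.lookup;
run;
```

The lookup values live in memory as a format, so no sort or second data set is read, which is one reason this technique is worth benchmarking against a MERGE or a PROC SQL join.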

Multiple Answer Poll
What type(s) of SAS programs do you write?
a. Data manipulation with the DATA step
b. Data analysis with procedures
c. Report writing
d. A combination of the above
e. SAS training only; no programs written
f. Other

9 Filename Conventions
Filename structure: course ID, chapter #, type, item #, placeholder
Example: The SAS Programming 3 course ID is p3, so p304d01 = SAS Programming 3, Chapter 4, Demo 1.
Code   Type
a      Activity
d      Demo
e      Exercise
s      Solution
Sample filenames: p304a01, p304a02, p304a02s, p304d01, p304d02, p304e01, p304e02, p304s01, p304s02, p304d01x

10 Three Levels of Exercises
Level 1: The exercise mimics an example presented in the section.
Level 2: Less information and guidance are provided in the exercise instructions.
Level 3: Only the task you are to perform or the results to be obtained are provided. Typically, you will need to use the Help facility.
You are not expected to complete all of the exercises in the time allotted. Choose the exercise or exercises that are at the level with which you are most comfortable.

11 Orion Star Sports & Outdoors Orion Star Sports & Outdoors is a fictitious global sports and outdoors retailer with traditional stores, an online store, and a large catalog business. The corporate headquarters is located in the United States with offices and stores in many countries throughout the world. Orion Star has about 1,000 employees and 90,000 customers, processes approximately 150,000 orders annually, and purchases products from 64 suppliers.

12 Orion Star Data As is the case with most organizations, Orion Star has a large amount of data about its customers, suppliers, products, and employees. Much of this information is stored in transactional systems in various formats. Using applications and processes such as SAS Data Integration Studio, this transactional information was extracted, transformed, and loaded into a data warehouse. Data marts were created to meet the needs of specific departments such as Marketing.

13 The SAS Help Facility

Quiz Start your SAS session. Open the Help facility. Determine the path to use to obtain information about the SAS component objects.

Quiz – Correct Answer
Determine the path to use to obtain information about the SAS component objects.
Information relevant to this course can be found by following this path in the SAS Help facility:
Contents tab -> SAS Products -> Base SAS -> SAS 9.2 Language Reference: Dictionary -> Dictionary of Component Object Language Elements

17 SAS OnlineDoc
You can also obtain information from SAS OnlineDoc. Information relevant to this course can be found by following this path in SAS OnlineDoc:
Contents tab -> Products Documentation A-Z -> Base SAS -> SAS 9.2 Language Reference: Dictionary -> Dictionary of Component Object Language Elements

19 Chapter 1: Introduction 1.1: Course Logistics 1.2: Measuring Efficiencies 1.3: SAS DATA Step Processing

20 Objectives
- Identify the resources used by a SAS program.
- Report computer resource usage using SAS system options.
- Interpret resource usage statistics in your operating environment.
- Benchmark resource usage.

21 Running a SAS Program
What resources are required to run a SAS program? The programmer must perform the following tasks:
- determine program specifications
- write the program
- test the program
- execute the program
- maintain the program

22 Running a SAS Program
The computer must perform the following actions:
- load the required SAS software into memory
- compile the program
- read the data
- execute the compiled program
- store output data files
- store output reports

23 What Resources Are Used?
Resources used: programmer time, network bandwidth, CPU, I/O, memory, data storage space

Multiple Answer Poll
Which of the following resources do you need to conserve?
a. CPU
b. I/O
c. Memory
d. Data storage space
e. Network bandwidth
f. Your time

26 Understanding Efficiency Trade-offs When you decrease the use of one resource, the use of other resources might increase. Resource usage is dependent on your data. A specific technique might be more efficient with one data set and less efficient with another.

27 Understanding Efficiency Trade-offs
Decreasing the size of a SAS data set can result in an increase in CPU usage. (Diagram: less data space often implies more CPU.)
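A minimal sketch of this trade-off, assuming the orion libref from the course is assigned and using the COMPRESS= data set option as the size-reduction technique. The work data set names are hypothetical.

```sas
/* Uncompressed copy, for comparison. */
data work.history_plain;
   set orion.sales_history;
run;

/* Compressed copy: fewer pages on disk, but extra CPU is spent */
/* compressing on write and decompressing on every later read.  */
data work.history_small(compress=yes);
   set orion.sales_history;
run;

/* Compare the number of data set pages reported for each copy. */
proc contents data=work.history_plain;
run;

proc contents data=work.history_small;
run;
```

Whether compression pays off depends on the data. For small data sets or short observations the compressed file can end up larger than the original, so the result should be checked rather than assumed.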

28 Understanding Efficiency Trade-offs
Decreasing the number of I/O operations comes at the expense of increased memory usage. (Diagram: less I/O often implies more memory.)

29 Deciding What Is Important for Efficiency
Your site, your programs, your data

30 Understanding Efficiency at Your Site
SAS environment, hardware, operating environment, system load

Multiple Choice Poll
This class uses SAS 9.2. What is the latest version of SAS that you are running?
a. SAS 8.2
b. SAS 9.1
c. SAS 9.2
d. Other

33 Knowing How Your Program Will Be Used
The importance of efficiency increases with the following:
- the complexity of the program and/or the size of the files being processed
- the number of times that the program will be executed

34 Knowing Your Data

Multiple Answer Poll
What type(s) of data do you use?
a. SAS data sets
b. External files
c. Data from a relational database (for example, Oracle, Teradata, or SQL Server)
d. Excel spreadsheets
e. OLAP cubes
f. Information maps
g. Other

37 Considering Trade-Offs In this class, many tasks are performed using one or more techniques. To decide which technique is most efficient for a given task, benchmark, or measure and compare, the resource usage of each technique. You should benchmark with the actual data to determine which technique is the most efficient. The effectiveness of any efficiency technique depends greatly on the data with which you use the technique.

38 Running Benchmarks: Guidelines
To benchmark your programming techniques, do the following:
- Turn on the appropriate options to report resource usage.
- Test each technique in a separate SAS session.
- Test only one technique or change at a time, with as little additional code as possible.
- Run your tests under the conditions that your final program will use (for example, batch execution, large data sets, and so on).

39 Running Benchmarks: Guidelines
- Run each program several times and base your conclusions on averages, not on a single execution. (This is more critical when you benchmark elapsed time.)
- Exclude outliers from the analysis because that data might lead you to tune your program to run less efficiently than it should.
- Turn off the options that report resource usage after testing is finished, because they consume resources.
In a multi-user environment, other computer activities might affect the running of your program.
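The guidelines above can be sketched as a single benchmark run. This is only an outline: it assumes the orion libref is assigned, uses FULLSTIMER as the reporting option, and the DATA step stands in for whatever technique is under test.

```sas
/* Turn on detailed resource reporting before the test. */
options fullstimer;

/* The one technique being tested, with as little extra code as possible. */
/* DATA _NULL_ reads the input without writing an output data set.        */
data _null_;
   set orion.sales_history;
run;

/* Turn the reporting option off again once testing is finished. */
options nofullstimer;
```

Each technique would be submitted this way in a fresh SAS session, several times, with the real time and CPU time values from the log averaged afterward and outliers set aside.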

Multiple Choice Poll
Which of the following SAS programs should be benchmarked?
a. A report that shows all the customers in the United Kingdom in March 2006
b. A report that calculates trends in sales at the end of every day for every department
c. A report showing the projected total cost of a 5% cost-of-living increase in employee salaries for a Human Resources project conducted on January 1, 2007
d. A yearly report that calculates the average sales of a line of apparel for the clothing manager

Multiple Choice Poll – Correct Answer
Which of the following SAS programs should be benchmarked?
a. A report that shows all the customers in the United Kingdom in March 2006
b. A report that calculates trends in sales at the end of every day for every department (correct: it runs every day across all departments, and the importance of efficiency increases with the number of executions and the size of the files processed)
c. A report showing the projected total cost of a 5% cost-of-living increase in employee salaries for a Human Resources project conducted on January 1, 2007
d. A yearly report that calculates the average sales of a line of apparel for the clothing manager

43 Tracking Resource Usage
SAS options: STIMER, FULLSTIMER, MEMRPT (z/OS only), STATS (z/OS only)

44 Tracking Resources with SAS Options
Windows, UNIX:
   OPTIONS NOFULLSTIMER | FULLSTIMER;
   OPTIONS STIMER | NOSTIMER;
z/OS:
   STIMER | NOSTIMER (invocation option only)
   OPTIONS NOFULLSTIMER | FULLSTIMER;
   OPTIONS STATS | NOSTATS;
   OPTIONS MEMRPT | NOMEMRPT;

45 Business Scenario
You should benchmark to determine the most efficient technique for creating a new variable based on a condition. The following methods can be used:
- IF-THEN with an assignment statement
- IF-THEN/ELSE with an assignment statement
- SELECT/WHEN with an assignment statement
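The three methods can be sketched side by side as follows. The loop count, the cut-off values, and the variable names Amount and Size are illustrative assumptions, not the values used in the course programs. For a real benchmark, each step would be submitted in its own SAS session.

```sas
options fullstimer;

/* Method 1: independent IF-THEN statements. Every condition is */
/* evaluated on every iteration, even after one has matched.    */
data _null_;
   length Size $ 6;
   do x=1 to 1000000;
      Amount=10000*ranuni(1);
      if Amount>=5000 then Size='Large';
      if 1000<=Amount<5000 then Size='Medium';
      if Amount<1000 then Size='Small';
   end;
run;

/* Method 2: IF-THEN/ELSE. Evaluation stops at the first true condition. */
data _null_;
   length Size $ 6;
   do x=1 to 1000000;
      Amount=10000*ranuni(1);
      if Amount>=5000 then Size='Large';
      else if Amount>=1000 then Size='Medium';
      else Size='Small';
   end;
run;

/* Method 3: SELECT/WHEN. The first WHEN expression that is true is used. */
data _null_;
   length Size $ 6;
   do x=1 to 1000000;
      Amount=10000*ranuni(1);
      select;
         when (Amount>=5000) Size='Large';
         when (Amount>=1000) Size='Medium';
         otherwise Size='Small';
      end;
   end;
run;

options nofullstimer;
```

The user CPU time that FULLSTIMER writes to the log for each step is the figure to compare.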

Quiz
1. Open and submit p301a01a. Record the user CPU: ____________ Exit SAS.
2. Start SAS. Open and submit p301a01b. Record the user CPU: ____________ Exit SAS.
3. Start SAS. Open and submit p301a01c. Record the user CPU: ____________
4. Which technique is most efficient?
In z/OS, record the CPU.

48 Sample Windows Log
Partial SAS Log (p301a01a)
5    options fullstimer;
6    data _null_;
7       length var $ 30;
8       retain var2-var50 0 var51-var100 'ABC';
9       do x=1 to ;
10         var1= *ranuni(x);
11         if var1> then var='Greater than 1,000,000';
12         if <=var1<= then var='Between 500,000 and 1,000,000';
13         if <=var1< then var='Between 100,000 and 500,000';
14         if 10000<=var1< then var='Between 10,000 and 100,000';
15         if 1000<=var1<10000 then var='Between 1,000 and 10,000';
16         if var1<1000 then var='Less than 1,000';
17      end;
18   run;
NOTE: DATA statement used (Total process time):
      real time           1.26 seconds
      user cpu time       0.98 seconds
      system cpu time     0.04 seconds
      Memory              278k
      OS Memory           4976k
      Timestamp           6/29/ :39:21 PM

49 Sample UNIX Log
Partial SAS Log (p301a01a)
1    options fullstimer;
2    data _null_;
3       length var $30;
4       retain var2-var50 0 var51-var100 'ABC';
5       do x=1 to ;
6          var1= *ranuni(x);
7          if var1> then var='Greater than 1,000,000';
8          if <=var1<= then var='Between 500,000 and 1,000,000';
9          if <=var1< then var='Between 100,000 and 500,000';
10         if 10000<=var1< then var='Between 10,000 and 100,000';
11         if 1000<=var1<10000 then var='Between 1,000 and 10,000';
12         if var1<1000 then var='Less than 1,000';
13      end;
14   run;
NOTE: DATA statement used (Total process time):
      real time                      6.62 seconds
      user cpu time                  5.14 seconds
      system cpu time                0.01 seconds
      Memory                         526k
      OS Memory                      5680k
      Timestamp                      6/29/ :55:32 AM
      Page Faults                    82
      Page Reclaims                  0
      Page Swaps                     0
      Voluntary Context Switches     91
      Involuntary Context Switches   48
      Block Input Operations         91
      Block Output Operations        0

50 Sample z/OS Log Partial SAS Log p301a01a

52 Chapter 1: Introduction 1.1: Course Logistics 1.2: Measuring Efficiencies 1.3: SAS DATA Step Processing

53 Objectives List the attributes of a data set page and define how it relates to the structure of SAS data sets. Describe how SAS reads and writes data.

54 SAS Data Set Pages
A SAS data set page has the following attributes:
- It is the unit of data transfer between the operating system buffers and SAS buffers in memory.
- It includes the number of bytes used by the descriptor portion, the data values, and any operating system overhead.
- It is fixed in size when the data set is created, either to a default value or to a value specified by the programmer.

55 Using PROC CONTENTS to Report Page Size
proc contents data=orion.sales_history; run;
Partial PROC CONTENTS Output
Engine/Host Dependent Information
Data Set Page Size            16384
Number of Data Set Pages      18
First Data Page               1
Max Obs per Page              92
Obs in First Data Page        72
Number of Data Set Repairs    0
File Name                     S:\workshop\sales_history.sas7bdat
Release Created               M0
Host Created                  XP_PRO
Total size: 16,384 * 18 = 294,912 bytes
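Because the page size is fixed when the data set is created, a programmer who wants a different value has to request it at creation time. The sketch below uses the BUFSIZE= data set option for that purpose. The value 32768 and the work.history_32k name are arbitrary choices for illustration, and SAS may adjust the requested value to one that is valid for the host.

```sas
/* Create a copy with a requested page size of 32,768 bytes. */
data work.history_32k(bufsize=32768);
   set orion.sales_history;
run;

/* Confirm the page size and page count that SAS actually used. */
proc contents data=work.history_32k;
run;

/* Total bytes = page size * number of pages, as on the slide above. */
data _null_;
   bytes=16384*18;
   put bytes= comma12.;
run;
```

A larger page size means fewer pages, and therefore fewer transfers, at the cost of more memory for each buffer.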

Quiz (p301a02)
Use one of the following to determine the page size of the orion.customer_dim SAS data set:
- the CONTENTS procedure
- the DATASETS procedure
- the SAS Explorer window
What is the page size of the SAS data set orion.customer_dim?

Quiz – Correct Answer (p301a02)
Use one of the following to determine the page size of the orion.customer_dim SAS data set:
- the CONTENTS procedure
- the DATASETS procedure
- the SAS Explorer window
What is the page size of the SAS data set orion.customer_dim?
- 16,384 bytes in Windows
- 24,576 bytes in UNIX
- 18,432 bytes in z/OS

59-64 Reading External Files
(Diagram built up over six slides.) Data flow: Input Raw Data -> Caches -> Buffers (memory) -> Input Buffer -> PDV -> Buffers (memory) -> Output SAS Data
- Data might be cached in storage devices. On UNIX and Windows, data can also be cached by the OS file system.
- I/O is measured at two points: when the raw data is read into the buffers in memory, and when the output SAS data is written.
- Data is converted from external format to SAS format as it moves from the input buffer to the PDV.
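A DATA step that follows this path might look like the sketch below. The file name sales.dat, the variable names, and the space-delimited layout are hypothetical.

```sas
/* Read a raw file: each record is loaded into the input buffer,  */
/* then INPUT converts the fields to SAS format in the PDV.       */
data work.sales;
   infile 'sales.dat';
   input Employee_ID Quantity Price;
run;   /* the implicit OUTPUT writes the PDV to the output data set */
```

The conversion from external format to SAS format in the INPUT statement is work that a step reading a SAS data set does not have to do.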

65-72 Reading a SAS Data Set with a SET Statement
(Diagram built up over eight slides.) Data flow: Input SAS Data -> Caches -> Buffers (memory) -> PDV -> Buffers (memory) -> Caches -> Output SAS Data
- Data might be cached in storage devices. On UNIX and Windows, data can also be cached by the OS file system.
- I/O is measured at two points: when the input SAS data is read into memory, and when the output SAS data is written.
- No data conversion is necessary.
- Sequential processing continues until the pointer reaches the end of the file.
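The equivalent step for a SAS data set is shorter, because the observations are already in SAS format. This sketch assumes the orion libref is assigned. The output name work.history_copy is hypothetical.

```sas
/* Read a SAS data set: pages are transferred into buffers in memory, */
/* and each observation moves to the PDV with no conversion.          */
data work.history_copy;
   set orion.sales_history;
run;   /* the step ends when SET reaches the end of the input file */
```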

74 Exercise These exercises reinforce the concepts discussed previously.

75 Chapter Review
1. What are the six resources consumed by SAS programs?
2. What is the correct way to benchmark SAS programs?
3. What is a SAS data set page size?

76 Chapter Review Answers
1. What are the six resources consumed by SAS programs?
- programmer time
- network bandwidth
- CPU
- memory
- I/O
- disk storage space

77 Chapter Review Answers
2. What is the correct way to benchmark SAS programs?
a. Turn on the system options to report resource usage.
b. Test each technique in a separate SAS session.
c. Test only one technique or change at a time.
d. Run the test under final conditions.
e. Run each program three to five times and average the results.
f. Exclude outliers.
g. Turn off the resource usage reporting options.

78 Chapter Review Answers
3. What is a SAS data set page size?
The SAS data set page is the unit of data transfer between the system buffers and the SAS buffers in memory. The default transfer is one data set page at a time. The page size determines the amount of memory that is used when data is read and written. The number of pages affects the I/O.