An Exercise in Improving SAS Performance on Mainframe Processors


An Exercise in Improving SAS Performance on Mainframe Processors: SAS BLKSIZE and BUFSIZE Options

Foreword
At the last KCASUG meeting, George Hurley presented “Customizing Your SAS Initialization II.” In that presentation, George suggested that it is possible to save CPU in SAS jobs by tuning the BUFSIZE parameter. Given our current interest in saving CPU and stretching the life of mainframe equipment, I decided to investigate what kind of savings were possible in our environment.

Background
- In the 1990s and earlier, disk storage for mainframes consisted of a stack of 14-inch platters arranged in what was called a disk drive
  - There was a separate read/write head for each surface
  - All read/write heads were aligned at the same relative position and moved together
- Disk drives were organized into tracks and cylinders
  - A track was the data that could be accessed from one surface in one revolution of the disk
  - A cylinder was all the tracks that could be accessed from the same relative position of the read/write heads
- Data was stored with gaps between records, in count-key-data (CKD) format

Background
- 3390s were the final generation of IBM classical disk drives
  - Each track could hold up to 56,664 bytes
  - The largest record size was 32,767 bytes
- Although larger records were possible, records were rarely larger than 27,998 bytes
  - This is the largest record size that allows 2 records per track
  - Record sizes approaching 27,998 bytes provided optimal use of disk storage on 3390 devices
  - This is commonly referred to as a “half track” record size

Background
- When modern storage controllers started replacing classical mainframe storage, the controllers emulated classical storage devices, particularly the 3390
- Although data is actually stored in stripes with multiple layers of virtualization, access to the data still follows the protocol of classical mainframe storage

Mainframe SAS Files
- Two factors have the most influence on the performance of I/Os for SAS datasets
  - BLKSIZE: the size of the block (physical record)
  - BUFSIZE: the size of the storage buffer
    - Should be a multiple of BLKSIZE

SAS BLKSIZE
- Larger block sizes are more efficient
  - With smaller block sizes, there is additional overhead in SAS to manage each block
- SAS files can have any BLKSIZE up to 32,760
- The optimal BLKSIZE for SAS files is 27,648
  - Largest “half-track” size for SAS files
  - Provides the optimal balance of performance and disk storage utilization
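The half-track figures can be checked with a little arithmetic. The sketch below is only an illustration: it uses the 56,664-byte track figure and the two half-track sizes quoted in these slides, and it treats the unused remainder as space lost to inter-record gaps and other overhead rather than modeling the real 3390 track-capacity formula. The function name `half_track_usage` is made up for this example.

```python
TRACK_CAPACITY = 56_664  # bytes per 3390 track, as quoted in the Background slide


def half_track_usage(record_size: int, records_per_track: int = 2) -> float:
    """Return the fraction of TRACK_CAPACITY occupied by record data."""
    used = record_size * records_per_track
    if used > TRACK_CAPACITY:
        raise ValueError("records do not fit on one track")
    return used / TRACK_CAPACITY


# Compare the general half-track size with the SAS half-track BLKSIZE.
for size in (27_998, 27_648):
    share = half_track_usage(size)
    print(f"{size:,} bytes x 2 = {size * 2:,} bytes ({share:.1%} of the track figure)")
```

Under these assumptions, both sizes fill nearly the whole track, which is why the SAS half-track BLKSIZE gives up very little disk space compared with the general half-track size.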

SAS BUFSIZE
- When SAS schedules an I/O for a SAS dataset, it builds the I/O command to transfer as much data as will fit in the buffer in a single I/O command
  - This saves the operating system overhead related to managing multiple I/Os
  - SAS uses its own channel programming (EXCPs) for SAS files, not normal operating system access methods
- For example, with a BLKSIZE of 27,648 and a BUFSIZE of 110,592, SAS would build I/O commands to transfer 4 blocks with each I/O command
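The relationship between the two options in the example above is a simple division. The following is a minimal sketch of that arithmetic, not a description of SAS internals; the helper name `blocks_per_io` is hypothetical, and the multiple-of-BLKSIZE check reflects the recommendation from the earlier slide.

```python
def blocks_per_io(blksize: int, bufsize: int) -> int:
    """Number of whole blocks that fit in one buffer, i.e. per I/O command."""
    if blksize <= 0 or bufsize < blksize:
        raise ValueError("BUFSIZE must hold at least one block")
    if bufsize % blksize != 0:
        # The slides recommend BUFSIZE be a multiple of BLKSIZE.
        raise ValueError("BUFSIZE should be a multiple of BLKSIZE")
    return bufsize // blksize


print(blocks_per_io(27_648, 110_592))  # 4 blocks per I/O
print(blocks_per_io(27_648, 221_184))  # 8 blocks per I/O
```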

SAS BUFSIZE
- Buffer sizes between 110,592 and 221,184 tend to be fairly efficient
- MEMSIZE may need to be increased when BUFSIZE is increased

Controlled Tests
- Performed some controlled tests
- One controlled test:
  - Wrote 250,000 records to a SAS file (each about 1.6K of data)
  - In a separate step, read the records (in a _NULL_ DATA step, SET the input to the file just created)
  - Varied BLKSIZE and BUFSIZE in each run

Controlled Tests
- Tests showed that a BLKSIZE of 27,648 performed better than a BLKSIZE of 6,144 for similar buffer sizes
  - A BLKSIZE of 6,144 was the old standard in our shop
- Tests also suggested limited improvements in CPU and run times with buffer sizes above the 110,592 to 221,184 range
  - In fact, performance sometimes appeared to deteriorate with larger buffer sizes
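A rough back-of-the-envelope model can help explain the first result. The sketch below estimates how many blocks and I/O commands the 250,000-record test file would need under different settings. It makes several loud assumptions that are not in the slides: "about 1.6K" is taken as exactly 1,600 bytes, SAS page headers and other file overhead are ignored, and every I/O is assumed to transfer a full buffer. The names `RECORDS`, `RECORD_BYTES`, and `estimated_ios` are made up for the example.

```python
RECORDS = 250_000
RECORD_BYTES = 1_600  # assumption: "about 1.6K" taken as 1,600 bytes


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def estimated_ios(blksize: int, bufsize: int) -> tuple[int, int]:
    """Return (blocks, I/O commands) for the test file, ignoring SAS overhead."""
    total_bytes = RECORDS * RECORD_BYTES
    blocks = ceil_div(total_bytes, blksize)
    per_io = max(1, bufsize // blksize)  # whole blocks per buffer
    return blocks, ceil_div(blocks, per_io)


for blksize, bufsize in ((6_144, 110_592), (27_648, 110_592), (27_648, 221_184)):
    blocks, ios = estimated_ios(blksize, bufsize)
    print(f"BLKSIZE={blksize:,} BUFSIZE={bufsize:,}: {blocks:,} blocks, {ios:,} I/Os")
```

In this simplified model, the two block sizes need the same number of I/Os at the same buffer size, but the smaller BLKSIZE means roughly 4.5 times as many blocks to manage, which is consistent with the per-block overhead described earlier.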

Production Pilots
- Identified the jobs that were using the largest total amount of CPU
- Ran pilots on 2 of the top 5 jobs to explore potential benefits with real jobs
  - Changed BLKSIZE from 6,144 to 27,648
  - Increased BUFSIZE to 221,184
- Ran several parallel runs of the MXG job with various BLKSIZE and BUFSIZE values (MXG is a common SAS-based mainframe tool to capture and manage mainframe performance data)
  - Experimented with various block sizes
  - Have not placed changes to MXG in production yet
- Rewrote one job in another language

Pilot Results
- Pilot results were quite favorable
- Job using the largest amount of CPU (runs many times each day); see charts for Job 1
  - 6% reduction in CPU
  - 25% improvement in run time
- Job using the 5th largest amount of CPU (runs many times each day); see charts for Job 2
  - 9% reduction in CPU
  - 43% improvement in run time
- MXG (2nd largest user of CPU; runs once daily)
  - 5% reduction in CPU
  - ~10% improvement in run time

Production Implementation
- Changed BLKSIZE to 27,648
  - Changed both CONFIG member and SAS PROC
- Changed BUFSIZE to 221,184
  - Changed CONFIG member
- Made changes to ensure jobs would not fail with memory issues
  - MEMSIZE parameter removed from CONFIG
    - Defaults to 0 (no limitation on memory)
  - Changed REGION to 0M in SAS PROC
  - Made mass change to production SAS jobs to remove REGION parameter overrides

Implementation Results
- Measured results based on production jobs that ran daily
- Compared results on a job / weekday basis
- For jobs that ran during the day:
  - 10% average reduction in CPU
    - Varied from no gain to 15-20% improvement
  - 30% average improvement in run times
    - Varied considerably from job to job
- For jobs that ran at night:
  - 3% reduction in CPU
  - 10% improvement in run times

Issues and Opportunities
- Many production jobs reuse the same SAS files without ever deleting and recreating them
  - BLKSIZE remains at the smaller size
- Many production jobs use their own customized SAS PROCs or CONFIG members
  - Cannot easily take advantage of the changes
- Will need to look for opportunities to tune these jobs later

Thinking Outside the Box
- One very large SAS job runs daily
  - Job would read 10-12 million rows
  - Sort data on 4 keys
  - Summarize 32 columns using PROC UNIVARIATE
- Rewrote job in another language
  - Took advantage of the partial natural order of the data and used a hashing algorithm to organize the data
  - Initial level summary done in summary program
  - Summarized data was then input to SAS
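The slides do not say which language the job was rewritten in or how the hashing was done, so the following Python sketch is only one plausible illustration of the general idea: grouping rows by a multi-column key in a hash table during a single pass, instead of sorting first. Remembering the previous key is one simple way to benefit from partially ordered input. All names here (`summarize`, the sample field names, the chosen statistics) are hypothetical, and the statistics are far simpler than what PROC UNIVARIATE produces.

```python
def new_stats() -> dict:
    """Empty running statistics for one column."""
    return {"n": 0, "sum": 0.0, "min": None, "max": None}


def summarize(rows, key_fields, value_fields):
    """Group rows by key in one pass using a dict (hash table), with no sort."""
    groups = {}
    last_key, stats = None, None
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        if key != last_key:
            # Only touch the hash table when the key changes; runs of equal
            # keys (partial natural order) reuse the previous entry.
            stats = groups.setdefault(key, {v: new_stats() for v in value_fields})
            last_key = key
        for v in value_fields:
            x, s = row[v], stats[v]
            s["n"] += 1
            s["sum"] += x
            s["min"] = x if s["min"] is None else min(s["min"], x)
            s["max"] = x if s["max"] is None else max(s["max"], x)
    return groups


# Tiny made-up sample: two key columns, one summarized column.
rows = [
    {"sys": "A", "job": "J1", "cpu": 1.0},
    {"sys": "A", "job": "J1", "cpu": 3.0},
    {"sys": "B", "job": "J2", "cpu": 5.0},
    {"sys": "A", "job": "J1", "cpu": 2.0},
]
for key, stats in summarize(rows, ("sys", "job"), ("cpu",)).items():
    print(key, stats["cpu"])
```

A single hashed pass like this avoids the cost of sorting millions of rows, and only the much smaller summarized output would then need to be passed on to SAS.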

Changes in Rewritten Job
- Reduced CPU by 95%
- Improved run time by 97%
- It is worth noting that I could find only two large SAS jobs that could take advantage of this technique. All other SAS jobs that I looked at were far too complex to consider doing this.