Copyright © 2005, SAS Institute Inc. All rights reserved. Improving Batch Application Service Through Tuning and Parallelism Dan Squillace Mainframe Support.

Presentation transcript:

Improving Batch Application Service Through Tuning and Parallelism
Dan Squillace, Mainframe Support Manager, SAS Institute, Cary, NC, USA

Some Business Drivers for Performance Improvement Activities
- Increasing data volumes
  - More customers
  - More data about each customer, needed for increasingly sophisticated analytics that aid better and more timely decision-making
- Decreasing processing window
  - Improve BI application availability by shortening ETL elapsed time
- Increasing pressure to reduce costs
  - Lower resource requirements
  - Improve competitive position

Session Overview
- This session focuses on processing improvements beneficial to handling large data volumes.
- Performance improvement areas
  - CPU optimization
  - Reducing I/O
  - Improved overlap and parallelism
  - Elapsed time optimization (Not the same
- Focus areas
  - DATA step tuning
  - New SAS9 features

Session Outline
- Don't forget the basics! A Short Tuning Case Study
- DATA Step Views
- PROC SUMMARY w/DATA Step View
- DATA Step hash table functions
- SAS Scalable Performance Data Engine (SPDE)
- SAS/CONNECT Pipes
- Wrap-up

Back to Basics: High-Volume DATA Step Optimization
- Before implementing parallel operations, make sure the basic processing flow is efficient.
- When processing high volumes of data, even apparently small changes can have a large effect.
- The following customer case study illustrates several points.

Program processes 36 million MXG TYPE74 records (436 CPU seconds, 9672 G6)

    DATA FILE.A;
      SET INFILE1.TYPE74;
      KOUNT = 1;
      IF VOLSER = '.' OR VOLSER = ' ' THEN DELETE;
      IF SYSTEM = '888K' OR SYSTEM = '888Z' OR SYSTEM = '888Q' OR
         SYSTEM = '888V' OR SYSTEM = '888P' THEN DO;
        IF DATEPART(SYNCTIME) < '03APR04'D THEN SYNCTIME = SYNCTIME - '06:00:00.00'T;
        IF DATEPART(SYNCTIME) > '02APR04'D THEN SYNCTIME = SYNCTIME - '05:00:00.00'T;
      END;
      SYMNUM = 0;
      IF DATEPART(SYNCTIME) < '17MAY04'D THEN DO;
        IF DEVNR > 58FFX AND DEVNR < 5FFFX THEN SYMNUM = 111;
        IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM = 456;
        IF DEVNR > 7FFFX THEN SYMNUM = 234;
        IF DEVNR => 5000X AND DEVNR < 5200X THEN SYMNUM = 234;
        IF DEVNR => 5FFFX AND DEVNR < 7000X THEN SYMNUM = 876;
      END;
      IF DATEPART(SYNCTIME) > '17MAY04'D THEN DO;
        IF DEVNR > 4FFFX AND DEVNR < 7000X THEN SYMNUM = 223;
        IF DEVNR > 6FFFX AND DEVNR < 7FFFX THEN SYMNUM = 456;
        IF DEVNR > 7FFFX THEN SYMNUM = 234;
      END;
      TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
      FORMAT TIPPCT 5.2;
      IF SYMNUM = 0 THEN DELETE;
      IO_1111 = 0;
      IO_4563 = 0;
      IO_234 = 0;
      IO_8765 = 0;
      IO_22355 = 0;
      IF SYMNUM = 1111 THEN IO_1111 = IORATE;
      IF SYMNUM = 4563 THEN IO_4563 = IORATE;
      IF SYMNUM = 234 THEN IO_234 = IORATE;
      IF SYMNUM = 8765 THEN IO_8765 = IORATE;
      IF SYMNUM = THEN IO_22355 = IORATE;
      DATE = DATEPART(SYNCTIME);
      FORMAT DATE DATE7.;
      INTE = TIMEPART(SYNCTIME);
      FORMAT INTE TIME19.2;
      EMCTYPE = 'ESCON';
      IF SYMNUM = THEN EMCTYPE = 'FICON';
      IF IORATE < 10 THEN DELETE;
      KEEP VOLSER DEVNR TIPPCT DATE INTE SYMNUM IO_1111 IO_4563 IO_234
           IO_8765 SYNCTIME IO_22355 EMCTYPE IORATE AVGRSPMS AVGIOQMS
           AVGPNDMS AVGCONMS AVGDISMS AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR
           PCTDVCON PCTDVUSE KOUNT;

Do filtering as early as possible

    TIPPCT = (IORATE * (AVGCONMS + AVGDISMS)) / 10;
    FORMAT TIPPCT 5.2;
    IF SYMNUM = 0 THEN DELETE;
    IO_1111 = 0;
    IO_4563 = 0;
    IO_234 = 0;
    IO_8765 = 0;
    IO_22355 = 0;
    IF SYMNUM = 1111 THEN IO_1111 = IORATE;
    IF SYMNUM = 4563 THEN IO_4563 = IORATE;
    IF SYMNUM = 234 THEN IO_234 = IORATE;
    IF SYMNUM = 8765 THEN IO_8765 = IORATE;
    IF SYMNUM = THEN IO_22355 = IORATE;
    DATE = DATEPART(SYNCTIME);
    FORMAT DATE DATE7.;
    INTE = TIMEPART(SYNCTIME);
    FORMAT INTE TIME19.2;
    EMCTYPE = 'ESCON';
    IF SYMNUM = THEN EMCTYPE = 'FICON';
    IF IORATE < 10 THEN DELETE;
    KEEP VOLSER DEVNR TIPPCT DATE INTE SYMNUM IO_1111 IO_4563 IO_234
         IO_8765 SYNCTIME IO_22355 EMCTYPE IORATE AVGRSPMS AVGIOQMS
         AVGPNDMS AVGCONMS AVGDISMS AVGPNCHA AVGPNCUB AVGPNDEV AVGPNDIR
         PCTDVCON PCTDVUSE KOUNT;

- Move the filtering statements (the IF ... THEN DELETE tests) to the top of the DATA step, so that rejected records skip the remaining calculations.
- CPU time reduction: 67%

Additional Steps
- Specify KEEP= as a data set option on the input data set so that fewer variables are brought into the DATA step. Note: this decreases CPU time, but not I/O time.
- Use IF-THEN-ELSE or SELECT instead of a series of independent IF-THEN statements.
- Eliminate redundant DATEPART function calls.
- Cumulative CPU time reduction: 80%
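These three steps can be sketched as follows. This is an illustration under stated assumptions, not the customer's tuned program: the KEEP= list is abbreviated, only the "after 17MAY04" branch of the case-study program is shown, and DT is a helper variable introduced here.

```sas
data file.a;
   /* KEEP= on the input data set: unneeded variables never enter the step */
   set infile1.type74(keep=volser devnr synctime iorate avgconms avgdisms);

   /* Call DATEPART once and reuse the result (DT is a hypothetical helper) */
   dt = datepart(synctime);

   symnum = 0;
   if dt > '17MAY04'd then do;
      /* ELSE stops evaluation at the first condition that is true */
      if devnr > 7FFFx then symnum = 234;
      else if devnr > 6FFFx and devnr < 7FFFx then symnum = 456;
      else if devnr > 4FFFx and devnr < 7000x then symnum = 223;
   end;

   if symnum = 0 then delete;
   drop dt;
run;
```

Because the three DEVNR ranges do not overlap for integer device numbers, chaining them with ELSE should give the same SYMNUM as the original independent IF statements while testing fewer conditions per record.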

Final Step
- Move filtering of blank VOLSER and IORATE < 10 to a WHERE= data set option.
- Total cumulative CPU time reduction: 86%
  - Net savings of 368 CPU seconds
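A minimal sketch of the WHERE= form, assuming the same abbreviated variable list as an illustration (the real program keeps more variables):

```sas
data file.a;
   /* WHERE= rejects rows before they are loaded into the DATA step.
      The variables used in the WHERE expression are also listed in KEEP=. */
   set infile1.type74(keep=volser devnr synctime iorate avgconms avgdisms
                      where=(volser not in ('.', ' ') and iorate >= 10));

   /* ... remaining derivations from the case-study program go here ... */
run;
```

The expression `iorate >= 10` also rejects records with a missing IORATE, which matches the original `IF IORATE < 10 THEN DELETE` test, since SAS sorts a missing numeric value below any number.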

The Value of CPU Time Reduction
- Always important on the mainframe because it is inherently a multi-workload beast.
- Often considered unimportant (or less so, anyway) on Windows and UNIX platforms because of the dedicated nature of those systems. Elapsed time is often more important.
- Changing with increasing use of server virtualization, where CPU use affects how many virtual servers can run on a physical platform.
  - Logical Partitions or Domains on UNIX systems
  - Virtual Machines on Windows and Linux systems

Some General Strategies for Improving Processing of Large Data Volumes
- Reduce the volume of data passed (e.g. keep only required variables in intermediate files)
- Reduce the number of data passes
- Eliminate or reduce use of non-linearly scalable techniques such as sorting
- Exploit memory
- Exploit processing overlap and parallelism

Exploiting New SAS Features
- We'll use two scenarios drawn from common challenges encountered when processing transaction data for performance and service level reporting.
- The improvements made to the processing strategy for these scenarios:
  - Reduce the number of data passes
  - Eliminate or reduce use of non-linearly scalable techniques such as sorting
  - Exploit memory
  - Exploit processing overlap and parallelism

General Scenario Characteristics
- Very high data volumes (millions of records, tens or hundreds of gigabytes)
- Multiple summarizations desired
- Detail records retained only for exceptional cases

Scenario One
- High-volume transaction data, say from a web log, CICS, DB2, or SAP
- Desired: a summarized file for service level management, accounting, performance, and capacity management
- Not interested in keeping every detail transaction record

DATA Step Views
- Can be used to eliminate a data pass
- Runs two tasks in parallel, but does not multi-process
- In this case, eliminates one pass of the data.

    data lib.a / view=lib.a;
      infile ……;
      input x ……;
    run;

    /* OUT= added here because a view cannot be sorted in place;
       the output name is illustrative */
    proc sort data=lib.a out=lib.a_sorted;
      by x;
    run;

SAS DATA Step View Caveats
- Can inhibit use of indexed I/O: a WHERE= data set option cannot use an index with a DATA step view.
- DATA step views are sensitive not only to SAS release and version levels, but sometimes to maintenance levels.

DATA Step Views with PROC SUMMARY
- Eliminates data passes and saves disk space
- Eliminates the sort
- Can produce multiple summarization data sets in one pass
- Benefits from a large region size (enough to hold the crossings)
- SUMMARY in SAS 9.1
  - Multithreaded
  - Does not keep the n-way in memory unless needed

    data lib.a / view=lib.a;
      infile ……;
      input a b x y ……;
    run;

    proc summary data=lib.a;
      CLASS statement;
      TYPES statement;
      OUTPUT statement(s);
    run;
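To make the CLASS, TYPES, and OUTPUT placeholders concrete, here is a small self-contained sketch. All data set names, variable names, and values are invented for illustration; in-stream data is loaded into WORK.RAW first, and the view is then defined over it.

```sas
/* Hypothetical sample data */
data work.raw;
   input system $ tran $ resp cpu;
   datalines;
SYSA PAY1 0.42 0.05
SYSA PAY1 0.38 0.04
SYSA INQ2 0.10 0.01
SYSB PAY1 0.55 0.07
SYSB INQ2 0.12 0.02
;
run;

/* The view does its work only when PROC SUMMARY reads from it */
data work.trans / view=work.trans;
   set work.raw;
   resp_ms = resp * 1000;
run;

proc summary data=work.trans;
   class system tran;
   /* Request only the crossings that are needed */
   types system system*tran;
   /* Two summary data sets from a single pass of the input.
      _TYPE_=2 is the SYSTEM level, _TYPE_=3 is SYSTEM*TRAN. */
   output out=work.by_system(where=(_type_=2))
          mean(resp_ms)=avg_resp_ms sum(cpu)=tot_cpu;
   output out=work.by_sys_tran(where=(_type_=3))
          mean(resp_ms)=avg_resp_ms max(resp_ms)=max_resp_ms;
run;
```

No PROC SORT is needed because CLASS processing does not require sorted input, and the detail rows are never written to disk as a separate intermediate file.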

SAS9 Threaded Procedures
- SORT
- SUMMARY/MEANS
- TABULATE
- REPORT
- SQL
- REG, GLM, LOESS, DMREG, DMINE

Scenario Two
- High-volume event data (time-oriented, e.g. ARM log)
- Transactions must be constructed from multiple event records
  - Type S: transaction start (ID, start time, code)
  - Type E: transaction end (ID, end time, CPU time)

Data arrival pattern

    Start 1
    Start 2
    End 1    (write out 1)
    Start 3
    End 2    (write out 2)
    Start 4
    Start 5
    End 4    (write out 4)
    End 5    (write out 5)
    End 3    (write out 3)

DATA Step Hash Table Support (New in SAS9)
- Can replace lookup formats
- Can have entries dynamically added, modified, and removed
- For this scenario, use a hash table to accumulate transaction records from start and end events.

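    data transactions / view=transactions;
      /* Create the hash table once, on the first iteration only */
      if _n_ = 1 then do;
        declare hash transactions();
        transactions.defineKey("tr_id");
        transactions.defineData("tr_start", "tr_code");
        transactions.defineDone();
      end;
      /* The source shows only "input" here; reading the record type
         and holding the line with a trailing @ is assumed */
      input type $ @;
      if type = 'S' then do;
        /* Start event: remember start time and code under the ID */
        input tr_id tr_code tr_start;
        rc = transactions.add();
      end;
      else if type = 'E' then do;
        /* End event: retrieve the start data, write the transaction,
           and free the hash entry */
        input tr_id tr_end tr_cpu;
        rc = transactions.find();
        response = tr_end - tr_start;
        output;
        rc = transactions.remove();
      end;
    run;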
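The matching logic can be tried on a few in-stream records that follow the arrival pattern of interleaved start and end events. This is a sketch under assumptions: the record layout, the sample values, and the hash name OPEN_TR are invented here, and an unmatched end event is simply skipped.

```sas
data work.transactions;
   length type $1 tr_code $4;

   /* Build the hash table once; it holds transactions still in flight */
   if _n_ = 1 then do;
      declare hash open_tr();
      open_tr.defineKey("tr_id");
      open_tr.defineData("tr_start", "tr_code");
      open_tr.defineDone();
   end;

   /* Read the record type and hold the line for a second INPUT */
   input type $ @;

   if type = 'S' then do;
      input tr_id tr_code $ tr_start;
      rc = open_tr.add();
   end;
   else if type = 'E' then do;
      input tr_id tr_end tr_cpu;
      /* FIND returns 0 when a matching start event is in the table */
      if open_tr.find() = 0 then do;
         response = tr_end - tr_start;
         output;
         rc = open_tr.remove();
      end;
   end;

   drop rc type;
   datalines;
S 1 PAY1 100
S 2 INQ2 101
E 1 104 0.05
S 3 PAY1 105
E 2 106 0.01
E 3 112 0.07
;
run;
```

Removing each entry as soon as its end event arrives keeps the table sized to the number of concurrently open transactions rather than to the total event count, which is what makes the approach workable without sorting the event stream by transaction ID.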

The Scalable Performance Data Engine (SPDE)
- New in SAS 9.1
- Included with Base SAS
- Available on all 9.1 platforms
- Advantages
  - Parallel data loading and index creation
  - Parallel reads and searches
  - Uses multiple indices to resolve a search

SPDE: Scalable Performance Data Engine
[Diagram: the SAS System on top of the Scalable Performance Data Engine, with separate metadata, data (data1, data2, data3, data4), and index components; the indexes are hybrid bitmap/B-tree.]

SAS SPDE Implementation on z/OS
- USS thread services
- USS directory-based file systems
  - zFS
  - HFS
  - NFS file systems
- Exploitation
  - Define file system
  - Change LIBNAME engine specification

SPDE Data Set Allocation on z/OS
- NFS: follow the same guidelines as for Open Systems.
- HFS: use separate HFS file systems for the DATA and INDEX components, perhaps multiple for DATA. Spread the HFS file systems across Shark (ESS 2105) loops.
- zFS: no special considerations! Use multi-volume zFS, particularly if
  - the storage system has Parallel Access Volumes (PAV)
  - the ESS has the Arrays Across Loops feature
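Changing the engine specification might look like the sketch below. Every directory path is a hypothetical placeholder for a file system defined at your site, and WORK.TRANSACTIONS stands in for any existing data set.

```sas
/* Primary path holds the metadata; DATAPATH= and INDEXPATH= place the
   data partitions and indexes on separate file systems.
   All paths are hypothetical. */
libname perf spde '/u/sasdata/spde/meta'
        datapath=('/u/sasdata/spde/data1' '/u/sasdata/spde/data2')
        indexpath=('/u/sasdata/spde/index');

/* Existing code then only needs to reference the new libref */
data perf.transactions;
   set work.transactions;
run;
```

Listing more than one DATAPATH= directory lets the engine spread data partitions across file systems, which is the mechanism behind the advice to use separate file systems for the DATA and INDEX components.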

Scalability in SAS 9.1: SAS Scalable Architecture in SAS Foundation
[Diagram: Scalable SAS/ACCESS (Oracle, DB2, Sybase, Teradata); Scalable Performance Data Access; SAS/CONNECT sessions on CPU 1, CPU 2, and a remote host; Threaded Procedures (Thread 1, Thread 2, ... Thread N); Piping.]

MP CONNECT Pipes
- New in SAS9
- Uses the TCP/IP socket engine
- Superior to the DATA step view approach
- Provides true multi-processing

    /* DATA STEP - PROCESS P1 */
    SIGNON P1 SASCMD='!SASCMD';
    RSUBMIT P1 WAIT=NO;
      /* Writer end of the pipe */
      LIBNAME OUTLIB SASESOCK ":PIPE1";
      data outlib.transactions;
        /* Create the hash table once, on the first iteration only */
        if _n_ = 1 then do;
          declare hash transactions();
          transactions.defineKey("tr_id");
          transactions.defineData("tr_start", "tr_code");
          transactions.defineDone();
        end;
        /* The source shows only "input" here; reading the record type
           and holding the line with a trailing @ is assumed */
        input type $ @;
        if type = 'S' then do;
          input tr_id tr_code tr_start;
          rc = transactions.add();
        end;
        else if type = 'E' then do;
          input tr_id tr_end tr_cpu;
          rc = transactions.find();
          response = tr_end - tr_start;
          output;
          rc = transactions.remove();
        end;
      run;
    ENDRSUBMIT;

    /* ---- SUMMARY - PROCESS P2 */
    SIGNON P2 SASCMD='!SASCMD';
    RSUBMIT P2 WAIT=NO;
      /* Reader end of the same pipe */
      LIBNAME INLIB SASESOCK ":PIPE1";
      proc summary data=inlib.transactions;
        CLASS statement;
        TYPES statement;
        OUTPUT statement(s);
      run;
      PROC PRINT; RUN;
    ENDRSUBMIT;

    /* Wait for both asynchronous sessions to finish */
    WAITFOR _ALL_ P1 P2;

In Summary
- Remember the importance of basic SAS program tuning skills, which have been well known for years.
- Take advantage of the significant SAS9 features, which can help you
  - Improve response and turnaround times
  - Improve availability times for BI applications by shortening the batch window
  - Reduce costs by cutting resource consumption and using the most effective combination of CPU, memory, and I/O resources
