Copyright © 2005, SAS Institute Inc. All rights reserved. Getting the Best Performance from V9 Threaded PROC SORT Scott Mebust System Developer Base Information.

Presentation transcript:

Copyright © 2005, SAS Institute Inc. All rights reserved. Getting the Best Performance from V9 Threaded PROC SORT Scott Mebust System Developer Base Information Technology

The (Unofficial) SAS Skydiving Team

Keys to Sorting Performance
- Know the conditions
- Observe actual performance
- Understand theoretical performance
- Make adjustments

Know the Conditions
- System
- SAS
- Sort job

System Conditions
- Operating System
  - Size of Virtual Memory
  - Swap file location
  - Load
    - Computational
    - Memory
    - Input/Output
- Hardware
  - Number of processors
  - Size of RAM
  - Storage Devices
    - Sustained Transfer Rate
    - Average positional latency
    - Average rotational latency

SAS Conditions
- Library Assignments
  - LIBNAME to logical location
  - Logical location to physical location
- System Options
  - Sort Choice
    - SORTPGM
    - SORTCUT
    - SORTCUTP
    - THREADS
    - CPUCOUNT
  - Memory Group
    - MEMSIZE
    - REALMEMSIZE
    - SORTSIZE
  - Other
    - UBUFSIZE
    - WORK
    - UTILLOC
    - SORTDUP
    - STIMER
    - MSGLEVEL

Sort Job Conditions
- Dataset (Input, Output)
  - Location
  - Dimensions
    - Size
    - # of observations
    - Observation length
  - Compression
  - Subsetting options
- Sort key
  - Length
  - Value characteristics
- Procedure Options
  - THREADS
  - DETAILS
  - TAGSORT
  - PSIZE
  - NODUPREC
  - NODUPKEY
- Utility file location

Observe Actual Performance
- Monitor System Activity
- Examine the SAS Log
- Measure System Capabilities

Identify and Observe Sorting Phases
[Figure: I/O Bound, External, Single-Threaded sort; regions labeled Sort Phase and Merge Phase]

Identify and Observe Sorting Phases
[Figure, two panels: CPU Bound, Internal, Single-Threaded; CPU Bound, External, Single-Threaded; regions labeled Sort Phase and Merge Phase]

Examine the SAS Log

  mrgcount = 1 mempage=16896 alocsize=24 isa=16896 osa=16896 xmisa=0 holds=2
  nway=24789 sortsize= memoryuse= keylen=16 reclen=8184 dkin=0 inrec= outrec=
  yieldobs=0 nruns=6 xcbpage=16896 npages= diskuse=
  NOTE: SAS sort was used.
  NOTE: PROCEDURE SORT used (Total process time):
        real time 5:35.68
        cpu time seconds

  NOTE: 6 sorted runs written to utility file.
  NOTE: Utility file contains pages of size bytes for a total of KB.
  NOTE: SAS threaded sort was used.
  NOTE: PROCEDURE SORT used (Total process time):
        real time 5:43.06
        cpu time 1:27.49

Measure Storage Device Sequential Transfer Rates From Within SAS
- Create a large dataset (e.g. 4 × RAM)
- Read the dataset, dumping to _NULL_
- Ensure real time ≫ CPU time
- Compute the transfer rate (R): R = F / t
  where
    F: size of the dataset (bytes)
    t: real time (seconds)
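
The transfer-rate measurement above comes down to dividing the dataset size by the real time reported in the SAS log. A minimal sketch, assuming the log reports real time in the `m:ss.ss` style seen elsewhere in this deck; the function names and the sample figures are illustrative, not part of the original slides.

```python
def parse_real_time(text: str) -> float:
    """Convert a log time such as '5:35.68' or '42.10' to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        # Each colon-separated field is worth 60 times the next one.
        seconds = seconds * 60 + float(part)
    return seconds


def transfer_rate(file_size_bytes: float, real_time_seconds: float) -> float:
    """Sequential transfer rate R = F / t, in bytes per second."""
    if real_time_seconds <= 0:
        raise ValueError("real time must be positive")
    return file_size_bytes / real_time_seconds


# Hypothetical measurement: a 4 GiB dataset read in 1:20.00 of real time.
rate = transfer_rate(4 * 1024**3, parse_real_time("1:20.00"))
print(f"{rate / 1024**2:.1f} MiB/s")
```

The figure is only meaningful when real time is much larger than CPU time, as the slide requires; otherwise the read was not limited by the storage device.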

Measure In-Core Sorting Costs and Extrapolate
[Table: columns N, Actual, ln(N), Actual, Predicted, under the headings CPU Time (seconds), Normalized CPU Time, CPU Time (seconds); values not shown]
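
The extrapolation can be sketched as follows, assuming that "normalized CPU time" means measured CPU time divided by N·ln(N), the usual cost model for a comparison sort. That reading is an assumption drawn from the column headings; the function names and measurements below are hypothetical.

```python
import math


def normalized_cost(n_obs: int, cpu_seconds: float) -> float:
    """CPU seconds per unit of N * ln(N) for one measured in-core sort."""
    return cpu_seconds / (n_obs * math.log(n_obs))


def predict_cpu_seconds(n_obs: int, cost_per_unit: float) -> float:
    """Extrapolate CPU time for a sort of n_obs observations."""
    return cost_per_unit * n_obs * math.log(n_obs)


# Hypothetical measurements: (number of observations, CPU seconds).
measurements = [(1_000_000, 1.4), (2_000_000, 2.9), (4_000_000, 6.1)]

# Average the normalized cost over the measured jobs, then extrapolate.
cost = sum(normalized_cost(n, t) for n, t in measurements) / len(measurements)
print(f"predicted CPU time for 50M observations: "
      f"{predict_cpu_seconds(50_000_000, cost):.1f} s")
```

Very small jobs are dominated by fixed overhead, so they are best left out when averaging the normalized cost.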

Measure In-Core Sorting Costs
[Figure: annotated "Small job overhead"]

Understand Theoretical Performance
- Classify the job
- Estimate SORT running time
- Consider estimation hazards

Classify the Job: Performance Limitation
- Compute Bound
- I/O Bound
- Mixed

Classify the Job: Size
where
  F: size of input dataset
  O: size of internal sorting overhead
  M: size of RAM
  B: utility file page (block) size

Internal (in-core) Sorting
[Figure labels: Random Access Memory, Sorting Overhead, Data; Input, RAM, Output]

External (out-of-core) Sorting
[Figure labels: Random Access Memory, Sorting Overhead, Data]

External Sorting – Data Flow
[Figure, two panels. Single-Pass: Input, RAM, Temp, Output. Double-Pass: Input, RAM, Temp (1st Half, 2nd Half), Output]

Estimate the Running Time: Internal Sort, I/O Bound
[Figure: Input, RAM, Output; transfers labeled Sequential Read, Sequential Write]
where
  t: real time (sec)
  F: dataset size (bytes)
  R: transfer rate (bytes/sec)
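
For an I/O-bound internal sort, the data is read once sequentially and written once sequentially, so a natural estimate is t = F/R for the read plus F/R for the write. The sketch below assumes exactly that; the separate read and write rates are an illustrative extension, not something the slide states.

```python
from typing import Optional


def internal_sort_io_time(dataset_bytes: float,
                          read_rate: float,
                          write_rate: Optional[float] = None) -> float:
    """Estimated real time (seconds) for an I/O-bound internal sort.

    Assumes one sequential read of the input plus one sequential
    write of the output. If no write rate is given, the read rate
    is used for both transfers.
    """
    if write_rate is None:
        write_rate = read_rate
    return dataset_bytes / read_rate + dataset_bytes / write_rate


# Hypothetical job: 2 GiB dataset, 50 MiB/s sustained transfer rate.
print(f"{internal_sort_io_time(2 * 1024**3, 50 * 1024**2):.1f} s")
```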

Estimate the Running Time: Single-Pass External, I/O Bound
[Figure: Input, RAM, Temp, Output; transfers labeled Sequential Read, Sequential Write, Random Read, Sequential Write]
where
  U: utility file size (bytes)

Utility File Read Time
File Size
  Single-threaded:
  Multi-threaded:
  where
    F: size of input dataset
    o: # of observations × sort key length
Number of Pages
  where
    B: utility file page (block) size
Best Case (Sequential) Read Time
Worst Case (Random) Read Time
  where
    s: average positional latency
    r: average rotational latency
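
The slide's formulas did not carry over, so the following is a sketch under stated assumptions rather than the author's equations: the best case reads the utility file sequentially (U/R), the worst case pays one positional and one rotational latency per page on top of the transfer, and a single-pass external sort reads the input, writes the utility file, reads it back, and writes the output. U is taken as an input because the slide's file-size formulas are not shown.

```python
import math


def utility_read_time_best(U: float, R: float) -> float:
    """Best case: the utility file is read back sequentially."""
    return U / R


def utility_read_time_worst(U: float, B: float, R: float,
                            s: float, r: float) -> float:
    """Worst case (assumed model): every page read is a random access,
    costing positional latency s plus rotational latency r, in addition
    to the time to transfer the data itself."""
    pages = math.ceil(U / B)
    return pages * (s + r) + U / R


def single_pass_external_time(F: float, U: float, R: float,
                              utility_read_time: float) -> float:
    """Read input, write utility file, read utility file, write output."""
    return F / R + U / R + utility_read_time + F / R


# Hypothetical job: 8 GiB input and utility file, 64 KiB pages,
# 50 MiB/s transfer rate, 8 ms positional and 4 ms rotational latency.
F = U = 8 * 1024**3
R = 50 * 1024**2
best = single_pass_external_time(F, U, R, utility_read_time_best(U, R))
worst = single_pass_external_time(
    F, U, R, utility_read_time_worst(U, 64 * 1024, R, 0.008, 0.004))
print(f"best {best:.0f} s, worst {worst:.0f} s")
```

The gap between the two cases shrinks as the page size B grows, because fewer pages mean fewer latency penalties.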

Multi-Pass External Sorting
- Number of Sorted Runs
- Number of Utility File Passes (depends on the Maximum External Merge Order)
where
  F: size of input dataset
  O: size of internal sorting overhead
  M: SORTSIZE
  B: utility file page (block) size
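
The quantities named on this slide can be illustrated with the textbook external merge sort model. The formulas below are assumptions, not the slide's own: each run holds M − O bytes of data, the merge order is the number of pages of size B that fit in M − O, and each pass over the utility file merges that many runs at a time. The function name and return layout are hypothetical.

```python
def plan_external_sort(F: int, O: int, M: int, B: int) -> dict:
    """Classify a sort job and estimate runs and utility file passes.

    F: input size, O: sorting overhead, M: sort memory, B: page size
    (all in bytes). Uses a textbook model, not SAS internals.
    """
    usable = M - O
    if usable <= 0 or B <= 0:
        raise ValueError("need M > O and B > 0")
    if F <= usable:
        # Everything fits in memory: no utility file is needed.
        return {"kind": "internal", "runs": 1, "merge_order": None, "passes": 0}

    runs = -(-F // usable)        # ceiling division
    merge_order = usable // B     # one page buffer per run being merged
    if merge_order < 2:
        raise ValueError("page size too large to merge at least two runs")

    passes, remaining = 0, runs
    while remaining > 1:
        remaining = -(-remaining // merge_order)
        passes += 1

    kind = "single-pass external" if passes == 1 else "multi-pass external"
    return {"kind": kind, "runs": runs, "merge_order": merge_order,
            "passes": passes}


# Hypothetical job: 10 GiB input, 64 MiB sort memory, 4 MiB overhead,
# 64 KiB utility file pages.
print(plan_external_sort(10 * 1024**3, 4 * 1024**2, 64 * 1024**2, 64 * 1024))
```

Under this model a larger page size lowers the merge order, which is the trade-off against the per-page latency cost when reading the utility file.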

Estimate the Running Time: Single-Pass External, Compute Bound
[Figure: Input, RAM, Temp, Output; transfers labeled Sequential Write, Random Read]

Single-Pass External, Compute Bound
Utility File Creation Time
  where
    n_obs is the total number of observations in the dataset
    t_run is the time required to perform an in-memory sort of the number of observations in a single run
Utility File Merge Time, Compute Bound
Utility File Read Time: as previously described for I/O bound
Utility File Merge Time, I/O Bound
  Worst Case:
  Best Case:
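
One plausible reading of the utility file creation time, when the job is compute bound, is the number of sorted runs multiplied by the time to sort one run in memory. The sketch below assumes that reading; `n_run` (observations per run) is a hypothetical name for a symbol the slide does not spell out, and the figures are illustrative.

```python
import math


def utility_file_creation_time(n_obs: int, n_run: int, t_run: float) -> float:
    """Compute-bound estimate: (number of runs) x (time to sort one run).

    n_obs: total observations in the dataset
    n_run: observations in a single run (assumed symbol name)
    t_run: seconds to sort one run in memory
    """
    if n_run <= 0:
        raise ValueError("n_run must be positive")
    runs = math.ceil(n_obs / n_run)
    return runs * t_run


# Hypothetical job: 6 million observations, 1 million per run,
# 12 seconds of CPU time to sort each run.
print(utility_file_creation_time(6_000_000, 1_000_000, 12.0))
```

The per-run time t_run can be taken from an in-core measurement of the kind described earlier in the deck.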

Consider Estimation Hazards
- File cache effects
- Pseudo-internal sorting (thrashing)
- Pseudo-external sorting (file cache)
- Limitations within each sorting phase

Pseudo-Internal Sorting
[Figure labels: Random Access Memory, Virtual Memory (RAM+swap), Sorting Overhead, Data, SORTSIZE]

Pseudo-External Sorting
[Figure labels: Random Access Memory, Overhead, Data, SORTSIZE, File Cache, Utility File]

Make Adjustments
- Determine if there is a problem
- Identify the problem
- Alter the conditions
- Re-evaluate

Identify the Problem
- Processing speed
- Memory
- External Storage

Alter the Conditions
- Memory settings
- Library to storage device mappings
- Utility file location
- Utility file page size

Memory Group Option Settings
[Figure labels: Random Access Memory, Virtual Memory (RAM+swap), SORTSIZE, MEMSIZE, REALMEMSIZE, SAS, Other Active Processes, Operating System]
