Getting the Best Performance from V9 Threaded PROC SORT
Scott Mebust
System Developer
Base Information Technology
Copyright © 2005, SAS Institute Inc. All rights reserved.
The (Unofficial) SAS Skydiving Team
Keys to Sorting Performance
  Know the conditions
  Observe actual performance
  Understand theoretical performance
  Make adjustments
Know the Conditions
  System
  SAS
  Sort job
System Conditions
  Operating System
    Size of Virtual Memory
    Swap file location
    Load
      − Computational
      − Memory
      − Input/Output
  Hardware
    Number of processors
    Size of RAM
    Storage Devices
      − Sustained Transfer Rate
      − Average positional latency
      − Average rotational latency
SAS Conditions
  Library Assignments
    LIBNAME to logical location
    Logical location to physical location
  System Options
    Sort Choice
      − SORTPGM
      − SORTCUT
      − SORTCUTP
      − THREADS
      − CPUCOUNT
    Memory Group
      − MEMSIZE
      − REALMEMSIZE
      − SORTSIZE
    Other
      − UBUFSIZE
      − WORK
      − UTILLOC
      − SORTDUP
      − STIMER
      − MSGLEVEL
Sort Job Conditions
  Dataset (Input, Output)
    Location
    Dimensions
      − Size
      − # of observations
      − Observation length
    Compression
    Subsetting options
  Sort key
    Length
    Value characteristics
  Procedure Options
    THREADS
    DETAILS
    TAGSORT
    PSIZE
    NODUPREC
    NODUPKEY
  Utility file location
Observe Actual Performance
  Monitor System Activity
  Examine the SAS Log
  Measure System Capabilities
Identify and Observe Sorting Phases
  [Figure: Sort Phase and Merge Phase of an I/O Bound, External, Single-Threaded sort]
Identify and Observe Sorting Phases
  [Figure, two panels: "CPU Bound, Internal, Single-Threaded" and "CPU Bound, External, Single-Threaded"; Sort Phase and Merge Phase are marked]
Examine the SAS Log
  mrgcount = 1 mempage=16896 alocsize=24 isa=16896 osa=16896 xmisa=0
  holds=2 nway=24789 sortsize= memoryuse= keylen=16 reclen=8184
  dkin=0 inrec= outrec= yieldobs=0 nruns=6 xcbpage=16896 npages= diskuse=
  NOTE: SAS sort was used.
  NOTE: PROCEDURE SORT used (Total process time):
        real time           5:35.68
        cpu time            seconds

  NOTE: 6 sorted runs written to utility file.
  NOTE: Utility file contains pages of size bytes for a total of KB.
  NOTE: SAS threaded sort was used.
  NOTE: PROCEDURE SORT used (Total process time):
        real time           5:43.06
        cpu time            1:27.49
Measure Storage Device Sequential Transfer Rates
  From Within SAS
    Create a large dataset (e.g. 4 × RAM)
    Read the dataset, dumping to _NULL_
    Ensure Real time ≫ CPU time
    Compute the transfer rate (R):
      R = F / t
    Where
      F: size of the dataset (bytes)
      t: real time (seconds)
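The transfer-rate calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures (an 8 GiB dataset read in 160 seconds) are hypothetical, not measurements from the presentation.

```python
def transfer_rate(file_size_bytes: float, real_time_seconds: float) -> float:
    """Sustained sequential transfer rate R = F / t, in bytes per second."""
    if real_time_seconds <= 0:
        raise ValueError("real time must be positive")
    return file_size_bytes / real_time_seconds


# Hypothetical measurement: F = 8 GiB read to _NULL_ in t = 160 s of real time.
F = 8 * 1024**3
t = 160.0
R = transfer_rate(F, t)
print(f"R = {R / 1024**2:.1f} MiB/s")
```

The result is meaningful only if real time greatly exceeds CPU time for the read, since otherwise the step measures computation rather than the storage device.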
Measure In-Core Sorting Costs and Extrapolate
  [Table columns: N; CPU Time (seconds), Actual; ln(N); Normalized CPU Time, Actual; CPU Time (seconds), Predicted]
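One way to carry out this kind of extrapolation is sketched below. It assumes the usual comparison-sort cost model, CPU time ≈ c · N · ln(N), which is consistent with the ln(N) and normalized-time columns but is not stated explicitly in the slide. The function names and the sample timings are hypothetical.

```python
import math


def fit_sort_constant(samples):
    """Average the normalized cost c = t / (N * ln N) over (N, cpu_seconds) samples."""
    normalized = [t / (n * math.log(n)) for n, t in samples]
    return sum(normalized) / len(normalized)


def predict_cpu_time(c, n):
    """Predicted in-core sort CPU time for n observations under t = c * N * ln(N)."""
    return c * n * math.log(n)


# Hypothetical in-core measurements: (number of observations, CPU seconds).
samples = [(100_000, 0.25), (1_000_000, 2.9), (4_000_000, 12.6)]
c = fit_sort_constant(samples)
print(f"c = {c:.3e} s per N*ln(N)")
print(f"predicted CPU time for 50,000,000 obs: {predict_cpu_time(c, 50_000_000):.1f} s")
```

Very small jobs tend to be dominated by fixed overhead, so samples near the low end may inflate the fitted constant.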
Measure In-Core Sorting Costs
  [Figure: small job overhead]
Understand Theoretical Performance
  Classify the job
  Estimate SORT running time
  Consider estimation hazards
Classify the Job
  Performance Limitation
    Compute Bound
    I/O Bound
    Mixed
Classify the Job
  Size
    Where
      F: size of input dataset
      O: size of internal sorting overhead
      M: size of RAM
      B: utility file page (block) size
Internal (in-core) Sorting
  [Figure: Input, RAM, Output; Random Access Memory holds the Sorting Overhead and the Data]
External (out-of-core) Sorting
  [Figure: Random Access Memory, Sorting Overhead, Data]
External Sorting – Data Flow
  [Figure, two panels. Single-Pass: Input, RAM, Temp, Output. Double-Pass: Input, RAM, Temp (1st Half, 2nd Half), Output]
Estimate the Running Time
  Internal Sort, I/O Bound
    [Figure: Input to RAM (Sequential Read), RAM to Output (Sequential Write)]
    Where
      t: real time (sec)
      F: dataset size (bytes)
      R: transfer rate (bytes/sec)
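A minimal sketch of this estimate follows. It assumes the running time is one sequential read of the input plus one sequential write of the output, with no overlap between the two, so t = F / R_read + F / R_write. That form, the separate read and write rates, and the sample numbers are assumptions made for illustration.

```python
def internal_io_bound_time(F: float, R_read: float, R_write: float) -> float:
    """Estimated real time (s) for an I/O-bound internal sort.

    Assumes one sequential read of F bytes followed by one sequential
    write of F bytes, with no overlap between reading and writing.
    """
    return F / R_read + F / R_write


# Hypothetical job: 1 GB dataset, 50 MB/s read and 50 MB/s write.
print(internal_io_bound_time(1e9, 50e6, 50e6), "seconds")
```

If input and output sit on different devices and the operating system overlaps the transfers, the actual time can be lower than this estimate.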
Estimate the Running Time
  Single-Pass External, I/O Bound
    [Figure: Input to RAM (Sequential Read), RAM to Temp (Sequential Write), Temp to RAM (Random Read), RAM to Output (Sequential Write)]
    Where
      U: utility file size (bytes)
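The four transfers in the diagram can be summed as in the sketch below. It assumes the transfers do not overlap and that the utility file read time is supplied separately, because that time depends on how random the reads turn out to be. The function name, parameter names, and figures are hypothetical.

```python
def single_pass_external_io_time(F, U, R_in, R_util_write, R_out, util_read_time):
    """Estimated real time (s) for an I/O-bound, single-pass external sort.

    Sums four non-overlapping transfers (an assumption):
      sequential read of the input        F / R_in
      sequential write of the utility     U / R_util_write
      read of the utility file            util_read_time (supplied by caller)
      sequential write of the output      F / R_out
    """
    return F / R_in + U / R_util_write + util_read_time + F / R_out


# Hypothetical job: 4 GB input, 4 GB utility file, 50 MB/s devices,
# and a utility file read estimated at 120 s.
print(single_pass_external_io_time(4e9, 4e9, 50e6, 50e6, 50e6, 120.0), "seconds")
```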
Utility File Read Time
  File Size
    Single-threaded:
    Multi-threaded:
    where
      F: size of input dataset
      o: # of observations × sort key length
  Number of Pages
    where
      B: utility file page (block) size
  Best Case (Sequential) Read Time
  Worst Case (Random) Read Time
    where
      s: average positional latency
      r: average rotational latency
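The best-case and worst-case read times might be bounded as in the sketch below. It assumes the best case is a purely sequential read, U / R, and that in the worst case every page costs one positional latency, one rotational latency, and one page transfer. These forms are a reconstruction from the listed symbols, not the slide's own equations, and the sample values are hypothetical.

```python
import math


def utility_read_time_bounds(U, B, R, s, r):
    """Return (best, worst) utility file read times in seconds.

    U: utility file size (bytes)      B: page (block) size (bytes)
    R: sustained transfer rate (B/s)  s: average positional latency (s)
    r: average rotational latency (s)
    """
    pages = math.ceil(U / B)          # number of pages in the utility file
    best = U / R                      # fully sequential read
    worst = pages * (s + r + B / R)   # every page read incurs a seek (assumed)
    return best, worst


# Hypothetical device: 50 MB/s, 8 ms positional and 4 ms rotational latency.
best, worst = utility_read_time_bounds(U=4e9, B=65536, R=50e6, s=0.008, r=0.004)
print(f"best {best:.0f} s, worst {worst:.0f} s")
```

Because the worst case scales with the number of pages, a larger page size B narrows the gap between the two bounds.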
Multi-Pass External Sorting
  Number of Sorted Runs
  Number of Utility File Passes
  Maximum External Merge Order
    where
      F: size of input dataset
      O: size of internal sorting overhead
      M: SORTSIZE
      B: utility file page (block) size
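The relationship among these quantities can be sketched with textbook external-sort arithmetic. The forms used here are assumptions, not the slide's own formulas: runs = ceil((F + O) / M), maximum merge order = floor(M / B) (one page buffer per run), and each pass reduces the run count by a factor of the merge order. PROC SORT's actual accounting may differ.

```python
import math


def sorted_runs(F, O, M):
    """Number of sorted runs, assuming each run fills M bytes of data plus overhead."""
    return math.ceil((F + O) / M)


def max_merge_order(M, B):
    """Maximum number of runs merged at once, assuming one B-byte page buffer per run."""
    return M // B


def utility_file_passes(runs, order):
    """Merge passes needed to reduce `runs` sorted runs to one (0 if already one)."""
    if order < 2:
        raise ValueError("merge order must be at least 2")
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / order)
        passes += 1
    return passes


# Hypothetical sizes in arbitrary units: F=1000, O=200, M=100, B=10.
runs = sorted_runs(1000, 200, 100)    # 12 runs
order = max_merge_order(100, 10)      # merge order 10
print(runs, order, utility_file_passes(runs, order))
```

Under this model a job stays single-pass as long as the number of runs does not exceed the merge order, which is why a smaller page size or a larger SORTSIZE can avoid a second pass.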
Estimate the Running Time
  Single-Pass External, Compute Bound
    [Figure: Input, RAM, Temp, Output; RAM to Temp (Sequential Write), Temp to RAM (Random Read)]
Single-Pass External, Compute Bound
  Utility File Creation Time
    where
      n_obs is the total number of observations in the dataset
      t_run is the time required to perform an in-memory sort of the number of observations in a single run
  Utility File Merge Time, Compute Bound
  Utility File Merge Time, I/O Bound
    As previously described for I/O bound Utility File Read Time
    Worst Case:
    Best Case:
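A rough sketch of the utility file creation time for a compute-bound job follows. It assumes the time is the number of runs multiplied by the in-memory sort time of one run, that is (n_obs / n_run) · t_run, where n_run (observations per run) is a name introduced here for illustration. The slide's own formula was not recoverable, so this form is an assumption.

```python
def utility_file_creation_time(n_obs: int, n_run: int, t_run: float) -> float:
    """Estimated time (s) to build the utility file when the sort phase is compute bound.

    n_obs: total number of observations in the dataset
    n_run: observations held in a single sorted run (hypothetical name)
    t_run: time to sort one run in memory (s)
    """
    if n_run <= 0:
        raise ValueError("n_run must be positive")
    return (n_obs / n_run) * t_run


# Hypothetical job: 6,000,000 observations, 1,000,000 per run, 12 s per run.
print(utility_file_creation_time(6_000_000, 1_000_000, 12.0), "seconds")
```

The value of t_run can come from direct in-core measurements of a run-sized sort.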
Consider Estimation Hazards
  File cache effects
  Pseudo-internal sorting (thrashing)
  Pseudo-external sorting (file cache)
  Limitations within each sorting phase
Pseudo-Internal Sorting
  [Figure: Random Access Memory, Virtual Memory (RAM+swap), SORTSIZE, Sorting Overhead, Data]
Pseudo-External Sorting
  [Figure: Random Access Memory, SORTSIZE, Overhead, Data, File Cache, Utility File]
Make Adjustments
  Determine if there is a problem
  Identify the problem
  Alter the conditions
  Re-evaluate
Identify the Problem
  Processing speed
  Memory
  External Storage
Alter the Conditions
  Memory settings
  Library to storage device mappings
  Utility file location
  Utility file page size
Memory Group Option Settings
  [Figure: Random Access Memory and Virtual Memory (RAM+swap), divided among SAS, Other Active Processes, and the Operating System; SORTSIZE, MEMSIZE, and REALMEMSIZE are marked]