Published by Virgil Higgins. Modified over 9 years ago.
1
Hash vs Join
A case study evaluating the use of the data step hash object to replace a SQL join
Geoff Ness, Sep 2014
2
The Hash Object
Effectively a lookup table which resides in memory (key/value pairs)
Similar to associative arrays or dictionaries in other programming languages
Fast lookup (O(1)), no sorting required
Can offer a faster alternative to a traditional data step merge or SQL join, at a price:
– The syntax is unfamiliar to a lot of SAS programmers
– There’s more code to write
– Requires more memory than a join (sometimes much more)
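As a minimal sketch of the lookup-table idea, the step below loads a small dataset into a hash object and looks up one key. The dataset `products` and its columns `code` and `name` are hypothetical names chosen for illustration; they do not come from the slides.

```sas
/* Hypothetical lookup data: one row per product code */
data products;
  input code $ name :$20.;
  datalines;
A1 Widget
B2 Gadget
;
run;

data _null_;
  /* Host variables must exist with matching types and lengths */
  length code $8 name $20;

  /* Load the dataset into memory, keyed on code */
  declare hash h(dataset: 'products');
  h.defineKey('code');
  h.defineData('name');
  h.defineDone();

  call missing(name);   /* avoids an "uninitialized" note in the log */

  code = 'B2';
  rc = h.find();        /* rc = 0 means the key was found */
  if rc = 0 then put code= name=;
run;
```

No sort or index is needed on `products`; the lookup goes straight to the key.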
3
Using Hash to replace a SQL Join
[Diagram: a fact table linked to Dimension 1, Dimension 2, Dimension 3 and Dimension 4]
4
SQL Join
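The code from this slide is not preserved in the transcript. As a stand-in, here is a sketch of the kind of join being discussed, assuming inner joins between a fact table and two dimensions. The table names (`fact`, `dim1`, `dim2`) and all column names are hypothetical.

```sas
/* Hypothetical dimension and fact tables */
data dim1;
  input d1_key d1_name $;
  datalines;
1 North
2 South
;
run;

data dim2;
  input d2_key d2_name $;
  datalines;
10 Retail
20 Online
;
run;

data fact;
  input d1_key d2_key amount;
  datalines;
1 10 100
2 20 250
3 10 75
;
run;

proc sql;
  create table joined_sql as
  select f.d1_key, f.d2_key, f.amount, d1.d1_name, d2.d2_name
  from fact as f
       inner join dim1 as d1 on f.d1_key = d1.d1_key
       inner join dim2 as d2 on f.d2_key = d2.d2_key;
quit;
```

The third fact row has no match in `dim1`, so an inner join drops it.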
5
Alternative using the Hash Object
Replacing the join typically requires 3 steps to be coded:
1 - Create the variables by ‘faking’ a set statement (one that is compiled but never executed):
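One common way to write step 1 is `if 0 then set ...`. The slide's own code is not preserved, so treat this as an assumed idiom rather than the author's exact code; `dim1` and `dim2` are hypothetical tables. The condition is never true, so no rows are read, but the compiler still adds the dimension columns to the step with the right types and lengths.

```sas
/* Hypothetical dimension tables */
data dim1;
  input d1_key d1_name $;
  datalines;
1 North
2 South
;
run;

data dim2;
  input d2_key d2_name $;
  datalines;
10 Retail
20 Online
;
run;

data step1_only;
  if 0 then set dim1 dim2;  /* compiled, never executed */
  stop;                     /* end the step without writing any rows */
run;

/* Shows that d1_key, d1_name, d2_key and d2_name were defined */
proc contents data=step1_only;
run;
```

The output dataset has zero rows but carries all four variables, which is what the hash objects need as host variables.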
6
2 - Then declare hash objects for each dimension:
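A sketch of step 2 under assumed names (`dim1`, `dim2` and their columns are hypothetical): one hash object is declared per dimension, each loaded from its dataset and keyed on the join column. The `num_items` attribute is used here only to confirm that the rows were loaded.

```sas
/* Hypothetical dimension tables */
data dim1;
  input d1_key d1_name $;
  datalines;
1 North
2 South
;
run;

data dim2;
  input d2_key d2_name $;
  datalines;
10 Retail
20 Online
;
run;

data _null_;
  if 0 then set dim1 dim2;           /* step 1: define host variables */

  declare hash h1(dataset: 'dim1');  /* one hash object per dimension */
  h1.defineKey('d1_key');
  h1.defineData('d1_name');
  h1.defineDone();

  declare hash h2(dataset: 'dim2');
  h2.defineKey('d2_key');
  h2.defineData('d2_name');
  h2.defineDone();

  n1 = h1.num_items;                 /* rows loaded into each object */
  n2 = h2.num_items;
  put n1= n2=;
  stop;
run;
```

In a step that also reads the fact table, these declarations are usually wrapped in `if _n_ = 1 then do; ... end;` so the dimensions are loaded only once.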
7
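3 - Finally, join rows from the fact to rows in the dimensions by calling the hash .find() method:
The .find() method returns 0 when the hash object holds a row whose key matches the current value of the column named in .definekey(); the variables named in .definedata() are then populated with that row’s values. A non-zero return code means no match was found.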
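Putting the three steps together, a hash-based equivalent of an inner join might look like the sketch below. All table and column names are hypothetical, and the inner-join behaviour (output only when every lookup succeeds) is an assumption, since the slide's code is not preserved.

```sas
/* Hypothetical dimension and fact tables */
data dim1;
  input d1_key d1_name $;
  datalines;
1 North
2 South
;
run;

data dim2;
  input d2_key d2_name $;
  datalines;
10 Retail
20 Online
;
run;

data fact;
  input d1_key d2_key amount;
  datalines;
1 10 100
2 20 250
3 10 75
;
run;

data joined_hash;
  if 0 then set dim1 dim2;             /* step 1: define variables */

  if _n_ = 1 then do;                  /* step 2: load dimensions once */
    declare hash h1(dataset: 'dim1');
    h1.defineKey('d1_key');
    h1.defineData('d1_name');
    h1.defineDone();

    declare hash h2(dataset: 'dim2');
    h2.defineKey('d2_key');
    h2.defineData('d2_name');
    h2.defineDone();
  end;

  set fact;                            /* step 3: one lookup per fact row */
  rc1 = h1.find();
  rc2 = h2.find();
  if rc1 = 0 and rc2 = 0 then output;  /* keep fully matched rows only */
  drop rc1 rc2;
run;
```

The fact table is read once, in its existing order, and neither it nor the dimensions need to be sorted.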
8
Performance Comparison When joining 2 dimensions, small fact (100K rows):
9
Joining 2 dimensions, large fact (~10M rows):
10
Joining 9 dimensions, small fact (100K rows):
11
Joining 9 dimensions, large fact (~10M rows):
12
Stuff we haven’t considered
Outer joins (yes these are possible)
When proc sql will use the hash object ‘under the covers’
Performance against RDBMS tables (as opposed to SAS datasets)
Hash iterators
Other things that can be done with the hash object (sorting, summarisation, de-duplication)
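To illustrate the outer-join point, here is one possible sketch of a left join: every fact row is kept, and the dimension column is set to missing when the lookup fails. The tables and columns are hypothetical, and this is only one of several ways to write it.

```sas
/* Hypothetical dimension and fact tables */
data dim1;
  input d1_key d1_name $;
  datalines;
1 North
2 South
;
run;

data fact;
  input d1_key amount;
  datalines;
1 100
2 250
3 75
;
run;

data left_joined;
  if 0 then set dim1;

  if _n_ = 1 then do;
    declare hash h1(dataset: 'dim1');
    h1.defineKey('d1_key');
    h1.defineData('d1_name');
    h1.defineDone();
  end;

  set fact;
  /* On a failed lookup d1_name still holds the previous row's value, */
  /* so it must be cleared explicitly                                 */
  if h1.find() ne 0 then call missing(d1_name);
run;
```

Without the `call missing`, unmatched fact rows would silently inherit the dimension values of the last matched row.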
13
Summary
Implementing a join using the hash object can provide a considerable saving in terms of time, usually at the expense of memory
The code is a little more involved but breaks down to a reasonably simple process to implement
Things to consider:
– The number and size of tables involved
– The memory required to hold all the hash objects at once
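For the memory consideration, a rough estimate can be taken from the hash object's `item_size` and `num_items` attributes. The sketch below uses a hypothetical `dim1` table. The result is an approximation only, because `item_size` does not include the object's fixed overhead.

```sas
/* Hypothetical dimension table */
data dim1;
  input d1_key d1_name $;
  datalines;
1 North
2 South
;
run;

data _null_;
  if 0 then set dim1;

  declare hash h1(dataset: 'dim1');
  h1.defineKey('d1_key');
  h1.defineData('d1_name');
  h1.defineDone();

  size = h1.item_size;       /* bytes per stored item */
  n    = h1.num_items;       /* number of items loaded */
  approx_bytes = size * n;   /* rough total, excluding fixed overhead */
  put size= n= approx_bytes=;
  stop;
run;
```

Running this against a sample of each dimension gives a per-row figure that can be scaled up to the full tables before committing to the hash approach.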
14
References
The SAS® Hash Object in Action
http://support.sas.com/resources/papers/proceedings09/153-2009.pdf
Introduction to SAS® Hash Objects
http://www.scsug.org/wp-content/uploads/2013/11/Introduction-to-SAS%C2%AE-Hash-Objects-Chris-Schacherer.pdf
A Hash Alternative to the PROC SQL Left Join
http://www.nesug.org/proceedings/nesug06/dm/da07.pdf
Using the Hash Object – SAS® Language Reference: Concepts
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002585310.htm
15
Questions?