1
OASUS Spring or Fall YYYY
SAS Hash Object
Improving the performance of your SAS programs using the Hash Object
Thursday, September-20-18
2
Outline
- Introduction
- What is Hashing?
- Why use hash objects?
- Ex. & benchmark – Simple join
- Ex. & benchmark – Trigram matching
- Other examples
- Conclusion
- References
Vincent Martin, Statistics Canada
3
Introduction
- The Hash Object was the first DATA step object, introduced in SAS 9.0.
- It marked a major change in SAS philosophy.
What is a Hash Object? In short:
- A lookup table
- Stored entirely in memory
- Providing nearly O(1) searches
4
What is hashing?
5
What is hashing? (continued)
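As a rough illustration of the idea behind hashing, the sketch below maps a few made-up keys to one of 8 buckets. It is not how the SAS hash object is implemented internally; the key values, the bucket count and the use of MD5 as the hashing function are all assumptions chosen for the example. The point is only that a key's location is computed directly from the key, rather than found by scanning, which is what makes lookups nearly O(1).

```sas
/* Illustration only: map string keys to one of 8 buckets.          */
/* The keys, the bucket count and the use of MD5 are assumptions.   */
data _null_;
  length key $8;
  n_buckets = 8;
  do key = 'A001', 'B417', 'C230';
    /* Read the first 4 bytes of the MD5 digest as a positive integer. */
    h = input(put(md5(strip(key)), $hex8.), hex8.);
    /* The remainder selects the bucket, numbered 1 to n_buckets. */
    bucket = mod(h, n_buckets) + 1;
    put key= bucket=;
  end;
run;
```

Two different keys can land in the same bucket; a real hash table has to resolve such collisions, which is why the search cost is "nearly" rather than exactly constant.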
6
Why use Hash Objects?
- Huge time savings on routine tasks
- Improve code readability
- Improve vertical processing
- In-memory sorting of newly created dataset
- Maintain hierarchical data structure logic through nested hashing
- Flexible data driven programming
7
Hash Object Syntax and Example (inner)
Hash object (DATA step):

    data want;
      length ID $8 age 8;
      /* Load the lookup table into memory once, on the first iteration. */
      if _N_=1 then do;
        declare hash myhash(dataset: 'work.masterfile');
        myhash.defineKey('ID');
        myhash.defineData('age');
        myhash.defineDone();
      end;
      set work.transactions;
      /* find() returns 0 on a match and copies age into the current row. */
      if myhash.find(key: patient_id)=0 then output;
      drop id;
    run;

PROC SQL counterpart:

    proc sql;
      create table want as
      select t1.*, t2.age
      from work.transactions as t1
      inner join work.masterfile(keep=id age) as t2
        on t1.patient_id = t2.id;
    quit;
8
Hash Object Syntax and Example (left)
Hash object (DATA step):

    data want;
      length ID $8 age 8;
      /* Load the lookup table into memory once, on the first iteration. */
      if _N_=1 then do;
        declare hash myhash(dataset: 'work.masterfile');
        myhash.defineKey('ID');
        myhash.defineData('age');
        myhash.defineDone();
      end;
      set work.transactions;
      /* Every transaction is kept; age stays missing when there is no match. */
      rc=myhash.find(key: patient_id);
      drop id rc;
    run;

PROC SQL counterpart:

    proc sql;
      create table want as
      select t1.*, t2.age
      from work.transactions as t1
      left join work.masterfile(keep=id age) as t2
        on t1.patient_id = t2.id;
    quit;
9
Example – Inner join
10
Example – Inner join (continued)
11
Example – Trigram search
- Recent thread on SAS-L
- Web search against a business name dictionary
- Trigram matching for scoring measure
- IML implementation (Cartesian product)
  - Load all trigrams for a given dictionary entry in a single row of a matrix
  - Do the same for the search string
  - Use the element() function to count the number of matched elements
12
Trigram search
What is a trigram?
- A decomposition of a string into the set of its substrings of 3 adjacent characters.
- E.g. dictionary entry: NOKIA -> {NOK, OKI, KIA}
- E.g. search entry: NOKEA -> {NOK, OKE, KEA}
- Match score: 1 trigram (NOK)
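The decomposition and the match score can be sketched in a short DATA step. This is a minimal illustration using the NOKIA/NOKEA pair from the slide; the variable names (dict, search, tri, score) and the hash name are made up for the example.

```sas
/* Sketch: split two strings into trigrams and count the shared ones. */
/* Variable and hash names are hypothetical.                           */
data _null_;
  length dict search $32 tri $3;
  dict   = 'NOKIA';
  search = 'NOKEA';

  /* Hash keyed by trigram, holding the dictionary entry's trigrams. */
  declare hash tris();
  tris.defineKey('tri');
  tris.defineDone();

  /* A string of length n has n-2 trigrams. */
  do i = 1 to length(dict) - 2;
    tri = substr(dict, i, 3);
    rc = tris.replace();          /* adds the key if it is not there yet */
  end;

  /* Count the search string's trigrams that exist in the hash. */
  score = 0;
  do i = 1 to length(search) - 2;
    tri = substr(search, i, 3);
    if tris.check() = 0 then score = score + 1;
  end;

  put dict= search= score=;       /* score=1: only NOK is shared */
run;
```

A trigram that occurs twice in the search string is counted twice here; whether that is desirable depends on the scoring measure.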
13
Trigram search
My suggestion: a trigram searching algorithm using multiple Hash Objects
- Hash object (1) keyed by trigram, with the dictionary string id as data
- Hash object (2) keyed by dictionary string id, with the actual string as data
- Hash object (3) keyed by dictionary string id, with a counter as data (initialized empty)
- Increased data load time
- Significantly decreased search time
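One way the three hash objects could fit together is sketched below. It is not the presenter's code. It assumes a hypothetical dictionary table work.dictionary (numeric id, character name) and a hypothetical table of search strings work.searches (character query), and it relies on the multidata option and the find_next() method, which need SAS 9.2 or later.

```sas
/* Sketch under stated assumptions: work.dictionary(id, name) and      */
/* work.searches(query) are hypothetical tables.                        */
data work.matches(keep=query id name score);
  length tri $3 id 8 name $50 query $50 score 8;

  if _n_ = 1 then do;
    /* Hash 1: trigram -> dictionary id (many ids can share a trigram). */
    declare hash h_tri(multidata: 'y');
    h_tri.defineKey('tri');
    h_tri.defineData('id');
    h_tri.defineDone();

    /* Hash 2: dictionary id -> the dictionary string itself. */
    declare hash h_name(dataset: 'work.dictionary');
    h_name.defineKey('id');
    h_name.defineData('name');
    h_name.defineDone();

    /* Hash 3: dictionary id -> match counter, empty until a search runs. */
    declare hash h_cnt();
    h_cnt.defineKey('id');
    h_cnt.defineData('id', 'score');
    h_cnt.defineDone();
    declare hiter it_cnt('h_cnt');

    /* Load step: decompose every dictionary string once. */
    do until (done);
      set work.dictionary end=done;
      do i = 1 to length(name) - 2;
        tri = substr(name, i, 3);
        rc = h_tri.add();
      end;
    end;
  end;

  /* Search step: one search string per iteration. */
  set work.searches;
  rc = h_cnt.clear();
  do i = 1 to length(query) - 2;
    tri = substr(query, i, 3);
    rc = h_tri.find();                 /* first id sharing this trigram */
    do while (rc = 0);
      if h_cnt.find() = 0 then score = score + 1;
      else score = 1;
      rc = h_cnt.replace();            /* store the updated counter */
      rc = h_tri.find_next();          /* next id sharing this trigram */
    end;
  end;

  /* Write one row per dictionary entry that shared at least one trigram. */
  rc = it_cnt.first();
  do while (rc = 0);
    rc2 = h_name.find();
    output;
    rc = it_cnt.next();
  end;
run;
```

The load loop runs once and depends only on the dictionary, while each search touches only the dictionary entries that share a trigram with it, so no Cartesian product is formed. A dictionary string containing the same trigram twice is stored twice in hash 1, so it would be counted twice; deduplicating at load time is left out of the sketch.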
14
Example – Trigram search
- 6-fold increase in load time
  - Scales only with dictionary size
- 200-fold decrease in search time (or match time or coding time)
  - Scales with the number of strings to code
15
Example – Keyword counting
    data want;
      length word $30;
      if _N_=1 then do;
        /* Lookup lists of known good and bad words. */
        declare hash goodwords(dataset: "work.goodwords");
        goodwords.definekey("word");
        goodwords.definedone();
        declare hash badwords(dataset: "work.badwords");
        badwords.definekey("word");
        badwords.definedone();
        /* Running count of every other word, kept in ascending word order. */
        declare hash otherwords(ordered: 'a');
        otherwords.definekey("word");
        otherwords.definedata("word", "count");
        otherwords.definedone();
      end;
      set work.long_string_to_analyze end=last;
      goodwords_count=0;
      badwords_count=0;
      do i=1 to countw(text);
        word=scan(text, i);
        if 0=goodwords.find() then goodwords_count+1;
        else if 0=badwords.find() then badwords_count+1;
        else do;
          rc=otherwords.find();
          if rc=0 then do;
            count=count+1;
            otherwords.replace();
          end;
          else do;
            count=1;
            otherwords.add();
          end;
        end;
      end;
      drop i word;
      /* After the last row, write the word counts to a dataset. */
      if last then otherwords.output(dataset: "work.outputhash");
    run;
16
Other examples
- Multiple file linkage
- Multiple sorted outputs without macros or PROC SORT
- Programming efficiency through FCMP hashes (9.3+)
- Alternate name
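As an illustration of the sorted-outputs item, the sketch below writes two differently ordered copies of a table from one DATA step, with no macro and no PROC SORT. It uses the sashelp.class sample table; the hash names, the output dataset names and the choice of sort keys are made up for the example.

```sas
/* Sketch: two sorted outputs from one DATA step, no PROC SORT.        */
/* Hash names, output names and sort keys are hypothetical.            */
data _null_;
  /* Never executed; only defines the table's variables for the hashes. */
  if 0 then set sashelp.class;

  /* Ascending by age, then name. */
  declare hash by_age(dataset: 'sashelp.class', ordered: 'a');
  by_age.defineKey('age', 'name');
  by_age.defineData(all: 'y');
  by_age.defineDone();

  /* Descending by height, then name. */
  declare hash by_height(dataset: 'sashelp.class', ordered: 'd');
  by_height.defineKey('height', 'name');
  by_height.defineData(all: 'y');
  by_height.defineDone();

  /* Each hash is written out in its own key order. */
  by_age.output(dataset: 'work.class_by_age');
  by_height.output(dataset: 'work.class_by_height');

  stop;   /* the SET never reads a row, so end the step explicitly */
run;
```

Name is included in each key so that the keys are unique; without the multidata option, a hash keeps only one row per key value. Both tables must fit in memory at the same time.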
17
Conclusion
- A powerful tool for gaining efficiency
- Limited only by memory
- SAS Institute keeps improving hash object functionality with new releases
- Rethink existing processes in terms of I/O efficiency
18
References
- NESUG 2010: Blackwell, John, Find() the power of Hash – How, Why and When to use the SAS® Hash Object
- NESUG 2010: Dorfman, Paul and Fridman, Marina, Black Belt Hashigana
- SUGI 28: Dorfman, Paul and Snell, Greg, Hashing: Generations
- Ray, Robert and Secosky, Jason, Better Hashing in SAS 9.2
- SGF 2013: Hendrick, Andrew, Erdman, Donald and Christian, Stacey, Hashing in Proc FCMP to Enhance your Productivity
- SUGI 30: Lavery, Russ, The SQL Optimizer Project: _Method and _Tree in SAS 9.1