Published by Roderick Gilbert. Modified over 9 years ago.
1
Copyright © 2008, SAS Institute Inc. All rights reserved.
Hash Objects – Why Use Them?
Carolyn Cunnison, SAS Technical Training Specialist
2
Agenda
What are hash objects?
When should I use them?
Some sample code.
3
What are hash objects?
A hash object can be thought of as rows of keys and data loaded into memory.
[Diagram: a table with a Keys column and a Data column]
4
Advantages of Hash Objects
Values can be hard-coded or loaded from a SAS data set.
Keys and data can be a mixture of character and numeric.
Provides in-memory data storage and retrieval.
Does not require that data be sorted.
Is sized dynamically.
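As a minimal sketch of the "hard-coded" case, the step below builds a small hash object by hand and looks up one key. The variable names code and name and the two country rows are illustrative assumptions, not part of the original slides.

```sas
data _null_;
   /* Host variables for the key and data (hypothetical names) */
   length code $2 name $20;
   declare hash h();
   h.definekey('code');
   h.definedata('name');
   h.definedone();
   call missing(code, name);
   /* Hard-coded key/data pairs; no input data set is needed */
   h.add(key:'NO', data:'Norway');
   h.add(key:'SE', data:'Sweden');
   /* Look up one key; rc is 0 when the key is found */
   rc = h.find(key:'SE');
   if rc = 0 then put name=;
run;
```

The same DECLARE, DEFINEKEY, DEFINEDATA, DEFINEDONE sequence applies when the rows come from a data set; only the loading step changes.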
5
When to Use Hash Objects (1): Joining tables
"I cut my processing time by 90% using hash tables - You can do it too!", Jennifer K. Warner-Freeman
http://www.nesug.info/Proceedings/nesug07/bb/bb16.pdf
Jennifer took an existing PROC SQL join that took between 2 and 4 hours to run. When she rewrote the program to use hash tables, it ran in 11 minutes.
6
When to Use Hash Objects (2): Summary-less summarization
"Hash-Crash and Beyond", Paul Dorfman et al.
http://www2.sas.com/proceedings/forum2008/037-2008.pdf
Compared PROC SUMMARY with the NWAY option to a hash object:
    proc summary data = input nway ;
       class k1 k2 ;
       var num ;
       output out = summ_sum (drop = _:) sum = sum ;
    run ;
The hash object did “the job more than twice as fast at the same time utilizing ⅓ the memory”.
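A hash-based equivalent of the PROC SUMMARY step might look like the sketch below. It reuses the names input, k1, k2, and num from the slide; the output data set name hash_sum is an assumption, and the code is an illustration of the general technique rather than the paper's exact program.

```sas
data _null_;
   /* One hash entry per (k1, k2) combination; ordered:'a' sorts the output by key */
   declare hash h(ordered:'a');
   h.definekey('k1', 'k2');
   h.definedata('k1', 'k2', 'sum');
   h.definedone();
   do until (eof);
      set input end=eof;
      /* If the key is new, start its total at zero; otherwise find() loads the running total */
      if h.find() ne 0 then sum = 0;
      sum + num;
      /* Store the updated total back under the same key */
      h.replace();
   end;
   /* Write the accumulated totals to a data set (hypothetical name) */
   h.output(dataset:'hash_sum');
   stop;
run;
```

The input does not need to be sorted, and the whole aggregation happens in a single pass over the data.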
7
When to Use Hash Objects (3): Dynamically output to multiple files
Paul Dorfman paper (continued)
Use a hash table instead of the following:
    data out1 out2;
       set tablein;
       …
       if id = 1 then output out1;
       else if id = 2 then output out2;
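One way to avoid listing every output data set in advance is to collect each id group in a hash object and name the output data set at run time. The sketch below assumes that tablein is already sorted by id and contains a variable called amount; both are assumptions made for illustration.

```sas
/* Assumes tablein is sorted by id and has a variable named amount (hypothetical) */
data _null_;
   /* A new hash instance is created on each pass of the DATA step, one per id group */
   declare hash h(ordered:'a');
   h.definekey('_n_');
   h.definedata('id', 'amount');
   h.definedone();
   /* Load every row of the current id group, keyed by a running counter */
   do _n_ = 1 by 1 until (last.id);
      set tablein;
      by id;
      h.add();
   end;
   /* The data set name is built at run time: out1, out2, ... */
   h.output(dataset: cats('out', id));
   /* Release the memory held by this group's instance */
   h.delete();
run;
```

Because the output name is an expression, new id values produce new data sets without any change to the code.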
8
When to Use Hash Objects (4): Removing data extremes
Knowledge Base Sample 25990
http://support.sas.com/kb/25/990.html
Removes the top and bottom 10% of data values.
9
When to Use Hash Objects (5): Perform data sampling without PROC SURVEYSELECT
"Better Hashing in SAS 9.2", Robert Ray and Jason Secosky
http://support.sas.com/rnd/base/datastep/dot/better-hashing-sas92.pdf
Select observations from a table without replacement.
Perform sampling and data manipulation in one step.
10
Terminology
Partial list of methods:
    Object   Methods
    HASH     Definekey, Definedata, Definedone, Find, Add (a row), Remove (a row), Replace (data for key), Delete (hash table)
    HITER    First, Last, Next, Prev
11
Business Scenario
You need to read orion.product_list and then look up information in the orion.supplier table.
[Figures: structure of orion.product_list; structure of orion.supplier]
12
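Loading Data from a SAS Data Set
    data supplier_info;
       drop rc;
       /* Define the host variables that receive data from the hash object */
       length Supplier_Name $40 Supplier_Address $45 Country $2;
       if _N_=1 then do;
          /* Load orion.supplier into memory once, on the first iteration */
          declare hash S(dataset:'orion.supplier');
          S.definekey('Supplier_ID');
          S.definedata('Supplier_Name', 'Supplier_Address', 'Country');
          S.definedone();
          /* Avoid "uninitialized variable" notes in the log */
          call missing(Supplier_Name, Supplier_Address, Country);
       end;
       set orion.product_list;
       /* Look up the current Supplier_ID; rc=0 means the key was found */
       rc=S.find();
       /* Keep only products with a matching supplier */
       if rc=0;
    run;
Program: p305d02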
13
Execution (the DATA step shown is the one from slide 12)
Partial PDV: Supplier_Name, Supplier_Address, Country, Product_ID, Product_Name, and Supplier_ID are missing; rc = . (flagged D, to be dropped); _N_ = 1.
Partial hash object S:
    KEY: Supplier_ID   DATA: Supplier_Name         DATA: Supplier_Address   DATA: Country
    50                 Scandinavian Clothing A/S   Kr. Augusts Gate 13      NO
    109                Petterson AB                Blasieholmstorg 1        SE
    316                Prime Sports Ltd            9 Carlisle Place         GB
    ...                ...                         ...                      ...
    3298               A Team Sports               2687 Julie Ann Ct        US
14
Execution
Partial PDV: Supplier_Name, Supplier_Address, and Country are missing; Product_ID = 210200100009; Product_Name = Kids Sweat Round Neck,Large Logo; Supplier_ID = 3298; rc = 0 (flagged D); _N_ = 6.
Partial hash object S and the DATA step are unchanged from the previous slide.
15
Execution
Partial PDV: Supplier_Name = A Team Sports; Supplier_Address = 2687 Julie Ann Ct; Country = US; Product_ID = 210200100009; Product_Name = Kids Sweat Round Neck,Large Logo; Supplier_ID = 3298; rc = 0 (flagged D); _N_ = 6.
The supplier values match the hash object entry for key 3298.
Partial hash object S and the DATA step are unchanged from the previous slide.
16
Execution
The subsetting IF (if rc=0;) is true.
Partial PDV, partial hash object S, and the DATA step are unchanged from the previous slide.
17
Execution
Implicit OUTPUT; Implicit RETURN;
Partial PDV, partial hash object S, and the DATA step are unchanged from the previous slide.
18
Execution
Continue until EOF.
Partial PDV, partial hash object S, and the DATA step are unchanged from the previous slide.
19
Results
    proc print data=supplier_info(obs=10);
       var Product_ID Supplier_ID Supplier_Name Supplier_Address Country;
       title "Product Information";
    run;
Partial PROC PRINT Output
    Product Information
    Obs  Product_ID    Supplier_ID  Supplier_Name                Supplier_Address   Country
    1    210200100009  3298         A Team Sports                2687 Julie Ann Ct  US
    2    210200100017  3298         A Team Sports                2687 Julie Ann Ct  US
    3    210200200022  6153         Nautlius SportsWear Inc      56 Bagwell Ave     US
    4    210200200023  6153         Nautlius SportsWear Inc      56 Bagwell Ave     US
    5    210200300006  1303         Eclipse Inc                  1218 Carriole Ct   US
    6    210200300007  1303         Eclipse Inc                  1218 Carriole Ct   US
    7    210200300052  1303         Eclipse Inc                  1218 Carriole Ct   US
    8    210200400020  1303         Eclipse Inc                  1218 Carriole Ct   US
    9    210200400070  1303         Eclipse Inc                  1218 Carriole Ct   US
    10   210200500002  772          AllSeasons Outdoor Clothing  553 Cliffview Dr   US
20
Could I do the same thing with a MERGE?
Yes, but:
You would have to sort both tables.
Reading from disk is slower than reading from memory.
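For comparison, a sort-and-merge version of the same lookup could be sketched as follows. The work data set names (product_sorted, supplier_sorted, supplier_info_merge) are assumptions, and the sketch assumes each Supplier_ID appears only once in orion.supplier.

```sas
/* Both inputs must be sorted by the BY variable before the MERGE */
proc sort data=orion.product_list out=product_sorted;
   by Supplier_ID;
run;

proc sort data=orion.supplier
          (keep=Supplier_ID Supplier_Name Supplier_Address Country)
          out=supplier_sorted;
   by Supplier_ID;
run;

data supplier_info_merge;
   merge product_sorted(in=inProduct) supplier_sorted(in=inSupplier);
   by Supplier_ID;
   /* Keep only products with a matching supplier, like "if rc=0;" in the hash version */
   if inProduct and inSupplier;
run;
```

Unlike the hash version, the result comes out in Supplier_ID order rather than in the original order of orion.product_list, and two extra sort passes are required.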
21
What about data size?
"Scalability of Table Lookup Techniques", Rick Langston
http://support.sas.com/resources/papers/proceedings09/037-2009.pdf
Compared the hash table, sort/merge, indexing, PROC SQL, and PROC FORMAT as table lookup techniques.
Hash object processing was successful up to around 1,900,000 rows and then ran out of memory.
22
Did you know that...
PROC SQL sometimes uses hashing to join tables. Possible processing methods are:
    sqxjsl   Step Loop Join (Cartesian product)
    sqxjm    Merge Join
    sqxjndx  Index Join
    sqxjhsh  Hash Join
To view the method used:
    proc sql _method;
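As an illustration, the supplier lookup from the business scenario could be written as a PROC SQL join with the _METHOD option turned on. The output table name supplier_info_sql is an assumption; which join method the optimizer picks depends on the data and the environment.

```sas
/* _METHOD writes the chosen execution plan, including the sqxj* join code, to the log */
proc sql _method;
   create table supplier_info_sql as
   select p.Product_ID,
          p.Supplier_ID,
          s.Supplier_Name,
          s.Supplier_Address,
          s.Country
   from orion.product_list as p
        inner join orion.supplier as s
        on p.Supplier_ID = s.Supplier_ID;
quit;
```

Checking the log for sqxjhsh shows whether PROC SQL chose a hash join for this query.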
23
The HITER object
The HITER object must point to a HASH object.
Read the HITER using the following methods: First, Last, Next, Prev.
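A minimal sketch of reading a hash object through an iterator, using the orion.supplier table from the earlier slides. The iterator name I and the choice to write the rows to the log are assumptions for illustration.

```sas
data _null_;
   /* Define the host variables at compile time without reading any rows */
   if 0 then set orion.supplier(keep=Supplier_ID Supplier_Name Country);
   /* ordered:'a' makes the iterator return keys in ascending order */
   declare hash S(dataset:'orion.supplier', ordered:'a');
   S.definekey('Supplier_ID');
   /* The key is also listed as data so that the iterator can return it */
   S.definedata('Supplier_ID', 'Supplier_Name', 'Country');
   S.definedone();
   /* The iterator is tied to the hash object by name */
   declare hiter I('S');
   rc = I.first();
   do while (rc = 0);
      put Supplier_ID= Supplier_Name= Country=;
      rc = I.next();
   end;
   stop;
run;
```

Swapping first() and next() for last() and prev() walks the same entries in descending key order.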
24
Conclusion
Hash and Hiter objects are very flexible.
Data has to fit into memory.
Results will depend on your data, your environment, and what you are trying to do.
You have to benchmark.
25
Want to know more?
SAS Programming III: Advanced Techniques and Efficiencies
https://support.sas.com/edu/schedules.html?ctry=ca&id=279
Also available as a Live Web course.
26
Questions?