1 Chapter 4: Introduction to Lookup Techniques 4.1 Introduction to Lookup Techniques 4.2 In-Memory Lookup Techniques 4.3 Disk Storage Techniques.

Presentation transcript:

1 Chapter 4: Introduction to Lookup Techniques 4.1 Introduction to Lookup Techniques 4.2 In-Memory Lookup Techniques 4.3 Disk Storage Techniques


3 Objectives Define table lookup. List table lookup techniques.

4 Table Lookups
Lookup values for a table lookup can be stored in an array, a hash object, a format, or a data set.
Lookup techniques include the following: an array subscript value; a hash object key value; the FORMAT statement or PUT function; and MERGE, SET/SET, or a join.
(Figure labels: Data Values, Lookup Values, lookup.)


Multiple Choice Poll
Which of these is an example of a table lookup?
a. You have the data for January sales in one data set, February sales in a second data set, and March sales in a third. You need to create a report for the entire first quarter.
b. You want to send birthday cards to employees. The employees’ names and addresses are in one data set and their birthdates are in another.
c. You need to calculate the amount each customer owes for his purchases. The price per item and the number of items purchased are stored in the same data set.

Multiple Choice Poll – Correct Answer
Which of these is an example of a table lookup?
Correct answer: b. The employees’ names and addresses are in one data set and their birthdates are in another, so the birthdates must be looked up for each employee.

8 Overview of Table Lookup Techniques Arrays, hash objects, and formats provide an in-memory lookup table. The DATA step MERGE statement, multiple SET statements in the DATA step, and SQL procedure joins use lookup values that are stored on disk.

9 Chapter 4: Introduction to Lookup Techniques 4.1 Introduction to Lookup Techniques 4.2 In-Memory Lookup Techniques 4.3 Disk Storage Techniques

10 Objectives Describe arrays as a lookup technique. Describe hash objects as a lookup technique. Describe formats as a lookup technique.


Multiple Answer Poll
Which techniques do you currently use when you perform table lookups with a single data set?
a. Arrays
b. Hash object
c. Formats
d. None of the above

15 Overview of Arrays
An array is similar to a numbered row of buckets. SAS puts a value in a bucket based on the bucket number. A value is retrieved from a bucket based on the bucket number.

16 Overview of Arrays
General form of the ARRAY statement:

    DATA data-set-name;
       ARRAY array-name {subscript};
       new-variable=array-name{subscript-value};
    RUN;

The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
The ARRAY statement associates variables or initial values to be retrieved using the array name and a subscript value.
The assignment statement retrieves values from the array based on the value of the subscript.

17 Overview of Arrays (p304d01)

    data country_info;
       array Cont_Name{91:96} $ 30 _temporary_
          ('North America', ' ', 'Europe', 'Africa', 'Asia', 'Australia/Pacific');
       set orion.country;
       Continent=Cont_Name{Continent_ID};
    run;

The ARRAY statement associates variables or initial values to be retrieved using the array name and a subscript value.
The assignment statement retrieves values from the array based on the value of the subscript.
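The array lookup in p304d01 relies on every Continent_ID falling inside the declared bounds 91 to 96; a subscript outside those bounds stops the DATA step with an "array subscript out of range" error. The following is a minimal sketch of a guarded lookup, not part of the course program. The data set work.country_demo, its rows, and the label 'Unknown' are made up for illustration; only the ARRAY statement is taken from the slide.

```sas
/* Hypothetical test data standing in for orion.country. */
data work.country_demo;
   input Country $ Continent_ID;
   datalines;
AU 96
CA 91
DE 93
XX 99
;
run;

data work.country_info_demo;
   array Cont_Name{91:96} $ 30 _temporary_
      ('North America', ' ', 'Europe', 'Africa', 'Asia', 'Australia/Pacific');
   length Continent $ 30;
   set work.country_demo;
   /* Use the array only when the subscript is inside its declared bounds. */
   if lbound(Cont_Name) <= Continent_ID <= hbound(Cont_Name) then
      Continent=Cont_Name{Continent_ID};
   else
      Continent='Unknown';   /* assumed placeholder label */
run;
```

LBOUND and HBOUND return the lower and upper bounds of the array, so the check keeps working if the bounds in the ARRAY statement are changed later.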


19 Setup for the Poll: program p304d01, as shown in slide 17.

Multiple Choice Poll
In p304d01, how many elements are in the array Cont_Name?
a. 0
b. 5
c. 6
d. unknown

Multiple Choice Poll – Correct Answer
In p304d01, how many elements are in the array Cont_Name?
Correct answer: c. 6. The subscript range 91:96 defines six elements, one of which is initialized to a blank.

24 Overview of a Hash Object
A hash object is similar to rows of buckets that are identified by the value of a key. SAS puts value(s) in the data bucket(s) based on the value(s) in the key bucket. Value(s) are retrieved from the data bucket(s) based on the value(s) in the key bucket.
(Figure labels: Key, Data.)

25 Overview of Hash Objects
General form of the hash object:

    DATA data-set-name;
       IF _N_=1 THEN DO;
          DECLARE HASH object-name( );
          object-name.DEFINEKEY('key-name');
          object-name.DEFINEDATA('data-name');
          object-name.DEFINEDONE();
       END;
       return-code=object-name.FIND( );
    RUN;

The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
The syntax within the DO group defines and can populate the hash object.
The FIND method retrieves the data value based on the key value.

26 Overview of Hash Objects (p304d02)

    data country_info;
       length Continent_Name $ 30;
       if _N_=1 then do;
          declare hash Cont_Name(dataset:'orion.continent');
          Cont_Name.definekey('Continent_ID');
          Cont_Name.definedata('Continent_Name');
          Cont_Name.definedone();
       end;
       set orion.country;
       rc=Cont_Name.find(key:Continent_ID);
       if rc=0;
    run;

The syntax within the DO group defines and populates the hash object.
The FIND method retrieves the data value based on the key value.
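In p304d02 the subsetting IF statement (if rc=0;) drops every country whose Continent_ID is not found in the hash object. A variant that keeps unmatched rows might look like the sketch below. The data sets work.continent_demo and work.country_demo, their rows, and the label 'Unknown' are assumptions made for illustration; the hash object calls are the same ones used on the slide.

```sas
/* Hypothetical stand-ins for orion.continent and orion.country. */
data work.continent_demo;
   input Continent_ID 1-2 Continent_Name $ 4-33;
   datalines;
91 North America
93 Europe
;
run;

data work.country_demo;
   input Country $ Continent_ID;
   datalines;
CA 91
DE 93
ZZ 99
;
run;

data work.country_info_all;
   length Continent_Name $ 30;
   if _N_=1 then do;
      declare hash Cont_Name(dataset:'work.continent_demo');
      Cont_Name.definekey('Continent_ID');
      Cont_Name.definedata('Continent_Name');
      Cont_Name.definedone();
   end;
   set work.country_demo;
   /* FIND returns 0 when the key is found; any other value means no match. */
   rc=Cont_Name.find(key:Continent_ID);
   if rc ne 0 then Continent_Name='Unknown';   /* assumed placeholder label */
   drop rc;
run;
```

Testing the return code instead of subsetting on it is a design choice: the output keeps one row per input observation, which is often what a lookup (as opposed to a filter) needs.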


28 Setup for the Poll: program p304d02, as shown in slide 26.

Multiple Choice Poll
In p304d02, how many times do the statements in the DO group execute?
a. only once
b. once for every observation in the data set orion.country
c. once for every observation in the data set orion.continent

Multiple Choice Poll – Correct Answer
In p304d02, how many times do the statements in the DO group execute?
Correct answer: a. only once. The condition _N_=1 is true only on the first iteration of the DATA step.

33 Overview of a Format
A format is similar to rows of buckets that are identified by the data value. SAS puts data values and label values in the buckets when the format is used in a FORMAT statement, PUT function, or PUT statement. SAS uses a binary search on the data value bucket in order to return the value in the label bucket.
(Figure labels: Data Value, Label.)

34 Overview of a Format
General form of the user-defined format:

    PROC FORMAT;
       VALUE fmtname range-1=label-1
                     ...
                     range-n=label-n;
    RUN;

    DATA data-set-name;
       READ statement;
       new-variable=PUT(variable,fmtname.);
    RUN;

The READ statement can be the SET, MERGE, or INFILE/INPUT statement.
The FORMAT step compiles the format and stores it on disk.
When the PUT function executes, the format is loaded into memory, and a binary search is used to retrieve the format value.

35 Overview of a Format (p304d03)

    proc format;
       value Cont_Name 91='North America'
                       93='Europe'
                       94='Africa'
                       95='Asia'
                       96='Australia/Pacific';
    run;

    data country_info;
       set orion.country;
       Continent=put(Continent_ID,Cont_Name.);
    run;

The FORMAT step compiles the format and stores it on disk.
When the PUT function executes, the format is loaded into memory, and a binary search is used to retrieve the format value.
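The VALUE statement in p304d03 hard-codes the lookup values. When the values already live in a data set, the same format can be built from that data set with the CNTLIN= option of PROC FORMAT. This is a sketch under assumptions, not part of the course program: work.continent_demo and its rows are made up, and the control data set name work.cont_fmt is hypothetical. FMTNAME, TYPE, START, and LABEL are the variable names that PROC FORMAT reads from a control data set.

```sas
/* Hypothetical stand-in for orion.continent. */
data work.continent_demo;
   input Continent_ID 1-2 Continent_Name $ 4-33;
   datalines;
91 North America
93 Europe
94 Africa
;
run;

/* Control data set: one row per value, with the format name and type on every row. */
data work.cont_fmt;
   retain fmtname 'Cont_Name' type 'N';   /* numeric format named Cont_Name */
   set work.continent_demo(rename=(Continent_ID=start Continent_Name=label));
run;

/* Compile the format from the control data set. */
proc format cntlin=work.cont_fmt;
run;

/* Use the format exactly as in p304d03. */
data work.fmt_check;
   Continent_ID=93;
   Continent=put(Continent_ID, Cont_Name.);
run;
```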

36 Chapter 4: Introduction to Lookup Techniques 4.1 Introduction to Lookup Techniques 4.2 In-Memory Lookup Techniques 4.3 Disk Storage Techniques

37 Objectives List methods for combining data horizontally. Use multiple SET statements to combine data horizontally. Compare methods for combining SAS data sets.

38 Combining Data Horizontally
DATA step techniques for combining data horizontally include using the MERGE statement, multiple SET statements, the UPDATE statement, and the MODIFY statement. In addition, you can use the SQL procedure with an inner or outer join.


Multiple Answer Poll
Which techniques do you currently use when you perform table lookups with multiple data sets?
a. MERGE statement
b. Joins
c. Multiple SET statements
d. UPDATE statement
e. MODIFY statement
f. None of the above

41 Overview of Merges and Joins
The DATA step MERGE and the SQL join operators are similar to multiple stacks of buckets that are referred to by the value of one or more common variables.
(Figure labels: By Value(s), Data.)

42 DATA Step MERGE Statement
General form of the DATA step merge:

    DATA data-set-name;
       MERGE SAS-data-sets;
       BY variables;
    RUN;

Matches on equal values for like-named variables (for example, Continent_ID).

43 DATA Step MERGE Statement (p304d04)

    proc sort data=orion.country out=country;
       by Continent_ID;
    run;

    data country_info;
       merge country orion.continent;
       by Continent_ID;
    run;

Matches on equal values for like-named variables.
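The merge in p304d04 writes an observation for every BY value found in either data set, including continents with no countries and countries with no continent. A common refinement is the IN= data set option, which flags whether each input contributed to the current observation. The sketch below assumes made-up data sets work.country_demo and work.continent_demo; the flag names InCountry and InContinent are hypothetical.

```sas
/* Hypothetical stand-ins for orion.country and orion.continent. */
data work.country_demo;
   input Country $ Continent_ID;
   datalines;
CA 91
DE 93
ZZ 99
;
run;

data work.continent_demo;
   input Continent_ID 1-2 Continent_Name $ 4-33;
   datalines;
91 North America
93 Europe
94 Africa
;
run;

/* MERGE with BY requires both inputs to be sorted by the BY variable. */
proc sort data=work.country_demo;
   by Continent_ID;
run;

proc sort data=work.continent_demo;
   by Continent_ID;
run;

data work.country_info_matched;
   merge work.country_demo(in=InCountry)
         work.continent_demo(in=InContinent);
   by Continent_ID;
   /* Keep only BY values that appear in both data sets. */
   if InCountry and InContinent;
run;
```

With this demo data, the rows for Continent_ID 94 (no country) and 99 (no continent) are excluded from the output.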


45 Setup for the Poll: program p304d04, as shown in slide 43.

Multiple Choice Poll
In p304d04, if the data set country has seven observations and the data set orion.continent has five observations, what stops the execution of the DATA step?
a. end of file for work.country, the data set with the most observations
b. end of file for orion.continent, the last data set listed in the MERGE statement
c. end of file for the data set that contains the final value of the BY variable Continent_ID

Multiple Choice Poll – Correct Answer
In p304d04, if the data set country has seven observations and the data set orion.continent has five observations, what stops the execution of the DATA step?
Correct answer: c. A merge stops at the last end of file that is encountered, which belongs to the data set that contains the final value of the BY variable Continent_ID.

48 The SQL Procedure
You can use an SQL procedure inner or outer join to create a SAS data set.
General form of the SQL procedure CREATE TABLE statement with an inner join:

    PROC SQL;
       CREATE TABLE SAS-data-set AS
          SELECT column-1, column-2, …, column-n
             FROM table-1, table-2, …, table-n
             WHERE joining criteria
             ORDER BY sorting criteria;
    QUIT;

Performs an inner join based on the WHERE criteria.

49 The SQL Procedure (p304d05)

    proc sql;
       create table country_info as
          select country.*, Continent_Name
             from orion.country, orion.continent
             where country.Continent_ID=continent.Continent_ID
             order by country.Continent_ID;
    quit;

Performs an inner join where the Continent_ID values from both data sets are equal.
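The slides mention outer joins but show only the inner join. As a hedged sketch of the outer-join alternative, the following left join keeps every country and leaves Continent_Name blank where no continent matches. The data sets work.country_demo and work.continent_demo, their rows, and the table aliases c and n are assumptions made for illustration.

```sas
/* Hypothetical stand-ins for orion.country and orion.continent. */
data work.country_demo;
   input Country $ Continent_ID;
   datalines;
CA 91
DE 93
ZZ 99
;
run;

data work.continent_demo;
   input Continent_ID 1-2 Continent_Name $ 4-33;
   datalines;
91 North America
93 Europe
94 Africa
;
run;

proc sql;
   create table work.country_info_left as
      select c.*, n.Continent_Name
         from work.country_demo as c
              left join work.continent_demo as n
              on c.Continent_ID = n.Continent_ID   /* join condition goes in ON, not WHERE */
         order by c.Continent_ID;
quit;
```

Unlike the DATA step merge, the SQL join does not require the input data sets to be sorted beforehand.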


Multiple Choice Poll
Which of the following is true of the SQL inner join?
a. The resulting data set contains only the observations with matching key values.
b. The resulting data set contains both the observations with matching key values and those observations where the key values do not match.

Multiple Choice Poll – Correct Answer
Which of the following is true of the SQL inner join?
Correct answer: a. The resulting data set contains only the observations with matching key values.

53 Multiple SET Statements
The DATA step with multiple SET statements combines data sets by performing one-to-one reading.

54 Multiple SET Statements You can use multiple SET statements to combine observations from several SAS data sets. When you use multiple SET statements, the following occurs: Processing stops when SAS encounters the end-of-file marker on either data set. The variables in the PDV are not reinitialized when a second SET statement is executed.

55 Multiple SET Statements
General form of the DATA step with multiple SET statements:

    DATA data-set-name;
       SET SAS-data-set-1;
       SET SAS-data-set-2;
    RUN;

56 Multiple SET Statements (p304d06)

    data country_info;
       set orion.country;
       set orion.continent;
    run;

Listing of country_info (columns with recoverable values only):

    Obs  Country  Country_Name  Country_FormerName  Continent_Name
    1    AU       Australia                         North America
    2    CA       Canada                            Europe
    3    DE       Germany       East/West Germany   Africa
    4    IL       Israel                            Asia
    5    TR       Turkey                            Australia/Pacific

57-68 Execution
The DATA step below reads data set one (variables X and Y) and data set two (variable Z, values A and B) with two SET statements.

    data three;
       set one;
       set two;
       Total=X+Y;
    run;

PDV contents after each step:

    Slide  Step                                X  Y  Z  Total  _N_
    57     set one reads observation 1         1  2     .      1
    58     set two reads observation 1         1  2  A  .      1
    59     Total=X+Y                           1  2  A  3      1
    60     implicit OUTPUT; implicit RETURN    1  2  A  3      1
    61     initialize PDV                      1  2  A  .      2
    62     set one reads observation 2         2  3  A  .      2
    63     set two reads observation 2         2  3  B  .      2
    64     Total=X+Y                           2  3  B  5      2
    65     implicit OUTPUT; implicit RETURN    2  3  B  5      2
    66     initialize PDV                      2  3  B  .      3
    67     set one reads observation 3         3  4  B  .      3
    68     set two reaches end of file (EOF)   3  4  B  .      3

Processing stops. Data set three contains two observations:

    X  Y  Z  Total
    1  2  A  3
    2  3  B  5
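The execution trace above can be reproduced with a small self-contained program. This is a sketch, not part of the course files: the rows of one are inferred from the X and Y values that appear in the PDV during the trace, and the rows of two are the Z values A and B shown on the slides.

```sas
/* Data set one: three observations, as read into the PDV in the trace. */
data one;
   input X Y;
   datalines;
1 2
2 3
3 4
;
run;

/* Data set two: two observations. */
data two;
   input Z $;
   datalines;
A
B
;
run;

/* One-to-one reading: the step ends at the first end of file, here on data set two. */
data three;
   set one;
   set two;
   Total=X+Y;
run;

proc print data=three;
run;
```

Swapping the two SET statements is a quick way to check the poll that follows the trace.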


70 Setup for the Poll
The previous example created a data set named three with two observations. Using the same one and two data sets, if the SET statements were reversed, how many observations would be in the data set three?

Original:

    data three;
       set one;
       set two;
       Total=X+Y;
    run;

Reversed:

    data three;
       set two;
       set one;
       Total=X+Y;
    run;

Multiple Choice Poll
Using the same one and two data sets, if the SET statements were reversed, how many observations would be in the data set three?
a. 5
b. 2
c. 3
d. 6

Multiple Choice Poll – Correct Answer
Using the same one and two data sets, if the SET statements were reversed, how many observations would be in the data set three?
Correct answer: b. 2. Processing still stops at the first end of file, which is reached on data set two after two observations.

73 DATA Step Methods for Reading SAS Data

Code: data two; set one; New_Var=Value; run;
   Which variables are reinitialized to missing at the top of the DATA step? Variables created in the DATA step.
   What stops the DATA step? End of the file for data set one.

Code: data three; merge one two; by Var; New_Var=Value; run;
   Which variables are reinitialized to missing at the top of the DATA step? Variables created in the DATA step; all variables when the BY value changes.
   What stops the DATA step? The last end of file that is encountered.

Code: data three; set one two; New_Var=Value; run;
   Which variables are reinitialized to missing at the top of the DATA step? Variables created in the DATA step; all variables when SAS finishes reading data set one and starts reading data set two.
   What stops the DATA step? End of the file for data set two.

Code: data three; set one; set two; New_Var=Value; run;
   Which variables are reinitialized to missing at the top of the DATA step? Variables created in the DATA step.
   What stops the DATA step? The first end of file that is encountered.

74 Chapter Review
1. What are the three types of in-memory table lookups?
2. What are three types of disk storage table lookups?
3. When multiple SET statements are executed, when does execution stop?

75 Chapter Review – Correct Answers
1. What are the three types of in-memory table lookups? Arrays, hash objects, and formats.
2. What are three types of disk storage table lookups? PROC SQL, the DATA step with a MERGE statement, or the DATA step with multiple SET statements.
3. When multiple SET statements are executed, when does execution stop? Execution stops when the first end of file is encountered.