1 Chapter 4: Creating Simple Queries 4.1 Introduction to Querying Data 4.2 Filtering and Sorting Data 4.3 Creating New Columns with an Expression 4.4 Grouping and Summarizing Data in a Query 4.5 Joining Tables 4.6 Joining Tables Including Nonmatching Rows (Self-Study) 4.7 Creating New Columns by Recoding Values (Self-Study)
2 Chapter 4, Section 4.1: Introduction to Querying Data
3 Objectives State the function of the Filter and Sort task and the Query Builder. Compare the functionality available in each task.
4 Filter and Sort Task and the Query Builder The Filter and Sort task and the Query Builder can be used to create a new data source from one or more tables according to the criteria specified by the user.
Multiple Answer Poll Double-click any data source in your project. Select Filter and Sort and explore the available tabs. What functionality do you think is supported by this task? a. Subsetting rows b. Selecting columns c. Calculating new columns d. Controlling the sort order of the rows e. Summarizing data f. Creating a SAS data set
Multiple Answer Poll – Correct Answers a. Subsetting rows, b. Selecting columns, d. Controlling the sort order of the rows, and f. Creating a SAS data set. Calculating new columns (c) and summarizing data (e) require the Query Builder.
8 Filter and Sort Task The Filter and Sort task enables you to create a new SAS table by selecting rows, columns, and a sort sequence.
Quiz Close the Filter and Sort task and return to the data grid. Select Query Builder. What options appear to be available that are not present in the Filter and Sort task?
Quiz – Correct Answer Possible answers: Query name, Output name, Computed Columns, Prompt Manager, Tools, Options, Add Tables, Join Tables
12 Query Builder The Query Builder enables you to create a new SAS table by selecting rows, columns, and a sort sequence. It also enables computing new columns, joining tables, grouping, summarizing, and modifying column attributes.
13 Filter and Sort Task versus the Query Builder
Task                        Filter and Sort   Query Builder
Sort data                   Yes               Yes
Filter rows and columns     Yes               Yes
Create a new SAS data set   Yes               Yes
Define new columns          No                Yes
Join tables                 No                Yes
Group and summarize data    No                Yes
Define column attributes    No                Yes
Remove duplicates           No                Yes
14 Chapter 4, Section 4.2: Filtering and Sorting Data
15 Objectives Apply a filter in a query. Exclude columns in a query. Reorder rows in a query.
16 Business Scenario Orion Star wants to analyze Internet sales since 2008. To prepare the data for input to the various analytic tasks, the company must generate a new data source from the orders table, including only those Internet orders (Order_Type = 3) placed on or after 01JAN2008.
17 Filter and Sort Task The Variables, Filter, and Sort tabs in the Filter and Sort task provide functionality to select rows and columns in a designated sort order.
18 Filter and Sort: Filter Simple filters can be built using variable names, operators, and data values. Select Advanced Edit… to build more complex filters.
19 Advanced Filter Builder The Advanced Filter Builder provides access to advanced operators and SAS functions to create more complex rules for extracting rows.
20 Filter and Sort: Sort and Results You can sort by multiple variables, and designate either ascending or descending sequence. You can also name the task and output table.
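The three tabs of the Filter and Sort task can be pictured as three small steps applied in turn. The following is a minimal Python sketch of that idea, not the code that SAS Enterprise Guide generates. Only Order_Type and Order_Date come from the business scenario; the sample rows, the Order_ID and Quantity columns, and the function name are hypothetical.

```python
from datetime import date

# Hypothetical sample rows standing in for the orders table.
orders = [
    {"Order_ID": 1, "Order_Type": 3, "Order_Date": date(2008, 3, 14), "Quantity": 2},
    {"Order_ID": 2, "Order_Type": 1, "Order_Date": date(2008, 5, 2), "Quantity": 1},
    {"Order_ID": 3, "Order_Type": 3, "Order_Date": date(2007, 12, 30), "Quantity": 4},
    {"Order_ID": 4, "Order_Type": 3, "Order_Date": date(2008, 1, 1), "Quantity": 1},
]


def filter_and_sort(rows, keep_columns, cutoff):
    """Keep Internet orders on or after the cutoff, select columns, sort by date."""
    # Filter tab: Order_Type = 3 AND Order_Date >= cutoff
    selected = [r for r in rows if r["Order_Type"] == 3 and r["Order_Date"] >= cutoff]
    # Variables tab: keep only the requested columns
    trimmed = [{c: r[c] for c in keep_columns} for r in selected]
    # Sort tab: descending by Order_Date
    return sorted(trimmed, key=lambda r: r["Order_Date"], reverse=True)


internet_orders = filter_and_sort(orders, ["Order_ID", "Order_Date"], date(2008, 1, 1))
for row in internet_orders:
    print(row["Order_ID"], row["Order_Date"].isoformat())
```

With this sample data, orders 1 and 4 remain, listed from the most recent date to the oldest.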
21 Query Builder The Query Builder provides similar tabs for selecting columns, filtering rows, and sorting data. Additional functionality is available, including the following: modifying column properties, grouping and summarizing data, applying formats, selecting distinct rows, and joining tables.
22 Using Query Results in Tasks Data sources generated from queries can serve as the input data for follow-up tasks.
23 Selecting Columns and Filtering Rows
25 Exercise This exercise reinforces the concepts discussed previously.
27 Chapter 4, Section 4.3: Creating New Columns with an Expression
28 Objectives Define a new column of data in a query by building an expression.
29 Business Scenario Orion Star wants to analyze shipment methods by determining how many days elapse between each order date and delivery date: Delivery_Date - Order_Date. The company also wants to calculate the total amount invoiced to the customer, which is the sum of total retail price and shipping charges: SUM(Total_Retail_Price, Shipping).
30 Computed Columns New summarized columns, recoded columns, or columns based on an expression can be added to a query in the Query Builder. Select Computed Columns to begin creating a new column.
31 New Computed Column Wizard A wizard guides you through the process of creating the new column and assigning attributes such as the column name, label, and format.
32 Expression Editor The Expression Editor enables you to build expressions based on variables, operators, and functions.
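As an illustration of what the two expressions in the business scenario compute, here is a small Python sketch. It is an analogy rather than SAS code: SAS stores a date as a count of days, so subtracting two dates yields a number of days, which Python's date subtraction mirrors. The sample values and the new column names Days_To_Deliver and Total_Invoiced are hypothetical.

```python
from datetime import date

# Hypothetical order row; the column names follow the scenario's expressions.
order = {
    "Order_Date": date(2008, 1, 10),
    "Delivery_Date": date(2008, 1, 15),
    "Total_Retail_Price": 120.50,
    "Shipping": 9.50,
}


def add_computed_columns(row):
    """Return a copy of the row with two computed columns added."""
    new_row = dict(row)
    # Delivery_Date - Order_Date: elapsed days between order and delivery
    new_row["Days_To_Deliver"] = (row["Delivery_Date"] - row["Order_Date"]).days
    # SUM(Total_Retail_Price, Shipping): total amount invoiced (no missing values here)
    new_row["Total_Invoiced"] = row["Total_Retail_Price"] + row["Shipping"]
    return new_row


result = add_computed_columns(order)
print(result["Days_To_Deliver"], result["Total_Invoiced"])  # 5 130.0
```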
33 SAS Functions A SAS function is a routine that returns a value that is determined from specified arguments. General form of a SAS function: function-name(argument1,argument2,...) Example: sum(Salary,Bonus)
34 Using SAS Functions SAS functions can do the following: perform arithmetic operations; compute sample statistics (for example, sum, mean, and standard deviation); manipulate SAS dates; process character values; perform many other tasks. Sample statistics functions ignore missing values.
Multiple Choice Poll What is the result of the expression Var1+Var2+Var3, given Var1 = 9, Var2 = . (missing), and Var3 = 3? a. . (missing) b. 3 c. 9 d. 12
Multiple Choice Poll – Correct Answer a. . (missing). An arithmetic operator returns a missing value when any operand is missing.
Multiple Choice Poll What is the result of the expression sum(Var1,Var2,Var3), given Var1 = 9, Var2 = . (missing), and Var3 = 3? a. . (missing) b. 3 c. 9 d. 12
Multiple Choice Poll – Correct Answer d. 12. The SUM function ignores the missing value and adds the nonmissing values 9 and 3.
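The difference between the + operator and the SUM function can be mimicked in a few lines of Python. This is a sketch of the missing-value behavior only, using None to stand in for a SAS missing value and assuming the poll values are Var1 = 9, Var2 = missing, and Var3 = 3. The helper names plus and sas_sum are hypothetical.

```python
def plus(*values):
    """Mimic the + operator: any missing operand makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def sas_sum(*values):
    """Mimic SUM(): ignore missing arguments; missing only if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


var1, var2, var3 = 9, None, 3   # None plays the role of a SAS missing value

print(plus(var1, var2, var3))     # None
print(sas_sum(var1, var2, var3))  # 12
```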
40 Computed Columns Computed columns appear in the left pane and can be used in a filter, for sorting, or as an input to another computed column.
41 Creating a Column with an Expression This demonstration illustrates using the New Computed Column wizard to define new columns based on advanced expressions: Delivery_Date - Order_Date and SUM(Total_Retail_Price, Shipping).
43 Exercise This exercise reinforces the concepts discussed previously.
45 Chapter 4, Section 4.4: Grouping and Summarizing Data in a Query
46 Objectives Assign a grouping variable in a query. Select the analysis variable and the summary statistic to compute. Filter grouped data.
47 Business Scenario Orion Star wants to offer a sales promotion that highlights the most lucrative products. The company would like a list of all products with a total profit that exceeds $500.
48 Grouping Data The Query Builder can be used to group and summarize data.
49 Grouping Data Data can be grouped and summarized using the Select Data tab. Choose a statistic for columns to be summarized. Columns without an assigned statistic will automatically define the groups.
50 Grouping by Column Values The query result includes one row for every unique value of the group column(s) and a calculated statistic for the summarized column(s).
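The idea of one output row per unique group value can be sketched in Python as follows. This is an illustration of the concept, not the Query Builder's own processing; the sample rows and the column names Product_ID and Profit are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: several orders per product.
rows = [
    {"Product_ID": "A", "Profit": 300.0},
    {"Product_ID": "B", "Profit": 120.0},
    {"Product_ID": "A", "Profit": 250.0},
    {"Product_ID": "B", "Profit": 80.0},
    {"Product_ID": "C", "Profit": 40.0},
]


def summarize(detail_rows, group_col, analysis_col):
    """Sum the analysis column within each unique value of the group column."""
    totals = defaultdict(float)
    for r in detail_rows:
        totals[r[group_col]] += r[analysis_col]
    return dict(totals)


total_profit = summarize(rows, "Product_ID", "Profit")
print(total_profit)  # {'A': 550.0, 'B': 200.0, 'C': 40.0}
```

Five detail rows collapse to three summary rows, one for each distinct Product_ID.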
Quiz 1. Open the Query Builder and use any data source in the current project. 2. Click the Filter Data tab and notice the layout. 3. Return to the Select Data tab and add any two columns. 4. For one of the columns in the Select Data tab, select Count in the Summary field. 5. Return to the Filter Data tab. How does the Filter Data tab change after a query includes grouped data?
Quiz – Correct Answer When the query includes grouped data, an additional pane labeled “Filter the summarized data” is added to the Filter Data tab.
54 Filtering Data The Filter Data tab can be used to filter both raw data and summarized data.
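In SQL terms, a filter on raw data corresponds to a WHERE clause and a filter on summarized data corresponds to a HAVING clause. The sketch below uses Python's built-in sqlite3 module to show the two side by side. It assumes a hypothetical profit table with made-up rows, and SQLite syntax is not identical to the SQL that SAS runs, so treat it as an analogy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profit (Product_ID TEXT, Order_Type INTEGER, Profit REAL)")
conn.executemany(
    "INSERT INTO profit VALUES (?, ?, ?)",
    [("A", 3, 300.0), ("A", 3, 250.0), ("A", 1, 100.0),
     ("B", 3, 120.0), ("B", 3, 80.0),
     ("C", 3, 600.0), ("C", 1, 50.0)],
)

query = """
    SELECT Product_ID, SUM(Profit) AS Total_Profit
    FROM profit
    WHERE Order_Type = 3          -- filter the raw data, before grouping
    GROUP BY Product_ID
    HAVING SUM(Profit) > 500      -- filter the summarized data, after grouping
    ORDER BY Product_ID
"""
top_products = conn.execute(query).fetchall()
print(top_products)  # [('A', 550.0), ('C', 600.0)]
```

Product B passes the raw-data filter but its total of 200 fails the summarized-data filter, so it is dropped from the result.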
55 Summarizing and Filtering by Groups This demonstration illustrates grouping, summarizing, and filtering grouped data.
57 Exercise This exercise reinforces the concepts discussed previously.
59 Chapter 4, Section 4.5: Joining Tables
60 Objectives Join multiple tables by common columns. Include only matching rows.
61 Business Scenario In a previous query, products with total profits exceeding $500 were identified. Analysts asked for more details about these top products, including the product category, the product, supplier, and country name. The columns to include come from three different tables: topproducts, products, and country_lookup.
62 Business Scenario To include the necessary columns, the topproducts SAS table must be joined with the products SAS table and the country_lookup Excel spreadsheet.
63 Joining Tables Joining tables enables you to extract and simultaneously process data from more than one table.
64 Joining Tables By default, the Query Builder includes only matching rows in the results.
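A matching-rows-only join behaves like an SQL inner join. The following sketch uses Python's sqlite3 module with three small tables named after the scenario. The column names (Product_ID, Product_Name, Supplier_Country, Country_Code, Country_Name, Total_Profit) and all row values are hypothetical, and the country lookup is modeled as an ordinary table rather than an Excel spreadsheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topproducts (Product_ID INTEGER, Total_Profit REAL);
    CREATE TABLE products (Product_ID INTEGER, Product_Name TEXT, Supplier_Country TEXT);
    CREATE TABLE country_lookup (Country_Code TEXT, Country_Name TEXT);
    INSERT INTO topproducts VALUES (1, 550.0), (2, 600.0), (9, 700.0);
    INSERT INTO products VALUES (1, 'Tent', 'US'), (2, 'Boots', 'DE'), (3, 'Socks', 'US');
    INSERT INTO country_lookup VALUES ('US', 'United States'), ('DE', 'Germany');
""")

# Inner joins keep a row only when it has a match in every joined table.
joined = conn.execute("""
    SELECT t.Product_ID, p.Product_Name, c.Country_Name, t.Total_Profit
    FROM topproducts AS t
    INNER JOIN products AS p ON t.Product_ID = p.Product_ID
    INNER JOIN country_lookup AS c ON p.Supplier_Country = c.Country_Code
    ORDER BY t.Product_ID
""").fetchall()

for row in joined:
    print(row)
```

Product 9 appears in topproducts but not in products, and product 3 appears in products but not in topproducts, so neither is in the result.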
Multiple Answer Poll Which customers will be returned by the Query Builder if these tables are combined using the default join type? a. Smith, John (00001) b. Anderson, Tim (00002) c. Jones, Betsy (00003) d. Customer e. Rigsbee, Marilyn (00005)
Multiple Answer Poll – Correct Answers Which customers will be returned by the Query Builder if these tables are combined using the default join type? a. Smith, John (00001) b. Anderson, Tim (00002) c. Jones, Betsy (00003) d. Customer e. Rigsbee, Marilyn (00005)
68 Tables and Joins Window Select Join Tables to access the Tables and Joins window. This window enables you to add additional tables and verify or change the criteria used to join tables.
69 Join Properties The Join Properties window provides the ability to modify the join type or condition. Selecting a different join type can be used to identify or eliminate nonmatching rows.
70 Query Options Select Options to customize the query, including the type of result produced, query limits, and the SAS server that will execute the query.
72 Setup for the Poll 1. Right-click any data source in the project and select Query Builder…. 2. Select Options > Server and carefully read the warning regarding the SAS server for the query.
Multiple Choice Poll Assume that you have SAS on both your local machine and a remote server. If you want to join an Excel spreadsheet on your PC with a large table on the server, what should you do? a. Nothing. Allow SAS Enterprise Guide to choose where to process the query. b. Modify the query options to force the query to process on the local server. c. Modify the query options to force the query to process on your remote SAS Server.
Multiple Choice Poll – Correct Answer Assume that you have SAS on both your local machine and a remote server. If you want to join an Excel spreadsheet on your PC with a large table on the server, what should you do? a. Nothing. Allow SAS Enterprise Guide to choose where to process the query. b. Modify the query options to force the query to process on the local server. c. Modify the query options to force the query to process on your remote SAS Server.
75 Join Results When joining tables in the Query Builder, you can also filter or sort on any of the columns from the input tables, as well as compute new columns, or group and summarize.
76 Joining Tables This demonstration illustrates how to join multiple tables and store the result in a data table.
78 Exercise This exercise reinforces the concepts discussed previously.
80 Chapter 4, Section 4.6: Joining Tables Including Nonmatching Rows (Self-Study)
81 Objectives Perform different join types.
82 Business Scenario In an effort to improve customer retention, the Marketing Department at Orion Star would like to identify those customers in the database that did not place a recent order.
83 Joining Tables Types of joins: Matching Rows Only (SAS Enterprise Guide default) returns only the rows from one table that have a corresponding match in every other table. All Rows from one or both tables returns all of the matched rows from both tables and the unmatched rows from at least one table. For two tables A and B, the choices are All Rows from A, All Rows from A and B, and All Rows from B.
84 Review: Matching Rows Only
85 Including Nonmatching Rows All rows from customerdatabase and itemsordered
86 Including Nonmatching Rows All rows from customerdatabase
87 Including Nonmatching Rows All rows from itemsordered
88 Join Properties (Review) The Join Properties include the ability to modify the join type or condition. Selecting a different join type can be used to identify or eliminate nonmatching rows.
89 Isolating Nonmatching Rows The query can also include a filter to isolate the nonmatching rows from one or both tables. Example: to find customers in the CustomerDatabase table who have not placed orders, filter to include only rows where Customer_ID is missing from the orders table.
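Keeping all rows from one table and then filtering for a missing key from the other table corresponds to an SQL left join followed by an IS NULL test. The sketch below uses Python's sqlite3 module. The table names follow the earlier slides (customerdatabase and itemsordered), while the Customer_Name and Order_ID columns and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customerdatabase (Customer_ID INTEGER, Customer_Name TEXT);
    CREATE TABLE itemsordered (Order_ID INTEGER, Customer_ID INTEGER);
    INSERT INTO customerdatabase VALUES
        (1, 'Customer One'), (2, 'Customer Two'), (3, 'Customer Three');
    INSERT INTO itemsordered VALUES (100, 1), (101, 1), (102, 3);
""")

# "All rows from customerdatabase": unmatched customers are kept, and their
# itemsordered columns are NULL (missing). The WHERE clause isolates them.
no_orders = conn.execute("""
    SELECT c.Customer_ID, c.Customer_Name
    FROM customerdatabase AS c
    LEFT JOIN itemsordered AS o ON c.Customer_ID = o.Customer_ID
    WHERE o.Customer_ID IS NULL
    ORDER BY c.Customer_ID
""").fetchall()

print(no_orders)  # [(2, 'Customer Two')]
```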
Multiple Choice Poll Which would be the most appropriate join type to begin to isolate those orders placed on products that are no longer included in the products table? a. Matching rows only b. All rows from products c. All rows from orders d. All rows from products and orders
Multiple Choice Poll – Correct Answer c. All rows from orders. This join keeps every order, so a follow-up filter on rows where the product columns are missing isolates the orders for products that are no longer in the products table.
93 Joining Tables Including Nonmatching Rows This demonstration illustrates how to change the join type to include nonmatching rows in a query.
95 Exercise This exercise reinforces the concepts discussed previously.
97 Chapter 4, Section 4.7: Creating New Columns by Recoding Values (Self-Study)
98 Objectives Recode individual values or a range of values in a column.
99 Business Scenario To further analyze profit per order, management would like to categorize each order in the following ranges: $0 to $100, $100 to $500, and $500 and Above.
100 Recoded Columns New columns can also be derived by recoding values from an existing column.
101 Recoded Values Recoding a column enables you to assign a value to a new column based on the value of an existing column. The conditions are checked in order. When Order_Type=1 is TRUE, then Order_Type_Detail = 'Retail Sale'. When it is FALSE and Order_Type=2 is TRUE, then Order_Type_Detail = 'Catalog Sale'. When that is also FALSE and Order_Type=3 is TRUE, then Order_Type_Detail = 'Internet Sale'.
Quiz What should be assigned to the new column if Order_Type = 999?
Quiz – Correct Answer Possible answers: Assign a missing value. Assign ‘999’. Assign ‘Other’.
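The choice of a fallback value can be made explicit in code. This Python sketch recodes Order_Type with a lookup table and a default, assuming the mapping 1 = Retail Sale, 2 = Catalog Sale, 3 = Internet Sale. The function name and the use of None for a missing value are illustrative choices, not part of the Query Builder.

```python
# Assumed mapping of Order_Type codes to labels.
ORDER_TYPE_LABELS = {1: "Retail Sale", 2: "Catalog Sale", 3: "Internet Sale"}


def recode_order_type(order_type, default="Other"):
    """Return the label for a code, or the default for unassigned codes."""
    return ORDER_TYPE_LABELS.get(order_type, default)


print(recode_order_type(3))                  # Internet Sale
print(recode_order_type(999))                # Other
print(recode_order_type(999, default=None))  # None (a missing value)
```

Passing a different default shows the three possible answers to the quiz: a missing value, the original code as text, or a catch-all label such as 'Other'.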
105 Recode a Column The New Computed Column wizard provides an option for recoding the values of an existing column in the input table.
106 Specify a Replacement The wizard enables you to specify replacements based on distinct values, ranges, or conditions. Select the new column type before you define replacement values. Determine a value for data not assigned a replacement.
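Recoding by ranges works the same way as recoding distinct values, except that each rule tests an interval. The sketch below applies the three profit ranges from the business scenario in Python. The scenario does not say which range owns the boundary values $100 and $500, so this example assumes each range includes its lower bound and excludes its upper bound. Values that match no range (for example, a negative profit) are assigned None, standing in for a missing value.

```python
def recode_profit(profit):
    """Assign a profit range label; None for values with no replacement."""
    if profit is None:
        return None                 # missing stays missing
    if 0 <= profit < 100:           # assumption: lower bound inclusive
        return "$0 to $100"
    if 100 <= profit < 500:         # assumption: exactly 100 falls here
        return "$100 to $500"
    if profit >= 500:               # assumption: exactly 500 falls here
        return "$500 and Above"
    return None                     # not assigned a replacement


for value in (25.0, 100.0, 499.99, 500.0, -5.0):
    print(value, recode_profit(value))
```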
107 Creating a New Column by Recoding Values This demonstration illustrates recoding values in a query to create a new column based conditionally on an existing column.
109 Chapter Review 1. Name at least three tasks that you can do in the Query Builder that you cannot do in the Filter and Sort task. 2. Can you filter or sort on a calculated column? 3. What is the default join type?
110 Chapter Review Answers 1. Define new columns. Join tables. Group and summarize data. Define column attributes. Remove duplicate rows. 2. Yes, you can filter or sort on a column whose values are created during processing. 3. The default join type is the inner join, or matching rows only.