SQL Chapter Two
Overview
- Basic Structure
- Verifying Statements
- Specifying Columns
- Specifying Rows
Introduction
SQL is a modular language that uses statements and clauses.
Basic structure of PROC SQL:
   PROC SQL;
      statement (select)
      clauses (from, where, group by, having, order by);
   QUIT;
Note: place a semicolon at the end of the last clause only.
Statements
select - specifies the columns to be selected
The SELECT statement has the following features:
- selects data that meets certain conditions
- groups data
- specifies an order for the data
- formats data
- calculates new variables
Clauses
- from - specifies the tables to be queried
- where - subsets the data based on a condition - optional
- group by - classifies the data into groups - optional
- having - subsets groups of data based on a group condition - optional
- order by - sorts rows by the values of specific columns - optional
Note: the order of the clauses is significant. They must appear in the order listed above.
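As a minimal sketch of how the clauses fit together, the query below uses every clause in the required order. It assumes the SASHELP.CLASS sample table that ships with SAS is available; the alias avg_height and the cutoff values are illustrative only.

```sas
proc sql;
   select sex,
          avg(height) as avg_height format=5.1   /* calculated summary column */
      from sashelp.class                          /* table to query */
      where age ge 12                             /* row-level subset */
      group by sex                                /* one result row per group */
      having avg(height) > 55                     /* group-level subset (illustrative cutoff) */
      order by sex;                               /* the only semicolon in the statement */
quit;
```

Swapping any two clauses (for example, placing ORDER BY before GROUP BY) should produce a syntax error, which is one way to see why the order matters.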
Verifying Statements
Two tools can be used to verify statement syntax:
- validate - a statement keyword used to check SELECT statement syntax
- noexec - a PROC SQL option that checks for invalid syntax in all types of SQL statements
Validate
A query with valid syntax produces a note in the log:
   proc sql;
      validate
      select timemile, restpulse, maxpulse
         from project.fitness
         where timemile gt 7;
NOTE: PROC SQL statement has valid syntax.

A stray comma after the last column (maxpulse,) produces an error instead:
   proc sql;
      validate
      select timemile, restpulse, maxpulse,
         from project.fitness
         where timemile gt 7;
Syntax error, expecting one of the following: a quoted string, !, !!, &...
NOEXEC
   proc sql noexec;
      select timemile, restpulse, maxpulse
         from project.fitness
         where timemile gt 7;
NOTE: Statement not executed due to NOEXEC option.
Contrasting
Features of validate:
- tests the syntax of a query without executing the query
- checks the validity of column names
- prints error messages for invalid queries
- is only used for SELECT statements
Features of noexec:
- checks for invalid syntax in all types of SQL statements
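A short sketch of this contrast, assuming the SASHELP.CLASS sample table is available; the output table name work.tall and the height cutoff are hypothetical. VALIDATE is placed in front of a single SELECT, whereas NOEXEC applies to every statement in the step, including CREATE TABLE.

```sas
/* VALIDATE: checks one SELECT statement without running it */
proc sql;
   validate
   select name, height
      from sashelp.class
      where height > 60;
quit;

/* NOEXEC: syntax-checks all statements in the step; nothing is executed,
   so work.tall is not created */
proc sql noexec;
   create table work.tall as
      select name, height
         from sashelp.class
         where height > 60;
quit;
```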
Specifying Columns
Objectives
- Displaying columns directly from a table
- Displaying columns calculated from other columns
- Calculating columns using a CASE expression
Displaying data from a table
To print all of a table's columns in the order in which they were stored, use an asterisk in the SELECT statement:
   PROC SQL;
      SELECT *
         FROM VITALS;
   QUIT;
Output columns:
   PATIENT  PULSE  TEMP  BPS  BPD
Printing Specific Columns
If you do not want to print all of a table's columns in stored order, list the columns you want, in the order you want them, in the SELECT statement. A CASE expression in the SELECT statement can also define a column.
   PROC SQL;
      CREATE TABLE TESTMED AS
      SELECT PATIENT,
             /* 1 = even patient number, 0 = odd, 2 = missing (falls to ELSE) */
             CASE ((PATIENT/2 = INT(PATIENT/2)) + (PATIENT = .))
                WHEN 1 THEN 'Med A'
                WHEN 0 THEN 'Med B'
                ELSE 'Error'
             END AS DOSEGRP LENGTH=5
         FROM VITALS
         ORDER BY PATIENT;
   QUIT;
Output:
   PATIENT  DOSEGRP
   101      Med B
   101      Med B
   102      Med A
   102      Med A
   103      Med B
   104      Med A
Calculating Columns
We can calculate a new column from data in an existing column and name the new column with the AS keyword.
Example: Calculate the proportion of Units from each country.
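The slide's original code is not available, so the following is only a sketch of one way to do this. The table work.sales, its columns country and units, and the data values are all hypothetical. Each country's units are divided by the grand total from a subquery, and AS names the result.

```sas
/* Hypothetical table: one row per country with its unit count */
data work.sales;
   input country $ units;
   datalines;
US 50
CA 30
MX 20
;
run;

proc sql;
   select country,
          units,
          /* share of all units; AS names the calculated column */
          units / (select sum(units) from work.sales) as proportion format=percent8.1
      from work.sales;
quit;
```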
Calculated columns using SAS Dates
Recall from earlier chapters that SAS stores a date as a number: the count of days since January 1, 1960. Because dates are numeric, they can be used in arithmetic to calculate new columns.
Example: Calculate the range of dates in a Dailyprices dataset.
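A minimal sketch of a date-range calculation, assuming a stand-in table work.dailyprices with a numeric SAS date column named date (the actual Dailyprices columns are not shown in the source, so these names and values are hypothetical). Subtracting the earliest date from the latest gives the range in days.

```sas
/* Hypothetical stand-in for the Dailyprices dataset */
data work.dailyprices;
   input date :date9. price;
   format date date9.;
   datalines;
02JAN2024 10.50
03JAN2024 10.80
05JAN2024 11.10
;
run;

proc sql;
   select min(date) as first_date format=date9.,
          max(date) as last_date  format=date9.,
          max(date) - min(date) as range_days   /* dates are day counts, so this is a number of days */
      from work.dailyprices;
quit;
```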
Creating new columns
A CASE expression can be used to create a new column.
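As an illustration, the sketch below derives a character column from a numeric one. It assumes the SASHELP.CLASS sample table is available; the group labels and age cutoffs are made up for the example.

```sas
proc sql;
   select name,
          age,
          case
             when age < 13 then 'Child'   /* first matching WHEN wins */
             when age < 16 then 'Teen'
             else 'Older'                 /* everything not matched above */
          end as age_group
      from sashelp.class;
quit;
```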
Creating a table
To create and populate a table with the rows from an SQL query, use create table.
   proc sql;
      create table states as
      select state_code, state_name
         from d2data.state;
   quit;
Output (last rows):
   Obs  State_Code  State_Name
    99  UT          Utah
   100  VT          Vermont
   101  VA          Virginia
   102  WA          Washington
   103  WV          West Virginia
   104  WI          Wisconsin
   105  WY          Wyoming
   106  N/A
Specifying Rows in a table
Objectives
- Selecting a subset of rows
- Removing duplicate rows
- Subsetting using where clauses, escape clauses, and calculated values
Selecting a subset of rows
   proc sql;
      title 'large orders';
      select Product_ID, total_retail_price
         from d2data.order_item
         where total_retail_price > 1000;
   quit;
Output:
   Large orders
   Product ID    Total Retail Price For This Product
                 $1,937.20
Where clause
Use a WHERE clause to specify a condition that rows must satisfy before being selected.
WHERE clauses use common comparison operators (lt, gt, eq, etc.) and logical or conditional operators (OR, NOT, AND, IN, IS NULL, ...).
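A small sketch combining several of these operators, assuming the SASHELP.CLASS sample table is available; the particular conditions are arbitrary.

```sas
proc sql;
   select name, sex, age, height
      from sashelp.class
      where age gt 12                /* comparison operator */
        and sex in ('F')             /* IN with a list of allowed values */
        and height is not null;      /* excludes rows with a missing height */
quit;
```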
Removing duplicates
Use the DISTINCT keyword to eliminate duplicate rows from the query result.
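The two queries below sketch the difference, assuming the SASHELP.CLASS sample table is available. DISTINCT applies to the whole combination of selected columns, not to each column separately.

```sas
proc sql;
   /* Without DISTINCT: one output row per student, so values repeat */
   select sex, age
      from sashelp.class;

   /* With DISTINCT: each sex/age combination appears only once */
   select distinct sex, age
      from sashelp.class;
quit;
```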
Escape Clause
The ESCAPE clause designates a single character that tells PROC SQL to treat the LIKE wildcard that follows it (% or _) as a literal character when searching within a character string.
Example: Select observations from a string variable containing an underscore ('_').
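A sketch of the underscore search, using a hypothetical table work.codes with a character column code. The escape character '/' is an arbitrary choice; any single character that does not occur in the pattern text should work.

```sas
/* Hypothetical data: some codes contain an underscore, some do not */
data work.codes;
   input code $;
   datalines;
AB_12
AB12
CD_34
EF56
;
run;

proc sql;
   /* '/_' means a literal underscore; the % wildcards keep their usual meaning */
   select code
      from work.codes
      where code like '%/_%' escape '/';
quit;
```

Without the ESCAPE clause, the pattern '%_%' would treat the underscore as the single-character wildcard and match every non-empty value.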
Subsetting calculated values
Because the WHERE clause is evaluated before the SELECT clause, referring to a new column alias in WHERE produces an error: columns used in the WHERE clause must exist in the table or be derived from existing columns. There are two fixes. The first is to repeat the calculation in the WHERE clause. The second is to use the CALCULATED keyword to refer to a column already calculated in the SELECT clause.
Subsetting calculated values
   proc sql;
      title 'Lack of profit';
      select Product_ID,
             ((total_retail_price/quantity) - costprice_per_Unit) as profit
         from d2data.order_item
         where calculated profit < 3;
   quit;
   title;
Output columns:
   Lack of profit
   Product ID    profit
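The other fix, repeating the calculation in the WHERE clause, might look like the sketch below. The table work.order_item is a hypothetical stand-in for d2data.order_item with made-up values, so that the example can run on its own.

```sas
/* Hypothetical stand-in for d2data.order_item */
data work.order_item;
   input Product_ID total_retail_price quantity costprice_per_unit;
   datalines;
1001 20.00 2 8.00
1002 90.00 3 20.00
1003 15.00 1 14.00
;
run;

proc sql;
   select Product_ID,
          ((total_retail_price / quantity) - costprice_per_unit) as profit
      from work.order_item
      /* same expression written out again instead of "calculated profit" */
      where ((total_retail_price / quantity) - costprice_per_unit) < 3;
quit;
```

Both forms should return the same rows. CALCULATED is shorter and avoids the risk of the two copies of the expression drifting apart.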
Summary
Basic Structure
   PROC SQL;
      statement (select)
      clauses (from, where, group by, having, order by);
   QUIT;
Verifying Statements
- validate - used to check the select statement syntax
- noexec - checks for invalid syntax in all types of SQL statements
Specifying Columns
- Displaying columns directly from a table
- Displaying columns calculated from other columns
- Calculating columns using a CASE expression
Specifying Rows
- Selecting a subset of rows
- Removing duplicate rows
- Subsetting using where clauses, escape clauses, and calculated values