An Introduction to SQL
Structured Query Language
Structured Query Language (SQL) is a standardized language originally designed as a relational database query tool. SQL is currently used in many software products to retrieve and update data.
Structured Query Language: Timeline
1970 – Dr. E. F. Codd of IBM proposes the relational model of data on which SQL is based.
1970s – IBM develops SQL (originally named SEQUEL).
1981 – IBM releases SQL/DS, its first commercial SQL product.
1989 – More than 75 SQL-based systems exist. SAS 6.06 includes PROC SQL.
1999 – PROC SQL is enhanced for SAS 8.
2004 – PROC SQL is enhanced for SAS®9.
The SQL Procedure
Enables the use of SQL in SAS
Part of Base SAS
Follows most American National Standards Institute (ANSI) guidelines for SQL, but is not fully ANSI compliant
Includes enhancements for compatibility with SAS software
PROC SQL Features
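Query SAS data sets
    proc sql;
       /* With no GROUP BY, HAVING treats the whole table as one group: */
       /* MAX(salary) is remerged onto every row, and only the row(s)   */
       /* with the highest salary are returned.                         */
       select *
          from orion.employee_payroll
          having salary=max(salary);
    quit;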
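An equivalent way to return the top row(s) is a subquery in the WHERE clause instead of a remerged HAVING clause. This is a sketch that assumes the SASHELP.CLASS sample table shipped with SAS in place of orion.employee_payroll, with WEIGHT standing in for SALARY; the output table name HEAVIEST is hypothetical.

```sas
/* Assumes SASHELP.CLASS (SAS sample data); HEAVIEST is a hypothetical name. */
proc sql;
   /* The subquery computes MAX(weight) once; the outer query keeps */
   /* every row whose weight equals it, so ties are all returned.   */
   create table heaviest as
   select name, weight
      from sashelp.class
      where weight = (select max(weight) from sashelp.class);

   select * from heaviest;
quit;
```

Because the maximum is computed in a separate subquery, this form should not produce the remerge note that the HAVING version writes to the log.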
Generate reports from SAS data sets
    proc sql;
       select mean(salary) "Average Salary" format=dollar12.,
              employee_gender
          from orion.employee_payroll
          group by employee_gender;
    quit;
Combine SAS data sets in many ways: inner joins and outer joins (left, full, and right)
Inner joins
Return only matching rows.
A maximum of 256 tables can be joined at the same time.
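A minimal sketch of an inner join. It uses two small hypothetical tables, EMP and PAY, created with DATALINES so the example is self-contained; only the IDs present in both tables (1 and 3) appear in the result.

```sas
/* Hypothetical tables EMP and PAY, keyed by ID. */
data emp;
   input id name $;
   datalines;
1 Ann
2 Bob
3 Cal
;
run;

data pay;
   input id salary;
   datalines;
1 50000
3 62000
4 48000
;
run;

/* Inner join: only IDs found in BOTH tables (1 and 3) are returned. */
proc sql;
   create table inner_result as
   select e.id, e.name, p.salary
      from emp as e
           inner join pay as p
           on e.id = p.id;
quit;
```

Listing the tables with commas and putting the join condition in WHERE (from emp as e, pay as p where e.id = p.id) is an equivalent inner join; that is the form used in the NHANES query later in this section.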
Outer joins
Return all matching rows, plus nonmatching rows from one table (left or right join) or from both tables (full join).
Each outer join can be performed on only two tables or views at a time.
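A sketch of left and full outer joins on the same two hypothetical tables, EMP and PAY (recreated here so the block stands alone). The left join keeps every EMP row; the full join keeps every row from either table.

```sas
/* Hypothetical tables EMP and PAY, keyed by ID. */
data emp;
   input id name $;
   datalines;
1 Ann
2 Bob
3 Cal
;
run;

data pay;
   input id salary;
   datalines;
1 50000
3 62000
4 48000
;
run;

proc sql;
   /* Left join: all EMP rows (IDs 1-3); SALARY is missing for ID 2. */
   create table left_result as
   select e.id, e.name, p.salary
      from emp as e
           left join pay as p
           on e.id = p.id;

   /* Full join: all IDs from either table (1-4). E.ID is missing for */
   /* the PAY-only row, so COALESCE takes the first nonmissing ID.    */
   create table full_result as
   select coalesce(e.id, p.id) as id, e.name, p.salary
      from emp as e
           full join pay as p
           on e.id = p.id;
quit;
```

Without COALESCE, selecting only E.ID in the full join would leave the ID blank for the row that exists only in PAY.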
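Create and delete SAS data sets, views, and indexes
    proc sql;
       /* With several arguments, MEAN is the row-wise SAS function: */
       /* it averages the readings within each row.                  */
       create table newbp as
       select mean(BPXSY1,BPXSY2,BPXSY3,BPXSY4) as mnsbp,
              mean(BPXDI1,BPXDI2,BPXDI3,BPXDI4) as mndbp,
              seqn
          from nh9.bloodpressure;
       /* With one argument, N is a summary function: it counts the */
       /* nonmissing values of each new column.                     */
       select n(mnsbp) "mnsbp", n(mndbp) "mndbp"
          from newbp;
    quit;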
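The example above only creates a table. This is a sketch of the other statements named in the heading, run against a WORK copy of SASHELP.CLASS (assumed available as SAS sample data); the view name TEENS is hypothetical. For a simple index, PROC SQL requires the index name to match the column name.

```sas
proc sql;
   /* Copy the sample table into WORK so it can be modified. */
   create table class as
   select * from sashelp.class;

   /* Simple index: its name must be the column name (NAME). */
   create index name on class(name);

   /* A view stores the query, not the data; TEENS is hypothetical. */
   create view teens as
   select name, age
      from class
      where age >= 13;

   /* Delete the view, the index, and finally the table itself. */
   drop view teens;
   drop index name from class;
   drop table class;
quit;
```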
Update existing SAS data sets
    data tmp;
       input x b $ @@;
       datalines;
    1 a1 1 a2 2 b1 2 b2 4 d
    ;
    proc print data=tmp;
       title "tmp";
    run;
    title;
    proc sql;
       /* CONTAINS matches a substring, so rows a1 and a2 are doubled. */
       update tmp
          set x=x*2
          where b contains "a";
       select * from tmp;
    quit;
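PROC SQL can also add and remove rows. This sketch rebuilds the TMP table from the example above, appends one row with INSERT, and removes the row where b is 'd' with DELETE; the inserted values (8, 'e') are arbitrary.

```sas
data tmp;
   input x b $ @@;
   datalines;
1 a1 1 a2 2 b1 2 b2 4 d
;
run;

proc sql;
   /* Append one row; VALUES are matched to columns by position (x, b). */
   insert into tmp
      values(8, 'e');

   /* Remove every row that meets the condition. */
   delete from tmp
      where b = 'd';

   select * from tmp;
quit;
```

The table starts with five rows, gains one, and loses one, so five rows remain.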
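Access metadata
    proc sql;
       /* DICTIONARY.COLUMNS has one row per column in every assigned */
       /* library; LIBNAME values are stored in uppercase.            */
       select memname, name, label
          from dictionary.columns
          where libname="FRAM" and upcase(label) contains "CHOL";
    quit;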
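DICTIONARY.TABLES works the same way but returns one row per table or view, including its row and column counts. This sketch looks up SASHELP.CLASS (assumed available as SAS sample data); CLASS_INFO is a hypothetical output name. Outside PROC SQL, the same metadata is available through SASHELP views such as SASHELP.VTABLE and SASHELP.VCOLUMN.

```sas
proc sql;
   /* LIBNAME and MEMNAME are stored in uppercase in DICTIONARY tables. */
   create table class_info as
   select memname, memtype, nobs, nvar
      from dictionary.tables
      where libname = 'SASHELP' and memname = 'CLASS';

   select * from class_info;
quit;
```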
Create macro variables
    proc sql;
       select mean(age) into :mnage
          from fram.framexam5subset;
    quit;
    %put Average age: &mnage;
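A macro variable created this way can be reused in a later query. This sketch uses SASHELP.CLASS (assumed available) in place of fram.framexam5subset, and OLDER is a hypothetical output name. NOPRINT suppresses the report that SELECT would otherwise print, and TRIMMED (SAS 9.3 and later) strips leading and trailing blanks from the stored value.

```sas
proc sql noprint;
   /* Store the mean age as text in macro variable MNAGE. */
   select mean(age) into :mnage trimmed
      from sashelp.class;

   /* PROC SQL runs each statement as it is read, so &MNAGE */
   /* already holds the value when this query is compiled.  */
   create table older as
   select name, age
      from sashelp.class
      where age > &mnage;
quit;

%put Average age: &mnage;
```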
(Sometimes) reproduce the results of multiple DATA and procedure steps with a single query
    proc sql;
       create table analysis as
       select a.seqn,
              mortstat=1 as dead,   /* 1 if mortstat=1, else 0 */
              permth_exm,
              mean(BPXSY1,BPXSY2,BPXSY3,BPXSY4) as mnsbp,
              mean(BPXDI1,BPXDI2,BPXDI3,BPXDI4) as mndbp,
              riagendr=1 as male,   /* 1 if riagendr=1, else 0 */
              ridageyr as age,
              ridreth2 as race_ethn,
              lbdhdl as hdl,
              lbxtc as chol,
              bmxbmi as bmi
          from nh9.mortality(keep=seqn eligstat mortstat permth_exm) a,
               nh9.bloodpressure(keep=seqn bpxsy1-bpxsy4 BPXDI1-BPXDI4) b,
               nh9.demographics(keep=seqn ridageyr riagendr RIDRETH2) c,
               nh9.bodymeasurements(keep=seqn bmxbmi) d,
               nh9.cholesterolhdl(keep=seqn LBDHDL LBXTC) e
          where eligstat eq 1
                and a.seqn=b.seqn and b.seqn=c.seqn
                and c.seqn=d.seqn and d.seqn=e.seqn
          order by a.seqn;
    quit;
Example from the program "Create Nhanes1999 Analytic File": this single PROC SQL step replaces 11 DATA and PROC SORT steps.
Structured Query Language: PROC SQL input and output
Input: SAS data sets, SAS data views, DBMS tables
Output: reports, SAS data sets, SAS data views, DBMS tables
Terminology
    Data Processing   SAS           SQL
    File              Data Set      Table
    Record            Observation   Row
    Field             Variable      Column
The SQL Procedure
Tool for querying data
Tool for data manipulation and management
An augmentation to the DATA step, not a DATA step replacement
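A small illustration of the overlap between the two tools: the same subset of SASHELP.CLASS (assumed available) built once with a DATA step and once with PROC SQL. The output names TEENS_DS and TEENS_SQL are hypothetical. In DATA step terms the code keeps observations and variables; in SQL terms it keeps rows and selects columns.

```sas
/* DATA step: keep observations with age >= 13 and two variables. */
data teens_ds;
   set sashelp.class;
   where age >= 13;
   keep name age;
run;

/* PROC SQL: the same subset, expressed as rows and columns. */
proc sql;
   create table teens_sql as
   select name, age
      from sashelp.class
      where age >= 13;
quit;
```

Both steps should produce the same rows. DATA step features such as arrays and FIRST./LAST. BY-group processing have no direct PROC SQL counterpart, while many-table joins and summary statistics remerged onto detail rows are often shorter in PROC SQL.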