Presentation is loading. Please wait.

Presentation is loading. Please wait.

SAS ® PROC SQL or Vanilla Flavor Cecilia Mauldin January 31 2007.

Similar presentations


Presentation on theme: "SAS ® PROC SQL or Vanilla Flavor Cecilia Mauldin January 31 2007."— Presentation transcript:

1 SAS ® PROC SQL or Vanilla Flavor Cecilia Mauldin January 31 2007

2 What is Special in SAS PROC SQL? Things you already know from data step programming Some differences between data step and proc sql syntax Order of execution How good is my code?

3 DATA dogs; LENGTH name breed $20 dob $10; INPUT name breed dob; CARDS; NalaAirdale 11jan1990 NinaAMBR 22apr1997 JadziaPointer 15feb1999 RoccoMMBNR 08may1980 BrujaRottweiler 15aug2000 ChacalRottweiler 29jan1990 ChataBoxer UNK TysonRottweiler 15mar2000 ; RUN; DATA parents; LENGTH parent name $20; INPUT parent name; CARDS; InaNala InaChata InaBruja InaFidonga ChavoRocco Cecilia Nina Cecilia Jadzia ; RUN; The Data

4 What is Special in SAS PROC SQL? Things you already know from data step programming Some differences between data step and Proc SQL syntax Order of execution How good is my code?

5 Things You Already Know from Data Step Programming Our old friends: where, keep, drop and rename In data step –How do we use them? –Why do we use them? In SAS PROC SQL –How do we use them? –Why do we use them?

6 Things You Already Know from Data Step Programming DATA new_data (KEEP= RENAME=( ) DROP= WHERE=( )); SET old_data (KEEP= RENAME=() DROP= WHERE=( )); SAS data steps … ; RUN; It is written from the PDV when specified on an output data set. Is applied when the dataset OLD_DATA is read into the PDV.

7 Data Step: Why Do We Use Our Old Friends? No need to keep observations and/or variables that are not used in the output data set. No need to carry observations or variables that are not going to be used in the PDV. DATA new_data (KEEP= RENAME= ( ) DROP= WHERE=( )); SET old_data (KEEP= RENAME=() DROP= WHERE=( )); SAS data steps … ; RUN;

8 PROC SQL SQL_OPTIONS; CREATE TABLE example (KEEP= …. RENAME=( ….) DROP= WHERE=( ….)) AS SELECT * FROM data_set (KEEP= var1 var2…. RENAME=(var1 = newvar1 var2 = newvar2 ….) DROP= WHERE=(….)); QUIT; Proc SQL: How Do We Use Them?

9 Why Do We Use Them? Typing convenience Right, left and full joins

10 Typing Convenience Your dataset has 50 variables, you need to use 45 –Do you want to type 45 variables? –Do you carry 5 variables that you don’t need?

11 Typing Convenience cont’d SELECT * FROM data_set (DROP=var46 … var50); SELECT var1, var2, ∙∙∙, var45 FROM data_set;

12 Typing Convenience cont’d SELECT * FROM dogs (DROP=dob RENAME=(name=firstname)) ; SELECT name AS firstname, breed FROM dogs ; First name Breed Nala Airdale Nina AMBR Jadzia Pointer Rocco MMBNR Bruja Rottweiler Chacal Rottweiler Chata Boxer Tyson Rottweiler

13 Joins Avoid ‘warning’ messages when creating a data set because two variables in two different datasets have the same name Limit the records that come in a join

14 Joins cont’d Avoid WARNING messages when 2 variables have the same name CREATE TABLE final AS SELECT a.*, b.* FROM data_set1 a, data_set2 b WHERE a.var1=b.var1 AND a.var2=b.var2 AND a.var3=b.var3 ;

15 Joins cont’d Avoid WARNING messages when 2 variables have the same name CREATE TABLE family AS SELECT a.*, b.* FROM dogs a, parent b WHERE a.name=b.name ; WARNING: Variable Name already exists on file WORK.FAMILY. NOTE: Table WORK.FAMILY created, with 6 rows and 4 columns.

16 Joins cont’d Avoid WARNING messages when 2 variables have the same name CREATE TABLE final (DROP=VAR1B VAR2B VAR3B) AS SELECT a.*, b.* FROM data_set1 A, data_set2 (RENAME=(var1=var1b var2=var2b var3=var3b)) B WHERE a.var1=b.var1b AND a.var2=b.var2b AND a.var3=b.var3b;

17 Joins cont’d CREATE TABLE family (DROP=nickname) AS SELECT a.*, b.* FROM dogs a, parents (RENAME=(name=nickname)) b WHERE a.name=b.nickname ; NamebreedDOBParent NalaAirdale11JAN1990Ina NinaAMBR22APR2001Cecilia JadziaPointer15FEB1999Cecilia RoccoMMBNR08MAY2001Chavo BrujaRottweiler15AUG2000Ina ChataBoxer. Ina

18 Joins cont’d SELECT a.*, b.* FROM dogs a LEFT JOIN parents b ON a.name=b.name AND a.breed='Rottweiler' ; NameBreedDOBParent name BrujaRottweiler15AUG2000Ina Bruja ChacalRottweiler29JAN1990 ChataBoxer. JadziaPointer15FEB1999 NalaAirdale11JAN1990 NinaAMBR22APR2001 RoccoMMBNR08MAY2001 TysonRottweiler15MAR2000 Do we get a warning?

19 Joins cont’d Limit records CREATE TABLE final AS SELECT a.*, b.* FROM data_set1 (WHERE=(var1=’keep’)) a LEFT JOIN data_set2 b ON a.var1=B.var1b AND a.var2=B.var2b AND a.var3=B.var3b ;

20 Joins cont’d SELECT a.*, b.* FROM dogs (WHERE=(breed='Rottweiler'))a LEFT JOIN parents b ON a.name=b.name ; NamebreedDOB Parent name BrujaRottweiler15AUG2000 Ina Bruja ChacalRottweiler29JAN1990 TysonRottweiler15MAR2000

21 What is special in SAS PROC SQL? Some differences between data step and SQL syntax Order of execution How good is my code? Things you already know from data step programming

22 Some Differences Between Data Step and SQL Syntax Reserved word Format modifiers Descending sorting

23 Reserved Word CASE –Why to use it Is there something similar in data step? –How to use it –What can be the problem –How to avoid a problem

24 CASE: Why to Use It There is no IF or SELECT statements in SQL Is similar to the SELECT clause in data step

25 CASE: Why to Use It DATA final; SET parents; SELECT (name); WHEN (‘Nina') education =‘Novice'; WHEN (‘Nala') education =‘None'; OTHERWISE education =‘There is hope’; END;

26 CASE: How to Use It SELECT var1, var2, CASE [vark] WHEN (“value1”) THEN [‘constant value’ | function] WHEN (‘value3’) THEN [‘other constant’ | function] … ELSE [‘last value’ | function ] END AS newvar FROM data_set ;

27 CASE: How to Use It cont’d SELECT name, CASE name WHEN ('Nala') THEN 'None' WHEN ('Nina') THEN 'Novice' ELSE 'There is hope' END AS education LENGTH = 15 FORMAT $15., parent FROM parent ; NameEducationParent NalaNoneIna ChataThere is hopeIna BrujaThere is hopeIna FidongaThere is hopeIna RoccoThere is hopeChavo NinaNoviceCecilia JadziaThere is hopeCecilia

28 CASE: How to Use It cont’d SELECT name,breed, CASE WHEN INDEX (breed,'MB') THEN PUT(breed,$UKC.) ELSE breed END AS lbreed label "Long Breed" FROM dogs; NameBreedLong breed BrujaRottweilerRottweiler ChacalRottweilerRottweiler ChataBoxerBoxer JadziaPointerPointer NalaAirdaleAirdale NinaAMBRAmerican Mixed Breed Registry RoccoMMBNRMexican Mixed Breed Non-Registered TysonRottweilerRottweiler

29 CASE: What Can Be the Problem DATA new; LENGTH case $20; SET parents; IF parent ='Ina' THEN case='None'; ELSE IF parent ='Chavo' THEN case='Spoiled'; ELSE case='Work'; RUN; PROC PRINT DATA=new; RUN; ObsCaseParentName 1NoneInaNala 2NoneInaChata 3NoneInaBruja 4NoneInaFidonga 5SpoiledChavoRocco 6WorkCeciliaNina 7WorkCeciliaJadzia

30 CASE: What Can Be the Problem cont’d SELECT * FROM new WHERE parent='Ina'; CaseParentName NoneInaNala NoneInaChata NoneInaBruja NoneInaFidonga

31 CASE: What Can Be the Problem cont’d SELECT * FROM new WHERE case='None'; select * from new WHERE case='None'; + 22 + 76 +ERROR 22-322: Syntax error, expecting one of the following: + a name, a quoted string, a numeric constant, + a datetime constant, a missing value, (, +, -, + BTRIM, CALCULATED, CASE, EXISTS, INPUT, LOWER, + NOT, PUT, SELECT, SUBSTRING, TRANSLATE, UPPER, + USER, WHEN, ^, ~. +ERROR 76-322: Syntax error, statement will be ignored.

32 CASE: How to Avoid a Problem SELECT * FROM new (RENAME=(case=education)) WHERE education='None';

33 Some Differences Between Data Step and SQL Syntax Reserved word Format modifiers Descending sorting

34 Format Modifiers What are format modifiers? Format modifiers in data step Format modifiers in proc SQL

35 Format Modifiers in Data Step DATA ndogs; SET dogs; idate=INPUT(dob,date9.); IF _ERROR_=1 THEN PUT _ALL_; RUN; NOTE: Invalid argument to function INPUT at line 196 column 9. name=Chata breed=Boxer dob=UNK idate=. _ERROR_=1 _N_=3 NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values. Each place is given by: (Number of times) at (Line):(Column). 1 at 196:9

36 Format Modifiers in Data Step cont’d DATA ndogs; SET dogs; idate=INPUT(dob, ?? date9.); IF _ERROR_=1 THEN PUT _ALL_; RUN; NOTE: There were 8 observations read from the data set WORK.DOGS. NOTE: The data set WORK.NDOGS has 8 observations and 4 variables.

37 Format Modifiers in PROC SQL +ERROR: Invalid day value NOTE: Table WORK.NDOGS created, with 8 rows and 3 columns. CREATE TABLE ndogs AS SELECT name, dob, INPUT(dob, date9.) AS idate FORMAT mmddyy8. FROM dogs;

38 Format Modifiers in PROC SQL cont’d CREATE TABLE ndogs AS SELECT name, dob, INPUT(dob, ? date9.) AS idate FORMAT mddyy8. FROM dogs; NOTE: Table WORK.NDOGS created, with 8 rows and 3 columns.

39 Some Differences Between Data Step and SQL Syntax Reserved word Format modifiers Descending sorting

40 Descending Sorting PROC SORT DATA = ndogs; BY DESCENDING idate; RUN; SELECT * FROM ndogs ORDER BY idate descending; PROC SQL Data step and PROCS

41 Descending Sorting cont’d A little lagniappe (pronounced nańap) SELECT name, dob, INPUT( dob,? date9.) AS idate FORMAT date9. FROM dogs ORDER BY 3 DESC ; NameDOBIdate Bruja15Aug200015AUG2000 Tyson15Mar200015MAR2000 Jadzia15Feb199915FEB1999 Nina22APR199722APR1997 Chacal29Jan199029JAN1990 Nala11jan199011JAN1990 Rocco08may198008MAY1980 ChataUNK.

42 What is Special in SAS PROC SQL? The program data vector (PDV) Some differences between data step and SQL syntax Order of execution How good is my code?

43 The Order of Execution SELECT VAR1, VAR2, … FROM data_set [ FULL | LEFT | RIGHT | INNER ] JOIN TABLE2 ON condition 1 AND condition 2 WHERE condition k AND condition l GROUP BY VAR1, VAR…. HAVING condition p AND condition q ORDER BY var1, var2… ;

44 The Order of Execution cont’d PROC SQL NUMBER; SELECT a.breed, INPUT(a.dob, ? date9.) AS ndob format date9., COALESE(a.name, b.name) AS name, b.parent FROM dogs a FULL JOIN parents b ON a.name=b.name; BreedNDOBNameParent 1 Rottweiler15AUG2000 BrujaIna 2 Rottweiler29JAN1990 Chacal 3 BoxerChataIna 4. FidongaIna 5 Pointer15FEB1999JadziaCecilia 6 Airdale11JAN1990NalaIna 7 AMBR22APR1997NinaCecilia 8 MMBNR08MAY1980RoccoChavo

45 The Order of Execution cont’d SELECT a.breed, INPUT( a.dob, ? date9.) AS ndob FORMAT date9., COALESCE (a.name, b.name) AS name, b.parent FROM dogs a FULL JOIN parents b ON a.name=b.name AND breed='Rottweiler'; BreedNDOBNameParent 1 Rottweiler15AUG2000 BrujaIna 2 Rottweiler29JAN1990 Chacal 3 BoxerChataIna 4. FidongaIna 5 Pointer15FEB1999JadziaCecilia 6 Airdale11JAN1990NalaIna 7 AMBR22APR1997NinaCecilia 8 MMBNR08MAY1980RoccoChavo

46 The Order of Execution cont’d SELECT a.breed,b.parent, COALESCE ( a.name, b.name) FROM dogs a FULL JOIN parent b ON a.name=b.name WHERE breed='Rottweiler'; SELECT a.breed, b.parent, COALESCE (b.name, a.name) FROM dogs a FULL JOIN parent b ON a.breed='Rottweiler' WHERE a.name=b.name; Breed Parent Rottweiler Ina Bruja Rottweiler Chacal Rottweiler Tyson Breed Parent Rottweiler Ina Bruja

47 The Order of Execution cont’d SELECT a.breed, INPUT(a.dob, ? date9.) AS ndob format date9., COALESCE(a.name, b.name) AS name, b.parent FROM dogs a FULL JOIN parent b ON a.name=b.name WHERE breed='Rottweiler' HAVING ndob GT '01jan2000'd; BreedNDOBNameParent Rottweiler15AUG2000BrujaIna Rottweiler15MAR2000Tyson

48 How Good is My Code? Does the options used in data step make my code faster? Is speed important?

49 How Good is My Code? cont’d Do data step options make my code faster? data all; merge dogs parent; by name; do i=1 to 100; do j=1 to 100; output; end; run; NOTE: There were 8 observations read from the data set WORK.DOGS. NOTE: There were 7 observations read from the data set WORK.PARENT. NOTE: The data set WORK.ALL has 891000 observations and 6 variables.

50 How Good is My Code? cont’d Does the statements used to control the size of my input data set make my code faster? proc sql stimer; NOTE: SQL statement used: real time 0.01 seconds cpu time 0.02 seconds

51 How Good is My Code? cont’d NOTE: SQL statement used: real time 10.82 seconds cpu time 10.50 seconds SELECT * FROM all WHERE breed='Rottweiler'; SELECT * FROMall (WHERE=(breed='Rottweiler')); NOTE: SQL statement used: real time 10.85 seconds cpu time 10.61 seconds Does the statements used to control the size of my input data set make my code faster?

52 SELECT DISTINCT name, breed, dob FROM all WHERE breed='Rottweiler'; SELECT DISTINCT name, breed, dob FROM all ( WHERE=(breed='Rottweiler')); SELECT DISTINCT name, breed, dob FROM all (WHERE=(breed='Rottweiler') KEEP=name breed dob); Does the statements used to control the size of my input data set make my code faster? real time 5.89 seconds cpu time 4.30 seconds real time 9.42 seconds cpu time 4.57 seconds real time 4.90 seconds cpu time 4.35 seconds How Good is My Code? cont’d

53 Conclusions This presentation showed examples on how and when to use statements that we already know –KEEP –RENAME –DROP –WHERE (things you already know)

54 Conclusions Possible language problems (line mines) –Reserved word (CASE) –Format modifiers –Descending order (things you think you already know)

55 Conclusions cont’d Be careful with the order of the conditions The order of the conditions affect the result

56 Conclusions cont’d When you use SQL, consider if you need all your variables and observations If you don’t need them consider: –It there is a significant difference in computer use –Are you a good typist?

57 Conclusions cont’d You can have plain vanilla SAS PROC SQL or you can SASsy it up with data step statements

58 Acknowledgements PPD management –Kim Sturgen and Bonnie Duncan –Andy Barnett –Darrel Edgley –Neil Howard, Larry Sleeper and Craig Mauldin –For their time, although if there are mistakes, they are all mine; I made modifications and additions after they read the paper

59 Contact Information Angelina Cecilia Mauldin Abraxis BioScience 4505 Emperor Blvd. #400 Durham, NC 27703 919 433 8400 cmauldin@abraxisbio.com


Download ppt "SAS ® PROC SQL or Vanilla Flavor Cecilia Mauldin January 31 2007."

Similar presentations


Ads by Google