1 Combining (with SQL) HRP223 – 2010 October 27, 2009 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.
2 PROC SQL - Set Operators
NO GUI (“noh gooey”)
  Outer Union Corresponding – concatenates the rows from both queries
  Union – unique rows from both queries
  Except – rows that are in the first query but not in the second
  Intersect – rows common to both queries
3 outer union corresponding
You can concatenate data files. I rarely use it.
    proc sql;
      create table isOuter as
        select dude from baseline
        outer union corresponding
        select dude from followup;
    quit;
4 union
You can also concatenate data files and keep only the unique records:
    proc sql;
      create table isUnion as
        select dude from baseline
        union
        select dude from followup;
    quit;
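To see how the two operators differ, here is a minimal sketch with made-up data. The table names baseline and followup and the variable dude come from the slides; the values are hypothetical.

```sas
/* Hypothetical toy data: the dude values are invented for illustration. */
data baseline;
  input dude $;
  datalines;
Al
Bob
Cy
;
run;

data followup;
  input dude $;
  datalines;
Bob
Cy
Di
;
run;

proc sql;
  /* Concatenates: every row from both tables is kept, duplicates included. */
  create table isOuter as
    select dude from baseline
    outer union corresponding
    select dude from followup;

  /* Concatenates and then drops duplicate rows. */
  create table isUnion as
    select dude from baseline
    union
    select dude from followup;
quit;
```

With this toy data, isOuter should hold all six input rows, while isUnion should hold four (Al, Bob, Cy, Di), because Bob and Cy appear in both files.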
5 except
Say you needed everyone who did not come back. Start with the baseline group and remove the people who came back.
    proc sql;
      select id from baseline
      except
      select id from followup;
    quit;
6 intersect
Say you wanted to know who came back. In other words, which IDs are in both files?
    proc sql;
      select id from baseline
      intersect
      select id from followup;
    quit;
7 PROC SQL - Set Operators
When you have tables (with more than one column) with the same structure, you can combine them with these set operators.
  – Be extremely careful: SAS/SQL is forgiving about the structure of the tables, and you may not notice problems in the data.
  – For this to work as intended, the two tables must have the same variables, in the same order, and the variables must be of the same type (variables with the same name must both be character or both be numeric).
Use the keyword corresponding to have it match like-named variables.
8 corresponding
By default the set operators line columns up by position, not by name. The columns do not need to have matching names or even the same length, and the query will still run on them. Use corresponding to help spot this problem.
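A small sketch of the problem, assuming two hypothetical tables that hold the same variables (id and site) in a different order. All names and values here are invented for illustration.

```sas
/* Hypothetical data: same variables, different column order. */
data baseline;
  input id $ site $;
  datalines;
001 A
002 B
;
run;

data followup;
  input site $ id $;
  datalines;
A 001
C 003
;
run;

proc sql;
  /* Positional match: baseline.id is compared with followup.site,  */
  /* so no rows look alike and every baseline row is returned.      */
  select * from baseline
  except
  select * from followup;

  /* Name match: id is compared with id and site with site,         */
  /* so only the baseline row with no follow-up (002, B) remains.   */
  select * from baseline
  except corresponding
  select * from followup;
quit;
```

Both queries run without an error, which is why a positional mismatch is easy to miss.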
9 Working with Repeated Keys
A file tracking diagnoses or treatments will have multiple records for some people.
  – If you want to count the number of records for a person, specify which variable(s) to group by.
  – Count the records in each group with count(*), or count the nonmissing values of a variable with count(variableName).
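The difference between count(*) and count(variableName) shows up only when values are missing. A minimal sketch, assuming a hypothetical table dx with one row per diagnosis record and variables id and dx (names and values invented here):

```sas
/* Hypothetical diagnosis file; person 3 has a record with no diagnosis. */
data dx;
  infile datalines missover;   /* a short line leaves dx missing */
  input id dx $;
  datalines;
1 flu
1 flu
1 asthma
2 flu
3
;
run;

proc sql;
  create table recordCounts as
    select id,
           count(*)  as nRecords,   /* every row in the group       */
           count(dx) as nDx         /* only rows with a nonmissing dx */
      from dx
      group by id;
quit;
```

For person 3, nRecords should be 1 and nDx should be 0, because the single record has no diagnosis.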
10 Repeat Counting
I want to know:
  – how many people I have
  – how many diagnoses each person has
  – how many distinct diagnoses each person has
You can sort the data and count, or use the SQL commands on grouped data.
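The sort-and-count route can be sketched with PROC SORT and a DATA step that uses BY-group processing. This is one possible approach, not necessarily the one intended in the course; the table dx and its variables are hypothetical.

```sas
/* Hypothetical diagnosis file, one row per diagnosis record. */
data dx;
  input id dx $;
  datalines;
1 flu
1 flu
1 asthma
2 flu
3 cold
;
run;

/* BY-group processing requires the data to be sorted by the key. */
proc sort data=dx out=dxSorted;
  by id;
run;

data perPerson;
  set dxSorted;
  by id;
  if first.id then nRecords = 0;   /* reset at the start of each person   */
  nRecords + 1;                    /* sum statement keeps a running total */
  if last.id then output;          /* write one row per person            */
  keep id nRecords;
run;
```

The number of rows in perPerson is the number of people, and nRecords is the number of diagnosis records for each one. Counting distinct diagnoses this way would need a sort by id and dx as well, which is where the SQL approach tends to be shorter.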
11
12 How many records?
Select ID to be included in the new data set. Add an Advanced expression as a Computed Column and select the count() function.
13 The query automatically groups the data by ID when you use the count(*) function.
14 Other Aggregates
To get the counts of diagnoses and/or the distinct diagnoses, drag the diagnosis (DX) variable over to the select variable list and choose the appropriate summary statistic.
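Written by hand, the same summaries might look like the sketch below. It is an assumption that the point-and-click steps produce equivalent code; the table dx, its values, and the output names are hypothetical.

```sas
/* Hypothetical diagnosis file; person 3 has a record with no diagnosis. */
data dx;
  infile datalines missover;   /* a short line leaves dx missing */
  input id dx $;
  datalines;
1 flu
1 flu
1 asthma
2 flu
3
;
run;

proc sql;
  /* One row per person: diagnoses and distinct diagnoses. */
  create table dxSummary as
    select id,
           count(dx)          as nDx,
           count(distinct dx) as nDistinctDx
      from dx
      group by id;

  /* How many people are in the file at all. */
  select count(distinct id) as nPeople
    from dx;
quit;
```

With this toy data, person 1 should show three diagnoses and two distinct diagnoses, and nPeople should be 3. When writing the code by hand, the group by clause has to be supplied explicitly.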
15