HRP223 2008 Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and.

Presentation transcript:

HRP Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law. HRP Topic 6 – Relational Data

HRP HW 2  SORRY! I apologize for not getting it posted before yesterday! It is due in two weeks. – The new datasets have fewer variables and one variable has been renamed. You want to change newID to have the same name as the old subject ID. You can put a rename option on the line that does the import:
    proc import out = wide (rename = (dude = subjectID))
        datafile = "C:\Projects\classes\HRP \day6\wideDx.xls" replace;
        mixed = yes;
        sheet = "Sheet1";
    run;

HRP Flat Files  Some people try to store all their data in a single file. This causes lots of extra work because of holes in the tables and repeated information.  Both problems can be fixed by a relational model. – Split the data into many tables.  You need to use SQL to work with data split across multiple tables.

HRP Not Normalized  I frequently get data, from people who are not professional programmers, where the diagnosis data is organized “wide” across the page: the first diagnosis is in the first column, the second is in the second, etc. The task is to find or fix a diagnosis.

HRP Subsetting Based on 5 Variables

HRP SQL vs. Datastep  The GUI generates this code:  Or you could write a little data step program:
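A rough sketch of the two approaches, not the code from the original slide. The data set wide, its columns dx1 to dx5, and the target diagnosis code 22 are all made up for illustration.

```sas
/* Hypothetical wide data: up to five diagnosis codes per subject */
data wide;
  input subjectID dx1-dx5;
  datalines;
1 9 22 . . .
2 14 9 9 . .
3 . . . . .
;
run;

/* SQL: keep subjects with code 22 in any of the five columns */
proc sql;
  create table has22_sql as
    select *
      from wide
      where dx1 = 22 or dx2 = 22 or dx3 = 22 or dx4 = 22 or dx5 = 22;
quit;

/* Data step: the same subset with a subsetting if */
data has22_ds;
  set wide;
  if dx1 = 22 or dx2 = 22 or dx3 = 22 or dx4 = 22 or dx5 = 22;
run;
```

With this toy data both versions should keep only subject 1.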

HRP Change All 9s to 999s?  It is a lot of clicking.

HRP Code  The SQL is a bit complicated

HRP As Data Step  If it is more than 5 columns, things get unruly. Imagine doing this across 20 possible diagnoses. There is an easy solution in data step code.  First, the SQL code can be done easily in a data step.

HRP A List  As you can see, there is a list of variables and you are doing the same things over and over.  You want to make a list called dx and have the 1 st element refer to dx1, the 2 nd thing refer to dx2, etc. The concept of a named list of variables or an alias to a bunch of variables is instantiated as an array.

HRP Arrays  A major improvement….. Ummmm.  You want to process the same one line over and over. You need to count from 1 to 5…. Sounds like a loop.

HRP Change Lots of Things  If you have an array, you can process wide files easily.
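A minimal sketch of the array-plus-loop idea applied to the "change all 9s to 999s" task. The data set and variable names are assumptions, and the array is simply called dx as described earlier.

```sas
/* Hypothetical wide data: one row per subject, up to five diagnoses */
data wide;
  input subjectID dx1-dx5;
  datalines;
1 9 22 . . .
2 14 9 9 . .
3 . . . . .
;
run;

data fixed (drop = i);
  set wide;
  array dx [5] dx1-dx5;            /* dx[1] is an alias for dx1, and so on */
  do i = 1 to 5;                   /* the same one line, run five times    */
    if dx[i] = 9 then dx[i] = 999;
  end;
run;
```

Going from 5 diagnoses to 20 would only mean changing the array bounds and the loop limit.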

HRP Restructuring with Arrays  You can use similar code to restructure data so that you have only a couple of columns of data.  Add a new column that is called dxNum and another called theDX. Those two columns plus the subject ID number can contain the same information without all the “holes”.

HRP How does that work?  Go through all five variables, one at a time.  If the variable is not missing, you need to do three things: – Copy the diagnosis counter number into the dxNum variable. – Copy the diagnosis code number into the variable called theDx. – Write to the new data set.

HRP Repeated Ifs  This is a lot of typing and it obscures the fact that you are doing three things if a condition is true:

HRP do end  You have seen do statements in the context where you do stuff over and over. There is also a do end command for when you need to do a block of instructions if a condition is true. You need both do and end

HRP Actual Code
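The slide's own listing is not in this transcript, so here is a sketch of the three steps described above, under the assumption that the wide data set has subjectID and dx1 to dx5. The output name narrow is hypothetical.

```sas
/* Hypothetical wide data: one row per subject, up to five diagnoses */
data wide;
  input subjectID dx1-dx5;
  datalines;
1 9 22 . . .
2 14 9 9 . .
3 . . . . .
;
run;

data narrow (keep = subjectID dxNum theDx);
  set wide;
  array dx [5] dx1-dx5;
  do i = 1 to 5;                   /* go through all five variables     */
    if not missing(dx[i]) then do;
      dxNum = i;                   /* 1. copy the diagnosis counter     */
      theDx = dx[i];               /* 2. copy the diagnosis code        */
      output;                      /* 3. write to the new data set      */
    end;
  end;
run;
```

Subject 3 has no diagnoses, so no rows are written for that subject.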

HRP Normalization Part 2  I got data where I needed to analyze age for people who have a particular diagnosis. The data was a not-normalized mess:

HRP Normalization Part 2 The Wrong Way  If your database is like this, you need code like this:
    data bad2;
        set bad;
        if (dob1 ne . and not missing(dx1)) then do; if code1 = 22 then isCase1 = 1; else isCase1 = 0; end;
        if (dob2 ne . and not missing(dx2)) then do; if code2 = 22 then isCase2 = 1; else isCase2 = 0; end;
        if (dob3 ne . and not missing(dx3)) then do; if code3 = 22 then isCase3 = 1; else isCase3 = 0; end;
        if (dob4 ne . and not missing(dx4)) then do; if code4 = 22 then isCase4 = 1; else isCase4 = 0; end;
        if (dob5 ne . and not missing(dx5)) then do; if code5 = 22 then isCase5 = 1; else isCase5 = 0; end;
    run;
You will end up with the same code repeated as many times as you have repetitions.

HRP Normalization Part 2 The Right Way  Instead, you should have a record in a table corresponding to each repetition.  With code like this:
    data good2;
        set good;
        if code = 22 then isCase1 = 1; else isCase1 = 0;
    run;

HRP  Your first attempt could go something like this:
    data normal1 (keep = sid mid dob dx code);
        set bad;
        format dob dx mmddyy8.;
        if (dob1 ne . and dx1 ne . and code1 ne .) then do; mid = 1; dob = dob1; dx = dx1; code = code1; output; end;
        if (dob2 ne . and dx2 ne . and code2 ne .) then do; mid = 2; dob = dob2; dx = dx2; code = code2; output; end;
        if (dob3 ne . and dx3 ne . and code3 ne .) then do; mid = 3; dob = dob3; dx = dx3; code = code3; output; end;
        if (dob4 ne . and dx4 ne . and code4 ne .) then do; mid = 4; dob = dob4; dx = dx4; code = code4; output; end;
        if (dob5 ne . and dx5 ne . and code5 ne .) then do; mid = 5; dob = dob5; dx = dx5; code = code5; output; end;
    run;
But you end up with just as many blocks of code.

HRP Setting up Aliases (Arrays)  What you want is a way to repeat this code over the five sets of variables:
    if (dob1 ne . and dx1 ne . and code1 ne .) then do;
        mid = 1; dob = dob1; dx = dx1; code = code1;
        output;
    end;
 You need: – A dob alias (dob_a) to refer to dob1, dob2, dob3, dob4 and dob5 – A dx alias (dx_a) to refer to dx1, dx2, dx3, dx4 and dx5 – A code alias (code_a) to refer to code1, code2, code3, code4 and code5

HRP Setting up Aliases (Arrays)
    data normal2a;
        set bad;
        array dob_a dob1-dob5;
        array dx_a dx1-dx5;
        array code_a code1-code5;
        if (dob1 ne . and dx1 ne . and code1 ne .) then do;
            mid = 1; dob = dob1; dx = dx1; code = code1;
            output;
        end;
    run;
This sets up the arrays but they are not used in this program.

HRP Setting up Aliases (Arrays)
    data normal2b;
        set bad;
        array dob_a dob1-dob5;
        array dx_a dx1-dx5;
        array code_a code1-code5;
        if (dob_a[1] ne . and dx_a[1] ne . and code_a[1] ne .) then do;
            mid = 1; dob = dob_a[1]; dx = dx_a[1]; code = code_a[1];
            output;
        end;
    run;

HRP Setting up Aliases (Arrays)
    data normal2c (keep = sid mid dob dx code);
        set bad;
        array dob_a dob1-dob5;
        array dx_a dx1-dx5;
        array code_a code1-code5;
        do c = 1 to 5 by 1;
            if (dob_a[c] ne . and dx_a[c] ne . and code_a[c] ne .) then do;
                mid = c; dob = dob_a[c]; dx = dx_a[c]; code = code_a[c];
                output;
            end;
        end;
    run;

HRP Arrays  You can tell SAS that a set of variables are related by putting them into an array statement.  Arrays in SAS are not like arrays in other languages like BASIC or C. SAS arrays are only aliases to an existing set of variables. They are created using the array statement: array times_a [365] time1-time365; Here times_a is the array name (the _a suffix is my notation for arrays), [365] is an optional size of the array, and time1-time365 is what the array refers to.

HRP Arrays (2)  If your array references variables that do not exist, they will be created. Make sure to use the $ if you intend to create character variables.  If you want to reference all numeric variables between theValue and thingy2, do it like this: array x theValue-numeric-thingy2; A double dash (--) means all variables between and including the starting and ending variables, in the order they are stored in the data set; putting -numeric- between the two names restricts that list to the numeric variables. A single dash (-) indicates the numbered sequence starting with the first variable and ending with the second (for example, dx1-dx5).

HRP SQL and Colors  You may have noticed that the guys who made the enhanced editor don’t know SQL commands because some of the key words were not colorized. There are lots of them, but they can be easily fixed.

HRP Fix Color  Go to Tools > Options… > SAS Programs, then click Editor Options…, then User Defined Keywords.

HRP Missing Words  Add – calculated – coalesce – corresponding – except – full – group – inner – intersect – join – left – on – or – order – outer – right – union

HRP Minimal SQL  Print a report showing the contents of variables from a single data set. Put a comma-delimited list of variables here. Specify a library.table here.
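A minimal sketch of the query shape this slide describes. The table work.demog and its columns are made up for illustration.

```sas
/* Hypothetical demographics table in the work library */
data work.demog;
  input dude age sex $;
  datalines;
1 34 F
2 51 M
3 47 F
;
run;

proc sql;
  select dude, age        /* comma-delimited list of variables */
    from work.demog;      /* library.table                     */
quit;
```

Because there is no create table clause, the result is printed as a report rather than saved.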

HRP What variables?  Use an * to indicate that you want all variables instead of typing them all.  There is no syntax to specify variables based on position in the source files. That is, you cannot specify that you want to select the 2nd and 7th variables (from left to right) or to select the first 3 variables.
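For example, using the sashelp.class sample table that ships with SAS (chosen here only for convenience):

```sas
proc sql;
  select *                /* every column, in stored order */
    from sashelp.class;
quit;
```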

HRP Likely Tweaks  You can rename a variable in the list with the as keyword.  You can also specify variable formats and labels.
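A short sketch of these tweaks, again using the sashelp.class sample table. The new column name, formats, and label text are arbitrary choices for illustration.

```sas
proc sql;
  select name,
         height as heightInches format = 5.1 label = "Height (in)",
         weight                 format = 6.1 label = "Weight (lb)"
    from sashelp.class;
quit;
```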

HRP More Tweaks  The from line references tables which are in libraries. Complex queries require you to reference the table name over and over again. Instead of having to type the long library and dataset names repeatedly, you can refer to the files as an alias. Print the column called dude from the table reportedCancers which is in the ovCancer library. Here the c. is optional because dude is only in one table (the query only uses one table).
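A sketch of the alias idea. To keep the example runnable anywhere, it assumes the ovCancer libref simply points at the work library, and the contents of reportedCancers are invented.

```sas
libname ovCancer (work);   /* assumption: reuse work as the ovCancer library */

data ovCancer.reportedCancers;
  input dude site $;
  datalines;
1 ovary
4 ovary
;
run;

proc sql;
  select c.dude                             /* c. is optional here    */
    from ovCancer.reportedCancers as c;     /* c is the table alias   */
quit;
```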

HRP Stacking  You already know how to use proc append or the Data > Append Table menu item to combine two sets of data on top of one another. How do you “copy/paste” to insert columns from one table into another?

HRP The GUI can do easy SQL.  You could write data step or proc sql code.  Happily, most of the merges you need are in the graphical user interface.

HRP How are tables linked?  You need to tell it who is matched with whom in the tables. If you have a demographics table and a disease table, you need to specify which column says which disease belongs to which person. In this case you would say match on the subject ID numbers in the two tables using a key column.

HRP Inner Join  If you want records where there is a match in both tables, you want an inner join (aka, equijoin or natural join). – For example, which subjects have demographic and cancer information?

HRP

Alternate Syntax This is what I write.
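The two ways of writing an inner join look roughly like this. This is a sketch with invented tables (demog and cancer, keyed on dude); the transcript does not show which form the slides used.

```sas
data demog;
  input dude age @@;
  datalines;
1 34 2 51 3 47
;
run;

data cancer;
  input dude stage @@;
  datalines;
2 1 3 3 4 2
;
run;

proc sql;
  /* explicit join syntax */
  create table both1 as
    select d.dude, d.age, c.stage
      from demog as d
      inner join cancer as c
        on d.dude = c.dude;

  /* comma-and-where syntax, same result */
  create table both2 as
    select d.dude, d.age, c.stage
      from demog as d, cancer as c
      where d.dude = c.dude;
quit;
```

With this toy data both tables should contain subjects 2 and 3, the only IDs present in both sources.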

HRP All Information from the Left Table  If you want all the demographics, as well as the cancers if they occur:

HRP Left Join Code
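A sketch of a left join, assuming the same kind of invented demog and cancer tables keyed on dude. A right join is the mirror image: swap left for right and take the ID from the cancer table.

```sas
data demog;
  input dude age @@;
  datalines;
1 34 2 51 3 47
;
run;

data cancer;
  input dude stage @@;
  datalines;
2 1 3 3 4 2
;
run;

proc sql;
  create table allDemog as
    select d.dude, d.age, c.stage
      from demog as d
      left join cancer as c
        on d.dude = c.dude;
quit;
```

Every demog row is kept; subject 1 has no cancer record, so stage is missing for that row.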

HRP All Information from the Right Table

HRP  If you wanted the cancers info plus demographics where there were any:

HRP Right Join Code

HRP Full Join  If you wanted all information: It would be nice if you could combine the two dude variables so the first not-missing value was used.

HRP Full Join Code

HRP Coalesce  Coalesce says take the first not-missing value from the set of variables.
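A sketch of a full join that uses coalesce to build a single ID column. The demog and cancer tables are invented for illustration.

```sas
data demog;
  input dude age @@;
  datalines;
1 34 2 51 3 47
;
run;

data cancer;
  input dude stage @@;
  datalines;
2 1 3 3 4 2
;
run;

proc sql;
  create table everyone as
    select coalesce(d.dude, c.dude) as dude,   /* first not-missing ID */
           d.age,
           c.stage
      from demog as d
      full join cancer as c
        on d.dude = c.dude;
quit;
```

Subject 1 appears only in demog and subject 4 only in cancer, yet both get a filled-in dude value.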

HRP Checking for ID Numbers with SQL  A task that I need to do frequently is to build a list of all subject IDs when data is coming from multiple sources. – List IDs with duplicates. – List unique ID numbers. – List who is in both files. – List who is in one file but not the second. – Make a summary showing all IDs and an indicator for who appears where.

HRP PROC SQL - Set Operators NO GUI  Outer Union Corresponding – concatenates  Union – unique rows from both queries  Except – rows that are part of the first query but not the second  Intersect – rows common to both queries

HRP outer union corresponding  You can concatenate data files.  I rarely use it.
    proc sql;
        create table isOuter as
            select dude from baseline
            outer union corresponding
            select dude from followup;
    quit;

HRP union  You can also concatenate data files and keep unique records:
    proc sql;
        create table isUnion as
            select dude from baseline
            union
            select dude from followup;
    quit;

HRP except  Say you needed everyone who did not come back. Start out with the baseline group and remove the people who came back.
    proc sql;
        select id from baseline
        except
        select id from followup;
    quit;

HRP intersect  Say you wanted to know who came back. In other words, what IDs are in both files?
    proc sql;
        select id from baseline
        intersect
        select id from followup;
    quit;

HRP PROC SQL - Set Operators  When you have tables (with more than one column) with the same structure, you can combine them with these set operators. – Be extremely careful because SAS/SQL is forgiving about the structure of the tables and you may not notice problems in the data. – For this to work as intended, the two tables must have the same variables, in the same order, and the variables must be of the same type (variables with the same name must both be character or both be numeric). Use the key word corresponding to have it match like named variables.

HRP corresponding  The columns do not need to have matching names or even the same length and it will still operate on them.  Use corresponding to help spot this problem.

HRP Summary Table  Say you have two or more files and they are supposed to have the same subject IDs. How do you make a summary table showing who has information in each table? – Make a master list that has all people regardless of the source file. – Add an indicator column with the value 1 where the subject ID in a table matches the master ID table. – Add in a second column with the value 1 where the subject ID in the second file matches the master ID.

HRP Make Some Data Make a file with 100 random numbers between 1 and 100 (you can get the same number more than once) and an indicator variable holding the value 1. Sort the data and remove the duplicates.

HRP In Code
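One way this could be written, as a sketch rather than the slide's own listing. The data set name day1, the variable names dude and inDay1, and the seed are assumptions.

```sas
data day1 (drop = i);
  call streaminit(223);                    /* arbitrary seed for repeatability */
  do i = 1 to 100;
    dude = ceil(100 * rand("uniform"));    /* random integer from 1 to 100     */
    inDay1 = 1;                            /* indicator variable               */
    output;
  end;
run;

/* Sort by ID and drop repeated IDs in one pass */
proc sort data = day1 nodupkey;
  by dude;
run;
```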

HRP Subquery  In real life the tables that you are comparing will not include a convenient variable that is holding “1”. You can have SQL make a new variable easily enough: Notice there is no column indicating it is inDay1. Add in a column called inDay1 with the value 1 for everyone.

HRP Subquery You can do this with a single query.
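A sketch of how a single query can build the summary table, assuming two invented tables day1 and day2 that each hold one row per ID in a column called dude. It is not necessarily the query shown on the slide.

```sas
data day1;
  input dude @@;
  datalines;
1 2 3 5
;
run;

data day2;
  input dude @@;
  datalines;
2 3 4
;
run;

proc sql;
  create table summary as
    select m.dude, a.inDay1, b.inDay2
      /* master list: every ID from either file, duplicates removed */
      from (select dude from day1
            union
            select dude from day2) as m
      /* each subquery adds an indicator column holding 1 */
      left join (select dude, 1 as inDay1 from day1) as a
        on m.dude = a.dude
      left join (select dude, 1 as inDay2 from day2) as b
        on m.dude = b.dude
      order by m.dude;
quit;
```

An ID that is absent from a file gets a missing value in that file's indicator column.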

HRP Order  Notice that the data is never put into order. In this case, it ended up ordered correctly because of the union statement. You can explicitly request having the data sorted so you do not need to use the Data > Sort Data… menu. Just add an order by clause.

HRP Working with Repeated Keys  A file tracking diagnoses or treatments will have multiple records for some people. – If you want to count the number of records for a person, specify what variable(s) are used to group by. – Count records in the group with count(*) or count not missing values with count(variableName)
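A small sketch of the two counts, using an invented diagnosis table with one missing code.

```sas
data dx;
  input dude code;
  datalines;
1 22
1 .
2 22
2 30
2 41
;
run;

proc sql;
  select dude,
         count(*)    as nRecords,   /* all records in the group    */
         count(code) as nCodes      /* only not-missing code values */
    from dx
    group by dude;
quit;
```

Subject 1 should show 2 records but only 1 code; subject 2 should show 3 of each.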

HRP Joining on Duplicated Keys  If you join tables that have duplicated key values, you will end up with lots of records. Specifically, the new table will have as many records as the sum, across key values, of the product of the two key counts. For example: 2 in appt * 2 in dx = 4 records; 2 in appt * 4 in dx = 8 records.
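The arithmetic can be checked with a toy example. The appt and dx tables below are invented so that subject 1 has 2 appointments and 2 diagnoses, and subject 2 has 2 appointments and 4 diagnoses.

```sas
data appt;
  input dude visit @@;
  datalines;
1 1 1 2 2 1 2 2
;
run;

data dx;
  input dude code @@;
  datalines;
1 22 1 30 2 22 2 30 2 41 2 55
;
run;

proc sql;
  create table joined as
    select a.dude, a.visit, d.code
      from appt as a
      inner join dx as d
        on a.dude = d.dude;

  /* rows per subject after the join */
  select dude, count(*) as nRecords
    from joined
    group by dude;
quit;
```

The joined table should hold 4 rows for subject 1 and 8 for subject 2, 12 in total.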

HRP distinct  The word distinct removes duplicates.  If you want the IDs of people who had any records in both tables, use distinct.
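A sketch with invented appt and dx tables that both repeat subject IDs. Without distinct, the join would list each ID once per matching pair of records.

```sas
data appt;
  input dude visit @@;
  datalines;
1 1 1 2 2 1 3 1
;
run;

data dx;
  input dude code @@;
  datalines;
1 22 1 30 2 22 4 41
;
run;

proc sql;
  select distinct a.dude          /* one row per ID found in both tables */
    from appt as a
    inner join dx as d
      on a.dude = d.dude;
quit;
```

Only subjects 1 and 2 appear in both tables, so the report should list each of them once.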

HRP Joint Keys  Sometimes you need to use more than one variable to indicate which records match across tables. For example, if you use both pedigree numbers and family member numbers in tables to identify people, you need to use both the pedigree ID number and the member (accession) number variables to join tables.
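A sketch of a join on a two-part key. The table names and the column names pedigree and member are assumptions.

```sas
data people;
  input pedigree member age;
  datalines;
1 1 60
1 2 35
2 1 58
;
run;

data disease;
  input pedigree member code;
  datalines;
1 2 22
2 1 30
;
run;

proc sql;
  select p.pedigree, p.member, p.age, d.code
    from people as p
    inner join disease as d
      on  p.pedigree = d.pedigree     /* both parts of the key must match */
      and p.member   = d.member;
quit;
```

Matching on member alone would wrongly pair member 1 of pedigree 1 with the disease record for member 1 of pedigree 2.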