A Bit About SAS/Macro Language Database Theory and Normalization


A Bit About SAS/Macro Language Database Theory and Normalization HRP223 – 2009 November 2nd, 2009 Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.

And now for something completely different… Macros. Early in the class I told you to download my keyboard macros. Those auto-complete SAS procedures as you type into the editor. Keyboard macros are not what SAS people call macros. SAS macros are programs that do tasks. Rather than reinventing solutions to complex problems, SAS programmers keep libraries of useful code in an easy-to-use macro format.

I want… Bar charts are the wrong way to display data if you have tiny samples. I want a plot that shows the mean value as a red bar with the individual data points around it. This requires fairly complicated voodoo, and I want to be able to reuse the code.

I create the plot once, then tweak the code to turn it into a macro. This is like a user-defined function. Here is the call that makes this plot:

    %plotit(w2, weight, group, 4, group1=Thing1, group2=Thing2, group3=Thing3, group4=Thing4);

Macro Details. Macros begin with %macro and end with %mend. As a user of the macro, you can ignore everything after the first line. The first line lists the parameters (aka arguments) that the macro needs. Hopefully the person who wrote the macro will give you the details on the arguments. Values are matched to parameters by position (the order you type them) unless the parameters are keyword parameters, which are matched by name.
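To make the positional-versus-keyword rule concrete, here is a minimal sketch. The macro name peek, its parameters, and the small demo data set are hypothetical, made up for illustration; this is not the plotit macro. In the definition, positional parameters must come before keyword parameters.

```sas
/* Hypothetical macro for illustration: prints the first &nobs rows of a data set.  */
/* dsn and nobs are positional; title= is a keyword parameter with a default value. */
%macro peek(dsn, nobs, title=Quick look);
   title "&title";
   proc print data=&dsn (obs=&nobs);
   run;
   title;   /* clear the title afterwards */
%mend peek;

/* Small hypothetical data set to run the macro against */
data work.demo;
   input id weight;
   datalines;
1 150
2 162
3 171
;
run;

/* dsn and nobs are matched by position; title keeps its default */
%peek(work.demo, 2)

/* title is matched by name, so it overrides the default */
%peek(work.demo, 3, title=All three rows)
```

The first call prints two rows under the title "Quick look"; the second prints all three rows under the new title.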

Arguments are hopefully well named; they are a comma-delimited list. Macros typically arrive with a big comment that explains what to use for each argument. Running the definition of the plotit macro creates the macro but does not cause it to do anything, so expand that code if and only if you want to. The macro is actually "done" (called, or invoked) when you include a line like the %plotit call above.

Other Examples I want to make a quantile plot to show the percentiles for a dataset.

The code may be hardcore, but you only need to figure out the comments. Run the macro definitions once; you can then invoke the macros repeatedly.

Sensitivity, Specificity, Positive and Negative Predictive Value. Somehow SAS forgot them… My macro writes the results to the log.
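The four statistics come straight from the counts in a 2x2 table: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), and NPV = TN/(TN+FN). Below is a minimal sketch of such a macro, not the author's macro; the name diagstats, its keyword parameters, and the counts in the example call are hypothetical. It writes the results to the log and also keeps them in a one-row data set.

```sas
/* Hypothetical macro (not the author's). Inputs are counts from a 2x2 table:   */
/* tp = true positives, fp = false positives, fn = false negatives, tn = true negatives */
%macro diagstats(tp=, fp=, fn=, tn=);
   data work.diagstats;
      sens = &tp / (&tp + &fn);   /* P(test positive | disease present) */
      spec = &tn / (&tn + &fp);   /* P(test negative | disease absent)  */
      ppv  = &tp / (&tp + &fp);   /* P(disease present | test positive) */
      npv  = &tn / (&tn + &fn);   /* P(disease absent | test negative)  */
      /* Write the results to the log */
      put "Sensitivity = " sens percent8.1;
      put "Specificity = " spec percent8.1;
      put "PPV         = " ppv  percent8.1;
      put "NPV         = " npv  percent8.1;
   run;
%mend diagstats;

/* Example with made-up counts */
%diagstats(tp=90, fp=20, fn=10, tn=80)
```

With these made-up counts, sensitivity is 90/100 = 0.90, specificity is 80/100 = 0.80, PPV is 90/110 (about 0.818), and NPV is 80/90 (about 0.889).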

Binomial Probabilities. If you need to calculate binomial probabilities, look at my macro:
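If you only need a single probability, the built-in SAS probability functions can also do it in a data step. The sketch below uses PDF, CDF, and SDF with the BINOMIAL distribution, where the arguments are the number of successes, the success probability, and the number of trials. The values n = 10, p = 0.5, and k = 3 are arbitrary, chosen only for illustration.

```sas
data work.binom;
   n = 10;    /* number of trials (illustrative value)             */
   p = 0.5;   /* probability of success on each trial               */
   k = 3;     /* number of successes we are interested in           */
   p_exact   = pdf('BINOMIAL', k, p, n);      /* P(X = k)              */
   p_atmost  = cdf('BINOMIAL', k, p, n);      /* P(X <= k)             */
   p_atleast = sdf('BINOMIAL', k - 1, p, n);  /* P(X >= k) = P(X > k-1) */
   put p_exact= p_atmost= p_atleast=;
run;
```

For these values, P(X = 3) = 120/1024 = 0.1171875, P(X <= 3) = 176/1024 = 0.171875, and P(X >= 3) = 968/1024 = 0.9453125.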

How to Create a Macro
1. Get code that works, either by writing it by hand or by looking at the code that EG generates.
2. Identify the things that you want to change with different runs of the macro.
3. Replace each of those keywords with a name preceded by an & and followed by a . (for example, &dsn.).
4. Enclose the edited code inside %macro name(); and %mend;.
5. Insert your macro variables as a comma-delimited list inside the () on the %macro line.

[Slide: the original code next to the same code with macro variables substituted in.]

Starting from the original code, add the %macro ... %mend wrapper. Then add the names of the macro variables to the %macro line. Notice that you do NOT put the & and . on the names in the list of arguments. Now you can call the macro.
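Here is the whole recipe applied to a small example, as a sketch. The starting code is an ordinary PROC MEANS on the SASHELP.CLASS sample data set. The macro name describe and its parameters are hypothetical, and the OUTPUT statement is there only so the results can be checked.

```sas
/* Step 1: code that works */
proc means data=sashelp.class mean std;
   class sex;
   var weight;
run;

/* Steps 2-5: the parts that change (data set, analysis variable, grouping variable) */
/* become macro variables. The & and . appear in the body but not on the %macro line. */
%macro describe(dsn, analysisVar, groupVar);
   proc means data=&dsn. mean std;
      class &groupVar.;
      var &analysisVar.;
      /* Hypothetical extra: save the results as work.stats_<analysisVar> */
      output out=work.stats_&analysisVar. mean=avg std=sd;
   run;
%mend describe;

/* Now call it as many times as you like */
%describe(sashelp.class, weight, sex)
%describe(sashelp.class, height, sex)
```

Each output data set has one overall row plus one row per level of sex.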

You can define default values. If you give keyword parameters defaults, you can leave them out of the call, and you can override the defaults when you need to.
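Defaults are written as name=value on the %macro line, which makes those parameters keyword parameters. A minimal sketch follows. The macro describe2 and its default values are hypothetical, and the OUTPUT statement only exists so that each run leaves a data set behind.

```sas
/* Hypothetical macro: every parameter has a default value */
%macro describe2(dsn=sashelp.class, analysisVar=weight, groupVar=sex, stats=mean std);
   proc means data=&dsn. &stats.;
      class &groupVar.;
      var &analysisVar.;
      output out=work.d2_&analysisVar.;   /* saves N, MIN, MAX, MEAN, STD by default */
   run;
%mend describe2;

%describe2()                        /* use every default                     */
%describe2(analysisVar=height)      /* override one default                  */
%describe2(stats=n mean median max) /* override the list of statistics shown */
```

Because the parameters are matched by name, the overrides can appear in any order in the call.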

Flat Files Some people try to store all their data in a single file. This causes lots of extra work because of holes in the tables and repeated information. Both problems can be fixed by a relational model. Split the data into many tables. You need to use SQL to work with data split across multiple tables.
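As a sketch of the relational idea, assume a hypothetical flat file with one row per visit, where the patient's sex is repeated on every row. Splitting it gives a patients table (one row per patient) and a visits table (one row per visit). An SQL join puts the pieces back together when you need them. All table and variable names here are made up for illustration.

```sas
/* Hypothetical flat file: patient information repeated on every visit */
data work.flat;
   input patientID sex $ visitNum weight;
   datalines;
1 F 1 150
1 F 2 148
2 M 1 180
;
run;

proc sql;
   /* One row per patient: no repeated information */
   create table work.patients as
   select distinct patientID, sex
   from work.flat;

   /* One row per visit: only the visit-level facts plus the key */
   create table work.visits as
   select patientID, visitNum, weight
   from work.flat;

   /* Join the tables on the key when you need both */
   create table work.rejoined as
   select v.patientID, p.sex, v.visitNum, v.weight
   from work.visits as v
        inner join work.patients as p
        on v.patientID = p.patientID
   order by v.patientID, v.visitNum;
quit;
```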

Not Normalized. I frequently get data from people who are not professional programmers where the diagnosis data is organized "wide" across the page: the first diagnosis is in the first column, the second is in the second column, and so on. The task is to find or fix a diagnosis.

Subsetting Based on 5 Variables

SQL vs. Data Step. You can use the SQL code that the GUI generates, or you could write a little data step program.
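A sketch of both approaches follows, assuming a small hypothetical wide data set with a subject ID called dude and diagnosis codes dx1-dx5. The goal is to keep every subject who has code 22 in any of the five columns. Either way, the condition has to name each column once.

```sas
/* Hypothetical wide data: one row per subject, up to five diagnosis codes */
data work.wide;
   input dude dx1-dx5;
   datalines;
1 22 9 . . .
2 5 7 22 . .
3 9 . . . .
;
run;

/* SQL version */
proc sql;
   create table work.has22_sql as
   select *
   from work.wide
   where dx1 = 22 or dx2 = 22 or dx3 = 22 or dx4 = 22 or dx5 = 22;
quit;

/* Data step version: a subsetting IF keeps only rows where the condition is true */
data work.has22_ds;
   set work.wide;
   if dx1 = 22 or dx2 = 22 or dx3 = 22 or dx4 = 22 or dx5 = 22;
run;
```

Both versions keep subjects 1 and 2.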

Change All 9s to 999s? It is a lot of clicking.

Code. The SQL is a bit complicated.

As a Data Step. First, the SQL logic can be done easily in a data step. With more than 5 columns, though, things get unruly: imagine doing this across 20 possible diagnoses. There is an easy solution in data step code.
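To see why the SQL gets complicated, here is a sketch of the 9-to-999 recode written as SQL. It assumes the same kind of hypothetical wide data set (ID dude, codes dx1-dx5). Every column needs its own CASE expression, so 20 diagnosis columns would mean 20 nearly identical lines.

```sas
/* Hypothetical wide data */
data work.wide;
   input dude dx1-dx5;
   datalines;
1 22 9 . . .
2 5 7 22 . .
3 9 . . . .
;
run;

proc sql;
   create table work.change9SQL as
   select dude,
          /* one CASE expression per column; missing values fall through to ELSE */
          case when dx1 = 9 then 999 else dx1 end as dx1,
          case when dx2 = 9 then 999 else dx2 end as dx2,
          case when dx3 = 9 then 999 else dx3 end as dx3,
          case when dx4 = 9 then 999 else dx4 end as dx4,
          case when dx5 = 9 then 999 else dx5 end as dx5
   from work.wide;
quit;
```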

A List. As you can see, there is a list of variables and you are doing the same thing over and over. You want to make a list called dx and have the 1st element refer to dx1, the 2nd refer to dx2, and so on. In SAS, a named list of variables (an alias for a bunch of variables) is called an array.

Arrays. A major improvement….. Ummmm, not really. You still write the same one line over and over, and the only thing that changes is a count from 1 to 5…. That sounds like a loop.

    data change9Code;
      set wide;
      array dx dx1-dx5;
      if dx[1] = 9 then dx[1] = 999;
      if dx[2] = 9 then dx[2] = 999;
      if dx[3] = 9 then dx[3] = 999;
      if dx[4] = 9 then dx[4] = 999;
      if dx[5] = 9 then dx[5] = 999;
    run;

Change Lots of Things. If you have an array, you can process wide files easily.

    data change9CodeArray;
      set wide;
      array dx dx1-dx5;
      do counter = 1 to 5 by 1;
        if dx[counter] = 9 then dx[counter] = 999;
      end;
    run;

Restructuring with Arrays. You can use similar code to restructure the data so that you have only a couple of columns of data. Add a new column called dxNum and another called theDx. Those two columns, plus the subject ID number, can hold the same information without all the "holes".

How does that work? Go through all five variables, one at a time. If the variable is not missing, you need to do three things: Copy the diagnosis counter number into the dxNum variable. Copy the diagnosis code number into the variable called theDx. Write to the new data set.

Repeated Ifs. This is a lot of typing, and it obscures the fact that you are doing three things if one condition is true:

    data restructure (keep = dude dxNum theDx);
      set wide;
      array dx dx1-dx5;
      do counter = 1 to 5 by 1;
        if not missing(dx[counter]) then dxNum = counter;
        if not missing(dx[counter]) then theDx = dx[counter];
        if not missing(dx[counter]) then output;
      end;
    run;

do … end. You have seen do statements used where you do stuff over and over. There is also a do; … end; block for when you need to run a group of instructions if a condition is true. You need both the do and the end.

Actual Code
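Here is a sketch of the restructuring program with the three repeated ifs folded into one if … then do; … end; block. It runs against a small hypothetical version of the wide data set (ID dude, codes dx1-dx5). The outer end closes the loop, and the inner end closes the conditional block.

```sas
/* Hypothetical wide data */
data work.wide;
   input dude dx1-dx5;
   datalines;
1 22 9 . . .
2 5 7 22 . .
3 9 . . . .
;
run;

data work.restructure (keep = dude dxNum theDx);
   set work.wide;
   array dx dx1-dx5;
   do counter = 1 to 5 by 1;
      if not missing(dx[counter]) then do;   /* one test ...             */
         dxNum = counter;                    /* ... three actions        */
         theDx = dx[counter];
         output;                             /* one row per diagnosis    */
      end;                                   /* closes the if-then do    */
   end;                                      /* closes the do loop       */
run;
```

The three hypothetical subjects have 2, 3, and 1 non-missing diagnoses, so the long data set has 6 rows and no holes.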

Normalization Part 2 I got data where I needed to analyze age for people who have a particular diagnosis. The data was a not-normalized mess:

Normalization Part 2: The Wrong Way. If your database is like this, you need code like this:

    data bad2;
      set bad;
      if (dob1 ne . and not missing(dx1)) then do;
        if code1 = 22 then isCase1 = 1;
        else isCase1 = 0;
      end;
      if (dob2 ne . and not missing(dx2)) then do;
        if code2 = 22 then isCase2 = 1;
        else isCase2 = 0;
      end;
      if (dob3 ne . and not missing(dx3)) then do;
        if code3 = 22 then isCase3 = 1;
        else isCase3 = 0;
      end;
      if (dob4 ne . and not missing(dx4)) then do;
        if code4 = 22 then isCase4 = 1;
        else isCase4 = 0;
      end;
      if (dob5 ne . and not missing(dx5)) then do;
        if code5 = 22 then isCase5 = 1;
        else isCase5 = 0;
      end;
    run;

You will end up with the same code repeated as many times as you have repetitions.

Normalization Part 2: The Right Way. Instead, you should have one record in a table for each repetition, with code like this:

    data good2;
      set good;
      if code = 22 then isCase = 1;
      else isCase = 0;
    run;

Your first attempt could go something like this:

    data normal1 (keep = sid mid dob dx code);
      set bad;
      format dob dx mmddyy8.;
      if (dob1 ne . and dx1 ne . and code1 ne .) then do;
        mid = 1; dob = dob1; dx = dx1; code = code1;
        output;
      end;
      if (dob2 ne . and dx2 ne . and code2 ne .) then do;
        mid = 2; dob = dob2; dx = dx2; code = code2;
        output;
      end;
      if (dob3 ne . and dx3 ne . and code3 ne .) then do;
        mid = 3; dob = dob3; dx = dx3; code = code3;
        output;
      end;
      if (dob4 ne . and dx4 ne . and code4 ne .) then do;
        mid = 4; dob = dob4; dx = dx4; code = code4;
        output;
      end;
      if (dob5 ne . and dx5 ne . and code5 ne .) then do;
        mid = 5; dob = dob5; dx = dx5; code = code5;
        output;
      end;
    run;

But you end up with just as many blocks of code.

Setting up Aliases (Arrays). What you want is a way to repeat this code over the five sets of variables:

    if (dob1 ne . and dx1 ne . and code1 ne .) then do;
      mid = 1; dob = dob1; dx = dx1; code = code1;
      output;
    end;

You need:
- a dob alias (dob_a) to refer to dob1, dob2, dob3, dob4 and dob5
- a dx alias (dx_a) to refer to dx1, dx2, dx3, dx4 and dx5
- a code alias (code_a) to refer to code1, code2, code3, code4 and code5

Setting up Aliases (Arrays)

    data normal2a;
      set bad;
      array dob_a dob1-dob5;
      array dx_a dx1-dx5;
      array code_a code1-code5;
      if (dob1 ne . and dx1 ne . and code1 ne .) then do;
        mid = 1; dob = dob1; dx = dx1; code = code1;
        output;
      end;
    run;

This sets up the arrays, but they are not yet used in this program.

Setting up Aliases (Arrays). Next, replace the numbered variable names with references to the first element of each array:

    data normal2b;
      set bad;
      array dob_a dob1-dob5;
      array dx_a dx1-dx5;
      array code_a code1-code5;
      if (dob_a[1] ne . and dx_a[1] ne . and code_a[1] ne .) then do;
        mid = 1; dob = dob_a[1]; dx = dx_a[1]; code = code_a[1];
        output;
      end;
    run;

Setting up Aliases (Arrays). Finally, replace the 1 with a loop counter:

    data normal2c (keep = sid mid dob dx code);
      set bad;
      array dob_a dob1-dob5;
      array dx_a dx1-dx5;
      array code_a code1-code5;
      do c = 1 to 5 by 1;
        if (dob_a[c] ne . and dx_a[c] ne . and code_a[c] ne .) then do;
          mid = c; dob = dob_a[c]; dx = dx_a[c]; code = code_a[c];
          output;
        end; /* closes the if-then do block */
      end;   /* closes the do loop */
    run;
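Once the data are normalized, the original question (age for people with a particular diagnosis) becomes a short program. The sketch below assumes that dob and dx are SAS dates and that "age" means age at diagnosis. The three rows of data are made up, and code 22 stands in for the diagnosis of interest as in the examples above. YRDIF with the 'AGE' basis computes age in years.

```sas
/* Hypothetical normalized data: one row per member (mid) per subject (sid) */
data work.good;
   input sid mid dob :mmddyy10. dx :mmddyy10. code;
   format dob dx mmddyy10.;
   datalines;
1 1 01/15/1950 03/01/2000 22
1 2 06/30/1975 05/10/2005 14
2 1 11/02/1960 11/01/2010 22
;
run;

/* Keep the cases and compute age at diagnosis (an assumed definition of "age") */
data work.cases;
   set work.good;
   where code = 22;
   ageAtDx = yrdif(dob, dx, 'AGE');
run;

proc means data=work.cases n mean min max;
   var ageAtDx;
run;
```

Because there is one row per repetition, the same three statements work whether a subject has one member or five.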

Arrays. You can tell SAS that a set of variables are related by putting them into an array statement. Arrays in SAS are not like arrays in other languages such as BASIC or C: SAS arrays are only aliases for a set of variables. They are created using the array statement:

    array times_a [365] time1-time365;

Here times_a is the array name (my notation: the _a suffix marks an array), [365] is the optional size of the array, and time1-time365 is what the array refers to.
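Because an array is only an alias, assigning to an array element changes the underlying variable itself. The sketch below shows this on a hypothetical scores data set. It also uses DIM(), which returns the number of elements, so the loop does not hard-code the size.

```sas
/* Hypothetical data with three related variables */
data work.scores;
   input id score1-score3;
   datalines;
1 10 20 30
2 5 . 15
;
run;

data work.scaled;
   set work.scores;
   array score_a [3] score1-score3;      /* alias only: no new variables */
   do i = 1 to dim(score_a);             /* dim() gives 3 here           */
      if not missing(score_a[i]) then
         score_a[i] = score_a[i] * 2;    /* doubles score1-score3 in place */
   end;
   drop i;
run;
```

The output still has just id and score1-score3, now doubled, and the missing value stays missing.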

Arrays (2). If your array references variables that do not exist, they will be created. Make sure to use the $ if you intend to create character variables. If you want to reference all the numeric variables between theValue and thingy2, do it like this:

    array x theValue-numeric-thingy2;

-- means all variables between and including the starting and ending variables (in data set order). - indicates a numbered sequence starting with the first variable and ending with the second (for example, dx1-dx5).
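The sketch below puts the three kinds of variable lists and the $ side by side. All variable names are hypothetical. The LENGTH statement fixes the variable order, so the character variable label sits between theValue and thingy1 and is skipped by the -numeric- range.

```sas
data work.arraydemo;
   /* Variable order: theValue, label, thingy1, thingy2 */
   length theValue 8 label $ 10 thingy1 thingy2 8;
   theValue = 1; label = 'a'; thingy1 = 2; thingy2 = 3;

   /* name1-name3 do not exist yet, so the array statement creates them;  */
   /* the $ (with a length of 12) makes them character variables          */
   array names_a [3] $ 12 name1-name3;

   /* -numeric- : only the numeric variables from theValue through thingy2 */
   /* (theValue, thingy1, thingy2; the character variable label is skipped) */
   array num_a theValue-numeric-thingy2;

   /* -- : every variable from thingy1 through thingy2 (both numeric here)  */
   array pair_a thingy1--thingy2;

   /* - : the numbered sequence name1, name2, name3 */
   do i = 1 to dim(names_a);
      names_a[i] = cats('name', i);
   end;

   nNumeric = dim(num_a);    /* 3 */
   nPair    = dim(pair_a);   /* 2 */
   drop i;
run;
```

All elements of one array must be the same type, which is why the -numeric- form is useful when a range mixes character and numeric variables.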