Longitudinal Data Techniques: Looking Across Observations Ronald Cody, Ed.D., Robert Wood Johnson Medical School.



A Typical Longitudinal Data Set

    PATNO   DATE   DOB   HR   SBP   DBP

[Data values were not preserved in this transcript.]

Common Data Processing Requirements
  - Extract the first or last visit for a subject
  - Copy information from the first visit to subsequent visits
  - Compute intra-patient statistics such as mean, median, minimum, or maximum

Common Data Processing Requirements (continued)
  - Count the number of visits per patient
  - Compute differences of variables between visits
  - Make decisions on the current visit, based on information from a future visit

Selecting the First (or Last) Visit for Each Patient

    PROC SORT DATA=LABS;
       BY PATNO DATE;
    RUN;

    DATA FIRST;
       SET LABS;
       BY PATNO;
       IF FIRST.PATNO;
    RUN;

Listing of FIRST

    PATNO   DATE   DOB   HR   SBP   DBP

[Data values were not preserved in this transcript.]

    DATA LAST;
       SET LABS;
       BY PATNO;
       IF LAST.PATNO;
    RUN;
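Both subsets can also be written in a single pass over the data. The following is a minimal sketch under stated assumptions, not code from the slides: the data set LABS_DEMO, its values, and the output names FIRST_VISIT and LAST_VISIT are all hypothetical.

```sas
/* Hypothetical sample data: patient 001 has two visits, patient 002 has one */
DATA LABS_DEMO;
   INPUT PATNO $ DATE : MMDDYY10. HR SBP DBP;
   FORMAT DATE MMDDYY10.;
DATALINES;
001 10/21/1997 68 120 80
001 11/04/1997 72 122 82
002 01/08/1998 80 140 90
;

PROC SORT DATA=LABS_DEMO;
   BY PATNO DATE;
RUN;

/* One DATA step, two output data sets */
DATA FIRST_VISIT LAST_VISIT;
   SET LABS_DEMO;
   BY PATNO;
   IF FIRST.PATNO THEN OUTPUT FIRST_VISIT;   /* earliest visit per patient */
   IF LAST.PATNO  THEN OUTPUT LAST_VISIT;    /* latest visit per patient   */
RUN;
```

A patient with only one visit (002 in this sample) has FIRST.PATNO and LAST.PATNO both true on the same observation, so that observation lands in both output data sets.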

Adding DOB to the Second Through the Last Observation for Each Patient

    PROC SORT DATA=LABS;
       BY PATNO DATE;
    RUN;

    DATA LABS2;
       SET LABS;
       BY PATNO;
       RETAIN OLD_DOB;
       IF FIRST.PATNO THEN OLD_DOB = DOB;
       ELSE DOB = OLD_DOB;
    RUN;
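The RETAIN statement matters here because a variable created in the DATA step (OLD_DOB) is otherwise reset to missing at the top of every iteration. The sketch below reruns the same logic on a small, made-up data set in which DOB is present only on each patient's first visit. The data set names LABS_DEMO and LABS2_DEMO and all values are hypothetical.

```sas
/* Hypothetical sample data: DOB is last on the line and present only
   on the first visit. MISSOVER sets DOB to missing on the short lines. */
DATA LABS_DEMO;
   INFILE DATALINES MISSOVER;
   INPUT PATNO $ DATE : MMDDYY10. HR DOB : MMDDYY10.;
   FORMAT DATE DOB MMDDYY10.;
DATALINES;
001 10/21/1997 68 10/21/1946
001 11/04/1997 72
001 12/01/1997 70
002 01/08/1998 80 03/15/1950
002 02/10/1998 78
;

PROC SORT DATA=LABS_DEMO;
   BY PATNO DATE;
RUN;

DATA LABS2_DEMO;
   SET LABS_DEMO;
   BY PATNO;
   RETAIN OLD_DOB;                      /* keep the value across iterations */
   IF FIRST.PATNO THEN OLD_DOB = DOB;   /* remember DOB from the first visit */
   ELSE DOB = OLD_DOB;                  /* copy it to the later visits */
   DROP OLD_DOB;
RUN;

PROC PRINT DATA=LABS2_DEMO;
RUN;
```

After this step every observation carries its patient's DOB. Dropping OLD_DOB is a small addition of this sketch to keep the helper variable out of the result.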

DOB Added to Observations

    PATNO   DATE   DOB   HR   SBP   DBP

[Data values were not preserved in this transcript.]

Computing Differences Between Observations (using RETAIN)

    *DIFFERENCE BETWEEN FIRST AND LAST VISIT;
    DATA DIFFERENCE;   /* Not using V7 or later? Use a name of 8 characters or fewer */
       SET LABS2;
       BY PATNO;
       *REMOVE PATIENTS WITH ONE VISIT;
       IF FIRST.PATNO AND LAST.PATNO THEN DELETE;
       RETAIN R_HR R_SBP R_DBP;
       IF FIRST.PATNO THEN DO;
          R_HR  = HR;
          R_SBP = SBP;
          R_DBP = DBP;
       END;
       IF LAST.PATNO THEN DO;
          DIFF_HR  = HR  - R_HR;
          DIFF_SBP = SBP - R_SBP;
          DIFF_DBP = DBP - R_DBP;
          OUTPUT;
       END;
       DROP R_: ;
    RUN;

Computing Differences Between Observations (using RETAIN)

Listing of data set DIFFERENCE

    Obs   PATNO   DATE   DOB   HR   SBP   DBP   DIFF_HR   DIFF_SBP   DIFF_DBP

[Data values were not preserved in this transcript.]

Computing Differences Between the First and Last Visit (using LAG)

    DATA DIFF3;
       SET LABS2;
       BY PATNO;
       *Remove patients with one visit;
       IF FIRST.PATNO AND LAST.PATNO THEN DELETE;
       IF FIRST.PATNO OR LAST.PATNO THEN DO;
          DIFF_HR  = HR  - LAG(HR);
          DIFF_SBP = SBP - LAG(SBP);
          DIFF_DBP = DBP - LAG(DBP);
       END;
       IF LAST.PATNO THEN OUTPUT;
    RUN;

Note: This is about the only time I have ever executed a LAG function conditionally (on purpose). LAG returns the value from the previous time the function executed, not from the previous observation, so on the last visit it returns the value from the patient's first visit.
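The note about conditional execution is easier to see on a toy example. The sketch below is not from the slides. The data set LAG_DEMO and its variables are hypothetical. It compares a LAG call that runs on every observation with one that runs only on even-numbered observations.

```sas
/* Each LAG call keeps its own queue, updated only when that call executes */
DATA LAG_DEMO;
   INPUT X;
   LAG_ALWAYS = LAG(X);           /* executed on every observation */
   IF MOD(_N_, 2) = 0 THEN
      LAG_EVEN = LAG(X);          /* executed on even observations only */
DATALINES;
1
2
3
4
5
6
;

PROC PRINT DATA=LAG_DEMO;
RUN;
```

LAG_ALWAYS holds the value of X from the previous observation (missing, 1, 2, 3, 4, 5). LAG_EVEN is missing on the odd observations and on observation 2, where the call executes for the first time. On observations 4 and 6 it holds 2 and 4, the values of X at the previous execution of that call. The DIFF3 step relies on exactly this behavior.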

Computing Differences Between Observations (using the LAG function)

    *DIFFERENCES BETWEEN ALL VISITS;
    DATA DIFF2;
       SET LABS2;
       BY PATNO;
       DIFF_HR = HR - LAG(HR);
       *Alternative (below) using the DIF function;
       DIFF_SBP = DIF(SBP);
       DIFF_DBP = DIF(DBP);
       IF NOT FIRST.PATNO THEN OUTPUT;
    RUN;

Computing Differences Between Observations (using the LAG function)

Listing of data set DIFF2

    Obs   PATNO   DATE   DOB   HR   SBP   DBP   DIFF_HR   DIFF_SBP   DIFF_DBP

[Data values were not preserved in this transcript.]

Counting the Number of Observations in Each BY Group

    DATA COUNT_IT;
       SET LABS2(KEEP=PATNO);
       BY PATNO;
       IF FIRST.PATNO THEN N_VISITS = 1;
       ELSE N_VISITS + 1;
       IF LAST.PATNO THEN OUTPUT;
    RUN;

Listing of data set COUNT_IT

    Obs   PATNO   N_VISITS

[Data values were not preserved in this transcript.]

Using PROC FREQ to Output a Data Set Containing Counts

    PROC FREQ DATA=LABS2 NOPRINT;
       TABLES PATNO / OUT=COUNTS(KEEP=PATNO COUNT
                                 RENAME=(COUNT=N_VISITS));
    RUN;

Listing of data set COUNTS

    Obs   PATNO   N_VISITS

[Data values were not preserved in this transcript.]

Creating Summary Data Sets Using PROC MEANS

    PROC MEANS DATA=LABS2 NWAY NOPRINT;
       CLASS PATNO;
       VAR HR SBP DBP;
       OUTPUT OUT  = SUMS(DROP=_TYPE_ RENAME=(_FREQ_ = N_VISITS))
              N    = N_HR N_SBP N_DBP
              MEAN = M_HR M_SBP M_DBP;
    RUN;

Creating Summary Data Sets Using PROC MEANS (OUTPUT)

Listing of data set SUMS

    Obs   PATNO   N_VISITS   N_HR   N_SBP   N_DBP   M_HR   M_SBP   M_DBP

[Data values were not preserved in this transcript.]

Selecting All Patients with "n" Visits (using PROC FREQ)

    PROC FREQ DATA=LABS2 NOPRINT;
       TABLES PATNO / OUT = COUNTS(KEEP=PATNO COUNT
                                   RENAME=(COUNT=N_VISITS)
                                   WHERE=(N_VISITS = 2));
    RUN;

    DATA TWO_VISITS;
       MERGE COUNTS(IN=IN_COUNT) LABS2;
       BY PATNO;
       IF IN_COUNT;
    RUN;

Selecting All Patients with "n" Visits (OUTPUT)

Listing of data set TWO_VISITS

    Obs   PATNO   N_VISITS   DATE   DOB   HR   SBP   DBP

[Data values were not preserved in this transcript.]

Using PROC SQL and a Macro Variable to Select Observations

    PROC SQL NOPRINT;
       SELECT QUOTE(PATNO) INTO :DUP_LIST SEPARATED BY " "
       FROM LABS2
       GROUP BY PATNO
       HAVING FREQ(PATNO) EQ 2;
    QUIT;

    PROC PRINT DATA=LABS2;
       WHERE PATNO IN (&DUP_LIST);
       TITLE "Using SQL and a Macro Variable";
    RUN;

Value of DUP_LIST: "001" "008" "013" "123"
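The same selection can also be written without a macro variable by putting the grouping query in a subquery. This is an alternative sketch, not the method shown on the slide. The data set VISITS_DEMO and its values are hypothetical, and COUNT(*) is used in place of FREQ(PATNO).

```sas
/* Hypothetical sample data: 001 has two visits, 002 one, 003 three */
DATA VISITS_DEMO;
   INPUT PATNO $ HR;
DATALINES;
001 68
001 72
002 80
003 60
003 64
003 66
;

/* The subquery returns the patients with exactly two visits;
   the outer query lists all of their observations */
PROC SQL;
   SELECT *
   FROM VISITS_DEMO
   WHERE PATNO IN (SELECT PATNO
                   FROM VISITS_DEMO
                   GROUP BY PATNO
                   HAVING COUNT(*) EQ 2);
QUIT;
```

With this sample only the two observations for patient 001 are listed. The macro variable approach on the slide has the advantage that the list can be reused in later steps, such as the WHERE statement in PROC PRINT.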

Selecting All Patients with "n" Visits (using a Data Step)

    DATA TWO;
       SET LABS(KEEP=PATNO);
       BY PATNO;
       IF FIRST.PATNO THEN N = 1;
       ELSE N + 1;
       IF LAST.PATNO AND N = 2 THEN OUTPUT;
    RUN;

    DATA TWO_VISITS;
       MERGE TWO(IN=IN_COUNT) LABS2;
       BY PATNO;
       IF IN_COUNT;
    RUN;

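Looking Ahead Using Multiple SET Statements

    DATA DOC;
       INPUT PATNO VISIT : MMDDYY10. DOCTOR $3.;
       FORMAT VISIT MMDDYY10.;
    DATALINES;
    /21/1998 ABC
    /29/1998 XYZ
    /12/1998 QED
    /01/1998 ABC
    /13/1998 QED
    /15/1998 MAD
    /06/1998 XYZ
    /08/1998 QED
    ;

[The patient number and month at the start of each data line were not preserved in this transcript.]

Slide annotations: Failure by ABC; Failure by XYZ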

Looking Ahead Using Multiple SET Statements

    PROC SORT DATA=DOC;
       BY PATNO VISIT;
    RUN;

    DATA FAILURES;
       SET DOC;
       BY PATNO;
       SET DOC (FIRSTOBS = 2
                KEEP = VISIT
                RENAME = (VISIT = NEXT_VISIT));
       IF NOT LAST.PATNO AND (NEXT_VISIT - VISIT) LT 30 THEN OUTPUT;
       KEEP PATNO VISIT NEXT_VISIT DOCTOR;
    RUN;

Looking Ahead Using Multiple SET Statements (OUTPUT)

Listing of data set FAILURES

    Obs   PATNO   VISIT      DOCTOR   NEXT_VISIT
                  /21/1998   ABC      10/29/
                  /06/1998   XYZ      05/08/1998

[Observation numbers, patient numbers, and parts of the dates were not preserved in this transcript.]
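To see what the second SET statement contributes, it can help to keep every observation and print the look-ahead value next to the current one. The sketch below uses made-up data. The data set names DOC_DEMO and LOOK_AHEAD, the variable GAP_DAYS, and all values are hypothetical and not from the slides.

```sas
/* Hypothetical sample data: three visits for 001, two for 002 */
DATA DOC_DEMO;
   INPUT PATNO $ VISIT : MMDDYY10. DOCTOR $;
   FORMAT VISIT MMDDYY10.;
DATALINES;
001 10/21/1998 ABC
001 10/29/1998 XYZ
001 12/12/1998 QED
002 01/01/1998 ABC
002 03/13/1998 QED
;

PROC SORT DATA=DOC_DEMO;
   BY PATNO VISIT;
RUN;

DATA LOOK_AHEAD;
   SET DOC_DEMO;                       /* current observation */
   BY PATNO;
   /* Same data set, read one observation ahead */
   SET DOC_DEMO (FIRSTOBS = 2
                 KEEP = VISIT
                 RENAME = (VISIT = NEXT_VISIT));
   FORMAT NEXT_VISIT MMDDYY10.;
   IF LAST.PATNO THEN NEXT_VISIT = .;  /* next row belongs to another patient */
   ELSE GAP_DAYS = NEXT_VISIT - VISIT; /* days until the next visit */
RUN;

PROC PRINT DATA=LOOK_AHEAD;
RUN;
```

For the first observation GAP_DAYS is 8 (10/21/1998 to 10/29/1998). LOOK_AHEAD contains four observations, not five: the DATA step stops as soon as the second SET statement runs out of observations, so the final observation of the input is never written. That is harmless for the FAILURES step, because the final observation is always the last visit of a patient and would not be output anyway.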