Axio Research E-Compare A Tool for Data Review Bill Coar.

Presentation transcript:

Axio Research E-Compare A Tool for Data Review Bill Coar

Motivation
Consider the case when programming with near-final data
Begin running some standard validation checks
Identify problem records and request changes
Desire to know all changes are made, and no unexpected changes occurred

Motivation
Consider the case where you receive accumulating data throughout the life of a project
In each iteration, some data has already been reviewed and queried
For subsequent reviews
  – Wish to know the requested changes were made
  – Only review data that is new
Goal is to develop a tool using SAS to assist in these areas of data review

Outline
Identify the goals of the tool (E-compare)
Introduction and steps of E-compare
Look at some examples
Extension to comparing many datasets
Final remarks

Goals
Based on needs of data management group and clinical scientists
  – Identify new records
  – Identify which records were changed
    Review new values versus old values
  – Identify records that did not change
  – Identify records that were deleted

Proc Compare
Compares (two) datasets (based on key variables)
  – Base versus compare
Identify attributes that differ
Identify variables/records in one but not the other
Allows for variable names to differ but values be compared
Can set tolerances for defining what is really “different”
Many other procedure options to assist
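As a rough illustration of the tolerance idea, the sketch below treats numeric values as equal when they differ by no more than a small absolute amount. The dataset names (old_labs, new_labs), the key variables (subjid, visit), and the threshold are all hypothetical; METHOD= and CRITERION= are standard PROC COMPARE options.

```sas
/* Hypothetical data: two small versions of the same lab dataset */
data old_labs;
  input subjid visit result;
  datalines;
1 1 5.100
1 2 6.200
2 1 7.300
;
run;

data new_labs;
  input subjid visit result;
  datalines;
1 1 5.1004
1 2 6.900
2 1 7.300
;
run;

/* Values whose absolute difference is 0.001 or less are judged equal */
proc compare base=old_labs compare=new_labs
             method=absolute criterion=0.001 listobs;
  id subjid visit;
  var result;
run;
```

With these made-up values, the 0.0004 difference at subject 1, visit 1 falls inside the tolerance, so only the visit 2 record for subject 1 should be reported as unequal.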

Basics of Proc Compare
    proc compare base=basedata compare=compddata listvar listobs;
      id key variables;
      var var1 var2 var3;
      with ovar1 ovar2 ovar3;
    run;
In preparing for this presentation, I found the TRANSPOSE option that might help!

Proc Compare
Pros
  – Displays a lot of relevant information
  – Fairly straightforward
Cons
  – Not always easy to read
    Amount of text that gets displayed for differences
  – Non-SAS users seem to be intimidated by it

Introduction to E-compare
Idea originated from talking with data managers and clinical scientists
Different groups with different needs
Many not comfortable working within SAS
  – Excel
  – Review listings
Desire for repeatability
Extend to many datasets
  – D-compare

Introduction to E-compare
Parameters:
  – Base data, compare data, key variables, variables to compare (optional), output data, debugging indicator
Assumes the same data structure, and that the key variables exist
Uniqueness identified by key variables
Output is a SAS dataset with essentially the same structure as the input datasets
  – One additional flag to identify the results of the compare

Steps in E-Compare
Sorting and creating working copies of input datasets
Check for uniqueness based on key variables
  – First. and last. on the last key variable
  – Check both the base and compare datasets
If there are records with duplicate key variables
  – Print a message in the output and log
  – Go to the end of the macro to stop execution
    %goto NOEXEC;
    ...
    %NOEXEC:
    %mend;
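The uniqueness check could look something like the sketch below, which applies first. and last. to the last key variable after sorting. The toy data, the zz_dupbase dataset, and the lastkey and ndup macro variables are assumptions for illustration; only zz_base and &keyvar appear in the slides.

```sas
/* Hypothetical working copy with a duplicated key (subjid=1, visit=2) */
data zz_base;
  input subjid visit result;
  datalines;
1 1 5.1
1 2 6.2
1 2 6.4
2 1 7.3
;
run;

%let keyvar  = subjid visit;   /* key variables, as passed to the macro */
%let lastkey = visit;          /* last key variable (assumed name)      */

proc sort data=zz_base;
  by &keyvar;
run;

/* A record is a duplicate when it is not both first and last in its key group */
data zz_dupbase;
  set zz_base;
  by &keyvar;
  if not (first.&lastkey and last.&lastkey);
run;

/* Count the duplicates into a macro variable for later branching */
proc sql noprint;
  select count(*) into :ndup trimmed from zz_dupbase;
quit;

%put NOTE: &ndup record(s) in zz_base share a key with another record.;
```

In this toy data the two visit 2 records for subject 1 land in zz_dupbase. Inside the macro, a test such as %if &ndup > 0 %then %goto NOEXEC; would then skip the comparison, and the same check would be repeated for the compare dataset.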

Steps in E-Compare
Merge on key variables, create 3 datasets
  – NEW records (zz_newrecs)
  – DELETED records (zz_delrecs)
  – Records in BOTH datasets needed to identify differences (zz_both)
Perform proc compare
  – ID key variables
  – Default compares all variables
  – Obtain the output dataset using OUT= and OUTNOEQUAL options

Steps in E-Compare
Straightforward merge…
    data zz_newrecs zz_delrecs zz_both;
      merge zz_comp(in=a keep=&keyvar)
            zz_base(in=b keep=&keyvar);
      by &keyvar;
      if a and ^b then output zz_newrecs;   /* only in compare: new     */
      if b and ^a then output zz_delrecs;   /* only in base: deleted    */
      if (a and b) then output zz_both;     /* in both: check for diffs */
    run;

Steps in E-Compare
Straightforward proc compare
    proc compare base=zz_base compare=zz_comp out=zz_cout noprint outnoequal;
      id &keyvar;
      %if &compvar ne ALL %then %do;
        var &compvar;
      %end;
    run;

Steps in E-Compare
If a record changed, it is in the output data (zz_cout) from proc compare due to the OUTNOEQUAL option
Merge various datasets on key variables
Identify records that did not change
  – Remerge ZZ_COUT with ZZ_BOTH to obtain records that did not change
For records that did change
  – Remerge ZZ_COUT with ZZ_BASE to obtain old values
  – Remerge ZZ_COUT with ZZ_COMP to obtain new values
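The slides do not show the remerge code, so the following is only one way it might be written, run on toy data. The output names zz_nodiff, zz_diffbase, and zz_diffcomp are taken from the set statement shown on a later slide; pulling the unchanged values from zz_comp is an assumption, as are the sample variables.

```sas
%let keyvar = subjid visit;   /* hypothetical key variables */

data zz_base;
  input subjid visit result;
  datalines;
1 1 5.1
1 2 6.2
2 1 7.3
;
run;

data zz_comp;
  input subjid visit result;
  datalines;
1 1 5.1
1 2 6.9
2 1 7.3
;
run;

/* Keys present in both datasets (all of them in this toy example) */
data zz_both;
  merge zz_comp(in=a keep=&keyvar) zz_base(in=b keep=&keyvar);
  by &keyvar;
  if a and b;
run;

/* OUTNOEQUAL keeps one row per record with at least one difference */
proc compare base=zz_base compare=zz_comp out=zz_cout noprint outnoequal;
  id &keyvar;
run;

/* In both datasets and not flagged by proc compare: no change */
data zz_nodiff;
  merge zz_comp zz_both(in=a) zz_cout(in=b keep=&keyvar);
  by &keyvar;
  if a and not b;
run;

/* Flagged records with their old (base) values */
data zz_diffbase;
  merge zz_base(in=a) zz_cout(in=b keep=&keyvar);
  by &keyvar;
  if a and b;
run;

/* Flagged records with their new (compare) values */
data zz_diffcomp;
  merge zz_comp(in=a) zz_cout(in=b keep=&keyvar);
  by &keyvar;
  if a and b;
run;
```

Here zz_nodiff should hold the two unchanged records, while zz_diffbase and zz_diffcomp each hold the subject 1, visit 2 record with result 6.2 and 6.9 respectively.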

Steps in E-Compare
Set 5 datasets together and define flags using the in= option
  – 1 - No change
  – 2 - Change from
  – 3 - Change to
  – 4 - New record
  – 5 - Deleted record
Clean up work space by deleting interim data, unless
  – DEBUG option is specified to be TRUE

Steps in E-Compare
Basic set statement…
    data &out;
      set zz_nodiff(in=a)
          zz_diffbase(in=b)
          zz_diffcomp(in=c)
          zz_newcomp(in=d)
          zz_delbase(in=e);
      by &keyvar;
      length zz_compflg $15;
      if a then zz_compflg='1 - No Change';
      else if b then zz_compflg='2 - Change From';
      else if c then zz_compflg='3 - Change To';
      else if d then zz_compflg='4 - Rec Added';
      else if e then zz_compflg='5 - Rec Deleted';
      label zz_compflg='Per record comparison';
    run;

Steps in E-Compare
Some cleaning up of the work space…
    %if &debug=F %then %do;
      proc datasets library=work nodetails nolist;
        delete zz_: / memtype=data;
      quit;
    %end;

Steps in E-Compare
Note about DEBUG
  – If macro does not execute because of non-uniqueness in key variables, set DEBUG=TRUE
  – This does not delete the working datasets
  – Allows one to identify the problem records using a viewtable

E-compare
What E-compare does not do:
  – Does not identify the variable that changed
  – Does not indicate if the attributes of a variable change
  – Does not actually generate a report
Generation of a report can be added, but…
  – This component was considered in extending E-compare to all corresponding datasets in two libraries allowing for a single output
  – Proc report or export to Excel
  – This part is defined by the needs of the users

E-compare Example Output
Creation of RTF via Proc Report and ODS
Creation of Excel file via SAS Access to PC File formats or ODBC
Consider repeating E-compare on all datasets in two libraries
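A report step is not shown in the slides; the sketch below is one possible way to send E-compare output to RTF with Proc Report and ODS. The dataset ecomp_out, its variables, the column labels, and the output file name are hypothetical. Only the zz_compflg flag and its values come from the macro.

```sas
/* Hypothetical E-compare output: input variables plus the zz_compflg flag */
data ecomp_out;
  length zz_compflg $15;
  subjid=1; visit=1; result=5.1; zz_compflg='1 - No Change';   output;
  subjid=1; visit=2; result=6.2; zz_compflg='2 - Change From'; output;
  subjid=1; visit=2; result=6.9; zz_compflg='3 - Change To';   output;
  subjid=2; visit=1; result=7.3; zz_compflg='4 - Rec Added';   output;
run;

ods rtf file="ecompare_review.rtf";   /* hypothetical output path */

proc report data=ecomp_out nowd;
  where zz_compflg ne '1 - No Change';   /* show only records needing review */
  column subjid visit zz_compflg result;
  define subjid     / order   'Subject';
  define visit      / order   'Visit';
  define zz_compflg / display 'Per record comparison';
  define result     / display 'Result';
run;

ods rtf close;
```

Filtering out the unchanged records is a design choice for reviewers who only want new, changed, or deleted data; dropping the WHERE statement lists everything.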

Schematic of D-compare with Excel Output
Use proc contents output to obtain information about datasets in each library
Identify mismatches (in one library but not the other)
Subset using a list of datasets to exclude
Obtain a list of datasets for looping
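The schematic above might be implemented along these lines. Everything here is an assumed sketch: the librefs baselib and complib, the toy datasets, the exclusion of DM, and the zz_mismatch and zz_loop names are invented for illustration, while the MEM1, MEM2, ... naming follows the &&MEM&I references used in the slides. It also assumes a SAS release that supports the DLCREATEDIR option.

```sas
options dlcreatedir;   /* let LIBNAME create the two folders below */
libname baselib "%sysfunc(pathname(work))/baselib";
libname complib "%sysfunc(pathname(work))/complib";

/* Toy libraries: LB exists only in base, VS only in compare */
data baselib.ae baselib.dm baselib.lb; x=1; run;
data complib.ae complib.dm complib.vs; x=1; run;

/* PROC CONTENTS writes one row per variable; NODUPKEY leaves one row per dataset */
proc contents data=baselib._all_ out=zz_bcont(keep=memname) noprint; run;
proc contents data=complib._all_ out=zz_ccont(keep=memname) noprint; run;

proc sort data=zz_bcont nodupkey; by memname; run;
proc sort data=zz_ccont nodupkey; by memname; run;

/* Split into mismatches and the list of datasets to loop over */
data zz_mismatch zz_loop;
  merge zz_bcont(in=b) zz_ccont(in=c);
  by memname;
  length where_found $12;
  if b and c then do;
    if memname not in ('DM') then output zz_loop;   /* hypothetical exclusion list */
  end;
  else do;
    where_found = ifc(b, 'base only', 'compare only');
    output zz_mismatch;
  end;
run;

/* Store the loop list as MEM1, MEM2, ... and the count as NMEM */
data _null_;
  set zz_loop end=last;
  call symputx(cats('MEM', _n_), memname);
  if last then call symputx('NMEM', _n_);
run;

%put NOTE: &NMEM dataset(s) to compare, starting with &MEM1;
```

With the toy libraries, LB and VS end up in zz_mismatch and AE is the only dataset left to loop over.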

Schematic of D-compare with Excel Output
Check if the Excel file exists (may need to delete)
For each iteration, identify key variables from a proc format and %sysfunc
    %let kvars=%sysfunc(putc(&&MEM&I,$fmtname.));
For each iteration, perform E-compare
For each iteration, update the Excel file
  – Select records to include
  – SAS/Access to PC File Formats
  – SAS/Access to ODBC
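To make the format lookup concrete, here is a minimal sketch of the loop skeleton. The $keyfmt format and its contents, the loopkeys macro name, the output path, and the MEM1/MEM2/NMEM values are hypothetical; the %sysfunc(putc(...)) lookup mirrors the statement on the slide, and the file check uses the standard FILEEXIST, FILENAME, and FDELETE functions.

```sas
/* Hypothetical mapping from dataset name to its key variables */
proc format;
  value $keyfmt
    'AE'  = 'subjid aeseq'
    'LB'  = 'subjid visit lbtest'
    other = 'subjid';
run;

%let MEM1 = AE;
%let MEM2 = LB;
%let NMEM = 2;
%let OUTFILE = %sysfunc(pathname(work))/dcompare.xls;   /* hypothetical path */

%macro loopkeys;
  %local fref rc I kvars;

  /* Remove a stale Excel file so each run starts clean */
  %if %sysfunc(fileexist(&OUTFILE)) %then %do;
    %let fref = zz_xls;
    %let rc = %sysfunc(filename(fref, &OUTFILE));
    %let rc = %sysfunc(fdelete(&fref));
    %let rc = %sysfunc(filename(fref));
  %end;

  %do I = 1 %to &NMEM;
    /* Look up the key variables for this dataset through the format */
    %let kvars = %sysfunc(putc(&&MEM&I,$keyfmt.));
    %put NOTE: &&MEM&I is keyed by &kvars;
    /* the E-compare call and the Excel update for this dataset would go here */
  %end;
%mend loopkeys;

%loopkeys
```

Keeping the key variables in a format means a new dataset only needs one extra line in PROC FORMAT rather than a change to the macro itself.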

D-compare with Excel Output
Proc export
  – Requires SAS/Access to PC File Formats
  – Specify the SHEET to have the name of the dataset being compared
  – Appends to the Excel file if it exists
    proc export data=zz_fnl outfile="&OUTFILE" DBMS=excel;
      sheet="&&MEM&I.";
    run;

D-compare with Excel Output
Export using a data step and ODBC
  – Requires SAS/Access to ODBC
  – libname prior to iteration through each dataset
  – Data step to append within each iteration
    LIBNAME _lbxls odbc NOprompt="dsn=Excel Files; Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; dbq=&OUTFILE";
    DATA _lbxls.&&MEM&I;
      SET zz_fnl;
    run;

E-compare Example Output
Creation of Excel file via SAS Access to PC File formats or ODBC

Conclusions
E-compare is just a different way of looking at Proc Compare results
Provides the ability to monitor data as changes are applied to the central database
Reports can be printed or saved to assist in documentation
Strict data structures allow for simplification across studies

Any questions?