Common Sense Validation Using SAS
Lisa Eckler, Lisa Eckler Consulting Inc.
TASS Interfaces, December 2015

Holistic approach:
- Allocate most effort to what's most important
- Avoid or automate repetitive tasks
- Ask ourselves the right questions

Defining terms: QA
"Data quality assurance is the process of profiling the data to discover inconsistencies and other anomalies in the data, and performing data cleansing activities to improve the data quality." (Wikipedia)

Defining terms: Verification
Verification is the act of reviewing, inspecting, testing, etc. to establish and document that a product, service, or system meets the regulatory, standard, or specification requirements.
- Does it meet the structural requirements?
- Is it complete?

Defining terms: Validation
Validation refers to meeting the needs of the intended end-user or customer.
- Does it answer the user's question?
- Does it meet all of the needs?
- Structure and completeness, data integrity, appropriateness

"Computers are useless. They can only give you answers." (Pablo Picasso)

How do I know if I got it right?

Is validation a programming task?
- Yes, mostly.
- The routine parts can and should be automated and repeatable.
- That leaves more resources for the parts that require human attention.

Common Sense Validation Using SAS  PROC COMPARE  PROC CONTENTS  PROC CONTENTS with compare (using PROC COMPARE or TRANSPOSE, MERGE and flag)  PROC FREQ +/- PROC FORMAT  PROC SUMMARY  PROC SUMMARY + compare (using PROC COMPARE or TRANSPOSE, MERGE and flag)

Common Sense Validation Using SAS  PROC COMPARE  PROC CONTENTS  PROC CONTENTS with compare (using PROC COMPARE or TRANSPOSE, MERGE and flag)  PROC FREQ +/- PROC FORMAT  PROC SUMMARY  PROC SUMMARY + compare (using PROC COMPARE or TRANSPOSE, MERGE and flag) Wrap a macro around this and you have a flexible, re-usable tool!

Does this mean writing more SAS code after I thought I was finished writing SAS code?
Yes… and no. We can save time and improve the quality of results by using code that isn't part of the final program. Don't think of it as disposable, though: this code can be set up once and saved for all future validation efforts.
Additional benefits:
- Automated validation provides a log
- It is easily repeatable
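One possible way to get the "automated validation provides a log" benefit is to route the log of a validation run to a dated file with PROC PRINTTO. This is a sketch under stated assumptions, not the author's setup. The fileref VALLOG and the file name are hypothetical. The file is written to the WORK folder only so that the example runs anywhere, and the stand-in tables are copies of SASHELP.SHOES.

```sas
/* Hypothetical fileref VALLOG: a dated log file in the WORK folder, */
/* chosen only so the sketch runs anywhere; use a permanent folder.  */
filename VALLOG "%sysfunc(pathname(work))/validation_&sysdate9..log";

/* From here on, the SAS log is written to VALLOG */
proc printto log=VALLOG new;
run;

/* Stand-in validation step (assumption: copies of SASHELP.SHOES) */
data OLD_SHOES NEW_SHOES;
   set sashelp.shoes;
run;

proc compare base=OLD_SHOES compare=NEW_SHOES brief;
run;

/* Restore the default log destination */
proc printto;
run;
```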

What are the questions?
- Should this be a replication of something I have seen before?
- If not, is it similar to something I've done before?
- Is it, or some part of it, supposed to be different from anything I've seen before?
- Is the result packaged properly?

Mantra for validation:
- Check your assumptions
- Confirm similarities
- Focus on differences

How is this result expected to compare with what we've seen before?
- Entirely different
- Some overlap
- Complete overlap
- Subset

Some possibilities (not an exhaustive list!)

** This is the simplest form of comparison between two sets of data **;
proc compare compare = SHOES base = OLD_SHOES;
run;

** PROC CONTENTS gives us metadata **;
proc contents data = OLD_SHOES;
run;

** CONTENTS with select facts saved to a data set: a table of metadata **;
proc contents data = OLD_SHOES out = CONTENTS_OLD_SHOES (keep=name type length);
run;

** Same as the previous slide except for the new data set **;
proc contents data = NEW_SHOES out = CONTENTS_NEW_SHOES (keep=name type length);
run;

** Comparing metadata tables rather than data tables **;
proc compare compare = CONTENTS_OLD_SHOES base = CONTENTS_NEW_SHOES;
run;

proc contents data = OLD_SHOES out = CONTENTS1(keep=name type length);
run;
proc contents data = NEW_SHOES out = CONTENTS2(keep=name type length);
run;
proc compare compare = CONTENTS1 base = CONTENTS2;
run;

%macro COMPARE_STRUCTURE1;
   proc contents data = OLD_SHOES out = CONTENTS1(keep=name type length);
   run;
   proc contents data = NEW_SHOES out = CONTENTS2(keep=name type length);
   run;
   proc compare compare = CONTENTS1 base = CONTENTS2;
   run;
%mend COMPARE_STRUCTURE1;
%COMPARE_STRUCTURE1;

%macro COMPARE_STRUCTURE(DS1, DS2);
   proc contents data = &DS1 out = CONTENTS1(keep=name type length);
   run;
   proc contents data = &DS2 out = CONTENTS2(keep=name type length);
   run;
   proc compare compare = CONTENTS1 base = CONTENTS2;
   run;
%mend COMPARE_STRUCTURE;
%COMPARE_STRUCTURE(OLD_SHOES, NEW_SHOES);

** We've just built a generic tool for comparing **;
** the STRUCTURE of any two SAS data sets **;
%COMPARE_STRUCTURE( , );
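The macro above compares the two metadata tables row by row, by position. If one data set gains or loses a variable, every later row shifts and PROC COMPARE reports a cascade of differences. A hedged variant, sketched below under the hypothetical name COMPARE_STRUCTURE_BYNAME, sorts both metadata tables by variable name and matches on it with an ID statement. The LISTALL option then lists any variable found in only one of the tables.

```sas
/* Hypothetical macro COMPARE_STRUCTURE_BYNAME: like COMPARE_STRUCTURE, */
/* but rows are matched on the variable name rather than on position.   */
%macro COMPARE_STRUCTURE_BYNAME(DS1, DS2);
   proc contents data=&DS1 noprint out=CONTENTS1 (keep=name type length);
   run;
   proc contents data=&DS2 noprint out=CONTENTS2 (keep=name type length);
   run;

   /* Upper-case names so "Sales" and "SALES" are treated as the same */
   data CONTENTS1;
      set CONTENTS1;
      name = upcase(name);
   run;
   data CONTENTS2;
      set CONTENTS2;
      name = upcase(name);
   run;

   proc sort data=CONTENTS1; by name; run;
   proc sort data=CONTENTS2; by name; run;

   /* ID NAME matches rows by variable; LISTALL lists variables that */
   /* appear in only one of the two tables                           */
   proc compare base=CONTENTS1 compare=CONTENTS2 listall;
      id name;
   run;
%mend COMPARE_STRUCTURE_BYNAME;

/* Usage: two unrelated SASHELP tables, so differences are expected */
%COMPARE_STRUCTURE_BYNAME(sashelp.shoes, sashelp.class);
```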

Reasonableness: complete overlap

"_character_" gives the list of ALL variables in the table with data type character, which may include some variables with too many distinct values to be useful in a frequency table.

This code also gives a list of ALL variables in the table with data type character.

The above code lets us customize our list to exclude non-categorical character columns and include the others.
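The two approaches described here can be sketched as follows, under stated assumptions. OLD_SHOES is a WORK copy of SASHELP.SHOES made only so the code runs. The metadata list comes from DICTIONARY.COLUMNS, which is one way to get it and not necessarily the query used in the talk. Excluding Subsidiary is purely illustrative.

```sas
/* Stand-in table in WORK (assumption: a copy of SASHELP.SHOES) */
data OLD_SHOES;
   set sashelp.shoes;
run;

/* Option 1: _CHARACTER_ tabulates every character column, whether */
/* or not it is categorical                                        */
proc freq data=OLD_SHOES;
   tables _character_ / missing;
run;

/* Option 2: build the list from metadata so it can be edited.      */
/* Treating Subsidiary as "too many values" is purely illustrative. */
proc sql noprint;
   select name
      into :CHARVARS separated by ' '
      from dictionary.columns
      where libname = 'WORK'
        and memname = 'OLD_SHOES'
        and type = 'char'
        and upcase(name) not in ('SUBSIDIARY');
quit;

%put NOTE: Character columns to tabulate: &CHARVARS;

proc freq data=OLD_SHOES;
   tables &CHARVARS / missing;
run;
```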

Similar to the way we compared the structure of two tables, we can compare the frequency counts of values in two tables.
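A minimal sketch of comparing frequency counts, assuming stand-in copies of SASHELP.SHOES in which one row has been dropped from NEW_SHOES so there is a difference to find. PROC FREQ writes one row per Region to an OUT= data set, sorted by Region. PROC COMPARE then matches the two count tables on Region, which mirrors the metadata comparison above.

```sas
/* Stand-in tables (assumption): NEW_SHOES is missing one record */
data OLD_SHOES;
   set sashelp.shoes;
run;

data NEW_SHOES;
   set sashelp.shoes;
   if _n_ = 1 then delete;   /* simulate a dropped record */
run;

/* OUT= holds one row per Region with COUNT and PERCENT */
proc freq data=OLD_SHOES noprint;
   tables Region / missing out=FREQ_OLD;
run;

proc freq data=NEW_SHOES noprint;
   tables Region / missing out=FREQ_NEW;
run;

/* Match on Region and compare the counts only */
proc compare base=FREQ_OLD compare=FREQ_NEW;
   id Region;
   var count;
run;
```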

Data correctness: complete overlap
- Judicious use of unrestricted PROC COMPARE, after confirming reasonableness

proc compare compare = OLD_SHOES base = NEW_SHOES;
run;
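Even an unrestricted comparison can be kept manageable. The sketch below, which assumes stand-in copies of SASHELP.SHOES with one Sales value deliberately changed, uses two real PROC COMPARE options for this. MAXPRINT= caps the printed output. OUT= with OUTNOEQUAL writes only the differing rows to a data set for later review, where OUTBASE, OUTCOMP and OUTDIF keep the base values, the compare values and their differences. The print limits chosen are arbitrary.

```sas
/* Stand-in tables (assumption): one Sales value changed on purpose */
data OLD_SHOES;
   set sashelp.shoes;
run;

data NEW_SHOES;
   set sashelp.shoes;
   if _n_ = 5 then Sales = Sales + 1;   /* plant one difference */
run;

proc compare base=OLD_SHOES compare=NEW_SHOES
             maxprint=(20, 200)        /* per-variable and total print caps */
             out=DIFFS outnoequal      /* keep only rows that differ        */
             outbase outcomp outdif;   /* base, compare and difference rows */
run;
```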

If we are expecting a result that is a complete replication of something that already exists:
- Confirm that the structure is identical
- Confirm that the data is the same at a high level
- Confirm that the data is the same at a detailed level
All of this can be fully automated.
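For full automation, a pass/fail decision can be taken from the automatic macro variable SYSINFO. PROC COMPARE sets it to 0 when it finds no differences, and to a bit mask describing what differed otherwise. The macro CHECK_IDENTICAL and the global variable VALIDATION_RC below are hypothetical names, and the stand-in tables are copies of SASHELP.SHOES.

```sas
/* Hypothetical macro CHECK_IDENTICAL: runs PROC COMPARE without      */
/* printing and stores SYSINFO in the global VALIDATION_RC (also a    */
/* hypothetical name). 0 means no differences; anything else is a bit */
/* mask describing what differed.                                     */
%macro CHECK_IDENTICAL(BASE, COMP);
   %global VALIDATION_RC;
   proc compare base=&BASE compare=&COMP noprint;
   run;
   %let VALIDATION_RC = &sysinfo;   /* capture it before another step runs */
   %if &VALIDATION_RC = 0 %then
      %put NOTE: &BASE and &COMP are identical.;
   %else
      %put WARNING: &BASE and &COMP differ (SYSINFO=&VALIDATION_RC).;
%mend CHECK_IDENTICAL;

/* Stand-in tables (assumption: copies of SASHELP.SHOES) */
data OLD_SHOES NEW_SHOES;
   set sashelp.shoes;
run;

%CHECK_IDENTICAL(OLD_SHOES, NEW_SHOES);
```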

Data correctness: completely new
What if we don't have an existing results table to compare to?
- Similar SAS data in an existing table, or produced by someone else?
- Similar data in some other format that can be imported into SAS for comparison?
- Do we have a data requirements document?
Truly original data will require much greater attention to validation and the involvement of a subject matter expert.

Packaging: completely new

Assuming we have a requirements "document":
- Import REQUIREMENTS into a SAS data set
- Run PROC CONTENTS on the new data set to get CONTENTS_NEW_SHOES
- Run PROC COMPARE, comparing CONTENTS_NEW_SHOES to REQUIREMENTS, OR join REQUIREMENTS with CONTENTS_NEW_SHOES and flag non-matching rows
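A sketch of the "join and flag" option, under stated assumptions. The requirements would normally be imported (for example with PROC IMPORT), but here they are typed in with DATALINES so the code runs. The lengths and the made-up PROFIT column are illustrative, and NEW_SHOES is a copy of SASHELP.SHOES. TYPE uses the PROC CONTENTS coding: 1 = numeric, 2 = character.

```sas
/* Hypothetical requirements list (illustrative names and lengths) */
data REQUIREMENTS;
   length name $32;
   input name $ req_type req_length;
   datalines;
REGION 2 25
PRODUCT 2 14
SALES 1 8
PROFIT 1 8
;
run;

/* Stand-in new table (assumption: a copy of SASHELP.SHOES) */
data NEW_SHOES;
   set sashelp.shoes;
run;

proc contents data=NEW_SHOES noprint
              out=CONTENTS_NEW_SHOES (keep=name type length);
run;

/* Rename and upper-case so the two tables can be joined on NAME */
data CONTENTS_NEW_SHOES;
   set CONTENTS_NEW_SHOES (rename=(type=actual_type length=actual_length));
   name = upcase(name);
run;

proc sort data=CONTENTS_NEW_SHOES; by name; run;
proc sort data=REQUIREMENTS;       by name; run;

/* Join, flag, and keep only the rows that do not match */
data PACKAGING_ISSUES;
   merge REQUIREMENTS (in=in_req) CONTENTS_NEW_SHOES (in=in_new);
   by name;
   length issue $30;
   if in_req and not in_new then issue = 'Required but missing';
   else if in_new and not in_req then issue = 'Present but not required';
   else if actual_type ne req_type then issue = 'Wrong type';
   else if actual_length ne req_length then issue = 'Wrong length';
   if issue ne ' ';   /* subsetting IF: drop rows with no issue */
run;

proc print data=PACKAGING_ISSUES noobs;
run;
```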

Reasonableness: completely new
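With nothing to compare against, one way to apply "PROC FREQ with PROC FORMAT" is to sort a numeric measure into ranges judged plausible and count the rows in each range. This is a hedged sketch. The cut-offs and the format name SALESCHK are hypothetical and would in practice come from a subject matter expert. NEW_SHOES is a copy of SASHELP.SHOES.

```sas
/* Hypothetical plausibility bands for Sales */
proc format;
   value SALESCHK
      .                = 'Missing'
      low -< 0         = 'Negative (suspect)'
      0                = 'Zero'
      0 <- 1000000     = 'Plausible'
      1000000 <- high  = 'Over 1,000,000 (review)';
run;

/* Stand-in new table (assumption: a copy of SASHELP.SHOES) */
data NEW_SHOES;
   set sashelp.shoes;
run;

/* Count rows per band; MISSING keeps missing values as a level */
proc freq data=NEW_SHOES;
   tables Sales / missing;
   format Sales SALESCHK.;
run;
```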

What if part of our result should be the same as an existing result, but there should be some differences? Treat it as a hybrid and split the validation exercise into two parts:
- Expected same (by rows, columns, data or metadata)
- Expected different (by rows, columns, data or metadata)

For each of the two parts:
- Confirm (expected) similarities
- Focus efforts on (expected) differences
- Run the validation procedures we've already looked at, as appropriate for the "same" and "different" aspects

Recall the scenario where our data sets should be identical: both tables have columns record_id, a, b and c for record_id 1, 2 and 3, and every cell is expected to match.

When some columns should be the same: the first table has columns a, b and c, the second has columns a, b and d, both for record_id 1, 2 and 3; only the shared columns a and b are expected to match.

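When some "cells" (parts of rows and columns) should be the same: the first table has columns a, b and c for record_id 1, 2 and 3; the second has columns a, b and d for record_id 1, 3 and 4; only columns a and b in the shared records 1 and 3 are expected to match.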
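A minimal sketch of validating only the "expected same" part, assuming two hypothetical tables TABLE1 and TABLE2 shaped like the illustration above (values invented for the example). The VAR statement restricts the comparison to the shared columns. The WHERE= data set options restrict it to the shared records. ID record_id matches rows by key rather than by position. The "expected different" parts (column c versus d, records 2 and 4) would be checked separately.

```sas
/* Hypothetical tables shaped like the illustration */
data TABLE1;
   input record_id a b c;
   datalines;
1 10 20 30
2 11 21 31
3 12 22 32
;
run;

data TABLE2;
   input record_id a b d;
   datalines;
1 10 20 40
3 12 22 42
4 13 23 43
;
run;

/* Expected-same cells: columns a and b in shared records 1 and 3 */
proc compare base=TABLE1 (where=(record_id in (1, 3)))
             compare=TABLE2 (where=(record_id in (1, 3)));
   id record_id;
   var a b;
run;
```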

Review:

Summary:
- Ask the right questions
- Confirm similarities with known things, quickly and programmatically, then focus time and effort on validating "unknown" or new things
- Use basic Base SAS procedures for validation
- Vary the technique based on how much is similar to or different from what you've validated previously, and on what types of data are involved

You can find my related conference papers at
- Don't Forget About Small Data (SESUG 2015)
- When Good Looks Aren't Enough (NESUG 2009)
If you have comments or questions…