OASUS Spring or Fall YYYY

Generic Programming Techniques for Input Validation and Imputation
Cheryl Xiyun Wang, Statistics Canada
May 19, 2016, Ottawa, Canada

Agenda
- Input Parameters for SAS Macro
- Common Types & Rules for Parameters
- Define Rules for Parameters, File Structure and Data
- Three Levels of Validation & Imputation
- Generic Process Flow
- Generic Programming on Numeric Type
- Create Generic SAS Macros
- Examples

Input Parameters for SAS Macro
Wrap your SAS code into a SAS macro.
Parameters: type and validation/imputation rules
Parameter | Type | Validation | Imputation
x | Numeric | Of numeric type | If not provided, x=0
y | Numeric | Of numeric type | If not provided, y=0
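
As a point of comparison for the generic approach, here is a minimal hand-coded sketch of the rules in this table. The macro body is hypothetical (the slides show the original %_sum code only as an image): it imputes 0 for an empty x or y, checks the type with the %DATATYP autocall macro, and, as a design assumption, returns the sum as function-style macro text so that it can be used inside other statements.

```sas
/* Hypothetical sketch of %_sum(x=, y=) with the table's rules hand-coded:
   an empty x or y is imputed as 0, and both values must be numeric. */
%macro _sum(x=, y=);
  /* Imputation: if not provided, default to 0 */
  %if %length(&x) = 0 %then %let x = 0;
  %if %length(&y) = 0 %then %let y = 0;

  /* Validation: %DATATYP (SAS autocall macro) returns NUMERIC or CHAR */
  %if %datatyp(&x) ne NUMERIC or %datatyp(&y) ne NUMERIC %then %do;
    %put ERROR: Parameters x=<&x> and y=<&y> must both be numeric.;
    %return;
  %end;

  /* Function-style result: the macro call resolves to the sum itself */
  %sysevalf(&x + &y)
%mend _sum;

%put 1 + 2 = %_sum(x=1, y=2);   /* writes: 1 + 2 = 3 */
```

Every new macro written this way repeats the same imputation and type-checking lines, which is the duplication the rules-driven generic macros are meant to remove.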

Input Parameters for SAS Macro (Cont’d)
%macro _generateIDs(inLib=, inDSN=, inDSNKey=, outDSN=);
  /* generate internal sequential IDs as 1, 2, ..., N */
%mend;
Parameters: type and validation/imputation rules
Parameter | Type | Validation | Imputation
inLib | Library | Valid library name & exists | Default to SAS WORK
inDSN | Dataset | Valid dataset name & exists |
inDSNKey | Character | Of character type |
outDSN | Dataset | Valid dataset name | Default to &inDSN
inDSN dataset: variable and data validation/imputation rules
inDSN variable | Type | Validation | Imputation
inDSNKey | Character | Unique; required |
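
To show what these rules cost when written by hand, the sketch below fills in the %_generateIDs stub with checks for each row of the two tables. It is a hypothetical implementation, not the author's code: the output variable name _ID and writing the output into inLib are assumptions, and existence and type checks use the standard OPEN/VARNUM/VARTYPE functions through %SYSFUNC.

```sas
/* Hypothetical sketch of %_generateIDs with the table rules hand-coded.
   Assumptions: the ID variable is named _ID; output goes to &inLib. */
%macro _generateIDs(inLib=, inDSN=, inDSNKey=, outDSN=);
  %local dsid vnum vtype rc nAll nDistinct nMissing;

  /* inLib: default to WORK; the libref must be assigned */
  %if %length(&inLib) = 0 %then %let inLib = WORK;
  %if %sysfunc(libref(&inLib)) ne 0 %then %do;
    %put ERROR: Library <&inLib> is not assigned.;
    %return;
  %end;

  /* inDSN and inDSNKey are required; the data set must exist */
  %if %length(&inDSN) = 0 or %length(&inDSNKey) = 0 %then %do;
    %put ERROR: Parameters inDSN and inDSNKey are required.;
    %return;
  %end;
  %if not %sysfunc(exist(&inLib..&inDSN)) %then %do;
    %put ERROR: Data set <&inLib..&inDSN> does not exist.;
    %return;
  %end;

  /* outDSN: default to the input data set name */
  %if %length(&outDSN) = 0 %then %let outDSN = &inDSN;

  /* inDSNKey: must be a character variable of inDSN */
  %let dsid = %sysfunc(open(&inLib..&inDSN));
  %let vnum = %sysfunc(varnum(&dsid, &inDSNKey));
  %if &vnum > 0 %then %let vtype = %sysfunc(vartype(&dsid, &vnum));
  %let rc = %sysfunc(close(&dsid));
  %if &vnum = 0 or &vtype ne C %then %do;
    %put ERROR: <&inDSNKey> must be a character variable in <&inDSN>.;
    %return;
  %end;

  /* Key data: values must be unique and non-missing */
  proc sql noprint;
    select count(*), count(distinct &inDSNKey), sum(missing(&inDSNKey))
      into :nAll trimmed, :nDistinct trimmed, :nMissing trimmed
      from &inLib..&inDSN;
  quit;
  %if &nMissing > 0 or &nDistinct ne &nAll %then %do;
    %put ERROR: Key <&inDSNKey> in <&inDSN> must be unique and non-missing.;
    %return;
  %end;

  /* Core processing: internal sequential ID 1, 2, ..., N */
  data &inLib..&outDSN;
    set &inLib..&inDSN;
    _ID = _N_;
  run;
%mend _generateIDs;
```

Only the final DATA step is the macro's real work; everything above it is the kind of validation and imputation code that the slides move into rules and generic macros.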

Common Types & Rules for Parameters
Parameter type | Validation | Imputation
SAS library | Valid name & exists | Default to WORK
SAS dataset | Valid name and/or exists |
Character | Customized constraints |
Numeric | |
Integer | |
Boolean_YES_NO | Must be YES or NO | Default to YES
Boolean_1_0 | Must be 1 or 0 | Default to 1
Date/Time | Of date/time type | Default to TODAY() or TIME(), or &sysdate
External file name | |
Folder path | | Default: path of WORK
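
Several rows of this table can share one helper. The sketch below is hypothetical: the macro name, the type keywords it accepts and the case-insensitive handling of YES/NO are assumptions. It takes the name of a macro variable in the caller's scope, imputes the default when the value is empty, validates it, and writes the result back.

```sas
/* Hypothetical helper for a few rows of the table. mvar= names an existing
   macro variable in the caller's scope; it is validated and, when empty,
   imputed in place. Type keywords and YES/NO case folding are assumptions. */
%macro _imputeByType(mvar=, type=);
  %local _ib_val _ib_type;
  %let _ib_val  = %superq(&mvar);
  %let _ib_type = %upcase(&type);

  %if &_ib_type = BOOLEAN_YES_NO %then %do;
    %if %length(&_ib_val) = 0 %then %let _ib_val = YES;
    %let _ib_val = %qupcase(&_ib_val);
    %if &_ib_val ne YES and &_ib_val ne NO %then
      %put ERROR: <&mvar>=<&_ib_val> must be YES or NO.;
  %end;
  %else %if &_ib_type = BOOLEAN_1_0 %then %do;
    %if %length(&_ib_val) = 0 %then %let _ib_val = 1;
    %if &_ib_val ne 1 and &_ib_val ne 0 %then
      %put ERROR: <&mvar>=<&_ib_val> must be 1 or 0.;
  %end;
  %else %if &_ib_type = LIBRARY %then %do;
    %if %length(&_ib_val) = 0 %then %let _ib_val = WORK;
    %else %if %sysfunc(libref(&_ib_val)) ne 0 %then
      %put ERROR: Library <&_ib_val> is not assigned.;
  %end;
  %else %if &_ib_type = DATE %then %do;
    /* Default to TODAY(), stored as a SAS date value */
    %if %length(&_ib_val) = 0 %then %let _ib_val = %sysfunc(today());
  %end;
  %else %if &_ib_type = FOLDER %then %do;
    %if %length(&_ib_val) = 0 %then %let _ib_val = %sysfunc(pathname(WORK));
    %else %if not %sysfunc(fileexist(&_ib_val)) %then
      %put ERROR: Folder <&_ib_val> does not exist.;
  %end;

  /* Write the (possibly imputed) value back to the caller's variable */
  %let &mvar = &_ib_val;
%mend _imputeByType;
```

For example, `%let flag = ; %_imputeByType(mvar=flag, type=Boolean_YES_NO)` leaves flag set to YES. Because %LET searches enclosing scopes, the same call works on a local parameter of a calling macro.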

Common Rules for Input Dataset
1) Record layout
   Variable names
   Variable types: character or numeric
   Required? If optional, add the variable automatically
   Length
2) Data inside dataset
   Variable name
   Unique values?
   Constraints on the variable (range, positive, list, etc.)
   Default value if value is missing
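
The record-layout rule "if optional, add the variable automatically" can be sketched as follows. The macro and its parameter names are hypothetical: when the variable is absent it is added with the declared type and length and missing values; when it is present, only its type is checked.

```sas
/* Hypothetical sketch of the record-layout rule "if optional, add the
   variable automatically". varType= is C or N; names are illustrative. */
%macro _ensureVar(ds=, varName=, varType=, varLength=);
  %local dsid vnum actualType rc;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid = 0 %then %do;
    %put ERROR: Cannot open data set <&ds>.;
    %return;
  %end;
  %let vnum = %sysfunc(varnum(&dsid, &varName));
  %if &vnum > 0 %then %let actualType = %sysfunc(vartype(&dsid, &vnum));
  %let rc = %sysfunc(close(&dsid));

  %if &vnum > 0 %then %do;
    /* Variable exists: validate its type only */
    %if &actualType ne %upcase(&varType) %then
      %put ERROR: <&varName> in <&ds> should have type <&varType>.;
  %end;
  %else %do;
    /* Variable is missing: add it with the declared type and length */
    %put NOTE: Adding missing variable <&varName> to <&ds>.;
    data &ds;
      %if %upcase(&varType) = C %then %do;
        length &varName $&varLength;
      %end;
      %else %do;
        length &varName &varLength;
      %end;
      set &ds;
      call missing(&varName);
    run;
  %end;
%mend _ensureVar;
```

For instance, `%_ensureVar(ds=work.layout, varName=comment, varType=C, varLength=20)` adds an empty $20 variable named comment if it is not already there.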

Define Rules for Parameters
Key info | %_sum(x=,y=)
member_name | x | y
object_type | MACRO | MACRO
object_name | _sum | _sum
member_IOtype | INPUT | INPUT
member_type | NUMERIC | NUMERIC
member_default | |
required | NO | NO
constraints | %sysevalf(&x>=0.0) and %sysevalf(&x<=100.0) | %sysevalf(&y>=0.0) and %sysevalf(&y<2000.0)
errMsg_en | Parameter error: x=<&x> is not in range of [0.0,100.0] | Parameter error: y=<&y> is not in range of [0.0,2000.0)
errMsg_fr | Erreur de paramètre: x=<&x> n’est pas dans la gamme valide [0.0,100.0] | Erreur de paramètre: y=<&y> n’est pas dans la gamme valide [0.0,2000.0)
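
One way these rule rows can drive validation is sketched below, under stated assumptions. The rules go into a SAS data set instead of ParamRules.xls, and only the constraints and errMsg_en columns are kept. The macro %_checkParams is hypothetical. Single-quoted literals keep &x and &y unresolved while the rules are stored. When the stored text is referenced later from inside the macro being validated, the macro processor resolves it against that macro's current parameter values.

```sas
/* Hypothetical sketch: %_sum rules stored as a data set (the slides use
   ParamRules.xls). Single quotes keep &x/&y unresolved at storage time. */
data work.paramRules;
  length object_name member_name $32 constraints errMsg_en $200;
  object_name = '_sum';
  member_name = 'x';
  constraints = '%sysevalf(&x>=0.0) and %sysevalf(&x<=100.0)';
  errMsg_en   = 'Parameter error: x=<&x> is not in range of [0.0,100.0]';
  output;
  member_name = 'y';
  constraints = '%sysevalf(&y>=0.0) and %sysevalf(&y<2000.0)';
  errMsg_en   = 'Parameter error: y=<&y> is not in range of [0.0,2000.0)';
  output;
run;

/* Evaluate every rule of one macro. errVar= names a macro variable in the
   caller's scope that receives 0 (all rules pass) or 1 (a rule failed). */
%macro _checkParams(macroName=, errVar=);
  %local _cp_n _cp_i;
  %let _cp_n = 0;
  %let &errVar = 0;
  data _null_;
    set work.paramRules(where=(upcase(object_name) = upcase("&macroName")));
    call symputx(cats('_cp_c', _n_), constraints, 'L');
    call symputx(cats('_cp_m', _n_), errMsg_en, 'L');
    call symputx('_cp_n', _n_, 'L');
  run;
  %do _cp_i = 1 %to &_cp_n;
    /* Resolving the stored text picks up &x/&y from the calling macro */
    %if not (&&_cp_c&_cp_i) %then %do;
      %put ERROR: &&_cp_m&_cp_i;
      %let &errVar = 1;
    %end;
  %end;
%mend _checkParams;
```

Inside a macro such as %_sum, after x and y have been imputed, a call like `%_checkParams(macroName=_sum, errVar=err)` sets the local variable err. With x=150, it also writes the x range message to the log. Adding a rule then means adding a row, not editing the macro.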

Define Rules for File Structure and Data
Key info | %_generateIDs(inLib=, inDSN=, inDSNKey=, outDSN=);
member_name | inDSNKey
object_type | DATASET
object_name | &inDSN
member_IOtype | INPUT
member_type | Character
member_default |
required | YES
constraints | (substr(upcase(&inDSNKey),1,5)="TEST_")
errMsg_en | The data for key variable <&inDSNKey> in dataset <&inDSN> must have a prefix "TEST_"
errMsg_fr | Les données de la variable clé <&inDSNKey> dans <&inDSN> doivent avoir un préfixe "TEST_"
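
At the data level, the prefix rule is checked against each record's value of the key variable. The hypothetical macro below hand-codes that check in a DATA step. It writes each violation to the log using the errMsg_en wording and stores the count of violations in a global macro variable whose name is passed as nBadVar=.

```sas
/* Hypothetical sketch of the data-level prefix rule, hand-coded: each value
   of the key variable must start with TEST_ (case-insensitive via UPCASE).
   The number of violations goes to the global variable named by nBadVar=. */
%macro _checkKeyPrefix(inLib=WORK, inDSN=, inDSNKey=, nBadVar=nBad);
  %global &nBadVar;
  %let &nBadVar = 0;
  data _null_;
    set &inLib..&inDSN end=last;
    if not (substr(upcase(&inDSNKey), 1, 5) = "TEST_") then do;
      nBad + 1;   /* sum statement: retained, starts at 0 */
      put "ERROR: The data for key variable <&inDSNKey> in dataset <&inDSN> "
          "must have a prefix TEST_: " &inDSNKey=;
    end;
    if last then call symputx("&nBadVar", nBad, 'G');
  run;
%mend _checkKeyPrefix;
```

Unlike the parameter rules, this check runs inside a DATA step, so the constraint is written in DATA step syntax rather than macro syntax. That is the split between the two rule syntaxes noted in the Challenges slide.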

Three Levels of Validation & Imputation on Input for SAS Macro
1) Macro parameter level
   Validation: all parameters satisfy the defined rules
   Imputation: set to default if possible
2) Input dataset file structure level, if any
   Validation: variables exist, with the right type and length
   Imputation: add missing variables with the right types
3) Data level (data inside the input dataset), if any
   Validation: data satisfy the defined rules
   Imputation: set default values

Generic Processing Flow

Generic Programming on Numeric Type

Create Generic SAS Macros for Three Levels of Validation and Imputation
1) Macro parameter level
%genericParamsValidation(inMacroName=, rulesFile=);
%genericMacroParamValidation(inMacroName=, paramName=, paramType=, paramValue=, paramReqFlag=, paramDefault=, paramInvalid_condition=, paramErrMsg=, paramIOType=);
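
The single-parameter signature above can be filled in as follows; this is a hypothetical implementation, and only the macro name and its parameter names come from the slides. Its log messages mirror the wording in the Example 2 log. Assumptions:
- paramName= names a local variable of the calling macro, which receives the imputed value.
- paramInvalid_condition= is passed %NRSTR-quoted, so it is evaluated only after imputation.
- The global variable _paramErr (hypothetical) flags failures.
- paramIOType= is accepted but unused.

```sas
/* Hypothetical implementation of the slide's signature (names only are from
   the slides). _paramErr and _lastType are illustrative global variables. */
%global _paramErr _lastType;

%macro genericMacroParamValidation(inMacroName=, paramName=, paramType=,
       paramValue=, paramReqFlag=, paramDefault=, paramInvalid_condition=,
       paramErrMsg=, paramIOType=);
  %local _gv_val _gv_mac _gv_par _gv_type;
  %let _gv_val  = %superq(paramValue);
  %let _gv_mac  = %upcase(&inMacroName);
  %let _gv_par  = %upcase(&paramName);
  %let _gv_type = %upcase(&paramType);

  /* Required check, then imputation of an empty value */
  %if %length(&_gv_val) = 0 %then %do;
    %if %upcase(&paramReqFlag) = YES %then %do;
      %put ERROR: For macro <&_gv_mac>, required parameter <&_gv_par> is empty;
      %let _paramErr = 1;
      %return;
    %end;
    %let _gv_val = &paramDefault;
    %put WARNING: For macro <&_gv_mac>, macro parameter of <&_gv_type> type <&_gv_par> is empty and set to default as <&_gv_val>;
  %end;

  /* Type check: CHARACTER is treated as "not a number" */
  %if (&_gv_type = NUMERIC and %datatyp(&_gv_val) ne NUMERIC) or
      (&_gv_type = CHARACTER and %datatyp(&_gv_val) = NUMERIC) %then %do;
    %put ERROR: For macro <&_gv_mac>, Parameter <&_gv_par> should be <&_gv_type> type;
    %let _paramErr = 1;
  %end;

  /* Write the value back, then evaluate the customized constraint */
  %let &paramName = &_gv_val;
  %if %length(&paramInvalid_condition) > 0 %then %do;
    %if %unquote(&paramInvalid_condition) %then %do;
      %put ERROR: For macro <&_gv_mac>, %upcase(&paramErrMsg);
      %let _paramErr = 1;
    %end;
  %end;
%mend genericMacroParamValidation;

/* Usage: a hypothetical caller validating one parameter */
%macro _convertDemo(inDegreeType=);
  %let _paramErr = 0;
  %genericMacroParamValidation(inMacroName=_convertDemo,
    paramName=inDegreeType, paramType=CHARACTER, paramValue=&inDegreeType,
    paramReqFlag=NO, paramDefault=C,
    paramInvalid_condition=%nrstr(%upcase(&inDegreeType) ne C and %upcase(&inDegreeType) ne F),
    paramErrMsg=Macro parameter error found: inDegreeType=<&inDegreeType> must be C or F,
    paramIOType=INPUT)
  %let _lastType = &inDegreeType;
%mend _convertDemo;
```

Calling `%_convertDemo(inDegreeType=)` logs the default-value WARNING and leaves inDegreeType set to C. A value such as 1234 triggers both the type error and the constraint error, as in case 4 of Example 2.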

Create Generic SAS Macros for Three Levels of Validation and Imputation (Cont’d)
2) Input file structure level (if any)
%genericFileStructValidation(inMacroName=, inDSN=, RulesFile=);
3) Data level (if any)
%genericDataValidation( keyID=,
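
A data-level check can also be driven by a rules data set, in the spirit of interleaving rules with input files. This is a hypothetical sketch, not %genericDataValidation, whose full parameter list is not shown. The rule layout (member_name, constraints, member_default, errMsg_en) is modelled on the rule columns above. Constraints and defaults are assumed to be DATA step expressions, so a character default would need its own quotes. The macro generates one impute-or-validate statement per rule inside a single DATA step.

```sas
/* Hypothetical rules-driven data-level check. Rule layout is an assumption
   modelled on the rule columns above; constraints/defaults are DATA step
   expressions. */
data work.dataRules;
  length member_name $32 constraints $200 member_default $40 errMsg_en $200;
  member_name    = 'score';
  constraints    = '0 <= score <= 100';
  member_default = '0';
  errMsg_en      = 'score must be in the range [0,100]';
  output;
run;

%macro _applyDataRules(inDSN=, outDSN=, rulesDSN=work.dataRules);
  %local _dr_n _dr_i;
  %let _dr_n = 0;
  /* Load each rule into local macro variables */
  data _null_;
    set &rulesDSN;
    call symputx(cats('_dr_v', _n_), member_name, 'L');
    call symputx(cats('_dr_c', _n_), constraints, 'L');
    call symputx(cats('_dr_d', _n_), member_default, 'L');
    call symputx(cats('_dr_m', _n_), errMsg_en, 'L');
    call symputx('_dr_n', _n_, 'L');
  run;
  /* Generate one impute-or-validate statement per rule */
  data &outDSN;
    set &inDSN;
    %do _dr_i = 1 %to &_dr_n;
      if missing(&&_dr_v&_dr_i) then &&_dr_v&_dr_i = &&_dr_d&_dr_i;
      else if not (&&_dr_c&_dr_i) then put "ERROR: &&_dr_m&_dr_i " _n_=;
    %end;
  run;
%mend _applyDataRules;
```

Missing values are imputed before the constraint is tested. This follows the three-level order of validating and imputing from the earlier slide, but the exact sequence is a design choice of this sketch.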

Example 1: %_sum(x=,y=) Without validation and imputation (code)

Example 1: %_sum(x=,y=) (Cont’d) Without validation and imputation (SAS Log)

Example 1: %_sum(x=,y=) (Cont’d) With Validation and Imputation - Define Rules in ParamRules.xls

Example 1: %_sum(x=,y=) (Cont’d) With Validation and Imputation (code)

Example 1: %_sum(x=,y=) (Cont’d) With Validation and Imputation (SAS Log)

Example 2 - Reusability
A new SAS macro is defined as %Celsius_Fahrenheit_Conversion(inDegree=, inDegreeType=).
Its validation and imputation rules are defined in ParamRules.xls.
The generic macro is reused to validate the input parameters inDegree and inDegreeType.
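
The sketch below is a self-contained, hypothetical version of %Celsius_Fahrenheit_Conversion. The slides delegate validation to the generic macro and ParamRules.xls, but here the rules are hand-coded as read from the Example 2 log. inDegree must be numeric. inDegreeType defaults to C and must be C or F, case-insensitive. The conversion uses F = C × 9/5 + 32 and C = (F − 32) × 5/9. For checking, the result is also stored in a hypothetical global variable _cfResult.

```sas
/* Hypothetical, self-contained sketch. Rules are hand-coded here; the slides
   reuse the generic validation macro instead. _cfResult is illustrative. */
%macro Celsius_Fahrenheit_Conversion(inDegree=, inDegreeType=);
  %global _cfResult;
  %local nErr;
  %let nErr = 0;
  %let _cfResult = ;

  /* Imputation: inDegreeType defaults to C */
  %if %length(&inDegreeType) = 0 %then %do;
    %let inDegreeType = C;
    %put WARNING: inDegreeType is empty and set to default as <C>;
  %end;

  /* Validation: collect all errors before stopping */
  %if %datatyp(&inDegree) ne NUMERIC %then %do;
    %put ERROR: Parameter <INDEGREE> should be <NUMERIC> type;
    %let nErr = %eval(&nErr + 1);
  %end;
  %if %upcase(&inDegreeType) ne C and %upcase(&inDegreeType) ne F %then %do;
    %put ERROR: inDegreeType=<&inDegreeType> must be C or F;
    %let nErr = %eval(&nErr + 1);
  %end;
  %if &nErr > 0 %then %do;
    %put ERROR: -->Celsius_Fahrenheit_Conversion stops due to list of errors;
    %return;
  %end;

  /* Core processing: the conversion itself */
  %if %upcase(&inDegreeType) = C %then %do;
    %let _cfResult = %sysevalf(&inDegree * 9 / 5 + 32);
    %put &inDegree Celsius = &_cfResult Fahrenheit;
  %end;
  %else %do;
    %let _cfResult = %sysevalf((&inDegree - 32) * 5 / 9);
    %put &inDegree Fahrenheit = &_cfResult Celsius;
  %end;
%mend Celsius_Fahrenheit_Conversion;
```

Roughly two thirds of this macro is validation and imputation. Moving that part into rules and a reusable generic macro leaves only the conversion step to write, which is the reusability argument of this example.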

Example 2 – Reusability (Cont’d) Define Rules in paramRules.xlsx

Example 2 – Reusability (Cont’d)

Example 2 – Reusability (Cont’d)
%put ---case 1---;
%Celsius_Fahrenheit_Conversion(inDegree=100,inDegreeType=c);
%put ---case 2---;
%Celsius_Fahrenheit_Conversion(inDegree=100,inDegreeType=f);
%put ---case 3---;
%Celsius_Fahrenheit_Conversion(inDegree=324C,inDegreeType=C);
%put ---case 4---;
%Celsius_Fahrenheit_Conversion(inDegree=324C,inDegreeType=1234);
%put ---case 5---;
%Celsius_Fahrenheit_Conversion(inDegree=100,inDegreeType=);

---case 1---
100 Celsius = 212 Fahrenheit
---case 2---
100 Fahrenheit = 37.7777777777777 Celsius
---case 3---
ERROR: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, Parameter <INDEGREE> should be <NUMERIC> type
ERROR: -->Celsius_Fahrenheit_Conversion stops due to list of errors
---case 4---
ERROR: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, Parameter <INDEGREETYPE> should be <CHARACTER> type
ERROR: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, MACRO PARAMETER ERROR FOUND: INDEGREETYPE=<1234> MUST BE C OR F
---case 5---
WARNING: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, macro parameter of <CHARACTER> type <INDEGREETYPE> is empty and set to default as <C>

Benefits
- Efficiency in system development
- Reusability
- Easy maintainability
- Coding consistency
- Modularization: one group works on the rules and generic macros, another group works on the system's core processing

Challenges
- Defining rules properly (a global view of the validation and imputation processes; the right sequence of processing steps)
- Rules for validation and imputation have to be written in SAS® syntax (DATA step syntax or SAS macro syntax)
- Review and testing are done separately
- Exceptions: effort is needed in the analysis and design of both the rules-driven portion and the non-rules-driven portion
- Interleaving two datasets: rules datasets and input files

Questions
Xiyun Cheryl Wang
Systems Team Leader / Chef d’équipe de Systèmes
Statistics Canada / Statistique Canada
150 Tunney's Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6
Cherylxiyun.Wang@canada.ca
(613) 797-9853
SAS Paper 012-2013: A Metadata-Driven Programming Technique Using SAS