1
OASUS
Generic Programming Techniques for Input Validation and Imputation
Cheryl Xiyun Wang, Statistics Canada
Wednesday, June-20-18
May , Ottawa, Canada
2
Agenda
- Input Parameters for SAS Macro
- Common Types & Rules for Parameters
- Define Rules for Parameters, File Structure and Data
- Three Levels of Validation & Imputation
- Generic Process Flow
- Generic Programming on Numeric Type
- Create Generic SAS Macros
- Examples
3
Input Parameters for SAS Macro
Wrap your SAS code into a SAS macro.
Parameters: type and validation/imputation rules

Parameter | Type | Validation | Imputation
x | Numeric | Of Numeric type | If not provided, x=0
y | Numeric | Of Numeric type | If not provided, y=0
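The rules in this table can be hand-coded inside the macro itself. The following is only a minimal sketch: the slides do not show the body of the macro, so the addition performed here and the wording of the messages are assumptions.

```sas
%macro _sum(x=, y=);
  /* Imputation: a parameter that is not provided defaults to 0 */
  %if %length(&x) = 0 %then %let x = 0;
  %if %length(&y) = 0 %then %let y = 0;

  /* Validation: both parameters must be of numeric type */
  %if %datatyp(&x) ne NUMERIC or %datatyp(&y) ne NUMERIC %then %do;
    %put ERROR: x=<&x> and y=<&y> must both be numeric.;
    %return;
  %end;

  /* Assumed core processing: add the two values */
  %put NOTE: &x + &y = %sysevalf(&x + &y);
%mend _sum;

%_sum(x=1.5, y=2);   /* both values provided                    */
%_sum(x=3);          /* y is imputed to 0                       */
%_sum(x=abc, y=2);   /* fails the numeric check, macro returns  */
```

Writing these checks by hand in every macro is what the generic, rules-driven approach in the later slides is meant to avoid.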
4
Input Parameters for SAS Macro (Cont’d)
%macro _generateIDs(inLib=, inDSN=, inDSNKey=, outDSN=);
  /* generate internal sequential ID as 1, 2, ..., N */
%mend;

Parameters: type and validation/imputation rules

Parameter | Type | Validation | Imputation
inLib | Library | Valid library name & exists | Default to SAS WORK
inDSN | Dataset | Valid dataset name & exists |
inDSNKey | Character | Of Character type |
outDSN | | Valid dataset name |

&inDSN dataset: variable and data validation/imputation rules

inDSN variable | Type | Validation | Imputation
inDSNKey | Character | Unique; required |
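As an illustration, the parameter rules for inLib and inDSN could be coded as below. This is a sketch under assumptions: the output variable name `_internalID` is hypothetical, the uniqueness check on the key variable is left out, and `inDSNKey` is therefore not used.

```sas
%macro _generateIDs(inLib=, inDSN=, inDSNKey=, outDSN=);
  /* Imputation: the library defaults to WORK */
  %if %length(&inLib) = 0 %then %let inLib = WORK;

  /* Validation: dataset names must be provided */
  %if %length(&inDSN) = 0 or %length(&outDSN) = 0 %then %do;
    %put ERROR: inDSN and outDSN are required.;
    %return;
  %end;

  /* Validation: LIBREF returns 0 when the libref is assigned */
  %if %sysfunc(libref(&inLib)) ne 0 %then %do;
    %put ERROR: library <&inLib> is not assigned.;
    %return;
  %end;

  /* Validation: EXIST returns 1 when the dataset exists */
  %if not %sysfunc(exist(&inLib..&inDSN)) %then %do;
    %put ERROR: dataset <&inLib..&inDSN> does not exist.;
    %return;
  %end;

  /* generate internal sequential ID as 1, 2, ..., N */
  data &outDSN;
    set &inLib..&inDSN;
    _internalID = _n_;   /* hypothetical variable name */
  run;
%mend _generateIDs;

%_generateIDs(inLib=sashelp, inDSN=class, inDSNKey=Name, outDSN=work.class_ids);
```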
5
Common Types & Rules for Parameters
Parameter Type | Validation | Imputation
SAS Library | Valid name & exists | Default to WORK
SAS Dataset | Valid name &/or exists |
Character / Numeric / Integer | Customized constraints |
Boolean_YES_NO | Must be YES or NO | Default to YES
Boolean_1_0 | Must be 1 or 0 | Default to 1
Date/Time | Of Date/Time type | Default to TODAY() or TIME(); or &sysdate;
External File Name | |
Folder Path | | Default: path of WORK
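One row of this table, Boolean_YES_NO, could be checked as follows. The macro name `checkYesNo` and its messages are hypothetical; this is a sketch of the rule, not the presentation's generic macro.

```sas
%macro checkYesNo(value=);
  /* Imputation: an empty value defaults to YES */
  %if %length(&value) = 0 %then %let value = YES;
  %let value = %upcase(&value);

  /* Validation: INDEXW returns 0 when the value is not one of the listed words */
  %if %sysfunc(indexw(YES NO, &value)) = 0 %then
    %put ERROR: value=<&value> must be YES or NO.;
  %else
    %put NOTE: value=<&value> accepted.;
%mend checkYesNo;

%checkYesNo(value=no);      /* accepted after upcasing */
%checkYesNo(value=);        /* imputed to YES          */
%checkYesNo(value=maybe);   /* rejected                */
```

Looking the value up in a word list avoids comparing it directly in the `%if` expression, where a value such as OR or NE would be read as an operator.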
6
Common Rules for Input Dataset
1) Record Layout
   - Variable names
   - Variable types: character or numeric
   - Required? If optional, add the variable automatically
   - Length
2) Data Inside Dataset
   - Variable name
   - Unique values?
   - Constraints on the variable (range, positive, list, etc.)
   - Default value if value is missing
7
Define Rules for Parameters
Rules for %_sum(x=,y=):

Key | Info (x) | Info (y)
member_name | x | y
object_type | MACRO | MACRO
object_name | _sum | _sum
member_IOtype | INPUT | INPUT
member_type | NUMERIC | NUMERIC
member_default | |
required | NO | NO
constraints | %sysevalf(&x>=0.0) and %sysevalf(&x<=100.0) | %sysevalf(&y>=0.0) and %sysevalf(&y<2000.0)
errMsg_en | Parameter error: x=<&x> is not in range of [0.0,100.0] | Parameter error: y=<&y> is not in range of [0.0,2000.0)
errMsg_fr | Erreur de paramètre: x=<&x> n’est pas dans la gamme valide [0.0,100.0] | Erreur de paramètre: y=<&y> n’est pas dans la gamme valide [0.0,2000.0)
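To show how the two constraints evaluate, the sketch below hard-codes them in a hypothetical macro `checkSumRanges`. In the approach described here the constraints would be read from the rules file; the messages are simplified stand-ins for errMsg_en, and both parameters are assumed to be non-empty numeric values.

```sas
%macro checkSumRanges(x=, y=);
  %local nErrors;
  %let nErrors = 0;

  /* constraint for x, copied from the rules: 0.0 <= x <= 100.0 */
  %if not (%sysevalf(&x>=0.0) and %sysevalf(&x<=100.0)) %then %do;
    %put ERROR: x=<&x> violates 0.0 <= x <= 100.0;
    %let nErrors = %eval(&nErrors + 1);
  %end;

  /* constraint for y, copied from the rules: 0.0 <= y < 2000.0 */
  %if not (%sysevalf(&y>=0.0) and %sysevalf(&y<2000.0)) %then %do;
    %put ERROR: y=<&y> violates 0.0 <= y < 2000.0;
    %let nErrors = %eval(&nErrors + 1);
  %end;

  %put NOTE: &nErrors rule(s) violated.;
%mend checkSumRanges;

%checkSumRanges(x=50, y=10);     /* both rules satisfied                     */
%checkSumRanges(x=50, y=2000);   /* y fails: the upper bound is exclusive    */
```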
8
Define Rules for File Structure and Data
Rules for %_generateIDs(inLib=, inDSN=, inDSNKey=, outDSN=);

Key | Info
member_name | inDSNKey
object_type | DATASET
object_name | &inDSN
member_IOtype | INPUT
member_type | Character
member_default |
required | YES
constraints | (substr(upcase("&inDSNKey"),1,5)="TEST_")
errMsg_en | The data for key variable <&inDSNKey> in dataset <&inDSN> must have a prefix "TEST_"
errMsg_fr | Les données de la variable clé <&inDSNKey> dans <&inDSN> doivent avoir un préfixe "TEST_"
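A data-level rule like this one is evaluated row by row in a DATA step. The sketch below assumes the rule is meant to test the values of the key variable, as the error message suggests; the dataset `work.demo`, the macro `checkKeyPrefix` and its `prefix=` parameter are hypothetical.

```sas
/* Hypothetical test data: the third key breaks the rule */
data work.demo;
  length id $12;
  input id $;
  datalines;
TEST_001
TEST_002
PROD_003
;
run;

%macro checkKeyPrefix(inDSN=, inDSNKey=, prefix=TEST_);
  %local nBad;
  %let nBad = 0;

  data _null_;
    set &inDSN end=last;
    /* rule: each key value must start with the prefix */
    if upcase(substr(&inDSNKey, 1, %length(&prefix))) ne "%upcase(&prefix)"
      then nBad + 1;
    /* pass the number of violations back to the macro */
    if last then call symputx('nBad', nBad);
  run;

  %if &nBad > 0 %then
    %put ERROR: &nBad key value(s) in <&inDSN> lack the prefix <&prefix>.;
  %else
    %put NOTE: all key values in <&inDSN> start with <&prefix>.;
%mend checkKeyPrefix;

%checkKeyPrefix(inDSN=work.demo, inDSNKey=id);
```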
9
Three Levels of Validation & Imputation on Input for SAS Macro
1) Macro parameters level
   - Validations: all parameters satisfy defined rules
   - Imputations: set to default if possible
2) Input dataset file structure level, if any
   - Validations: variables exist, type, length
   - Imputations: add variables with the right types
3) Data inside input dataset level, if any
   - Validations: data satisfy defined rules
   - Imputations: set default values
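The second level, file structure, can be checked without reading any data by using the dataset functions OPEN, VARNUM and VARTYPE. The macro `checkVar` below is a hypothetical sketch covering existence and type only; length checks and the imputation step (adding a missing variable) are not shown.

```sas
%macro checkVar(inDSN=, varName=, varType=);
  %local dsid varnum actual rc;

  /* OPEN returns 0 when the dataset cannot be opened */
  %let dsid = %sysfunc(open(&inDSN));
  %if &dsid = 0 %then %do;
    %put ERROR: cannot open dataset <&inDSN>.;
    %return;
  %end;

  /* VARNUM returns 0 when the variable is not in the dataset */
  %let varnum = %sysfunc(varnum(&dsid, &varName));
  %if &varnum = 0 %then
    %put ERROR: variable <&varName> is missing from <&inDSN>.;
  %else %do;
    /* VARTYPE returns C for character, N for numeric */
    %let actual = %sysfunc(vartype(&dsid, &varnum));
    %if &actual ne %upcase(&varType) %then
      %put ERROR: variable <&varName> has type <&actual>, expected <&varType>.;
    %else
      %put NOTE: variable <&varName> passes the structure check.;
  %end;

  %let rc = %sysfunc(close(&dsid));
%mend checkVar;

%checkVar(inDSN=sashelp.class, varName=Name, varType=C);   /* passes               */
%checkVar(inDSN=sashelp.class, varName=Age,  varType=C);   /* Age is numeric       */
%checkVar(inDSN=sashelp.class, varName=Zip,  varType=C);   /* variable is missing  */
```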
10
Generic Processing Flow
11
Generic Programming on Numeric Type
12
Create Generic SAS Macros for Three Levels of Validation and Imputation
1) Macro parameters level
%genericParamsValidation(inMacroName=, rulesFile=);
%genericMacroParamValidation(inMacroName=, paramName=, paramType=, paramValue=, paramReqFlag=, paramDefault=, paramInvalid_condition=, paramErrMsg=, paramIOType=);
13
Create Generic SAS Macros for Three Levels of Validation and Imputation (Cont’d)
2) Input file structure level (if any)
%genericFileStructValidation(inMacroName=, inDSN=, RulesFile=);
3) Data level (if any)
%genericDataValidation( keyID=,
14
Example 1: %_sum(x=,y=) Without validation and imputation (code)
15
Example 1: %_sum(x=,y=) (Cont’d)
Without validation and imputation (SAS Log)
16
Example 1: %_sum(x=,y=) (Cont’d)
With Validation and Imputation - Define Rules into ParamRules.xls
17
Example 1: %_sum(x=,y=) (Cont’d)
With Validation and Imputation (code)
18
Example 1: %_sum(x=,y=) (Cont’d)
With Validation and Imputation (SAS Log)
19
Example 2 - Reusability
A new SAS macro to be defined as
%Celsius_Fahrenheit_Conversion(inDegree=, inDegreeType=);
- Validation and imputation rules are defined in ParamRules.xls
- Re-use the generic macro to validate the input parameters inDegree and inDegreeType
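For comparison, here is a sketch of the same macro with its checks written inline rather than driven by the rules file. The rules themselves (numeric inDegree; inDegreeType must be C or F, defaulting to C) are taken from the SAS log shown for this example; the message wording and the conversion code are assumptions.

```sas
%macro Celsius_Fahrenheit_Conversion(inDegree=, inDegreeType=);
  %local result;

  /* Imputation: an empty degree type defaults to C */
  %if %length(&inDegreeType) = 0 %then %let inDegreeType = C;
  %let inDegreeType = %upcase(&inDegreeType);

  /* Validation: inDegree must be of numeric type */
  %if %datatyp(&inDegree) ne NUMERIC %then %do;
    %put ERROR: inDegree=<&inDegree> should be numeric.;
    %return;
  %end;

  /* Validation: inDegreeType must be C or F */
  %if %sysfunc(indexw(C F, &inDegreeType)) = 0 %then %do;
    %put ERROR: inDegreeType=<&inDegreeType> must be C or F.;
    %return;
  %end;

  /* Core processing: convert in the requested direction */
  %if &inDegreeType = C %then %do;
    %let result = %sysevalf(&inDegree * 9 / 5 + 32);
    %put &inDegree Celsius = &result Fahrenheit;
  %end;
  %else %do;
    %let result = %sysevalf((&inDegree - 32) * 5 / 9);
    %put &inDegree Fahrenheit = &result Celsius;
  %end;
%mend Celsius_Fahrenheit_Conversion;

%Celsius_Fahrenheit_Conversion(inDegree=100, inDegreeType=c);
```

With the generic macros, the two validation blocks and the imputation line would be replaced by a single call that reads the rules for this macro from the rules file.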
20
Example 2 – Reusability (Cont’d)
Define Rules in paramRules.xlsx
21
Example 2 – Reusability (Cont’d)
22
Example 2 – Reusability (Cont’d)
%put ---case 1---;
%Celsius_Fahrenheit_Conversion(inDegree=100,inDegreeType=c);
%put ---case 2---;
%Celsius_Fahrenheit_Conversion(inDegree=100,inDegreeType=f);
%put ---case 3---;
%Celsius_Fahrenheit_Conversion(inDegree=324C,inDegreeType=C);
%put ---case 4---;
%Celsius_Fahrenheit_Conversion(inDegree=324C,inDegreeType=1234);
%put ---case 5---;
%Celsius_Fahrenheit_Conversion(inDegree=100,inDegreeType=);
23
---case 1---
100 Celsius = 212 Fahrenheit
---case 2---
100 Fahrenheit = Celsius
---case 3---
ERROR: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, Parameter <INDEGREE> should be <NUMERIC> type
ERROR: -->Celsius_Fahrenheit_Conversion stops due to list of errors
---case 4---
ERROR: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, Parameter <INDEGREETYPE> should be <CHARACTER> type
ERROR: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, MACRO PARAMETER ERROR FOUND: INDEGREETYPE=<1234> MUST BE C OR F
---case 5---
WARNING: For macro <CELSIUS_FAHRENHEIT_CONVERSION>, macro parameter of <CHARACTER> type <INDEGREETYPE> is empty and set to default as <C>
24
Benefits
- Efficiency in system development
- Reusability
- Easy maintainability
- Coding consistency
- Modularization
  - Work on Rules and generic macros by one group
  - Work on system core processing by another group
25
Challenges
- Define Rules properly (global view of the validation and imputation processes; right sequence of processing steps)
- Rules for validation and imputation have to be written in SAS® syntax (DATA step syntax or SAS macro syntax)
- Review and testing are done separately
- Exceptions; effort on analysis and design of the Rules-driven portion and the non-Rules-driven portion
- Interleaving two datasets: Rules datasets and input files
26
Questions
Xiyun Cheryl Wang
Systems Team Leader / Chef d’équipe de Systèmes
Statistics Canada / Statistique Canada
150 Tunney's Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6
(613)
SAS Paper: : A Metadata-Driven Programming Technique Using SAS