Cleaning Invalid Data

1 Cleaning Invalid Data

2
Clean data by using assignment statements in the DATA step.
Clean data by using IF-THEN / ELSE statements in the DATA step.

3 Invalid Data to Clean
The orion.nonsales data set contains invalid data that needs to be cleaned. After the data is checked and the invalid values are found, the correct values must be supplied.

4 Invalid Data to Clean

    Variable     Obs                           Invalid Value  Correct Value
    Employee_ID  7                             120108         120109
                 14                            .              120116
    Gender       12                            G              F
                 101                           (blank)        F
    Job_Title    10                            (blank)        Security Guard I
    Country      2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary       4                             .              26960
                 13                            2650           26500
                 20                            2401           24015
    Hire_Date    5                             21/01/1953     21/01/1995
                 9                             .              01/11/1978
                 214                           01/01/1968     01/01/1998

5 Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data. Use the DATA step to clean the following observations:

    Variable   Obs                           Invalid Value  Correct Value
    Country    2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary     4                             .              26960
               13                            2650           26500
               20                            2401           24015
    Hire_Date  5                             21/01/1953     21/01/1995
               9                             .              01/11/1978
               214                           01/01/1968     01/01/1998

6 The Assignment Statement
The assignment statement evaluates an expression and assigns the resulting value to a variable.
General form of the assignment statement:

    variable = expression ;

variable names an existing or new variable.
expression is a sequence of operands and operators that form a set of instructions that produce a value.

7 The Assignment Statement Expression
Operands are
    character constants
    numeric constants
    date constants
    character variables
    numeric variables.
Operators are
    symbols that represent an arithmetic calculation
    SAS functions.
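
The operand and operator kinds listed above can be combined in a single DATA step. The following is a minimal sketch rather than course material: the data set name work.expr_demo and the variables Raise, Total, and Start are hypothetical names chosen for illustration.

```sas
/* Hypothetical example: every name below is illustrative only. */
data work.expr_demo;
   Salary = 26500;               /* numeric constant                      */
   Raise  = Salary * 0.05;       /* numeric variable, arithmetic operator */
   Total  = Salary + Raise;      /* two numeric variables                 */
   Gender = 'F';                 /* character constant                    */
   Start  = '21JAN1995'd;        /* date constant                         */
   format Start date9.;          /* display the date value readably       */
run;

proc print data=work.expr_demo;
run;
```

Because the step has no SET or INPUT statement, it should write a single observation, which makes it easy to inspect the result of each expression.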

8 The Assignment Statement Expression

    Country = upcase(Country);     /* function, variable */
    Salary = 26960;                /* numeric constant   */
    Gender = 'F';                  /* character constant */
    Hire_Date = '21JAN1995'd;      /* date constant      */

9 SAS Functions
A SAS function is a routine that returns a value that is determined from specified arguments.
The UPCASE function converts all letters in an argument to uppercase.
General form of the UPCASE function:

    UPCASE(argument)

The argument specifies any SAS character expression.
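
To try UPCASE without access to the orion library, a small self-contained step can be used. This is a sketch under assumptions: the data set work.upcase_demo and its three sample rows are made up for illustration and are not taken from orion.nonsales.

```sas
/* Hypothetical sample rows; not the course data set. */
data work.upcase_demo;
   input Employee_ID Country $;
   Country = upcase(Country);    /* lowercase letters become uppercase;   */
                                 /* values already uppercase are unchanged */
   datalines;
120101 AU
120104 au
120105 us
;
run;

proc print data=work.upcase_demo noobs;
run;
```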

10

    %clearall
    libname orion "&path/prg1";

    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;

    proc freq data=clean;
       tables country;
    run;

11 The Assignment Statement
All the values of Country in the data set orion.nonsales need to be uppercase.

    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;

PDV
    Employee_ID  ...  Job_Title  Country  ...
    120101       ...  Director   AU       ...

12 The Assignment Statement

    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;

PDV
    Employee_ID  ...  Job_Title  Country  ...
    120101       ...  Director   AU       ...

Expression evaluated: upcase(AU)

13 The Assignment Statement

    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;

PDV
    Employee_ID  ...  Job_Title               Country  ...
    120104       ...  Administration Manager  au       ...

14 The Assignment Statement

    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;

PDV
    Employee_ID  ...  Job_Title               Country  ...
    120104       ...  Administration Manager  AU       ...

Expression evaluated: upcase(au)

15 The Assignment Statement
The assignment statement executes for every observation, whether or not the value needed to be uppercased.

    %clearall
    proc print data=clean;
       var Employee_ID Job_Title Country;
    run;

16 Programmatically Cleaning Data

    Variable   Obs                           Invalid Value  Correct Value
    Country    2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary     4                             .              26960
               13                            2650           26500
               20                            2401           24015
    Hire_Date  5                             21/01/1953     21/01/1995
               9                             .              01/11/1978
               214                           01/01/1968     01/01/1998

Country: The assignment statement was applied to all observations.
Salary and Hire_Date: The assignment statement needs to be applied to specific observations.

17 Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data. Use the DATA step to clean the following observations:

    Variable   Obs                           Invalid Value  Correct Value
    Country    2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary     4                             .              26960
               13                            2650           26500
               20                            2401           24015
    Hire_Date  5                             21/01/1953     21/01/1995
               9                             .              01/11/1978
               214                           01/01/1968     01/01/1998

18 IF-THEN Statements
The IF-THEN statement executes a SAS statement for observations that meet specific conditions.
General form of the IF-THEN statement:

    IF expression THEN statement ;

expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations.
statement is any executable statement such as the assignment statement.
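
An IF expression can combine comparison and logical operators, not only test a single ID. The sketch below flags salaries outside the range 24000 to 500000 that the later slides state for Salary. The data set work.flag_demo, the Flag variable, and the sample rows are hypothetical, loosely modeled on the invalid values in the table.

```sas
/* Hypothetical sample rows and variable names, for illustration only. */
data work.flag_demo;
   input Employee_ID Salary;
   length Flag $ 7;              /* reserve room for the text 'Invalid'  */
   /* A missing numeric value sorts lower than any number in SAS,        */
   /* so a missing Salary also satisfies Salary < 24000.                 */
   if Salary < 24000 or Salary > 500000 then Flag = 'Invalid';
   datalines;
120105 27110
120106 .
120115 2650
;
run;

proc print data=work.flag_demo noobs;
run;
```

Flagging first and correcting afterward keeps the check separate from the fix, which can help when the list of invalid observations is not yet known.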

19 IF-THEN Statements

    %clearall
    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title    ...
    120105       ...  27110   Secretary I  ...

20 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title    ...
    120105       ...  27110   Secretary I  ...

Condition: FALSE

21 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title    ...
    120105       ...  27110   Secretary I  ...

Condition: FALSE

22 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title    ...
    120105       ...  27110   Secretary I  ...

Condition: FALSE

23 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title            ...
    120106       ...  .       Office Assistant II  ...

24 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title            ...
    120106       ...  26960   Office Assistant II  ...

Condition: TRUE

25 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title            ...
    120106       ...  26960   Office Assistant II  ...

Condition: FALSE

26 IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title            ...
    120106       ...  26960   Office Assistant II  ...

Condition: FALSE

27 IF-THEN Statements
When an IF expression is TRUE in this series of IF-THEN statements, there is no reason to evaluate the remaining IF-THEN statements for that Employee_ID.
Placing the keyword ELSE before IF causes SAS to evaluate the conditional statements only until it encounters the first true condition.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       if Employee_ID=120115 then Salary=26500;
       if Employee_ID=120191 then Salary=24015;
    run;

Condition: TRUE
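
ELSE matters most when conditions overlap, because a later plain IF could overwrite an earlier result. The following sketch assigns a salary band. The Band variable, the cutoffs 30000 and 100000, the data set work.else_demo, and the sample rows are all assumptions made for illustration and do not come from the course data.

```sas
/* Hypothetical bands and sample rows, for illustration only. */
data work.else_demo;
   length Band $ 4;              /* long enough for 'High'; otherwise the */
                                 /* first assignment would fix length 3   */
   input Employee_ID Salary;
   if Salary < 30000 then Band = 'Low';
   else if Salary < 100000 then Band = 'Mid';   /* checked only if the   */
                                                /* first test was false  */
   else Band = 'High';                          /* no earlier test true  */
   datalines;
120101 163040
120104 46230
120105 27110
;
run;

proc print data=work.else_demo noobs;
run;
```

Without the ELSE keywords, a salary below 30000 would first be set to Low and then overwritten with Mid, since it is also below 100000.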

28 IF-THEN / ELSE Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       else if Employee_ID=120115 then Salary=26500;
       else if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title            ...
    120106       ...  .       Office Assistant II  ...

29 IF-THEN / ELSE Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
       set orion.nonsales;
       if Employee_ID=120106 then Salary=26960;
       else if Employee_ID=120115 then Salary=26500;
       else if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID  ...  Salary  Job_Title            ...
    120106       ...  26960   Office Assistant II  ...

First condition: TRUE
Remaining ELSE IF statements: SKIP

30 Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data.

    Variable   Obs                           Invalid Value  Correct Value
    Country    2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary     4                             .              26960
               13                            2650           26500
               20                            2401           24015
    Hire_Date  5                             21/01/1953     21/01/1995
               9                             .              01/11/1978
               214                           01/01/1968     01/01/1998

31 IF-THEN / ELSE Statements
All the values of Hire_Date must have a value of 01/01/1974 or later.

    %clearall
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
       if Employee_ID=120106 then Salary=26960;
       else if Employee_ID=120115 then Salary=26500;
       else if Employee_ID=120191 then Salary=24015;
       else if Employee_ID=120107 then Hire_Date='21JAN1995'd;
       else if Employee_ID=120111 then Hire_Date='01NOV1978'd;
       else if Employee_ID=121011 then Hire_Date='01JAN1998'd;
    run;
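
One way to confirm the cleanup is to list any observations that still break the stated rules. This is a sketch, assuming work.clean was created by the step above and that the Salary range and Hire_Date cutoff from the slides are the only rules being checked. The choice of PROC PRINT with a WHERE statement and the DDMMYY10. display format are illustrative, not part of the original deck.

```sas
/* Assumes work.clean already exists from the preceding DATA step. */
proc print data=work.clean;
   /* Missing values also match: a missing numeric is lower than any number. */
   where Salary not between 24000 and 500000
      or Hire_Date < '01JAN1974'd;
   var Employee_ID Salary Hire_Date;
   format Hire_Date ddmmyy10.;   /* show dates as dd/mm/yyyy, as in the table */
run;

/* Country values can be rechecked with a frequency table. */
proc freq data=work.clean;
   tables Country;
run;
```

If the observations in the table are the only invalid ones, the PROC PRINT step would be expected to select no rows.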

