Published by Melinda Francis. Modified over 8 years ago.
1
Cleaning Invalid Data
2
Clean data by using assignment statements in the DATA step.
Clean data by using IF-THEN / ELSE statements in the DATA step.
3
Invalid Data to Clean
The orion.nonsales data set contains invalid data that needs to be cleaned. Once the data has been checked and the invalid values located, the correct values must be supplied.
4
Invalid Data to Clean

    Variable      Obs                            Invalid Value   Correct Value
    Employee_ID   7                              120108          120109
                  14                             .               120116
    Gender        12                             G               F
                  101                            (missing)       F
    Job_Title     10                             (missing)       Security Guard I
    Country       2, 84, 87, 125, 197, and 200   au or us        AU or US
    Salary        4                              .               26960
                  13                             2650            26500
                  20                             2401            24015
    Hire_Date     5                              21/01/1953      21/01/1995
                  9                              .               01/11/1978
                  214                            01/01/1968      01/01/1998
5
Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data. Use the DATA step to clean the following observations:

    Variable      Obs                            Invalid Value   Correct Value
    Country       2, 84, 87, 125, 197, and 200   au or us        AU or US
    Salary        4                              .               26960
                  13                             2650            26500
                  20                             2401            24015
    Hire_Date     5                              21/01/1953      21/01/1995
                  9                              .               01/11/1978
                  214                            01/01/1968      01/01/1998
6
The Assignment Statement
The assignment statement evaluates an expression and assigns the resulting value to a variable.
General form of the assignment statement:

    variable = expression;

variable names an existing or new variable.
expression is a sequence of operands and operators that form a set of instructions that produce a value.
7
The Assignment Statement Expression
Operands are
    character constants
    numeric constants
    date constants
    character variables
    numeric variables.
Operators are
    symbols that represent an arithmetic calculation
    SAS functions.
8
The Assignment Statement Expression

    Country = upcase(Country);     /* function applied to a variable */
    Salary = 26960;                /* numeric constant */
    Gender = 'F';                  /* character constant */
    Hire_Date = '21JAN1995'd;      /* date constant */
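The operand and operator types listed above can be combined in one short DATA step. The following is a minimal sketch rather than part of the course material: the data set name work.expr_demo and the variable Bonus are hypothetical, chosen only to show an arithmetic operator alongside the constants and the function from the slide.

```sas
/* Sketch only: work.expr_demo and Bonus are hypothetical names. */
data work.expr_demo;
    Salary = 26500;              /* numeric constant */
    Bonus = Salary * 0.10;       /* arithmetic operator applied to a numeric variable */
    Country = upcase('au');      /* SAS function applied to a character constant */
    Gender = 'F';                /* character constant */
    Hire_Date = '21JAN1995'd;    /* date constant, stored as a SAS date value */
    format Hire_Date date9.;     /* display the stored number as a date */
run;

proc print data=work.expr_demo noobs;
run;
```

Because the step has no SET statement, it executes once and writes a single observation, which makes it easy to inspect what each assignment produced.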
9
SAS Functions
A SAS function is a routine that returns a value that is determined from specified arguments.
The UPCASE function converts all letters in an argument to uppercase.
General form of the UPCASE function:

    UPCASE(argument)

The argument specifies any SAS character expression.
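To see UPCASE in isolation, the sketch below builds a small inline data set instead of reading orion.nonsales, so it can be run without the course library. The data set names work.demo and work.demo_clean are assumptions made for this example; the Employee_ID and Country values mirror the kind of mixed-case data described in the slides.

```sas
/* Hypothetical inline data standing in for orion.nonsales. */
data work.demo;
    input Employee_ID Country $;
    datalines;
120101 AU
120104 au
120105 us
;
run;

data work.demo_clean;
    set work.demo;
    Country = upcase(Country);   /* lowercase letters become uppercase; uppercase letters are unchanged */
run;

proc print data=work.demo_clean noobs;
run;
```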
10
    %clearall
    libname orion "&path/prg1";

    data work.clean;
        set orion.nonsales;
        Country=upcase(Country);
    run;

    proc freq data=clean;
        tables country;
    run;
11
The Assignment Statement
All the values of Country in the data set orion.nonsales need to be uppercase.

    data work.clean;
        set orion.nonsales;
        Country=upcase(Country);
    run;

PDV
    Employee_ID   ...   Job_Title   Country   ...
    120101        ...   Director    AU        ...
12
The Assignment Statement

    data work.clean;
        set orion.nonsales;
        Country=upcase(Country);
    run;

PDV
    Employee_ID   ...   Job_Title   Country   ...
    120101        ...   Director    AU        ...

The expression upcase(AU) is evaluated; Country remains AU.
13
The Assignment Statement

    data work.clean;
        set orion.nonsales;
        Country=upcase(Country);
    run;

PDV
    Employee_ID   ...   Job_Title                Country   ...
    120104        ...   Administration Manager   au        ...
14
The Assignment Statement

    data work.clean;
        set orion.nonsales;
        Country=upcase(Country);
    run;

PDV
    Employee_ID   ...   Job_Title                Country   ...
    120104        ...   Administration Manager   AU        ...

The expression upcase(au) is evaluated; Country changes from au to AU.
15
The Assignment Statement
The assignment statement was executed for every observation, whether or not the value needed to be uppercased.

    %clearall
    proc print data=clean;
        var Employee_ID Job_Title Country;
    run;
16
Programmatically Cleaning Data

    Variable      Obs                            Invalid Value   Correct Value
    Country       2, 84, 87, 125, 197, and 200   au or us        AU or US
    Salary        4                              .               26960
                  13                             2650            26500
                  20                             2401            24015
    Hire_Date     5                              21/01/1953      21/01/1995
                  9                              .               01/11/1978
                  214                            01/01/1968      01/01/1998

For Country, the assignment statement was applied to all observations.
For Salary and Hire_Date, the assignment statement needs to be applied to specific observations.
18
IF-THEN Statements
The IF-THEN statement executes a SAS statement for observations that meet specific conditions.
General form of the IF-THEN statement:

    IF expression THEN statement;

expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations.
statement is any executable statement, such as the assignment statement.
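A single IF-THEN statement can be tried on a few rows before applying it to the full data set. This is a minimal sketch under stated assumptions: work.salary_demo and work.salary_clean are hypothetical data set names, and the three rows are inline stand-ins built from Employee_ID and Salary values that appear in the slides.

```sas
/* Hypothetical inline data; a period represents a missing numeric value. */
data work.salary_demo;
    input Employee_ID Salary;
    datalines;
120105 27110
120106 .
120115 2650
;
run;

data work.salary_clean;
    set work.salary_demo;
    /* The assignment runs only when the condition is true for the current observation. */
    if Employee_ID = 120106 then Salary = 26960;
run;

proc print data=work.salary_clean noobs;
run;
```

Only the row for Employee_ID 120106 is changed; the other rows pass through untouched, including the invalid Salary of 2650, which would need its own IF-THEN statement.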
19
IF-THEN Statements

    %clearall
    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        if Employee_ID=120115 then Salary=26500;
        if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title     ...
    120105        ...   27110    Secretary I   ...
20
IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        if Employee_ID=120115 then Salary=26500;
        if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title     ...
    120105        ...   27110    Secretary I   ...

For Employee_ID 120105, each of the three conditions is evaluated in turn and each is FALSE, so Salary remains 27110.
23
IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        if Employee_ID=120115 then Salary=26500;
        if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title             ...
    120106        ...   .        Office Assistant II   ...
24
IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        if Employee_ID=120115 then Salary=26500;
        if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title             ...
    120106        ...   26960    Office Assistant II   ...

The first condition is TRUE, so Salary is set to 26960.
25
IF-THEN Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        if Employee_ID=120115 then Salary=26500;
        if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title             ...
    120106        ...   26960    Office Assistant II   ...

The second and third conditions are still evaluated for this observation, and both are FALSE.
27
IF-THEN Statements
When one IF expression in this series is TRUE, there is no reason to check the remaining IF-THEN statements, because each one tests Employee_ID for a different value.
The word ELSE can be placed before the word IF, so that SAS evaluates the conditions in order only until it encounters the first true one and skips the rest.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;    /* TRUE */
        if Employee_ID=120115 then Salary=26500;
        if Employee_ID=120191 then Salary=24015;
    run;
28
IF-THEN / ELSE Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        else if Employee_ID=120115 then Salary=26500;
        else if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title             ...
    120106        ...   .        Office Assistant II   ...
29
IF-THEN / ELSE Statements
All the values of Salary must be in the range of 24000 – 500000.

    data work.clean;
        set orion.nonsales;
        if Employee_ID=120106 then Salary=26960;
        else if Employee_ID=120115 then Salary=26500;
        else if Employee_ID=120191 then Salary=24015;
    run;

PDV
    Employee_ID   ...   Salary   Job_Title             ...
    120106        ...   26960    Office Assistant II   ...

The first condition is TRUE, so Salary is set to 26960 and both ELSE IF statements are skipped.
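One way to observe which branch of an IF-THEN / ELSE chain runs is to write a message to the log with the PUT statement. The sketch below is illustrative only: it reads three inline Employee_ID values rather than orion.nonsales, uses DATA _NULL_ so that no data set is created, and adds a final ELSE branch that is not in the slides.

```sas
/* Sketch: log which branch executes for each observation. */
data _null_;
    input Employee_ID;
    if Employee_ID = 120106 then put Employee_ID= 'matched the first condition';
    else if Employee_ID = 120115 then put Employee_ID= 'matched the second condition';
    else if Employee_ID = 120191 then put Employee_ID= 'matched the third condition';
    else put Employee_ID= 'matched no condition';   /* added for illustration */
    datalines;
120105
120106
120191
;
run;
```

Exactly one message is written per observation, because once a condition is true the remaining ELSE branches are not evaluated for that observation.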
30
Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data.

    Variable      Obs                            Invalid Value   Correct Value
    Country       2, 84, 87, 125, 197, and 200   au or us        AU or US
    Salary        4                              .               26960
                  13                             2650            26500
                  20                             2401            24015
    Hire_Date     5                              21/01/1953      21/01/1995
                  9                              .               01/11/1978
                  214                            01/01/1968      01/01/1998
31
IF-THEN / ELSE Statements
All values of Hire_Date must be 01/01/1974 or later.

    %clearall
    data work.clean;
        set orion.nonsales;
        Country=upcase(Country);
        if Employee_ID=120106 then Salary=26960;
        else if Employee_ID=120115 then Salary=26500;
        else if Employee_ID=120191 then Salary=24015;
        else if Employee_ID=120107 then Hire_Date='21JAN1995'd;
        else if Employee_ID=120111 then Hire_Date='01NOV1978'd;
        else if Employee_ID=121011 then Hire_Date='01JAN1998'd;
    run;

A single ELSE chain works here because each condition tests a different Employee_ID, so at most one correction applies to any observation.
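After cleaning, the stated rules (Salary between 24000 and 500000, Hire_Date on or after 01/01/1974, Country of AU or US) can be rechecked with a WHERE statement. The following is a hedged sketch, not the course's own validation step: work.clean_demo is a hypothetical inline data set, its rows are invented for illustration, and one Salary is deliberately left invalid so that the check has something to report.

```sas
/* Hypothetical rows; the third Salary is intentionally out of range. */
data work.clean_demo;
    input Employee_ID Country $ Salary Hire_Date :date9.;
    format Hire_Date date9.;
    datalines;
120106 AU 26960 01JAN1978
120107 AU 30475 21JAN1995
120115 AU 2650 01AUG2005
;
run;

/* List any observation that still breaks one of the rules. */
proc print data=work.clean_demo;
    where Salary not between 24000 and 500000
       or Hire_Date < '01JAN1974'd
       or Country not in ('AU', 'US');
run;
```

Missing numeric values sort below any number in SAS, so a missing Salary or Hire_Date would also be listed by this check.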