1 Cleaning Invalid Data

2 Clean data by using assignment statements in the DATA step.
Clean data by using IF-THEN/ELSE statements in the DATA step.

3 Invalid Data to Clean
The orion.nonsales data set contains invalid data that needs to be cleaned. After checking the data and finding the invalid data, the correct data values are needed.

4 Invalid Data to Clean
    Variable     Obs                           Invalid Value  Correct Value
    Employee_ID
    Gender       12                            G              F
                 101                                          F
    Job_Title    10                                           Security Guard I
    Country      2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary
    Hire_Date    521/01/195321/01/ /11/ /01/196801/01/1998

5 Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data. Use the DATA step to clean the following observations:
    Variable     Obs                           Invalid Value  Correct Value
    Country      2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary
    Hire_Date    521/01/195321/01/ /11/ /01/196801/01/1998

6 The Assignment Statement
The assignment statement evaluates an expression and assigns the resulting value to a variable.
General form of the assignment statement:
    variable = expression;
variable names an existing or new variable.
expression is a sequence of operands and operators that form a set of instructions that produce a value.

7 The Assignment Statement Expression
Operands are
    character constants
    numeric constants
    date constants
    character variables
    numeric variables.
Operators are
    symbols that represent an arithmetic calculation
    SAS functions.
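The operand and operator kinds listed above can be combined in ordinary assignment statements. The following is a minimal sketch, not part of the original slides: the data set name work.expr_demo and the variables Base_Salary, Bonus, Start_Date, and Review_Date are hypothetical and chosen only for illustration.

```sas
/* Hypothetical one-observation demonstration; no input data set is read. */
data work.expr_demo;
   Base_Salary = 25000;               /* numeric constant */
   Bonus       = Base_Salary * 0.10;  /* numeric variable, arithmetic operator, numeric constant */
   Country     = upcase('au');        /* SAS function applied to a character constant */
   Start_Date  = '21JAN1995'd;        /* date constant */
   Review_Date = Start_Date + 90;     /* SAS dates are numbers, so adding 90 moves 90 days ahead */
   format Start_Date Review_Date date9.;
run;

proc print data=work.expr_demo;
run;
```

Because the DATA step has no SET or INPUT statement, it executes once and writes a single observation, which makes it a convenient way to try out expressions.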

8 The Assignment Statement Expression
    Country = upcase(Country);     function, variable
    Salary = 26960;                numeric constant
    Gender = 'F';                  character constant
    Hire_Date = '21JAN1995'd;      date constant

9 SAS Functions
A SAS function is a routine that returns a value that is determined from specified arguments.
The UPCASE function converts all letters in an argument to uppercase.
General form of the UPCASE function:
    UPCASE(argument)
The argument specifies any SAS character expression.

10
    %clearall
    libname orion "&path/prg1";
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;
    proc freq data=clean;
       tables country;
    run;

11 The Assignment Statement
All the values of Country in the data set orion.nonsales need to be uppercase.
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;
    PDV
    Employee_ID   Job_Title   Country   ...
                  Director    AU

12 The Assignment Statement
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;
    PDV
    Employee_ID   Job_Title   Country   ...
                  Director    AU
    upcase(AU)

13 The Assignment Statement
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;
    PDV
    Employee_ID   Job_Title                Country   ...
                  Administration Manager   au

14 The Assignment Statement
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
    run;
    PDV
    Employee_ID   Job_Title                Country   ...
                  Administration Manager   AU
    upcase(au)

15 The Assignment Statement
The assignment statement was executed for every observation, whether or not the value needed to be uppercased.
    %clearall
    proc print data=clean;
       var Employee_ID Job_Title Country;
    run;
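One way to see that the unconditional assignment runs on every observation, while only some values actually change, is to keep a copy of the original value and flag the differences. This is a sketch under stated assumptions: the small data set work.countries stands in for orion.nonsales, and the variable names Before and Changed are hypothetical.

```sas
/* Hypothetical sample data standing in for orion.nonsales */
data work.countries;
   input Country $;
   datalines;
AU
au
US
us
;
run;

data work.clean_demo;
   set work.countries;
   Before  = Country;              /* copy of the value as it was read */
   Country = upcase(Country);      /* executes for every observation */
   Changed = (Before ne Country);  /* 1 if the value was altered, 0 otherwise */
run;

proc print data=work.clean_demo;
run;
```

The comparison is case sensitive, so Changed is 1 only for the observations whose Country value was stored in lowercase.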

16 Programmatically Cleaning Data
    Variable     Obs                           Invalid Value  Correct Value
    Country      2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary
    Hire_Date    521/01/195321/01/ /11/ /01/196801/01/1998
The assignment statement was applied to all observations.
The assignment statement needs to be applied to specific observations.

17 Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data. Use the DATA step to clean the following observations:
    Variable     Obs                           Invalid Value  Correct Value
    Country      2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary
    Hire_Date    521/01/195321/01/ /11/ /01/196801/01/1998

18 IF-THEN Statements
The IF-THEN statement executes a SAS statement for observations that meet specific conditions.
General form of the IF-THEN statement:
    IF expression THEN statement;
expression is a sequence of operands and operators that form a set of instructions that define a condition for selecting observations.
statement is any executable statement such as the assignment statement.
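As an illustration of the general form above, the sketch below corrects a single value. The data set work.staff, its Employee_ID values, and the invalid salary are hypothetical sample data, not values from orion.nonsales.

```sas
/* Hypothetical sample data; the second salary is deliberately invalid */
data work.staff;
   input Employee_ID Salary;
   datalines;
1001 26960
1002 2650
1003 24015
;
run;

data work.staff_clean;
   set work.staff;
   /* The assignment executes only when the condition is true */
   if Employee_ID=1002 then Salary=26500;
run;

proc print data=work.staff_clean;
run;
```

The other two observations pass through unchanged, because the condition is false for them and the assignment after THEN is not executed.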

19 IF-THEN Statements
    %clearall
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title     ...
                           Secretary I

20 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title     ...
                           Secretary I
    FALSE

21 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title     ...
                           Secretary I
    FALSE

22 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title     ...
                           Secretary I
    FALSE

23 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title             ...
                           Office Assistant II

24 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title             ...
                           Office Assistant II
    TRUE

25 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title             ...
                           Office Assistant II
    FALSE

26 IF-THEN Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title             ...
                           Office Assistant II
    FALSE

27 IF-THEN Statements
When an IF expression is TRUE in this series of IF-THEN statements, there is no reason to check the remaining IF-THEN statements, because an observation has only one Employee_ID value.
Placing the word ELSE before IF causes SAS to evaluate the conditions in order only until it finds the first true one; the remaining ELSE IF statements are skipped for that observation.
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       if Employee_ID= then Salary=26500;
       if Employee_ID= then Salary=24015;
    run;
    TRUE
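The effect of adding ELSE can be sketched with a small self-contained example. The data set work.staff and its Employee_ID and Salary values are hypothetical sample data used only for illustration; they are not the values from orion.nonsales.

```sas
/* Hypothetical sample data with three invalid salaries */
data work.staff;
   input Employee_ID Salary;
   datalines;
1001 269600
1002 2650
1003 240
1004 31000
;
run;

data work.staff_clean;
   set work.staff;
   /* Conditions are tested in order; once one is true, */
   /* the remaining ELSE IF statements are skipped.     */
   if Employee_ID=1001 then Salary=26960;
   else if Employee_ID=1002 then Salary=26500;
   else if Employee_ID=1003 then Salary=24015;
run;

proc print data=work.staff_clean;
run;
```

Each ELSE must immediately follow the IF-THEN statement it belongs to. For the observation with Employee_ID 1004 all three conditions are false, so its Salary is left unchanged.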

28 IF-THEN / ELSE Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       else if Employee_ID= then Salary=26500;
       else if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title             ...
                           Office Assistant II

29 IF-THEN / ELSE Statements
All the values of Salary must be in the range of –
    data work.clean;
       set orion.nonsales;
       if Employee_ID= then Salary=26960;
       else if Employee_ID= then Salary=26500;
       else if Employee_ID= then Salary=24015;
    run;
    PDV
    Employee_ID   Salary   Job_Title             ...
                           Office Assistant II
    TRUE
    SKIP

30 Programmatically Cleaning Data
The DATA step can be used to programmatically clean the invalid data.
    Variable     Obs                           Invalid Value  Correct Value
    Country      2, 84, 87, 125, 197, and 200  au or us       AU or US
    Salary
    Hire_Date    521/01/195321/01/ /11/ /01/196801/01/1998

31 IF-THEN / ELSE Statements
All the values of Hire_Date must have a value of 01/01/1974 or later.
    %clearall
    data work.clean;
       set orion.nonsales;
       Country=upcase(Country);
       if Employee_ID= then Salary=26960;
       else if Employee_ID= then Salary=26500;
       else if Employee_ID= then Salary=24015;
       else if Employee_ID= then Hire_Date='21JAN1995'd;
       else if Employee_ID= then Hire_Date='01NOV1978'd;
       else if Employee_ID= then Hire_Date='01JAN1998'd;
    run;
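After the cleaning step has run, the result can be checked by listing any observations that still break the stated rules. The following is a sketch, not part of the original slides. It assumes that work.clean was created by the program above and that AU and US are the only valid Country values, as the table of invalid values suggests.

```sas
/* List observations in work.clean that still violate the rules. */
/* A missing Hire_Date sorts below every date, so it is listed too. */
proc print data=work.clean;
   where Country not in ('AU','US')
      or Hire_Date < '01JAN1974'd;
   var Employee_ID Country Hire_Date;
   format Hire_Date date9.;
run;
```

If the cleaning succeeded, the log reports that no observations were selected. Note also that a single IF-THEN / ELSE chain makes at most one correction per observation, so this pattern relies on each Employee_ID needing only one fix.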