Summary HRP223 – 2009 November 1st, 2010 Copyright © 1999-2010 Leland Stanford Junior University. All rights reserved.


1 Summary HRP223 – 2009 November 1st, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

2 You know …
How to create a table from scratch
How to import tables
– from external sources like Excel, or using export/import code from databases
How to create tables
– from a single existing table: with selected variables, with recoded variables, with or without subsets of the records
– from multiple tables: by adding columns (joins) or by adding sets of records (set operators)
With code or GUI

3 Create a New Table The GUI is the easiest. Look in the optional textbooks for the class to learn the syntax for code. $ means a character string. 10. means 10 letters wide. The age variable starts in column 11. Missing numbers are just a period (.). Missing characters are just spaces (not tabs).
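The syntax notes on this slide can be sketched as a short data step. This is an illustration only, not the code from the original slide: the data set name `work.people` and the values are made up.

```sas
/* Hypothetical data set and values, for illustration only */
data work.people;
   /* $10. reads a character string 10 columns wide (columns 1-10); */
   /* age is then read starting in column 11                         */
   input name $10. age;
   datalines;
Andrew    12
Bob       .
          40
;
run;

proc print data=work.people;
run;
```

The second row has a missing age (a single period) and the third row has a missing name (ten spaces). The padding must be spaces; a tab would be read as part of the data.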

4 Importing The most bulletproof way to import is to use the import wizard. You can also write a program with proc import.
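A minimal proc import sketch for an Excel workbook might look like the following. The file path, sheet name, and output data set name are hypothetical, and `dbms=xlsx` assumes that the SAS/ACCESS Interface to PC Files is licensed on your installation.

```sas
/* Hypothetical path, sheet, and output names */
proc import datafile="C:\projects\hrp223\patients.xlsx"
            out=work.patients
            dbms=xlsx
            replace;          /* overwrite work.patients if it exists */
   sheet="Sheet1";            /* which worksheet to read */
   getnames=yes;              /* first row holds the variable names */
run;
```

The import wizard writes similar code for you, which is one reason it is the safer starting point.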

5 Code If you write any code, be sure to load my keyboard macros. Once you have a program node open in a flowchart, add the macros to both SAS and EG by using the Program menu. The import macro gives you the shell to import Excel files.

6 From a Database If you load data that came with an import/export program, you will probably need to add the path to the infile statement.

7 Importing Advice It is a good idea to import the source into a permanent library. After importing, use the Query Builder or a Program node and copy all the variables into a new data set. This node can be tweaked later to fix any problems that you identify. – If you do not do this, you will have to change the links leading from the cleaned/fixed data to point to the analyses.

8 Creating New Datasets From 1 Table Name the query and new table. Drag the entire table or individual variables to the Select Data pane. In the Select Data pane pick variables then click the properties button.

9 Changing a Variable Computed Columns > New… > Recoded column > pick a variable. Notice the other tabs for selecting what to change to a new value. SAS allows 28 different types of missing numbers: . (the default), ._ and .A through .Z.

10 Bad Ages Recoded to NULL If you get data from a program that uses bogus numbers to indicate problems in a numeric field, replace the values with different NULL values: .A, .B, etc. When you do descriptive statistics, the null values will be automatically excluded.
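In code, the same recoding can be sketched as below. The bogus codes 999 and -1, their meanings, and the data set names are assumptions made for this example.

```sas
/* Made-up source data with bogus ages 999 and -1 */
data work.raw;
   input id age;
   datalines;
1 34
2 999
3 -1
4 51
;
run;

data work.cleaned;
   set work.raw;
   if age = 999 then age = .A;      /* assumed code: age not recorded */
   else if age = -1 then age = .B;  /* assumed code: refused to answer */
run;

/* The special missing values are left out of the statistics */
proc means data=work.cleaned n nmiss mean;
   var age;
run;
```

With these four rows, proc means reports N = 2, NMISS = 2 and a mean of 42.5, because only the ages 34 and 51 are used.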

11 Removing/Choosing Records Right click on the variables you want to use for dropping records or use the Filter Data tab.

12

13 Advanced Changes Comparisons You can use the Advanced Expression dialog box to do complex tasks like editing and combining text variables. – catt(), lowcase(), compress(), compbl() SAS has built-in regular expression processing (like Perl) as well as Soundex for phonetic spellings and (Levenshtein) edit distances for measuring dissimilarity between strings.
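The functions named on this slide can be tried out in a short data step that writes its results to the log. The strings are made up, and `complev()` and `prxmatch()` are used here as one way to get an edit distance and a regular expression match; the slide does not say which functions the instructor had in mind.

```sas
data _null_;
   length raw $30;
   raw = "  Tinky    WINKEY ";

   tidy        = lowcase(compbl(strip(raw)));  /* trim, collapse repeated blanks, lowercase */
   squeezed    = compress(raw);                /* remove every blank */
   joined      = catt("Tinky ", "Winkey");     /* drop trailing blanks, then concatenate */
   distance    = complev("Tinkywinkey", "Tinky Winkey");  /* Levenshtein edit distance */
   soundsAlike = (soundex("Winkey") = soundex("Winky"));  /* 1 if the phonetic codes match */
   hasSpace    = (prxmatch("/\s/", strip(raw)) > 0);      /* Perl-style regular expression */

   put tidy= squeezed= joined= distance= soundsAlike= hasSpace=;
run;
```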

14 Working with Several Tables Joins add columns to a base table. Set operations add (or subtract) records. [Figure: two diagrams, each combining Table 1 and Table 2 into a New Table]

15 Commonly Used Joins
Inner Join (Table 1 + Table 2 → New Table): Keep only records where you can match IDs in both tables.
Left Join (Table 1 + Table 2 → New Table): Keep all records from the left table and matching records from the right. Use NULL for the right table variables in the unmatched records.
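A small PROC SQL sketch of the two joins, using made-up tables `work.patients` and `work.labs` keyed on `id`:

```sas
/* Made-up tables: patient 2 has no lab record, lab id 4 has no patient */
data work.patients;
   input id name $;
   datalines;
1 Ann
2 Bob
3 Cy
;
run;

data work.labs;
   input id glucose;
   datalines;
1 90
3 110
4 95
;
run;

proc sql;
   /* Inner join: only ids found in both tables */
   create table work.inner_result as
      select p.id, p.name, l.glucose
      from work.patients as p
           inner join work.labs as l
           on p.id = l.id;

   /* Left join: every patient, glucose is missing when there is no match */
   create table work.left_result as
      select p.id, p.name, l.glucose
      from work.patients as p
           left join work.labs as l
           on p.id = l.id;
quit;
```

Here `work.inner_result` has two rows (ids 1 and 3) and `work.left_result` has three rows, with a missing glucose value for Bob.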

16 One to Many Joins All of the SQL joins that I have mentioned work with either a 1 to 1 match of key variables across tables or a 1 to many match. But you need to be cognizant of how many records are in each table. Double check the new table size. [Figure: inner join and left join examples]

17 Cartesian Joins If there are duplicate key values in both tables and you do not join on a second variable, SQL will multiply the combinations, and for each duplicated key you can end up with the number of records being the product of the numbers of matching records in each table. [Figure: Inner Join on Family]
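The multiplication is easy to see with two tiny made-up tables that share a duplicated key. The table and variable names below are assumptions for the example.

```sas
/* Family A appears twice in parents and three times in kids */
data work.parents;
   input family $ parent $;
   datalines;
A Mom
A Dad
;
run;

data work.kids;
   input family $ kid $;
   datalines;
A Al
A Bea
A Cal
;
run;

proc sql;
   create table work.combos as
      select p.family, p.parent, k.kid
      from work.parents as p
           inner join work.kids as k
           on p.family = k.family;

   /* Check the size of the new table */
   select count(*) as n_rows from work.combos;
quit;
```

The join returns 2 × 3 = 6 rows, one for every parent and kid combination within the family, which is why it pays to check the new table size after every join.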

18 PROC SQL - Set Operators NO GUI
Outer Union Corresponding – concatenates
Union – unique rows from both queries
Except – rows that are part of the first query but not the second
Intersect – rows common to both queries
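The four operators can be compared side by side on two made-up single-column tables. The table names `work.q1` and `work.q2` are hypothetical.

```sas
data work.q1;
   input id;
   datalines;
1
2
2
3
;
run;

data work.q2;
   input id;
   datalines;
3
4
;
run;

proc sql;
   title "Outer union corresponding: all rows from both queries";
   select * from work.q1 outer union corr select * from work.q2;

   title "Union: unique rows from both queries";
   select * from work.q1 union select * from work.q2;

   title "Except: rows in the first query that are not in the second";
   select * from work.q1 except select * from work.q2;

   title "Intersect: rows common to both queries";
   select * from work.q1 intersect select * from work.q2;
quit;
title;
```

With this data the outer union returns six rows (duplicates kept), the union returns 1, 2, 3, 4, except returns 1 and 2, and intersect returns 3.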

19 How does a data step typically work? The data statement says make this (or these) data set(s).
1. SAS then reads every line down to the run statement and gathers a list of all variables used. This list is called the program data vector (PDV).
2. It then sets all the variables to missing.

20 How does a data step typically work?
3. It then does the instruction listed on each line of the data step program in the order that the lines are written.
4. Then it writes all the variables out to the new dataset.
5. It then repeats the process if there is more data.

21 How SAS Processes a Dataset (1) In the example below, SAS will look in the existing dataset called Teletubbies and it will find two variables, teletubby and thing. Then it will find the variable called kid. Then it will do the instructions in order.
    data Teletubbies2;     *name of a new data set;
      set Teletubbies;     *load 1 observation of data;
      kid = "Andrew";      * fill in the blank;
      output;              *write the variables to teletubbies2;
      return;              *return to the top of the step;
    run;                   *end of these instructions;

22 The Set Statement set Teletubbies; This line tells SAS to load one row of data from the data set Teletubbies into the PDV. The first time this line is run, the first row of data is loaded into the PDV. When there is no more data to load, the data step is done.

23 Variable Assignment In the example the word Andrew is assigned to the variable kid. All variables are assigned from the right side into the variable named on the left.
    kid = "Andrew";
If a variable appears on the left and right side of an equal sign, the original value on the right is changed and then written to the left.
    aNumber = aNumber + 4;
Assignment goes from the right side (original value) to the left side (new value).

24 How SAS Processes a Dataset (2) If you do not include the output and return statements, SAS will do them automatically. So, the previous data step would typically be written like this.
    data Teletubbies2;
      set Teletubbies;
      kid = "Andrew";
    run;
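One way to watch the program data vector while the step runs is to write it to the log with `put _all_`. The sketch below builds a small stand-in for the Teletubbies data set; its rows are made up for illustration.

```sas
/* Stand-in source data with the two variables named in the slides */
data work.teletubbies;
   input teletubby :$12. thing :$10.;
   datalines;
Dipsy hat
Po scooter
;
run;

data work.teletubbies2;
   set work.teletubbies;      /* load one row into the PDV */
   put "After set:";
   put _all_;                 /* kid is still blank at this point */
   kid = "Andrew";
   put "After assignment:";
   put _all_;                 /* kid now holds Andrew */
run;                          /* implicit output and return happen here */
```

The log shows two pairs of lines, one pair per row. The automatic variables `_N_` and `_ERROR_` also appear, and `kid` is blank again after each `set` because assigned variables are reset to missing at the top of every pass.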

25 How SAS Processes a Dataset (3) If, If-else, or select statements are typically used to conditionally assign values in a data step.
If: one possibility
If else: two possibilities
Select when otherwise end: multiple possibilities
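The three forms can be sketched in one data step. The data set, the variable names, and the age cutoffs are all made up for this example.

```sas
data work.ages;
   input age;
   datalines;
5
15
40
70
;
run;

data work.grouped;
   set work.ages;
   length ageGroup $8;

   /* if: one possibility (isSenior stays missing otherwise) */
   if age >= 65 then isSenior = 1;

   /* if-else: two possibilities */
   if age < 18 then isAdult = 0;
   else isAdult = 1;

   /* select-when-otherwise-end: multiple possibilities, first match wins */
   select;
      when (age < 13) ageGroup = "child";
      when (age < 18) ageGroup = "teen";
      when (age < 65) ageGroup = "adult";
      otherwise       ageGroup = "senior";
   end;
run;
```

The `length` statement comes before the first assignment so that `ageGroup` is wide enough for the longest label.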

26 Error Trapping “Tinkywinkey” is not “Tinky Winkey” … Bad Teletubby.

27 Test Your Understanding
    data test3a test3b;
      set source;
      if isMale = 1 then output test3a;
      hasCancer = 1;
      output test3b;
    run;

28 Common Ground … where Both SQL and data step programming use where statements to select what records are included in the new dataset. With data steps the variables used in the where statement need to already exist in the source file. Use if to check variables created in the data step.
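The difference between where and if can be sketched as follows. The data set and the variables `heightCm`, `weightKg` and `bmi` are hypothetical.

```sas
data work.people;
   input name $ heightCm weightKg;
   datalines;
Ann 160 55
Bob 180 110
Cy 170 .
;
run;

data work.heavy;
   set work.people;
   where weightKg is not missing;        /* weightKg already exists in the source */
   bmi = weightKg / (heightCm / 100)**2;
   if bmi >= 30;                         /* bmi is created in this step, so use if */
run;
```

Only Bob ends up in `work.heavy`: Cy is dropped by the where statement before the row is loaded, and Ann is dropped by the subsetting if. Writing `where bmi >= 30;` instead would fail because `bmi` is not a variable in the source data set.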

29 where The syntax for where is identical in SQL and data steps. Differences vs. if statements (each main point works in where only; the indented alternative works in either):
– x between y and z
    x >= y and x <= z, or y <= x <= z
– string1 ? string2 or string1 contains string2
    index(string1, string2) > 0
– string1 =* string2
    soundex(string1) = soundex(string2)
– x is null or x is missing
    missing(x)
– string1 like 'U%of%A%'
    use regular expressions (PRX)

30 where Syntax The where statement, like all SAS statements, begins with a keyword (where) and ends in a semicolon.
– where isDead = "false";
– where isDead ne "true";
– where missing(gender);
– where salary > ;
– where country in ("USA", "Japan", "UK");
– where country in ("USA" "Japan" "UK");

31 where Syntax
Arithmetic
– where salary/12 > 10000;
– where (salary /12) * 1.20 ge 9900;
– where salary + bonus < ;
Logical
– where gender ne "M" and salary >= 50000;
– where gender ne "M" or salary >= 50000;
– where country = "UK" or country = "UTAH";
– where country not in ("USA", "AU");

32 Make Decisions SAS has many operations available to help you make decisions.
Comparisons: = eq, ~= ne, > gt, < lt, >= ge, <= le, in ( )
Logical: not, & and, | or
– not requires the expression following it to not be true.
– & requires both operands to be true.
– | requires at least one operand to be true.
– in ( ) requires at least one comparison to be true.
Math operations: + - * / **

33 Logical Decisions & Compound Expressions Common tests and common problems:
    where YODeath < YOBirth;
    where Sex = "M" and numPreg > 0;
    where Sex="M" and numPreg > 0 or ageLMP > 0;    *** bad ***;
    where Sex="M" and (numPreg > 0 or ageLMP > 0);  *** good ***;
– Moral: Use parentheses generously with ands and ors.
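The reason the unparenthesized version is bad is that and is evaluated before or, so it is read as (Sex = "M" and numPreg > 0) or ageLMP > 0. A sketch with made-up records shows the two filters returning different rows; the data set name `work.records` is hypothetical.

```sas
data work.records;
   input id sex $ numPreg ageLMP;
   datalines;
1 M 0 0
2 M 1 0
3 F 0 50
4 F 2 0
;
run;

proc sql;
   title "Without parentheses: and is applied before or";
   select * from work.records
      where sex = "M" and numPreg > 0 or ageLMP > 0;

   title "With parentheses: sex must be M in every selected row";
   select * from work.records
      where sex = "M" and (numPreg > 0 or ageLMP > 0);
quit;
title;
```

The first query returns ids 2 and 3, so a female record slips into what was meant to be a check on males. The second query returns only id 2.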

34 Where is everywhere

35 Numeric Data and Looping Say somebody tells you to simulate rolling dice. The formula to do this says:
– generate a random number between 0 and 1
– multiply it by 6
– round up to the closest integer
    data die;
      *the 22 says which list of numbers between 0 & 1;
      aNumber = ranuni(22);
      die = ceil(6*aNumber);
      * Generate a random integer between 1 and 6.;
      dieDie = ceil(6*ranuni( ));
      output; * write to the new dataset;
      return; * go to the top and try to read in data;
    run;

36 Doing Stuff Repeatedly How to roll two dice:
    data dice;
      do x = 1 to 2 by 1;
        roll = ceil(6*ranuni( ));
        output;
      end;
      return; * go to the top and try to read in data;
    run;

37 Craps… In the dice game “craps” you throw two dice and the number you roll determines if you win or lose. How do you simulate rolling 10 pairs of dice?
    data craps;
      do trial = 1 to 10;
        do dieNumber = 1 to 2;
          roll = ceil(6*ranuni( ));
          output;
        end;
      end;
      return;
    run;
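A runnable version of the nested loop needs a seed inside ranuni(). The sketch below assumes the seed 22 from the earlier slide; any non-negative integer would do, and the seed on the original craps slide is not known.

```sas
data work.craps;
   do trial = 1 to 10;                  /* 10 pairs of dice */
      do dieNumber = 1 to 2;            /* two dice per pair */
         roll = ceil(6 * ranuni(22));   /* seed 22 is an assumption */
         output;                        /* one row per die */
      end;
   end;
run;

proc print data=work.craps;
run;
```

The result has 20 rows, two per trial, and every roll is an integer from 1 to 6. Because there is no set statement, the step runs once and the do loops generate all of the rows.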

38 Summing