

1 Creating and Tweaking Data
HRP223 – 2010 October 24, 2011 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

2 Topics
– Creating data with loops in data steps
– Creating variables
– Modifying variables

3 Making 100 records
Once you have the 100 records, you can add in details based on the value of dude. You can easily add a random number for everyone’s height and make half the people male. Click here to add new variables.
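The 100 records come from a data step loop with one pass per value of dude. A minimal sketch of such a step follows; the dataset name work.dudes is a hypothetical choice, not taken from the slides.

```sas
/* Sketch: one record per "dude", numbered 1 to 100.        */
/* work.dudes is a hypothetical dataset name.                */
data work.dudes;
  do dude = 1 to 100;
    output;   /* write the current value of dude as a record */
  end;
run;
```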

4 Case Clauses
You can add new variables using functions and simple assignment statements inside case-when-else-end phrases within the SQL. This is what we are building. Notice the computed column, the name of the new node, and the name of the new data set. Click Preview to see the SQL code.

5 Computed
At the end of the process we want a character variable in which dudes 1 to 50 are male, dudes 51 to 100 are female, and anything else is labeled *** BAD SEX ***.

7 Be sure to specify a character column if you are making strings of characters. Always specify a value to use when none of the rules apply. Add in the rules on how to replace values.
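The recode built on slides 5 to 7 corresponds roughly to a PROC SQL case expression like the one below. This is a sketch, not the exact code EG generates; the names work.dudes, work.dudesWithSex, and sex are assumptions. The length=15 modifier makes the character column wide enough for the longest value.

```sas
/* Sketch: input data, one record per dude (hypothetical name). */
data work.dudes;
  do dude = 1 to 100;
    output;
  end;
run;

proc sql;
  create table work.dudesWithSex as   /* hypothetical output name */
  select dude,
         case
           when dude between 1 and 50   then "Male"
           when dude between 51 and 100 then "Female"
           else "*** BAD SEX ***"        /* used when no rule matches */
         end as sex length=15            /* character column, 15 wide */
  from work.dudes;
quit;
```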

9 Click Modify Task and click Computed Column to add in the random height.

10 Display 5 characters in total, including the decimal point; two are after the decimal. To change how the new values appear, click here.

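11 You can find functions here. Use OnLineDoc to find more information. The height expression is rand("normal", 67, 3), a random draw from a normal distribution with mean 67 and standard deviation 3.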
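Written as PROC SQL, the random height column might look like the sketch below. The table names are hypothetical; the expression rand("normal", 67, 3) comes from the slide, and the 5.2 format matches the display described on slide 10 (5 characters wide, 2 after the decimal point).

```sas
/* Sketch: input data (hypothetical name work.dudes). */
data work.dudes;
  do dude = 1 to 100;
    output;
  end;
run;

proc sql;
  create table work.dudesWithHeight as   /* hypothetical output name */
  select dude,
         rand("normal", 67, 3) as height format=5.2  /* mean 67, sd 3 */
  from work.dudes;
quit;
```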

12 More Complex Variables
You can compute a new value for different levels of an existing variable. Say I want to add 2 inches to all the males:
– Open the data set with height and gender.
– Click Computed Columns.
– Recode the sex column.
– Specify that it is a numeric column.
– Replace female with the height variable.
– Replace male with the height variable + 2.
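The recode steps above amount to a case expression that returns a number for each level of sex. In this sketch the input table work.people and the new column adjHeight are hypothetical; a data step builds them so the example runs on its own.

```sas
/* Sketch: hypothetical table with height and sex. */
data work.people;
  length sex $ 15;
  do dude = 1 to 100;
    if dude <= 50 then sex = "Male";
    else sex = "Female";
    height = rand("normal", 67, 3);
    output;
  end;
run;

proc sql;
  create table work.adjusted as
  select dude, sex, height,
         case sex
           when "Female" then height
           when "Male"   then height + 2   /* add 2 inches for males */
           else .                          /* missing for other values */
         end as adjHeight format=5.2       /* numeric column */
  from work.people;
quit;
```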

14 Nested Loops
How to create data:
– Use loops.
– Be sure to include an end with every do.
– Include an output inside the innermost loop. If you forget the output, the data step writes a record to the new dataset only once, at the end of the data step.
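A nested-loop data step following these rules might look like this sketch. The dataset work.trial, the group labels, five subjects per group, and the id values built with CATS (such as Placebo5, the ID mentioned on slide 17) are all illustrative assumptions.

```sas
/* Sketch: 3 treatment groups x 5 subjects = 15 records. */
data work.trial;
  length treat $ 7 id $ 12;
  do treat = "Placebo", "Low", "High";   /* outer loop: one pass per group */
    do subject = 1 to 5;                 /* inner loop: subjects per group */
      id = cats(treat, subject);         /* e.g. Placebo5 */
      output;                            /* output inside the innermost loop */
    end;                                 /* every do has its own end */
  end;
run;
```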

15 Advanced Expressions
If you get sick of clicking, you can write complex case statements yourself: choose Computed Column, then Advanced Expression, and type the case logic:
    case
      when (treat = "Placebo") then rand("norm", 10, 1)
      when (treat = "Low") then rand("norm", 19, 2)
      when (treat = "High") then rand("norm", 20, 2)
      else .E
    end
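Inside a full PROC SQL query, that advanced expression could sit as shown in this sketch. The table names and the score column name are hypothetical, the distribution name is spelled out as "normal", and an ordinary missing value (.) is used as the fallback.

```sas
/* Sketch: hypothetical trial data, 5 subjects per group. */
data work.trial;
  length treat $ 7 id $ 12;
  do treat = "Placebo", "Low", "High";
    do subject = 1 to 5;
      id = cats(treat, subject);
      output;
    end;
  end;
run;

proc sql;
  create table work.scores as
  select id, treat,
         case
           when (treat = "Placebo") then rand("normal", 10, 1)
           when (treat = "Low")     then rand("normal", 19, 2)
           when (treat = "High")    then rand("normal", 20, 2)
           else .                   /* missing for an unexpected group */
         end as score
  from work.trial;
quit;
```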

17 Fixing Bad Values
You will eventually need to fix bad data.
– Say you want to set Placebo5 to a score of 10.
Name the node and the output. Select the variables that are not modified.

18 Fixing Bad Values
Tell it to compute a column, and choose either Recode Column or write a case-when-else-end statement as an Advanced Expression.

20 To get a better look, click Validate.

21 Hang on to this syntax:
    case when (logic check) then new value else originalVariable end
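Applied to the Placebo5 example, the pattern keeps every other value unchanged through the else branch. In this sketch the three input rows are made-up illustration values and the table names are hypothetical.

```sas
/* Sketch: a few made-up rows, one with a bad score. */
data work.scores;
  length id $ 12;
  input id $ score;
  datalines;
Placebo4 9.6
Placebo5 -3.1
Low1 18.2
;
run;

proc sql;
  create table work.fixed as
  select id,
         case when (id = "Placebo5") then 10   /* new value */
              else score                       /* original variable */
         end as score
  from work.scores;
quit;
```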

22 Collapsing Groups
Often you will have a categorical variable and want to reduce the number of groups.
– High Dose and Low Dose both mean being on a drug.
You can create a new variable or just use a custom format to change how the values appear.

23 Adding a New Column
Choose Computed Column and recode a column.
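The new-variable approach can be sketched as a recode that maps both dose groups to one value. The labels Low and High and the column name onDrug are hypothetical stand-ins for the dose groups on slide 22.

```sas
/* Sketch: hypothetical treatment groups. */
data work.trial;
  length treat $ 7;
  do treat = "Placebo", "Low", "High";
    output;
  end;
run;

proc sql;
  create table work.collapsed as
  select treat,
         case
           when treat in ("Low", "High") then "Drug"   /* collapse doses */
           else treat                                  /* keep Placebo */
         end as onDrug length=7
  from work.trial;
quit;
```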

24 Adding a User Defined Format
Here we are making character values appear as other character values.

25 Repeat until you have filled in all the values you want to appear differently.

26 Using Formats
Formats are not automatically associated with any variables. You need to tell SAS to apply the format when it creates a dataset or when it processes a variable. Some processing nodes work better if you have assigned the format in a previous step.

27 Select the variable that needs the format and click Properties. Click Change… and then pick the user-defined format.
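In code, the user-defined format approach is roughly a PROC FORMAT VALUE statement plus a FORMAT= column modifier when the dataset is created. The format name $drugFmt and the group labels are hypothetical; the stored values stay the same and only their display changes.

```sas
/* Sketch: hypothetical character format collapsing the doses. */
proc format;
  value $drugFmt
    "Low", "High" = "Drug"
    "Placebo"     = "Placebo";
run;

data work.trial;
  length treat $ 7;
  do treat = "Placebo", "Low", "High";
    output;
  end;
run;

proc sql;
  create table work.formatted as
  select treat format=$drugFmt.   /* values unchanged; display changes */
  from work.trial;
quit;
```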

28 Same Information Formatted

29 Combining
When you have data in two tables, you need to tell SQL how the two tables are related to each other.
– Typically you have a subject ID number in both files.
The variable that can be used to link information is called the key.

30 Demographics and Response to Treatment
Here the two tables have different variables (except ID), and they are in different sort orders. We want favorite color merged in to see whether it is related to response to treatment.

31 Merging
Merging is trivially easy with EG. Choose a table, open the Query Builder, and push the Join Tables button.

32 Double-click the dividing lines to make the columns wide enough to read.

33 Notice the name t1: in the SQL statements, variables from this table will have the prefix t1. The other table will be referred to as t2. EG noticed that the two tables share the variable ID, so it will match records that have a common value of ID. Double-click the link for details.
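The join EG builds here is, in outline, a PROC SQL query that uses the t1/t2 aliases and matches on ID. The tables below are tiny made-up stand-ins for the demographics and response tables (names hypothetical); note they are in different sort orders and the join does not care.

```sas
/* Sketch: made-up demographics (favorite color), unsorted. */
data work.demographics;
  input id color $;
  datalines;
3 Blue
1 Red
2 Green
;
run;

/* Sketch: made-up response to treatment. */
data work.response;
  input id response $;
  datalines;
1 Yes
2 No
4 Yes
;
run;

proc sql;
  create table work.merged as
  select t1.id, t1.color, t2.response
  from work.demographics as t1
       inner join work.response as t2
       on t1.id = t2.id        /* ID is the key in both tables */
  order by t1.id;
quit;
```

With these rows only IDs 1 and 2 appear in both tables, so the inner join keeps just those two records.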

34 Joins
You will typically do inner joins and left joins.
– Inner joins: select the matching records.
– Left joins: select all records on the left side and any records that match on the right.

35 Inner Joins
Inner joins are useful when you want to keep the information from the tables if, and only if, there are matches in both tables.
– Here you keep the records of people for whom you have both demographic and response-to-treatment information.

36 Left Joins
Left joins are useful when the table on the left side of the join has everybody and not everyone has records in the right table.
– A typical example has the IDs of everyone in a family on the left side and information on diagnoses in the right table. Not everyone is sick, so you keep all the IDs on the left and add in diagnoses where you can.
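The family example can be sketched as the PROC SQL left join below. The tables and the single diagnosis row are made-up illustrations; family members without a match in the right table keep their ID and get a missing diagnosis.

```sas
/* Sketch: made-up family IDs (left table). */
data work.family;
  input id @@;
  datalines;
1 2 3
;
run;

/* Sketch: made-up diagnoses (right table). */
data work.diagnoses;
  input id diagnosis $;
  datalines;
2 Asthma
;
run;

proc sql;
  create table work.familyDx as
  select t1.id, t2.diagnosis
  from work.family as t1
       left join work.diagnoses as t2
       on t1.id = t2.id        /* keep every family ID */
  order by t1.id;
quit;
```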

37 Typical Left Join
Notice that the numeric variable is formatted to display as words.

39 Coalesce
The previous example leaves NULL (missing) values for the people who are disease free. You probably want to list them as healthy. The coalesce function returns the first non-missing value.
– COALESCE works on numeric lists.
– COALESCEC works on character lists.
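Adding a coalesce to a left join like the family example fills the missing diagnoses with a default word. In this sketch the tables, rows, and the status column are made-up illustrations; COALESCEC is used because diagnosis is character.

```sas
/* Sketch: made-up family IDs and diagnoses. */
data work.family;
  input id @@;
  datalines;
1 2 3
;
run;

data work.diagnoses;
  input id diagnosis $;
  datalines;
2 Asthma
;
run;

proc sql;
  create table work.familyStatus as
  select t1.id,
         coalescec(t2.diagnosis, "Healthy") as status length=8
         /* first non-missing value: the diagnosis, else "Healthy" */
  from work.family as t1
       left join work.diagnoses as t2
       on t1.id = t2.id
  order by t1.id;
quit;
```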

41 Coalesce
If you are using left joins from multiple tables, coalesce can be really useful.
– Say some people have reported disease, other people have verified disease, and the rest are assumed to be healthy. You can coalesce an indicator variable from the verified table and the reported table and call everybody else healthy.

42 If the tables have indicator variables, the coalesce function is easy once the tables are linked:
    COALESCEC(t3.status2, t2.status1, "Healthy")

43 No indicator variables?
If the tables you are coalescing do not have indicator variables, just make them as part of the query: add a column that recodes the ID in each child table (e.g., reported and verified) to a word like “reported” or “verified”.

44 The two new indicator columns.

45 Coalesce the new columns
Once the new columns are created, make a new variable using the Advanced Expression option for a new computed column, then apply coalesce to the new variables. Double-click the new variables to insert their code.

46 After you double-click the ver variable, its code is inserted. Don’t forget the comma before double-clicking the rep variable. After inserting verified and reported, type another comma and the “healthy” option.
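Put together, slides 43 to 46 describe a query along the lines of this sketch. All table names, the made-up IDs, and the columns rep, ver, and status are hypothetical; in PROC SQL the CALCULATED keyword lets the coalesce refer to the two indicator columns created earlier in the same select list. Verified is listed first so it wins over reported.

```sas
/* Sketch: made-up people, reported, and verified tables. */
data work.people;
  input id @@;
  datalines;
1 2 3 4
;
run;

data work.reported;
  input id @@;
  datalines;
2 3
;
run;

data work.verified;
  input id @@;
  datalines;
3
;
run;

proc sql;
  create table work.status as
  select t1.id,
         /* indicator columns: recode a matched child ID to a word */
         case when t2.id is not null then "reported" else " " end
           as rep length=8,
         case when t3.id is not null then "verified" else " " end
           as ver length=8,
         /* first non-blank value, else "healthy" */
         coalescec(calculated ver, calculated rep, "healthy")
           as status length=8
  from work.people as t1
       left join work.reported as t2 on t1.id = t2.id
       left join work.verified as t3 on t1.id = t3.id
  order by t1.id;
quit;
```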
