Data Manipulation (with SQL)

Data Manipulation (with SQL) HRP223 – 2009 October 12, 2009 Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.

Topics For Today
Organization
Sharing a SAS dataset
    As .sas7bdat files or other formats
Renaming
    Datasets
    Variables
Subsetting a dataset
    Select a few variables
    Select a few records
SQL reports for a single table of data
    Selecting/renaming variables
    Applying labels and formats
Creating tables with SQL

Organization Avoiding Spaghetti Code
Programmers refer to unstructured, poorly thought-through, unorganized code as spaghetti code. Your EG projects will literally look like a tangled mess of spaghetti if you do not structure them in advance.
Use several named process flows.
Use lots of notes in the project.
Include a lot of comments if you write code.
This is bad.

Organization Process Management
Typically you will have one process flow that creates the library, imports the source file(s), cleans the data, and splits the data into subsets. If you run different sets of analyses on the subsets, add a process flow for each subset. Create a dataset called analysis that has all the information used in the analysis.

Organization You may want to link the library to the dataset and then uncheck Auto Arrange to have it show you the arrow. Right click on the process flow and give it a meaningful name.

Organization The Greater Right of the Left
Your process flows should have the source of the data on the left. The left margin should have:
A note saying what the flowchart does
A code node that creates a toy dataset or a library (or libraries) that contains the data

Organization A Good Process Flow

Organization in Programs
All my SAS code begins with the same header information. The /* and */ delimiters are used to mark large comments.
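A header of this kind might look like the sketch below. The slide does not list the fields, so the field names and the program name are assumptions chosen for illustration.

```sas
/*******************************************************
 Program:  lab1_import.sas   (hypothetical file name)
 Purpose:  One or two sentences on what the program does
 Author:   Your name
 Date:     Date written or last changed
 Input:    Where the source data comes from
 Output:   Data sets and reports the program creates
*******************************************************/
```

Everything between /* and */ is ignored by SAS, so the block can be as long as it needs to be.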

Specify where output will be stored.
Display manager deletes output text and log.
Do not show the name of the procedures in output.
Do X commands ASAP.
Don’t show the date in output and reset page # to 1.
Delete graphics in the work library.
Make the folder where output will be stored if it does not exist.
Delete what is there if it exists.
Set file path to that directory.
Make a library to store output datasets.
Make a web page to display all output.
Make pretty graphics.
Run other programs.
Turn off graphics and output.
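The transcript keeps only the annotations, not the header code itself. The sketch below shows one way some of those steps could be written. It is not the author's code, and it assumes a Windows display manager session. The folder path, the library name out, and the choice of statement for each step are assumptions. The two steps that delete existing content are left out.

```sas
/* Hypothetical output folder; change it to suit your machine. */
%let outdir = C:\projects\hrp223\output;

/* Clear the log and output windows (display manager sessions only). */
dm "log; clear; output; clear;";

/* Do not print the procedure name above each piece of output. */
ods noproctitle;

/* Let X commands return without waiting for a typed EXIT. */
options noxwait;

/* No date in the output; restart page numbering at 1. */
options nodate pageno = 1;

/* Make the output folder; the command only complains if it already exists. */
x "mkdir ""&outdir""";

/* Library for output data sets and a web page for the results. */
libname out "&outdir";
ods html path = "&outdir" body = "index.html";
ods graphics on;

/* ... analysis code or %include statements go here ... */

/* Turn off graphics and close the web page. */
ods graphics off;
ods html close;
```

Enterprise Guide sessions may not allow the dm and x statements, depending on how the server is configured, so treat those two lines as optional.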

Sharing Data
You can share SAS data sets just like Excel files.
Create a library.
Copy the data into the library.
If the data has formats associated with it, be sure to send the formats. More on this at a later date.

Sharing Exporting the Easy Way
Double click the data set you want to export and use the Export context-dependent menu.
    data humans;
        input isDead $ age;
        datalines;
    true 20
    false 45
    false 21
    true 67
    ;
    run;

Sharing With Code…
Create a library with the GUI or use the libname statement:
    libname blah "C:\blah";
Write a little program:
    proc copy in = work out = blah;
        select humans;
    run;

Sharing This code is efficient.

Sharing Alternatives
Novices underuse proc copy. Instead they typically write less efficient data steps. For example:
    data blah.humans;
        set work.humans;
    run;
Or they may write:
    data "C:\blah\humans.sas7bdat";
        set work.humans;
    run;

Sharing

Sharing Export Code for a Different Format
    proc export
        outfile = "c:\projects\hrp223\lab1\humans.xlsx"
        data = blah.humans
        replace;
    run;

Sharing Note that you have to manually connect the code node to the right place in the flow chart and the exported item does not show up on the process flow.

Renaming datasets Copy and Rename
If you want to copy and rename a file, use the GUI or write code. Double click the data set. Choose Query Builder from the context-sensitive menu.

Renaming datasets

Renaming datasets With code…
    data blah.test;
        set work.humans;
    run;

Select a Few Variables From Fake Data
The next task is to select a couple of variables from a data set that has a LOT of variables. If you get a premade dataset with lots of extra variables, you want to drop the ones you will never use. Do this as soon as you can. First I will make some fake data. The data set will have a simulated test value filled into 12 “month” variables.

Fake data How to make a fake subject
    data fakeData;
        * make 12 empty numeric variables;
        array month [12];
        * put the record number into a variable named subject;
        subject = _N_;
        * repeat the code below 12 times;
        do x = 1 to 12;
            * put a random number from list #1 into month1, month2, etc;
            month[x] = ranuni(1);
        end;
        * write the information to the fakeData data set;
        output;
    run;
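The data step above writes a single record, so it makes one fake subject. A minimal sketch of one way to make several subjects follows. The data set name fakeSubjects and the count of 10 are assumptions for illustration, not part of the original slides.

```sas
data fakeSubjects;
    /* month1-month12 are created by the array statement */
    array month [12];
    /* the outer loop writes one record per subject */
    do subject = 1 to 10;
        do x = 1 to 12;
            /* seed 1 gives the same random stream every run */
            month[x] = ranuni(1);
        end;
        output;
    end;
    /* the loop counter is not needed in the output data set */
    drop x;
run;
```

Because the seed is fixed, rerunning the step reproduces the same values, which is handy when you are testing code against toy data.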

Fake data

Rename and label variables Selecting variables and renaming
You can use the Filter and Sort context-sensitive menu to select a few variables. To rename a variable or change how it prints in reports you need to use the Query Builder or write code.

Rename and label variables Drag and drop the variables you want into the Select Data windowpane. Click on a variable name. Then use the properties button to change the name and the display label.

Rename and label variables

Rename and label variables I usually display the variable names instead of the labels.

Rename and label variables What it did…

Rename and label variables Data Step Version
Notice where the ; is found. This is one long statement.
    data firstLast (keep = subject january june);
        set fakedata2 (rename = (month1 = January month6 = June));
        label January = "First Month";
        label June = "Last Month";
    run;
    * have subject appear first by referencing it first;
    subject = .;

SQL reports Minimal SQL
Print a report showing the contents of variables from a single data set.
Note that there is no create table ____ as.
Put a comma-delimited list of variables here or * for all variables.
Specify a library.table here.
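The code these annotations point at did not survive in the transcript. A minimal sketch of such a query, assuming the fakeData data set built earlier is still in the work library, could be:

```sas
proc sql;
    /* no "create table ... as", so the result is printed as a report */
    select subject, month1, month6
        from work.fakeData;
quit;
```

Replacing the variable list with * would print every variable in the table.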

SQL reports What variables?
Typically you will use a comma-delimited list, but you can use an * to indicate that you want all variables selected instead of typing them all. There is no syntax to specify variables based on position in the source files. That is, you cannot specify that you want to select the 2nd and 7th variables (from left to right) or the first 3 variables.

SQL reports – selecting variables Use of Minimal SQL Note that the order of the list sets the order in the report (or the order in a new dataset).
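As a small illustration of the ordering rule, the sketch below lists month6 before subject, so month6 becomes the first column of the report. It again assumes the work.fakeData data set from earlier in the deck.

```sas
proc sql;
    /* columns print in the order they are listed here,
       not in the order they are stored in the data set */
    select month6, month1, subject
        from work.fakeData;
quit;
```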

SQL reports – rename/label Renaming and Labels
You can rename a variable in the list with the as keyword. You can also specify variable labels.
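A sketch of renaming and labeling in the select list follows. It reuses the names from the data step version (January, June, and their labels) and assumes the work.fakeData data set. The slide's own query is not in the transcript.

```sas
proc sql;
    select subject,
           /* "as" gives the new name; label= sets the column heading */
           month1 as January label = "First Month",
           month6 as June    label = "Last Month"
        from work.fakeData;
quit;
```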

SQL reports – format Using Formats Labels affect column headings and similar titles, and formats affect how values appear without changing the values themselves. Notice the lowercase i. The capitalization is set when the variable is created.
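A format can be attached in the same place as a label. The sketch below is an illustration on the earlier work.fakeData data set rather than the slide's own example. The 5.3 format is an arbitrary choice that prints three decimal places.

```sas
proc sql;
    select subject,
           /* the stored value is unchanged; only the display is rounded */
           month1 label = "First Month" format = 5.3
        from work.fakeData;
quit;
```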

SQL reports – format Preview of User Defined Formats
Note the $ means a character format.
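A minimal sketch of a user-defined character format, applied to the humans data set created earlier, is shown below. The format name $vital and the display text are hypothetical and chosen only for illustration.

```sas
proc format;
    /* the leading $ marks this as a format for character values */
    value $vital
        "true"  = "Deceased"
        "false" = "Alive";
run;

proc sql;
    /* a format name always ends with a period when it is used */
    select isDead format = $vital., age
        from work.humans;
quit;
```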

SQL tables blah Original table New table.
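Adding create table ... as in front of a query stores the result as a new table instead of printing it. The sketch below assumes the work.fakeData data set from earlier. The new table name firstLast is borrowed from the data step example and is not necessarily what this slide showed.

```sas
proc sql;
    /* the query result is written to work.firstLast; nothing is printed */
    create table work.firstLast as
        select subject,
               month1 as January,
               month6 as June
            from work.fakeData;
quit;
```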

SQL reports – table aliases More Tweaks
The from line references tables, which are in libraries. Complex queries require you to reference the table name over and over again. Instead of typing the long library and dataset names repeatedly, you can refer to a table by an alias.
Print the column called dude from the table blah, which is in the fakedata library.
Here the b. is optional because dude is only in one table (the query only uses one table).
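The query described above can be sketched as follows. It assumes a library named fakedata has already been assigned and holds a table blah with a column dude, as the slide text states.

```sas
proc sql;
    /* b is an alias for fakedata.blah; b.dude means "dude from that table" */
    select b.dude
        from fakedata.blah as b;
quit;
```

With only one table in the query, select dude would give the same report. The prefix starts to matter once two tables with shared column names are combined.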

Rename label and format variables Data Step Version….