Topics in Data Management: SAS Data Step



Combining Data Sets I - SET Statement
Data are available on common variables from different sources: multiple datasets with common variable names, possibly with different sampling/experimental units.
– Exam scores from students in various sections of STA 2023
– County-level data from different state databases
– Flight departure/arrival data from different months

Combining Data Sets I - SET Statement

   options nodate nonumber ps=54 ls=80;

   data one;
      input student $ 1-8 idnum 9-12 exam1 exam2 exam3;
      section=1;
      cards;
Amy ...
Zed ...
;
   run;

   data five;
      input student $ 1-8 idnum 9-12 exam1 exam2 exam3;
      section=5;
      cards;
Alex ...
Zach ...
;
   run;

   * Stack the two sections: all rows of ONE, then all rows of FIVE;
   data all;
      set one five;
   run;

   proc print;
   run;
   quit;

Combining Data Sets I - SET Statement

The SAS System

   Obs   student   idnum   exam1   exam2   exam3   section
     1   Amy       ...
   ...   Zed       ...
   ...   Alex      ...
   ...   Zach      ...

Combining Data Sets II - MERGE Statement
Data are on common sampling/experimental units, with different variables/characteristics measured in different datasets.
– County data from different government sources
– Store sales data updated over time

Combining Data Sets II - MERGE Statement

   options nodate nonumber ps=54 ls=80;

   data s2003;
      input store $ 1-8 sales03;
      cards;
Atlanta 1459
Zurich 1383
;
   run;

   data s2004;
      input store $ 1-8 sales04;
      cards;
Atlanta 1459
Zurich 1383
;
   run;

   * Both datasets must be sorted by the matching variable before MERGE;
   proc sort data=s2003; by store; run;
   proc sort data=s2004; by store; run;

   data s0304;
      merge s2003 s2004;
      by store;
   run;

   proc print;
   run;
   quit;

The SAS System

   Obs   store     sales03   sales04
     1   Atlanta   ...       ...
   ...   Zurich    ...       ...

Creating New Variables From Existing Ones
Creating a final grade for students (Exams 1 and 2 each count 30%, Exam 3 counts 40%)
– Total = (0.3*Exam1)+(0.3*Exam2)+(0.4*Exam3)
Obtaining sales growth (%) for stores
– Grow0403 = 100*(sales04-sales03)/sales03

Grades Example

   data all;
      set one five;
      total=(0.3*exam1)+(0.3*exam2)+(0.4*exam3);
   run;

   proc print;
      var student idnum total;
   run;
   quit;

The SAS System

   Obs   student   idnum   total
     1   Amy       ...
   ...   Zed       ...
   ...   Alex      ...
   ...   Zach      ...
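The sales-growth formula from the previous slide can be applied in the same way as the grade calculation. A minimal sketch, assuming a merged dataset with variables sales03 and sales04 like S0304 from the MERGE slide. The dataset name GROWTH and the 2004 figures are made up for illustration; the 2003 figures are the ones shown on the MERGE slide.

```sas
/* Hypothetical stand-in for the merged dataset S0304.        */
/* The sales04 values are invented for illustration only.     */
data s0304;
   input store $ 1-8 sales03 sales04;
   datalines;
Atlanta 1459 1532
Zurich  1383 1314
;
run;

/* Percent growth from 2003 to 2004, as defined on the slide. */
data growth;
   set s0304;
   grow0403 = 100*(sales04 - sales03)/sales03;
run;

proc print data=growth;
   var store sales03 sales04 grow0403;
run;
```

A store whose sales fall between the two years gets a negative grow0403, and a store with a missing or zero sales03 gets a missing value.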

Building Case Histories
We have multiple observations of the same variable on individual units (not necessarily the same number across individuals) and want to summarize the measurements for each individual to obtain a single “record”.
– Summary of all Delta flights for each ATL route to other cities for October 2004
– Arrest record for juveniles over a 5-year period
– Sales histories for individual stores in a retail chain

Building Case Histories
Step 1: SORT the dataset on the variable(s) that define(s) the individual units/cases.
Step 2: SET the sorted dataset into a new one, using the same BY statement as in the SORT.
– The new dataset “sees” the old dataset as a series of “blocks” of measurements by individual cases
Step 3: Define any variables you want to use to summarize cases in a RETAIN statement.
Step 4: At the beginning of each individual, reset the variables from Step 3 (typically to 0).
Step 5: At the end of each individual, OUTPUT the record.
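The five steps can be sketched on a toy dataset. This is a minimal illustration, not the deck's own code: the dataset names RECORDS and HISTORY, the variables n, total, and mean, and the data values are all hypothetical.

```sas
/* Hypothetical data: several sales records per store, unequal counts. */
data records;
   input storeid sales;
   datalines;
2 40
1 10
2 60
1 30
1 20
;
run;

/* Step 1: sort by the variable that defines the cases. */
proc sort data=records;
   by storeid;
run;

data history;
   /* Step 2: read the sorted data with the same BY statement. */
   set records;
   by storeid;
   /* Step 3: carry the summary variables across observations. */
   retain n total;
   /* Step 4: reset them at the first record of each case. */
   if first.storeid then do;
      n = 0;
      total = 0;
   end;
   n = n + 1;
   total = total + sales;
   /* Step 5: write one record at the last observation of each case. */
   if last.storeid then do;
      mean = total/n;
      output;
   end;
   keep storeid n total mean;
run;

proc print data=history;
run;
```

With these values, HISTORY has one row per store: store 1 with n=3, total=60, mean=20, and store 2 with n=2, total=100, mean=50.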

Example - Brookstone Store Sales & Inventory
8 EXCEL spreadsheets: 4 quarters X 2 measures
520 stores observed over 52 weeks
Typical spreadsheet portion (4 stores X 6 weeks):
Note that the company provides 13 columns, representing the 13 weeks in the quarter, for each store, which is not the layout we want for analysis. Commas were also removed in EXCEL before exporting to a text file.

Reading the Data in SAS

   data inv1;
      infile 'filename';
      * The trailing @ holds the record so the loop can keep reading from it;
      input storeid 6-8 storename $ @;
      do week=1 to 13;
         input inv @;
         output;
      end;
   run;

This creates 13 “observations” per store and a single inv variable.
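The pattern of turning one wide record into several observations can be tried on inline data. A minimal sketch, assuming a simplified layout with the store id in columns 1-3 followed by only 3 weekly values. The dataset name INV_LONG, the layout, and the numbers are hypothetical, not the Brookstone file.

```sas
/* Hypothetical layout: store id in columns 1-3, then 3 weekly values. */
data inv_long;
   input storeid 1-3 @;      /* trailing @ holds the current record      */
   do week = 1 to 3;
      input inv @;           /* read the next value from the held record */
      output;                /* write one observation per store-week     */
   end;
   datalines;
101 12 15 11
102 20 18 25
;
run;

proc print data=inv_long;
run;
```

Each input record yields three observations with the same storeid and week running from 1 to 3. The held record is released automatically when the DATA step starts its next iteration.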

Reading the Data in SAS
(Diagram with panels labeled SET and MERGE; image not preserved.)

Building a Store Record for Year
Suppose management wants the following summary measures for each store:
– Total sales
– Average sales-to-inventory ratio
– Mean and standard deviation of sales
– Correlation between sales and inventory
We need the following quantities summed across weeks:
– SALES, SALES^2, INV, INV^2, SALES*INV, SALES/INV
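These six sums are sufficient because the mean, standard deviation, covariance, and correlation can all be written in terms of them. A sketch of the standard computational formulas, writing S_w and I_w for sales and inventory in week w and n = 52 for the number of weeks (these symbols are introduced here for illustration and do not appear on the slides):

```latex
\begin{align*}
\bar{S} &= \frac{1}{n}\sum_{w=1}^{n} S_w,
\qquad
\overline{S/I} = \frac{1}{n}\sum_{w=1}^{n} \frac{S_w}{I_w}, \\
s_S^2 &= \frac{1}{n-1}\left[\sum_{w=1}^{n} S_w^2
         - \frac{1}{n}\Big(\sum_{w=1}^{n} S_w\Big)^{2}\right],
\qquad
s_I^2 = \frac{1}{n-1}\left[\sum_{w=1}^{n} I_w^2
         - \frac{1}{n}\Big(\sum_{w=1}^{n} I_w\Big)^{2}\right], \\
s_{SI} &= \frac{1}{n-1}\left[\sum_{w=1}^{n} S_w I_w
         - \frac{1}{n}\Big(\sum_{w=1}^{n} S_w\Big)\Big(\sum_{w=1}^{n} I_w\Big)\right],
\qquad
r_{SI} = \frac{s_{SI}}{s_S\, s_I}.
\end{align*}
```

The divisor n - 1 = 51 gives the usual sample variance and covariance; it cancels in the correlation.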

SAS Code to Obtain Measures by Store (P1)

   * Stack the four quarterly files for each measure;
   data inv; set inv1-inv4; run;
   proc sort; by storeid; run;

   data sales; set sales1-sales4; run;
   proc sort; by storeid; run;

   * Join inventory and sales by store;
   data invsales; merge inv sales; by storeid; run;
   proc sort; by storeid; run;

   data invsales1;
      set invsales;
      by storeid;
      retain sumsales sumsales2 suminv suminv2 salesxinv sales_inv;

      * Reset the running sums at the first week of each store;
      if first.storeid then do;
         sumsales=0; sumsales2=0;
         suminv=0; suminv2=0;
         salesxinv=0; sales_inv=0;
      end;

      * Accumulate the sums across weeks;
      sumsales=sumsales+sales;
      sumsales2=sumsales2+(sales**2);
      suminv=suminv+inv;
      suminv2=suminv2+(inv**2);
      salesxinv=salesxinv+(sales*inv);
      sales_inv=sales_inv+(sales/inv);

      * At the last week of each store, compute the summaries and write one record;
      if last.storeid then do;
         totsales=sumsales;
         meansal_inv=sales_inv/52;
         meansales=totsales/52;
         varsales=(sumsales2-(sumsales**2)/52)/51;
         stdsales=sqrt(varsales);
         varinv=(suminv2-(suminv**2)/52)/51;
         stdinv=sqrt(varinv);
         covslinv=(salesxinv-(sumsales*suminv)/52)/51;
         corrslinv=covslinv/(stdsales*stdinv);
         output;
      end;
   run;
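The hand-computed summaries can be cross-checked against SAS procedures. A minimal sketch, assuming a dataset with one row per store-week and variables storeid, sales, and inv, already sorted by storeid. The dataset DEMO and its values are hypothetical stand-ins for INVSALES.

```sas
/* Hypothetical stand-in for INVSALES: 2 stores, 4 weeks each. */
data demo;
   input storeid sales inv;
   datalines;
1 10 40
1 12 42
1 9 35
1 15 50
2 20 60
2 18 66
2 25 70
2 22 61
;
run;

/* Total, mean, and standard deviation of sales for each store. */
proc means data=demo sum mean std;
   by storeid;
   var sales;
run;

/* Pearson correlation between sales and inventory for each store. */
proc corr data=demo;
   by storeid;
   var sales inv;
run;
```

PROC MEANS uses the n - 1 divisor for the standard deviation by default, which matches the division by 51 in the DATA step. The mean sales-to-inventory ratio has no direct counterpart here; it would need a ratio variable computed in a DATA step first.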