Chapter 6: Modifying and Combining Data Sets

Chapter 6: Modifying and Combining Data Sets
- The SET statement is a powerful statement in the DATA step:
    DATA newdatasetname;
      SET olddatasetname;
      ..
    RUN;
© Fall 2011 John Grego and the University of South Carolina

The SET statement
- Its main use is to read a previously created SAS data set (in WORK or another library) so that it can be modified and saved as a new data set
- We can stack (concatenate) multiple data sets by listing several of them in one SET statement
- If one data set contains variables that the others lack, the observations from the other data sets will have missing values for those variables in the combined data set
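
The stacking behavior described above can be sketched with two small data sets. The names north, south, combined, and the variables id, sales, and region are made up for illustration; they are not from the lecture.

```sas
/* Hypothetical input data sets; all names are illustrative only */
DATA north;
  INPUT id sales;
  DATALINES;
1 100
3 300
;
RUN;

DATA south;               /* south has an extra variable, region */
  INPUT id sales region $;
  DATALINES;
2 200 S
4 400 S
;
RUN;

/* Stack: all observations from north, followed by all from south */
DATA combined;
  SET north south;
RUN;

PROC PRINT DATA=combined;
RUN;
```

In the printed result, the observations that came from north should show a blank region, because north does not contain that variable.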

The SET statement
- If the input data sets are sorted by a specific variable, stacking them may not preserve the sort order
- To preserve the sort order, we can interleave the data sets by adding a BY statement after SET (every input data set must already be sorted by the BY variable)
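
A minimal sketch of interleaving, assuming two hypothetical data sets (north and south) that share the variable id. The names are illustrative only.

```sas
/* Hypothetical, deliberately unsorted inputs */
DATA north;
  INPUT id sales;
  DATALINES;
3 300
1 100
;
RUN;

DATA south;
  INPUT id sales;
  DATALINES;
4 400
2 200
;
RUN;

/* Each input must be sorted by the BY variable first */
PROC SORT DATA=north; BY id; RUN;
PROC SORT DATA=south; BY id; RUN;

/* SET plus BY interleaves, so the result stays in id order */
DATA interleaved;
  SET north south;
  BY id;
RUN;
```

Without the BY statement, the same SET statement would simply place all of north ahead of all of south.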

Merging Data Sets
- When observations in two or more data sets are connected by at least one common variable, it is possible to merge the data sets together:
    DATA combineddataname;
      MERGE dataset1 .. datasetk;
      BY common_variable;

Merging Data Sets
- Note: If the data sets have identically named variables (other than the BY variable), the merged data set keeps only the values from the last data set listed in the MERGE statement
- All data sets must be sorted by the BY variable before they can be merged

Merging Data Sets
- We can also merge each observation in a smaller data set with several observations from a larger data set (one-to-many match-merge)
- MERGE without a BY statement places the input data sets side by side, pairing observations by position (compare with the SET statement, which stacks them)
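
As an illustration of a one-to-many match-merge, the sketch below attaches a price to every order line. The data sets prices and orders and their variables are assumptions made for this example.

```sas
/* Hypothetical lookup table: one observation per product code */
DATA prices;
  INPUT code $ price;
  DATALINES;
A 2.50
B 4.00
;
RUN;

/* Hypothetical detail table: several observations per product code */
DATA orders;
  INPUT code $ qty;
  DATALINES;
B 3
A 10
A 1
;
RUN;

/* Both inputs must be sorted by the BY variable */
PROC SORT DATA=prices; BY code; RUN;
PROC SORT DATA=orders; BY code; RUN;

/* Each price is matched to every order with the same code */
DATA priced_orders;
  MERGE orders prices;
  BY code;
  total = qty * price;
RUN;
```

Because the two inputs share only the BY variable code, no values are overwritten during the merge.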

Merging Data Sets
- The IN= data set option is typically used to track which data set an observation in a combined data set came from
- Variables named in the IN= option exist only during that DATA step, but they can be used to create other variables
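
The following sketch shows one way the IN= option might be used. The data sets customers and payments and the flag names in_cust and in_pay are hypothetical.

```sas
/* Hypothetical inputs; not every custid appears in both */
DATA customers;
  INPUT custid name $;
  DATALINES;
1 Ann
2 Bob
3 Cy
;
RUN;

DATA payments;
  INPUT custid amount;
  DATALINES;
1 50
3 75
4 20
;
RUN;

PROC SORT DATA=customers; BY custid; RUN;
PROC SORT DATA=payments;  BY custid; RUN;

DATA matched;
  MERGE customers (IN=in_cust) payments (IN=in_pay);
  BY custid;
  /* in_cust and in_pay are temporary; copy them if they are needed later */
  from_both = (in_cust AND in_pay);
  /* subsetting IF: keep only observations found in customers */
  IF in_cust;
RUN;
```

Here custid 4 is dropped because it has no customer record, and from_both is 0 for custid 2, who has no payment.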

Merging Summary Statistics and Data
- Often we want to merge summary statistics (for the entire data set or, more often, for groups within the data set) with the observations themselves
- First calculate the summary statistics using PROC MEANS (after sorting if necessary)

Merging Summary Statistics and Data
- Output the summary statistics to another data set with an OUTPUT statement
- Give the statistics meaningful names in this output data set
- Use a MERGE statement to combine the original data with the output data set from PROC MEANS

Merging Summary Statistics and Data
- Once summary statistics are merged with the original data, we can calculate centered observations, standardized observations, or data expressed as a percentage of BY-group sums
- This is done by transforming the data through functions involving the summary statistics
- This can be tricky in R too (I have to use match() twice to make it work)
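
The three steps (summarize, output, merge) can be sketched as follows. The data set sales, the grouping variable region, and the statistic names are assumptions chosen for this example.

```sas
/* Hypothetical data: sales amounts within regions */
DATA sales;
  INPUT region $ amount;
  DATALINES;
East 10
East 30
West 20
West 60
;
RUN;

PROC SORT DATA=sales; BY region; RUN;

/* Step 1-2: compute group statistics and save them with meaningful names */
PROC MEANS DATA=sales NOPRINT;
  BY region;
  VAR amount;
  OUTPUT OUT=region_stats MEAN=mean_amount STD=std_amount SUM=sum_amount;
RUN;

/* Step 3: merge the statistics back and transform the data */
DATA sales_plus;
  MERGE sales region_stats (DROP=_TYPE_ _FREQ_);
  BY region;
  centered      = amount - mean_amount;
  standardized  = centered / std_amount;
  pct_of_region = 100 * amount / sum_amount;
RUN;
```

The DROP= option removes the bookkeeping variables _TYPE_ and _FREQ_ that PROC MEANS adds to its output data set.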

Merging the Grand Total with the Original Data
- When PROC MEANS is used without a BY statement, you get the grand total, the grand mean, etc., rather than summary statistics for each BY group

Merging the Grand Total with the Original Data
- Merging is more difficult because the original data and the summary data do not have a common variable
- We need to trick SAS with the SET statement:
    DATA newdataset;
      IF _N_ = 1 THEN SET summarydataset;
      SET olddataset;

Merging the Grand Total with the Original Data
- Variables read from the summary data set by the first SET statement are retained for all observations
- This is a general trick for combining one (or a few) observations with many when no common variable is present
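
A short sketch of the grand-total trick, using a hypothetical data set sales with one numeric variable amount. The output names grand and grand_total are also assumptions.

```sas
/* Hypothetical data */
DATA sales;
  INPUT region $ amount;
  DATALINES;
East 10
East 30
West 20
West 60
;
RUN;

/* No BY statement, so the output has a single observation */
PROC MEANS DATA=sales NOPRINT;
  VAR amount;
  OUTPUT OUT=grand SUM=grand_total;
RUN;

DATA sales_pct;
  /* Read the one summary observation on the first iteration only; */
  /* variables read by SET are retained across iterations          */
  IF _N_ = 1 THEN SET grand (KEEP=grand_total);
  SET sales;
  pct_of_total = 100 * amount / grand_total;
RUN;
```

The conditional SET matters: an unconditional second SET on a one-observation data set would end the DATA step after the first observation.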

Merging the Grand Total with the Original Data
- The UPDATE statement is similar to MERGE, but it is typically used when a data set changes over time: new variables are added, values of variables change for old observations, etc. (See Pages )
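
As a rough illustration of UPDATE, the sketch below applies a transaction data set to a master data set. The names master, trans, acct, balance, and city are hypothetical.

```sas
/* Hypothetical master data set: one observation per account */
DATA master;
  INPUT acct balance city $;
  DATALINES;
101 500 Aiken
102 250 Sumter
;
RUN;

/* Hypothetical transactions: a changed balance and a new account */
DATA trans;
  INPUT acct balance city $;
  DATALINES;
102 300 .
103 100 Camden
;
RUN;

PROC SORT DATA=master; BY acct; RUN;
PROC SORT DATA=trans;  BY acct; RUN;

/* UPDATE requires a BY statement; the master is listed first */
DATA master_new;
  UPDATE master trans;
  BY acct;
RUN;
```

By default, missing values in the transaction data set do not overwrite existing values in the master, so account 102 should keep its city while its balance changes.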

Data Set Options
- We have already seen many of these options in our earlier examples
- System options are specified in the OPTIONS statement (they affect how SAS operates, often its output formatting)

Data Set Options
- Statement options affect the running of a step
- NOPRINT is often used when you use a PROC to create an output data set
- Examples:
  - NOPRINT option in PROC MEANS
  - NOWINDOWS option in PROC REPORT
  - DATA=.. option in any procedure

Data Set Options
- Data set options affect how a data set is read or written
- We can use these data set options in DATA steps (with statements like DATA, SET, MERGE, UPDATE)

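Data Set Options
- We can also use data set options in PROC steps (with the DATA=.. option)
- Data set options go in parentheses right after the data set name, with no semicolons between them:
    KEEP=keepvar1 .. keepvark
    DROP=dropvar1 .. dropvark
    RENAME=(oldname=newname)
    FIRSTOBS=firstrownumber
    OBS=lastrownumber (the number of the last observation to process, not a count of observations)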
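
The options listed above might be combined as in this sketch. The data set scores and its variables are invented for the example.

```sas
/* Hypothetical data set with five observations */
DATA scores;
  INPUT id test1 test2 test3;
  DATALINES;
1 80 85 90
2 70 75 72
3 88 91 95
4 60 65 70
5 77 79 81
;
RUN;

/* In a DATA step: read observations 2 through 4, drop test3, rename test1 */
DATA subset;
  SET scores (FIRSTOBS=2 OBS=4 DROP=test3 RENAME=(test1=pretest));
RUN;

/* In a PROC step: print only two variables for the first two observations */
PROC PRINT DATA=scores (KEEP=id test3 OBS=2);
RUN;
```

Note that FIRSTOBS=2 with OBS=4 reads three observations (2, 3, and 4), since OBS= names the last observation to process.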

Creating several data sets with the OUTPUT statement
- A single DATA step can create several SAS data sets (this is a trick I don’t use nearly enough)
- The DATA statement must list multiple data set names:
    DATA set1 set2 set3;

Creating several data sets with the OUTPUT statement
- The OUTPUT statement is often used with IF-THEN statements or within a DO loop:
    IF .. THEN OUTPUT set1;
    ELSE OUTPUT set2;
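
A minimal sketch of splitting one data set into several, assuming a hypothetical data set animals with a character variable class.

```sas
/* Hypothetical data */
DATA animals;
  INPUT name $ class $;
  DATALINES;
otter mammal
heron bird
gecko reptile
bat mammal
;
RUN;

/* One DATA step, three output data sets */
DATA mammals birds other;
  SET animals;
  IF class = 'mammal' THEN OUTPUT mammals;
  ELSE IF class = 'bird' THEN OUTPUT birds;
  ELSE OUTPUT other;
RUN;
```

Each observation is written to exactly one of the three data sets, because the conditions form a single IF-THEN/ELSE chain.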

Creating several data sets with the OUTPUT statement
- The OUTPUT statement can also be used to create several observations from one
- It transforms “wide” data sets into “long” data sets
- It is often used with repeated-measures data (several values observed for each individual)
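
The wide-to-long idea can be sketched with a hypothetical repeated-measures data set wide, in which each subject has three weekly weights. All names are illustrative.

```sas
/* Hypothetical wide data: one observation per subject */
DATA wide;
  INPUT subject week1 week2 week3;
  DATALINES;
1 150 148 145
2 200 198 199
;
RUN;

/* Write three observations for every one that is read */
DATA long (KEEP=subject week weight);
  SET wide;
  week = 1; weight = week1; OUTPUT;
  week = 2; weight = week2; OUTPUT;
  week = 3; weight = week3; OUTPUT;
RUN;
```

The result has six observations, one per subject per week.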

Creating several data sets with the OUTPUT statement
- OUTPUT is also useful for generating function values
- Used in a DO loop, OUTPUT tells SAS to create an observation at each iteration of the loop
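
For example, the following sketch tabulates a simple function over a grid. The data set name curve and the function y = x squared are arbitrary choices for illustration.

```sas
/* No input data are needed; the DO loop generates the observations */
DATA curve;
  DO x = -2 TO 2 BY 0.5;
    y = x**2;
    OUTPUT;      /* write one observation per iteration */
  END;
RUN;

PROC PRINT DATA=curve;
RUN;
```

This produces nine observations, one for each value of x from -2 to 2 in steps of 0.5. Without the explicit OUTPUT, SAS would write only a single observation at the end of the DATA step.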

Using PROC TRANSPOSE to Flip Observations and Variables
    PROC TRANSPOSE DATA=.. OUT=..;   (OUT= names the new transposed data set)
      BY ..;    (identifies variables you don’t want transposed)
      ID ..;    (values of this variable will become variable names)
      VAR ..;   (the values of these variables will be transposed, placed as rows for each level of the BY variable)

Using PROC TRANSPOSE to Flip Observations and Variables
- We must first sort by the BY variable
- Note: If the ID statement is omitted, the newly created variables get the default names COL1, COL2, etc.
- PROC TRANSPOSE is handy for converting “wide” data files into “long” data files (or vice versa), especially with longitudinal data
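
A sketch of a long-to-wide transpose, assuming a hypothetical data set long with one weight per subject per week. The PREFIX= option is used here so that the numeric ID values yield the names week1 and week2.

```sas
/* Hypothetical long data: one observation per subject per week */
DATA long;
  INPUT subject week weight;
  DATALINES;
1 1 150
1 2 148
2 1 200
2 2 198
;
RUN;

PROC SORT DATA=long; BY subject; RUN;

PROC TRANSPOSE DATA=long OUT=wide (DROP=_NAME_) PREFIX=week;
  BY subject;    /* one output observation per subject */
  ID week;       /* values of week become part of the new variable names */
  VAR weight;    /* the values being moved into columns */
RUN;
```

The _NAME_ variable, which records the name of the transposed variable, is dropped because only one variable is transposed.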

Automatic Variables in SAS
- During the DATA step, SAS creates temporary “automatic” variables
- These are not typically saved as part of the data set, but they can be used in the DATA step

Automatic Variables in SAS
- _N_ keeps track of the number of times SAS has looped through the DATA step (usually the number of observations that have been read)
- It may differ from the observation number if the data have been subsetted
- The automatic variable _ERROR_ is binary: 1 if the observation has an error, 0 otherwise

Automatic Variables in SAS
- FIRST.groupvariable is 1 for the first observation with a new value of groupvariable; 0 otherwise
- LAST.groupvariable is 1 for the last observation with a given value of groupvariable; 0 otherwise

Automatic Variables in SAS
- These variables can be useful for picking out the highest or lowest value for each level of groupvariable
- Sort by groupvariable (and then by the variable of interest), then use a subsetting IF along with a BY statement to keep only the first or last occurrence for each level of groupvariable
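
The approach just described can be sketched with a hypothetical data set races, keeping each runner's fastest time. The names races, runner, time, and best are illustrative only.

```sas
/* Hypothetical data: several times per runner */
DATA races;
  INPUT runner $ time;
  DATALINES;
Ann 25.1
Bob 27.3
Ann 24.8
Bob 26.9
Ann 25.6
;
RUN;

/* Sort by group, then by time, so the fastest time comes first in each group */
PROC SORT DATA=races; BY runner time; RUN;

DATA best;
  SET races;
  BY runner;          /* the BY statement creates FIRST.runner and LAST.runner */
  IF FIRST.runner;    /* subsetting IF: keep the first observation per runner */
RUN;
```

The data set best holds one observation per runner: 24.8 for Ann and 26.9 for Bob. Replacing FIRST.runner with LAST.runner would keep each runner's slowest time instead.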