Appending and Concatenating Files


APPEND procedure
Uses two data sets.
Uses all variables in the BASE= data set and assigns missing values to observations from the DATA= data set where appropriate. Cannot include variables found only in the DATA= data set.

DATA step SET statement
Uses any number of data sets.
Uses all variables and assigns missing values where appropriate.

With appending, the base data set is not read, so its descriptor portion cannot change. That is not the case with concatenating.
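The contrast above can be sketched with two small data sets. This is a minimal illustration, not code from the slides: the data set names first and second, the variable DBP, and all data values are hypothetical. The FORCE option is included on the assumption that the DATA= data set carries a variable the BASE= data set lacks, as it does here.

```sas
/* Hypothetical example data: SECOND has one variable (DBP) that FIRST lacks */
data first;
  input id sbp;
  datalines;
1 120
2 135
;
run;

data second;
  input id sbp dbp;
  datalines;
3 128 80
;
run;

/* Concatenating: COMBINED contains ID, SBP, and DBP.                 */
/* DBP is set to missing for the observations that come from FIRST.   */
data combined;
  set first second;
run;

/* Appending: FIRST keeps its original variables (ID and SBP).        */
/* FORCE lets the append proceed; DBP is dropped with a log warning.  */
proc append base=first data=second force;
run;
```

Running PROC CONTENTS on combined and on first afterward should show the difference in the descriptor portion: combined gains DBP, while first does not.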

DATA dsname;
  SET SAS-data-set1 SAS-data-set2 . . . ;
  /* <SAS statements> */
RUN;

Example: five data sets

proc contents data=class.sbp1; run;
proc contents data=class.sbp2; run;
proc contents data=class.sbp3; run;
proc contents data=class.sbp4; run;
proc contents data=class.sbp5; run;

Data are from a longitudinal study of over 5,000 participants. The variable is systolic blood pressure (there were many more variables).

How one formats the files for longitudinal data depends on the analysis one is doing. Sometimes one wants all the data for an individual in a single row. Sometimes one wants a separate row for each measurement.

Wide format: each row contains all the information for a single participant, and the variables are often numbered by measurement occasion.

ID  SBP1  SBP2  SBP3  SBP4  SBP5

Long format: each row contains a single measurement and the time it was taken, so each participant has multiple rows in the file.

Columns: ID, measurement number (1 to 5), SBP

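Wide Format

/* Each file must be sorted by ID before the merge. The slide shows only */
/* the first sort; the other four are assumed to follow the same pattern. */
proc sort data=class.sbp1 out=sbp1; by id; run;
proc sort data=class.sbp2 out=sbp2; by id; run;
proc sort data=class.sbp3 out=sbp3; by id; run;
proc sort data=class.sbp4 out=sbp4; by id; run;
proc sort data=class.sbp5 out=sbp5; by id; run;

data wide;
  merge sbp1 sbp2 sbp3 sbp4 sbp5;
  by id;
run;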

This would be the format for examining correlation among the measurements.

proc corr data=wide;
  var sbp:;
run;

Long Format

data long (keep=id exam sbp);
  set sbp1 (in=a rename=(sbp1=sbp))
      sbp2 (in=b rename=(sbp2=sbp))
      sbp3 (in=c rename=(sbp3=sbp))
      sbp4 (in=d rename=(sbp4=sbp))
      sbp5 (in=e rename=(sbp5=sbp));
  /* The IN= flags identify which input file supplied the current row */
  if a then exam=1;
  else if b then exam=2;
  else if c then exam=3;
  else if d then exam=4;
  else exam=5;
run;

proc sort data=long; by id exam; run;

proc print data=long noobs; run;
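The long file can also be built from the merged wide file instead of from the five separate files. The sketch below is an alternative technique, not the approach shown on the slide. It assumes the data set wide already exists with variables ID and SBP1 through SBP5; the output name long_from_wide and the array name bp are hypothetical.

```sas
/* Alternative sketch: reshape WIDE into long format with an array */
data long_from_wide (keep=id exam sbp);
  set wide;
  array bp{5} sbp1-sbp5;   /* bp{1} refers to SBP1, ..., bp{5} to SBP5 */
  do exam = 1 to 5;
    sbp = bp{exam};        /* copy the measurement for this exam */
    output;                /* write one row per participant per exam */
  end;
run;
```

One design difference is worth noting. This version writes five rows for every participant, with SBP missing where a measurement was not taken, whereas concatenating the separate files produces a row only for the observations that actually appear in each file.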

First 15 rows

This is the format required for numerous longitudinal analyses, including "spaghetti" plots.

The SG Procs

Proc SGPLOT

proc sgplot data=long;
  series x=exam y=sbp / group=id;
run;

Successive refinements of the SERIES statement:

  series x=exam y=sbp / group=id markers;
  series x=exam y=sbp / group=id markers markerattrs=(symbol=circlefilled);
  series x=exam y=sbp / group=id markers markerattrs=(symbol=circlefilled size=12) lineattrs=(thickness=3);

Title and axis statements:

  title "Spaghetti Plot for 15 Participants";
  title;
  xaxis label="Measurement Number" labelattrs=(family=swiss weight=bold);
  yaxis label="Systolic Blood Pressure" labelattrs=(family=swiss weight=bold);
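Putting the pieces together, a complete step might look like the sketch below. Every statement and option comes from the fragments above; only their arrangement is an assumption, in particular placing the TITLE statement before the procedure and the null TITLE statement after it to clear the title. The slides do not show how the data were restricted to 15 participants, so that step is not included here.

```sas
/* Assembled sketch: statement order is assumed, options are from the slides */
title "Spaghetti Plot for 15 Participants";

proc sgplot data=long;
  /* One line per participant, with filled markers at each exam */
  series x=exam y=sbp / group=id markers
         markerattrs=(symbol=circlefilled size=12)
         lineattrs=(thickness=3);
  xaxis label="Measurement Number" labelattrs=(family=swiss weight=bold);
  yaxis label="Systolic Blood Pressure" labelattrs=(family=swiss weight=bold);
run;

title;  /* clear the title so it does not carry over to later output */
```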