Lesson 6 - Topics Reading SAS datasets Subsetting SAS datasets Merging SAS datasets.



Working With SAS Data Sets
Reading a SAS dataset
– SET statement
Merging SAS datasets
– MERGE statement
Both are done within a DATA step

SET Statement
Reads a SAS data set
Replaces the INFILE and INPUT statements used when reading in raw data
KEEP brings in selected variables (columns)
WHERE brings in selected observations (rows)

    DATA new;
      SET old (KEEP = varlist);
      WHERE condition;
    RUN;

This creates a new data set called new that has the variables in varlist and the selected observations from old.
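As a concrete illustration of the SET template, the sketch below reads the sample dataset SASHELP.CLASS. It assumes the SASHELP sample library is installed, which is typical but not guaranteed; the output name teens and the age cutoff are made up for the example.

```sas
* Keep three columns and only the rows where age is 13 or older;
DATA teens;
  SET sashelp.class (KEEP = name sex age);
  WHERE age >= 13;
RUN;

PROC PRINT DATA=teens;
  TITLE 'Students Aged 13 or Older';
RUN;
```

WHERE can only reference variables that already exist in the input dataset; a variable computed inside the DATA step needs a subsetting IF instead.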

PROGRAM 9 - Making SAS Datasets from Other SAS Datasets

    DATA tdata;
      INFILE 'C:\SAS_Files\tomhs.data';
      INPUT @1 ptid @12 clinic @25 group @30 sex @123 sbp12 @14 randdate $10. ;
    RUN;

    * Making a new dataset containing only men;
    DATA men;
      SET tdata;          * reads the existing dataset;
      WHERE sex = 1;      * this does the selection;
      if group in(1,2,3,4,5) then active = 1;
      else if group in(6) then active = 2;
      KEEP ptid clinic group sbp12 randdate active;
    RUN;

    * Making a new dataset containing only women;
    DATA women;
      SET tdata;
      WHERE sex = 2;
      if group in(1,2,3,4,5) then active = 1;
      else if group in(6) then active = 2;
      KEEP ptid clinic group sbp12 randdate active;
    RUN;

We now have 3 "active" datasets: tdata, men, and women.
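The men and women steps repeat the same logic and each read tdata once. An alternative that the slides do not show is to name both output datasets on one DATA statement and route rows with OUTPUT. The sketch below is one way to do that; the small tdata is a hypothetical stand-in with invented values.

```sas
* Hypothetical stand-in for tdata (values invented for illustration);
DATA tdata;
  INPUT ptid $ sex group sbp12;
  DATALINES;
A001 1 3 128
A002 2 6 135
A003 1 6 142
A004 2 1 119
;

* One pass over tdata writes each row to either men or women;
DATA men women;
  SET tdata;
  IF group IN (1,2,3,4,5) THEN active = 1;
  ELSE IF group = 6 THEN active = 2;
  IF sex = 1 THEN OUTPUT men;
  ELSE IF sex = 2 THEN OUTPUT women;
RUN;
```

A subsetting IF with OUTPUT is used here because a WHERE statement would filter rows for both output datasets at once.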

PROGRAM 11 - Merging SAS Datasets

    DATA clinic;
      INFILE DATALINES;
      INPUT id $ sbp;
      DATALINES;
    C B B D A B B B A … more data
    ;

    DATA lab;
      INFILE DATALINES;
      INPUT id $ glucose;
      DATALINES;
    C B D A B B B A D … more data
    ;

    * Creating merged dataset;
    PROC SORT DATA=clinic; BY id;
    PROC SORT DATA=lab; BY id;

    DATA study;
      MERGE clinic lab;
      BY id;
    RUN;

Note: The BY statement is very important!
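A small self-contained version of the sort-then-merge pattern may help. The data values below are hypothetical and invented for illustration; only the dataset and variable names follow the slides.

```sas
* Hypothetical values, invented for illustration;
DATA clinic;
  INPUT id $ sbp;
  DATALINES;
C 84
B 130
A 120
;

DATA lab;
  INPUT id $ glucose;
  DATALINES;
C 95
B 101
D 88
;

* Both inputs must be sorted by the BY variable before merging;
PROC SORT DATA=clinic; BY id; RUN;
PROC SORT DATA=lab; BY id; RUN;

* Match rows from clinic and lab that share the same id;
DATA study;
  MERGE clinic lab;
  BY id;
RUN;

PROC PRINT DATA=study;
RUN;
```

With these inputs, study has one row per id (A, B, C, D). A has a missing glucose and D has a missing sbp because each appears in only one of the two datasets.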

Merged Dataset
PROC PRINT listing with columns Obs, id, sbp, glucose. The rows shown have ids A, A, A, B, B, B, B, B, D, D (numeric values not preserved).

What if you want observations that are in both datasets?

    DATA study;
      MERGE clinic (IN=in1) lab (IN=in2);
      BY id;
      if in1 and in2;
    RUN;

    PROC PRINT DATA=study;
      TITLE 'Patients with Clinic and Lab';
    RUN;

Logical Statements

    * Must be in 1st dataset;
    if in1;    * Same as: if in1 = 1;

    * Must be in 2nd dataset;
    if in2;

    * Must be in both datasets;
    if in1 and in2;
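The IN= flags can also route rows to separate datasets in a single merge. The following is a minimal sketch with invented data; the output names both, clinic_only, and lab_only are hypothetical.

```sas
* Hypothetical values, entered in id order so no PROC SORT is needed;
DATA clinic;
  INPUT id $ sbp;
  DATALINES;
A 120
B 130
C 84
;

DATA lab;
  INPUT id $ glucose;
  DATALINES;
B 101
C 95
D 88
;

* in1 is 1 when clinic contributes to the row, in2 is 1 when lab does;
DATA both clinic_only lab_only;
  MERGE clinic (IN=in1) lab (IN=in2);
  BY id;
  IF in1 AND in2 THEN OUTPUT both;
  ELSE IF in1 THEN OUTPUT clinic_only;
  ELSE OUTPUT lab_only;
RUN;
```

Here both receives ids B and C, clinic_only receives A, and lab_only receives D. The IN= variables are temporary and are not written to any output dataset.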

Things to Remember When Merging Datasets
You need a common variable name in each dataset to use as the linking variable
Variables from a dataset with no matching row will be set to missing
When matched rows share a variable name, the value from the right-most dataset on the MERGE statement is kept
Always remember the BY statement in the merge!
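The missing-value and right-most-value rules can be seen in a small sketch. The variable visit is a hypothetical addition that appears in both datasets, and all values are invented.

```sas
* visit exists in both datasets (hypothetical variable, invented values);
DATA clinic;
  INPUT id $ sbp visit $;
  DATALINES;
A 120 clinic
B 135 clinic
;

DATA lab;
  INPUT id $ glucose visit $;
  DATALINES;
B 98 lab
D 110 lab
;

* Inputs are already in id order, so the merge can run directly;
DATA study;
  MERGE clinic lab;
  BY id;
RUN;

PROC PRINT DATA=study;
RUN;
```

For id B, which is in both datasets, visit ends up as lab because lab is listed last on the MERGE statement. For id A, glucose is missing; for id D, sbp is missing.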

Temporary vs Permanent SAS Datasets

Temporary (or working) SAS dataset: after the SAS session is over, the dataset is deleted.

    DATA bp;    * bp is deleted after SAS session;
    (rest of program)

Permanent SAS dataset: after the program is run, the dataset is saved and is available for use in future programs. You need to tell SAS where to store/retrieve the dataset.

Note: For PC SAS the working dataset is available until you end the SAS session.
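A one-level dataset name such as bp is shorthand for a dataset in the temporary WORK library. The sketch below assumes the default configuration, in which no USER library has been defined; the variable and its value are invented.

```sas
* A one-level name is stored in the temporary WORK library;
DATA bp;
  sbp = 120;
RUN;

* The two-level name work.bp refers to the same dataset;
PROC PRINT DATA=work.bp;
RUN;
```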

Reasons to Create Permanent SAS Datasets
Read raw data and compute calculated variables only once
All variables have assigned names and labels
Data is ready to be analyzed
Dataset can be sent to other computers or users

Creating a Permanent Dataset

    LIBNAME mylib 'C:\My SAS Datasets';
    DATA mylib.sescore;

LIBNAME assigns a directory (folder) reference name. In this example the directory 'C:\My SAS Datasets' is assigned a reference name of mylib.

DATA mylib.sescore; tells SAS to create a dataset called sescore in the directory referenced by mylib, which is 'C:\My SAS Datasets'.
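A runnable sketch of the same idea follows. It assumes the folder C:\My SAS Datasets already exists on the machine; the dataset name demo, its variables, and its values are hypothetical.

```sas
* Assumes the folder C:\My SAS Datasets already exists;
LIBNAME mylib 'C:\My SAS Datasets';

* The two-level name writes demo.sas7bdat into that folder;
DATA mylib.demo;
  INPUT ptid $ score;
  DATALINES;
A001 1.5
A002 2.0
;

* List the datasets stored in the mylib library;
PROC DATASETS LIBRARY=mylib;
RUN;
QUIT;
```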

Examples of LIBNAME Statements

    LIBNAME mylib 'C:\My SAS Files';
    LIBNAME class 'C:\My SAS Files';
    LIBNAME ph6420 'C:\My SAS Files\SASClass\';

LIBNAME points to a directory (folder)

    DATA mylib.datasetname;
    DATA class.datasetname;
    DATA ph6420.datasetname;

On UNIX and PC the file will be called datasetname.sas7bdat

PROGRAM 11

    LIBNAME mylib 'C:\SAS_Files';

    DATA mylib.sescore;
      INFILE 'C:\SAS_Files\tomhs.data' LRECL=400;
      INPUT @1 ptid @12 clinic @14 randdate @25 group @49 educ
            @85 wtbl @97 wt12 sbpbl sbp12
            (sebl_1-sebl_20) (1. +1)
            (se12_1-se12_20) (1. +1) ;

      wtd12 = wt12 - wtbl;
      sbpd12 = sbp12 - sbpbl;
      sescrbl = MEAN (OF sebl_1 - sebl_20);
      sescr12 = MEAN (OF se12_1 - se12_20);
      sescrd12 = sescr12 - sescrbl;

      LABEL educ = 'Highest Education Level';
      LABEL wt12 = 'Weight (lbs) at 12 Months';
      LABEL wtbl = 'Weight (lbs) at Baseline';
      LABEL wtd12 = 'Weight Change at Baseline';
      LABEL sbpbl = 'Systolic BP (mmHg) at Baseline';
      LABEL sbp12 = 'Systolic BP (mmHg) at 12 Months';
      LABEL sbpd12 = 'Systolic BP Change at 12 Months';
      LABEL group = 'Treatment Group (1-6)';
      LABEL sescrbl = 'Side Effect at Baseline';
      LABEL sescr12 = 'Side Effect at 12 Months';
      LABEL sescrd12 = 'Side Effect Change Score';

      FORMAT randdate mmddyy10. ;
      DROP sebl_1-sebl_20 se12_1-se12_20;

SAS log:

    60   LIBNAME mylib 'C:\SAS_Files';
    NOTE: Libref MYLIB was successfully assigned as follows:
          Engine:        V9
          Physical Name: C:\SAS_Files

         DATA mylib.sescore;

    NOTE: The infile 'C:\SAS_Files\tomhs.data' is:
          File Name=C:\SAS_Files\tomhs.data,
          RECFM=V,LRECL=400
    NOTE: 100 records were read from the infile 'C:\SAS_Files\tomhs.data'.
    NOTE: The data set MYLIB.SESCORE has 100 observations and 14 variables.

    PROC CONTENTS DATA=mylib.sescore VARNUM;
      TITLE 'Description of Variables in Dataset SESCORE';
    RUN;

What is inside a SAS dataset?
– Data
– Names, labels, and formats of all variables

PROC CONTENTS reads the descriptor portion of the dataset
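To see labels and formats show up in the descriptor portion, one can attach them in a DATA step and then run PROC CONTENTS. This sketch assumes the SASHELP.CLASS sample dataset is available; the dataset name demo, the label text, and the format are illustrative choices.

```sas
* Copy a sample dataset and attach a label and a format to one variable;
DATA demo;
  SET sashelp.class;
  LABEL height = 'Student Height';
  FORMAT height 5.1;
RUN;

* VARNUM lists variables in creation order instead of alphabetically;
PROC CONTENTS DATA=demo VARNUM;
  TITLE 'Description of Variables in Dataset DEMO';
RUN;
```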

Description of Variables in Dataset SESCORE

The CONTENTS Procedure

    Data Set Name:  MYLIB.SESCORE                       Observations:          100
    Member Type:    DATA                                Variables:             14
    Engine:         V9                                  Indexes:               0
    Created:        10:59 Wednesday, August 11, 2004    Observation Length:    112
    Last Modified:  10:59 Wednesday, August 11, 2004    Deleted Observations:  0
    Protection:                                         Compressed:            NO
    Data Set Type:                                      Sorted:                NO
    Label:

    -----Engine/Host Dependent Information-----

    File Name:          C:\SAS_Files\sescore.sas7bdat
    Release Created:
    Host Created:       XP_PRO
    File Size (bytes):

Note: mylib is not a part of the dataset name

     #  Variable  Type  Len  Pos  Format     Label
     1  ptid      Char                       Patient ID
     2  clinic    Char                       Clinical Center
     3  randdate  Num     8    0  MMDDYY10.  Randomization Date
     4  group     Num     8    8             Treatment Group (1-6)
     5  educ      Num     8   16             Highest Education Level
     6  wtbl      Num     8   24             Weight (lbs) at Baseline
     7  wt12      Num     8   32             Weight (lbs) at 12 Months
     8  sbpbl     Num     8   40             Systolic BP (mmHg) at Baseline
     9  sbp12     Num     8   48             Systolic BP (mmHg) at 12 Months
    10  wtd12     Num     8   56             Weight Change at Baseline
    11  sbpd12    Num     8   64             Systolic BP Change at 12 Months
    12  sescrbl   Num     8   72             Side Effect at Baseline
    13  sescr12   Num     8   80             Side Effect at 12 Months
    14  sescrd12  Num     8   88             Side Effect Change Score

Variables are listed in creation order
This becomes the documentation of the dataset

Using PROC COPY to copy work dataset to permanent dataset

    LIBNAME mylib 'C:\SAS_Files';

    DATA sescore;
      ….
    RUN;

    PROC COPY IN=work OUT=mylib;
      SELECT sescore;
    RUN;

Make a work dataset first. Then, when you know it is working correctly, copy the work dataset to a permanent dataset.
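The build-check-copy workflow can be sketched end to end as follows. The folder C:\SAS_Files is assumed to exist, and the dataset demo with its values is hypothetical.

```sas
* Assumes the folder C:\SAS_Files exists;
LIBNAME mylib 'C:\SAS_Files';

* Build and check the dataset in WORK first (hypothetical data);
DATA demo;
  INPUT ptid $ wtbl wt12;
  wtd12 = wt12 - wtbl;
  DATALINES;
A001 180 172
A002 150 155
;

PROC PRINT DATA=demo;
RUN;

* Once the values look right, copy it to the permanent library;
PROC COPY IN=work OUT=mylib;
  SELECT demo;
RUN;
```

Without the SELECT statement, PROC COPY would copy every member of WORK, so naming the dataset keeps the permanent folder clean.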

Reading Permanent SAS Dataset

    LIBNAME class 'C:\SAS_Files';    * Tells SAS where to find the SAS dataset;

    PROC MEANS DATA=class.sescore;
      TITLE 'Means of All Numeric Variables on SAS Permanent Dataset';
    RUN;

    PROC CORR DATA=class.sescore;
      VAR wtd12 sbpd12 sescrd12;
      TITLE 'Correlation Matrix of 3 Change Variables';
    RUN;

What if the dataset was moved to a different folder? You just need to change the LIBNAME statement.

Means of All Numeric Variables on SAS Permanent Dataset

The MEANS Procedure

    Variable   Label                              N    Mean
    randdate   Randomization Date
    group      Treatment Group (1-6)
    educ       Highest Education Level
    wtbl       Weight (lbs) at Baseline
    wt12       Weight (lbs) at 12 Months
    sbpbl      Systolic BP (mmHg) at Baseline
    sbp12      Systolic BP (mmHg) at 12 Months
    wtd12      Weight Change at Baseline
    sbpd12     Systolic BP Change at 12 Months
    sescrbl    Side Effect at Baseline
    sescr12    Side Effect at 12 Months
    sescrd12   Side Effect Change Score

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

Rows and columns of the matrix (coefficients not preserved):

    wtd12      Weight Change at Baseline
    sbpd12     Systolic BP Change at 12 Months
    sescrd12   Side Effect Change Score

    * Often you will read the permanent SAS dataset in a DATA step
      to modify or add variables. Usually these will be put on a new
      temporary SAS dataset. The SET statement reads a SAS dataset;

    LIBNAME class 'C:\SAS_Files';

    DATA rxdata;
      SET class.sescore;
      if group in(1,2,3,4,5) then rx = 1;
      else rx = 2;
    RUN;

    PROC MEANS DATA=rxdata N MEAN MAXDEC=2 FW=7;
      CLASS group;
      VAR sbpd12 wtd12 sescrd12;
      TITLE 'Change in SBP, Weight, and Side Effect Score by Treatment';
    RUN;
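For readers without the course data, the same pattern can be tried on a small stand-in. In this sketch the dataset sescore is a hypothetical WORK dataset with invented values, and the summary is grouped by the derived variable rx rather than by group, as one possible variation.

```sas
* Hypothetical stand-in for class.sescore (values invented);
DATA sescore;
  INPUT group sbpd12 wtd12;
  DATALINES;
1 -12 -4
3 -8 -6
6 -2 1
6 -5 -1
;

* Read the existing dataset and add a two-level treatment indicator;
DATA rxdata;
  SET sescore;
  IF group IN (1,2,3,4,5) THEN rx = 1;
  ELSE rx = 2;
RUN;

* Summarize the change variables within each level of rx;
PROC MEANS DATA=rxdata N MEAN MAXDEC=2;
  CLASS rx;
  VAR sbpd12 wtd12;
  TITLE 'Change in SBP and Weight by rx';
RUN;
```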