SET statement in DATA step

Based on S. David Riba's paper "The SET Statement and Beyond: Uses and Abuses of the SET Statement".

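Simple SET statement

*Simple SET statement: copy a data set;
data temp1b;
  set temp1;
run;

*Concatenate: read all of TEMP1 three times, one copy after another;
data temp1x3;
  set temp1 temp1 temp1;
run;

*Interleave: both inputs must already be sorted by I;
data temp12a;
  set temp1 temp2;
  by i;
run;

*Combine one-to-one: the step stops when either data set runs out;
data temp12b;
  set temp1;
  set temp2;
run;

*Attach the first observation of TEMP1 to all observations of TEMP2.
 Variables read by SET are retained, so the TEMP1 values carry forward;
data temp12c;
  if (_n_ eq 1) then set temp1;
  set temp2;
run;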
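When two SET statements read different data sets one-to-one, the DATA step ends as soon as either SET statement tries to read past the end of its data set. The sketch below makes the resulting row count visible. It is a minimal example and assumes two hypothetical data sets: SHORT with 2 observations and LONG with 4.

```sas
/* Hypothetical inputs: SHORT has 2 observations, LONG has 4. */
data short;
  do a = 1 to 2;
    output;
  end;
run;

data long;
  do b = 10 to 40 by 10;
    output;
  end;
run;

/* Each iteration reads one observation from each data set.
   On the third iteration the first SET statement reads past the end of
   SHORT, so the step stops and COMBINED gets 2 observations, not 4. */
data combined;
  set short;
  set long;
run;

proc print data = combined;
run;
```

The last two observations of LONG are never read. If unmatched rows must be kept, the one-to-one SET pattern is the wrong tool.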

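Data set options inside the SET statement

/* Data set options that can be used in a SET statement:
   DROP = varlist
   KEEP = varlist
   FIRSTOBS = num
   IN = var
   OBS = num
   RENAME = (oldname = newname ...)
   WHERE = (condition) */

*Combine a data set with itself to calculate the change in the value of a variable.
 FRWX holds the next value of X, so DELTA is the current value minus the next one.
 The step stops when the second SET statement runs out, so TEMP3 has one
 observation fewer than TEMP1;
data temp3;
  set temp1 (keep = x);
  set temp1 (firstobs = 2 rename = (x = frwx));
  delta = x - frwx;
run;

*The IN = data set option is used with multiple data sets when it is important
 to know which data set contributed an observation. Both inputs must be sorted by I;
data temp4;
  set temp1 (in = in_1) temp2 (in = in_2);
  by i;
  if (in_1) then x2 = x**2;
  else if (in_2) then yexp = exp(y);
run;

*WHERE = filters each input data set separately, with its own condition;
data temp5;
  set temp1 (where = (x > .5)) temp2 (where = (y < .5));
run;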
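IN= variables are temporary. They are available during the step but are not written to the output data set. To keep a record of where each observation came from, copy the flag into an ordinary variable. The sketch below does this for a concatenation and assumes two hypothetical data sets, D1 and D2, which stand in for TEMP1 and TEMP2.

```sas
/* Hypothetical inputs standing in for TEMP1 and TEMP2. */
data d1;
  do i = 1 to 3;
    x = i * 10;
    output;
  end;
run;

data d2;
  do i = 1 to 2;
    y = i * 100;
    output;
  end;
run;

/* IN_1 and IN_2 are dropped automatically, so their information is
   copied into the permanent variable SOURCE. */
data d12;
  length source $ 2;
  set d1 (in = in_1) d2 (in = in_2);
  if in_1 then source = 'd1';
  else if in_2 then source = 'd2';
run;
```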

SET statement options

/* SET statement options:
   END = var
   KEY = index
   NOBS = var
   POINT = var */

*END= option;
*The END = option names a variable that SAS sets to 1 when the SET statement
 reads its last observation, and to 0 otherwise;
data temp6;
  set temp1 end = eof;
  set temp2;
  *LX and LY are filled in only on the observation where TEMP1 reaches its end;
  if (eof) then do;
    lx = x;
    ly = y;
  end;
run;
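A common use of END= is to write a single summary observation once the last input observation has been read. The sketch below accumulates a running total and outputs it only at the end. NUMS and TOTAL are hypothetical data set names.

```sas
/* Hypothetical input: X = 1, 2, 3, 4, 5. */
data nums;
  do x = 1 to 5;
    output;
  end;
run;

/* The sum statement retains TOTAL and starts it at 0.
   The explicit OUTPUT runs only when EOF flags the last observation. */
data total;
  set nums end = eof;
  total + x;
  if eof then output;
  keep total;
run;
```

TOTAL contains one observation with TOTAL = 15.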

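*KEY= option;
*The KEY = option retrieves observations from an indexed data set based on the
 index key, which can be either a simple key or a composite key;
data pan1;
  do i = 1 to 30;
    k = i;
    x = rand('unif');
    output;
  end;
run;

*PAN2 has a simple index on K, with key values 11 to 30;
data pan2 (index = (k));
  do i = 1 to 20;
    k = i + 10;
    y = rand('unif');
    output;
  end;
run;

*For each observation of PAN1, look up the matching K in PAN2.
 Keys 1 to 10 have no match, so SAS sets _ERROR_ to 1 and Y is not updated;
data pan3;
  set pan1;
  set pan2 (drop = i) key = k;   /* drop I so the I from PAN1 is not overwritten */
  xymax = max(x, y);
run;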
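When a KEY= lookup finds no match, SAS sets _ERROR_ to 1 and leaves the variables from the indexed data set unchanged. Checking the automatic variable _IORC_ after each keyed read handles unmatched keys explicitly. The sketch below uses the %SYSRC autocall macro with the mnemonics _SOK (found) and _DSENOM (no match). LOOKUP, DRIVER, MATCHED and FOUND are hypothetical names.

```sas
/* Hypothetical lookup table, indexed on K, with keys 3 to 5 only. */
data lookup (index = (k));
  do k = 3 to 5;
    y = k * 10;
    output;
  end;
run;

/* Hypothetical driver data with keys 1 to 5. */
data driver;
  do k = 1 to 5;
    output;
  end;
run;

data matched;
  set driver;
  set lookup key = k;
  select (_iorc_);
    when (%sysrc(_sok)) found = 1;
    when (%sysrc(_dsenom)) do;
      /* No match: clear the stale value and the error flag. */
      found = 0;
      y = .;
      _error_ = 0;
    end;
    otherwise do;
      put 'ERROR: unexpected _IORC_ = ' _iorc_;
      stop;
    end;
  end;
run;
```

Resetting _ERROR_ keeps the log free of expected "no match" dumps. Setting Y to missing stops a value from an earlier match from leaking into an unmatched row.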

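*NOBS= option;
*The NOBS = option creates a variable that contains the total number of observations
 in the input data set(s). If multiple data sets are listed in the SET statement,
 the value of the NOBS = variable is the total number of observations in all the
 listed data sets. The value is assigned at compile time, before any observation is read;

*Read the data sets only if they contain observations.
 The IF (0) SET statement never executes, but it still lets SAS assign TOTOBS;
data temp7;
  if (0) then set temp1 (drop = _ALL_) temp2 (drop = _ALL_) nobs = totobs;
  if (totobs) then set temp1 temp2;
  else abort;
run;

*Just find the number of observations in your data sets.
 STOP ends the step before the SET statement executes, and RUN must come before
 the %PUT so that the macro variable exists when it is resolved;
data _null_;
  call symput('n_obs', put(n_obs, 5.));
  stop;
  set temp1 temp2 nobs = n_obs;
run;
%put &n_obs;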

*POINT= option;
*The POINT = option uses a numeric variable for direct (or random) access into a
 SAS data set. The POINT = variable must be assigned a value before the SET statement
 executes. Because the end of the data set is never detected, a STOP statement is
 needed to end the step;

*Read only the third observation;
data temp8;
  ptr = 3;
  set temp1 point = ptr;
  if (_error_) then abort;
  output;
  stop;
run;

*Reverse the order of your data;
data temp9;
  do ptr = lastrec to 1 by -1;
    set temp1 point = ptr nobs = lastrec;
    output;
  end;
  stop;
run;
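Because NOBS= is filled in at compile time, POINT= and NOBS= together can read the last observation directly, without scanning the whole data set. The sketch below uses the hypothetical names SEQ, LASTOBS and NOBS_SEQ.

```sas
/* Hypothetical input: I = 1 to 4, X = I squared. */
data seq;
  do i = 1 to 4;
    x = i ** 2;
    output;
  end;
run;

/* NOBS_SEQ already holds 4 when the assignment runs, so the SET statement
   jumps straight to the final observation. PTR and NOBS_SEQ are not written
   to the output. STOP prevents an endless loop. */
data lastobs;
  ptr = nobs_seq;
  set seq point = ptr nobs = nobs_seq;
  output;
  stop;
run;
```

LASTOBS holds a single observation with I = 4 and X = 16.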

*Random replicates of a data set: draw 10 observations from JOHN1 with replacement;
data john1;
  do i = 1 to 20;
    x = rand('unif');
    output;
  end;
run;

data john2;
  do _i_ = 1 to 10;
    ptr = ceil(totobs * ranuni(totobs));
    set john1 point = ptr nobs = totobs;
    if (_error_) then abort;
    output;
  end;
  stop;
run;
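The same sampling-with-replacement idea can also be written with the RAND function. In that version, CALL STREAMINIT fixes the seed so the draws are reproducible from run to run. The sketch below is not the method from the original slide. It assumes a hypothetical input POOL, an arbitrary seed of 12345, and a loop counter DRAW.

```sas
/* Hypothetical input: 20 observations numbered I = 1 to 20. */
data pool;
  do i = 1 to 20;
    output;
  end;
run;

data sample;
  call streaminit(12345);               /* fixed seed: same draws every run */
  do draw = 1 to 10;
    /* RAND('uniform') is strictly between 0 and 1, so PTR is 1 to TOTOBS. */
    ptr = ceil(totobs * rand('uniform'));
    set pool point = ptr nobs = totobs;
    output;
  end;
  stop;
run;
```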

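*Replicates of observations: for the row of KEVIN1 with STOP = i,
 write observations 1 to i of KEVIN2;
data kevin1;
  do i = 1 to 10;
    start = 1;
    stop = i;
    output;
  end;
run;

data kevin2;
  do j = 1 to 10;
    x = rand('unif');
    output;
  end;
run;

data kevin3;
  set kevin1;
  do ptr = start to stop;
    set kevin2 point = ptr;
    if (_error_) then abort;
    output;
  end;
run;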

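*Attach the min, max, and sum of X to every observation of your data set;
data voytek;
  retain minval maxval sumval;
  *First pass, on the first iteration only: read all of TEMP1 to compute the statistics;
  if (_N_ eq 1) then do until (lastrec);
    set temp1 (keep = x) end = lastrec;
    minval = min(minval, x);
    maxval = max(maxval, x);
    sumval = sum(sumval, x);
  end;
  *Second pass: read TEMP1 again, one observation per iteration;
  set temp1;
run;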
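The same two-pass pattern extends to statistics that combine several running values. The sketch below counts the nonmissing values of X and sums them in the first pass. In the second pass it subtracts the mean from each observation. VALS, CENTERED, N_X, SUM_X, MEAN_X and DEV_X are hypothetical names.

```sas
/* Hypothetical input: X = 2, 4, 6, 8. */
data vals;
  do x = 2, 4, 6, 8;
    output;
  end;
run;

data centered;
  retain n_x sum_x;
  /* First pass, on the first iteration only: count and sum nonmissing X. */
  if _n_ eq 1 then do until (eof);
    set vals end = eof;
    n_x = sum(n_x, not missing(x));
    sum_x = sum(sum_x, x);
  end;
  /* Second pass: one observation per iteration, centred on the mean. */
  set vals;
  mean_x = sum_x / n_x;
  dev_x = x - mean_x;
run;
```

Here the mean is 5, and DEV_X takes the values -3, -1, 1 and 3.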