Merging in SAS


These slides show alternatives for merging two datasets using the IN= data set option (see the SAS OnlineDoc > “Base SAS” > “SAS Language Reference: Dictionary” > “Data Set Options” > “IN=”). In the slides, the red data goes into the merged data set. The greyed-out observations are left out.

The perfect merge
Table: Dataset A and Dataset B; columns ID, V1, V2, V3, V4; IDs 1 to 8. Every ID is present in both datasets, so all observations go into the merged data set.

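Not so perfect (if a or b;)
Table: Dataset A (in=a) and Dataset B (in=b); columns ID, V1, V2, V3, V4; IDs 1 to 8. Some IDs are present in only one of the two datasets. Every observation still goes into the merged data set, because each one comes from A, from B, or from both.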

If a=b; (both datasets contribute)
Table: Dataset A (in=a) and Dataset B (in=b); columns ID, V1, V2, V3, V4; IDs 1 to 8. Only IDs present in both datasets go into the merged data set. In a merge at least one of a and b is always 1, so a=b holds only when both are 1.

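If a; (must be in dataset A)
Table: Dataset A (in=a) and Dataset B (in=b); columns ID, V1, V2, V3, V4; IDs 1 to 8. Every ID present in A goes into the merged data set, whether or not it also appears in B. For IDs that are missing from B, the variables that come from B are set to missing (.).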

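If b; (must be in dataset B)
Table: Dataset A (in=a) and Dataset B (in=b); columns ID, V1, V2, V3, V4; IDs 1 to 8. Every ID present in B goes into the merged data set, whether or not it also appears in A. For IDs that are missing from A, the variables that come from A are set to missing (.).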
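The IN= variables are temporary: they exist only while the DATA step runs and are not written to the merged data set. To check which dataset contributed each observation, they can be copied into an ordinary variable. The sketch below assumes two hypothetical datasets, a_data and b_data, keyed by a hypothetical variable id; the values are illustrative only.

```sas
/* Hypothetical example data, already in ID order */
data a_data;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
4 122 767
;
run;

data b_data;
    input id v3 v4;
    datalines;
1 343 10
2 85 4234
4 763 234
;
run;

data merged_flagged;
    merge a_data (in=a) b_data (in=b);
    by id;
    length source $6;                 /* long enough for 'A only' */
    if a and b then source = 'both';
    else if a then source = 'A only';
    else source = 'B only';           /* in a merge, a or b is always 1 */
run;

/* Count how many observations came from each source */
proc freq data=merged_flagged;
    tables source;
run;
```

With this data, ID 1 is flagged as coming from B only, ID 3 as coming from A only, and IDs 2 and 4 as coming from both. A tabulation like this is a quick way to decide which subsetting IF is appropriate.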

Notes
The examples assume there is a unique identifier. This can be either one variable (for example, CRSP's PERMNO or Compustat's GVKEY) or more than one variable (for example, PERMNO and DATE for a panel dataset).
Assumption: both data sets are sorted by the unique identifier(s).
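When the identifier is made up of more than one variable, both the sort and the merge list all of them in the same order. A minimal sketch, assuming two hypothetical panel datasets called returns and prices that share PERMNO and DATE; the dataset names, the extra variables, and all values are made up for illustration.

```sas
/* Hypothetical panel data keyed by PERMNO and DATE */
data returns;
    input permno date :yymmdd10. ret;
    format date yymmdd10.;
    datalines;
10001 2005-01-31 0.02
10001 2005-02-28 -0.01
10002 2005-01-31 0.03
;
run;

data prices;
    input permno date :yymmdd10. prc;
    format date yymmdd10.;
    datalines;
10002 2005-01-31 15.5
10001 2005-02-28 20.1
10001 2005-01-31 19.8
;
run;

/* Sort both inputs by the full identifier before merging */
proc sort data=returns; by permno date; run;
proc sort data=prices;  by permno date; run;

data panel;
    merge returns (in=r) prices (in=p);
    by permno date;     /* same variables, same order as in the sorts */
    if r and p;         /* keep firm-dates found in both datasets */
run;
```

If either input is not sorted by the BY variables, the DATA step stops with an error in the log, so sorting first is the safe default.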

Sample code
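A minimal sketch of the kind of merge these slides describe, assuming two hypothetical datasets, a_data and b_data, with a single identifier called id. The names and values are illustrative and are not taken from the original slides.

```sas
/* Hypothetical example data */
data a_data;
    input id v1 v2;
    datalines;
2 421 434
3 129 436
4 122 767
;
run;

data b_data;
    input id v3 v4;
    datalines;
1 343 10
2 85 4234
4 763 234
;
run;

/* Both inputs must be sorted by the BY variable(s) before the merge */
proc sort data=a_data; by id; run;
proc sort data=b_data; by id; run;

data merged;
    merge a_data (in=a) b_data (in=b);
    by id;
    if a and b;   /* keep only IDs found in both datasets */
    /* Alternatives:
       if a;       keeps every ID in a_data
       if b;       keeps every ID in b_data
       if a or b;  keeps everything (same as no IF statement) */
run;
```

With `if a and b;` the merged data set contains IDs 2 and 4 only. Swapping in one of the alternative IF statements reproduces the other cases shown in the slides.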

Typical problems
If both datasets were complete (that is, they both have the same observed units), the IF statements would be unnecessary; "if a and b" would be equivalent to leaving the statement out altogether.
If you do not have a BY statement (there is no identifier, but you somehow know that each row of one dataset corresponds to the same row of the other dataset), the datasets are just "glued" side by side.
Common mishap: the BY variables have different attributes across datasets (for example, different lengths). SAS will still merge the datasets, but will put a WARNING in the log.
Another common mishap is to have variables with the same name (that are not the ID) in both datasets: one of them will be overwritten.
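One way to avoid the overwriting problem is to rename the clashing variable as each dataset is read, using the RENAME= data set option. The sketch below assumes two hypothetical datasets that both contain a variable called value; all names and values are illustrative. It also shows the merge without a BY statement for comparison.

```sas
/* Hypothetical datasets that share a non-ID variable name */
data a_data;
    input id value;
    datalines;
1 10
2 20
;
run;

data b_data;
    input id value;
    datalines;
1 100
2 200
;
run;

/* Rename the clashing variable on input so both copies survive */
data merged;
    merge a_data (in=a rename=(value=value_a))
          b_data (in=b rename=(value=value_b));
    by id;          /* inputs are already in ID order here */
    if a and b;
run;

/* Without a BY statement the rows are paired by position only */
data glued;
    merge a_data (rename=(id=id_a value=value_a))
          b_data (rename=(id=id_b value=value_b));
run;
```

In the second step SAS pairs the first row of a_data with the first row of b_data, the second with the second, and so on, without checking that the IDs agree. Keeping both id_a and id_b makes any mismatch visible.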

References
Good references are http://ftp.sas.com/techsup/download/technote/ts644.html and a manual called "Combining and Modifying SAS Data Sets: Examples", which is in the RC library. It has a lot of examples. Unfortunately, the manual is not available online; only its code is, which is a pity because the explanations are very good.