Published by Jean Williams. Modified over 6 years ago.
1
Merging in SAS
These slides show alternatives for merging two datasets using the IN= data set option (see the SAS OnlineDoc > "BASE SAS" > "SAS Language Reference: Dictionary" > "Data step options" > "IN="). In the slides, the red data goes into the merged data set. The greyed-out observations are left out.
2
The perfect merge
Dataset A Dataset B
ID V1 V2 V3 V4
1 123 343
2 421 434 85 4234
3 129 436 325
4 122 767 763 234
5 232 34 229 324
6 534 435 554
7 89 884
8 6787 895 342
3
Not so perfect (if a or b;)
Dataset A (in=a) Dataset B (in=b)
ID V1 V2 V3 V4
1 343
2 421 434 85 4234
3 129 436
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342
4
If a=b; (both datasets contribute)
Dataset A (in=a) Dataset B (in=b)
ID V1 V2 V3 V4
1 343
2 421 434 85 4234
3 129 436
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342
5
If a; (must be in dataset A)
Dataset A (in=a) Dataset B (in=b)
ID V1 V2 V3 V4
1 343
2 421 434 85 4234
3 129 436 .
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342
6
If b; (must be in dataset B)
Dataset A (in=a) Dataset B (in=b)
ID V1 V2 V3 V4
. 1 343
2 421 434 85 4234
3 129 436
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342
7
Notes
The examples assume there is a unique identifier. This can be either one variable (e.g., CRSP's PERMNO or Compustat's GVKEY) or more than one variable (for example, PERMNO and DATE for a panel dataset).
Assumption: both datasets are sorted by the unique identifier(s).
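The sorting assumption can be met with PROC SORT before merging. The following is a minimal sketch, assuming two hypothetical panel datasets named panel_a and panel_b that share the keys PERMNO and DATE. The last step is an optional way to check that the key really is unique.

```sas
/* Hypothetical dataset names: panel_a and panel_b (not from the slides). */
/* Sort both inputs by the identifier(s) used in the BY statement.        */
proc sort data=panel_a;
    by permno date;
run;

proc sort data=panel_b;
    by permno date;
run;

/* Optional uniqueness check: NODUPKEY keeps the first observation of   */
/* each PERMNO-DATE combination; DUPOUT= collects the ones it drops.    */
proc sort data=panel_a out=panel_a_unique nodupkey dupout=panel_a_dups;
    by permno date;
run;
```

If panel_a_dups ends up with zero observations, PERMNO and DATE identify each row of panel_a uniquely.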
8
Sample code
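A minimal sketch of the four variants discussed in the slides, assuming two hypothetical datasets a_data (id, v1, v2) and b_data (id, v3, v4) with a single key variable id. The names are illustrative only and are not taken from the original deck.

```sas
/* Hypothetical inputs: a_data (id v1 v2) and b_data (id v3 v4). */
proc sort data=a_data; by id; run;
proc sort data=b_data; by id; run;

/* if a or b: keep every id found in either dataset.              */
/* This gives the same result as leaving the IF statement out.    */
data merged_either;
    merge a_data (in=a) b_data (in=b);
    by id;
    if a or b;
run;

/* if a and b: keep only ids found in both datasets.              */
/* Within a merge, "if a=b;" selects the same observations.       */
data merged_both;
    merge a_data (in=a) b_data (in=b);
    by id;
    if a and b;
run;

/* if a: keep ids found in a_data; v3 and v4 are missing (.)      */
/* when the id is absent from b_data.                             */
data merged_a;
    merge a_data (in=a) b_data (in=b);
    by id;
    if a;
run;

/* if b: keep ids found in b_data; v1 and v2 are missing (.)      */
/* when the id is absent from a_data.                             */
data merged_b;
    merge a_data (in=a) b_data (in=b);
    by id;
    if b;
run;
```

The IN= variables a and b are temporary 0/1 flags: they can be tested inside the DATA step but are not written to the output dataset.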
9
Typical problems
If both datasets were complete (that is, they both have the same observed units), then the IF statements would be unnecessary; "if a and b" would be equivalent to leaving the statement out altogether.
If you do not have a BY statement (no identifier, but you somehow know that each row of one dataset corresponds to the same row in the other dataset), the datasets are just "glued" side by side.
Common mishap: the BY variables have different formats across datasets. SAS will merge the datasets, but will put a WARNING in the log.
Another common mishap is to have variables with the same name (that are not the ID): one of them will be overwritten.
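Two of these cases can be sketched in code. The example assumes hypothetical datasets a_data and b_data that both carry a non-key variable called price. Renaming it on input with the RENAME= data set option is one way to keep both copies.

```sas
/* Hypothetical: a_data and b_data both contain id and price. */
/* Rename the shared non-key variable so neither value is     */
/* overwritten in the merged dataset.                         */
data merged_prices;
    merge a_data (in=a rename=(price=price_a))
          b_data (in=b rename=(price=price_b));
    by id;
    if a and b;
run;

/* Merge without a BY statement: observation 1 of a_data is   */
/* paired with observation 1 of b_data, and so on, regardless */
/* of the id values ("glued" side by side).                   */
data glued;
    merge a_data (rename=(price=price_a))
          b_data (rename=(price=price_b));
run;
```

In the second step id is still shared between the inputs, so the merged row keeps the id value from the dataset listed last whenever the two differ.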
10
References
Good references are
and a manual called "Combining and modifying SAS data sets: examples", which is in the RC library. It has a lot of examples. Unfortunately, it does not exist in an online version: only the code is available, and the explanations, which are very good, are not.