
1 Merging in SAS These slides show alternatives for merging two datasets using the IN= data set option (see the SAS OnlineDoc > “BASE SAS” > “SAS Language Reference: Dictionary” > “Data set options” > “IN=”). In the slides, the red data goes into the merged data set. The greyed-out observations are left out.

2 The perfect merge
Dataset A, Dataset B
ID V1 V2 V3 V4
1 123 343
2 421 434 85 4234
3 129 436 325
4 122 767 763 234
5 232 34 229 324
6 534 435 554
7 89 884
8 6787 895 342

3 Not so perfect (if a or b;)
Dataset A (in=a), Dataset B (in=b)
ID V1 V2 V3 V4
1 343
2 421 434 85 4234
3 129 436
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342

4 If a=b; (both datasets contribute)
Dataset A (in=a), Dataset B (in=b)
ID V1 V2 V3 V4
1 343
2 421 434 85 4234
3 129 436
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342

5 If a; (must be in dataset A)
Dataset A (in=a), Dataset B (in=b)
ID V1 V2 V3 V4
1 343
2 421 434 85 4234
3 129 436 .
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342

6 If b; (must be in dataset B)
Dataset A (in=a), Dataset B (in=b)
ID V1 V2 V3 V4
1 . 343
2 421 434 85 4234
3 129 436
4 122 767 763 234
5 229 324
6 534 435 554
7 89
8 6787 895 342

7 Notes The examples assume there is a unique identifier. This can be one variable (for example, CRSP's PERMNO or Compustat's GVKEY) or more than one variable (for example, PERMNO and DATE for a panel dataset). Assumption: both data sets are sorted by the unique identifier(s).
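The sorting assumption in the notes can be met with PROC SORT before the merge. This is a minimal sketch: the data set names work.a and work.b and the BY variables permno and date are hypothetical placeholders for your own data.

```sas
/* Hypothetical names: work.a, work.b, permno, date.          */
/* Sort both inputs by the same identifier(s), in the same    */
/* order that the later BY statement will list them.          */
proc sort data=work.a;
   by permno date;
run;

proc sort data=work.b;
   by permno date;
run;
```

Both data sets must be sorted by the same variables in the same order; otherwise the later DATA step with a BY statement stops with an error about the BY variables not being properly sorted.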

8 Sample code
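A minimal sketch of the patterns from slides 3 to 6, offered as an illustration rather than the code from the original slide. The data set names A and B, the IN= variables a and b, and the variables ID and V1 to V4 follow the slide labels; the toy values and the output data set names (merged, all_ids, both, from_a, from_b) are made up for this example.

```sas
/* Toy inputs, already sorted by ID. Values are made up. */
data A;
   input ID V1 V2;
   datalines;
1 10 11
2 20 21
3 30 31
;
run;

data B;
   input ID V3 V4;
   datalines;
2 22 23
3 32 33
4 42 43
;
run;

/* Basic form: keep only IDs that both data sets contribute. */
data merged;
   merge A(in=a) B(in=b);
   by ID;
   if a and b;          /* same effect as "if a=b;" in a match-merge */
run;

/* All four variants in one pass, one output data set each. */
data all_ids from_a from_b both;
   merge A(in=a) B(in=b);
   by ID;
   output all_ids;                 /* "if a or b;" : IDs 1 to 4 */
   if a then output from_a;        /* "if a;"      : IDs 1 to 3 */
   if b then output from_b;        /* "if b;"      : IDs 2 to 4 */
   if a and b then output both;    /* "if a=b;"    : IDs 2 and 3 */
run;
```

In a match-merge at least one of a and b is always 1, so "if a or b;" keeps every observation, and "if a=b;" can only be true when both are 1.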

9 Typical problems If both datasets were complete (they both have the same observed units), then the IF statements would be unnecessary; "if a and b" would be equivalent to leaving the statement out altogether. If you do not have a BY statement (no identifier, because you somehow know that each row of one dataset corresponds to the same row in the other dataset), the datasets are just "glued" side by side. Common mishap: the BY variables have different attributes (for example, different lengths) across datasets; SAS will merge the datasets, but will put a WARNING in the log. Another common mishap is to have variables with the same name (that are not the ID) in both datasets: one of them will be overwritten.
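One way to avoid the overwriting problem is the RENAME= data set option, which renames the clashing variable as it is read. This is a sketch under assumptions: here both A and B are toy data sets that happen to share a non-ID variable named V1, and the new name V1_b is hypothetical.

```sas
/* Toy inputs that both contain a non-ID variable named V1. */
data A;
   input ID V1;
   datalines;
1 10
2 20
;
run;

data B;
   input ID V1;
   datalines;
1 99
2 98
;
run;

/* Rename B's V1 on input so it cannot overwrite A's V1. */
data merged;
   merge A(in=a)
         B(in=b rename=(V1=V1_b));
   by ID;
   if a and b;
run;
```

The merged data set then carries V1 from A and V1_b from B side by side. Without the RENAME= option, only one V1 would remain.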

10 References Good references are
and a manual called "Combining and Modifying SAS Data Sets: Examples", which is in the RC library. It has a lot of examples. Unfortunately, it does not exist in an online version: only the code is available online, without the explanations, which are very good.

