Merging SAS Data Sets |
Lesson
Summary![]() ![]() |
This page contains
|
I. Text Summary
To go to the page where a task, programming feature, or concept was presented, select a link. |
Results of Match-Merging Basic DATA step match-merging produces an output data set that contains values from all observations in all input data sets. If an input data set doesn't have any observations for a given value of the common variable, the observation in the output data set contains missing values for the variables that are unique to that input data set. |
Preparing Data for Match-Merging To prepare for match-merging, you first reference all SAS libraries where input data sets are stored, or where you will store the results of the match-merge. Then you determine whether your input data sets are in the appropriate form to be merged. All data sets to be merged must be sorted or indexed by the values of a common variable with the same type and length in all data sets to be merged. If your data are not appropriately sorted or indexed, you can use PROC SORT to sort data before match-merging. You can sort the existing data set permanently or create a separate, sorted output data set. |
Performing a Basic Match-Merge After all input data sets are sorted or indexed by the value of a BY variable, you can merge the data sets using a DATA step that contains MERGE and BY statements. |
Renaming Variables If variables with the same name appear in more than one input data set, DATA step match-merging overwrites values of the like-named variable in the first data set in which it appears with values of the like-named variable in subsequent data sets. To prevent overwriting, you can rename variables using the RENAME= option in the MERGE statement. |
Excluding Unmatched Observations By default, DATA step match-merging combines all observations in all input data sets. However, you may want to select only observations that match for two or more specific input data sets. To exclude unmatched observations from your output data set, you can use the IN= data set option and the subsetting IF statement in your DATA step. The IN= data set option creates and names a variable that indicates whether the data set contributed data to the current observation. The subsetting IF statement checks the IN= values and outputs only those observations that appear in the data sets for which IN= is specified. |
Selecting Variables To select variables to drop or keep in your output data set, you use the DROP= and KEEP= data set options. You can use the KEEP= option instead of the DROP= option if more variables are dropped than kept. You can specify these options in either the DATA statement or the MERGE statement, depending on whether or not you want to process values of the variables in that DATA step. If you don't process certain variables and you don't want them to appear in the new data set, specify them in the DROP= option in the MERGE statement. If you do need to process a variable in the original data set, you must specify the variable in the DROP= option in the DATA statement. Used in the DATA statement, the DROP= option simply drops the variables from the new data set. However, they are still read from the original data set and are available within the DATA step. |
Match-Merge Processing To predict the results of match-merging correctly, you need to understand how the DATA step processes data in match-merges. Compiling |
|
II. Syntax
To go to the page where a statement or option was presented, select a link. |
![]() |
You can use the KEEP= option instead of the DROP= option if more variables are dropped than kept. |
|
III. Sample Program
libname clinic 'c:\stress\labdata'; data clinic.combined(drop=id); merge clinic.demog(in=indemog rename=(date=BirthDate)) clinic.visit(drop=weight in=invisit rename=(date=VisitDate)); by id; if indemog and invisit; run;
|
IV. Points to Remember
|
![]() |
![]() |
Copyright © 2002 SAS Institute Inc., Cary, NC, USA. All rights reserved.