25 March 2014

How to merge different dataset observations in SAS using Merge concept?

SAS is statistical analysis system, used for analysis of the data and generate the data into a standard report. To do the analysis of the data, the data should be imported into SAS environment and saved in the form of a dataset. A SAS dataset is special structure file that contains columns and rows.

These columns and rows are used to store the data; the data that is stored in the SAS environment is stored in the form of tabular structure, this table which contains data within it is called as a dataset. Sometime to manage these datasets together, programmer has to join these datasets together into a single dataset, so that he/she can use it for further analysis. To merge these datasets merging concept is used. The basic idea of this merge is to combine the values/contents of two or more datasets that contain different variables and might have nearly the same number of observations. There are different methods of merging like one to one merge, one to many merge and many to many merge. However compared to all other methods in this merging concept, many to many merge concept is a very difficult process. To make this process simple, the following statement can be used in merging. POINT = keyword on the SET statement is used to select each observation correct number of times which required pre-processing of code.

Merging of datasets can be done using two methods that is by using SQL processing or by using many to many merge concepts and many to many merge concept is the best method apart from using SQL. But while using many to many merge concept programmers will face a huge difficulty in merging because sometimes there might be some miss matching values of the data which is not taken for merging. Here is the explanation to solve problems when many to many merge is done.

While merging the datasets, at least one common variable should be present within the datasets and common values should be present in the same common variable, because merging concept purely depends on common values and at least one common variable. Here in merge concept the data is merged depending on observations, so that’s the reason the data should be properly sorted out using sorting procedure, so that while merging the common values, these values will be in the proper sorted order. Then by using “start pointer” and an “end pointer” which will indicate the starting observation and ending observation of common variable can be used for accepting each observation line for merging.

Here is the syntax using pointers:

by < common_variable >;
retain start end 0;
if first.< common_variable > then start=_n_; 
(This statement is used to call the starting observation number of the common variable)
if last. < common_variable > then do; (This statement is used to call the last observation number of the common variable)
end=_n_;    (observation number of end for common_variable);
output;    (This output statement is used for 1 observation per common_variable;
end;

Explanation: Using By statement, a common variable from the datasets can be taken, while retain statement is used to call all the variables value for each iteration of data step.

Now using these above statements, the observations are called into new datasets and after calling all the individual observations using pointers, the datasets can be merged using merge concept easily. These statements are mainly important while merging many observations with many observations, and this concept of merging is called many to many merge.

Here is the syntax to merge the datasets:

merge < dataset1 (in=m1)  dataset2 (in=m2) > ;
   by < Common Variable >;
   if m1 and m2;
run;


Now using the pointers to merge the observation of dataset1 and dataset2;

set < dataset1 >;
do i=start to end;
set < dataset2 > point=i;
output;
end;
run;


Using these statements in a new dataset, observations can be merged easily using merging concept.

Conclusion: This concept of merging makes programmer works easier than using SQL language to merge the datasets.


Clinnovo is a clinical innovation company. It is pioneer CRO industry in India. Clinnovo offers professional clinical research course , clinical data management course , SAS Courses and imaging training. Clinnovo has been serving different bio-pharma industries across the world with excellence and high quality. For more information contact at +91 9912868928, 040 64635501

No comments:

Post a Comment