25 April 2014

Solving Problems Using SQL Procedure While Merging Data Files

SAS is statistical analysis software that is used not only for analysis of data, but also to manage the data into datasets. Managing datasets is a huge process, and sometimes programmers have to perform merging of many datasets into one single dataset and then utilizing that dataset for the generation of reports.

To do all the functionalities in SAS, usually base SAS is used. But when trying to merge datasets, programmers might face certain problem in managing the datasets together that is, merging the datasets into single dataset using base SAS procedure. For such type of merging, instead of using the merge concept of base SAS, the programmer can use the SQL procedure to merge the datasets together. Because SQL has different options for merging, like one to one merging and one to many merging.

When this merging concept is used in SQL, depending on one key variable, datasets can be merged together. But if merging is used in Base SAS, merging is done based on one common variable, that is same variable names should be present in both datasets. If variables names are different in the datasets then any one of dataset variable names should be changed, so that it will match to the merging variable of the next dataset.

Here are some of the examples of merging using SQL:

One to one merging concept:


In this concept of merging the datasets, SQL merge depends on a key variable, and also depends on total number of observations. In this concept observations are joined together. Let’s consider an example, if we take two datasets, first dataset having 3 variables and 3 observations and second dataset having 4 variables and 3 observations, but from these two datasets, one key variable are considered for merging, but it’s not mandatory that two key variable names should be common, because SQL merge can work on different variable names also. Here is the example to merge using SQL.

Syntax to write the SQL procedure for one to one merge:

Proc sql ;
Create table < Newtablename >
as select < * >
from < 1st dataset > , < 2nd dataset >
where < 1st dataset . variable = 2nd dataset . variable >
order by < 1st dataset . variable > ;
quit ;


Explanation: In this SQL procedure “create” statement will create a new table, where star operator (*) will call all the values from 1st dataset and 2nd dataset. But “where” statement is used to merge the observation from both the datasets depending on key variables where by each observation is taken and depending on the common values, each line of observation is added to one another.

One to Many merge concept:

In one to many merge concept, SQL merge depends on key variable, and also depends on total number of observations. For example if we consider two datasets in which 1st dataset has 3 variables and 3 observations, whereas in the 2nd dataset 3 variables and 9 observations, where key variable observation will be common and they are in multiples of 3 that is one value is repeated 3 times. For merging one observation of 1st dataset with common value of 3 observations of 2nd dataset, the following syntax can be used:

Syntax to write the SQL procedure for one to many merge:

Proc sql ;
Create table < New dataset name > as
Select < * >
Form  < 1st dataset > , < 2nd dataset >
Where < 1st dataset . variable = 2nd dataset . variable >
order by < 1st dataset . variable name> , < 2nd dataset .  variable name> ;
quit;

Explanation:
In this SQL procedure “create” statement will create a new table, where star operator (*) will call all the values from 1st dataset and 2nd dataset. But “where” statement is used to merge the observation from both the datasets depending on key variables whereby, one observation of first dataset is merged with many common observations of the second dataset.

Conclusion: Using SQL process, merging of datasets can be done very easily without depending on common variable names of the datasets. But if merging concept is used in base SAS, merging dataset variables should have at least one common variable name, where as taking same common variables can be avoided in SQL process while merging datasets. If datasets has different variable names also merging can be done easily using SQL.


Clinnovo is a clinical innovation company. It is pioneer CRO industry in India. Clinnovo offers professional Clinical Research Course, clinical data management course , SAS and imaging training. Clinnovo has been serving different bio-pharma industries across the world with excellence and high quality. For more information contact at +91 9912868928, 040 64635501

No comments:

Post a Comment