11 March 2014

How to merge several CSV files with same format into a single SAS dataset



SAS is statistical analysis software, used to analyse data and generate standard report files using the data. Before applying any of the analysis processes to the data values, these values must be present in the SAS environment as a dataset. Talking about the data values, most of the programmers get the data values in different formats, for example; the data values might be saved in an excel sheet in normal format or might be saved in an CSV (Comma separated values) file.

CSV is a comma separated values file, most of the applications support CSV file format when transferring data from one application to another. For example, a database program stores the data in a proprietary format, and when a user transfers this data to a spreadsheet, the database most likely will export its data into a spreadsheet as CSV file format. Internally these values are saved in a spreadsheet as comma separated values and this spreadsheet will be saved in the CSV format.

Usually, a SAS programmer can import successfully only one comma separated values file into SAS environment and can save it as a single dataset. But when the programmer wants to import several CSV files with same format into SAS and store them into a single dataset, then it becomes a huge difficulty as SAS can import only one file at a time and save that as a new dataset each time.

However, there are certain steps to import several CSV files with same format into single SAS dataset:

First Step: The first and foremost step is to keep all the CSV files that are ready to be imported into SAS in a specific folder on the local system, which makes the work of importing process easier.

Second Step: Now assigning the current location folder path to SAS environment which allows taking the data from that specific folder. There are three methods to change the current folder location in SAS environment, they are as follows

Method1: This current location folder of SAS can be changed by using menu bar within the SAS environment. Within the menu bar, in tools options, clicking on options menu will open a new pop-up menu, where the option to change location of the folder is present. Clicking on that will change the current location of the SAS environment.

Method2: Current location of the SAS environment can be changed during the SAS session, by double clicking on the current folder in the status bar.

Method3: Current location of SAS directory can be changed by using SAS code;

 x 'cd C:\Users\csvfolder';

Explanation: current directory (CD) command can be given with “X” statement which can 
change the current location of SAS environment.

Third Step: After the path is set, the next step is to call all the CSV files into SAS reference name for a temporary period of time, because all the CSV files cannot be called into a single dataset at a time. So by applying FILENAME statement all the CSV files should be called into a reference name which is given in the filename statement.
For example, following statement calls all the CSV files into a reference name.

filename csv ('*.csv');

Explanation: “csv” is the file reference name which is given with FILENAME statement, this can call all the files with “*.csv” as file extension. By running this code will call all CSV files that are present in the specific folder into a reference name “csv”.

Forth step: Now after calling all CSV files into reference name, the final step is to write the Import procedure, where calling all the CSV files which are present in the reference name of filename statement.

For example, following import procedure calls all the files into a single dataset.

proc import out=work.allcsv
datafile = csv DBMS=CSV REPLACE;
GETNAMES=yes;
run;

Explanation: Using Import procedure, importing all the CSV files with same file format into a single dataset name “work.allcsv”. But when importing all the CSV files into a single SAS dataset, within the import procedure datafile name should be given the same name of the file reference name, so that all the files are referred from the temporary filename statement.

Conclusion: Thus, using simple steps in SAS, that is changing the current location of SAS and then applying filename statement, and then finally using import procedure, several CSV files can be imported into a single SAS dataset very easily.

Clinnovo is a clinical innovation company. It is pioneer CRO industry in India. Clinnovo offers professional clinical research course, clinical data management course, SAS Courses and Imaging Training. Clinnovo has been serving different bio-pharma industries across the world with excellence and high quality. For more information contact at +91 9912868928, 040 64635501.

No comments:

Post a Comment