16 May 2014

Dealing with Missing values in SAS



Before looking at missing values in SAS, we first need to know what SAS is and what a dataset is. SAS (Statistical Analysis System) is software used to analyse data and generate reports. A dataset is a specially structured file that stores data within the SAS environment. The values it holds are either character or numeric, and they are arranged in a structured format of variables and observations so that they can be analysed and turned into a final report.

Data consist of values, which can be character or numeric. Sometimes values are missing from the data, and these missing values are generally left out of the analysis. SAS has several types of missing value, and each is represented differently in the dataset:

A numeric missing value is represented by a single decimal point (.), whereas a character missing value is represented by a blank. A special missing value, which applies only to numeric variables, is represented by a decimal point followed by a letter or an underscore (.A to .Z, or ._).
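As a minimal sketch of how these representations look in practice, the step below reads a few records with list input. The dataset name work.visits and the variables id, name and score are made up for illustration. The MISSING statement tells SAS to read the letter A and the underscore in a numeric field as special missing values.

```sas
/* Hypothetical dataset and variable names, for illustration only */
data work.visits;
   missing A _;            /* read A and _ in numeric fields as .A and ._ */
   input id name $ score;
   datalines;
1 Anna 85
2 . .
3 Ravi A
4 Mia _
;
run;

/* Observation 2 has a blank NAME and an ordinary missing SCORE (.);  */
/* observations 3 and 4 hold the special missing values .A and ._     */
proc print data=work.visits;
run;
```

In list input a single period stands for a missing value in both numeric and character fields, which is why observation 2 ends up with a blank name.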

How to identify a missing value in a variable?

Because there are different types of missing values, there are different ways to identify them in a variable. They are as follows:

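If the programmer only wants to know how many missing values are present, NMISS can be used. The NMISS function counts the missing numeric values among the arguments passed to it, so it works across variables within one observation. To count the missing values of a single variable across all observations, use the NMISS statistic of a procedure such as PROC MEANS. In both cases NMISS only reports how many values are missing; it does not extract them.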
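The difference between the two uses of NMISS can be sketched as follows. The dataset work.scores and the variables test1 to test3 are hypothetical.

```sas
/* Hypothetical data: three test scores per subject */
data work.scores;
   input id test1 test2 test3;
   /* NMISS function: missing numeric values among its arguments, per observation */
   n_missing_row = nmiss(test1, test2, test3);
   datalines;
1 80 . 75
2 . . 90
3 60 70 80
;
run;

/* NMISS statistic: missing values per variable, across all observations */
proc means data=work.scores n nmiss;
   var test1 test2 test3;
run;
```

Here n_missing_row is a count for each observation, while PROC MEANS reports one count for each variable.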

To extract only the observations with a missing numeric value, the programmer can use a conditional statement. The condition below selects the observations in which the variable holds the ordinary numeric missing value (.):

Syntax: If < variable name > = . then output ;

To extract missing values from a character variable, the conditional statement should compare the variable with a blank enclosed in quotes. The quotes must be plain straight quotes. Here is the condition that extracts the observations with a missing character value:

Syntax: If < Variable name > = " " then output ;
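The two conditions above can be sketched in complete DATA steps as follows. The dataset work.patients and its variables age and city are assumptions made for the example.

```sas
/* Hypothetical data with one missing numeric and one missing character value */
data work.patients;
   input id age city $;
   datalines;
1 34 Pune
2 . Delhi
3 51 .
;
run;

/* Keep only the observations whose numeric variable AGE is missing */
data work.age_missing;
   set work.patients;
   if age = . then output;
run;

/* Keep only the observations whose character variable CITY is missing */
data work.city_missing;
   set work.patients;
   if city = ' ' then output;
run;
```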

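Special missing values belong to numeric variables, so they are not found by the blank comparison used for character variables, and the condition "= ." finds only the ordinary missing value. A particular special missing value can be checked by writing it as a dot followed by its character, for example .a. Because every numeric missing value sorts below every number, with ._ lowest, then ., then .A up to .Z, comparing the variable with .z finds all numeric missing values at once. Here is the code for that check:

Syntax: If < Variable name > <= .z then output ;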
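To show how these comparisons differ, here is a small sketch with invented names (work.labs, result). It relies on the sort order of numeric missing values: ._ is lowest, then ., then .A through .Z, and all of them are lower than any number.

```sas
/* Hypothetical lab results with ordinary and special missing values */
data work.labs;
   missing A Z _;
   input id result;
   datalines;
1 5.2
2 .
3 A
4 Z
5 _
;
run;

/* Only the ordinary missing value: keeps id 2 */
data work.ordinary_missing;
   set work.labs;
   if result = . then output;
run;

/* Every numeric missing value, ordinary or special: keeps ids 2, 3, 4 and 5 */
data work.any_missing;
   set work.labs;
   if result <= .z then output;
run;

/* One particular special missing value: keeps id 3 */
data work.only_a;
   set work.labs;
   if result = .a then output;
run;
```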

Sometimes the programmer does not know whether the variable being checked is numeric or character. In that case he/she can use the MISSING function, which works for both numeric and character variables and also recognises special missing values. Here is an example of that function; note that a DO group must be closed with END:

Syntax: If Missing ( < Variable name > ) then do ; < statements > end ;
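A minimal sketch of the MISSING function applied to both variable types is given below. The dataset work.flagged, the variables weight and site, and the counter n_missing are hypothetical.

```sas
/* Hypothetical data: WEIGHT is numeric, SITE is character */
data work.flagged;
   missing A;
   input id weight site $;
   n_missing = 0;
   /* The same function call works for a numeric variable ...        */
   if missing(weight) then do;
      n_missing = n_missing + 1;
   end;
   /* ... and for a character variable                               */
   if missing(site) then do;
      n_missing = n_missing + 1;
   end;
   datalines;
1 61.5 Pune
2 . Delhi
3 A .
4 70 .
;
run;

proc print data=work.flagged;
run;
```

Observation 3 is counted twice, because MISSING treats the special missing value .A as missing just like the blank site.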

How to handle missing values while updating a dataset:

Datasets must be handled in the proper order during an update: the first dataset is taken as the master dataset and the second as the transaction dataset, whose values are applied to the master. The update is done with the UPDATE statement together with a BY statement, so both datasets must contain the BY variable and be sorted (or indexed) by it. Variables that exist only in the transaction dataset are added to the updated dataset. The update depends not only on variable names but also on observations: transaction observations are matched to master observations on the BY values, and transaction observations with no match in the master are added as new observations.
By default, missing values in the transaction dataset do not replace the existing values in the master dataset. This behaviour is controlled by the UPDATEMODE= option of the UPDATE statement, whose default is MISSINGCHECK. If the programmer really needs missing values to be carried into the master dataset as well, he/she can change the update mode so that missing values are not checked. The option is written in the UPDATE statement, after the dataset names, as follows:

UPDATEMODE=NOMISSINGCHECK
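The effect of the option can be sketched with two small datasets. The names work.master and work.trans and the variables id, weight and height are assumptions made for the example; both datasets are already in id order.

```sas
/* Hypothetical master dataset, one observation per ID */
data work.master;
   input id weight height;
   datalines;
1 60 160
2 72 175
3 55 158
;
run;

/* Hypothetical transaction dataset: WEIGHT is missing for ID 2, ID 4 is new */
data work.trans;
   input id weight height;
   datalines;
2 . 176
4 80 182
;
run;

/* Default mode (MISSINGCHECK): ID 2 keeps weight 72 and gets height 176 */
data work.updated_default;
   update work.master work.trans;
   by id;
run;

/* NOMISSINGCHECK: the missing weight in the transaction replaces 72 for ID 2 */
data work.updated_nocheck;
   update work.master work.trans updatemode=nomissingcheck;
   by id;
run;
```

In both results ID 4 is added as a new observation, since it has no match in the master dataset.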

Conclusion: To deal with missing values, the programmer first has to know which types of missing values are present, and then how to handle them in the different situations that arise during analysis.

Clinnovo is a clinical innovation company and a pioneer in the CRO industry in India. Clinnovo offers professional clinical research courses, clinical data management courses, SAS courses and imaging training. Clinnovo has been serving bio-pharma industries across the world with excellence and high quality. For more information, contact +91 9912868928 or 040 64635501.
