
Reading Free-Format Data
Reading Missing Values


Reading Missing Values at the Beginning or Middle of a Record

It's important to remember that the MISSOVER option works only for missing values that occur at the end of a record. A different method is required when you use list input to read raw data that contains missing values at the beginning or middle of a record. To handle such data, you need to make sure that the DATA step can recognize the missing values. For example, you may need to edit the raw data file so that a period (.) rather than a blank marks each missing value.
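To see why MISSOVER alone is not enough, here is a minimal sketch. It assumes the records are supplied inline with DATALINES rather than read from an external file, and the data set name work.missover_demo is hypothetical. The second record is missing Age in the middle, and the third record is missing its last two values.

```sas
/* Sketch only: inline records stand in for the external raw data file. */
/* work.missover_demo is a hypothetical data set name.                  */
data work.missover_demo;
   infile datalines missover;   /* do not read past the end of a record */
   input Gender $ Age Bankcard FreqBank
         Deptcard FreqDept;
   datalines;
MALE 27 1 8 0 0
FEMALE    3 14 5 10
FEMALE 34 2 10
MALE 35 2 12 4 8
;
run;

proc print data=work.missover_demo;
run;
```

With MISSOVER in effect, the third record should come out as intended, with Deptcard and FreqDept set to missing. The second record should still be wrong: list input treats the run of blanks as a single delimiter, so 3 is read as Age, every later value shifts one variable to the left, and FreqDept ends up missing.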

Let's take a closer look at what happens when a missing value occurs at the beginning or middle of a record.

Suppose the value for Age is missing in the second record.


1---+----10---+----20
MALE 27 1 8 0 0  
FEMALE    3 14 5 10  
FEMALE 34 2 10     
MALE 35 2 12 4 8  
FEMALE 36 4 16 3 7  


When the program below executes, each field in the raw data file is read one by one. The INPUT statement specifies that six data values be read from each record. However, the second record contains only five values.

     data perm.survey;
        infile credit;
        input Gender $ Age Bankcard FreqBank 
              Deptcard FreqDept;
     run;
     proc print data=perm.survey;
     run; 

To find the sixth value in the second record, the column pointer goes to the next record and tries to read FEMALE as the value for FreqDept. But FEMALE is a character value and FreqDept is a numeric variable. Thus, an invalid data error occurs and the value for FreqDept is set to missing.


1---+-V--10---+----20
MALE 27 1 8 0 0  
FEMALE    3 14 5 10  
FEMALE 34 2 10     
MALE 35 2 12 4 8  
FEMALE 36 4 16 3 7  


[Figure: Program Data Vector]

At the end of the DATA step, the program loops back to the top and reads the next record. In this case, the next record is the fourth record. The SAS log shows that only four observations are created.


1---V----10---+----20
MALE 27 1 8 0 0  
FEMALE    3 14 5 10  
FEMALE 34 2 10     
MALE 35 2 12 4 8  
FEMALE 36 4 16 3 7  


SAS Log
NOTE: Invalid data for FreqDept in line 3 1-6.
RULE:     ----+----1----+----2----+----3----+----4--
3         FEMALE 34 2 10 3 3 18
Gender=FEMALE Age=3 Bankcard=14 FreqBank=5 Deptcard=10
FreqDept=. _ERROR_=1 _N_=2
NOTE: 5 records were read from the infile MISSING.
      The minimum record length was 15.
      The maximum record length was 19.
NOTE: SAS went to a new line when INPUT statement reached
      past the end of a line.
NOTE: The data set WORK.SURVEY has 4 observations and
      6 variables.


PROC PRINT output shows the incorrect values for the second record. Note that the values for the third record do not appear.


Obs   Gender   Age   Bankcard   FreqBank   Deptcard   FreqDept
  1   MALE      27          1          8          0          0
  2   FEMALE     3         14          5         10          .
  3   MALE      35          2         12          4          8
  4   FEMALE    36          4         16          3          7
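The behavior described above can be reproduced with a self-contained sketch. It assumes the records are supplied inline with DATALINES instead of through the credit fileref, it uses the hypothetical data set name work.flowover_demo, and it takes the contents of the third record from the SAS log shown earlier.

```sas
/* Sketch only: inline records replace the external file used above. */
/* work.flowover_demo is a hypothetical data set name.               */
data work.flowover_demo;
   infile datalines;            /* default behavior: continue on the next record */
   input Gender $ Age Bankcard FreqBank
         Deptcard FreqDept;
   datalines;
MALE 27 1 8 0 0
FEMALE    3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
;
run;

proc print data=work.flowover_demo;
run;
```

If this runs as expected, the log should contain the same kind of invalid data note for FreqDept, and the data set should have four observations rather than five, because the third record is consumed while SAS looks for the sixth value of the second observation.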



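Now suppose you edit the raw data file so that a period (.) marks the missing value for Age in the second record. When the same program is resubmitted on the edited file, the data values are read into the correct variables, and the missing value is stored as missing in the data set.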


1---+----10---+----20
MALE 27 1 8 0 0  
FEMALE  . 3 14 5 10  
FEMALE 34 2 10     
MALE 35 2 12 4 8  
FEMALE 36 4 16 3 7  


Obs   Gender   Age   Bankcard   FreqBank   Deptcard   FreqDept
  1   MALE      27          1          8          0          0
  2   FEMALE     .          3         14          5         10
  3   FEMALE    34          2         10          3          3
  4   MALE      35          2         12          4          8
  5   FEMALE    36          4         16          3          7
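As a quick check on the edited data, the sketch below reads the records with the period in place and then lists only the observations where Age is missing. It assumes inline DATALINES in place of the external file, and the data set name work.period_demo is hypothetical.

```sas
/* Sketch only: inline copy of the edited records, with a period */
/* marking the missing Age value in the second record.           */
/* work.period_demo is a hypothetical data set name.             */
data work.period_demo;
   infile datalines;
   input Gender $ Age Bankcard FreqBank
         Deptcard FreqDept;
   datalines;
MALE 27 1 8 0 0
FEMALE  . 3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
;
run;

/* List only the observations where Age was read as missing. */
proc print data=work.period_demo;
   where Age = .;
run;
```

Because the period occupies the Age field, list input should assign the remaining values to the correct variables, and the WHERE statement should select only the second observation.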





Copyright © 2002 SAS Institute Inc., Cary, NC, USA. All rights reserved.
