Reading Missing Values at the Beginning or Middle of a Record
The MISSOVER option works only for missing values that occur at the end of a record. When you use list input to read raw data that contains missing values at the beginning or middle of a record, a different method is required. List input treats one or more consecutive blanks as a single delimiter, so a blank that stands for a missing value cannot be distinguished from the blanks that separate the fields. To handle such data, you need to make sure that the DATA step can recognize each missing value. For example, you may need to edit the raw data file so that each missing value is represented by a period (.) instead of a blank. Let's take a closer look at what happens when a missing value occurs at the beginning or middle of a record.
Suppose the value for Age is missing in the second record of the raw data file below.
MALE 27 1 8 0 0
FEMALE   3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
When the program below executes, the fields in each record are read in order, from left to right. The INPUT statement specifies that six data values be read from each record. However, because list input treats the blank Age field as part of the delimiter, the second record appears to contain only five values.
data perm.survey;
   infile credit;
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
run;

proc print data=perm.survey;
run;
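The program above reads the external file referenced by the fileref CREDIT, which is not shown here. As a minimal, self-contained sketch, the same records can be supplied in-stream with DATALINES. The data set name WORK.SURVEY_DEMO is hypothetical, and WORK is used in place of PERM so that no LIBNAME statement is needed. The run of blanks after FEMALE in the second record is an assumption standing in for the missing Age value; list input treats any run of blanks the same way.

```sas
/* Hypothetical in-stream version of the CREDIT data.                    */
/* The blanks after FEMALE in record 2 represent the missing Age value.  */
data work.survey_demo;
   infile datalines;   /* FLOWOVER is the default: INPUT may move to the next line */
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
   datalines;
MALE 27 1 8 0 0
FEMALE   3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
;
run;

proc print data=work.survey_demo;
run;
```

Running this sketch should produce the same invalid-data note for FreqDept and a data set with only four observations.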
To find the sixth value in the second record, the column pointer moves to the next record (the default FLOWOVER behavior) and tries to read FEMALE as the value for FreqDept. However, FEMALE is a character value and FreqDept is a numeric variable, so an invalid data error occurs and the value of FreqDept is set to missing.
At the end of the DATA step, the program loops back to the top and begins reading with the next record, which is the fourth record. The remaining values in the third record are never read, so the SAS log shows that only four observations are created.
SAS Log
NOTE: Invalid data for FreqDept in line 3 1-6.
RULE:      ----+----1----+----2----+----3----+----4--
3          FEMALE 34 2 10 3 3 18
Gender=FEMALE Age=3 Bankcard=14 FreqBank=5 Deptcard=10 FreqDept=. _ERROR_=1 _N_=2
NOTE: 5 records were read from the infile MISSING.
      The minimum record length was 15.
      The maximum record length was 19.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.SURVEY has 4 observations and 6 variables.
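The PROC PRINT output shows incorrect values for the second record: Age is 3 instead of missing, the remaining values are shifted one variable to the left, and FreqDept is missing. Note that the values from the third record do not appear at all.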
Obs | Gender | Age | Bankcard | FreqBank | Deptcard | FreqDept |
1 | MALE | 27 | 1 | 8 | 0 | 0 |
2 | FEMALE | 3 | 14 | 5 | 10 | . |
3 | MALE | 35 | 2 | 12 | 4 | 8 |
4 | FEMALE | 36 | 4 | 16 | 3 | 7 |
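Because the problem starts with an unrecognized blank in the middle of the record, adding the MISSOVER option does not fix it. The following sketch reads the same hypothetical in-stream records (the data set name WORK.SURVEY_MISSOVER is also hypothetical) with MISSOVER. INPUT no longer moves to the next line, so all five records become observations, but the second observation still has Age=3, the remaining values shifted one variable to the left, and FreqDept set to missing.

```sas
/* Same hypothetical records, read with MISSOVER.                          */
/* MISSOVER keeps INPUT on the current line and sets unread variables to   */
/* missing, but it cannot detect the blank Age value in the middle.        */
data work.survey_missover;
   infile datalines missover;
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
   datalines;
MALE 27 1 8 0 0
FEMALE   3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
;
run;

proc print data=work.survey_missover;
run;
```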
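Now suppose the raw data file is edited so that the missing Age value in the second record is represented by a period:
MALE 27 1 8 0 0
FEMALE . 3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
When the same program is run against the edited file, the data values are read into the correct variables, all five records become observations, and the missing value is stored as missing in the data set.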
Obs | Gender | Age | Bankcard | FreqBank | Deptcard | FreqDept |
1 | MALE | 27 | 1 | 8 | 0 | 0 |
2 | FEMALE | . | 3 | 14 | 5 | 10 |
3 | FEMALE | 34 | 2 | 10 | 3 | 3 |
4 | MALE | 35 | 2 | 12 | 4 | 8 |
5 | FEMALE | 36 | 4 | 16 | 3 | 7 |
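The corrected result can also be reproduced without the external file. In this sketch (the data set name WORK.SURVEY_FIXED is hypothetical), the missing Age value is coded as a period, which list input reads as a missing numeric value wherever it appears in the record. No INFILE statement is needed, because INPUT reads from DATALINES by default.

```sas
/* Hypothetical in-stream version of the edited file:                   */
/* the period in record 2 marks the missing Age value.                  */
data work.survey_fixed;
   input Gender $ Age Bankcard FreqBank Deptcard FreqDept;
   datalines;
MALE 27 1 8 0 0
FEMALE . 3 14 5 10
FEMALE 34 2 10 3 3
MALE 35 2 12 4 8
FEMALE 36 4 16 3 7
;
run;

proc print data=work.survey_fixed;
run;
```

PROC PRINT should list five observations, with a missing Age in the second.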
Copyright © 2002 SAS Institute Inc.,
Cary, NC, USA. All rights reserved.