
Reading Raw Data
Lesson Summary

 



I. Text Summary


Raw Data Files
A raw data file is an external file whose records contain data values organized in fields. The raw data files in this lesson contain fixed fields.

Steps to Create a SAS Data Set
Creating a SAS data set from raw data takes several steps. You need to

  • reference the raw data file to be read
  • name the SAS data set
  • identify the location of the raw data
  • describe the data values to be read.

Referencing a SAS Data Library
To begin your program, you use a LIBNAME statement to reference the SAS data library where your data set will be stored.
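As a minimal sketch, the statement below assigns the libref clinic to the library location used in this lesson's sample program. The path is only an example; it is assumed to exist on your system.

```sas
/* Assign the libref clinic to an existing SAS data library.   */
/* The path is taken from the sample program in this lesson.   */
libname clinic 'c:\bethesda\patients\admit';
```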

Referencing a Raw Data File
Before you can read your raw data, you must reference the raw data file by creating a fileref. Just as you assign a libref using a LIBNAME statement, you assign a fileref using a FILENAME statement.
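For illustration, the statement below assigns the fileref admit to the raw data file used in this lesson's sample program. The path is an example and is assumed to exist on your system.

```sas
/* Assign the fileref admit to the external raw data file. */
filename admit 'c:\clinic\patients\admit.dat';
```

Once the fileref is assigned, later statements such as INFILE can refer to the file as admit instead of repeating the full path.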

Viewing Active Librefs and Filerefs
You can view the librefs and filerefs currently defined for your SAS session by using the Explorer window.
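If you prefer to check from a program rather than from the Explorer window, one supplementary option (not presented in this lesson) is the LIST form of the LIBNAME and FILENAME statements, which writes the current assignments to the log. A minimal sketch:

```sas
/* Write all currently assigned librefs to the SAS log. */
libname _all_ list;

/* Write all currently assigned filerefs to the SAS log. */
filename _all_ list;
```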

Writing the DATA Step Program
The DATA statement indicates the beginning of the DATA step and names the SAS data set to be created.

Next, you specify the raw data file by using the INFILE statement. The OBS= option in the INFILE statement stops processing after record n, so only the first n records are read. You can also use the PAD option in the INFILE statement if the raw data file contains data lines of different lengths.
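The INFILE options described above might be combined as in the following sketch. The file path and variables come from this lesson's sample program; the data set name work.admitcheck is hypothetical.

```sas
/* Reference the raw data file (path from the lesson's sample program). */
filename admit 'c:\clinic\patients\admit.dat';

data work.admitcheck;         /* hypothetical temporary data set name     */
   infile admit obs=5 pad;    /* stop after record 5; PAD handles records */
                              /* that are shorter than the others         */
   input Age 1-2 Actlevel $ 3-6 Sex $ 7;
run;
```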

The INPUT statement describes the raw data to be read and placed into the SAS data set. This lesson teaches column input, the most common input style, which specifies the actual column locations of the data values.
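To make the column locations concrete, here is a small sketch that reads two records from the lesson's sample file. It uses in-stream records (the DATALINES statement, which this lesson does not cover) in place of an external file so that it runs on its own, and the data set name work.columndemo is hypothetical.

```sas
/* In-stream records stand in for the raw data file here. */
data work.columndemo;
   input Age 1-2          /* numeric value in columns 1-2   */
         Actlevel $ 3-6   /* character value in columns 3-6 */
         Sex $ 7;         /* character value in column 7    */
   datalines;
58MOD M
41HIGHF
;
run;
```

The dollar sign after a variable name marks that variable as character; variables without it are read as numeric.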

Submitting the Program
When you submit the program, you can use the OBS= option with the INFILE statement to verify that the correct data is being read before reading the entire data file.

After you submit the program, view the log to check the DATA step processing. You can then list the data set by using the PRINT procedure.

Once you've checked the log and verified your data, you can modify the DATA step to read the entire raw data file by removing the OBS= option from the INFILE statement.
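A sketch of the final step, based on this lesson's sample program: the same DATA step with the OBS= option removed, so that every record in the file is read. The paths are examples and are assumed to exist on your system.

```sas
libname clinic 'c:\bethesda\patients\admit';
filename admit 'c:\clinic\patients\admit.dat';

data clinic.admittance;
   infile admit;              /* OBS= removed: read the entire file */
   input Age 1-2 Actlevel $ 3-6 Sex $ 7;
run;

/* List the complete data set to confirm the result. */
proc print data=clinic.admittance;
run;
```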

If you are working with a raw data file that contains invalid data, the DATA step continues to execute. Unlike syntax errors, invalid data errors do not cause the SAS System to stop processing a program. If you have a way to edit the invalid data, it's best to correct the problem and rerun the DATA step.
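The sketch below illustrates this behavior with a deliberately bad record. It uses in-stream records (the DATALINES statement, which this lesson does not cover) in place of an external file, and both the data set name work.invaliddemo and the corrupted second record are hypothetical.

```sas
/* The second record is deliberately invalid: the Age field */
/* (columns 1-2) holds XX instead of digits.                */
data work.invaliddemo;
   input Age 1-2 Actlevel $ 3-6 Sex $ 7;
   datalines;
58MOD M
XXLOW F
34LOW M
;
run;

proc print data=work.invaliddemo;
run;
```

Under these assumptions, the DATA step should still create all three observations. The log should report invalid data for Age on the second record, and Age should be set to missing in that observation.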

Subsetting Data
To subset data, you can use a subsetting IF statement in any DATA step to process only those observations that meet a specified condition.
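As an illustration, the sketch below keeps only the patients whose activity level is HIGH. The paths and variables come from this lesson's sample program; the data set name clinic.highact is hypothetical.

```sas
libname clinic 'c:\bethesda\patients\admit';
filename admit 'c:\clinic\patients\admit.dat';

data clinic.highact;          /* hypothetical output data set name */
   infile admit;
   input Age 1-2 Actlevel $ 3-6 Sex $ 7;
   if Actlevel='HIGH';        /* subsetting IF: continue only when true */
run;
```

Observations for which the condition is false are not written to the output data set.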


II. Syntax


LIBNAME libref  'SAS-data-library';
FILENAME fileref 'filename';
DATA SAS-data-set;
        INFILE file-specification <OBS=n>;
        INPUT variable <$> startcol-endcol . . . ;
RUN;
PROC PRINT DATA=SAS-data-set;
RUN;


III. Sample Program

The example shown below uses column input to read the first five records of the raw data file C:\Clinic\Patients\admit.dat. The SAS data set named Admittance will be stored in the SAS library C:\Bethesda\Patients\Admit.


Raw Data File Admit
1---+----10---+----20
58MOD M
29LOW F
34LOW M
41HIGHF
30MOD F
22HIGHM


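     libname clinic 'c:\bethesda\patients\admit';     /* library that stores the data set  */
     filename admit 'c:\clinic\patients\admit.dat';   /* raw data file to be read          */
     data clinic.admittance;
        infile admit obs=5;                           /* read only the first five records  */
        input Age 1-2 Actlevel $ 3-6 Sex $ 7;         /* column input: name, type, columns */
     run;
     proc print data=clinic.admittance;
     run;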


IV. Points to Remember

  • FILENAME statements are global.  Filerefs remain in effect until you change them, cancel them, or end your SAS session.

  • For each field of raw data you read into your SAS data set, you must assign the following in the INPUT statement: a valid SAS variable name, a type (character or numeric), and a length.

  • When you use column input, you can read any or all fields from the raw data file, read the fields in any order, and specify only the starting column for values that occupy only one column.

  • Column input is appropriate only in certain situations. When you use column input, your data must consist of standard character and numeric values in fixed fields. That is, values for a particular variable must be in the same location on all records.
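The flexibility of column input noted in the points above can be sketched as follows: the INPUT statement reads Sex before Age, skips Actlevel entirely, and gives only the starting column for the one-column field. In-stream records (the DATALINES statement, which this lesson does not cover) stand in for the raw data file, and the data set name work.orderdemo is hypothetical.

```sas
data work.orderdemo;
   input Sex $ 7      /* one-column field: starting column only  */
         Age 1-2;     /* read after Sex even though it is first  */
                      /* in the record; Actlevel is not read     */
   datalines;
58MOD M
29LOW F
;
run;
```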



 

Copyright © 2002 SAS Institute Inc., Cary, NC, USA. All rights reserved.