
Reading Raw Data
Writing the DATA Step Program


Describing the Data

The INPUT statement describes the fields of raw data to be read and placed into the SAS data set.


To do this...                 Use this SAS statement...
Reference SAS data library    libname clinic 'c:\users\may\data';
Reference external file       filename tests 'c:\users\tmill.dat';
Name SAS data set             data clinic.stress;
Identify external file        infile tests obs=10;
Describe data                 INPUT statement
Execute the DATA step         RUN statement


General form, INPUT statement using column input:
INPUT variable  <$> startcol-endcol . . . ;

where

  • variable is the SAS name you assign to the field
  • the dollar sign ($) identifies the variable type as character
    (nothing appears here if the variable is numeric)
  • startcol represents the starting column location in the
    data line for this variable
  • endcol represents the ending column location in the data
    line for this variable


Take a look at the small raw data file illustrated below. For each field of raw data that you want to read into your SAS data set, you must assign the following in the INPUT statement:
  • a valid SAS variable name
  • a type (character or numeric)
  • a location and length (starting column and ending column).

Raw Data File Exercise

1---+----10---+----20
2810 61 MOD  F
2804 38 HIGH F 
2807 42 LOW  M 
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M 


Caution: The INPUT statement creates a variable using the name that you assign to each field. Therefore, when you write an INPUT statement, you need to specify the variables in the case that you want them to appear in the SAS data set.


The INPUT statement below assigns the character variable ID to the data in columns 1-4, the numeric variable Age to the data in columns 6-7, the character variable ActLevel to the data in columns 9-12, and the character variable Sex to the data in column 14.

Notice that the variables in the data set appear in mixed case, exactly as they are specified in the INPUT statement.

     input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;

SAS Data Set Work.Exercise

Obs  ID    Age  ActLevel  Sex
1    2810  61   MOD       F
2    2804  38   HIGH      F
3    2807  42   LOW       M
4    2816  26   HIGH      M
5    2833  32   MOD       F
6    2823  29   HIGH      M
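
To try the INPUT statement above without an external file, the following minimal sketch may help. It assumes the six records are supplied inline with a DATALINES statement rather than read through an INFILE statement, and it adds a PROC PRINT step to display the result. Neither of these appears in the lesson itself.

```sas
/* Sketch only: the raw data is supplied inline with DATALINES   */
/* (an assumption for illustration; the lesson reads a file).    */
data work.exercise;
   /* Column input: name, optional $, then startcol-endcol */
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
   datalines;
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M
;
run;

/* List the data set; variable names appear as typed in INPUT */
proc print data=work.exercise;
run;
```

The data lines must begin in column 1 so that each field falls in the columns named in the INPUT statement.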


When you use column input, you can
  • read any or all fields from the raw data file
  • read the fields in any order
  • specify only the starting column for values that occupy only one column
     input ActLevel $ 9-12 Sex $ 14 Age 6-7;
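
As a rough illustration of these points, the sketch below reads only three of the four fields, in a different order from the raw data. The data set name work.reordered and the inline DATALINES records are assumptions made for the example.

```sas
/* Sketch only: work.reordered is a hypothetical data set name. */
data work.reordered;
   /* ID (columns 1-4) is skipped; the remaining fields are    */
   /* read out of order. Sex needs only its starting column.   */
   input ActLevel $ 9-12 Sex $ 14 Age 6-7;
   datalines;
2810 61 MOD  F
2804 38 HIGH F
2807 42 LOW  M
2816 26 HIGH M
2833 32 MOD  F
2823 29 HIGH M
;
run;

proc print data=work.reordered;
run;
```

The variables are stored in the order in which the INPUT statement names them (ActLevel, Sex, Age), not in the order of the columns in the raw data.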


Specifying Variable Names

Each variable has a name that conforms to SAS naming conventions. Variable names must

  • be 1 to 32 characters in length
  • begin with a letter (A-Z, including mixed case characters) or an underscore (_)
  • continue with any combination of numbers, letters, or underscores.

These are examples of valid variable names:

  • Height
  • GLUCOSE_TOLERANCE_READING
  • AmountBudgeted_1999

Take a look at an INPUT statement that uses column input to read the three data fields in the raw data file below.


Raw Data File Admit
1---+----10---+----20
58MOD M 
29LOW F 
34LOW M 
41HIGHF 
30MOD F 
22HIGHM  


The values for the variable that you're naming Age are located in columns 1-2. Because Age is a numeric variable, you do not specify a dollar sign ($) after the variable name.
    input Age 1-2...; 

The values for the variable ActLevel are located in columns 3-6. You specify a $ to indicate that ActLevel is a character variable.

    input Age 1-2 ActLevel $ 3-6...;

The values for the variable Sex are located in column 7. The $ indicates that Sex is a character variable. Notice that you specify only a single column.

    input Age 1-2 ActLevel $ 3-6 Sex $ 7;
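
Placed in a complete DATA step, the finished INPUT statement might look like the sketch below. The data set name work.admit and the use of DATALINES in place of an external file are assumptions made so that the example is self-contained.

```sas
/* Sketch only: work.admit is a hypothetical data set name, and */
/* the records are supplied inline instead of through INFILE.   */
data work.admit;
   /* Age is numeric (no $); ActLevel and Sex are character */
   input Age 1-2 ActLevel $ 3-6 Sex $ 7;
   datalines;
58MOD M
29LOW F
34LOW M
41HIGHF
30MOD F
22HIGHM
;
run;

proc print data=work.admit;
run;
```

Because the fields are identified by column position, values such as HIGHF are split correctly even though no blank separates them.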

Note: Your site may choose to restrict variable names to those valid in Version 6 SAS software, to uppercase variable names automatically, or to remove all restrictions on variable names.
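
These site-level choices appear to correspond to the VALIDVARNAME= system option. This is an inference: the lesson does not name the option. The sketch below assumes the UPCASE setting and a hypothetical data set name, work.admit_upcase, to show how automatic uppercasing would affect the names typed in the INPUT statement.

```sas
/* Sketch only: assumes the note refers to the VALIDVARNAME=    */
/* system option; work.admit_upcase is a hypothetical name.     */
options validvarname=upcase;

data work.admit_upcase;
   /* Names are typed in mixed case but stored in uppercase */
   input Age 1-2 ActLevel $ 3-6 Sex $ 7;
   datalines;
58MOD M
29LOW F
;
run;

/* Show the stored variable names */
proc contents data=work.admit_upcase;
run;

/* Return to the V7 naming rules described in this lesson */
options validvarname=v7;
```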




Copyright © 2002 SAS Institute Inc., Cary, NC, USA. All rights reserved.
