SAS OnlineTutor HomeFAQ PageSuggested Learning PathsContents+Searchback||next

Transforming Data with SAS Functions
Lesson Summary

This page contains


I. Text Summary

To go to the page where a task, programming feature, or concept was presented, select a link.


Using SAS Functions
SAS functions can be used to convert data and manipulate character variable values. Functions are written by specifying the function name, then its arguments in parentheses. Arguments can include variables, constants, or expressions; although they are typically separated by commas, they can also be specified as variable lists or arrays.

Automatic Character-to-Numeric Conversion
When character variables are used in a numeric context, SAS software tries to convert the character values to numeric values. Numeric context includes arithmetic operations, comparisons with numeric values, and assignment to previously defined numeric variables. The original character values are not altered; the conversion creates temporary numeric values and places a note in the SAS log.

Explicit Character-to-Numeric Conversion
The INPUT function allows direct, controlled conversion of character values to numeric values. When a character variable is named by using a numeric informat, the INPUT function generates numeric values without a note being placed in the SAS log.

Automatic Numeric-to-Character Conversion
When numeric variables are used in a character context, SAS software tries to convert the numeric values to character values. Character context includes concatenation operations, use in functions that require character arguments, and assignment to previously defined character variables. The original numeric values are not altered; the conversion creates temporary character values and places a note in the SAS log.

Explicit Numeric-to-Character Conversion
The PUT function allows direct, controlled conversion of numeric values to character values. The format specified in a PUT function must match the source, so use an appropriate numeric format to create the new character values. No note will appear in the SAS log.

SAS Date and Time Values
SAS date values are stored as the number of days from January 1, 1960; time values are stored as the number of seconds since midnight. These values can be displayed in a variety of forms by associating them with SAS formats.

MONTH Function
To extract the month value from a SAS date value, specify the function MONTH followed by the SAS date value in parentheses. The value returned will be a number from 1 to 12.

YEAR Function
To extract the year value from a SAS date value, specify the function YEAR followed by the SAS date value in parentheses. The value returned will be a four-digit number.

MDY Function
To create a SAS date value for a month, day, and year, specify the MDY function followed by the date values. The result can be displayed in several ways by applying a SAS date format.

TODAY Function
To convert the current date to a SAS date value, specify the TODAY function without arguments. The TODAY and DATE functions can be used interchangeably.

SCAN Function
The SCAN function separates a character string to return a word based on its position. It defines words by counting delimiters, which are characters used as word separators. The name of the function is followed, in parentheses, by the name of the character variable, the number of delimiters to count, and the specified delimiters enclosed in quotation marks.

SUBSTR Function
The SUBSTR function can be used to extract or replace any portion of a character string. To extract values, place the function on the right side of an assignment statement and specify, in parentheses, the name of the character variable, the starting character position, and the number of characters to extract. To replace values, place the function on the left side of an assignment statement and specify, in parentheses, the name of the variable being modified, the starting character position, and the number of characters to replace.

SCAN versus SUBSTR
Both the SCAN and SUBSTR functions can extract a substring from a character value. While SCAN relies on delimiters, SUBSTR pulls values from specified locations. Use SCAN when you know the delimiter and the order of words. Use SUBSTR when the position of characters doesn't vary.

TRIM Function
Because SAS software pads the length of character values, unwanted spaces can sometimes appear after string concatenation. To remove trailing blanks from character values, specify the TRIM function with the name of a character variable. Remember, that trimmed values will be padded with blanks again if they are shorter than the length of the new variable.

INDEX Function
To test character values for the presence of a string, use the INDEX function and specify, in parentheses, the name of the variable and the string in quotation marks. The INDEX function can be used with an IF statement when creating a data set. However, only those observations where the function finds the string, and returns a value greater than 0 are written to the new data set.

UPCASE Function
Lowercase letters in character values can be converted to uppercase by using the UPCASE function. Include the function in an assignment statement, and specify the variable name in parentheses.

LOWCASE Function
Uppercase letters in character values can be converted to lowercase by using the LOWCASE function. Include the function in an assignment statement, and specify the variable name in parentheses.


II. Syntax

To go to the page where a function or argument was presented, select a link.


INPUT( source,informat)

PUT( source,format)

MONTH(date)

YEAR(date)

MDY(month,day,year)

TODAY()

SCAN(argument,n,delimiters)

SUBSTR(argument,position,n)

TRIM(argument)

INDEX(source,excerpt)

UPCASE(argument)

LOWCASE(argument)


III. Sample Programs
     data hrd.newtemp;
        set hrd.temp;
        Salary=input(payrate,2.)*hours;
     run;
     data hrd.newtemp;
        set hrd.temp;
        Assignment=put(site,2.)||'/'||department;
     run;
     data hrd.tempnov;
        set hrd.temp;
        if month(begindate)=11;
     run;
     data hrd.temp98;
        set hrd.temp;
        if year(begindate)=1998;
     run;
     data hrd.newtemp(drop=month day year);
        set hrd.temp;
        Date=mdy(month,day,year);
     run;
     data hrd.newtemp;
        set hrd.temp;
        EditDate=today();
     run;
     data hrd.newtemp(drop=name);
        set hrd.newtemp;
        length lastname firstname middlename $ 10;
        LastName=scan(name,1,' ,');
        FirstName=scan(name,2,' ,');
        MiddleName=scan(name,3,' ,');
     run;
     data hrd.temp2(drop=exchange);
        set hrd.temp;
        Exchange=substr(phone,1,3);
        if exchange='622' then substr(phone,1,3)='433';
     run;
     data hrd.newtemp(drop=address city state zip);
        set hrd.temp;
        NewAddress=trim(address)||', '||trim(city)||', '||zip;
     run;
     data hrd.datapool;
        set hrd.temp;
        if index(job,'word processing') > 0;
     run;
     data hrd.newtemp;
        set hrd.temp;
        Job=upcase(job);
     run;
     data hrd.newtemp;
        set hrd.temp;
        Contact=lowcase(contact);
     run;


IV. Points to Remember
  • Even if a function doesn't require arguments, the function name must still be followed by parentheses.

  • When specifying function arguments with a variable list or an array, be sure to precede the list or the array with the word OF.

  • To remember which function requires a format versus an informat, note that the INPUT function requires the informat.

  • If you specify an invalid date in the MDY function, a missing value is assigned to the target variable.

  • The SCAN function treats contiguous delimiters as one delimiter; leading delimiters have no effect.

  • To save storage space, add a LENGTH statement to your DATA step to set an appropriate length when using the SCAN function. Place the LENGTH statement before the assignment statements that contain the SCAN function.

  • When the SUBSTR function is on the left side of an assignment statement, it replaces variable values. When SUBSTR is on the right side of an assignment statement, it extracts variable values. The syntax of the function is the same; only the placement of the function changes.

  • The INDEX function is case sensitive. To ensure that all forms of a character string are found, use the UPCASE or the LOWCASE function with the INDEX function.



back||next

 

Copyright © 2002 SAS Institute Inc., Cary, NC, USA. All rights reserved.

Terms of Use & Legal Information | Privacy Statement