SAS PROGRAMMING HANDOUT #4

INFILE and INPUT STATEMENTS

The INFILE statement is used when reading in an external file.
The FILE statement is used when outputting data to an external file.

The INPUT statement tells SAS how to read the data. It can be used
with the INFILE statement (in which case the data is in an ASCII file),
or it can be used when the data is in the program (a CARDS statement is
then required).

Generally SAS reads data in free format, assuming there is at least one
space between values. Missing numerical values should be entered as
periods (.). In free format, character variables have length 8 unless a
LENGTH statement or an informat says otherwise; longer values are
truncated.

EXAMPLES:
---------

INFILE 'A:\ONE.DAT' PAD;
     PAD fills short lines with blanks up to the record length. It is
     needed if some lines are shorter than others, for example when
     reading by column.

INFILE 'A:\ONE.DAT' MISSOVER;
     MISSOVER is needed if there are missing values at the end of a
     line. It stops SAS from going to the next line to look for them.

INFILE 'A:\ONE.DAT' FIRSTOBS=n;
     Tells SAS to start reading the data at the nth line, e.g. n=3.

INFILE 'A:\ONE.DAT' DLM=',';
     The DLM= (DELIMITER=) option tells SAS that values are separated
     by commas.

INPUT A $ X;
     This reads in A as a character variable and X as a numerical
     variable. There must be at least one space between the A and X
     values. Each observation needs to be on a new line.

INPUT A $ X @@;
     This does the same as above, except each line can contain more
     than one observation.

INPUT A : COMMA. B : MMDDYY.;
     Used when free-format data needs an informat and the values have
     different lengths. The colon makes SAS read each value up to the
     next space.

INPUT A $ @4 X;
     This reads A as a character variable, and reads X as a numerical
     variable starting in column 4.

INPUT A $ 1-5 @10 X;
     This reads A as a character variable in columns 1-5, and X as a
     numerical variable starting in column 10.

INPUT #1 A $ X #2 B $;
     This indicates the data is on two lines: variables A and X are on
     line 1, and variable B is on line 2.

INPUT A $ X / B $;
     Same as above.

INPUT A $ X;
INPUT B $;
     Same as above.

INPUT A $ +2 X;
     This reads A as a character variable, then moves the pointer 2
     columns to the right and reads X.
LENGTH A $ 30;
     This indicates the variable A is a character variable of length
     30. Place it before the INPUT statement so the length takes effect.

SAS also has many formats/informats available for reading or writing
data.

INPUT MODIFIERS AND POINTER CONTROLS:
-------------------------------------

$         indicates a character variable
$ &       indicates a character variable that may contain single
          blanks; leave two (2) spaces at the end of the value
:         used with an informat when data is in free format
@10       points to the column at which to start reading the variable
@@        trailing @@ at the end of the INPUT statement indicates more
          than one observation per line
#1, #2    indicates data is on more than one line
/         indicates new line of input
+2        indicates moving the pointer two columns to the right
1-10      indicates the data is in columns 1 to 10

FORMATS/INFORMATS: (there are many others and many variations)
------------------

$CHAR6.        AbCdEf
MMDDYY8.       09/07/43
COMMA9.2       34,543.12
DOLLAR10.2     $3,451.23
DATE7.         25DEC97
YYQ4.          98Q1
TIME8.         14:34:26
9.2            (numeric, width 9 with 2 decimal places)

These can be used in the INPUT statement for reading data, in the PUT
statement for writing data, and in the FORMAT statement for printing
data.

PROC PRINT NOOBS U;
   VAR A X;              /* This will print the data to the OUTPUT window. */
   FORMAT A $.;
RUN;

DATA _NULL_;
   SET JIM;
   FILE 'A:\ONE.DAT';    /* This will put the data in a text file called ONE.DAT */
   PUT @1 A @20 X;
   FORMAT A $7.;
RUN;
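
As a rough illustration of how the pieces above fit together, the sketch below reads data typed into the program with CARDS, using free-format input with the colon modifier, a LENGTH statement, and MISSOVER. The data set name DEMO, the variable names NAME, AMOUNT and BORN, and the data values are made up for this example and do not come from the handout.

```sas
/* Sketch only: DEMO, NAME, AMOUNT, BORN and the data are hypothetical. */
DATA DEMO;
   LENGTH NAME $ 30;              /* allow names longer than 8 characters  */
   INFILE CARDS MISSOVER;         /* short lines give missing values       */
   INPUT NAME $ AMOUNT : COMMA9. BORN : MMDDYY8.;
   FORMAT AMOUNT COMMA9.2 BORN DATE7.;
   CARDS;
Washington 34,543.12 09/07/43
Li 1,200.50 12/25/97
Garcia
;
RUN;

PROC PRINT DATA=DEMO NOOBS;       /* print the result to the OUTPUT window */
   VAR NAME AMOUNT BORN;
RUN;
```

The third data line has only a name, so with MISSOVER the variables AMOUNT and BORN should come out as missing for that observation instead of SAS moving on to the next line to find them.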