SAS PROGRAMMING HANDOUT #3
STATISTICAL PROCEDURES

This handout describes several of the basic procedures in SAS. By default,
a procedure uses the most recently created data set, but you can also
specify the data set to be used in every procedure with the DATA= option.

DATA ONE;
   INPUT Y1 Y2 A $ B $;
   CARDS;
1 2 A B
2 3 A D
3 4 C D
4 5 C B
5 6 C B
;
RUN;

*comments are written like this;

PROC CONTENTS DATA=ONE;                     * This will give you information about the ;
RUN;                                        * data set ONE ;

PROC PRINT DATA=ONE;                        * This will print the data set ONE ;
   VAR Y1 Y2;
   FORMAT Y2 DOLLAR.;
RUN;

PROC MEANS DATA=ONE;                        * This will compute the mean and std ;
   VAR Y1 Y2;                               * of the variables Y1 and Y2, and then put them ;
   OUTPUT OUT=TWO MEAN=MY1 MY2 STD=SY1 SY2; * in a data set called TWO ;
PROC PRINT DATA=TWO;
RUN;

DATA FOUR;
   IF _N_=1 THEN SET TWO;                   * This will merge the data set TWO (one observation) ;
   SET ONE;                                 * into every observation of the data set ONE ;
PROC PRINT DATA=FOUR;
RUN;

PROC SORT DATA=ONE OUT=THREE;               * This will sort the data set ONE by A and save ;
   BY A;                                    * the sorted copy as THREE. Sorting is needed in ;
PROC PRINT DATA=THREE;                      * order to use BY later ;
RUN;                                        * Can also do: BY A B or BY DESCENDING A ;

PROC SORT DATA=THREE;
   BY A B;
PROC MEANS DATA=THREE;                      * This will compute the mean and std ;
   BY A B;                                  * of Y1 and Y2 for each combination of ;
   VAR Y1 Y2;                               * levels of A and B ;
   OUTPUT OUT=BB MEAN=MY1 MY2 STD=SY1 SY2;
PROC PRINT DATA=BB;
RUN;

PROC MEANS DATA=ONE;
   CLASS A B;                               * This is similar to using the BY statement ;
   VAR Y1 Y2;                               * but also produces overall and main effect ;
   OUTPUT OUT=CC MEAN=MY1 MY2 STD=SY1 SY2;  * means. _TYPE_ identifies the level ;
PROC PRINT DATA=CC;
RUN;

PROC FREQ DATA=ONE;                         * This will create a 2x2 table of frequency ;
   TABLES A*B/CHISQ OUT=EE;                 * counts for A and B, do a chi-squared test, ;
PROC PRINT DATA=EE;                         * and put the counts in the data set EE ;
RUN;

PROC UNIVARIATE PLOT NORMAL DATA=ONE;       * This will compute several statistics for Y1: ;
   VAR Y1;                                  * percentiles, box plot, stem-and-leaf plot, and a test of normality ;
   OUTPUT OUT=FF MEDIAN=MED P5=P5;          * Put the median and 5th percentile in the data set FF ;
   HISTOGRAM Y1 / NORMAL;                   * This will make a nice histogram ;
   QQPLOT Y1 / NORMAL;                      * This will make a nice normal Q-Q plot ;
PROC PRINT DATA=FF;
RUN;

PROC CORR DATA=ONE OUTP=GG;                 * This will compute the correlation between ;
   VAR Y1 Y2;                               * Y1 and Y2 and put the correlations in GG ;
PROC PRINT DATA=GG;
RUN;

PROC REG DATA=ONE;                          * This will perform the linear regression y2=a+b*y1, ;
   MODEL Y2=Y1/P;                           * compute the predicted values and residuals, ;
   OUTPUT OUT=HH P=P R=R;                   * and put them in the data set HH ;
PROC PRINT DATA=HH;
RUN;

You are strongly encouraged to use the SAS help to learn more about these
basic procedures.

HELP -> SAS System help -> Help using SAS software products
     -> BASE SAS -> Using Base SAS Software -> SAS Procedures -> PROC MEANS etc
  or -> SAS/STAT -> SAS/STAT Procedures -> PROC REG etc
Look at -> SYNTAX to find out about various commands and options

NOTE: Good SAS programming requires proper indentation.
      DATA and PROC start in column 1; everything else starts in column 2 or higher.
      Always specify the data set in a PROC.
      Include comments, and use sensible data set names and variable names.

*revised 12/15/2002;
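
EXAMPLE: One common reason to merge a one-observation summary data set into
the raw data, as the DATA FOUR step above does, is to use the summary
statistics in a calculation on every observation. The sketch below
standardizes Y1 using the mean and std stored in TWO. It assumes the data
sets ONE and TWO have already been created as above. The names ZSCORES and
Z1 are made up for this illustration and are not part of the original handout.

```sas
* Sketch only, assumes data sets ONE and TWO exist as created above ;
DATA ZSCORES;                     * ZSCORES and Z1 are hypothetical names ;
   IF _N_=1 THEN SET TWO;         * Read MY1 and SY1 once. Their values are retained ;
   SET ONE;                       * Read each observation of ONE ;
   Z1 = (Y1 - MY1) / SY1;         * Standardized value of Y1 ;
   KEEP Y1 Z1;                    * Drop the summary variables from the output ;
RUN;

PROC PRINT DATA=ZSCORES;
RUN;
```

Because the variables read by SET are automatically retained, MY1 and SY1
keep their values on every pass through the DATA step even though TWO is
read only once.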