SAS: Introduction and Examples
Presented by 经济实验教学中心 (Economics Experimental Teaching Center), 商务数据挖掘中心 (Business Data Mining Center)
Raw Data -> Read in Data -> Process Data (create new variables) -> Output Data (create SAS dataset) -> Analyze Data Using Statistical Procedures
  Data Step: read in, process, and output the data
  PROCs: analyze the data
Structure of Data
  Made up of rows and columns
  Rows in SAS are called observations
  Columns in SAS are called variables
  An observation is all the information for one entity (patient, patient visit, clinical center, county)
  SAS processes data one observation at a time
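To make "one observation at a time" concrete, here is a minimal sketch. The dataset name walk and the two-row sample are assumptions chosen for illustration. The automatic variable _N_ counts DATA step iterations, and PUT writes a line to the log on each pass.

```sas
/* Hypothetical two-row dataset; the name "walk" is an assumption */
DATA walk;
  INPUT gender $ age;
  /* Runs once per observation; writes the iteration count and values to the log */
  PUT 'Processing observation ' _N_ gender= age=;
  DATALINES;
F 23
M 20
;
RUN;
```

The log should show one PUT line per data line, since the statements between DATA and DATALINES are executed again for each observation read.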
Example of Data
12 observations and 5 variables
  F 23 S 15 MN
  F 21 S 15 WI
  F 22 S 09 MN
  F 35 M 02 MN
  F 22 M 13 MN
  F 25 S 13 WI
  M 20 S 13 MN
  M 26 M 15 WI
  M 27 S 05 MN
  M 23 S 14 IA
  M 21 S 14 MN
  M 29 M 15 MN
Example of Data
12 observations and 5 variables?
  F23S15MN
  F21S15WI
  F22S09MN
  F35M02MN
  F22M13MN
  F25S13WI
  M20S13MN
  M26M15WI
  M27S05MN
  M23S14IA
  M21S14MN
  M29M15MN
Need to know the starting and ending position of each variable.
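When the values are packed together like this, one way to read them is column input, where each variable is followed by its starting and ending column. The sketch below assumes the layout implied by the sample lines (gender in column 1, age in 2-3, marital status in 4, credits in 5-6, state in 7-8). The dataset name packed is hypothetical, and only the first three data lines are used.

```sas
/* Column input: each variable is given its start and end column */
DATA packed;
  INPUT gender  $ 1
        age       2-3
        marstat $ 4
        credits   5-6
        state   $ 7-8;
  DATALINES;
F23S15MN
F21S15WI
F22S09MN
;
RUN;

PROC PRINT DATA=packed;
RUN;
```

A single column number is enough for a one-character variable, and the $ still marks the character variables.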
Types of Data
  Numeric (e.g. age, blood pressure)
  Character (e.g. patient ID, diagnosis)
You need to tell SAS if the data is character. The default is numeric.
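A small sketch of how the type is declared when reading data: a $ after a variable name on the INPUT statement marks it as character, and a name without $ is read as numeric. The dataset name types is an assumption for illustration. PROC CONTENTS lists each variable with its type, which is a quick way to check what SAS understood.

```sas
/* gender is declared character with $; age has no $ and is numeric by default */
DATA types;
  INPUT gender $ age;
  DATALINES;
F 23
M 20
;
RUN;

/* Lists the variables in the dataset along with their types */
PROC CONTENTS DATA=types;
RUN;
```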
Rules for SAS Statements and Variables
  SAS statements end with a semicolon (;)
  SAS statements can be entered in lowercase or uppercase
  Multiple SAS statements can appear on one line
  A SAS statement can span multiple lines
  Variable names can be 1-32 characters long and must begin with a letter (A-Z) or an underscore (_)
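The rules above can be sketched in one deliberately untidy program. The dataset and variable names (rules_demo, _id, age_next) are hypothetical. It is meant to show what SAS accepts, not a layout worth copying.

```sas
/* Two statements on one line, written in lowercase */
data rules_demo; input _id $ Age;
  /* One statement spread over two lines; Age and age name the same variable */
  age_next =
      age + 1;
  datalines;
A1 23
B2 35
;
run;

/* Uppercase works just as well, and RULES_DEMO is the same dataset as rules_demo */
PROC PRINT DATA=RULES_DEMO; RUN;
```

The data lines are the one place where layout matters: each observation sits on its own line after the DATALINES statement.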
* This is a short example program to demonstrate what a
  SAS program looks like. This is a comment statement because
  it begins with a * and ends with a semi-colon ;

DATA demo;
  INFILE DATALINES;
  INPUT gender $ age marstat $ credits state $ ;

  if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
  if state = 'MN' then resid = 'Y'; else resid = 'N';
DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

TITLE 'Running the Example Program';
PROC PRINT DATA=demo ;
  VAR gender age marstat credits fulltime state ;
RUN;
1  DATA demo;                  Create a SAS dataset called demo
2  INFILE DATALINES;           Where is the data?
3  INPUT gender $              What are the variable names and types?
         age
         marstat $
         credits
         state $ ;
4  if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
5  if state = 'MN' then resid = 'Y'; else resid = 'N';
Statements 4 and 5 create 2 new variables
6  DATALINES;                  Tells SAS the data is coming
   F 23 S 15 MN
   F 21 S 15 WI
   F 22 S 09 MN
   F 35 M 02 MN
   F 22 M 13 MN
   F 25 S 13 WI
   M 20 S 13 MN
   M 26 M 15 WI
   M 27 S 05 MN
   M 23 S 14 IA
   M 21 S 14 MN
   M 29 M 15 MN
   ;                           Tells SAS the data is ending
7  RUN;                        Tells SAS to run the statements above
Main SAS Windows (PC)
  Editor Window – where you type your program
  Log Window – lists program statements processed, giving notes, warnings and errors.
    Always look at the log window! It tells you how SAS understood your program.
  Output Window – gives the output generated from the PROCs
Submit the program by clicking on the run icon
PC SAS WINDOWS (OUTPUT WINDOW IS HIDDEN)
Main SAS Files
  Program file – type your program in a text editor – fname.sas
  Log file – lists program statements processed, giving notes, warnings and errors – fname.log
  Output file – gives the output generated from the PROCs – fname.lst
Submit the program by typing: sas fname.sas
Messages in SAS Log
  Notes – messages that may or may not be important
  Warnings – messages that are usually important
  Errors – fatal: the program will abort (notes and warnings will not abort your program)
LOG WINDOW (or file)
NOTE: Copyright (c) by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to UNIVERSITY OF MINNESOTA, Site
NOTE: This session is executing on the WIN_NT platform.
NOTE: SAS initialization used:
      real time           7.51 seconds
      cpu time            0.89 seconds

1    * This is a short example program to demonstrate what a
2      SAS program looks like. This is a comment statement because
3      it begins with a * and ends with a semi-colon ;
4
5    DATA demo;
6    INFILE DATALINES;
7    INPUT gender $ age marstat $ credits state $ ;
8
9    if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
10   if state = 'MN' then resid = 'Y'; else resid = 'N';
11   DATALINES;

NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
      real time           0.38 seconds
      cpu time            0.06 seconds
25   RUN;
26   TITLE 'Running the Example Program';
27   PROC PRINT DATA=demo ;
28   VAR gender age marstat credits fulltime state ;
29   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE PRINT used:
      real time           0.19 seconds
      cpu time            0.02 seconds

30   PROC MEANS DATA=demo N SUM MEAN;
31   VAR age credits ;
32   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE MEANS used:
      real time           0.25 seconds
      cpu time            0.03 seconds

33   PROC FREQ DATA=demo; TABLES gender;
34   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE FREQ used:
      real time           0.15 seconds
      cpu time            0.03 seconds
OUTPUT WINDOW (OR LST FILE)

Running the Example Program

Obs   gender   age   marstat   credits   fulltime   state
  1     F       23      S         15        Y        MN
  2     F       21      S         15        Y        WI
  3     F       22      S          9        N        MN
  4     F       35      M          2        N        MN
  5     F       22      M         13        Y        MN
  6     F       25      S         13        Y        WI
  7     M       20      S         13        Y        MN
  8     M       26      M         15        Y        WI
  9     M       27      S          5        N        MN
 10     M       23      S         14        Y        IA
 11     M       21      S         14        Y        MN
 12     M       29      M         15        Y        MN

The MEANS Procedure

Variable    N    Sum    Mean
age
credits

The FREQ Procedure

                                 Cumulative    Cumulative
gender    Frequency    Percent    Frequency      Percent
F
M
Some common procedures
  PROC PRINT        print out your data - always a good idea!!
  PROC MEANS        descriptive statistics for continuous data
  PROC FREQ         descriptive statistics for categorical data
  PROC UNIVARIATE   very detailed descriptive statistics for continuous data
  PROC TTEST        performs t-tests (continuous data)
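As a sketch, the five procedures listed above could be called on the demo dataset from the example program as follows. The PROC PRINT, PROC MEANS and PROC FREQ calls follow the ones in the log. The PROC UNIVARIATE and PROC TTEST calls are assumptions for illustration, including the choice of age as the analysis variable and gender as the grouping variable. The DATA step is repeated so the block runs on its own.

```sas
DATA demo;
  INFILE DATALINES;
  INPUT gender $ age marstat $ credits state $ ;
  if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
  if state = 'MN' then resid = 'Y'; else resid = 'N';
  DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

/* Print the data */
PROC PRINT DATA=demo;
RUN;

/* Descriptive statistics for the continuous variables */
PROC MEANS DATA=demo N SUM MEAN;
  VAR age credits;
RUN;

/* Frequency table for a categorical variable */
PROC FREQ DATA=demo;
  TABLES gender;
RUN;

/* Detailed descriptive statistics (assumed example: age) */
PROC UNIVARIATE DATA=demo;
  VAR age;
RUN;

/* Two-sample t-test (assumed example: compare mean age by gender) */
PROC TTEST DATA=demo;
  CLASS gender;
  VAR age;
RUN;
```

Each PROC reads the same WORK.DEMO dataset, so the DATA step only needs to run once per session.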