Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.


1 Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida

2 Section 5.1 Introduction University of Central Florida wants to analyze their applicants’ SAT score and age.

3 Scenario The university must –write the relevant data to a SAS data set so that it can be processed further –determine the minimum, maximum, and average age and SAT score of applicants –assign each applicant a decision (acceptable, unacceptable, and undecided) based on SAT score continued...

4 Scenario –determine the number of applicants with acceptable, unacceptable, and undecided status –determine the minimum, maximum, and average SAT score for each group of applicants –create an enhanced report that lists all of the applicants.

5 Calculating Summary Statistics for All Applicants
Variable  Label          N     Mean  Std Dev  Minimum  Maximum
--------------------------------------------------------------
sat       SAT Score  67574   984.13   227.49   200.00  1600.00
act       ACT Score  22520    22.86     3.73     8.00    35.00
--------------------------------------------------------------

6 Creating a New Variable Using Conditional Processing if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; <= 800? TRUE

7 Calculating Summary Statistics for Each Type of Applicant
Analysis Variable : sat SAT Score
SAT Based Decision    N Obs      N     Mean  Std Dev  Minimum  Maximum
----------------------------------------------------------------------
Acceptable            10037  10037  1278.14    69.66  1200.00  1600.00
Undecide              45895  45895  1019.04   101.70   810.00  1190.00
Unaccaptable          50361  11642   593.02   152.08   200.00   800.00

8 Creating an Enhanced List Report
Age       Decision  Average SAT Score
Teenage   Acceptab            1277.17
          Undecide            1032.34
          Unaccapt             732.72
Twenty    Acceptab            1281.25
          Undecide            1014.48
          Unaccapt             575.11
Thirty    Acceptab            1296.76
          Undecide            1015.23
          Unaccapt             502.51
Over 40   Acceptab            1278.82
          Undecide            1003.91
          Unaccapt             501.37

9 Section 5.2 Creating a SAS Data Set Objectives Create a SAS data set using another SAS data set as input. Read and create permanent SAS data sets. Select variables to store in a SAS data set.

10 Reading a SAS Data Set In order to create a SAS data set using a SAS data set as input, you must –start a DATA step and name the SAS data set being created (DATA statement) –identify the input SAS data set (SET statement).

11 Reading a SAS Data Set General form of a DATA step:
    DATA SAS-data-set;
       SET SAS-data-set;
       additional SAS statements
    RUN;
Example:
    data work.revised;
       set work.rawdat;
    run;

12 Reading a SAS Data Set By default, the SET statement reads –all of the observations from the input SAS data set –all of the variables from the input SAS data set.

13 Selecting Variables to Store in a SAS Data Set You can use a DROP= or KEEP= data set option in a DATA statement to control what variables are written to the new SAS data set. General form of the DROP= and KEEP= data set options: SAS-data-set(DROP=variables) or SAS-data-set(KEEP=variables)

14 Selecting Variables to Store in a SAS Data Set General form of the DROP= and KEEP= data set options in a DATA statement:
    DATA output-SAS-data-set (DROP=variables);
       SET input-SAS-data-set;
    RUN;
Example:
    data work.revised (drop = a b);
       set work.rawdat;
    run;
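KEEP= works the same way as DROP= but lists the variables to retain instead of the ones to discard. A minimal sketch, assuming the same work.rawdat input and its variables a and b from the example above (work.subset is a hypothetical output name):

```sas
/* Keep only a and b in the output data set; every other variable */
/* read from work.rawdat is left out of work.subset.              */
data work.subset (keep = a b);   /* work.subset: hypothetical name */
   set work.rawdat;
run;
```

Which option to use is mostly a matter of which list is shorter: DROP= when most variables stay, KEEP= when only a few do.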

15 Reading a SAS Data Set [Diagram: the input data set (SET statement) and the output data set (DATA statement) can each be a temporary or a permanent SAS data set.]

16 What Is a SAS Data Library? Regardless of which host operating system you use, you identify SAS data libraries by assigning each one a libref.

17 What Is a SAS Data Library? By default, SAS creates two SAS data libraries: –a temporary library called WORK –a permanent library called SASUSER.

18 What Is a SAS Data Library?

19 Assigning a Libref You can use the LIBNAME statement to assign a libref to a SAS data library. General form of the LIBNAME statement:
    LIBNAME libref 'SAS-data-library' <options>;
Example (Windows):
    libname sta4102 "C:\sta 4102\prog1\chapter5\sasdata";

20 Assigning a Libref More examples:
    UNIX:    libname mydata '/users/company/datalib';
    OS/390:  libname mydata 'edc.company.datalib';
Selected option: DISP=OLD|SHR specifies the disposition of the file. The default is OLD.

21 What Is a SAS Filename? General form of a SAS filename:
    libref.SAS-data-set-name
libref refers to the SAS data library (library reference). SAS-data-set-name refers to a SAS data set in the library.

22 What Is a SAS Filename? The data set EMPLOYEE is a SAS file in the IA data library. The SAS filename is IA.EMPLOYEE.

23 What Is a SAS Filename? The libref WORK can be omitted when you refer to a file in the WORK library. Example: work.revised and revised refer to the same data set.

24 Examining a SAS Data Library Generate a listing of all SAS files in a SAS data library.

25 Examining a SAS Data Library General form of the CONTENTS procedure to access the contents of a data library:
    PROC CONTENTS DATA=libref._ALL_ NODS;
    RUN;
_ALL_ requests a listing of all the files in the library.
NODS suppresses the printing of detailed information about each file.

26 Examining the IA Data Library
    proc contents data=sta4102._all_ nods;
    run;

27 Reading and Creating Permanent SAS Data Sets
    data sta4102.revised (drop = a b);
       set sta4102.rawdat;
    run;

28 Creating a New Column The SUM function adds the values of the arguments and ignores missing values. General form of the SUM function to create a new variable:
    variable = SUM(argument1, argument2);
variable   the variable you want to create
argument   variables, literals, or expressions to be summed.

29 Reading and Creating Permanent SAS Data Sets data sta4102.example1 (keep = sat gender year semester); set sta4102.revised; sat = sum(verbal,quant); run;
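The difference between the SUM function and the + operator only shows up when an argument is missing. A small sketch with made-up test data (work.sumdemo and its two rows are hypothetical, not part of the course data):

```sas
/* Hypothetical test data: the second applicant has no quant score. */
data work.sumdemo;
   input verbal quant;
   sat_sum  = sum(verbal, quant);   /* ignores missing arguments     */
   sat_plus = verbal + quant;       /* missing if either is missing  */
   datalines;
500 520
480 .
;
run;

proc print data=work.sumdemo;
run;
```

For the second row, sat_sum is 480 while sat_plus is missing. Whether a partial total is desirable depends on the analysis: with SUM, an applicant who is missing one section still gets a (lower) SAT value.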

30 Assigning Librefs and Creating SAS Data Sets File: pg1-ch5-ex03.sas This demonstration illustrates how to assign a libref and create a SAS data set from another SAS data set.

31 Section 5.3 Calculating Simple Descriptive Statistics Objectives –Generate simple descriptive statistics using the MEANS procedure. –Restrict the variables processed by the MEANS procedure using the VAR statement. –Group observations of a SAS data set for analysis using the CLASS statement in the MEANS procedure.

32 Business Scenario University of Central Florida wants to determine the minimum, maximum, and average SAT score for their applicants. Minimum SAT: 200 Maximum SAT: 1600 Average SAT: 984.13

33 Calculating Summary Statistics By default, PROC MEANS –analyzes every numeric variable in the SAS data set –prints the statistics N, MEAN, STD, MIN, and MAX –excludes missing values before calculating statistics.

34 Selecting Variables The VAR statement restricts the variables processed by PROC MEANS. Variables that are included in the statistical analysis are listed in the VAR statement. General form of the VAR statement: VAR SAS-variables;

35 Selecting Variables General form of the VAR statement in a PROC MEANS step:
    PROC MEANS DATA=SAS-data-set;
       VAR variables;
    RUN;
Example:
    proc means data=sta4102.example1;
       label sat='SAT Score';
       var sat act;
    run;

36 The MAXDEC= Option The PROC MEANS statement option MAXDEC= gives the maximum number of decimal places for PROC MEANS to use in printing results. General form of PROC MEANS with the MAXDEC= option: PROC MEANS DATA=SAS-data-set MAXDEC=number; RUN;

37 The MAXDEC= Option
    proc means data=sta4102.example1 maxdec=2;
       label sat='SAT Score';
       var sat act;
    run;
Variable  Label          N     Mean  Std Dev  Minimum  Maximum
--------------------------------------------------------------
sat       SAT Score  67574   984.13   227.49   200.00  1600.00
act       ACT Score  22520    22.86     3.73     8.00    35.00
--------------------------------------------------------------
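The business scenario only asks for the minimum, maximum, and average. PROC MEANS accepts statistic keywords in the PROC MEANS statement, and listing any of them replaces the default set (N, MEAN, STD, MIN, MAX). A minimal sketch, assuming the same sta4102.example1 data set:

```sas
/* Request only the three statistics named in the scenario. */
proc means data=sta4102.example1 min max mean maxdec=2;
   label sat='SAT Score';
   var sat;
run;
```

The statistics are printed in the order in which the keywords are listed.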

38 Grouping Observations The CLASS statement in the MEANS procedure groups the observations of the SAS data set for analysis. General form of the CLASS statement: CLASS SAS-variables;

39 Grouping Observations General form of a CLASS statement in a PROC MEANS step:
    PROC MEANS DATA=SAS-data-set;
       CLASS SAS-variables;
    RUN;
Example:
    proc means data=sta4102.example1 maxdec=2;
       label sat='SAT Score';
       var sat act;
       class gender;
    run;

40 Grouping Observations
The MEANS Procedure
Analysis Variable : sat SAT Score
Gender       N Obs      N     Mean  Std Dev  Minimum  Maximum
-------------------------------------------------------------
Female       59999  36653   962.11   223.76   200.00  1600.00
Male         46210  30893  1010.33   229.00   200.00  1600.00
Not Report      84     28   906.79   318.95   350.00  1490.00
Partial SAS Output

41 Creating Simple Descriptive Statistics Reports File: pg1-ch5-ex05.sas This demonstration illustrates how to create a simple descriptive statistics report.

42 Section 5.4 Creating Variables through Conditional Processing Objectives –Execute statements conditionally using IF-THEN logic. –Control the length of character variables explicitly with the LENGTH statement.

43 Scenario University of Central Florida wants to identify applicants based on their SAT score as unacceptable, acceptable, and undecided. Unacceptable: SAT <= 800 Acceptable: SAT >= 1200 Undecided: 800 < SAT < 1200

44 Scenario University of Central Florida wants to calculate descriptive statistics for each type of applicant.
Analysis Variable : sat SAT Score
SAT Based Decision    N Obs      N     Mean  Std Dev  Minimum  Maximum
----------------------------------------------------------------------
Acceptable            10037  10037  1278.14    69.66  1200.00  1600.00
Undecide              45895  45895  1019.04   101.70   810.00  1190.00
Unaccaptable          50361  11642   593.02   152.08   200.00   800.00

45 Conditionally Executing Statements General form of IF-THEN and ELSE statements: IF expression THEN statement; ELSE statement;

46 Conditionally Executing Statements An expression contains operands and operators that form a set of instructions that produce a resulting value. Operands are –variable names –functions –constants.

47 Conditionally Executing Statements Operators are symbols that request –a comparison –a logical operation –an arithmetic calculation.

48 Conditionally Executing Statements Example:
    if sat >= 1200 then decision1 = 1;
    else if sat > 800 and sat < 1200 then decision1 = 2;
    else if sat <= 800 then decision1 = 3;
    else decision1 = 4;

49 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4;...

50 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; >= 1200? TRUE

51 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; >= 1200? FALSE

52 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; > 800 and < 1200? TRUE

53 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; >= 1200? FALSE

54 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; > 800 and < 1200? TRUE

55 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; >= 1200? FALSE

56 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; > 800 and < 1200? FALSE

57 Conditionally Creating a New Variable if sat >= 1200 then decision1=1; else if sat > 800 and sat < 1200 then decision1 = 2; else if sat <= 800 then decision1=3; else decision1 = 4; <= 800? TRUE

58 Conditionally Executing Statements File: pg1-ch5-ex05.sas This demonstration illustrates how to create a SAS variable conditionally.

59 The LENGTH Statement You can use the LENGTH statement to define the length of a variable explicitly. General form of the LENGTH statement:
    LENGTH variable(s) ($) length;
Example:
    length Size $ 6;        /* Character variable */
    length decision1 3;     /* Numeric variable */
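The LENGTH statement matters most for character variables, because SAS otherwise fixes the length from the first value assigned and truncates longer values that come later. A minimal sketch, assuming the sta4102.example1 data set with its sat variable; the variable status and the data set work.lendemo are hypothetical names used only for illustration:

```sas
data work.lendemo;                 /* hypothetical output name */
   set sta4102.example1;
   /* Without this statement, status would get length 10 from   */
   /* 'Acceptable' and 'Unacceptable' would be cut to 10 chars. */
   length status $ 12;
   if sat >= 1200 then status = 'Acceptable';
   else if sat > 800 then status = 'Undecided';
   else if not missing(sat) then status = 'Unacceptable';
   /* status stays blank when sat is missing */
run;
```

The LENGTH statement has to appear before the first assignment to the variable to take effect.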

60 The LENGTH Statement
    data sta4102.example2;
       set sta4102.example1;
       length decision1 3;
       if sat >= 1200 then decision1 = 1;
       else if sat > 800 and sat < 1200 then decision1 = 2;
       else if sat <= 800 then decision1 = 3;
       else decision1 = 4;
       label decision1 = 'SAT Based Decision';
       format decision1 desfmt.;
    run;
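One caveat about this chain: in SAS a missing numeric value compares lower than any number, so sat <= 800 is true when sat is missing and the final ELSE is never reached. That is consistent with the summary output in this chapter, where the third group has 50361 observations but only 11642 nonmissing SAT scores. If applicants without a score should form their own group, one option is to test for missing values first. A sketch under that assumption (work.example2_alt is a hypothetical output name, and the desfmt. format is left out because its definition is not shown):

```sas
data work.example2_alt;            /* hypothetical output name */
   set sta4102.example1;
   length decision1 3;
   if missing(sat) then decision1 = 4;        /* no SAT score on file */
   else if sat >= 1200 then decision1 = 1;
   else if sat > 800 then decision1 = 2;      /* 800 < sat < 1200 */
   else decision1 = 3;                        /* sat <= 800 */
   label decision1 = 'SAT Based Decision';
run;
```

Because each ELSE is only reached when the earlier conditions are false, the upper bound sat < 1200 does not need to be repeated in the second test.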

61 Calculating SAT Statistics for Each Type of Applicant
    proc means data=sta4102.example2 maxdec=2;
       label sat='SAT Score';
       var sat;
       class decision1;
    run;

62 Calculating SAT Statistics for Each Type of Applicant
Analysis Variable : sat SAT Score
SAT Based Decision    N Obs      N     Mean  Std Dev  Minimum  Maximum
----------------------------------------------------------------------
Acceptable            10037  10037  1278.14    69.66  1200.00  1600.00
Undecide              45895  45895  1019.04   101.70   810.00  1190.00
Unaccaptable          50361  11642   593.02   152.08   200.00   800.00
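The group labels in this output come from the user-defined format desfmt., whose definition does not appear in the slides. A sketch of what such a format could look like; the label text here is an assumption (the output above shows the labels as Acceptable, Undecide, and Unaccaptable), and the label for code 4 is invented for illustration:

```sas
/* Hypothetical definition of the desfmt. format used for decision1. */
proc format;
   value desfmt 1 = 'Acceptable'
                2 = 'Undecided'
                3 = 'Unacceptable'
                4 = 'Unknown';       /* assumed label for the final ELSE branch */
run;
```

The PROC FORMAT step has to run before any step that applies desfmt. to a variable, and again in each new SAS session unless the format is stored in a permanent catalog.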

63 Conditionally Executing Multiple Statements You can use DO and END statements to execute a group of statements based on a condition. General form of DO and END statements:
    IF expression THEN DO;
       executable statements
    END;
    ELSE DO;
       executable statements
    END;

64 Explicitly Assigning a Length to a Variable and Calculating Summary Statistics File: pg1-ch5-ex05.sas This demonstration illustrates how to explicitly assign a length to a variable and calculate SAT statistics for each type of applicant.
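A DO group is useful when one condition should set more than one variable. A minimal sketch, assuming the sta4102.example1 data set; the variables accepted and status and the data set work.dodemo are hypothetical names:

```sas
data work.dodemo;                  /* hypothetical output name */
   set sta4102.example1;
   length status $ 10;
   if sat >= 1200 then do;
      accepted = 1;                /* both statements run when the condition is true */
      status   = 'Acceptable';
   end;
   else do;
      accepted = 0;                /* also reached when sat is missing */
      status   = 'Other';
   end;
run;
```

Without DO and END, only the single statement that directly follows THEN or ELSE would be conditional.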

65 Section 5.5 Creating a Detail Report with the REPORT Procedure Objectives –Create a list report using the REPORT procedure in a nonwindowing environment. –Select variables to include in a list report using the COLUMN statement. –Enhance a report using the DEFINE statement.

66 Scenario University of Central Florida wants to create an enhanced list report.
Age       Decision  Average SAT Score
Teenage   Acceptab            1277.17
          Undecide            1032.34
          Unaccapt             732.72
Twenty    Acceptab            1281.25
          Undecide            1014.48
          Unaccapt             575.11
Thirty    Acceptab            1296.76
          Undecide            1015.23
          Unaccapt             502.51
Over 40   Acceptab            1278.82
          Undecide            1003.91
          Unaccapt             501.37

67 PROC REPORT The REPORT procedure combines features of the PRINT, MEANS, and TABULATE procedures with features of the DATA step in a single report-writing tool that can produce a variety of reports. Detail Report: detail row –A simple PROC REPORT step produces a report similar to one produced by a simple PROC PRINT step. Summary Report: summary line

68 PROC REPORT Features In addition to creating list reports, PROC REPORT enables you to –create custom reports –request separate subtotals and grand totals –calculate columns –create and store report definitions –generate reports in windowing or nonwindowing environments.

69 Simple Detailed Report with Each Row for Each Observation Ordered Detail Report with Default and Customized Summaries

70 Creating a List Report General form of a simple PROC REPORT step:
    PROC REPORT DATA=SAS-data-set <options>;
    RUN;
Selected options:
    WINDOWS|WD      invokes the procedure in a windowing mode (default).
    NOWINDOWS|NOWD  displays a listing of the report in the OUTPUT window.

71 Creating a List Report
    proc report data=sta4102.example3 nowd;
    run;
Gender  Semester  Entrance Age  SAT Score  SAT Based Decision
Male    Fall      Twenty              930  Undecide
Female  Fall      Twenty             1200  Acceptable
Female  Fall      Twenty              900  Undecide
Female  Fall      Twenty              680  Unaccaptable
Male    Fall      Twenty              960  Undecide
Female  Fall      Twenty              830  Undecide

72 Printing Selected Variables You can use a COLUMN statement to –select the variables to appear in the report –select the order in which the variables appear in the report –define headers that span multiple columns. General form of the COLUMN statement: COLUMN SAS-variables;

73 Creating a List Report proc report data=sta4102.example3 nowd; column year semester entry_age decision1 sat; run;

74 The REPORT Procedure The listing displays –each data value as it is stored in the data set –variable names as report column headings –a default width for the report columns –character values left-justified –numeric values right-justified –observations in the order in which they are stored in the data set.

75 The DEFINE Statement You can enhance the report by using DEFINE statements to –define how each variable is used in the report –assign formats to variables –specify report column headers and column widths –justify the variable values and column headings within the report columns –change the order of the rows in the report.

76 The DEFINE Statement General form of the DEFINE statement: DEFINE variable / <usage> <attribute-list>; You can define options (usage and attributes) in the DEFINE statement in any order. Default usage for character variables is DISPLAY. –The report lists all of the variable’s values from the data set.

77 Usage of Variables in PROC REPORT Much of a report’s layout is determined by the usages that you specify for variables in the DEFINE statements or DEFINITION windows. For data set variables, these usages are DISPLAY, ORDER, ACROSS, GROUP, and ANALYSIS. A report can contain variables that are not in the input data set. These variables must have a usage of COMPUTED.

78 Display Variables A report that contains one or more display variables has a row for every observation in the input data set. Display variables do not affect the order of the rows in the report. If no order variables appear to the left of a display variable, then the order of the rows in the report reflects the order of the observations in the data set. By default, PROC REPORT treats all character variables as display variables.

79 Order Variables A report that contains one or more order variables has a row for every observation in the input data set. If no display variable appears to the left of an order variable, then PROC REPORT orders the detail rows according to the ascending, formatted values of the order variable. You can change the default order with ORDER= and DESCENDING in the DEFINE statement or with the DEFINITION window. If the report contains multiple order variables, then PROC REPORT establishes the order of the detail rows by sorting these variables from left to right in the report. PROC REPORT does not repeat the value of an order variable from one row to the next if the value does not change, unless an order variable to its left changes values.
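A minimal sketch of order variables, assuming sta4102.example3 contains decision1 and sat as in the COLUMN statement example on slide 73. The OBS= data set option is added here only to keep the listing short:

```sas
/* Rows are sorted by decision1 (ascending, formatted values), */
/* then by sat from highest to lowest within each decision.    */
proc report data=sta4102.example3 (obs=20) nowd;
   column decision1 sat;
   define decision1 / order 'Decision';
   define sat       / order descending 'SAT Score';
run;
```

Because sat is numeric, it would default to ANALYSIS usage; the DEFINE statement is what turns it into an order variable.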

80 Across Variables PROC REPORT creates a column for each value of an across variable. PROC REPORT orders the columns by the ascending, formatted values of the across variable. You can change the default order with ORDER= and DESCENDING in the DEFINE statement or with the DEFINITION window. If no other variable helps define the column, then PROC REPORT displays the N statistic (the number of observations in the input data set that belong to that cell of the report). Across variables are class variables that are used in the column dimension.
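A minimal sketch of an across variable, assuming sta4102.example3 contains gender and decision1 as the listing on slide 71 suggests. With nothing else defining the columns, each cell shows the N statistic:

```sas
/* One row per decision, one column per gender value; */
/* each cell counts the applicants in that combination. */
proc report data=sta4102.example3 nowd;
   column decision1 gender;
   define decision1 / group  'Decision';
   define gender    / across 'Gender';
run;
```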

81 Group Variables If a report contains one or more group variables, then PROC REPORT tries to consolidate into one row all observations from the data set that have a unique combination of formatted values for all group variables. When PROC REPORT creates groups, it orders the detail rows by the ascending, formatted values of the group variable. You can change the default order with ORDER= and DESCENDING in the DEFINE statement or with the DEFINITION window. With multiple group variables, PROC REPORT establishes the order of the detail rows by sorting these variables from left to right. PROC REPORT does not repeat the values of a group variable from one row to the next if the value does not change, unless a group variable to its left changes values. Group variables are class variables that are used in the row dimension.

82 Analysis Variables An analysis variable is a numeric variable that is used to calculate a statistic for all the observations represented by a cell of the report. Across variables, in combination with group variables or order variables, determine which observations a cell represents. You associate a statistic with an analysis variable in the variable’s definition or in the COLUMN statement. By default, PROC REPORT uses numeric variables as analysis variables that are used to calculate the Sum statistic.

83 Computed Variables Computed variables are variables that you define for the report. They are not in the input data set, and PROC REPORT does not add them to the input data set. However, computed variables are included in an output data set if you create one. In the nonwindowing environment, you add a computed variable by –including the computed variable in the COLUMN statement –defining the variable’s usage as COMPUTED in the DEFINE statement –computing the value of the variable in a compute block associated with the variable.
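The three steps above can be sketched as follows, assuming sta4102.revised contains the numeric variables verbal and quant used earlier in the chapter. The computed variable total is a hypothetical name, and OBS= is used only to keep the listing short:

```sas
proc report data=sta4102.revised (obs=10) nowd;
   column verbal quant total;          /* step 1: list the computed variable     */
   define verbal / display 'Verbal';   /* DISPLAY so the compute block can       */
   define quant  / display 'Quant';    /* refer to these variables by plain name */
   define total  / computed 'Total';   /* step 2: usage COMPUTED                 */
   compute total;                      /* step 3: compute block                  */
      total = sum(verbal, quant);
   endcomp;
run;
```

A computed variable can only use columns that appear to its left in the COLUMN statement, which is why total is listed last. If verbal and quant were left as analysis variables, the compute block would have to refer to them as verbal.sum and quant.sum.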

84 The DEFINE Statement Default usage for a numeric variable is ANALYSIS. If the report contains at least one display variable and no group variables, the report lists all of the values of the numeric variable. If the report contains only numeric variables, the report displays grand totals for the numeric variables. If the report contains group variables, the report displays the sum of the numeric variables’ values for each group.
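For the scenario report (average SAT score by age and decision), the default Sum statistic has to be replaced with MEAN. A sketch under the assumption that sta4102.example3 contains entry_age, decision1, and sat as in the COLUMN statement example on slide 73; the column headers and the 8.2 format are choices made here for illustration:

```sas
/* One row per age group and decision, showing the mean SAT score. */
proc report data=sta4102.example3 nowd;
   column entry_age decision1 sat;
   define entry_age / group 'Age';
   define decision1 / group 'Decision';
   define sat       / analysis mean format=8.2 'Average SAT Score';
run;
```

Whether the output matches the scenario listing exactly depends on the formats attached to entry_age and decision1 in the data set.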

85 The DEFINE Statement Other available statistics include

86 The DEFINE Statement Additional usage:

87 The DEFINE Statement Selected attributes: continued...

88 The DEFINE Statement continued...

89 The DEFINE Statement

90 Creating an Enhanced List Report The enhanced aircraft capacity list report includes –appropriate report column headings –formatted values for the INSERVICE variable –column widths wide enough for the headings –values and headings centered within the columns –rows of the report ordered by descending values of the variable SIZE.

91 Enhancing a List Report File: pg1-ch5-ex06.sas This demonstration illustrates how to enhance a list report.

92 Chapter Summary Create a SAS data set from an existing SAS data set DATA SAS-data-set; SET SAS-data-set; additional SAS statements RUN; Selecting variables to store in a SAS Data set SAS-data-set(DROP=variables) or SAS-data-set(KEEP=variables)

93 Chapter Summary SAS data library –Default libraries: WORK and SASUSER –Assign a libref: LIBNAME libref 'SAS-data-library'; Examining a SAS data library PROC CONTENTS DATA=libref._ALL_ NODS; RUN; Creating new variables variable = expression;

94 Chapter Summary Creating summary statistics with PROC MEANS PROC MEANS DATA=SAS-data-set; VAR variables; RUN; CLASS statement and MAXDEC option in PROC Means

95 Chapter Summary Conditionally executing statement in DATA step LENGTH statement in DATA step Conditionally executing multiple statements in DATA step

96 Chapter Summary Creating a list report with PROC REPORT Use DEFINE statement in PROC REPORT

