Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida.

Slides:



Advertisements
Similar presentations
Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
Advertisements

Outline Proc Report Tricks Kelley Weston. Outline Examples 1.Text that spans columnsText that spans columns 2.Patient-level detail in the titlesPatient-level.
Creating a Compact Columnar Output with PROC REPORT Walter R. Young Principal Clinical Programmer Analyst Wyeth.
INTRODUCTION TO STATA Võ Tuấn Khoa Trần Thế Trung.
CSC110 Fall Chapter 5: Decision Visual Basic.NET.
2 Copyright © 2004, Oracle. All rights reserved. Restricting and Sorting Data.
Introduction to SQL Session 1 Retrieving Data From a Single Table.
IS 1181 IS 118 Introduction to Development Tools VB Chapter 03.
WRITING BASIC SQL SELECT STATEMENTS Lecture 7 1. Outlines  SQL SELECT statement  Capabilities of SELECT statements  Basic SELECT statement  Selecting.
Creating SAS® Data Sets
Ceng 356-Lab2. Objectives After completing this lesson, you should be able to do the following: Limit the rows that are retrieved by a query Sort the.
Welcome to SAS…Session..!. What is SAS..! A Complete programming language with report formatting with statistical and mathematical capabilities.
1 Copyright © Oracle Corporation, All rights reserved. Writing Basic SQL SELECT Statements.
Chapter 5 new The Do…Loop Statement
Spreadsheets Objective 6.02
11 Chapter 2: Working with Data in a Project 2.1 Introduction to Tabular Data 2.2 Accessing Local Data 2.3 Importing Text Files 2.4 Editing Tables in the.
SAS SQL SAS Seminar Series
Chapter 8 Producing Summary Reports. Section 8.1 Introduction to Summary Reports.
SAS PROC REPORT PROC TABULATE
Chapter 9 Producing Descriptive Statistics PROC MEANS; Summarize descriptive statistics for continuous numeric variables. PROC FREQ; Summarize frequency.
Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward SAS ESSENTIALS -- Elliott & Woodward1.
October 2003Bent Thomsen - FIT 3-21 IT – som værktøj Bent Thomsen Institut for Datalogi Aalborg Universitet.
1 Experimental Statistics - week 4 Chapter 8: 1-factor ANOVA models Using SAS.
1 Data List Spreadsheets or simple databases - a different use of Spreadsheets Bent Thomsen.
©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina Chapter 17 supplement: Review of Formatting Data STAT 541.
1 Experimental Statistics - week 2 Review: 2-sample t-tests paired t-tests Thursday: Meet in 15 Clements!! Bring Cody and Smith book.
© Copyright 1992–2005 by Deitel & Associates, Inc. and Pearson Education Inc. All Rights Reserved. Tutorial 5 – Dental Payment Application: Introducing.
About the Presentations The presentations cover the objectives found in the opening of each chapter. All chapter objectives are listed in the beginning.
EPIB 698C Lecture 2 Notes Instructor: Raul Cruz 2/14/11 1.
SQL (DDL & DML Commands)
SQL Chapter Two. Overview Basic Structure Verifying Statements Specifying Columns Specifying Rows.
 Pearson Education, Inc. All rights reserved Introduction to Java Applications.
A Simple Guide to Using SPSS ( Statistical Package for the Social Sciences) for Windows.
Chapter 12: String Manipulation Introduction to Programming with C++ Fourth Edition.
Priya Ramaswami Janssen R&D US. Advantages of PROC REPORT -Very powerful -Perform lists, subsets, statistics, computations, formatting within one procedure.
Chapter 5 Reading and Manipulating SAS ® Data Sets and Creating Detailed Reports Xiaogang Su Department of Statistics University of Central Florida.
Introducing Python CS 4320, SPRING Lexical Structure Two aspects of Python syntax may be challenging to Java programmers Indenting ◦Indenting is.
Copyright © 2004, Oracle. All rights reserved. Retrieving Data Using the SQL SELECT Statement Satrio Agung Wicaksono, S.Kom., M.Kom.
Copyright © 2004, Oracle. All rights reserved. Lecture 4: 1-Retrieving Data Using the SQL SELECT Statement 2-Restricting and Sorting Data Lecture 4: 1-Retrieving.
Queries SELECT [DISTINCT] FROM ( { }| ),... [WHERE ] [GROUP BY [HAVING ]] [ORDER BY [ ],...]
2 Copyright © 2004, Oracle. All rights reserved. Restricting and Sorting Data.
DATA RETRIEVAL WITH SQL Goal: To issue a database query using the SELECT command.
Chapter 9 Cleansing and Augmenting the Data Xiaogang Su Department of Statistics University of Central Florida.
Performing Advanced Queries Using PROC SQL Chapter 2 1.
Chapter 17: Formatting Data 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
McGraw-Hill/Irwin The Interactive Computing Series © 2002 The McGraw-Hill Companies, Inc. All rights reserved. Microsoft Excel 2002 Working with Data Lists.
Chapter 14 Formatting Readable Output. Chapter Objectives  Add a column heading with a line break to a report  Format the appearance of numeric data.
Chapter 6 Looping Structures. Do…LoopDo…Loop Statement Can operate statements repetitively Do intx=intx + 1 Loop While intx < 10 –The Loop While operates.
SAS for Data Management and Analysis
Computing with SAS Software A SAS program consists of SAS statements. 1. The DATA step consists of SAS statements that define your data and create a SAS.
1 Chapter 3: Getting Started with Tasks 3.1 Introduction to Task Dialogs 3.2 Creating a Listing Report 3.3 Creating a Frequency Report 3.4 Creating a Two-Way.
1 Copyright © Oracle Corporation, All rights reserved. Writing Basic SQL SELECT Statements.
2 Copyright © 2009, Oracle. All rights reserved. Restricting and Sorting Data.
CHAPTER 2 PROBLEM SOLVING USING C++ 1 C++ Programming PEG200/Saidatul Rahah.
A Guide to SQL, Eighth Edition Chapter Four Single-Table Queries.
Chapter 4: Creating List Reports
Chapter 14: Combining Data Vertically 1 STAT 541 ©Spring 2012 Imelda Go, John Grego, Jennifer Lasecki and the University of South Carolina.
An Introduction to Programming with C++ Sixth Edition Chapter 5 The Selection Structure.
1 Checking Data with the PRINT and FREQ Procedures.
Based on Learning SAS by Example: A Programmer’s Guide Chapters 1 & 2
SAS ® 101 Based on Learning SAS by Example: A Programmer’s Guide Chapters 5 & 6 By Ravi Mandal.
Restricting and Sorting Data
Writing Basic SQL SELECT Statements
The Selection Structure
Variables In programming, we often need to have places to store data. These receptacles are called variables. They are called that because they can change.
Introduction to C++ Programming
Restricting and Sorting Data
Restricting and Sorting Data
Restricting and Sorting Data
Presentation transcript:

Chapter 6 Concatenating SAS Data Sets and Creating Summary Reports Xiaogang Su Department of Statistics University of Central Florida

Section 6.1 Introduction For P&E Web Site, University of Central Florida needs to combine three data sets to create a detailed three year report report entrance age and SAT score by semester and year subset the data to create a report for the fall semester.

Scenario Creating new variables Eliminating variables Writing detail report Writing summary report Subsetting report

Section 6.2 Concatenating SAS Data Set Objectives Define concatenation. Use the SET statement in a DATA step to concatenate two SAS data sets. Use the INT function, the SUBSTR function and the SUM function in an assignment statement to create new variables. Use the KEEP= option (or DROP = option) to write only selected variables to the output data set. Use the REPORT procedure to create a detail report.

Scenario For its academic year report, University of Central Florida must include the enrolment data for all three semester.

Scenario It will be necessary to combine the two SAS data sets create new variables based on calculated values eliminate unwanted variables from the newly created data set create a presentation-quality list report.

Scenario Sales and Passenger Data for 1999 Sales Total Total Date Flight Passengers Revenue _____________________________________________________ 01JAN1999 IA $123, JAN1999 IA $117, JAN1999 IA $127, JAN1999 IA $140, JAN1999 IA $141, JAN1999 IA $15, JAN1999 IA $15, JAN1999 IA $14, JAN1999 IA $16, JAN1999 IA $15,570.00

Concatenating SAS Data Sets A concatenation  combines two or more SAS data sets one after the other, into a single SAS data set  uses the SET statement  is one of several methods for combining SAS data sets.

Concatenating SAS Data Sets Use the SET statement in a DATA step to concatenate SAS data sets. General form of a DATA step concatenation: DATA SAS-data-set; SET SAS-data-set-1 SAS-data-set-2; RUN;

Concatenating SAS Data Sets Example program to concatenate several SAS data sets: data sta4102.ch6data; set sta4102.summer2000 sta4102.fall2000 sta4102.SPRING2001; More SAS Statements; run;

Concatenating Two SAS Data Sets Concatenate STA4102.summer2000 and STA4102.fall2000 : SummerFall STA4102.summer2000STA4102.fall2000 STA4102.ch6data Summer Fall

Variables in STA4102.CH6DATA -----Alphabetic List of Variables and Attributes----- # Variable Type Len Pos Format Informat Label ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 3 ACT_SCORE Char 2 7 $2. $2. ACT_SCORE 4 BIRTH_YEAR Char 4 9 $4. $4. BIRTH_YEAR 5 CLASSIF_LEVEL Char 1 13 $1. $1. CLASSIF_LEVEL 6 COUNTY_RESIDENCY Char 4 14 $4. $4. COUNTY_RESIDENCY 7 ETHNIC_ORIGIN Char 1 18 $1. $1. ETHNIC_ORIGIN 8 EXCEPTION_FLAG Char 1 19 $1. $1. EXCEPTION_FLAG 9 FINAL_ADM_ACTION Char 1 20 $1. $1. FINAL_ADM_ACTION 10 QUANT_SCORE Char 3 21 $3. $3. QUANT_SCORE 11 REG_CODE Char 1 24 $1. $1. REG_CODE 12 SEX Char 1 25 $1. $1. SEX 2 STU_TYPE_APPL Char 1 6 $1. $1. STU_TYPE_APPL 1 TERM_CYM Char 6 0 $6. $6. TERM_CYM 13 TEST_INDICATOR Char 1 26 $1. $1. TEST_INDICATOR 14 UCF_COLLEGE Char 2 27 $2. $2. UCF_COLLEGE 15 UCF_ENTRY_CLASIF Char 2 29 $2. $2. UCF_ENTRY_CLASIF 16 UCF_FULL_PART_TIM Char 1 31 $1. $1. UCF_FULL_PART_TIM 17 UCF_MAJOR Char 6 32 $6. $6. UCF_MAJOR 18 VERBAL_OTHER_SCR Char 3 38 $3. $3. VERBAL_OTHER_SCR

Use the SET statement in a DATA step to perform the concatenation. Name the new data set STA4102.ch6data. data sta4102.ch6data; set sta4102.summer2000 sta4102.fall2000 sta4102.SPRING2001; More SAS Statements; run; Concatenating Three SAS Data Sets

Using the SUBSTR Function in an Assignment Statement Use the SUBSTR function in an assignment statement to return a value for the year. Assign those values to the new variable YEAR. data sta4102.ch6data; set sta4102.summer2000 sta4102.fall2000 sta4102.SPRING2001; year=substr(term_cym,1,4); More SAS Statements; run;

Use the SUM function in an assignment statement to sum the values of the Verbal and the Quantitative portion to create the new variable SAT. data sta4102.ch6data; set sta4102.summer2000 sta4102.fall2000 sta4102.SPRING2001; sat=sum(verbal,quant); More SAS Statements; run; Using the SUM Function in an Assignment Statement

data sta4102.ch6data; set sta4102.summer2000 sta4102.fall2000 sta4102.SPRING2001; verbal=int(substr(verbal_other_scr,1,3)); if verbal=999 then verbal=.; More SAS Statements; run; Using the INT and SUBSTR Function in an Assignment Statement

Eliminating Variables with the DROP ststement Use the DROP statement to drop the unwanted variables. drop a b term_cym stu_type_Appl act_Score Birth_year Classif_level County_Residency ethnic_origin Exception_Flag Final_adm_action quant_score reg_code sex test_indicator ucf_college verbal_other_scr ucf_entry_clasif ucf_major ucf_full_part_tim;

Adding Options to Enhance Report Appearance Selected PROC REPORT options: HEADLINEunderlines all column headers and the spaces between them. HEADSKIPwrites a blank line beneath all column headers.

Write a PROC REPORT step. Use the HEADLINE and HEADSKIP options to underline the column headers and insert a blank line COLUMN statement to specify which variables to include. proc report data=sta4102.ch6data nowd headline headskip; column year semester gender sat ; More SAS Statements; run; Writing a PROC REPORT Step

Use DEFINE statements to define the variables as display variables. Add column headers and specify column width. Add formats and specify alignment. Add titles. proc report data=sta4102.ch6data nowd headline headskip; column year semester gender sat; define year / display center 'Year'; define semester / display center 'Semester'; define gender / display center 'Gender' ; define sat / display center 'Average SAT Score'; title 'Academic Applicants'; run;

Concatenating Two SAS Data Sets and Reporting the Results File: pg1-ch6-ex02.sas Concatenate two SAS data sets and use SAS functions in assignment statements to create new variables. Use the DROP statement to eliminate unwanted variables. Use a PROC REPORT step to create a detail report.

Section 6.3 Creating a Summary Report with Report Procedure Objectives Explain the GROUP option used in the DEFINE statement in a PROC REPORT step. Explain the MEAN option used in the DEFINE statement in a PROC REPORT step. Use the MEAN and GROUP options and other PROC REPORT features to create a summary report of presentation quality.

Scenario Some faculty claim that the SAT score is lower for the applicants in Summer semester.

The Goal Report Academic Applicants Average Year Semester Gender SAT Score ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 2000 Fall Female Male Not Report Summer Female Male Not Report Spring Female Male Not Report

Defining Group Variables Use the REPORT procedure to create a summary report by defining variables as group variables. All observations whose group variables have the same values are collapsed into a single observation in the report.

Defining Group Variables Semester as Group Variable Before Grouping Semester Avg. SAT Fall1080 Fall 1260 Summer 1210 Summer 850 After Grouping Semester Avg. SAT Fall Spring Summer

Defining Group Variables You can define more than one variable as a group variable, but all group variables must precede other types of variables. Nesting is determined by the order of the variables in the COLUMN statement.

Selecting the Variables Use the COLUMN statement in a PROC REPORT step to specify the variables to be included in the report. proc report data=sta4102.ch6data nowd headline headskip; column semester sat; other SAS statements run;

Defining the Variable Attributes Use the DEFINE statement to define Semester as a group variable and assign it a column header. proc report data=sta4102.ch6data nowd headline headskip; column semester sat; define semester / group order=internal 'Semester‘; other SAS statements run;

Use the DEFINE statement to define SAT as mean variables and assign their attributes. proc report data=sta4102.ch6data nowd headline headskip; column semester sat; define semester / group order=internal 'Semester‘; define sat / 'Average SAT Score' width=10 format=8.2 mean; other SAS statements run; Defining the Variable Attributes

Use the HEADLINE option to underline the column headers and the HEADSKIP option to add a blank line between the column headers. Add titles. Controlling Report Appearance

Creating a Summary Report File: pg1-ch6-ex03.sas This demonstration illustrates how to create a summary report.

Section 6.4 Subsetting Observations from a SAS Data Set Objectives Subset data using the WHERE statement with comparison and logical operators. Define special operators and explain how they are used in the WHERE statement. Define SAS date constants. Define the YEARCUTOFF= option. Use a WHERE statement with a special operator to subset a SAS table.

Scenario University wants to create a separate report for accepted students and enrolled students.

Scenario Holiday Sales Report Sales Total Total Date Flight Seats Passengers Revenue ___________________________________________________________ 24NOV1999 IA $143, NOV1999 IA $143, NOV1999 IA $133, NOV1999 IA $145, NOV1999 IA $142, NOV1999 IA $133, NOV1999 IA $142, DEC1999 IA $130, DEC1999 IA $156, DEC1999 IA $146, DEC1999 IA $136,143.00

Subsetting Your Data with the WHERE Statement The WHERE statement enables you to select observations that meet a certain condition before SAS brings the observation into the PROC REPORT step. proc report data=sta4102.ch6data nowd headline headskip; where action in ('A', 'P', 'X'); run;

Subsetting Data with the WHERE Statement General syntax of the WHERE statement: where-expression is a sequence of operands and operators. Operands include variables functions constants. WHERE where-expression;

Subsetting Your Data with the WHERE Statement The WHERE statement can be used with comparison operators logical operators. You can also use the WHERE statement with special operators.

Comparison Operators MnemonicSymbolDefinition

Comparison Operators Examples: where Salary>25000; where EmpID='0082'; where Salary=.; where LastName=' ';

Comparison Operator You can use the IN comparison operator to see if a variable is equal to one of the values in a list. Example: where action in ('A', 'P', 'X');  Character comparisons are case sensitive.

Logical Operators Logical operators include ANDif both expressions are true, then the compound expression is true ORif either expression is true, then the compound expression is true NOTcan be combined with other operators to reverse the logic of a comparison.

Logical Operators Examples: where JobCode='FLTAT3' and Salary>50000; where JobCode='PILOT1' or JobCode='PILOT2' or JobCode='PILOT3';

Special Operators The following are special operators : LIKE selects observations by comparing character values to specified patterns. A percent sign (%) replaces any number of characters and an underscore (_) replaces one character. where Code like 'E_U%'; (E, a single character, U, followed by any characters.) continued...

Special Operators The sounds-like (=*) operator selects observations that contain a spelling variation of the word or words specified. where Name=*'SMITH'; (Selects the names Smythe, Smitt, and so on.) CONTAINS or ? selects observations that include the specified substring. where Word ? 'LAM'; (BLAME, LAMENT, and BEDLAM are selected.) continued...

IS NULL or IS MISSING selects observations in which the value of the variable is missing. where Flight is missing; BETWEEN-AND selects observations in which the value of the variable falls within a range of values. where Date between '01mar1999'd and '01apr1999'd; Special Operators

Using SAS Date Constants The constant ' ddMMMyyyy ' d creates a SAS date value from the dates enclosed in quotes where dd is a one- or two-digit value for the day MMMis a three-letter abbreviation for the month (JAN, FEB, MAR, and so on) yyyyis a two- or four-digit value for the year.

Using the YEARCUTOFF= Option If you use a two-digit value for year, SAS will use the value of the YEARCUTOFF= system option to determine the complete year value. YEARCUTOFF= specifies the first year of a 100-year span used to determine the century of a two-digit year. options yearcutoff=1930; …1980… … year span

Subsetting a Report Use a WHERE statement with the special operator BETWEEN-AND to create a subset based on dates. proc report data=sta4102.ch6data nowd headline headskip; column semester sat; where action in ('A', 'P', 'X') and apptype = '1'; define semester / group order=internal 'Semester' width=8 format=$semfmt. style(header) = {just=center} style(column) = {just=left}; define sat / 'Average SAT Score' width=10 format=8.2 mean style(header) = {just=center} style(column) = {just=right}; title 'Accepted FTIC Students SAT Summary Report'; title2 '( )'; title3 'Gropu by Semester'; run;

WHERE or Subsetting IF?

Subsetting Report Data with a WHERE Statement File: pg1-ch6-ex04.sas This demonstration illustrates how to use special operators in a WHERE statement to subset the data based on a range of dates.