1 EPIB 698C Lecture 4 Raul Cruz-Cano Summer 2012.

Presentation transcript:

1 EPIB 698C Lecture 4 Raul Cruz-Cano Summer 2012

2 Sorting, Printing and Summarizing Your Data
SAS procedures (PROCs) perform a specific analysis or function and produce results or reports. E.g.: PROC PRINT DATA = new; RUN;
- All procedures have required statements, and most have optional statements.
- All procedures start with the keyword PROC, followed by the name of the procedure, such as PRINT or CONTENTS.
- Options, if there are any, follow the procedure name.
- The DATA=data_name option tells SAS which data set to use as input for the procedure.
NOTE: if you omit DATA=, SAS uses the most recently created data set, which is not necessarily the same as the most recently used data set.

3 BY statement
The BY statement is required for only one procedure, PROC SORT:
   PROC SORT DATA = new;
      BY gender;
   RUN;
For all other procedures, BY is an optional statement. It tells SAS to perform the analysis separately for each level of the BY variable, instead of treating all subjects as one group:
   PROC PRINT DATA = new;
      BY gender;
   RUN;
All procedures except PROC SORT assume your data are already sorted by the variables in the BY statement.
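The sort-then-BY pattern can be sketched end to end as follows. This is only an illustration: the data set new, the variables gender and age, and all values are made up so that the example is self-contained.

```sas
* Hypothetical data, for illustration only;
DATA new;
   INPUT gender $ age;
   DATALINES;
female 34
male 41
female 29
male 52
;
RUN;

* Sort first, because PROC PRINT expects the BY variable to be sorted;
PROC SORT DATA = new;
   BY gender;
RUN;

* Produce one listing per level of gender;
PROC PRINT DATA = new;
   BY gender;
RUN;
```

If the PROC SORT step is left out, PROC PRINT should stop with an error saying the data set is not sorted in ascending order.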

4 PROC SORT
Syntax:
   PROC SORT DATA = input_data_name OUT = out_data_name;
      BY variable-1 … variable-n;
The variables in the BY statement are called BY variables.
- With one BY variable, SAS sorts the data by the values of that variable.
- With more than one BY variable, SAS sorts observations by the first variable, then by the second variable within the categories of the first, and so on.
The DATA= and OUT= options specify the input and output data sets. Without the DATA= option, SAS uses the most recently created data set. Without the OUT= option, SAS replaces the original data set with the newly sorted version.

5 PROC SORT
By default, SAS sorts data in ascending order, from the lowest to the highest value or from A to Z. To reverse the order (highest to lowest, or Z to A), add the keyword DESCENDING before each variable you want sorted that way.
The NODUPKEY option tells SAS to eliminate any duplicate observations that have the same values for the BY variables.

6 PROC SORT
Example: the file sealife.txt contains information on the average length in feet of selected whales and sharks. We want to sort the data by family and length.
   Name      Family  Length
   beluga    whale   15
   whale     shark   40
   basking   shark   30
   gray      whale   50
   mako      shark   12
   sperm     whale   60
   dwarf     shark   .5
   whale     shark   40
   humpback  .       50
   blue      whale   100
   killer    whale   30

8 PROC SORT
   DATA marine;
      INFILE 'F:\SAS\lecture4\Sealife.txt';
      INPUT Name $ Family $ Length;
   RUN;
   * Sort the data;
   PROC SORT DATA = marine OUT = seasort NODUPKEY;
      BY Family DESCENDING Length;
   RUN;

9 TITLE and FOOTNOTE statements
TITLE and FOOTNOTE statements are global statements and are not technically part of any step. You can put them anywhere in your program, but since they apply to procedure output, it usually makes sense to put them with the procedure.
Syntax:
   TITLE 'This is a title for this procedure';
   FOOTNOTE 'This is the footnote for this procedure';
To cancel the current title or footnote, use the following null statements:
   TITLE;
   FOOTNOTE;
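A small sketch of how titles and footnotes persist until they are cancelled. The data set new and its values are hypothetical, made up for this example.

```sas
* Hypothetical data, for illustration only;
DATA new;
   INPUT gender $ age;
   DATALINES;
female 34
male 41
;
RUN;

* Global statements stay in effect for every later procedure until changed;
TITLE 'Listing of subjects';
FOOTNOTE 'Hypothetical data for illustration';

PROC PRINT DATA = new;
RUN;

* Null statements cancel the current title and footnote;
TITLE;
FOOTNOTE;

* This listing is printed without the title and footnote set above;
PROC PRINT DATA = new;
RUN;
```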

10 LABEL statement
The LABEL statement creates descriptive labels, up to 256 characters long, for each variable. E.g.:
   LABEL Shipdate = 'Date merchandise was shipped'
         ID = 'Identification number of subject';
When a LABEL statement is used in a DATA step, the labels become part of the data set. When it is used in a PROC step, the labels stay in effect only for the duration of that step.
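The difference between a label set in a DATA step and one set in a PROC step can be sketched as below. The data set shipments and its values are hypothetical. The LABEL option on PROC PRINT is included because PROC PRINT shows variable names rather than labels unless asked.

```sas
* Hypothetical data, for illustration only;
DATA shipments;
   INPUT ID Shipdate :MMDDYY10.;
   * This label is stored with the data set;
   LABEL Shipdate = 'Date merchandise was shipped';
   FORMAT Shipdate MMDDYY10.;
   DATALINES;
101 01/15/2012
102 02/03/2012
;
RUN;

PROC PRINT DATA = shipments LABEL;
   * This label applies to this step only;
   LABEL ID = 'Identification number of subject';
RUN;

* Shipdate keeps its label here, while ID is shown by name again;
PROC PRINT DATA = shipments LABEL;
RUN;
```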

11 PROC FORMAT
PROC FORMAT lets you create your own formats, which is useful when you work with coded data. The formats it creates are later associated with variables in a FORMAT statement.
Syntax:
   PROC FORMAT;
      VALUE name range-1 = 'formatted-text-1'
                 range-2 = 'formatted-text-2'
                 range-n = 'formatted-text-n';
name is the name of the format you are creating. If the format is for character data, use $name instead of name. In addition, the name cannot be the name of an existing format.

12 PROC FORMAT
Each range is the value (or set of values) of the variable that is assigned to the text given in quotation marks. The text can be up to 32,767 characters long, but some procedures print only the first 8 to 16 characters.
Examples of valid range specifications:
- 'A' = 'Asian': character values must be put in quotation marks.
- 1,3,5,7,9 = 'ODD': separate a list of values with commas, or give a continuous range with a hyphen (-).
- 5000-HIGH = 'high price': the keywords LOW and HIGH can be used in ranges to indicate the lowest and highest non-missing values of the variable.

13 PROC FORMAT
Here is a survey about subjects' preferences for car colors. The data contain each subject's age, sex (coded as 1 for male and 2 for female), annual income, and preferred car color (yellow, green, blue, or white).
The data have the columns age, sex, income, color. Only the color values survive in this transcript: Y, G, B, Y, W.

14
   PROC FORMAT;
      VALUE gender 1 = 'Male'
                   2 = 'Female';
      VALUE agegroup 13 -< 20 = 'Teen'
                     20 -< 65 = 'Adult'
                     65 - HIGH = 'Senior';
      VALUE $col 'W' = 'Moon White'
                 'B' = 'Sky Blue'
                 'Y' = 'Sunburst Yellow'
                 'G' = 'Green';
   PROC PRINT DATA = carsurvey;
      FORMAT Sex gender. Age agegroup. Color $col. Income DOLLAR8.;
   RUN;

15 Subsetting in procedures with a WHERE statement
- The WHERE statement tells a procedure to use a subset of the data.
- It is an optional statement for any PROC step.
- Unlike subsetting in the DATA step, using a WHERE statement in a procedure does not create a new data set.
The basic form is:
   WHERE condition;
E.g.: WHERE gender = 'female';

16 Subsetting in procedures with a WHERE statement
A data set contains information about well-known painters:
   Name                   Style               Nation of origin
   Mary Cassatt           Impressionism       U
   Paul Cezanne           Post-impressionism  F
   Edgar Degas            Impressionism       F
   Paul Gauguin           Post-impressionism  F
   Claude Monet           Impressionism       F
   Pierre Auguste Renoir  Impressionism       F
   Vincent van Gogh       Post-impressionism  N
Goal: we want a list of impressionist painters.

17
   DATA style;
      INFILE 'F:\SAS\lecture4\style.txt';
      * Read style with a length of 18 so that long style names are not truncated;
      INPUT Name $ 1-21 style :$18. Origin $ 42;
   RUN;
   PROC PRINT DATA = style;
      WHERE style = 'Impressionism';
      TITLE 'Major Impressionist Painters';
      FOOTNOTE 'F = France N = Netherlands U = US';
   RUN;

18 Summarizing your data with PROC MEANS
PROC MEANS provides simple statistics on numeric variables.
Syntax:
   PROC MEANS options;
Simple statistics that PROC MEANS can produce:
- MAX: the maximum value
- MIN: the minimum value
- MEAN: the mean
- N: the number of non-missing values
- STDDEV: the standard deviation
- NMISS: the number of missing values
- RANGE: the range of the data
- SUM: the sum
- MEDIAN: the median
By default, PROC MEANS reports N, MEAN, STDDEV, MIN and MAX.

19 PROC MEANS
Statements used with PROC MEANS:
- BY variable-list: performs the analysis for each level of the variables in the list. The data need to be sorted first.
- CLASS variable-list: performs the analysis for each level of the variables in the list. The data do not need to be sorted.
- VAR variable-list: specifies which variables to use in the analysis.
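As a contrast with the BY approach, here is a minimal sketch of a CLASS statement applied to unsorted data. The data set new, its variables and its values are hypothetical, chosen only to make the example self-contained.

```sas
* Hypothetical, deliberately unsorted data, for illustration only;
DATA new;
   INPUT gender $ age BMI;
   DATALINES;
female 34 22.5
male 41 27.1
female 29 24.0
male 52 25.3
;
RUN;

* CLASS gives statistics per level of gender without a prior PROC SORT;
PROC MEANS DATA = new N MEAN STDDEV;
   CLASS gender;
   VAR age BMI;
RUN;
```

Replacing CLASS with BY in this program would require a PROC SORT step first, because the observations are not grouped by gender.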

20 PROC MEANS
A wholesale nursery sells garden flowers and wants to summarize its sales figures by month.
The data have the columns ID, Date, Lily, SnapDragon, Marigold. The data values did not survive in this transcript.

21
   DATA sales;
      INFILE 'C:\teaching\SAS\lecture4\Flowers.txt';
      INPUT CustomerID SaleDate MMDDYY10. Lily SnapDragon Marigold;
      Month = MONTH(SaleDate);
   RUN;
   PROC SORT DATA = sales;
      BY Month;
   RUN;
   * Calculate means by Month for flower sales;
   PROC MEANS DATA = sales;
      BY Month;
      VAR Lily SnapDragon Marigold;
      TITLE 'Summary of Flower Sales by Month';
   RUN;

22 OUTPUT statement
The OUTPUT statement writes summary statistics to a SAS data set.
Syntax:
   OUTPUT OUT = data_name output-statistic-list;
E.g.:
   PROC MEANS DATA = new;
      VAR age BMI;
      OUTPUT OUT = new1 MEAN(age BMI) = mean_age mean_BMI;
   RUN;
The output data set new1 contains the means of age and BMI, stored in the variables mean_age and mean_BMI respectively.

24
   PROC MEANS DATA = sales;
      BY Month;
      VAR Lily SnapDragon Marigold;
      OUTPUT OUT = new1
         MEAN(Lily SnapDragon Marigold) = mean_lily mean_SnapDragon mean_Marigold
         SUM(Lily SnapDragon Marigold) = sum_lily sum_SnapDragon sum_Marigold;
      TITLE 'Summary of Flower Sales by Month';
   RUN;

25 OUTPUT statement
The SAS data set created by the OUTPUT statement contains all the variables defined in the output-statistic list, any variables in a BY or CLASS statement, and two new variables: _TYPE_ and _FREQ_.
- Without a BY or CLASS statement, the data set has just one observation.
- With a BY statement, the data set has one observation for each level of the BY group.
- A CLASS statement produces one observation for each level of interaction of the class variables.
The value of _TYPE_ depends on the level of interaction of the CLASS variables. _TYPE_ = 0 is the grand total.
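To see _TYPE_ and _FREQ_ in practice, the following sketch writes the mean of one variable to a data set and prints it. The data set names new and stats, the variables and the values are all hypothetical.

```sas
* Hypothetical data, for illustration only;
DATA new;
   INPUT gender $ age;
   DATALINES;
female 34
male 41
female 29
male 52
;
RUN;

* NOPRINT suppresses the printed report so only the data set is created;
PROC MEANS DATA = new NOPRINT;
   CLASS gender;
   VAR age;
   OUTPUT OUT = stats MEAN(age) = mean_age;
RUN;

* Inspect the automatic variables _TYPE_ and _FREQ_;
PROC PRINT DATA = stats;
RUN;
```

With one CLASS variable, stats should hold one observation with _TYPE_ = 0 (all four subjects together) and one observation with _TYPE_ = 1 for each level of gender. _FREQ_ gives the number of observations behind each row.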

26 PROC FREQ
PROC FREQ can be used to count frequencies of both character and numeric variables.
- Counts for one variable are called one-way frequencies.
- Counts for two or more variables are called two-way, three-way, and so on up to n-way frequencies, or simply cross-tabulations.
Syntax:
   PROC FREQ;
      TABLES variable-combinations;
To produce one-way frequencies, put the variable name after TABLES. To produce cross-tabulations, put an asterisk (*) between the variables.

27 PROC FREQ
The blood.txt data contain information on 1000 subjects. The variables are: subject ID, gender, blood type, age group, red blood cell count, white blood cell count, and cholesterol.
Here are the first few subjects (only the gender, blood type and age group values survive in this transcript):
   Female  AB  Young
   Male    AB  Old
   Male    A   Young
   Male    B   Old
   Male    A   Young
We want to derive frequencies of gender, age group, and blood type.

28 PROC FREQ
   PROC FREQ DATA = blood;
      TABLES Gender Blood_Type;
      TABLES Gender * Blood_Type / CHISQ;
      TABLES Gender * Age_Group * Blood_Type / NOCOL NOROW NOPERCENT;
   RUN;

29 PROC FREQ options
- NOCOL: suppresses the column percentage for each cell.
- NOROW: suppresses the row percentage for each cell.
- NOPERCENT: suppresses the percentages in cross-tabulation tables, or the percentages and cumulative percentages in one-way frequency tables and in list format.

30 PROC FREQ options
- MISSPRINT: displays missing-value frequencies.
- MISSING: treats missing values as non-missing.
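The effect of these two options can be compared side by side with a sketch like the one below. The data set blood_demo and its values are hypothetical. One blood type is entered as a period so that it is read as missing.

```sas
* Hypothetical data with one missing blood type, for illustration only;
DATA blood_demo;
   INPUT Gender $ Blood_Type $;
   DATALINES;
Female AB
Male A
Male .
Female B
Male A
;
RUN;

PROC FREQ DATA = blood_demo;
   * Default behavior leaves the missing value out of the table;
   TABLES Blood_Type;
   * MISSING counts the missing value as a category of its own;
   TABLES Blood_Type / MISSING;
   * MISSPRINT shows the missing count but keeps it out of the percentages;
   TABLES Gender * Blood_Type / MISSPRINT;
RUN;
```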

31 PROC FREQ output
PROC FREQ can create an output data set with frequencies, percentages, and expected cell frequencies.
- OUT=: specifies an output data set to contain variable values and frequency counts.
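A minimal sketch of writing a cross-tabulation to a data set. The names blood_demo and freq_out and the data values are hypothetical. OUTEXPECT is the TABLES option assumed here for adding expected cell frequencies to the OUT= data set.

```sas
* Hypothetical data, for illustration only;
DATA blood_demo;
   INPUT Gender $ Blood_Type $;
   DATALINES;
Female AB
Male A
Male B
Female B
Male A
;
RUN;

* NOPRINT suppresses the printed tables so only the data set is created;
PROC FREQ DATA = blood_demo NOPRINT;
   TABLES Gender * Blood_Type / OUT = freq_out OUTEXPECT;
RUN;

* Each row is one Gender by Blood_Type combination found in the data;
PROC PRINT DATA = freq_out;
RUN;
```

The printed data set should show the variables Gender and Blood_Type together with COUNT, PERCENT and, because of OUTEXPECT, EXPECTED.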