LISA SHORT COURSE SERIES: INTRODUCTION TO SAS UNIVERSITY William DeShong Fall 2015.

Outline
1. SAS Overview
2. SAS University Environment
3. Data Step
   1. Importing Data Sets
   2. Merging Data Sets
4. Procedure Step
   1. Manipulate/View Data
      1. Proc Print
      2. Proc Sort
   2. Aggregate Data
      1. Proc Summary
      2. Proc Freq
      3. Proc Means
   3. Model Data
      1. Proc Reg (if time permits)

SAS
SAS (originally an acronym for Statistical Analysis System) is a data-driven programming language for turning data into information. Its functionality is built around four data-driven tasks:
- Data Access: locates and reads the data the programmer needs.
- Data Management: shapes the data into the form the programmer needs.
- Data Analysis: summarizes, reduces, or transforms raw data into meaningful and useful information.
- Data Presentation: communicates information in ways that clearly demonstrate its significance.

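SAS Program
A SAS program (also called "SAS code") is a sequence of statements for SAS to execute. The statements fall into three kinds of building blocks:
- DATA steps, which begin with a DATA statement and create or modify SAS datasets
- PROC steps, which begin with a PROC statement and run a procedure on a dataset
- global statements (for example, LIBNAME, TITLE, and OPTIONS), which take effect immediately wherever they appear
Every DATA step ends with a RUN statement. Most PROC steps also end with RUN; a few interactive procedures (such as PROC SQL) end with QUIT instead.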
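A minimal sketch of a complete program, assuming a hypothetical temporary dataset `work.scores` typed in with DATALINES. It shows global statements, a DATA step closed by RUN, a PROC step closed by RUN, and PROC SQL, one of the procedures that is closed by QUIT.

```sas
/* Global statements: take effect immediately and stay in effect across steps */
options nodate nonumber;
title 'A minimal SAS program';

/* DATA step: builds the temporary dataset work.scores (hypothetical data) */
data work.scores;
  input idnum $ score;
  datalines;
1001 7
1002 5
1003 9
;
run;

/* PROC step: most procedures end with RUN */
proc print data=work.scores;
run;

/* PROC SQL is interactive and ends with QUIT instead */
proc sql;
  select mean(score) as avg_score
    from work.scores;
quit;
```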

Flow of Programming
A DATA step can be used to:
1. create a SAS dataset from scratch,
2. create a SAS dataset from a raw data file,
3. check for and correct errors in a dataset, and
4. create a SAS dataset by merging, subsetting, or updating existing SAS datasets.
The typical flow is: raw dataset → DATA step → SAS dataset → PROC step → report. Built-in SAS datasets can also serve as input.
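As a sketch of tasks 3 and 4, the step below reads the built-in dataset `sashelp.class` (shipped with SAS), keeps a subset, applies a simple error check, and derives a new variable. The age cutoff, the error rule, and the `bmi` variable are illustrative choices, not part of the course.

```sas
/* Create a new SAS dataset from an existing (built-in) one */
data work.teens;
  set sashelp.class;               /* variables: Name Sex Age Height Weight */
  where age >= 13;                 /* subset: keep only ages 13 and up */

  /* error check: treat non-positive measurements as missing */
  if height <= 0 then height = .;
  if weight <= 0 then weight = .;

  /* derived variable: BMI from inches and pounds */
  bmi = 703 * weight / height**2;
run;
```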

SAS Pointers
When programming in SAS, keep the following pointers in mind to prevent syntax errors:
- Semicolon check: every SAS statement ends with a semicolon ( ; ). A single missing semicolon makes SAS misread the statements that follow and can break an entire program.
- Use comments: an asterisk ( * ) starts a comment statement, which runs until the next semicolon. For a comment that spans several lines, start with ( /* ) and end with ( */ ). SAS ignores commented text; comments help the programmer remember what each part of the code does.
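A short illustration of both comment styles. Because an asterisk comment is itself a statement, it must end with a semicolon; the /* */ form can span lines and may contain semicolons. The dataset `work.demo` is a throwaway example.

```sas
* An asterisk comment ends at the next semicolon;

/* A block comment can span
   several lines; semicolons inside it are ignored */

data work.demo;
  x = 1;          /* block comment after a statement */
  y = x + 1;      * this asterisk comment is its own statement;
run;
```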

SAS University Edition Environment Let’s take a look at SAS University now!

Data Step

Importing Datasets
Let's use the Data Importing Wizard!
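The import wizard in SAS Studio typically generates PROC IMPORT code behind the scenes. A minimal hand-written equivalent is sketched below. The file `/folders/myfolders/scores.csv` is hypothetical (in SAS University Edition the shared folder is usually mapped to `/folders/myfolders`), and the first step writes that file so the example does not depend on existing data.

```sas
/* Write a tiny CSV file (hypothetical path) so the import has something to read */
data _null_;
  file '/folders/myfolders/scores.csv';
  put 'idnum,score';
  put '1001,7';
  put '1002,5';
run;

/* Import the CSV; GETNAMES=YES takes variable names from the first row */
proc import datafile='/folders/myfolders/scores.csv'
            out=work.imported
            dbms=csv
            replace;
  getnames=yes;
run;
```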

Accessing Permanent SAS Datasets
To access existing SAS datasets (or save new ones permanently), use a LIBNAME statement:

    libname name_of_library 'location_of_file' ;

The name_of_library (the libref) is a short name you choose to stand for a folder. The location_of_file is the path of the folder where SAS should find or save permanent SAS datasets. LIBNAME is a global statement, so it takes effect immediately and needs no RUN statement.

Accessing Permanent SAS Datasets (cont.)
Note that the location names a folder, not a particular SAS dataset. Every SAS dataset in that folder becomes available under the two-level name name_of_library.dataset_name. Most SAS programmers therefore keep related SAS datasets in one folder so that a single libref gives access to all of them.

Accessing Permanent SAS Datasets (cont.)
The name_of_library must be 1 to 8 characters long, must begin with a letter or an underscore, and may contain only letters, numbers, and underscores.
Legal vs. illegal library names: how many of the following seven are legal?

    clinic1   1_clinic   _%clinic   _clinic1   _1clinic   clinic_1   1clinic_1

Answer: 4 (clinic1, _clinic1, _1clinic, and clinic_1). The names 1_clinic and 1clinic_1 begin with a number, and _%clinic contains a % sign.
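A minimal sketch of a permanent library. The folder `/folders/myfolders/clinic`, the libref `clinic`, and the `visits` data are hypothetical; the DLCREATEDIR system option (SAS 9.3 and later) asks SAS to create the last folder level if it is missing, assuming its parent folder already exists.

```sas
/* Create the folder if it does not exist yet (parent folder must exist) */
options dlcreatedir;

/* Assign the libref CLINIC (legal: 6 characters, starts with a letter) */
libname clinic '/folders/myfolders/clinic';

/* Two-level name libref.dataset: saved permanently in the folder */
data clinic.visits;
  input idnum $ visit_date :date9.;
  format visit_date date9.;
  datalines;
1001 13OCT2015
1002 20OCT2015
;
run;

/* One-level name (no libref): goes to the temporary WORK library */
data visits_tmp;
  set clinic.visits;
run;
```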

Descriptive Statistics Functions
Below are a few of the descriptive statistics functions. These functions compute a statistic across their arguments within a single observation (row-wise); to compute the same statistics down a column, over all observations, use PROC MEANS or PROC UNIVARIATE.

    Function   Syntax                        Calculates
    SUM        sum(argument, argument, …)    sum of nonmissing values
    MEAN       mean(argument, argument, …)   average of nonmissing values
    MIN        min(argument, argument, …)    minimum value
    MAX        max(argument, argument, …)    maximum value
    VAR        var(argument, argument, …)    variance of the values
    STD        std(argument, argument, …)    standard deviation of the values
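A small sketch with hypothetical exam scores, one of them missing, showing that these functions skip missing values, while the + operator returns missing. The OF keyword with a numbered range (exam1-exam3) passes all three variables as arguments.

```sas
data work.row_stats;
  exam1 = 8; exam2 = .; exam3 = 6;    /* hypothetical scores; exam2 is missing */

  total = sum(of exam1-exam3);        /* 14: missing values are skipped */
  avg   = mean(of exam1-exam3);       /* 7: average of the 2 nonmissing values */
  lo    = min(of exam1-exam3);        /* 6 */
  hi    = max(of exam1-exam3);        /* 8 */
  v     = var(of exam1-exam3);        /* 2: sample variance of 8 and 6 */
  s     = std(of exam1-exam3);        /* square root of 2 */

  plus  = exam1 + exam2 + exam3;      /* missing: the + operator propagates missing */
run;
```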

Date and Time Functions
A SAS date value is the number of days between January 1, 1960, and the date.

    Function   Syntax                              Calculates
    TODAY      today()                             today's date as a SAS date value; takes no arguments
    TIME       time()                              current time of day as a SAS time value; takes no arguments
    MDY        mdy(month_val, day_val, year_val)   the SAS date value for that month, day, and year
    DAY        day(date_val)                       day of the month (1-31)
    QTR        qtr(date_val)                       quarter of the year (1-4)
    WEEKDAY    weekday(date_val)                   day of the week (1-7, where 1 = Sunday)
    MONTH      month(date_val)                     month of the year (1-12)
    YEAR       year(date_val)                      year (4 digits)
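A quick sketch applying these functions to one date, October 13, 2015 (a Tuesday), chosen only for illustration. The DATE9. format makes the stored day count readable.

```sas
data work.dates;
  d   = mdy(10, 13, 2015);    /* SAS date value for October 13, 2015 */
  dd  = day(d);               /* 13 */
  q   = qtr(d);               /* 4 */
  wd  = weekday(d);           /* 3: 1 = Sunday, so 3 = Tuesday */
  m   = month(d);             /* 10 */
  y   = year(d);              /* 2015 */
  now = today();              /* changes every day the program runs */
  format d now date9.;
run;
```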

Date and Time Functions (cont.)
Two more functions are especially useful for working with intervals.

    INTCK   intck('interval', SASdate1, SASdate2) ;
            where 'interval' is 'day', 'week', 'month', 'qtr', or 'year'
            returns the number of interval boundaries (days, weeks, months, quarters, years) crossed between two SAS date values

    INTNX   intnx('interval', SAS_start_date, increment, 'alignment') ;
            returns a SAS end date that is increment intervals after SAS_start_date, aligned within that interval:
            'b' = beginning of the interval (the 1st, for months)
            'm' = middle of the interval (around the 15th, for months)
            'e' = end of the interval (the last day, for months)
            's' = same day as SAS_start_date
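A sketch of both functions with illustrative dates. Note that INTCK counts boundaries crossed, not full intervals elapsed: January 31 to February 1 counts as one month, while January 1 to December 31 of the same year counts as zero years.

```sas
data work.intervals;
  start = '15OCT2015'd;

  /* INTCK: number of interval boundaries crossed */
  days   = intck('day',   '01OCT2015'd, '15OCT2015'd);  /* 14 */
  months = intck('month', '31JAN2015'd, '01FEB2015'd);  /* 1 */
  years  = intck('year',  '01JAN2015'd, '31DEC2015'd);  /* 0 */

  /* INTNX: move ahead one month, then align within that month */
  next_b = intnx('month', start, 1, 'b');  /* 01NOV2015: beginning */
  next_e = intnx('month', start, 1, 'e');  /* 30NOV2015: end */
  next_s = intnx('month', start, 1, 's');  /* 15NOV2015: same day */

  format start next_b next_e next_s date9.;
run;
```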

Mathematical Functions
Below are a few of the many mathematical functions. There are far too many to list them all; you learn them as you learn how to program.

    Function   Syntax               Calculates
    ROUND      round(argument, d)   rounds to the nearest multiple of d: d = 10 (tens), d = 1 (integer), d = .1 (tenths), d = .01 (hundredths)
    LOG        log(argument)        natural log
    LOG10      log10(argument)      log base 10
    FLOOR      floor(argument)      rounds down to the nearest integer
    CEIL       ceil(argument)       rounds up to the nearest integer
    INT        int(argument)        integer part of the value only (drops the fraction)
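A short sketch contrasting the rounding functions on illustrative values. Negative numbers show the difference between FLOOR (always down) and INT (drops the fraction, which moves toward zero).

```sas
data work.math;
  x    = 1234.5678;
  r10  = round(x, 10);      /* 1230 */
  r1   = round(x, 1);       /* 1235 */
  r01  = round(x, .01);     /* 1234.57 */
  fl   = floor(-2.5);       /* -3: rounds down */
  ce   = ceil(-2.5);        /* -2: rounds up */
  tr   = int(-2.5);         /* -2: drops the fractional part */
  ln   = log(1);            /* 0: natural log */
  lg10 = log10(100);        /* 2 */
run;
```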

Character Functions

    Function   Syntax                             Calculates
    SCAN       scan(argument, n, delimiters)      returns the nth word from a character value
    SUBSTR     substr(argument, position, length) extracts a substring; on the left side of an assignment, replaces those characters
    TRIM       trim(argument)                     removes trailing blanks
    INDEX      index(source, excerpt)             returns the position of excerpt within source (0 if not found)
    UPCASE     upcase(argument)                   converts to uppercase letters
    LOWCASE    lowcase(argument)                  converts to lowercase letters
    PROPCASE   propcase(argument)                 capitalizes the first letter of each word and lowercases the rest
    TRANWRD    tranwrd(source, target, replace)   replaces or removes all occurrences of a pattern of characters
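A sketch of these functions on illustrative strings. With ',' as the only delimiter, SCAN keeps the blank after the comma as part of the second word, so STRIP is used to remove it. The last assignment shows SUBSTR on the left side of an equals sign, where it replaces characters instead of extracting them.

```sas
data work.chars;
  length full $ 40 last first $ 20 code yr $ 4;
  full    = 'DeShong, William';
  last    = scan(full, 1, ',');             /* 'DeShong' */
  first   = strip(scan(full, 2, ','));      /* 'William' */
  code    = substr('LISA2015', 5, 4);       /* '2015': 4 characters from position 5 */
  pos     = index('Statistical Analysis System', 'Analysis');   /* 13 */
  up      = upcase('sas');                  /* 'SAS' */
  proper  = propcase('short course');       /* 'Short Course' */
  swapped = tranwrd('Proc Means', 'Means', 'Summary');  /* 'Proc Summary' */

  yr = code;
  substr(yr, 1, 2) = '19';                  /* left-side SUBSTR: '1915' */
run;
```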

PROC SORT Statement
The purpose of PROC SORT is to reorder the observations in a SAS dataset by the values of one or more of its variables. PROC SORT can sort:
- by one variable or by more than one variable,
- in ascending order (the default) or descending order (the DESCENDING keyword before a variable),
- and remove duplicates while sorting (not done by default; you must request it with the NODUPKEY or NODUPRECS option).

    proc sort data = libref.datasetname ;
      by var1 var2 … vark ;
    run ;

PROC SORT Statement (cont.)
If you specify the OUT= option, SAS sorts the observations of the original SAS dataset (dataset1) and stores the result in a new SAS dataset (dataset2), leaving dataset1 unchanged. If you omit OUT=, SAS sorts dataset1 and stores the result back in dataset1. Thus, it overwrites the dataset and you lose the original order.

    proc sort data = libref2.dataset1 out = libref2.dataset2 ;
      by var1 var2 … vark ;
    run ;
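A sketch of both options on the built-in `sashelp.class` dataset. SASHELP is normally read-only, which is one practical reason to use OUT=. DESCENDING applies only to the variable that follows it, and NODUPKEY keeps the first observation for each distinct BY value.

```sas
/* Sort into a new dataset; sashelp.class itself is left untouched */
proc sort data=sashelp.class out=work.class_sorted;
  by sex descending height;    /* sex ascending, height descending within sex */
run;

/* Remove duplicates while sorting: one observation per distinct age */
proc sort data=sashelp.class out=work.one_per_age nodupkey;
  by age;
run;
```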

Merging Data Sets with Match-Merging
With simple match-merging, the SAS programmer links observations together using the values of the variables listed in the BY statement.

    proc sort data = SAS-dataset-1 ;
      by variable_1 variable_2 … variable_n ;
    run ;
    …
    proc sort data = SAS-dataset-k ;
      by variable_1 variable_2 … variable_n ;
    run ;

    data newSASdataset ;
      merge SAS-dataset-1 SAS-dataset-2 … SAS-dataset-k ;
      by variable_1 variable_2 … variable_n ;
      SAS statements ;
    run ;

Match-Merging
To perform this technique, all of the original SAS datasets being merged must first be sorted by the variables in the BY statement. Otherwise, SAS stops with an error saying that the BY variables are not properly sorted.

Example #1
Flight attendants for International Airlines need to pass three exams (federal regulations, customer service, and safety procedures) to become certified flight attendants. They can take the exams at any time, but they must pass the federal regulations exam before moving on. Below are three permanent SAS datasets showing each attempt by id number and score. A score higher than 6 is needed to pass.

Match-Merging [Step 1: Use a PROC SORT]
The PROC SORT steps will sort the three SAS datasets by the idnum variable. This sets us up to begin the simple match-merging procedure.

Match-Merging [Step 2: The DATA Statement]
The DATA step will link the observations together by the idnum variable. But how does SAS accomplish this?

Match-Merging [Step 3: The Merging]
Across all three SAS datasets, SAS looks for the observations with the lowest value of idnum. In this case, it is the missing value in the third dataset, because a missing (blank) value sorts before every nonmissing value. Notice, however, that no observation in the other SAS datasets has a missing idnum. If an input SAS dataset does not have a matching BY value, the observation in the output SAS dataset contains missing values for the variables that are unique to that input dataset.

Match-Merging [Step 3: The Merging]
SAS now looks for the next lowest value of idnum. This value appears in only two of the three SAS datasets, so again SAS fills in a missing value for the fr_score variable.

Match-Merging [Step 3: The Merging]
The next idnum value appears exactly once in all three datasets, so SAS simply links the three observations together. When a BY value appears the same number of times in every SAS dataset, SAS has no problem linking the observations together by order.

Match-Merging [Step 3: The Merging]
SAS handles the next idnum value the same way. Since it has an equal number of observations in all three SAS datasets, SAS links them together in the order in which they appear: the first observations in each dataset are linked together, and the second observations are linked together.

Match-Merging [Step 3: The Merging]
Now look at this situation. Not only is this idnum missing from the third SAS dataset, but the first two SAS datasets have different numbers of observations for it. SAS pairs observations one to one only while each dataset still has observations left in the BY group; it does not create every possible combination.

Match-Merging [Step 3: The Merging]
SAS links the observations in the order in which they appear. When one SAS dataset runs out of observations for the current BY value, SAS keeps using the values from its last observation in that BY group, as if that observation were replicated in the background. Please note that the replicated observations do not appear in the input SAS datasets.

Match-Merging [Step 3: The Merging]
This is similar to the last example, but with more observations. Questions:
- How many observations will SAS create for idnum 4524?
- Which observations will be replicated to perform the match-merging?
- How will SAS link these records together?

Match-Merging [Step 3: The Merging]
By now you should be able to predict what appears next in the SAS dataset. There will be two observations for idnum 5702…

Match-Merging [Step 3: The Merging]
There will be three observations for idnum 6256…

Match-Merging [Step 3: The Merging]
There will be one observation for idnum 7803…

Match-Merging [Step 3: The Merging]
There will be two observations for idnum 8008…

Match-Merging [Step 3: The Merging]
And finally, there will be four observations for the last idnum value.

Voila! … the SAS dataset is complete.
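The behaviour in this walkthrough can be reproduced with a small sketch. The datasets `exam_a` and `exam_b` and the variables `a_score` and `b_score` are hypothetical, not the course data. idnum 1002 appears three times in one dataset and twice in the other, so SAS pairs rows by order and reuses the last `b_score` for the leftover row. For this case, the log also shows a NOTE that the MERGE statement has more than one data set with repeats of BY values.

```sas
/* Hypothetical exam data; idnum is character */
data work.exam_a;
  input idnum $ a_score;
  datalines;
1001 7
1002 8
1002 9
1002 4
;
run;

data work.exam_b;
  input idnum $ b_score;
  datalines;
1002 5
1002 6
1003 7
;
run;

/* Step 1: sort both inputs by the BY variable
   (by default, PROC SORT keeps the original order within each BY group) */
proc sort data=work.exam_a;
  by idnum;
run;

proc sort data=work.exam_b;
  by idnum;
run;

/* Steps 2 and 3: match-merge by idnum */
data work.merged;
  merge work.exam_a work.exam_b;
  by idnum;
run;

/* Resulting rows (idnum a_score b_score):
   1001 7 .    no match in exam_b, so b_score is missing
   1002 8 5    first rows of the BY group are paired
   1002 9 6    second rows are paired
   1002 4 6    exam_b ran out, so its last b_score (6) is reused
   1003 . 7    no match in exam_a, so a_score is missing */
```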

Common Variable (Simple Match-Merging)
Keep in mind that all four common-variable rules apply to the simple match-merging process:
1. A common variable must have the same type (numeric or character) in each of the original SAS datasets. Otherwise, SAS returns an error message.
2. The values from the last original SAS dataset overwrite the values stored earlier for that variable.
3. If a common variable has different formats, SAS uses the first format it sees for that variable.
4. If a common variable has different lengths, SAS uses the first length it sees for that variable.
Rule 2 is the one we are going to investigate more right now, because the last thing we want to do is overwrite data.
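A sketch of rule 2 and one common remedy, the RENAME= dataset option. The datasets `fedreg` and `custserv` and their shared variable `score` are hypothetical. Without RENAME=, the merged `score` holds the customer service values because that dataset is listed last.

```sas
/* Hypothetical data: both datasets use the same non-BY variable, score */
data work.fedreg;
  input idnum $ score;
  datalines;
1001 7
1002 9
;
run;

data work.custserv;
  input idnum $ score;
  datalines;
1001 8
1002 5
;
run;

/* Rule 2 in action: score ends up as 8 and 5 (custserv overwrites fedreg) */
data work.overwritten;
  merge work.fedreg work.custserv;
  by idnum;
run;

/* Remedy: rename the common variable in each input so nothing is overwritten */
data work.both;
  merge work.fedreg(rename=(score=fr_score))
        work.custserv(rename=(score=cs_score));
  by idnum;
run;
```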

The PROC PRINT Statement
PROC PRINT is one of the most commonly used procedures in SAS. It lets you output a SAS dataset (or a subset of it) in the output window. The most basic form of PROC PRINT is the following:

    proc print data = libref.datasetname ;
    run ;

In this form, SAS prints every variable and every observation in the SAS dataset, with no extra formatting. Of course, there are ways to enhance the output, some of which we cover now.

PROC PRINT: Options
If you want SAS to print only specific variables, add a VAR statement. You can also produce column totals for numeric variables by adding a SUM statement.

    proc print data = libref.datasetname ;
      var variable1 variable2 … variablek ;
    run ;

    proc print data = libref.datasetname ;
      sum num_variable ;
    run ;

PROC PRINT: Options (cont.)
To omit the observation number column, add the NOOBS option to the PROC PRINT statement. Alternatively, if you have a variable that identifies each observation, use an ID statement to print that variable in place of the default observation number.

    proc print data = libref.datasetname ;
      id variable1 ;
    run ;

    proc print data = libref.datasetname noobs ;
    run ;

PROC PRINT: Options (cont.)
Rather than use a variable name as a column heading, you can substitute a label by adding a LABEL statement. But notice that you must also add the LABEL option to the PROC PRINT statement; otherwise PROC PRINT still shows the variable names. You can also print only the observations that meet a condition (or a set of conditions) by adding a WHERE statement.

    proc print data = libref.datasetname label ;
      label variable1 = 'Variable 1' ;
    run ;

    proc print data = libref.datasetname ;
      where insert_condition_here ;
    run ;
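A sketch combining these options on the built-in `sashelp.class` dataset (heights in inches, weights in pounds). NOOBS and LABEL go on the PROC PRINT statement itself; WHERE, VAR, SUM, and LABEL are separate statements inside the step. The subset and labels are illustrative choices.

```sas
proc print data=sashelp.class noobs label;   /* options on the PROC statement */
  where sex = 'F';                           /* print only the female students */
  var name age height weight;                /* these columns, in this order */
  sum weight;                                /* column total for weight */
  label height = 'Height (inches)'
        weight = 'Weight (pounds)';
run;
```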

PROC CONTENTS Statement
The purpose of PROC CONTENTS is to provide a detailed listing of:
- the variables in a SAS dataset, or
- the SAS datasets located in a SAS library (folder).
The keyword _all_ refers to all of the SAS datasets in a SAS library.

    proc contents data = libref.datasetname ;
    run ;

    proc contents data = libref._all_ ;
    run ;
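A sketch using the built-in SASHELP library. The NODS option suppresses the per-dataset detail when listing a whole library, and OUT= with NOPRINT saves the variable listing as a dataset (one row per variable) instead of printing it.

```sas
/* Variable-level detail for one dataset */
proc contents data=sashelp.class;
run;

/* List the members of a library without the detail for each one */
proc contents data=sashelp._all_ nods;
run;

/* Save the variable listing as a dataset instead of printing it */
proc contents data=sashelp.class out=work.class_vars noprint;
run;
```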

PROC FREQ Statement
Now we turn our attention to procedures that produce results in the output window. The purpose of PROC FREQ is to create frequency and relative frequency (percent) tables for one or more SAS variables. The code to do this is the following:

    proc freq data = libref.datasetname ;
      tables var1 var2 … vark ;
    run ;

Besides creating tables for one or more variables, PROC FREQ can also save the results as a SAS dataset through the OUT= option of the TABLES statement.
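A sketch on the built-in `sashelp.class` dataset showing one-way tables, a two-way crosstab (requested with an asterisk), and OUT= saving counts as a dataset with COUNT and PERCENT variables.

```sas
proc freq data=sashelp.class;
  tables sex age;                       /* one-way tables: counts and percents */
  tables sex*age / nopercent;           /* two-way table, without cell percents */
  tables sex / out=work.sex_counts;     /* save the sex counts as a dataset */
run;
```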

PROC MEANS
Let's start from the basics. The basic form of PROC MEANS is the following:

    proc means data = libref.datasetname ;
    run ;

This basic form:
- produces statistical output for all of the numeric variables in the SAS dataset, and
- reports the sample size, mean, standard deviation, minimum, and maximum by default.
We will use our baseball SAS dataset to understand how this procedure works.

Scenario
Here is a SAS dataset called baseball. It is located in the 'ia' library.

Scenario
Here is the breakdown of the variables.

PROC MEANS
Here is PROC MEANS applied without any options:

    proc means data = ia.baseball ;
    run ;

Without any options, SAS calculates the sample size, sample mean, sample standard deviation, minimum, and maximum for each numeric variable in the SAS dataset. The results are placed in a table in the output window; PROC MEANS does not create an output SAS dataset unless you request one.

Let's adjust the code to get better output.

PROC MEANS [var keyword]
Notice that on the last slide, every numeric variable appeared in the output. To analyze only specific variables in the SAS dataset, add a VAR statement listing just those variables. SAS then outputs statistics only for the variables you list, in that order.

    proc means data = libref.datasetname ;
      var variable_1 variable_2 variable_3 … variable_k ;
    run ;

Note: if your SAS variable names differ only by a number at the end (for example, exam1 exam2 exam3 exam4 exam5), you can reference all of them with a numbered range list: var variable_1 - variable_k. For our example, we can write: var exam1 - exam5 ;

PROC MEANS [statistic keywords]
You can choose which descriptive statistics to output by listing statistic keywords after the name of the dataset in the PROC MEANS statement. These keywords override the default statistics, so SAS produces only the statistics that you specify. There are dozens of statistic keywords to choose from (for example, N, MEAN, MEDIAN, STD, MIN, and MAX).

    proc means data = libref.datasetname keyword_1 keyword_2 … keyword_k ;
      var variable_1 variable_2 variable_3 … variable_k ;
    run ;
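A sketch on the built-in `sashelp.class` dataset combining statistic keywords and a VAR statement. It also uses two features beyond the slides: a CLASS statement (statistics for each sex) and an OUTPUT statement that saves the means as a dataset. NWAY keeps only the per-group rows in that output dataset. The output names `avg_height` and `avg_weight` are illustrative.

```sas
proc means data=sashelp.class n mean median std maxdec=2 nway;  /* chosen statistics replace the defaults */
  class sex;                          /* separate statistics for each sex */
  var height weight;                  /* only these analysis variables */
  output out=work.class_means mean=avg_height avg_weight;   /* save the means */
run;
```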

PROC REG
The basic form of PROC REG is the following:

    proc reg data = libref.datasetname ;
      model responsevar = var1 var2 … vark ;
    run ;
    quit ;

This basic form produces a linear regression model with model-fit statistics and parameter estimates, along with residual diagnostics (the diagnostic plots appear when ODS Graphics is enabled). PROC REG is an interactive procedure, so QUIT ends it. We will use our Salary of Major League Baseball Players SAS dataset to understand how this procedure works.
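A sketch on the built-in `sashelp.class` dataset rather than the course's baseball data; the choice of weight as the response and height and age as predictors is purely illustrative. OUTEST= saves the parameter estimates as a one-row dataset.

```sas
/* Regress weight on height and age; QUIT ends the interactive procedure */
proc reg data=sashelp.class outest=work.reg_est;
  model weight = height age;
run;
quit;
```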

Questions?

Special Thanks
- Dr. Chris Franck, Assistant Director of LISA
- Tonya Pruitt, Administrative Specialist, LISA
- Dr. Marlow Lemons
- Kris Patton
- Elaine Perrin
- Weibin Xu