How to start using SAS. Tina Tian. Topics: an overview of the SAS system; reading raw data and creating SAS data sets; combining SAS data sets and match-merging.

Presentation transcript:

How to start using SAS Tina Tian

The topics
- An overview of the SAS system
- Reading raw data / creating SAS data sets
- Combining SAS data sets and match-merging SAS data sets
- Formatting data
- Introducing some simple statistical analysis procedures

Basic Screen Navigation
Main windows:
- Editor: contains the SAS program to be submitted.
- Log: contains information about the processing of the SAS program, including any warning and error messages.
- Output: contains reports generated by SAS procedures and DATA steps.
Side windows:
- Explorer: navigates to other objects such as libraries.
- Results: navigates your Output window.

SAS programs
A SAS program is a sequence of steps that the user submits for execution.
- DATA steps are typically used to create SAS data sets.
- PROC steps are typically used to process SAS data sets (that is, to generate reports and graphs, sort data, and analyze data).
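
The two kinds of step can be sketched in a few lines. This is only an illustration, not part of the original slides: it assumes the sample data set sashelp.class that ships with SAS, and the output name work.teens is made up.

```sas
/* DATA step: create a new data set from an existing one.
   sashelp.class is a sample data set shipped with SAS;
   work.teens is a hypothetical name chosen for this sketch. */
DATA work.teens;
    SET sashelp.class;
    WHERE Age >= 13;
RUN;

/* PROC step: process (here, print) the data set just created. */
PROC PRINT DATA=work.teens;
RUN;
```

After submitting, the Log window should report how many observations were read and written, and the report appears in the Output window.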

SAS Data Libraries
- A SAS data library is a collection of SAS files that SAS recognizes as a unit.
- A SAS data set is one type of SAS file stored in a data library.
- The Work library is temporary: when SAS is closed, all data sets in the Work library are deleted. To create a permanent SAS data set, store it in a library of your own.

SAS Data Libraries
Identify or create a SAS data library by assigning it a library reference name (libref) with the LIBNAME statement:
    LIBNAME libref 'file-folder-location';
E.g.:
    LIBNAME readData 'C:\temp\sas class\readData';
Rules for naming a libref:
- The name must be 8 characters or fewer.
- The name must begin with a letter or underscore.
- The remaining characters must be letters, numbers or underscores.
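
The difference between the temporary Work library and a library of your own shows up in how the data set is named. A minimal sketch, assuming the folder below already exists and is writable; the libref mylib and the data set names are hypothetical:

```sas
/* Hypothetical folder and libref, for illustration only. */
LIBNAME mylib 'C:\temp\sas class';

/* One-level name: stored in WORK and deleted when SAS is closed. */
DATA scratch;
    SET sashelp.class;
RUN;

/* Two-level name (libref.data-set): stored permanently in the mylib folder. */
DATA mylib.classcopy;
    SET sashelp.class;
RUN;
```

In a later session, re-submitting the same LIBNAME statement makes mylib.classcopy available again, while scratch would have to be recreated.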

Reading internal raw data in SAS system
To put a small amount of raw data directly in the SAS program and create a SAS data set from it, you must:
- Start a DATA step and name the SAS data set being created with the DATA statement.
- Describe how to read the data fields from the raw data with the INPUT statement.
- Use the DATALINES statement to indicate internal data.
The RUN statement marks the end of a step.

Reading internal raw data in SAS system
Example:
    DATA dog1;
        INPUT ID Age Gender $ Income;
        DATALINES;
    1 10 m f f m m 1000
    ;
    RUN;
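
A complete step of this kind might look like the sketch below. The three data rows are hypothetical values invented for illustration, not the data from the slide; the variable names follow the INPUT statement above.

```sas
DATA dog1;
    /* List input: fields separated by blanks; $ marks Gender as character. */
    INPUT ID Age Gender $ Income;
    DATALINES;
1 10 m 1000
2 8 f 1500
3 12 f 900
;
RUN;

/* Check what was read. */
PROC PRINT DATA=dog1;
RUN;
```

Note that the data lines themselves carry no semicolons; the single semicolon on its own line marks the end of the data.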

Reading external raw data files into SAS system
In order to create a SAS data set from a raw data file, you must:
- Start a DATA step and name the SAS data set being created (DATA statement).
- Identify the location of the raw data file to read (INFILE statement).
- Describe how to read the data fields from the raw data file (INPUT statement).
The RUN statement marks the end of a step.

Reading external raw data file into SAS system
    LIBNAME readData 'C:\temp\sas class';
    DATA readData.dog1;
        INFILE 'C:\temp\sas class\dog.txt';
        INPUT ID Age Gender $ Income;
    RUN;
- The LIBNAME statement assigns the libref 'readData' to a data library.
- The DATA statement creates a permanent SAS data set named 'dog1'.
- The INFILE statement points to a raw data file.
- The INPUT statement:
  - names the SAS variables;
  - identifies the variables as character or numeric ($ indicates character data);
  - specifies the locations of the fields in the raw data;
  - can be specified as column, formatted, list, or named input.
- The RUN statement marks the end of a step.

Reading Delimited or PC Database Files with the IMPORT Procedure
If your data file has the proper extension, use the simplest form of the IMPORT procedure:
    PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
    RUN;
    Type of File                            Extension           DBMS Identifier
    Comma-delimited                         .csv                CSV
    Tab-delimited                           .txt                TAB
    Excel                                   .xls                EXCEL
    Lotus files                             .wk1, .wk3, .wk4    WK1, WK3, WK4
    Delimiters other than commas or tabs                        DLM
Example (DBMS= is an option of the PROC IMPORT statement, so no semicolon comes before it):
    PROC IMPORT DATAFILE = 'c:\temp\sale.xls' OUT = readData.import1 DBMS = EXCEL;
    RUN;

Reading Delimited or PC Database Files with the IMPORT Procedure
If your file does not have the proper extension, or it uses a delimiter other than a comma or a tab, you must use the DBMS= option together with the DELIMITER= statement:
    PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
        DELIMITER = 'delimiter-character';
    RUN;
Example:
    PROC IMPORT DATAFILE = 'c:\temp\sale.txt' OUT = readData.import2 DBMS = DLM;
        DELIMITER = '&';
    RUN;

Reading Files with the IMPORT Procedure
If your file does not have the proper extension, or it uses a delimiter other than a comma or a tab, you must use the DBMS= option together with the DELIMITER= statement:
    PROC IMPORT DATAFILE = 'filename' OUT = data-set DBMS = identifier;
        DELIMITER = 'delimiter-character';
    RUN;
Example:
    PROC IMPORT DATAFILE = 'C:\sas class\readData\import2.txt' OUT = readData.sasfile DBMS = DLM;
        DELIMITER = '&';
    RUN;
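
For the common comma-delimited case, a sketch could look as follows. The file path and the output name work.sale are hypothetical; REPLACE and GETNAMES= are standard PROC IMPORT features that the slides do not mention, added here as an assumption about what a typical import needs.

```sas
/* Hypothetical file: a comma-delimited file whose first row holds column names. */
PROC IMPORT DATAFILE='C:\temp\sale.csv'
            OUT=work.sale
            DBMS=CSV
            REPLACE;          /* overwrite work.sale if it already exists */
    GETNAMES=YES;             /* take variable names from the first row */
RUN;

/* Inspect the variable names and types that PROC IMPORT chose. */
PROC CONTENTS DATA=work.sale;
RUN;
```

Checking the result with PROC CONTENTS is worthwhile because PROC IMPORT guesses each variable's type from the data it scans.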

Format in SAS data set
Standard formats (selected):
- Character: $w.
- Date, time and datetime: DATEw., MMDDYYw., TIMEw.d, ...
- Numeric: COMMAw.d, DOLLARw.d, ...
Use the FORMAT statement to apply them:
    PROC PRINT DATA=sales;
        VAR Name DateReturned CandyType Profit;
        FORMAT DateReturned DATE9. Profit DOLLAR6.2;
    RUN;

Format in SAS data set
Create your own custom formats in two steps:
- Create the format using PROC FORMAT and the VALUE statement.
- Assign the format to the variable using the FORMAT statement.
General form of a simple PROC FORMAT step:
    PROC FORMAT;
        VALUE name range-1 = 'formatted-text-1'
                   range-2 = 'formatted-text-2'
                   ...;
    RUN;
The name in the VALUE statement is the name of the format you are creating. It must not start or end with a number. Releases before SAS 9 limited it to eight characters; later releases allow up to 32 (31 after the $ for character formats). If the format is for character data, the name must start with a $.

Format in SAS data set
Example:
    /* Step 1: Create the formats for certain variables */
    PROC FORMAT;
        VALUE $genFmt 'm' = 'Male'
                      'f' = 'Female';
        VALUE polFmt  1 = 'likes'
                      2 = 'dont care'
                      3 = 'dislikes'
                      9 = 'no answer';
    RUN;
    /* Step 2: Assign the formats to the variables */
    DATA Mydata.dog123(replace=yes);
        SET Mydata.dog123;
        FORMAT Gender $genFmt. Policy polFmt.;
    RUN;
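
A custom format can also be applied for a single procedure only, without changing the stored data set. The sketch below assumes the sample data set sashelp.class that ships with SAS (character Sex coded 'M'/'F', numeric Age); the format names $sexFmt and ageFmt and the age cut-off are made up for illustration.

```sas
PROC FORMAT;
    /* Character format: the name starts with $ and the values are quoted. */
    VALUE $sexFmt 'M' = 'Male'
                  'F' = 'Female';
    /* Numeric format with ranges; LOW and HIGH stand for the extremes. */
    VALUE ageFmt  LOW-12  = 'child'
                  13-HIGH = 'teen';
RUN;

/* The FORMAT statement inside a PROC step affects only this report. */
PROC FREQ DATA=sashelp.class;
    TABLES Sex Age;
    FORMAT Sex $sexFmt. Age ageFmt.;
RUN;
```

Because PROC FREQ groups by formatted value, the Age table here should show the two labels rather than one row per age.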

Format in SAS data set
Permanently store formats in a SAS catalog by:
- creating a format catalog with the LIB= option in the PROC FORMAT statement;
- setting the format search option (FMTSEARCH=).
Example:
    LIBNAME Mydata 'C:\sas class\Format';
    OPTIONS FMTSEARCH=(Mydata.dogfmt);
    PROC FORMAT LIB=Mydata.dogfmt;
        VALUE $genFmt 'm' = 'Male'
                      'f' = 'Female';
    RUN;
To read the stored formats:
    OPTIONS NOFMTERR;
    OPTIONS FMTSEARCH=(Mydata.dogfmt);

Combining SAS Data Sets: Concatenating and Interleaving
- Use the SET statement in a DATA step to concatenate SAS data sets.
- Use the SET and BY statements in a DATA step to interleave SAS data sets.

Combining SAS Data Sets: Concatenating and Interleaving
General form of a DATA step concatenation:
    DATA new-SAS-data-set;
        SET SAS-data-set1 SAS-data-set2 ...;
    RUN;
Example:
    DATA mydata.dog12;
        SET dog1 mydata.dog2;
    RUN;

Combining SAS Data Sets: Concatenating and Interleaving
General form of a DATA step interleave:
    DATA new-data-set;
        SET SAS-data-set1 SAS-data-set2 ...;
        BY BY-variable;
    RUN;
Sort all of the input SAS data sets by the BY variable first, using PROC SORT, and read the sorted versions in the SET statement.
Example:
    PROC SORT DATA=dog1 OUT=dog1_sorted; BY ID; RUN;
    PROC SORT DATA=mydata.dog2 OUT=dog2_sorted; BY ID; RUN;
    DATA mydata.dog12;
        SET dog1_sorted dog2_sorted;
        BY ID;
    RUN;
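
The difference between the two operations is easiest to see on tiny data sets. In this sketch the data sets a and b and all of their values are hypothetical:

```sas
DATA a;
    INPUT ID Name $;
    DATALINES;
1 Rex
3 Max
;
RUN;

DATA b;
    INPUT ID Name $;
    DATALINES;
4 Bo
2 Ace
;
RUN;

/* Concatenation: all observations of a, followed by all observations of b. */
DATA stacked;
    SET a b;
RUN;

/* Interleaving requires every input data set to be sorted by the BY variable. */
PROC SORT DATA=a; BY ID; RUN;
PROC SORT DATA=b; BY ID; RUN;

/* Interleave: observations are read in BY-variable order across both inputs. */
DATA interleaved;
    SET a b;
    BY ID;
RUN;
```

Here stacked keeps the IDs in the order 1, 3, 4, 2, while interleaved holds them in the order 1, 2, 3, 4.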

Match-Merging SAS Data Sets
- One-to-one match merge
- One-to-many match merge
- Many-to-many match merge
The SAS statements for all three types of match merge are identical and take the following form:
    DATA new-data-set;
        MERGE SAS-data-set-1 SAS-data-set-2 SAS-data-set-3 ...;
        BY by-variable(s); /* the variable(s) that control which observations to match */
    RUN;

Merging SAS Data Sets: A More Complex Example
Example: merge two data sets to obtain the names of the group team members who are scheduled to fly next week.
combData.employee
    EmpID   LastName
    E00632  Strauss
    E01483  Lee
    E01996  Nick
    E04064  Waschk
combData.groupsched
    EmpID   FlightNum
    E00632  5250
    E01996  5501
    E04064  5105
    /* To match-merge the data sets by the common variable EmpID, the data sets must be ordered by EmpID */
    PROC SORT DATA=combData.groupsched;
        BY EmpID;
    RUN;

Merging SAS Data Sets: A More Complex Example
    /* Simply merge the two data sets */
    DATA combData.nextweek;
        MERGE combData.employee combData.groupsched;
        BY EmpID;
    RUN;
Result (the FlightNum value for Lee is missing):
    EmpID   LastName  FlightNum
    E00632  Strauss   5250
    E01483  Lee       .
    E01996  Nick      5501
    E04064  Waschk    5105

Merging SAS Data Sets: A More Complex Example
Eliminating nonmatches
Use the IN= data set option to determine which data set(s) contributed to the current observation.
General form of the IN= data set option:
    SAS-data-set (IN=variable)
The variable is a temporary numeric variable that has two possible values:
- 0 indicates that the data set did not contribute to the current observation.
- 1 indicates that the data set did contribute to the current observation.

Merging SAS Data Sets: A More Complex Example
    /* Exclude from the data set employees who are not scheduled to fly next week. */
    LIBNAME combData 'K:\sas class\merge';
    DATA combData.nextweek;
        MERGE combData.employee combData.groupsched (IN=InSched);
        BY EmpID;
        IF InSched=1; /* keep the observation only when this is true */
    RUN;
Result:
    EmpID   LastName  FlightNum
    E00632  Strauss   5250
    E01996  Nick      5501
    E04064  Waschk    5105

Merging SAS Data Sets: A More Complex Example
    /* Find employees who are not in the scheduled flight group. */
    LIBNAME combData 'K:\sas class\merge';
    DATA combData.nextweek;
        MERGE combData.employee (IN=InEmp) combData.groupsched (IN=InSched);
        BY EmpID;
        IF InEmp=1;   /* true: the employee data set contributed */
        IF InSched=0; /* false: the schedule data set did not contribute */
    RUN;
Result (FlightNum is missing):
    EmpID   LastName  FlightNum
    E01483  Lee       .
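
Both questions (who is scheduled, who is not) can be answered in one pass over the merged data. The sketch below is self-contained: it recreates the two small tables in the WORK library using the values shown in the result tables of these slides, and the output names scheduled and notscheduled are hypothetical.

```sas
DATA employee;
    INPUT EmpID $ LastName $;
    DATALINES;
E00632 Strauss
E01483 Lee
E01996 Nick
E04064 Waschk
;
RUN;

DATA groupsched;
    INPUT EmpID $ FlightNum;
    DATALINES;
E00632 5250
E01996 5501
E04064 5105
;
RUN;

/* Both inputs are already in EmpID order, so no PROC SORT is needed here. */
DATA scheduled notscheduled;
    MERGE employee (IN=InEmp) groupsched (IN=InSched);
    BY EmpID;
    /* Route each observation according to which data sets contributed. */
    IF InEmp AND InSched THEN OUTPUT scheduled;
    ELSE IF InEmp THEN OUTPUT notscheduled;
RUN;
```

With these inputs, scheduled should hold Strauss, Nick and Waschk, and notscheduled should hold Lee. The IN= variables are temporary and are not written to either output data set.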

Different Types of Merges in SAS
One-to-Many Merging
    DATA work.three;
        MERGE work.one work.two;
        BY X;
    RUN;
Work.one
    X  Y
    1  A
    2  B
    3  C
Work.two
    X  Z
    1  A1
    1  A2
    2  B1
    3  C1
    3  C2
Work.three
    X  Y  Z
    1  A  A1
    1  A  A2
    2  B  B1
    3  C  C1
    3  C  C2

Different Types of Merges in SAS
Many-to-Many Merging
    DATA work.three;
        MERGE work.one work.two;
        BY X;
    RUN;
Work.one
    X  Y
    1  A1
    1  A2
    2  B1
    2  B2
Work.two
    X  Z
    1  AA1
    1  AA2
    1  AA3
    2  BB1
    2  BB2
Work.three
    X  Y   Z
    1  A1  AA1
    1  A2  AA2
    1  A2  AA3
    2  B1  BB1
    2  B2  BB2
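
The many-to-many case can be reproduced with the sketch below, which enters the values of Work.one and Work.two from the slide as in-stream data.

```sas
DATA work.one;
    INPUT X Y $;
    DATALINES;
1 A1
1 A2
2 B1
2 B2
;
RUN;

DATA work.two;
    INPUT X Z $;
    DATALINES;
1 AA1
1 AA2
1 AA3
2 BB1
2 BB2
;
RUN;

/* Within each BY group the observations are paired in order; when one
   data set runs out of observations, its last values are retained. */
DATA work.three;
    MERGE work.one work.two;
    BY X;
RUN;

PROC PRINT DATA=work.three;
RUN;
```

This is why the third observation for X=1 repeats Y=A2. The DATA step does not form every combination of rows within a BY group, and the log carries a note that more than one data set has repeats of BY values.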

Some simple analysis procedures
- The PRINT Procedure
- The CONTENTS Procedure
- The FREQ Procedure
- The SORT Procedure
- The MEANS Procedure
- The CORR Procedure
- The TTEST Procedure
- The ANOVA Procedure

The PRINT Procedure
The PRINT procedure prints the observations in a SAS data set.
General form of a simple PROC PRINT step:
    PROC PRINT DATA = SAS-data-set;
        VAR variable(s);
        SUM variable(s);
    RUN;
- The VAR statement specifies which variables to print, and in what order.
- The SUM statement prints totals for the listed numeric variables.

The CONTENTS Procedure
The CONTENTS procedure shows the contents of a SAS data set and prints the directory of the SAS data library.
General form of a simple PROC CONTENTS step:
    PROC CONTENTS DATA = SAS-data-set;
    RUN;

The SORT Procedure
The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables.
General form of a simple PROC SORT step:
    PROC SORT DATA = SAS-data-set;
        BY variable-1 ... variable-n;
    RUN;
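
The three procedures described so far can be tried together on the sample data set sashelp.class that ships with SAS. This is a sketch: the output name work.class_sorted is hypothetical, and the OUT= and DESCENDING keywords are standard PROC SORT features not shown in the general form above.

```sas
/* Describe the data set: variable names, types, lengths. */
PROC CONTENTS DATA=sashelp.class;
RUN;

/* Sort into a new WORK data set so the original is left untouched;
   DESCENDING applies to the variable that follows it. */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
    BY Sex DESCENDING Height;
RUN;

/* Print selected variables in the given order, with a total for Weight. */
PROC PRINT DATA=work.class_sorted;
    VAR Name Sex Height Weight;
    SUM Weight;
RUN;
```

Without OUT=, PROC SORT replaces the input data set with its sorted version, which is often not possible for sample libraries such as sashelp.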

The MEANS Procedure
The MEANS procedure provides descriptive statistics for variables across all observations.
General form of a simple PROC MEANS step:
    PROC MEANS DATA = SAS-data-set;
        CLASS variable(s);
        VAR variable(s);
    RUN;

The FREQ Procedure
The FREQ procedure produces one-way to n-way frequency and crosstabulation (contingency) tables.
General form of a simple PROC FREQ step:
    PROC FREQ DATA = SAS-data-set;
        TABLES requests;
    RUN;
The TABLES statement requests one-way to n-way frequency and crosstabulation tables and statistics for those tables.
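
As an illustration of PROC MEANS and PROC FREQ, the sketch below again assumes the sample data set sashelp.class (variables Sex, Age, Height, Weight); the choice of variables is arbitrary.

```sas
/* Descriptive statistics for Height and Weight, separately for each Sex. */
PROC MEANS DATA=sashelp.class;
    CLASS Sex;
    VAR Height Weight;
RUN;

/* A one-way table for Sex and a two-way crosstabulation of Sex by Age. */
PROC FREQ DATA=sashelp.class;
    TABLES Sex Sex*Age;
RUN;
```

In a TABLES request, an asterisk between two variables asks for a crosstabulation, while variables separated by blanks each get their own one-way table.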

The TTEST Procedure
The TTEST procedure performs t tests for one sample, two samples, and paired observations.
General form of a simple PROC TTEST step:
    /* One-sample t test */
    PROC TTEST DATA = SAS-data-set H0 = m;
        VAR variable(s);
    RUN;
    /* Two-sample t test */
    PROC TTEST DATA = SAS-data-set;
        VAR variable(s);
        CLASS variable;
    RUN;
- Use the H0= option to set the hypothesized mean in the one-sample t test.
- Use the CLASS statement to identify the two groups in the two-sample t test.
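
Both forms can be sketched on sashelp.class, the sample data set shipped with SAS. The hypothesized mean of 60 is an arbitrary value chosen for illustration.

```sas
/* One-sample test: is the mean Height equal to 60? (60 is arbitrary) */
PROC TTEST DATA=sashelp.class H0=60;
    VAR Height;
RUN;

/* Two-sample test: compare mean Height between the two levels of Sex.
   The CLASS variable must have exactly two levels. */
PROC TTEST DATA=sashelp.class;
    CLASS Sex;
    VAR Height;
RUN;
```

For paired observations, PROC TTEST uses a PAIRED statement instead of CLASS and VAR.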

The ANOVA Procedure
The ANOVA procedure performs one-way analysis of variance (ANOVA) for balanced data.
General form of a simple PROC ANOVA step:
    PROC ANOVA DATA = SAS-data-set;
        CLASS variable(s);
        MODEL dependents = effects;
    RUN;

Some simple analysis procedures
- The UNIVARIATE Procedure
- The REG Procedure
- The LOGISTIC Procedure

The UNIVARIATE Procedure
The UNIVARIATE procedure provides descriptive statistics, histograms, quantile-quantile plots (Q-Q plots) and probability plots.
General form of a simple PROC UNIVARIATE step:
    PROC UNIVARIATE DATA = SAS-data-set;
        VAR variables;
        HISTOGRAM;
        QQPLOT;
    RUN;

The REG Procedure
The REG procedure is one of many regression procedures in the SAS System. It allows several MODEL statements and gives additional regression diagnostics, especially for the detection of collinearity. It also creates plots of model summary statistics and regression diagnostics.
    PROC REG;
        MODEL dependents = independents;
        PLOT;
    RUN;

An example
    PROC REG DATA=water;
        MODEL Water = Temperature Days Persons / VIF;
        MODEL Water = Temperature Production Days / VIF;
    RUN;
    PROC REG DATA=water;
        MODEL Water = Temperature Production Days;
        PLOT STUDENT.*PREDICTED.; /* studentized residuals against predicted values */
        PLOT STUDENT.*NPP.;       /* studentized residuals against the normal cumulative distribution (P-P plot) */
        PLOT R.*NQQ.;             /* normal Q-Q plot of the residuals */
    RUN;
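
Since the water data set is not included here, a version that can be run as-is might use the sample data set sashelp.class instead. The choice of Weight as the response and Height and Age as predictors is arbitrary and only for illustration.

```sas
PROC REG DATA=sashelp.class;
    /* VIF requests variance inflation factors, a collinearity diagnostic. */
    MODEL Weight = Height Age / VIF;
RUN;
QUIT;   /* PROC REG is interactive; QUIT ends the procedure */
```

A variance inflation factor close to 1 suggests little collinearity for that predictor; large values suggest it is nearly a linear combination of the other predictors.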

The LOGISTIC Procedure
For binary or ordinal responses with continuous independent variables:
    PROC LOGISTIC;
        MODEL dependents = independents;
    RUN;
For binary or ordinal responses with categorical independent variables:
    PROC LOGISTIC;
        CLASS categorical-variables;
        MODEL dependents = independents;
    RUN;

Example
    PROC LOGISTIC DATA=Mydata2.pain;
        CLASS Treatment Sex;
        MODEL Pain = Treatment Sex Treatment*Sex Age Duration;
    RUN;