Presentation and Data: Short Courses → Intro to SAS → Download Data to Desktop



Introduction to SAS
Mark Seiss, Dept. of Statistics
November 8 and 9, 2010

Reference Material
  - The Little SAS Book (Delwiche and Slaughter)
  - SAS Programming I: Essentials
  - SAS Programming II: Manipulating Data with the DATA Step
  - Presentation and Data

Presentation Outline
  1. Introduction to the SAS Environment
  2. Working With SAS Data Sets
  3. Summary Procedures
  4. Basic Statistical Analysis Procedures

Presentation Outline Questions/Comments Individual Goals/Interests

Introduction to the SAS Environment
  1. SAS Programs
  2. SAS Data Sets and Data Libraries
  3. SAS System Help
  4. Creating SAS Data Sets

SAS Programs
  File extension: .sas
  The Editor window has four uses:
    - Accessing and editing existing SAS programs
    - Writing new SAS programs
    - Submitting SAS programs for execution
    - Saving SAS programs
  SAS program: a sequence of steps that the user submits for execution
  Submitting SAS programs:
    - Entire program
    - Selection of the program

SAS Programs
  Syntax rules for SAS statements:
    - Free-format: can use upper or lower case
    - Usually begin with an identifying keyword
    - Can span multiple lines
    - Always end with a semicolon
    - Multiple statements can be on the same line
  Errors:
    - Misspelled keywords
    - Missing or invalid punctuation (a missing semicolon is common)
    - Invalid options
    - Indicated in the Log window

SAS Programs
  2 basic steps in SAS programs:
    - DATA steps
        Typically used to create SAS data sets and manipulate data
        Begin with a DATA statement
    - PROC steps
        Typically used to process SAS data sets
        Begin with a PROC statement
  The end of a DATA or PROC step is indicated by:
    - RUN statement (most steps)
    - QUIT statement (some steps)
    - Beginning of another step (DATA or PROC statement)

SAS Programs
  Output generated from a SAS program appears in 2 windows:
    - SAS log
        Information about the processing of the SAS program
        Includes any warnings or error messages
        Accumulates in the order the DATA and PROC steps are submitted
    - SAS output
        Reports generated by the SAS procedures
        Accumulates output in the order it is generated

SAS Data Sets and Data Libraries
  SAS Data Set
    - Specifically structured file that contains data values
    - File extension: .sas7bdat
    - Rows and columns format, similar to Excel
        Columns: variables in the table corresponding to fields of data
        Rows: a single record or observation
    - Two types of variables
        Character: contain any value (letters, numbers, symbols, etc.)
        Numeric: floating point numbers
    - Located in SAS Data Libraries

SAS Data Sets and Data Libraries
  SAS Data Libraries
    - Contain SAS data sets
    - Identified by assigning a library reference name (libref)
    - Temporary
        Work library
        SAS data files are deleted when the session ends
        Library reference name not necessary
    - Permanent
        SAS data sets are saved after the session ends
        SASUSER library
        You can create and access your own libraries

SAS Data Sets and Data Libraries
  SAS Data Libraries cont.
  Assigning library references
    Syntax:
        LIBNAME libref 'SAS-data-library';
    Rules for library references:
      - 8 characters or less
      - Must begin with a letter or underscore
      - Other characters are letters, numbers, or underscores

SAS Data Sets and Data Libraries
  SAS Data Libraries cont.
  Identifying SAS data sets within SAS Data Libraries:
      libref.filename
  Accessing SAS data sets within SAS Data Libraries
    Example:
        DATA new_data_set;
            SET libref.filename;
        RUN;
  Creating SAS data sets within SAS Data Libraries
    Example:
        DATA libref.filename;
            SET old_data_set;
        RUN;
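The library rules above can be tried end to end with a short program. This is only a sketch: the libref `course`, the folder path, and the tiny `scores` data set are made-up names for illustration, and the folder must already exist on your machine.

```sas
/* Hypothetical folder; replace with a directory that exists on your machine. */
libname course 'C:\sas_course\data';

/* Create a small temporary data set in the Work library (illustrative values). */
data scores;
    input name $ score;
    datalines;
Ann 90
Bob 85
;
run;

/* Save a permanent copy as course.scores (scores.sas7bdat in that folder). */
data course.scores;
    set scores;
run;

/* Read the permanent data set back into a new temporary data set. */
data scores_copy;
    set course.scores;
run;
```

After the session ends, `scores` and `scores_copy` are deleted with the Work library, while `course.scores` remains on disk.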

SAS System Help
  SAS Help and Documentation
    - Help → SAS Help and Documentation
    - Red book icon
  SAS Online Help

Creating SAS Data Sets
  Creating SAS data sets from raw data: 4 methods
    1. Importing existing data sets using the Import menu option
    2. Importing existing raw data in a SAS program
    3. Manually entering raw data in a SAS program
    4. Manually entering raw data using the Table Editor
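Method 3 (manually entering raw data in a SAS program) might look like the sketch below. It types in the region codes used later in the course; the data set name `region_names`, the variable names, and the use of INFILE DATALINES with the DSD option are choices made for this illustration, not something the slides prescribe.

```sas
/* Enter comma-delimited raw data directly in the program. */
data region_names;
    infile datalines dsd;             /* DSD: values are separated by commas */
    input region region_name :$20.;   /* numeric code, character name up to 20 chars */
    datalines;
1,New England
2,Middle Atlantic
3,East North Central
4,West North Central
5,South Atlantic
6,East South Central
7,West South Central
8,Mountain
9,Pacific
;
run;

/* Check what was created. */
proc contents data=region_names;
run;
```

Comma delimiters are used here because several region names contain blanks, which plain list input would split into separate values.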

Creating SAS Data Sets
  Using the Import Data menu option:
    1. File → Import Data
    2. Standard data source → select the file format
    3. Specify the file location or Browse to select the file
    4. Create a name for the new SAS data set and specify its location

Creating SAS Data Sets
  Compatible file formats:
    - Microsoft Excel spreadsheets
    - Microsoft Access databases
    - Comma-separated files (.csv)
    - Tab-delimited files (.txt)
    - dBASE files (.dbf)
    - JMP data sets
    - SPSS files
    - Lotus spreadsheets
    - Stata files
    - Paradox files

Creating SAS Data Sets
  Example Data Sets
  Excel file: State_SAT_data.xls
    - Extracted from the 1997 Digest of Education Statistics, an annual publication of the U.S. Department of Education
    - Contains variables that show the relationship between public school expenditure and SAT performance
    - Variables:
        State (state)
        Current expenditure per pupil (expend)
        Average pupil to teacher ratio (PT_ratio)
        Estimated annual salary of teachers (salary)
        Percentage of eligible students taking the SAT (students)
        Average verbal SAT score (verbal)
        Average math SAT score (math)
        Average total score (total)

Creating SAS Data Sets
  Example Data Sets cont.
  Text file: State_region_data.txt
    - Contains region assignments for each state
        1 = New England
        2 = Middle Atlantic
        3 = East North Central
        4 = West North Central
        5 = South Atlantic
        6 = East South Central
        7 = West South Central
        8 = Mountain
        9 = Pacific

Creating SAS Data Sets
  Import State_SAT_data.xls → assign as work.state_sat_data (state_sat_data.sas7bdat)
  Import State_region_data.txt → assign as work.state_region_data (state_region_data.sas7bdat)
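The same import can also be done in code (method 2) with PROC IMPORT. A minimal sketch for the text file follows; the folder path is hypothetical, and the program assumes the file is tab-delimited with variable names in its first row, which the slides do not confirm.

```sas
/* Hypothetical path; point DATAFILE= at your copy of the text file. */
proc import datafile='C:\sas_course\State_region_data.txt'
            out=work.state_region_data
            dbms=tab
            replace;
    getnames=yes;   /* assumes the first row holds the variable names */
run;
```

REPLACE lets the step overwrite work.state_region_data if it already exists. Importing the Excel file in code generally also requires the SAS/ACCESS Interface to PC Files, so the Import menu is the simpler route for it in this course.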

Introduction to the SAS Environment Questions/Comments

Working With SAS Data Sets
  1. Data Set Information
  2. Data Set Manipulation
  3. Data Set Processing
  4. Combining Data Sets
       A. Concatenating/Appending
       B. Merging
  5. Saving Data Sets

Data Set Information
  PROC CONTENTS
    Output contains a table of contents of the specified data set
    - Data set information
        Data set name
        Number of observations
        Number of variables
    - Variable information
        Type (numeric or character)
        Length
    Syntax:
        PROC CONTENTS DATA=input_data_set;
        RUN;

Data Set Information Assignment Obtain Data Set Information for work.state_sat_data and work.state_region_data

Data Set Information
  Solution
    proc contents data=state_sat_data;
    run;

    proc contents data=state_region_data;
    run;

Data Set Manipulation
  Create a new SAS data set using an existing SAS data set as input
    - Specify the name of the new SAS data set in the DATA statement
    - Use the SET statement to identify the SAS data set being read
  Syntax:
      DATA output_data_set;
          SET input_data_set;
          <additional SAS statements>;
      RUN;
  By default, the SET statement reads all observations and variables from the input data set into the output data set.

Data Set Manipulation
  Assignment Statements
    - Evaluate an expression
    - Assign the resulting value to a variable
    General Form: variable = expression;
    Example:      miles_per_hour = distance/time;
  SAS Functions
    - Perform arithmetic functions, compute simple statistics, manipulate dates, etc.
    General Form: variable = function_name(argument1, argument2, ...);
    Example:      Time_worked = sum(Day1, Day2, Day3, Day4, Day5);
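As a rough illustration of both forms, the sketch below builds a small data set and computes the two variables from the examples. The data set name `timesheet`, the extra variable `Time_plus`, and all data values are invented for this example.

```sas
data timesheet;
    input employee $ Day1 Day2 Day3 Day4 Day5 distance time;
    /* Assignment statement: evaluate the expression, store the result. */
    miles_per_hour = distance / time;
    /* SUM function: adds its arguments, skipping missing values. */
    Time_worked = sum(Day1, Day2, Day3, Day4, Day5);
    /* The + operator instead returns missing if any operand is missing. */
    Time_plus = Day1 + Day2 + Day3 + Day4 + Day5;
    datalines;
Ann 8 8 7 8 6 120 2
Bob 8 . 8 8 8 90 1.5
;
run;

proc print data=timesheet;
run;
```

Bob has a missing Day2, so his Time_worked is 32 while his Time_plus is missing. This difference is one reason to prefer the function form when some values may be absent.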

Data Set Manipulation
  Selecting Variables
    Use DROP and KEEP to determine which variables are written to the new SAS data set.
    2 ways:
      - DROP and KEEP as statements
          Form: DROP Variable1 Variable2;
                KEEP Variable3 Variable4 Variable5;
      - DROP= and KEEP= options in the SET statement
          Form: SET input_data_set (KEEP=Var1);

Data Set Manipulation
  Conditional Processing
    Uses IF-THEN-ELSE logic
    General Form:
        IF <expression> THEN <statement>;
        ELSE IF <expression> THEN <statement>;
        ELSE <statement>;
    <expression> is a true/false statement, such as:
      - Day1=Day2, Day1 > Day2, Day1 < Day2
      - Day1+Day2=10
      - Sum(day1,day2)=10
      - Day1=5 and Day2=5

Data Set Manipulation
  Conditional Processing

    Symbolic    Mnemonic         Example
    =           EQ               IF region = 'Spain';
    ~= or ^=    NE               IF region ne 'Spain';
    >           GT               IF rainfall > 20;
    <           LT               IF rainfall lt 20;
    >=          GE               IF rainfall ge 20;
    <=          LE               IF rainfall <= 20;
    &           AND              IF rainfall ge 20 & temp < 90;
    | or !      OR               IF rainfall ge 20 OR temp < 90;
                IN               IF region IN ('Rain', 'Spain', 'Plain');
                IS NOT MISSING   WHERE region IS NOT MISSING;
                BETWEEN AND      WHERE region BETWEEN 'Plain' AND 'Spain';
                CONTAINS         WHERE region CONTAINS 'ain';

    Note: IS NOT MISSING, BETWEEN AND, and CONTAINS are valid only in WHERE expressions (the WHERE statement or the WHERE= data set option), not in IF statements.

Data Set Manipulation
  Conditional Processing cont.
    - If <expression> is true, <statement> is processed
    - ELSE IF and ELSE are only processed if <expression> is false
    - Only one statement can be specified using this form
    - Use DO and END statements to execute a group of statements
    General Form:
        IF <expression> THEN DO;
            <statements>;
        END;
        ELSE DO;
            <statements>;
        END;
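A small sketch of IF-THEN-ELSE with a DO group is given below. The data set `weather`, the city names, the `category` and `dry_flag` variables, and all values are hypothetical and serve only to show the control flow.

```sas
data weather;
    input city $ rainfall temp;
    length category $ 8;   /* set the length before the first assignment */
    if rainfall ge 20 and temp < 90 then category = 'wet';
    else if rainfall ge 20 then category = 'wet-hot';
    else do;
        /* DO/END groups several statements under one condition. */
        category = 'dry';
        dry_flag = 1;
    end;
    datalines;
Alpha 25 80
Beta 30 95
Gamma 5 70
;
run;

proc print data=weather;
run;
```

Only Gamma reaches the final ELSE, so dry_flag is 1 for that row and missing for the other two.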

Data Set Manipulation
  Subsetting Rows (Observations)
    We will look at two ways:
      - Using the IF statement
      - Using the WHERE= option in the SET statement
    IF statement
      - Only writes observations to the new data set for which an expression is true
      General Form: IF <expression>;
      Examples:     IF career = 'Teacher';
                    IF sex ne 'M';
      In the second example, only observations where sex is not equal to 'M' will be written to the output data set

Data Set Manipulation
  Subsetting Rows (Observations) cont.
    WHERE= option in the SET statement
      - Use the option to read only the rows from the input data set for which the expression is true
      General Form: SET input_data_set (where=(<expression>));
      Example:      SET vacation (where=(destination='Bermuda'));
      Only observations where the destination equals 'Bermuda' will be read from the input data set
    Comparison
      - The resulting output data set is equivalent
      - IF statement: all rows are read from the input data set
      - WHERE= option: only rows where the expression is true are read from the input data set
      - The difference shows up in processing time when working with big data sets
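The two approaches can be compared side by side with a toy version of the vacation example. The rows, the traveler names, and the output data set names below are invented for this sketch.

```sas
data vacation;
    input traveler $ destination :$10. nights;
    datalines;
Ann Bermuda 5
Bob Paris 3
Cy Bermuda 7
;
run;

/* Subsetting IF: every row is read, then non-matching rows are discarded. */
data bermuda_if;
    set vacation;
    if destination = 'Bermuda';
run;

/* WHERE= option: only matching rows are read from the input data set. */
data bermuda_where;
    set vacation (where=(destination='Bermuda'));
run;
```

Both output data sets should hold the same two rows. The log notes for the two steps should differ in the number of observations read from work.vacation, which is where the processing-time difference on large data sets comes from.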

Data Set Manipulation
  Assignments
    1. Create new data set work.state_SAT_data2 from work.state_SAT_data
         - Assign new variable upper_ind
         - If total > 1000 then upper_ind=1
         - Otherwise upper_ind=0
    2. Create new data set work.south from work.state_region_data
         - work.south contains only records from regions 5, 6, or 7
         - work.south contains only the state variable

Data Set Manipulation
  Solutions
    1.
      data state_sat_data2;
          set state_sat_data;
          if total>1000 then upper_ind=1;
          else upper_ind=0;
      run;

Data Set Manipulation
  Solutions
    2.
      data south;
          set state_region_data;
          if region=5 or region=6 or region=7;
          keep state;
      run;

    OR

      data south;
          set state_region_data(where=(region=5 or region=6 or region=7));
          keep state;
      run;

Data Set Manipulation
  PROC SORT sorts data according to specified variables
    General Form:
        PROC SORT DATA=input_data_set <options>;
            BY Variable1 Variable2;
        RUN;
    - Sorts data according to Variable1 and then Variable2
    - By default, SAS sorts data in ascending order
        Numbers low to high
        Letters A to Z
    - Put the DESCENDING keyword before a BY variable to sort numbers high to low and letters Z to A
        BY City DESCENDING Population;
      SAS sorts data first by City A to Z and then by Population high to low

Data Set Manipulation
  Some PROC SORT options
    NODUPKEY
      - Eliminates observations that have the same values for the BY variables
    OUT=output_data_set
      - By default, PROC SORT replaces the input data set with the sorted data set
      - Using this option, PROC SORT creates a newly sorted data set and the input data set remains unchanged
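A short sketch that combines the BY ordering with both options follows. The `cities` data set and its population figures are made up purely for illustration.

```sas
data cities;
    input City :$12. State $ Population;
    datalines;
Springfield IL 116000
Springfield MA 155000
Albany NY 99000
Albany NY 99000
;
run;

/* City A to Z, then Population high to low; write the result to a new data set. */
proc sort data=cities out=cities_sorted nodupkey;
    by City descending Population;
run;

proc print data=cities_sorted;
run;
```

Because of OUT=, the original `cities` data set keeps its four rows in their original order. NODUPKEY drops the second Albany row, since it repeats the same City and Population values.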

Data Set Processing
  DATA steps read in data from existing data sets or raw data files one row at a time, like a loop
  A DATA step reads data from the input data set in the following way:
    1. Read the current row from the input data set into the Program Data Vector (PDV)
    2. Process the SAS statements
    3. Write the PDV to the output data set
    4. Set the current row to the next row in the input data set
    5. Return to Step 1
  Only one row is processed at a time, so we cannot simply add the value of a variable in one row to the value in another row

Combining Data Sets
  Concatenating (or Appending)
    - Stacks each data set upon the other
    - If one data set does not have a variable that the other data sets do, the variable in the new data set is set to missing for the observations from that data set
    General Form:
        DATA output_data_set;
            SET data1 data2;
        RUN;
    - PROC APPEND may also be used
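The missing-value behavior is easy to see with two tiny data sets. The variable names, the output name `stacked`, and the values below are arbitrary choices for this sketch.

```sas
data data1;
    input id x;
    datalines;
1 10
2 20
;
run;

/* data2 has an extra variable, y, that data1 lacks. */
data data2;
    input id x y;
    datalines;
3 30 300
;
run;

/* Stack data2 underneath data1. */
data stacked;
    set data1 data2;
run;

proc print data=stacked;
run;
```

The result has three rows. y is missing for the two rows that came from data1 and 300 for the row from data2.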

Combining Data Sets
  Merging Data Sets
    One-to-One Match Merge
      - A single record in a data set corresponds to a single record in all other data sets
      - Example: Patient and Billing Information
    One-to-Many Match Merge
      - Matching one observation from one data set to multiple observations in other data sets
      - Example: County and State Information
    Note: Data must be sorted by the matching variables before merging can be done (PROC SORT)

Combining Data Sets
  One-to-One Match Merge
    - Usually need at least one common variable between the data sets for matching purposes
        For the example, a patient ID would be needed
    - Do not need a common variable if all data sets are in exactly the same order
    General Form:
        DATA output_data_set;
            MERGE input_data_set1 input_data_set2;
            BY variable1 variable2;
        RUN;

Combining Data Sets
  One-to-One Match Merge Example:
    Input data sets:
      Performance (columns: Month, Sales)
      Goals (columns: Month, Goal)
    Code:
        DATA compare;
            MERGE performance goals;
            BY month;
            difference=sales-goal;
        RUN;

Combining Data Sets
  One-to-One Match Merge Example cont.:
    Output data set:
      Compare (columns: Month, Sales, Goal, Difference)

Combining Data Sets
  One-to-Many Match Merge
    - Requires at least one common variable in the data sets for matching purposes
        For the example, State information is in both the state and county files
    - If two data sets have variables with the same name, the values from the second data set will overwrite the values from the first
    General Form:
        DATA output_data_set;
            MERGE Data1 Data2 Data3;
            BY Variable1 Variable2;
        RUN;

Combining Data Sets
  One-to-Many Match Merge Example:

    Videos                   Adjustment
    Category   Sales         Category   Adjustment
    Aerobics   12.99         Aerobics   .20
    Aerobics   13.99         Step       .30
    Aerobics   13.99         Weights    .25
    Step       12.99
    Step       12.99
    Weights    15.99

    Code:
        DATA prices;
            MERGE videos adjustment;
            BY category;
            NewPrice=(1-adjustment)*sales;
        RUN;

Combining Data Sets
  One-to-Many Match Merge Example cont.:

    Videos
    Category   Sales   Adjustment   NewPrice
    Aerobics
    Aerobics
    Aerobics
    Step
    Step
    Weights
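To reproduce the merged table, the example can be run as a complete program. The sketch below types in the values shown on the previous slide; the INPUT statements and the PROC SORT steps are additions for this illustration (the sorts are there only to guarantee the order that BY requires).

```sas
data videos;
    input category :$10. sales;
    datalines;
Aerobics 12.99
Aerobics 13.99
Aerobics 13.99
Step 12.99
Step 12.99
Weights 15.99
;
run;

data adjustment;
    input category :$10. adjustment;
    datalines;
Aerobics .20
Step .30
Weights .25
;
run;

/* Both inputs must be sorted by the BY variable before merging. */
proc sort data=videos;
    by category;
run;

proc sort data=adjustment;
    by category;
run;

/* Each adjustment row is matched to every video in the same category. */
data prices;
    merge videos adjustment;
    by category;
    NewPrice = (1 - adjustment) * sales;
run;

proc print data=prices;
run;
```

The output keeps all six video rows, each carrying the adjustment for its category.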

Combining Data Sets Assignment Create the dataset work.state_data Merge work.state_sat_data2 with work.state_region_data by the state variable

Combining Data Sets
  Solution
    proc sort data=state_sat_data2;
        by state;
    run;

    proc sort data=state_region_data;
        by state;
    run;

    data state_data;
        merge state_sat_data2 state_region_data;
        by state;
    run;

Saving Data Sets
  Save as a SAS data set (.sas7bdat):
      LIBNAME libref "destination folder";
      DATA libref.filename;
          SET current_name;
          <optional commands>;
      RUN;
  Other formats:
    1. File → Export Data
    2. Specify the SAS data set
    3. Standard data source → select the file format
    4. Specify the file folder and filename

Working With SAS Data Sets Questions/Comments

Summary Procedures
  1. Print Procedure
  2. Plot Procedure
  3. Univariate Procedure
  4. Means Procedure
  5. Freq Procedure

Print Procedure
  PROC PRINT is used to print data to the output window
    - By default, prints all observations and variables in the SAS data set
    General Form:
        PROC PRINT DATA=input_data_set <options>;
        RUN;
    Some options:
      - input_data_set (obs=n): specifies the number of observations to be printed in the output
      - NOOBS: suppresses printing of the observation number
      - LABEL: prints the labels instead of the variable names

Print Procedure
  Optional SAS statements
    BY variable1 variable2 variable3;
      - Starts a new section of output for every new value of the BY variables
    ID variable1 variable2 variable3;
      - Prints the ID variables on the left-hand side of the page and suppresses the printing of the observation numbers
    SUM variable1 variable2 variable3;
      - Prints the sum of the listed variables at the bottom of the output
    VAR variable1 variable2 variable3;
      - Prints only the listed variables in the output
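Several of these options and statements can be combined in one step. The sketch below assumes the course data set work.state_data (with the variables state, region, expend, and total) has already been created; the choice of ten observations and of which variables to list is arbitrary.

```sas
/* BY-group processing needs the data sorted by the BY variable first. */
proc sort data=state_data;
    by region;
run;

/* First 10 observations, no observation numbers, labels where defined. */
proc print data=state_data (obs=10) noobs label;
    by region;                 /* new section for each region */
    var state expend total;    /* print only these variables */
    sum expend;                /* total of expend at the bottom of the output */
run;
```

If no labels have been assigned to the variables, LABEL has no visible effect and the variable names are printed.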

Print Procedure Assignment Use PROC PRINT to print out the state variable separately for each region Note: All procedures for the remainder of the course will be run on the data set work.state_data.

Print Procedure
  Solution
    proc sort data=state_data;
        by region;
    run;

    proc print data=state_data;
        var state;
        by region;
    run;

Plot Procedure
  Used to create basic scatter plots of the data
    - Use PROC GPLOT or PROC SGPLOT for more sophisticated plots
    General Form:
        PROC PLOT DATA=input_data_set;
            PLOT vertical_variable * horizontal_variable / <options>;
        RUN;
    - By default, SAS uses letters to mark points on plots
        A for a single observation, B for two observations at the same point, etc.
    - To specify a different character to represent a point:
        PLOT vertical_variable * horizontal_variable = '*';

Plot Procedure
    - To specify a third variable to use to mark points:
        PLOT vertical_variable * horizontal_variable = third_variable;
    - To plot more than one variable on the vertical axis:
        PLOT vertical_variable1 * horizontal_variable='2'
             vertical_variable2 * horizontal_variable='1' / OVERLAY;

Plot Procedure Assignment Use the PLOT PROCEDURE to plot SAT Verbal scores versus SAT Math Scores Use the value of the region variable to mark points

Plot Procedure
  Solution
    proc plot data=state_data;
        plot math*verbal=region;
    run;

Univariate Procedure
  PROC UNIVARIATE is used to examine the distribution of data
    - Produces summary statistics for a single variable
    - Includes mean, median, mode, standard deviation, skewness, kurtosis, quantiles, etc.
    General Form:
        PROC UNIVARIATE DATA=input_data_set <options>;
            VAR variable1 variable2 variable3;
        RUN;
    - If the VAR statement is not used, summary statistics will be produced for all numeric variables in the input data set

Univariate Procedure
  Options include:
    - PLOT: produces a stem-and-leaf plot, box plot, and normal probability plot
    - NORMAL: produces tests of normality

Univariate Procedure Assignment Use PROC UNIVARIATE to produce a normal probability plot and test the normality of the SAT Total variable and Expenditure variable

Univariate Procedure
  Solution
    proc univariate data=state_data normal plot;
        var expend total;
    run;

Means Procedure
  Similar to the Univariate procedure
    General Form:
        PROC MEANS DATA=input_data_set <options>;
            <optional SAS statements>;
        RUN;
    - With no options or optional SAS statements, the Means procedure will print out the number of non-missing values, mean, standard deviation, minimum, and maximum for all numeric variables in the input data set

Means Procedure
  Options: Statistics Available

    Keyword           Statistic                      Keyword     Statistic
    CLM               Two-Sided Confidence Limits    RANGE       Range
    CSS               Corrected Sum of Squares       SKEWNESS    Skewness
    CV                Coefficient of Variation       STDDEV      Standard Deviation
    KURTOSIS          Kurtosis                       STDERR      Standard Error of Mean
    LCLM              Lower Confidence Limit         SUM         Sum
    MAX               Maximum Value                  SUMWGT      Sum of Weight Variables
    MEAN              Mean                           UCLM        Upper Confidence Limit
    MIN               Minimum Value                  USS         Uncorrected Sum of Squares
    N                 Number Non-missing Values      VAR         Variance
    NMISS             Number Missing Values          PROBT       Probability for Student's t
    MEDIAN (or P50)   Median                         T           Student's t
    Q1 (P25)          25% Quantile                   Q3 (P75)    75% Quantile
    P1                1% Quantile                    P5          5% Quantile
    P10               10% Quantile                   P90         90% Quantile
    P95               95% Quantile                   P99         99% Quantile

    Note: The default confidence level for confidence limits is 95% (ALPHA=0.05). Use the ALPHA= option to specify a different level.

Means Procedure
  Optional SAS Statements
    VAR Variable1 Variable2;
      - Specifies the numeric variables for which statistics will be produced
    BY Variable1 Variable2;
      - Calculates statistics for each combination of the BY variables
    OUTPUT OUT=output_data_set;
      - Creates a data set with the default statistics
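The statistic keywords and the optional statements can be combined as in the sketch below. It assumes work.state_data exists as built earlier in the course; the output data set name `region_stats` and the 90% confidence level are arbitrary choices for this example.

```sas
proc sort data=state_data;
    by region;
run;

/* Requested statistics plus 90% confidence limits for the mean. */
proc means data=state_data n mean stddev clm alpha=0.10;
    var expend salary;
    by region;
    output out=region_stats;   /* default statistics, one set per region */
run;

proc print data=region_stats;
run;
```

The printed report shows the statistics named on the PROC MEANS statement, while `region_stats` stores the default ones (N, MIN, MAX, MEAN, STD) for each region.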

Means Procedure Assignment Use PROC MEANS to calculate the mean and variance of the expenditure variable for each region

Means Procedure
  Solution
    proc sort data=state_data;
        by region;
    run;

    proc means data=state_data mean var;
        var expend;
        by region;
    run;

FREQ Procedure
  PROC FREQ is used to generate frequency tables
    - Most common usage is to create a table showing the distribution of categorical variables
    General Form:
        PROC FREQ DATA=input_data_set;
            TABLE variable1*variable2*variable3 / <options>;
        RUN;
    Options:
      - LIST: prints cross tabulations in list format rather than grid
      - MISSING: specifies that missing values should be included in the tabulations
      - OUT=output_data_set: creates a data set containing frequencies, in list format
      - NOPRINT: suppresses printing in the output window
    - Use a BY statement to get percentages within each category of a variable
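As one possible use of these options, the sketch below cross-tabulates region against the indicator upper_ind created in the earlier assignment. It assumes work.state_data contains both variables; the output data set name `region_counts` is made up for this example.

```sas
proc freq data=state_data;
    /* LIST prints one line per region/upper_ind combination. */
    table region*upper_ind / list missing out=region_counts;
run;

proc print data=region_counts;
run;
```

`region_counts` holds one row per combination, with the count and percent for each.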

FREQ Procedure Assignment Use PROC FREQ to find the number of states within each region

FREQ Procedure
  Solution
    proc freq data=state_data;
        table region;
    run;

Summary Procedures Questions/Comments

Statistical Analysis Procedures
  1. Correlation: PROC CORR
  2. Regression: PROC REG
  3. Analysis of Variance: PROC ANOVA
  4. Chi-square Test of Association: PROC FREQ
  5. Generalized Linear Models: PROC GENMOD

CORR Procedure
  PROC CORR is used to calculate the correlations between variables
    - The correlation coefficient measures the linear relationship between two variables
    - Values range from -1 to 1
        Negative correlation: as one variable increases, the other decreases
        Positive correlation: as one variable increases, the other increases
        0: no linear relationship between the two variables
        1: perfect positive linear relationship
        -1: perfect negative linear relationship
    General Form:
        PROC CORR DATA=input_data_set <options>;
            VAR Variable1 Variable2;
            WITH Variable3;
        RUN;

CORR Procedure
    - If the VAR and WITH statements are not used, correlation is computed for all pairs of numeric variables
    Options include:
      - SPEARMAN: computes Spearman's rank correlations
      - KENDALL: computes Kendall's tau coefficients
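A sketch of VAR and WITH used together follows. It assumes work.state_data from the earlier exercises; which variables go in each statement is an arbitrary choice for illustration.

```sas
/* Correlate each WITH variable against each VAR variable. */
proc corr data=state_data pearson spearman;
    var verbal math;       /* columns of the correlation table */
    with expend salary;    /* rows of the correlation table */
run;
```

Using WITH restricts the output to the expend and salary rows against the verbal and math columns, rather than every pair among the four variables. Naming PEARSON explicitly keeps the default Pearson table alongside the Spearman one.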

CORR Procedure
  Question: What is the correlation between the SAT Total variable and the Expenditure variable? Is it significant?
    Based on previous exercises, which correlation coefficient should we use?
  Assignment: Use PROC CORR to find the correlation between the SAT Total variable and the Expenditure variable

CORR Procedure
  Solution
    If the normality assumption is valid (Pearson correlation, the default):
        proc corr data=state_data;
            var total expend;
        run;
    If the normality assumption is not valid:
        proc corr data=state_data spearman;
            var total expend;
        run;

REG Procedure
  PROC REG is used to fit linear regression models by least squares estimation
    - One of many SAS procedures that can perform regression analysis
    - Only continuous independent variables (use GENMOD for categorical variables)
    General Form:
        PROC REG DATA=input_data_set <options>;
            MODEL dependent=independent1 independent2 / <options>;
            <optional statements>;
        RUN;
    PROC REG statement options include:
      - PCOMIT=m: performs principal component estimation with m principal components
      - CORR: displays the correlation matrix for the variables in the model

REG Procedure
  MODEL statement options include:
    - SELECTION=: specifies that a model selection procedure be conducted (FORWARD, BACKWARD, or STEPWISE)
    - ADJRSQ: computes the adjusted R-square
    - MSE: computes the mean square error
    - COLLIN: performs collinearity analysis
    - CLB: computes confidence limits for parameter estimates
    - ALPHA=: sets the significance level for confidence and prediction intervals and tests

REG Procedure
  Optional statements include:
    - PLOT Dependent*Independent1; generates a plot of the data
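A few of these options can be tried together on the course data. The sketch below assumes work.state_data exists; the two predictors and the 90% level are arbitrary choices for illustration and are not the assignment solution.

```sas
/* CORR prints the correlation matrix of the model variables. */
proc reg data=state_data corr;
    /* CLB: confidence limits for the estimates; COLLIN: collinearity diagnostics. */
    model total = expend students / clb collin alpha=0.10;
run;
quit;
```

PROC REG is an interactive procedure, so the QUIT statement is what ends it. RUN alone leaves the procedure active.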

REG Procedure
  Assignment
    Use PROC REG to generate a multiple linear regression model
      - Dependent variable: SAT Total (total)
      - Use stepwise selection
      - Possible independent variables:
          Average pupil to teacher ratio (PT_ratio)
          Current expenditure per pupil (expend)
          Estimated annual salary of teachers (salary)
          Percentage of eligible students taking the SAT (students)

REG Procedure
  Solution
    proc reg data=state_data;
        model total=pt_ratio expend salary students/selection=stepwise;
    run;

ANOVA Procedure
  PROC ANOVA performs analysis of variance
    - Designed for balanced data (PROC GLM is used for unbalanced data)
    - Can handle nested and crossed effects and repeated measures
    General Form:
        PROC ANOVA DATA=input_data_set <options>;
            CLASS independent1 independent2;
            MODEL dependent=independent1 independent2;
            <optional statements>;
        RUN;
    - The CLASS statement must come before the MODEL statement; it is used to define the classification variables

ANOVA Procedure
  Useful PROC ANOVA statement option: OUTSTAT=output_data_set
    - Generates an output data set that contains sums of squares, degrees of freedom, statistics, and p-values for each effect in the model
  Useful optional statement: MEANS independent1 / <comparison types>;
    - Used to perform multiple comparisons analysis
    - Set <comparison types> to:
        TUKEY: Tukey's studentized range test
        BON: Bonferroni t test
        T: pairwise t tests
        DUNCAN: Duncan's multiple-range test
        SCHEFFE: Scheffe's multiple comparison procedure
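The OUTSTAT= option and the MEANS statement might be used as in the sketch below. It assumes work.state_data exists; the verbal score and the Bonferroni comparison are chosen only so that the example differs from the assignment, and `anova_stats` is a made-up data set name.

```sas
proc anova data=state_data outstat=anova_stats;
    class region;            /* must precede the MODEL statement */
    model verbal = region;
    means region / bon;      /* Bonferroni pairwise comparisons */
run;
quit;

/* Sums of squares, degrees of freedom, F statistic, and p-value per effect. */
proc print data=anova_stats;
run;
```

The regions contain different numbers of states, but a one-way layout like this is a case PROC ANOVA can handle even when the groups are unequal in size.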

ANOVA Procedure
  Question: Are there significant differences between the Math SAT scores of students from different regions?
    If there are significant differences, which regions are different?
  Assignment: Use PROC ANOVA to determine if there are significant differences in the Math SAT variable between regions
    Perform multiple comparisons between regions using Tukey's adjustment

ANOVA Procedure
  Solution
    proc anova data=state_data;
        class region;
        model math=region;
        means region/tukey;
    run;

FREQ Procedure
  PROC FREQ can also be used to perform analysis with categorical data
    General Form:
        PROC FREQ DATA=input_data_set;
            TABLE variable1*variable2 / <options>;
        RUN;
    TABLE statement options include:
      - AGREE: tests and measures of classification agreement, including McNemar's test, Bowker's test, Cochran's Q test, and Kappa statistics
      - CHISQ: chi-square test of homogeneity and measures of association
      - MEASURES: measures of association, including Pearson and Spearman correlation, gamma, Kendall's tau, Stuart's tau, Somers' D, lambda, odds ratios, risk ratios, and confidence intervals
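For the chi-square test of association listed in the outline, a minimal sketch is shown below. It assumes work.state_data contains region and the indicator upper_ind from the earlier assignment; pairing these two variables is an illustrative choice.

```sas
/* Test whether upper_ind is associated with region. */
proc freq data=state_data;
    table region*upper_ind / chisq;
run;
```

With roughly 50 states spread over 9 regions and 2 indicator levels, many expected cell counts will be small, so SAS is likely to warn that the chi-square test may not be valid for this table.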

GENMOD Procedure
  PROC GENMOD is used to estimate generalized linear models, in which the response is not necessarily normal
    - Logistic and Poisson regression are examples of generalized linear models
    General Form:
        PROC GENMOD DATA=input_data_set;
            CLASS independent1;
            MODEL dependent = independent1 independent2 / dist=<distribution> link=<link function>;
        RUN;

GENMOD Procedure
    - DIST=: specifies the distribution of the response variable
    - LINK=: specifies the link function from the linear predictor to the mean of the response
    Example: Logistic Regression
        DIST = binomial
        LINK = logit
    Example: Poisson Regression
        DIST = poisson
        LINK = log
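The course data has no count variable, so the Poisson case is sketched here on a small invented data set. The data set name `clinic_visits`, its variables, and every value are hypothetical and carry no real-world meaning.

```sas
/* Hypothetical counts: number of visits recorded at five clinics. */
data clinic_visits;
    input clinic $ staff visits;
    datalines;
A 2 11
B 3 15
C 4 22
D 5 25
E 6 33
;
run;

/* Poisson regression: log of the expected count is linear in staff. */
proc genmod data=clinic_visits;
    model visits = staff / dist=poisson link=log;
run;
```

Because the link is the log, the staff coefficient is interpreted on the log scale: exponentiating it gives the multiplicative change in the expected count per additional staff member.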

GENMOD Procedure
  Question: How do we model the probability of having a high total SAT average based on other variables in the data set?
    Is the dependent variable normal, or does it have a different distribution?
    What link function would you specify?
  Assignment: Use PROC GENMOD to perform logistic regression on the work.state_data data set
    - Dependent variable: upper_ind
    - Independent variables:
        Average pupil to teacher ratio (PT_ratio)
        Current expenditure per pupil (expend)
        Estimated annual salary of teachers (salary)
        Percentage of eligible students taking the SAT (students)
        Region (region)

GENMOD Procedure
  Solution
    proc genmod data=state_data descending;
        class region;
        model upper_ind=pt_ratio expend salary students region/dist=bin link=logit;
    run;

Statistical Analysis Procedures Questions/Comments

Attendee Questions If time permits