

Introduction to SAS

What is a data set? A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the data set in question.

There are three types of datasets:
– Cross-sectional
– Time-series
– Panel (a combination of cross-sectional and time-series data sets)

Cross-Sectional Data
Cross-sectional data are collected by observing many subjects (such as individuals, firms, or countries/regions) at the same point in time, or without regard to differences in time.

Members   Age   Wage   Years of schooling
John      40    100k   14
Paul      34    110k   17
Mary      28    75k    10
Tom       30    130k   16
Sara      37    50k    15
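The table above can be entered directly into SAS. The following is a minimal sketch, not the presenter's program: the data set name wages and the variable names are assumptions chosen for illustration, and wage is entered in thousands (100 rather than 100k) so that it can be read as a numeric variable.

```sas
/* Read the cross-sectional table from the slide into a SAS data set.   */
/* The names wages, member, age, wage, and schooling are illustrative.  */
/* Wage is entered in thousands so that SAS reads it as a number.       */
data wages;
    input member $ age wage schooling;
    datalines;
John 40 100 14
Paul 34 110 17
Mary 28 75 10
Tom 30 130 16
Sara 37 50 15
;
run;

/* List the observations to confirm the data were read correctly. */
proc print data=wages;
run;
```

The dollar sign after member tells SAS that the variable is a string; the other three variables are numeric by default.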

Time-Series Data
A time series is a sequence of data points, typically measured at successive times spaced at uniform intervals.
Frequencies: daily, weekly, monthly, quarterly, annual.
Columns of the example table: Year, GDP xyz, Inflation Rate.

Panel Data
Panel data, also called longitudinal data or cross-sectional time-series data, are data in which multiple cases (people, firms, countries, etc.) are observed at two or more time periods.
Columns of the example table: Person, Year, Income, Age, Sex.

What should you know about your dataset?
– What type of dataset do you have?
– How many variables do you have?
– How many observations do you have?
– What kind of variables do you have?
   – Numeric. A numeric variable is an observed response that is a numerical value.
   – String. A string variable is any combination of one or more characters.
– Are there missing values?
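Most of these questions can be answered from within SAS. The sketch below assumes the cross-sectional wage table shown earlier, entered under the illustrative name wages, with one wage deliberately replaced by a period (the SAS missing-value marker for numeric variables) so that the missing-value count is not zero. That modification is an assumption made for this example and is not part of the original slides.

```sas
/* Illustrative data: the wage table from the slides, with Mary's wage */
/* set to missing (.) purely to demonstrate the missing-value count.   */
data wages;
    input member $ age wage schooling;
    datalines;
John 40 100 14
Paul 34 110 17
Mary 28 . 10
Tom 30 130 16
Sara 37 50 15
;
run;

/* Reports the number of observations and, for each variable, */
/* its name and type (numeric or character).                  */
proc contents data=wages;
run;

/* N = count of non-missing values, NMISS = count of missing values, */
/* reported for every numeric variable because no VAR statement is   */
/* given.                                                            */
proc means data=wages n nmiss;
run;
```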

How to store your dataset? In Microsoft Excel spreadsheets.

Accessing SAS Version 9.2: click on ENGLISH 9.2.

1. What does SAS look like?
Labeled parts of the screen: Editor window, Log window, Output window, Results window, Explorer window, Execute the program, New libraries.

Anatomy of a SAS Program
(1) Data name statement
(2) Input statement (list of all variables to be read into the program)
(3) Transformation statements
(4) Datalines statement (copy & paste from Excel)
(5) Placement of data
(6) PROC statements
   – Means
   – Corr
   – Reg
   – Model
   – Autoreg
(7) Run statement
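The seven parts listed above can be seen together in a short program. This is only a sketch under stated assumptions: it reuses the wage table from the cross-sectional slide under the illustrative name wages, and the log transformation is a hypothetical example of a transformation statement, not one taken from the slides.

```sas
/* (1) Data name statement: names the data set being created. */
data wages;
    /* (2) Input statement: lists the variables to be read. */
    input member $ age wage schooling;
    /* (3) Transformation statement (hypothetical): natural log of wage. */
    lnwage = log(wage);
    /* (4) Datalines statement, followed by (5) the data themselves. */
    /* A line containing only a semicolon marks the end of the data. */
    datalines;
John 40 100 14
Paul 34 110 17
Mary 28 75 10
Tom 30 130 16
Sara 37 50 15
;

/* (6) PROC statement: here PROC MEANS summarizes the numeric variables. */
proc means data=wages;
    var age wage schooling lnwage;
/* (7) Run statement: tells SAS to execute the step. */
run;
```

Transformation statements go before the datalines statement because datalines must be the last statement in the DATA step.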

Examples

Spaghetti Sauce Program
Callouts on the program listing: data set name; placement of data after the datalines statement; input statement.

Need this statement after the data. No date will appear on the output.

– Model statement
– print
– Creation of a data set named datareg, which contains the predicted values of the dependent variable and the residuals
– Test of normality of the residuals
– autoreg also produces AIC, SIC, and within-sample MAE, MAPE, and RMSE
– Confidence intervals associated with the estimated coefficients
– Square of partial correlation coefficients

Statistics in SAS
Use PROC MEANS or PROC CORR.
    Proc Means Data = ??? N mean median std min max cv skewness kurtosis;
        var var_name1 var_name2…;
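As a concrete illustration of the template above, the sketch below fills in the data set and variable names using the wage table from the cross-sectional slide. The names wages, age, wage, and schooling are assumptions made for this example.

```sas
/* Illustrative data: the wage table from the slides (wage in thousands). */
data wages;
    input member $ age wage schooling;
    datalines;
John 40 100 14
Paul 34 110 17
Mary 28 75 10
Tom 30 130 16
Sara 37 50 15
;
run;

/* Descriptive statistics: the keywords after the data set name select  */
/* which statistics are printed; the VAR statement selects the variables. */
proc means data=wages n mean median std min max cv skewness kurtosis;
    var age wage schooling;
run;

/* Pairwise Pearson correlations among the same variables. */
proc corr data=wages;
    var age wage schooling;
run;
```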

Regression in SAS
Use PROC REG or PROC MODEL for simple and multiple regression.

Using SAS PROC REG for Simple Linear Regression
The general syntax for PROC REG is
– PROC REG <options>; <statements>;
The most commonly used options are:
– DATA=datasetname  Specifies the dataset
– SIMPLE  Displays descriptive statistics
The most commonly used statements are:
– MODEL dependentvar = independentvar </ options>;  Specifies the variable to be predicted (dependentvar) and the variable that is the predictor (independentvar)
Several MODEL options are available.

Example
    Proc reg data = spaghettisauce;
        Model qprego = pprego / p r cli dwprob;
    run;

Callouts on the output: SSR, SSE, SST, R²
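The four quantities are linked by a standard identity. The block below assumes the usual textbook convention, which the slides do not spell out: SSR is the regression (model) sum of squares, SSE is the error (residual) sum of squares, SST is the total sum of squares, and the model includes an intercept.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
SST &= SSR + SSE, \\
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.
\end{aligned}
```

Under that convention, SSR, SSE, and SST correspond to the Model, Error, and Corrected Total rows of the analysis of variance table that PROC REG prints.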

Test of normality of residuals

Residual and predicted variables

Confidence limits of parameter estimates; square of partial correlation coefficients
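The output items called out on these slides (residual and predicted variables in a data set named datareg, a normality test of the residuals, confidence limits of the parameter estimates, and squared partial correlations) can be requested as sketched below. This is one plausible way to obtain them, not necessarily the presenter's program. Assumptions: the wage table from the cross-sectional slide stands in for the spaghetti sauce data, the variable names yhat and resid are hypothetical, the normality test is run with PROC UNIVARIATE, and PCORR2 (Type II) is used where the slides do not say which squared partial correlation is meant.

```sas
/* Illustrative data: the wage table from the slides (wage in thousands). */
data wages;
    input member $ age wage schooling;
    datalines;
John 40 100 14
Paul 34 110 17
Mary 28 75 10
Tom 30 130 16
Sara 37 50 15
;
run;

proc reg data=wages;
    /* CLB: confidence limits for the parameter estimates.           */
    /* PCORR2: squared partial correlation coefficients (Type II).   */
    model wage = schooling / clb pcorr2;
    /* Save predicted values (P=) and residuals (R=) in datareg. */
    output out=datareg p=yhat r=resid;
run;
quit;

/* The NORMAL option requests tests of normality for the residuals. */
proc univariate data=datareg normal;
    var resid;
run;
```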

Using SAS PROC REG for Multiple Linear Regression
The general syntax for PROC REG is
– PROC REG <options>; <statements>;
The most commonly used options are:
– DATA=datasetname  Specifies the dataset
– SIMPLE  Displays descriptive statistics
The most commonly used statements are:
– MODEL dependentvar = independentvars;  Specifies the variable to be predicted (dependentvar) and the variables that are the predictors (independentvars)

MODEL STATEMENT OPTIONS (Place after the slash following the list of explanatory variables.)
P     Requests a table containing predicted values from the model.
R     Requests that the residuals be analyzed.
CLI   Requests the 95 percent upper and lower confidence limits for an individual value of the dependent variable.
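The three options above can be combined in a single MODEL statement. The sketch below is an illustration under assumptions, not the example from the slides: it regresses wage on age and years of schooling using the wage table from the cross-sectional slide, entered under the hypothetical name wages. With only five observations the estimates are not meaningful; the point is the syntax.

```sas
/* Illustrative data: the wage table from the slides (wage in thousands). */
data wages;
    input member $ age wage schooling;
    datalines;
John 40 100 14
Paul 34 110 17
Mary 28 75 10
Tom 30 130 16
Sara 37 50 15
;
run;

/* SIMPLE prints descriptive statistics for the variables in the model. */
proc reg data=wages simple;
    /* Two explanatory variables make this a multiple regression.      */
    /* P: predicted values; R: residual analysis;                      */
    /* CLI: 95 percent limits for an individual value of wage.         */
    model wage = age schooling / p r cli;
run;
quit;
```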

Example

Transformation statements

Callouts on the output: square of partial correlation coefficients, SSR, SSE, SST, R²

R²