Haas MFE SAS Workshop Lecture 3: Peng Liu Haas School.


Haas MFE SAS Workshop Lecture 3: Peng Liu Haas School of Business, Berkeley, MFE 2006

Commonly used PROCedures in Financial Economics Peng Liu Haas School of Business, Berkeley, MFE 2006

Haas School of Business, Berkeley, MFE 2006. Peng Liu and Alexander Vedrashko.

Basic Statistical Analysis

Univariate statistics: PROC MEANS; PROC UNIVARIATE; PROC FREQ
Bivariate and multivariate statistics: PROC CORR; PROC NPAR1WAY; PROC TTEST

Comparison of PROC MEANS and PROC UNIVARIATE

PROC MEANS
    Descriptive statistics: CLM CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS RANGE SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR
    Quantile statistics: MEDIAN|P50 Q1|P25 Q3|P75 P1 P5 P10 P90 P95 P99 QRANGE
    Hypothesis testing: PROBT T

PROC UNIVARIATE
    Descriptive statistics: CSS CV KURTOSIS MAX MEAN MIN MODE N NMISS RANGE SKEWNESS STD STDMEAN SUM SUMWGT USS VAR
    Quantile statistics: MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3 QRANGE
    Hypothesis testing: NORMAL PROBN MSIGN PROBM SIGNRANK PROBS T PROBT
    Robust statistics: GINI MAD QN SN STD_GINI STD_MAD STD_QN STD_QRANGE STD_SN

PROC MEANS

    PROC MEANS DATA=mfe.loan;
      VAR appraisal ltv;
      CLASS state;
    RUN;

By default, PROC MEANS prints the variable label, N, Mean, Std Dev, Minimum, and Maximum.
MEDIAN, MIN, MAX, CLM, and ALPHA=0.05 are examples of options you can specify on the PROC MEANS statement.
You can get summary statistics for many variables in one step.
The CLASS statement produces summary statistics for each group.
The NOPRINT option suppresses the printed output.
The OUTPUT statement saves the results in a SAS data set that you name.

    PROC MEANS DATA=mfe.loan MAX MIN;
      VAR appraisal ltv;
      OUTPUT OUT=m MAX=maxvalue maxltv MIN=minvalue minltv;
    RUN;
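As a sketch of how the CLASS, NOPRINT, and OUTPUT features above combine, the following writes one row of statistics per state to a data set instead of printing a report. The output data set name `work.state_stats`, the output variable names, and the use of NWAY are illustrative choices, not taken from the slides.

```sas
/* Per-state summary statistics saved to a data set (names are illustrative). */
proc means data=mfe.loan noprint nway;
  class state;                          /* one output row per state */
  var appraisal ltv;
  output out=work.state_stats
         mean=mean_appraisal mean_ltv   /* names follow the order of the VAR list */
         median=med_appraisal med_ltv;
run;

/* NOPRINT suppressed the report, so print the saved data set to inspect it. */
proc print data=work.state_stats;
run;
```

Without NWAY, the output data set would also contain an overall row (with `_TYPE_=0`) in addition to the per-state rows.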

PROC UNIVARIATE

    PROC UNIVARIATE DATA=mfe.loan;
      VAR ltv;
      ID id;
    RUN;

Use VAR to specify the variables you want to analyze; otherwise the procedure analyzes all numeric variables.
Use ID to label extreme observations; without an ID statement they are identified by observation number.
The procedure can plot histograms, quantile-quantile plots, etc.
It can also perform a two-sided t test, etc.

    PROC UNIVARIATE DATA=mfe.loan;
      VAR ltv;
      HISTOGRAM;
      QQPLOT / NORMAL;
    RUN;
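The two-sided t test mentioned above compares the sample mean with a hypothesized value, which PROC UNIVARIATE takes from the MU0= option. A minimal sketch follows; the value 0.8 is a hypothetical null mean and assumes ltv is stored as a fraction rather than a percentage.

```sas
/* Test H0: mean(ltv) = 0.8 against a two-sided alternative.   */
/* The null value 0.8 is an assumption made for illustration.  */
proc univariate data=mfe.loan mu0=0.8;
  var ltv;
run;
```

The "Tests for Location" table in the output reports the Student's t statistic next to the sign and signed rank tests for the same null value.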

PROC FREQ

    PROC FREQ DATA=mfe.loan;
      TABLE term;
    RUN;

Frequency tables can be one-way or two-way.
The /CHISQ or /BINOMIAL option can be used to test for equal proportions.
One TABLE statement can produce more than one frequency table.
The NOCOL and NOROW options suppress column and row percentages.

    PROC FREQ DATA=mfe.loan;
      TABLE state state*term / NOCOL NOROW;
    RUN;
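A short sketch of the CHISQ option on both kinds of table, using the same variables as the slide. Which variables are worth testing is an assumption made for illustration.

```sas
proc freq data=mfe.loan;
  /* One-way table: chi-square test that all levels of term are equally likely. */
  tables term / chisq;
  /* Two-way table: chi-square test of association between state and term,      */
  /* with column and row percentages suppressed.                                */
  tables state*term / chisq nocol norow;
run;
```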

PROC CORR

    PROC CORR DATA=mfe.loan;
      VAR rate ltv fico_orig;
    RUN;

The CORR procedure computes Pearson correlation coefficients, three nonparametric measures of association (Spearman rank-order correlation, Kendall's tau-b, and Hoeffding's measure of dependence, D), and the probabilities associated with these statistics for numeric variables.
The default is the Pearson correlation.
The COV option requests the computation of covariances.

    PROC CORR DATA=mfe.loan COV SPEARMAN;
      VAR rate ltv fico_orig;
    RUN;
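To see all four measures of association side by side, each one can be requested by name. In this sketch PEARSON is listed explicitly because it is only the default when no other correlation type is requested.

```sas
/* Request every measure of association listed on the slide. */
proc corr data=mfe.loan pearson spearman kendall hoeffding;
  var rate ltv fico_orig;
run;
```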

PROC TTEST

    DATA;
      INPUT a b @@;
      DATALINES;
    ;
    RUN;

If the user does not specify a data set name, the DATA step creates an automatically named data set.
The trailing @@ in INPUT lets SAS keep reading observations from the same data line.
DATALINES; is a SAS statement followed by lines of raw data. Values are typed one after another, separated by blanks, and you can break them across lines any way you like.
The closing ; must stand on a line by itself.
If the user does not specify a data set name, a PROC step runs the procedure on the most recently created data set.

Paired t test:

    PROC TTEST;
      PAIRED a*b;
    RUN;
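The raw data lines of the slide's example are not available, so here is a self-contained sketch of the same pattern. The six pairs of values and the data set name `work.pairs` are made up purely for illustration.

```sas
/* Hypothetical paired measurements: each pair is (a, b). */
data work.pairs;
  input a b @@;        /* @@ holds the line so several pairs can share it */
  datalines;
10.1 9.8  11.4 10.9  9.7 9.9  12.0 11.2
10.8 10.1  11.1 10.7
;
run;

/* Paired t test of H0: mean(a - b) = 0. */
proc ttest data=work.pairs;
  paired a*b;
run;
```

Naming the data set and passing it with DATA= avoids relying on the "most recently created data set" rule, which is easy to break when steps are reordered.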

PROC NPAR1WAY

    PROC NPAR1WAY DATA=mfe.loan;
      CLASS state;
      VAR ltv;
    RUN;

Nonparametric tests for differences across a one-way classification.
If the normality assumption does not hold, we may use nonparametric tests.
PROC NPAR1WAY performs nonparametric tests for location and scale differences across a one-way classification, based on the following scores: Wilcoxon, median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, and Mood.
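A sketch of restricting the analysis to particular scores by naming them on the PROC statement. The choice of WILCOXON and MEDIAN here is illustrative.

```sas
/* Only the Wilcoxon and median score analyses are requested. */
proc npar1way data=mfe.loan wilcoxon median;
  class state;
  var ltv;
run;
```

When the CLASS variable has more than two levels, as state presumably does, the Wilcoxon analysis is reported as a Kruskal-Wallis test.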

Financial Econometrics using SAS

Linear models (OLS, GLS, and their variants): PROC REG; PROC GLM (skipped)
Logistic regression: PROC LOGISTIC; PROC GENMOD
Hazard regression (Cox proportional hazards): PROC PHREG

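Linear Model: Theory

Data: $(y_i, x_i)$ with $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$ and $y_i \in \mathbb{R}$, for $i = 1, \ldots, n$.
Model: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$ for $i = 1, \ldots, n$.
For short: $y = X\beta + \varepsilon$, where $y$ and $\varepsilon$ are $n$-vectors, $X$ is the $n \times (k+1)$ matrix whose $i$-th row is $(1, x_{i1}, \ldots, x_{ik})$, and $\beta = (\beta_0, \beta_1, \ldots, \beta_k)^T$.
Assumption: the $\varepsilon_i$ are i.i.d. normal, $N(0, \sigma^2)$.
Ordinary least squares estimate: $\hat{\beta} = (X^T X)^{-1} X^T y$.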
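The slide states the least squares estimator without derivation. A short sketch of where it comes from, assuming $X^T X$ is invertible (an assumption the slide does not state): minimize the residual sum of squares $S(\beta)$ and solve the first-order condition.

```latex
\begin{aligned}
S(\beta) &= (y - X\beta)^T (y - X\beta), \\
\frac{\partial S}{\partial \beta} &= -2\, X^T (y - X\beta) = 0
  \;\Longrightarrow\; X^T X \hat{\beta} = X^T y
  \;\Longrightarrow\; \hat{\beta} = (X^T X)^{-1} X^T y .
\end{aligned}
```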

PROC REG

PROC REG is a SAS procedure for simple or multiple linear regression models with continuous dependent variables. It is part of SAS/STAT.
Model fitting (parameter estimates, residuals, confidence limits, influence statistics, etc.)
Model selection (forward, backward, stepwise, etc.)
Hypothesis testing
Model diagnostics
Plotting
Outputting estimates and statistics

PROC REG: Examples

    PROC REG DATA=mfe.loan;
      MODEL ltv = rate;
      PLOT ltv * rate;
    QUIT;

Begin with PROC REG; end with QUIT;.
Multiple independent or dependent variables are separated by spaces.
The label "OLS" is optional; labels are useful when one PROC REG contains several MODEL statements.
A constant is included by default.
Use options after a slash (/) to request additional statistics or to specify a model selection method.
PLOT creates a scatter plot of your regression data and automatically adds the regression line.

    MODEL ltv = rate fico_orig;
    OLS: MODEL ltv term = rate fico_orig;
    MODEL ltv = rate fico_orig term / SELECTION=F;
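A sketch that combines a labeled MODEL statement, slash options, and an OUTPUT statement. The options CLB (confidence limits for the estimates) and VIF (variance inflation factors), the output data set name, and the output variable names are choices made for illustration, not taken from the slides.

```sas
proc reg data=mfe.loan;
  /* Labeled model with two extra statistics requested after the slash. */
  ols: model ltv = rate fico_orig / clb vif;
  /* Save fitted values and residuals for later diagnostics. */
  output out=work.reg_out p=ltv_hat r=ltv_resid;
run;
quit;
```

The RUN; statement executes the model while leaving the procedure open; QUIT; then ends it.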

Logistic Regression: Theory

Data: $(y_i, x_i)$ with $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$, for $i = 1, \ldots, n$, where $y_i$ is a binary or ordinal response variable, e.g. $y_i \in \{0, 1\}$.
Model: the logit of the response probability is linear in $x_i$.
Estimation: maximum likelihood estimate of $\beta$.
Assumption: binomial variation.
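The model formula itself did not survive in the transcript. For reference, the standard binary logit model and its log-likelihood are written out below, using the same $x_{ij}$ and $\beta_j$ notation as the linear model slide. This is the textbook form, assumed here rather than recovered from the slide.

```latex
\begin{aligned}
p_i &= \Pr(y_i = 1 \mid x_i), \\
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \\
\ell(\beta) &= \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\bigr].
\end{aligned}
```

The maximum likelihood estimate is the $\beta$ that maximizes $\ell(\beta)$; it has no closed form and is found iteratively.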

Logistic Regression: SAS procedure

SAS has several procedures that perform logistic regression, e.g. GENMOD, CATMOD, and LOGISTIC.

PROC LOGISTIC
Works for binary or ordinal response variables
Performs MLE using different optimization algorithms
Offers 4 model selection methods: forward, backward, stepwise, and score
Outputs statistics to a data set
Tests linear hypotheses about the parameters

PROC LOGISTIC: Examples

    PROC LOGISTIC DATA=mfe.loan;
      CLASS state edu;
      MODEL default = ltv age edu term rate state / LINK=LOGIT;
    RUN;

Begin with PROC LOGISTIC; end with RUN;.
The /LINK=LOGIT option can be omitted because LOGIT is the default; other options: PROBIT, CLOGIT, CLOGLOG.
Use the CLASS statement to avoid creating dummy variables in a DATA step.
Options after a slash (/) can be used to request additional statistics or to specify a selection method.
The TEST statement tests linear hypotheses about the parameters.
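A sketch of the TEST statement together with an explicit choice of the modeled event. It assumes `default` is coded 0/1; the label `rates` and the reduced covariate list are illustrative.

```sas
proc logistic data=mfe.loan;
  class state;
  /* By default PROC LOGISTIC models the lower ordered response value, */
  /* so EVENT='1' is used to model the probability that default = 1.   */
  model default(event='1') = ltv rate term state;
  /* Joint Wald test of H0: the ltv and rate coefficients are both zero. */
  rates: test ltv = 0, rate = 0;
run;
```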

Survival Analysis: Background 1

Survival Analysis: Background 2

Cox Proportional Hazard Regression
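The body of this slide is not preserved in the transcript. For orientation, the standard Cox proportional hazards model and its partial likelihood (without ties) are given below. Here $h_0(t)$ is an unspecified baseline hazard, $\delta_i = 1$ marks an observed event, and $R(t_i)$ is the set of subjects still at risk at time $t_i$; this notation is an assumption, not the authors'.

```latex
h(t \mid x_i) = h_0(t)\, \exp\!\left( \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \right),
\qquad
L(\beta) = \prod_{i:\, \delta_i = 1}
  \frac{\exp\!\left( x_i^T \beta \right)}
       {\sum_{j \in R(t_i)} \exp\!\left( x_j^T \beta \right)} .
```

There is no intercept in $x_i^T \beta$ because any constant is absorbed into $h_0(t)$.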

PROC PHREG: Example

    PROC PHREG DATA=mfe.loan;
      MODEL loanage*prepay(0) = age edu race rate ltv fico_orig state;
    RUN;

Use a WHERE statement to restrict the regression to a subsample.
You can define or group variables inside PHREG, after the MODEL statement, using IF-THEN/ELSE statements.
Handling tied data: /TIES=EXACT; another option is DISCRETE.
To run PHREG separately for different groups, use a BY statement; the data must be sorted first.
Use the CLASS statement to create dummy variables.
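A sketch of the BY-group and IF-THEN/ELSE points above. The sorted data set name, the derived covariate `high_ltv`, and the 0.8 cutoff (which assumes ltv is stored as a fraction) are hypothetical.

```sas
/* BY-group processing requires the data to be sorted by the BY variable. */
proc sort data=mfe.loan out=work.loan_sorted;
  by state;
run;

proc phreg data=work.loan_sorted;
  /* prepay = 0 marks censored observations, as in the slide's example. */
  model loanage*prepay(0) = rate fico_orig high_ltv / ties=exact;
  /* Programming statements after MODEL define a covariate inside PHREG. */
  if ltv > 0.8 then high_ltv = 1;
  else high_ltv = 0;
  by state;                     /* one Cox regression per state */
run;
```

TIES=EXACT can be slow when many loans share the same event time, so it is worth checking run time on a subsample first.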