Distribution of X: (nknw096) data toluca; infile 'H:\CH01TA01.DAT'; input lotsize workhrs; seq=_n_; proc print data=toluca; run; Obslotsizeworkhrsseq 1803991.

Slides:



Advertisements
Similar presentations
I OWA S TATE U NIVERSITY Department of Animal Science Using Basic Graphical and Statistical Procedures (Chapter in the 8 Little SAS Book) Animal Science.
Advertisements

Statistical Techniques I EXST7005 Start here Measures of Dispersion.
A. The Basic Principle We consider the multivariate extension of multiple linear regression – modeling the relationship between m responses Y 1,…,Y m and.
Topic 9: Remedies.
Introduction to Predictive Modeling with Examples Nationwide Insurance Company, November 2 D. A. Dickey.
Shape of Normal Curves. 68%-95%-99.7% Rule Areas under Normal Curve.
STA305 week 31 Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for.
CS Example: General Linear Test (cs2.sas)
Analysis of Variance Compares means to determine if the population distributions are not similar Uses means and confidence intervals much like a t-test.
EPI 809/Spring Probability Distribution of Random Error.
Topic 3: Simple Linear Regression. Outline Simple linear regression model –Model parameters –Distribution of error terms Estimation of regression parameters.
Partial Regression Plots. Life Insurance Example: (nknw364.sas) Y = the amount of life insurance for the 18 managers (in $1000) X 1 = average annual income.
Some Terms Y =  o +  1 X Regression of Y on X Regress Y on X X called independent variable or predictor variable or covariate or factor Which factors.
Multiple regression analysis
Matrix A matrix is a rectangular array of elements arranged in rows and columns Dimension of a matrix is r x c  r = c  square matrix  r = 1  (row)
Reading – Linear Regression Le (Chapter 8 through 8.1.6) C &S (Chapter 5:F,G,H)
Descriptive Statistics In SAS Exploring Your Data.
Regression Diagnostics Using Residual Plots in SAS to Determine the Appropriateness of the Model.
1 Regression Analysis Regression used to estimate relationship between dependent variable (Y) and one or more independent variables (X). Consider the variable.
Example: Price Analysis for Diamond Rings in Singapore Variables Response Variable: Price in Singapore dollars (Y) Explanatory Variable: Weight of diamond.
EPI809/Spring Testing Individual Coefficients.
This Week Continue with linear regression Begin multiple regression –Le 8.2 –C & S 9:A-E Handout: Class examples and assignment 3.
Checking Regression Model Assumptions NBA 2013/14 Player Heights and Weights.
Week 3 Topic - Descriptive Procedures Program 3 in course notes Cody & Smith (Chapter 2)
1 Experimental Statistics - week 4 Chapter 8: 1-factor ANOVA models Using SAS.
Topic 2: An Example. Leaning Tower of Pisa Construction began in 1173 and by 1178 (2 nd floor), it began to sink Construction resumed in To compensate.
4-5 Inference on the Mean of a Population, Variance Unknown Hypothesis Testing on the Mean.
1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation.
HLTH 653 Lecture 2 Raul Cruz-Cano Spring Statistical analysis procedures Proc univariate Proc t test Proc corr Proc reg.
Topic 7: Analysis of Variance. Outline Partitioning sums of squares Breakdown degrees of freedom Expected mean squares (EMS) F test ANOVA table General.
Lecture 4 SIMPLE LINEAR REGRESSION.
Statistics Frequency and Distribution. We interrupt this lecture for the following… Significant digits You should not report numbers with more significant.
Topic 14: Inference in Multiple Regression. Outline Review multiple linear regression Inference of regression coefficients –Application to book example.
1 Experimental Statistics - week 10 Chapter 11: Linear Regression and Correlation Note: Homework Due Thursday.
6-3 Multiple Regression Estimation of Parameters in Multiple Regression.
Chapter 3: Diagnostics and Remedial Measures
Nonlinear Models. Learning Example: knnl533.sas Y = relative efficiency of production of a new product (1/expected cost) X 1 : Location A : X 1 = 1, B:
Topic 17: Interaction Models. Interaction Models With several explanatory variables, we need to consider the possibility that the effect of one variable.
6-1 Introduction To Empirical Models Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is.
Regression Analysis Week 8 DIAGNOSTIC AND REMEDIAL MEASURES Residuals The main purpose examining residuals Diagnostic for Residuals Test involving residuals.
4-1 Statistical Inference The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population.
1 Fourier Series (Spectral) Analysis. 2 Fourier Series Analysis Suitable for modelling seasonality and/or cyclicalness Identifying peaks and troughs.
Topic 6: Estimation and Prediction of Y h. Outline Estimation and inference of E(Y h ) Prediction of a new observation Construction of a confidence band.
Hormone Example: nknw892.sas Y = change in growth rate after treatment Factor A = gender (male, female) Factor B = bone development level (severely depressed,
Topic 23: Diagnostics and Remedies. Outline Diagnostics –residual checks ANOVA remedial measures.
Topic 25: Inference for Two-Way ANOVA. Outline Two-way ANOVA –Data, models, parameter estimates ANOVA table, EMS Analytical strategies Regression approach.
Bread Example: nknw817.sas Y = number of cases of bread sold (sales) Factor A = height of shelf display (bottom, middle, top) Factor B = width of shelf.
Topic 26: Analysis of Covariance. Outline One-way analysis of covariance –Data –Model –Inference –Diagnostics and rememdies Multifactor analysis of covariance.
Lecture 3 Topic - Descriptive Procedures Programs 3-4 LSB 4:1-4.4; 4:9:4:11; 8:1-8:5; 5:1-5.2.
ANOVA: Graphical. Cereal Example: nknw677.sas Y = number of cases of cereal sold (CASES) X = design of the cereal package (PKGDES) r = 4 (there were 4.
Simple Linear Regression. Data available : (X,Y) Goal : To predict the response Y. (i.e. to obtain the fitted response function f(X)) Least Squares Fitting.
1 Experimental Statistics - week 12 Chapter 12: Multiple Regression Chapter 13: Variable Selection Model Checking.
Linear Models Alan Lee Sample presentation for STATS 760.
2-1 Data Summary and Display Population Mean For a finite population with N measurements, the mean is The sample mean is a reasonable estimate of.
Sigmoidal Response (knnl558.sas). Programming Example: knnl565.sas Y = completion of a programming task (1 = yes, 0 = no) X 2 = amount of programming.
Today: March 7 Data Transformations Rank Tests for Non-Normal data Solutions for Assignment 4.
Example 1 (knnl917.sas) Y = months survival Treatment (3 levels)
© Willett, Harvard University Graduate School of Education, 1/28/2016S010Y/C09 – Slide 1 S010Y: Answering Questions with Quantitative Data Class 9/III.2:
Customize SAS Output Using ODS Joan Dong. The Output Delivery System (ODS) gives you greater flexibility in generating, storing, and reproducing SAS procedure.
Experimental Statistics - week 9
Interview Example: nknw964.sas Y = rating of a job applicant Factor A represents 5 different personnel officers (interviewers) n = 4.
1 Experimental Statistics - week 12 Chapter 11: Linear Regression and Correlation Chapter 12: Multiple Regression.
© 2012 W.H. Freeman and Company Lecture 2 – Aug 29.
This Week Review of estimation and hypothesis testing
Lesson 3 Overview Descriptive Procedures Controlling SAS Output
Multiple Linear Regression
Shape of Distributions
2-1 Data Summary and Display 2-1 Data Summary and Display.
Presentation transcript:

Distribution of X: (nknw096) data toluca; infile 'H:\CH01TA01.DAT'; input lotsize workhrs; seq=_n_; proc print data=toluca; run; Obslotsizeworkhrsseq

Distribution of X: Descriptive proc univariate data=toluca plot; var lotsize workhrs; run;

Distribution of X: Descriptive (1) Moments N25Sum Weights25 Mean70Sum Observations1750 Std Deviation Variance825 Skewness Kurtosis Uncorrected SS142300Corrected SS19800 Coeff Variation Std Error Mean Basic Statistical Measures LocationVariability Mean Std Deviation Median Variance Mode Range Interquartile Range

Distribution of X: Descriptive (2) Tests for Location: Mu0=0 TestStatisticp Value Student's tt Pr > |t|<.0001 SignM12.5Pr >= |M|<.0001 Signed RankS162.5Pr >= |S|<.0001 Quantiles (Definition 5) QuantileEstimateQuantileEstimate 100% Max1205%30 99%1201%20 95%1100% Min20 90%110 75% Q390 50% Median70 25% Q150 10%30

Distribution of X: Descriptive (3) Extreme Observations LowestHighest ValueObsValueObs

Distribution of X: Descriptive (4) Stem Leaf # Boxplot | | | | | *--+--* | | | | | Multiply Stem.Leaf by 10**+1

Distribution of X: Sequence plot title1 h=3 'Sequence plot for X with smooth curve'; symbol1 v=circle i=sm70; axis1 label=(h=2); axis2 label=(h=2 angle=90); proc gplot data=toluca; plot lotsize*seq/haxis=axis1 vaxis=axis2; run;

Distribution of X: QQPlot title1 'QQPlot (normal probability plot)'; proc univariate data=toluca noprint; qqplot lotsize workhrs / normal (L=1 mu=est sigma=est); run;

Quadratic: (nknw100quad.sas) title1 h=3 'Quadratic relationship'; data quad; do x=1 to 30; y=x*x-10*x+30+25*normal(0); output; end; proc reg data=quad; model y=x; output out=diagquad r=resid; run; Analysis of Variance SourceDFSum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square0.8480

Quadratic: Example (cont) symbol1 v=circle i=rl; axis1 label=(h=2); axis2 label=(h=2 angle=90); proc gplot data=quad; plot y*x/haxis=axis1 vaxis=axis2; run;

Quadratic: Example (cont) symbol1 v=circle i=sm60; proc gplot data=quad; plot y*x/haxis=axis1 vaxis=axis2; run;

Quadratic: Example (cont)

proc gplot data=diagquad; plot resid*x/ vref=0 haxis=axis1 vaxis=axis2; run;

Quadratic: Example (cont)

Heteroscediastic: (nknw100het.sas) title1 h=3 'Heteroscedastic'; axis1 label=(h=2); axis2 label=(h=2 angle=90); Data het; do x=1 to 100; y=100*x+30+10*x*normal(0); output; end; proc reg data=het; model y=x; run; Analysis of Variance SourceDFSum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square0.9700

Heteroscediastic: Example (cont) symbol1 v=circle i=sm60; proc gplot data=het; plot y*x/haxis=axis1 vaxis=axis2; run;

Heteroscediastic: Example (cont)

Outlier: Example1 (nknw100out.sas) title1 h=3 'Outlier at x=50'; axis1 label=(h=2); axis2 label=(h=2 angle=90); data outlier50; do x=1 to 100 by 5; y=30+50*x+200*normal(0); output; end; x=50; y=30+50* ; d='out'; output; proc print data=outlier50; run;

Outlier: Example1 (cont) Obsxyd out

Outlier: Example1 (cont) Code: Without outlier:With outlier: proc reg data=outlier50;proc reg data=outlier50; model y=x;model y=x; where d ne 'out'; Parameter Estimates (without outlier) VariableDFParameter Estimate Standard Error t ValuePr > |t| Intercept x <.0001 Root MSE R-Square Parameter Estimates (with outlier) VariableDFParameter Estimate Standard Error t ValuePr > |t| Intercept x Root MSE R-Square0.3052

Outlier: Example1 (cont) symbol1 v=circle i=rl; proc gplot data=outlier50; plot y*x/haxis=axis1 vaxis=axis2; run;

Outlier: Example2 (nknw100out.sas) title1 h=3 'Outlier at x=100'; data outlier100; do x=1 to 100 by 5; y=30+50*x+200*normal(0); output; end; x=100; y=30+50* ; d='out'; output; proc print data=outlier100; run;

Outlier: Example2 (cont) Code: Without outlier:With outlier: proc reg data=outlier100;proc reg data=outlier100; model y=x;model y=x; where d ne 'out'; Parameter Estimates (without outlier) VariableDFParameter Estimate Standard Error t ValuePr > |t| Intercept x <.0001 Root MSE R-Square Parameter Estimates (with outlier) VariableDFParameter Estimate Standard Error t ValuePr > |t| Intercept x Root MSE R-Square0.1276

Outlier: Example2 (cont) symbol1 v=circle i=rl; proc gplot data=outlier100; plot y*x/haxis=axis1 vaxis=axis2; run;

Toluca: Residual Plot (nknw106a.sas) title1 h=3 'Toluca Diagnostics'; data toluca; infile 'H:\My Documents\Stat 512\CH01TA01.DAT'; input lotsize workhrs; proc reg data=toluca; model workhrs=lotsize; output out=diag r=resid; run; symbol1 v=circle cv = red; axis1 label=(h=2); axis2 label=(h=2 angle=90); proc gplot data=diag; plot resid*lotsize/ vref=0 haxis=axis1 vaxis=axis2; run; quit;

Normality: Toluca (nknw106b.sas) title1 h=3 'Toluca Diagnostics'; data toluca; infile 'H:\My Documents\Stat 512\CH01TA01.DAT'; input lotsize workhrs; proc print data=toluca; run; proc reg data=toluca; model workhrs=lotsize; output out=diag r=resid; run; proc univariate data=diag plot normal; var resid; histogram resid / normal kernel; qqplot resid / normal (mu=est sigma=est); run;

Normality: Toluca (cont)

Normal: (nknw100norm.sas) %let mu = 0; %let sigma=10; title1 'Normal Distribution mu='&mu' sigma='σ data norm; do x=1 to 100; y=100*x+30+rand('normal',&mu,&sigma); output; end; proc reg data=norm; model y=x; output out=diagnorm r=resid; run; symbol1 v=circle i=none; proc univariate data=diagnorm plot normal; var resid; histogram resid / normal kernel; qqplot resid / normal (mu=est sigma=est); run;

Normal: (cont) Normal Distribution mu=0 sigma=10

Normality: failure (nknw100nnorm.sas) title1 'Right Skewed distribution'; data expo; do x=1 to 100; y=100*x+30+exp(2)*rand('exponential'); output; end; proc reg data=expo; model y=x; output out=diagexpo r=resid; run; symbol1 v=circle i=none; proc univariate data=diagexpo plot normal; var resid; histogram resid / normal kernel; qqplot resid / normal (mu=est sigma=est); run;

Normality: right skewed (cont)

Normality: left skewed (cont)

Normality: long tailed (cont)

Normality: short tailed (cont)

Normality: nongraphical proc univariate data=diagy normal; var resid; run; Toluca: Tests for Normality TestStatisticp Value Shapiro-WilkW Pr < W Kolmogorov-SmirnovD Pr > D> Cramer-von MisesW-Sq Pr > W-Sq> Anderson-DarlingA-Sq Pr > A-Sq>0.2500

Normality (nongraphical) cont. Toluca right skewedleft skewedlong tailedshort tailed TeststatP P P P P Shapiro-Wilk < < < <0.01 Kolmogorov- Smirnov 0.10> < < < Cramer- von Mises 0.03> < < < <0.01 Anderson- Darling 0.21> < < < <0.01

Normality (nongraphical): Right Skewed Tests for Normality TestStatisticp Value Shapiro-WilkW Pr < W< Kolmogorov-SmirnovD Pr > D< Cramer-von MisesW-Sq Pr > W-Sq< Anderson-DarlingA-Sq Pr > A-Sq<0.0050

Normality (nongraphical): Left Skewed Tests for Normality TestStatisticp Value Shapiro-WilkW Pr < W< Kolmogorov-SmirnovD Pr > D< Cramer-von MisesW-Sq Pr > W-Sq< Anderson-DarlingA-Sq Pr > A-Sq<0.0050

Normality (nongraphical): Long Tailed Tests for Normality TestStatisticp Value Shapiro-WilkW Pr < W< Kolmogorov-SmirnovD Pr > D< Cramer-von MisesW-Sq Pr > W-Sq< Anderson-DarlingA-Sq Pr > A-Sq<0.0050

Normality (nongraphical): Short Tailed Tests for Normality TestStatisticp Value Shapiro-WilkW Pr < W Kolmogorov-SmirnovD Pr > D Cramer-von MisesW-Sq Pr > W-Sq Anderson-DarlingA-Sq Pr > A-Sq<0.0050

Transformations (X)

Transformations (Y) Y’ = Y’ = log 10 Y Y’ = 1/Y Note: a simultaneous transformation on X may also be helpful or necessary.

Equations for Box-Cox Procedure where

Box-Cox: Plasma (boxcox.sas) Y = Plasma level of polyamine X = Age of healthy children n = 25

Box-Cox: Example (Input) data orig; input age plasma cards; ; proc print data=orig; run; Obsageplasma

Box-Cox: Example (Y vs. X) title1 h=3'Original Variables'; axis1 label=(h=2); axis2 label=(h=3 angle=90); symbol1 v=circle i=rl; proc gplot data=orig; plot plasma*age/haxis=axis1 vaxis=axis2; run;

Box-Cox: Example (regression) proc reg data=orig; model plasma=age; output out = notrans r = resid; run; Analysis of Variance SourceDFSum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square0.7532

Box-Cox: Example (resid vs. X) symbol1 i=sm70; proc gplot data = notrans; plot resid*age / vref = 0 haxis=axis1 vaxis=axis2;

Box-Cox: Example (QQPlot) proc univariate data=notrans noprint; var resid; histogram resid/normal kernel; qqplot resid/normal (mu = est sigma=est); run;

Box-Cox: Example (find transformation) proc transreg data = orig; model boxcox(plasma)=identity(age); run;

Box-Cox: Example (calc transformation) title1 'Transformed Variables'; data trans; set orig; logplasma = log(plasma); rsplasma = plasma**(-0.5); proc print data = trans; run;

Box-Cox: Log (Y vs. X) symbol1 i=rl; proc gplot data = logtrans; plot logplasma * age/haxis=axis1 vaxis=axis2; run;

Box-Cox: Log (regression) proc reg data = trans; model logplasma = age; output out = logtrans r = logresid; run; Analysis of Variance SourceDFSum of Squares Mean Square F ValuePr > F Model <.0001 Error Corrected Total Root MSE R-Square0.8535

Box-Cox: Log(resid vs. X) symbol1 i=sm70; proc gplot data = logtrans; plot logresid * age / vref = 0 haxis=axis1 vaxis=axis2;

Box-Cox: Log(QQPlot) proc univariate data=logtrans noprint; var logresid; histogram logresid/normal kernel; qqplot logresid/normal (L=1 mu = est sigma = est); run;

Box-Cox: Log(QQPlot (cont))

Box-Cox: Reciprocal Sq. Rt. (Y vs. X) title1 h=3 'Reciprocal Square Root Transformation'; symbol1 i=rl; proc gplot data = trans; plot rsplasma * age/haxis=axis1 vaxis=axis2; run;

Box-Cox: Reciprocal Sq. Rt. (regression) proc reg data = trans; model rsplasma = age; output out = rstrans r = rsresid; run; Analysis of Variance SourceDFSum of Squares Mean Square F Value Pr > F Model <.0001 Error Corrected Total Root MSE R-Square0.8665

Box-Cox: Reciprocal Sq. Rt. (resid vs. X) symbol1 i=sm70; proc gplot data = rstrans; plot rsresid * age / vref = 0 haxis=axis1 vaxis=axis2;

Box-Cox: Reciprocal Sq. Rt. (QQPlot) proc univariate data=rstrans noprint; var rsresid; histogram rsresid/normal kernel; qqplot/normal (L=1 mu = est sigma = est); run;

Box-Cox: Reciprocal Sq. Rt. (QQPlot, cont)

Calculation of t c : (knnl155.sas) data tcrit; alpha = 0.05; n = 25; g = 2; percentile = 1 - alpha/g/2; df = n - 2; tcrit = tinv(percentile,df); run; proc print data=tcrit; run; Obsalphangpercentiledftcrit

Calculation of S: (knnl155.sas) data Scheffe; alpha = 0.05; n = 25; g = 2; percentile = 1 - alpha; dfn = g; dfd = n - 2; S = sqrt(2*Finv(percentile,dfn,dfd)); proc print data=Scheffe; run; ObsalphangpercentiledfndfdS