Today: March 7 Data Transformations Rank Tests for Non-Normal data Solutions for Assignment 4.



Transformations
ANOVA and Regression
Common assumptions: normality, constant variance, linear relationship
What if these aren’t true?
–One method: transform your data to help meet the necessary assumptions (choose a different scale of measurement)

Transformations
Common transformations: natural log (base e), log base 10, square root, inverse (for example, miles per hour to hours per mile)
Steps:
–Choose your transformation
–Re-check assumptions (residual plot)
–Perform inference on transformed data
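
The steps above can be sketched in SAS as follows. This is a minimal illustration, assuming the TOMHS data set and the `tomhs` library used later in these slides; the data set names `trans` and `diag` and the new variable names are hypothetical.

```sas
/* Step 1: create the common transformations of one variable */
DATA trans;
  SET tomhs.bpstudy;
  IF trig12 > 0 THEN DO;        /* logs and the inverse need positive values */
    ln_trig   = LOG(trig12);    /* natural log (base e) */
    l10_trig  = LOG10(trig12);  /* log base 10 */
    sqrt_trig = SQRT(trig12);   /* square root */
    inv_trig  = 1 / trig12;     /* inverse */
  END;
RUN;

/* Step 2: re-check assumptions with a residual plot on the new scale */
PROC GLM DATA=trans;
  CLASS group;
  MODEL l10_trig = group;
  OUTPUT OUT=diag P=pred R=resid;  /* save predicted values and residuals */
RUN;
QUIT;

PROC PLOT DATA=diag;
  PLOT resid*pred;                 /* look for roughly constant spread */
RUN;
QUIT;
```

Step 3 (inference) then uses the PROC GLM output for the transformed variable rather than the original one.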

PROC CONTENTS OUTPUT
The CONTENTS Procedure
Data Set Name: TOMHS.BPSTUDY
Member Type: DATA
Engine: V8
Created: 9:07 Saturday, February 26, 2005
Last Modified: 9:07 Saturday, February 26, 2005
Observations: 902
Variables: 16
Indexes: 0
Observation Length: 128
Deleted Observations:
-----Alphabetic List of Variables and Attributes-----
Variable   Type
AGE        Num
CHOL12     Num
GROUP      Num
HDL12      Num
PULSE12    Num
PULSEBL    Num
SBP12      Num
SBPBL      Num
SEX        Num
TRIG12     Num
WT12       Num
WTBL       Num
cholbl     Num
hdlbl      Num
id         Char
trigbl     Num
Triglycerides distributions are typically skewed
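
A listing like the one above is presumably produced by a call along these lines. The slide does not show the call itself, so this is an assumption; the `LIBNAME` path is the one given later in the slides.

```sas
LIBNAME tomhs 'C:\my documents\ph5415\';  /* path as shown on a later slide */

/* List the variables and attributes of the permanent data set */
PROC CONTENTS DATA=tomhs.bpstudy;
RUN;
```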

The UNIVARIATE Procedure
Variable: TRIG12
[Histogram and boxplot of TRIG12]

The UNIVARIATE Procedure
Variable: TRIG12
[Normal probability plot of TRIG12]

The UNIVARIATE Procedure
Variable: TRIG12
Moments
N                  472     Sum Weights          472
Mean                       Sum Observations
Std Deviation              Variance
Skewness                   Kurtosis
Uncorrected SS             Corrected SS
Coeff Variation            Std Error Mean
Basic Statistical Measures
Location                   Variability
Mean                       Std Deviation
Median                     Variance             4217
Mode                       Range
                           Interquartile Range
Tests for Normality
Test                  Statistic     p Value
Shapiro-Wilk          W             Pr < W     <
Kolmogorov-Smirnov    D             Pr > D     <0.0100
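
Output of this kind (moments, histogram, boxplot, normal probability plot, and tests for normality) can be requested as sketched below. The exact call used for the slides is not shown, so the options here are an assumption: `NORMAL` asks for the normality tests and `PLOT` for the text plots.

```sas
/* Descriptive statistics, text plots, and normality tests for TRIG12 */
PROC UNIVARIATE DATA=tomhs.bpstudy NORMAL PLOT;
  VAR trig12;
RUN;
```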

Taking LOG Transformation – Base 10
X → log10(X)
Takes small values of X and spreads them out, and takes large values of X and brings them closer together.
DATA temp;
  SET tomhs.bpstudy;
  logtrig12 = log10(trig12);
  logtrig12x = log(trig12);  * Natural log;
RUN;
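
A small illustration of the "spreads out small values, pulls in large values" behavior, using made-up values of X rather than the study data:

```sas
/* Illustrative values only: print X and log10(X) to the SAS log */
DATA _NULL_;
  DO x = 10, 20, 100, 200, 1000, 2000;
    lx = LOG10(x);
    PUT x= lx= 6.3;
  END;
RUN;
```

On the original scale, 10 to 20 is a gap of 10 and 1000 to 2000 is a gap of 1000. On the log10 scale both gaps equal log10(2), about 0.301, because each is a doubling.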

The UNIVARIATE Procedure
Variable: logtrig12
[Histogram and boxplot of logtrig12]

The UNIVARIATE Procedure
[Normal probability plot of logtrig12]

The UNIVARIATE Procedure
Variable: logtrig12
Moments
N                  472     Sum Weights          472
Mean                       Sum Observations
Std Deviation              Variance
Skewness                   Kurtosis
Uncorrected SS             Corrected SS
Coeff Variation            Std Error Mean
Basic Statistical Measures
Location                   Variability
Mean                       Std Deviation
Median                     Variance
Mode                       Range
                           Interquartile Range
Tests for Normality
Test                  Statistic     p Value
Shapiro-Wilk          W             Pr < W
Kolmogorov-Smirnov    D             Pr > D     >0.1500
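
To compare the raw and log-transformed variables side by side, the output can be limited to the moments and normality tests. This is a sketch, assuming the `temp` data set created in the DATA step on the earlier slide.

```sas
/* Keep only the Moments and Tests for Normality tables */
ODS SELECT Moments TestsForNormality;
PROC UNIVARIATE DATA=temp NORMAL;
  VAR trig12 logtrig12;
RUN;
```

The skewness values and the normality-test p-values for the two variables can then be read off together.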

TOMHS Study
6 Treatment groups (Variable GROUP)
–Beta-blocker
–Calcium channel blocker
–Diuretic
–Alpha-blocker
–ACE inhibitor
–Placebo
All treatment groups were given a lifestyle intervention to lower BP

TOMHS Triglyceride Analyses
3 Treatment groups (Variable GROUP)
–Beta-blocker
–Diuretic
–Placebo
Beta-blockers may increase triglycerides

LIBNAME tomhs 'C:\my documents\ph5415\';
DATA temp;
  SET tomhs.bpstudy;
  logtrig12 = log10(trig12);
  logtrig12x = log(trig12);
  if group in(1,3,6);  * Select only groups 1, 3, and 6;
RUN;
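
A quick check that the subsetting IF kept only the intended groups might look like this. The group labels are an assumption: they follow the order in which the six treatments are listed on the TOMHS Study slide (1 = beta-blocker, 3 = diuretic, 6 = placebo), and the format name `grpfmt` is hypothetical.

```sas
/* Assumed coding, inferred from the order of the treatment list */
PROC FORMAT;
  VALUE grpfmt 1 = 'Beta-blocker'
               3 = 'Diuretic'
               6 = 'Placebo';
RUN;

/* Count the observations kept in each group */
PROC FREQ DATA=temp;
  TABLES group;
  FORMAT group grpfmt.;
RUN;
```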

PROC GLM;
  CLASS group;
  MODEL trig12 logtrig12 logtrig12x = group;
Dependent Variable: TRIG12
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total
Dependent Variable: logtrig12 (Analyses Using LOG Scale – Base 10)
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total

PROC GLM;
  CLASS group;
  MODEL trig12 logtrig12 logtrig12x = group;
Dependent Variable: logtrig12 (Analyses Using LOG Scale – Base 10)
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total
Dependent Variable: logtrig12x (Analyses Using LOG Scale - Base e)
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total
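
A short note on why the choice of base does not affect the test. The two log variables differ only by a constant factor, so the F statistic and its p-value are the same on either scale. The symbol c is introduced here for illustration.

```latex
\ln x = c \,\log_{10} x, \qquad c = \ln 10 \approx 2.3026
% Every sum of squares is multiplied by c^2 (about 5.3019), so the ratio is unchanged:
F_{\ln} = \frac{c^{2}\,\mathrm{MS}_{\text{Model}}}{c^{2}\,\mathrm{MS}_{\text{Error}}}
        = \frac{\mathrm{MS}_{\text{Model}}}{\mathrm{MS}_{\text{Error}}} = F_{\log_{10}}
```

Only the estimates and standard errors change (by the factor c), which is why the natural log scale is the convenient one for percent interpretations.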

PROC GLM;
  CLASS group;
  MODEL trig12 logtrig12 logtrig12x = group;
  MEANS group;
  ESTIMATE 'BB vs Diur' group ;
  ESTIMATE 'BB vs Plac' group ;
The GLM Procedure
Level of          TRIG12              logtrig12            logtrig12x
GROUP      N      Mean    Std Dev     Mean    Std Dev      Mean    Std Dev
Note: SDs are much closer between groups in log scale

PROC GLM;
  CLASS group;
  MODEL trig12 logtrig12 logtrig12x = group;
  MEANS group;
  ESTIMATE 'BB vs Diur' group ;
  ESTIMATE 'BB vs Plac' group ;
Dependent Variable: TRIG12
Parameter      Estimate    Standard Error    t Value    Pr > |t|
BB vs Diur
BB vs Plac
Dependent Variable: logtrig12
Parameter      Estimate    Standard Error    t Value    Pr > |t|
BB vs Diur
BB vs Plac
Dependent Variable: logtrig12x
Parameter      Estimate    Standard Error    t Value    Pr > |t|
BB vs Diur
BB vs Plac
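
The contrast coefficients on the ESTIMATE statements did not survive in the transcript. A plausible complete version is sketched below. It assumes the three retained class levels sort as 1, 3, 6 and correspond to beta-blocker, diuretic, and placebo; these coefficients are an illustration, not the values confirmed by the slides.

```sas
PROC GLM DATA=temp;
  CLASS group;                          /* levels in sorted order: 1, 3, 6 */
  MODEL trig12 logtrig12 logtrig12x = group;
  MEANS group;
  ESTIMATE 'BB vs Diur' group 1 -1  0;  /* group 1 minus group 3 (assumed) */
  ESTIMATE 'BB vs Plac' group 1  0 -1;  /* group 1 minus group 6 (assumed) */
RUN;
QUIT;
```

Each ESTIMATE statement is applied to every dependent variable in the MODEL statement, which gives the three sets of estimates shown on the slide.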

Interpretation of Differences Using Natural Log Scale (Base e)
Parameter      Estimate    Standard Error    t Value    Pr > |t|
BB vs Diur
BB vs Plac
The BB vs Diur estimate indicates that BB increases triglycerides by approximately 7.45% compared to diuretic.
More precise estimate is 100*(exp(0.074) – 1) = 7.7%
The BB vs Plac estimate indicates that BB increases triglycerides by approximately 15.2% compared to placebo.
More precise estimate is 100*(exp(0.152) – 1) = 16.4%
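
The reasoning behind both the approximate and the more precise percentages can be written out as follows. The notation (the difference in mean natural logs, and GM for a geometric mean) is introduced here for illustration.

```latex
\hat{\Delta} = \overline{\ln Y}_{\text{BB}} - \overline{\ln Y}_{\text{comparison}}
             = \ln\!\left(\frac{\mathrm{GM}_{\text{BB}}}{\mathrm{GM}_{\text{comparison}}}\right)
\quad\Longrightarrow\quad
\text{percent difference} = 100\,\bigl(e^{\hat{\Delta}} - 1\bigr)

e^{\hat{\Delta}} - 1 = \hat{\Delta} + \frac{\hat{\Delta}^{2}}{2} + \cdots \approx \hat{\Delta}
\quad \text{when } \hat{\Delta} \text{ is small}

\hat{\Delta} = 0.152: \qquad 100\,\bigl(e^{0.152} - 1\bigr) \approx 16.4\%
\quad \text{versus the approximation } 15.2\%
```

Strictly, the percentage compares geometric means of triglycerides, since the mean of the logs back-transforms to a geometric mean rather than an arithmetic mean.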

USING WILCOXON RANK TEST
Each observation is given a score (its rank) from 1 to n. The analysis is done on these ranked values.
PROC NPAR1WAY WILCOXON;
  CLASS group;
  VAR trig12;
RUN;
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable TRIG12
Classified by Variable GROUP
GROUP    N    Sum of Scores    Expected Under H0    Std Dev Under H0    Mean Score
Average scores were used for ties.
Kruskal-Wallis Test
Chi-Square
DF                 2
Pr > Chi-Square
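
To see what "scores" means here, the ranks can be computed explicitly and summarized by group. This is a sketch, assuming the `temp` data set from the earlier DATA step; the names `ranked` and `rank_trig12` are hypothetical.

```sas
/* Rank TRIG12 over all retained observations; ties get the average rank */
PROC RANK DATA=temp OUT=ranked TIES=MEAN;
  VAR trig12;
  RANKS rank_trig12;
RUN;

/* Per-group count, rank sum, and mean rank */
PROC MEANS DATA=ranked N SUM MEAN;
  CLASS group;
  VAR rank_trig12;
RUN;
```

The per-group sums and means of the ranks should correspond to the "Sum of Scores" and "Mean Score" columns reported by PROC NPAR1WAY.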