Today: March 7
–Data transformations
–Rank tests for non-normal data
–Solutions for Assignment 4
Transformations
ANOVA and regression share common assumptions: normality (of the residuals), constant variance, and a linear relationship.
What if these aren’t true?
–One method: transform your data to help meet the necessary assumptions (i.e., choose a different scale of measurement).
Transformations
Common transformations: natural log (log e), log base 10, square root, inverse (e.g., miles per hour becomes hours per mile)
Steps:
–Choose your transformation
–Re-check the assumptions (e.g., with a residual plot)
–Perform inference on the transformed data
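To make the four transformations concrete, here is a minimal Python sketch (Python is used only for illustration alongside the SAS code; the function name `transform_all` is hypothetical). The inverse is the transformation behind the speed example: miles per hour becomes hours per mile.

```python
import math

def transform_all(x):
    """Apply the four common transformations to a single positive value."""
    if x <= 0:
        # log and inverse need x > 0 (square root alone would also allow 0)
        raise ValueError("x must be positive")
    return {
        "ln": math.log(x),        # natural log (base e)
        "log10": math.log10(x),   # log base 10
        "sqrt": math.sqrt(x),     # square root
        "inverse": 1.0 / x,       # inverse (reciprocal)
    }

# Inverse transformation: a speed of 60 miles/hour is 1/60 hours per mile
speed_mph = 60.0
hours_per_mile = transform_all(speed_mph)["inverse"]
print(hours_per_mile, "hours per mile =", hours_per_mile * 60, "minutes per mile")
```

Note that the inverse reverses the order of the values (faster speeds give smaller hours per mile), so interpretation changes direction on that scale.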
PROC CONTENTS output (The CONTENTS Procedure)
Data Set Name: TOMHS.BPSTUDY; Member Type: DATA; Engine: V8; Observations: 902; Variables: 16; Indexes: 0; Observation Length: 128
Created and Last Modified: 9:07 Saturday, February 26, 2005
Alphabetic list of variables: AGE, CHOL12, GROUP, HDL12, PULSE12, PULSEBL, SBP12, SBPBL, SEX, TRIG12, WT12, WTBL, cholbl, hdlbl, id, trigbl (all numeric except id, which is character)
Triglyceride distributions are typically right-skewed (a long tail of high values).
[Figure: PROC UNIVARIATE histogram and boxplot of TRIG12 (axis from about 30 to 530); most values are at the low end, with a long tail of high values.]
[Figure: PROC UNIVARIATE normal probability plot of TRIG12 (axis from about 30 to 530); the points bend away from the normal reference line in the upper tail.]
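A normal probability plot pairs each ordered data value with the quantile a standard normal distribution would give at that position. The Python sketch below shows the idea; the Blom plotting positions (i - 3/8)/(n + 1/4) are one common choice and an assumption here, not necessarily the exact rule SAS uses, and `normal_plot_points` is a hypothetical name.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return (normal quantile, ordered value) pairs for a normal probability plot.

    Assumes Blom plotting positions (i - 3/8) / (n + 1/4).
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.375) / (n + 0.25)
        points.append((std_normal.inv_cdf(p), value))
    return points

# Normal data fall close to a straight line; a right-skewed variable
# bends upward at the high end because its largest values are too large.
for q, v in normal_plot_points([3.0, 1.0, 2.0]):
    print(f"{q:6.3f}  {v}")
```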
PROC UNIVARIATE output for TRIG12
Moments: N = 472, Sum Weights = 472, Mean, Sum Observations, Std Deviation, Variance, Skewness, Kurtosis, Uncorrected SS, Corrected SS, Coeff Variation, Std Error Mean
Basic Statistical Measures: Location (Mean, Median, Mode); Variability (Std Deviation, Variance = 4217, Range, Interquartile Range)
Tests for Normality: Shapiro-Wilk W (Pr < W); Kolmogorov-Smirnov D, Pr > D < 0.0100, so normality is rejected
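Each entry in the Moments table has a simple definition. The Python sketch below computes them, assuming the SAS default divisor (VARDEF=DF, i.e. n - 1) and the bias-adjusted skewness and kurtosis formulas; `moments` is a hypothetical helper, not a SAS or library function. As a quick check on the output above, a variance of 4217 corresponds to a standard deviation of about 64.9.

```python
import math

def moments(x):
    """Quantities in the PROC UNIVARIATE 'Moments' table (assumes VARDEF=DF)."""
    n = len(x)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    total = sum(x)
    mean = total / n
    uss = sum(v * v for v in x)              # Uncorrected SS
    css = sum((v - mean) ** 2 for v in x)    # Corrected SS
    var = css / (n - 1)                      # Variance (divisor n - 1)
    sd = math.sqrt(var)                      # Std Deviation
    z = [(v - mean) / sd for v in x]         # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "N": n, "Mean": mean, "Sum Observations": total,
        "Std Deviation": sd, "Variance": var,
        "Skewness": skew, "Kurtosis": kurt,   # kurtosis is 0 for normal data
        "Uncorrected SS": uss, "Corrected SS": css,
        "Coeff Variation": 100 * sd / mean,   # SD as a percent of the mean
        "Std Error Mean": sd / math.sqrt(n),
    }

print(moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A clearly positive Skewness is the numerical counterpart of the long right tail in the TRIG12 histogram.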
Taking a LOG transformation (base 10): X becomes log10(X)
The log takes small values of X and spreads them out, and takes large values of X and brings them closer together.
    DATA temp;
      SET tomhs.bpstudy;
      logtrig12  = log10(trig12);   /* log base 10 */
      logtrig12x = log(trig12);     /* natural log (base e) */
    RUN;
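The spreading and compressing can be seen numerically. This Python sketch uses hypothetical, evenly spaced values covering roughly the range of the TRIG12 histogram: gaps that are equal on the original scale shrink steadily on the log scale. It also checks that base e and base 10 logs differ only by the constant factor ln(10).

```python
import math

values = [30, 130, 230, 330, 430, 530]  # hypothetical, evenly spaced values
logs = [math.log10(v) for v in values]

# Gaps between neighbouring values on each scale
raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(logs, logs[1:])]
print(raw_gaps)                         # equal gaps of 100 on the original scale
print([round(g, 3) for g in log_gaps])  # gaps shrink as the values grow

# Natural and base-10 logs differ by a constant factor: ln(x) = ln(10) * log10(x)
for v in values:
    assert math.isclose(math.log(v), math.log(10) * math.log10(v))
```

Because the two logs are proportional, logtrig12 and logtrig12x have the same shape; only the units differ.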
[Figure: PROC UNIVARIATE histogram and boxplot of logtrig12 (axis from about 1.35 to 2.75); the distribution is roughly symmetric.]
[Figure: PROC UNIVARIATE normal probability plot of logtrig12; the points lie close to the normal reference line.]
PROC UNIVARIATE output for logtrig12
Moments: N = 472, Sum Weights = 472, plus the same moments, basic statistical measures, and normality tests as for TRIG12
Tests for Normality: Shapiro-Wilk W (Pr < W); Kolmogorov-Smirnov D, Pr > D > 0.1500, so normality is not rejected
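Re-checking the assumption after transforming can be sketched with a skewness coefficient. The Python example below uses hypothetical right-skewed values (not the TOMHS data) and the simple moment coefficient g1 = m3 / m2^1.5, which differs slightly from the bias-adjusted skewness SAS prints. It shows the log transformation pulling skewness toward 0, and that the base of the log does not matter.

```python
import math

def skewness(x):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed values (illustration only)
trig = [50, 60, 70, 80, 100, 120, 150, 200, 300, 500]
raw_skew = skewness(trig)
log10_skew = skewness([math.log10(v) for v in trig])
ln_skew = skewness([math.log(v) for v in trig])
print(round(raw_skew, 2), round(log10_skew, 2), round(ln_skew, 2))
```

Skewness is unchanged by multiplying by a positive constant, which is why the log10 and natural-log versions agree.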
TOMHS Study: 6 treatment groups (variable GROUP)
–Beta-blocker
–Calcium channel blocker
–Diuretic
–Alpha-blocker
–ACE inhibitor
–Placebo
–All treatment groups were given a lifestyle intervention to lower BP
TOMHS triglyceride analyses: 3 treatment groups (variable GROUP)
–Beta-blocker
–Diuretic
–Placebo
Beta-blockers may increase triglycerides.
    LIBNAME tomhs 'C:\my documents\ph5415\';
    DATA temp;
      SET tomhs.bpstudy;
      logtrig12  = log10(trig12);   /* log base 10 */
      logtrig12x = log(trig12);     /* natural log (base e) */
      if group in(1,3,6);           /* select only groups 1, 3, and 6 */
    RUN;
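The DATA step keeps only the rows that pass the subsetting IF and adds the two log variables. A rough Python equivalent is sketched below; the small `bpstudy` list of dictionaries is hypothetical and stands in for the SAS data set, with only the two fields used here.

```python
import math

# Hypothetical records standing in for tomhs.bpstudy (only the fields used here)
bpstudy = [
    {"group": 1, "trig12": 180.0},
    {"group": 2, "trig12": 150.0},
    {"group": 3, "trig12": 140.0},
    {"group": 5, "trig12": 160.0},
    {"group": 6, "trig12": 120.0},
]

temp = []
for row in bpstudy:
    if row["group"] in (1, 3, 6):                      # subsetting IF: keep groups 1, 3, 6
        new = dict(row)                                # copy so the source is unchanged
        new["logtrig12"] = math.log10(row["trig12"])   # log base 10
        new["logtrig12x"] = math.log(row["trig12"])    # natural log
        temp.append(new)

print([r["group"] for r in temp])
```

Unlike SAS, which sets the log of a missing value to missing, this sketch assumes every trig12 value is present and positive.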
    PROC GLM;
      CLASS group;
      MODEL trig12 logtrig12 logtrig12x = group;
Output for dependent variable TRIG12: ANOVA table with rows Model, Error, Corrected Total and columns DF, Sum of Squares, Mean Square, F Value, Pr > F
Output for dependent variable logtrig12 (analyses using the log scale, base 10): same ANOVA table layout
Output for dependent variable logtrig12x (analyses using the log scale, base e): same ANOVA table layout (Model, Error, Corrected Total; DF, Sum of Squares, Mean Square, F Value, Pr > F)
The F value and p-value are the same for base 10 and base e: ln(X) = ln(10) × log10(X) is a constant multiple, so every sum of squares is rescaled by the same factor and the ratio F does not change.
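The ANOVA table that PROC GLM prints for each dependent variable can be sketched in a few lines of Python. The `one_way_anova` helper and the three groups of values are hypothetical; the sketch computes DF, sums of squares and F (Corrected Total SS is Model SS plus Error SS), and leaves out Pr > F, which would need an F distribution function from a statistics library.

```python
import math

def one_way_anova(groups):
    """Return (df_model, df_error, ss_model, ss_error, f_value) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_model, df_error = k - 1, n - k
    f_value = (ss_model / df_model) / (ss_error / df_error)  # MS Model / MS Error
    return df_model, df_error, ss_model, ss_error, f_value

# Hypothetical triglyceride values for three groups
groups = [[150.0, 210.0, 180.0, 260.0],
          [140.0, 170.0, 200.0, 150.0],
          [120.0, 160.0, 130.0, 175.0]]
f_log10 = one_way_anova([[math.log10(v) for v in g] for g in groups])[4]
f_ln = one_way_anova([[math.log(v) for v in g] for g in groups])[4]
print(f_log10, f_ln)  # identical up to rounding: the log base does not affect F
```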
    PROC GLM;
      CLASS group;
      MODEL trig12 logtrig12 logtrig12x = group;
      MEANS group;
      ESTIMATE 'BB vs Diur' group 1 -1 0;
      ESTIMATE 'BB vs Plac' group 1 0 -1;
Output (The GLM Procedure, MEANS): for each level of GROUP, N and the Mean and Std Dev of TRIG12, logtrig12, and logtrig12x
Note: the SDs are much closer between groups on the log scale.
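Why the SDs become closer on the log scale: when groups differ multiplicatively, the spread grows with the level on the original scale, but a log turns a multiplicative difference into a constant shift, which leaves the SD unchanged. The Python sketch below shows this with two hypothetical groups, where group B is exactly twice group A.

```python
import math
import statistics

# Hypothetical groups: group B is group A multiplied by 2
group_a = [100.0, 150.0, 200.0]
group_b = [200.0, 300.0, 400.0]

raw_sds = [statistics.stdev(g) for g in (group_a, group_b)]
log_sds = [statistics.stdev([math.log10(v) for v in g]) for g in (group_a, group_b)]
print(raw_sds)  # SD doubles on the original scale
print(log_sds)  # SDs essentially equal on the log scale
```

Real data are rarely this tidy, but the same mechanism explains why a log scale helps with the constant-variance assumption.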
ESTIMATE output for each dependent variable (TRIG12, logtrig12, logtrig12x): rows BB vs Diur and BB vs Plac; columns Parameter, Estimate, Standard Error, t Value, Pr > |t|
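In a one-way model, an ESTIMATE contrast is a weighted combination of the group means, and its standard error uses the Error mean square (MSE) from the ANOVA table. The Python sketch below computes Estimate, Standard Error and t Value for that case; the `contrast` helper and the example means, group sizes and MSE are hypothetical. The p-value (Pr > |t|) would come from a t distribution with the error DF and is omitted.

```python
import math

def contrast(means, ns, coefs, mse):
    """Estimate, standard error and t value of a contrast of group means (one-way ANOVA)."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(mse * sum(c * c / n for c, n in zip(coefs, ns)))
    return estimate, se, estimate / se

# Hypothetical group means (in the order of the coefficients), sizes and MSE
means = [5.0, 4.0, 3.0]
ns = [4, 4, 4]
est, se, t = contrast(means, ns, coefs=[1, -1, 0], mse=2.0)
print(est, se, t)
```

With coefficients 1, -1, 0 the estimate is simply the difference between the first two group means.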
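Interpretation of differences using the natural log scale (base e)
ESTIMATE output (Parameter, Estimate, Standard Error, t Value, Pr > |t|) for BB vs Diur and BB vs Plac
–The BB vs Diur estimate indicates that BB increases triglycerides by approximately 7.45% compared with diuretic (on the natural log scale, a small difference d corresponds to roughly a 100·d percent difference).
–The BB vs Plac estimate indicates that BB increases triglycerides by approximately 15.2% compared with placebo.
–More precise estimates are 100*(exp(0.074) - 1) = 7.7% (BB vs Diur) and 100*(exp(0.152) - 1) = 16.4% (BB vs Plac).
These percentages compare geometric means, because a difference of mean logs is the log of the ratio of the geometric means.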
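The conversion from a difference on the log scale to a percent difference is a one-line formula, 100 × (b^d - 1) for a log base b. The Python sketch below (the `percent_difference` name is hypothetical) reproduces the 7.7% and 16.4% figures from the two differences quoted above, and shows that a base-10 difference gives the same answer once it is converted with 10^d.

```python
import math

def percent_difference(log_diff, base=math.e):
    """Percent by which one geometric mean exceeds another, given a difference of mean logs."""
    return 100 * (base ** log_diff - 1)

for label, d in [("BB vs Diur", 0.074), ("BB vs Plac", 0.152)]:
    approx = 100 * d                  # quick approximation, good for small d
    exact = percent_difference(d)     # 100 * (exp(d) - 1)
    print(f"{label}: approx {approx:.1f}%, more precise {exact:.1f}%")

# The same difference expressed in log10 units gives the same percent
d10 = 0.074 / math.log(10)
print(round(percent_difference(d10, base=10), 1))
```

The approximation 100·d understates the percent increase more as d grows, which is why the gap is larger for BB vs Plac.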
USING THE WILCOXON RANK TEST
Each observation is given a score (its rank) from 1 to n, and the analysis is done on these ranked values.
    PROC NPAR1WAY WILCOXON;
      CLASS group;
      VAR trig12;
    RUN;
Output (The NPAR1WAY Procedure): Wilcoxon Scores (Rank Sums) for Variable TRIG12 Classified by Variable GROUP, with columns GROUP, N, Sum of Scores, Expected Under H0, Std Dev Under H0, Mean Score. Average scores were used for ties.
Kruskal-Wallis Test: Chi-Square, DF = 2, Pr > Chi-Square. With three groups, the test reported is the Kruskal-Wallis test, the k-sample extension of the two-sample Wilcoxon rank-sum test.
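The rank-based analysis can be sketched in Python: pool all observations, assign ranks with average ranks for ties, sum the ranks in each group, and compare with the expected sum n_i(N + 1)/2 under H0. The Kruskal-Wallis statistic below uses the form (N - 1) Σ n_i (R̄_i - R̄)² / Σ (r - R̄)², which already adjusts for ties. The helper names and data are hypothetical, and the p-value uses the exact chi-square tail exp(-H/2), which holds only for 2 DF, as in the three-group output above.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over a run of tied values
        avg = (i + j) / 2 + 1           # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Rank sums, expected rank sums under H0, tie-adjusted H, and its DF."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    rbar = (n_total + 1) / 2            # mean rank (unchanged by averaging ties)
    rank_sums, expected, start = [], [], 0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        rank_sums.append(sum(r))
        expected.append(len(g) * rbar)
    between = sum(len(g) * (s / len(g) - rbar) ** 2 for s, g in zip(rank_sums, groups))
    total = sum((r - rbar) ** 2 for r in ranks)
    h = (n_total - 1) * between / total
    return rank_sums, expected, h, len(groups) - 1

# Hypothetical data for three groups (note the tie at 2.0)
sums, expected, h, df = kruskal_wallis([[1.0, 2.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
p_value = math.exp(-h / 2) if df == 2 else None  # chi-square upper tail, valid for 2 DF only
print(sums, expected, round(h, 3), df, p_value)
```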