Statistical Data Analysis - Lecture 05 12/03/03


A small case study
Description: Rainfall from cloud seeding. The rainfall, in acre-feet, from 52 clouds, 26 of which were chosen at random and seeded with silver iodide.
Reference: Chambers, Cleveland, Kleiner, and Tukey (1983). Graphical Methods for Data Analysis. Wadsworth International Group, Belmont, CA, p. 351.
Original source: Simpson, Olsen, and Eden (1975). A Bayesian analysis of a multiplicative treatment effect in weather modification. Technometrics 17, 161-166.

Number of cases: 26
Variable names:
Unseeded: amount of rainfall from unseeded clouds (in acre-feet)
Seeded: amount of rainfall from clouds seeded with silver iodide (in acre-feet)

Step 1: Visually inspect the data
It is easier to look at the order statistics, so sort the data. Type:
seed <- sort(Seeded)
u.seed <- sort(Unseeded)
Now put the sorted values into a data frame so they are easier to work with:
clouds <- data.frame(seed = seed, u.seed = u.seed)
attach(clouds)
Type clouds to display the (sorted) data

Visual inspection of the data
There are some very large values.
There are some very small values.
The data do not increase linearly.
The two data sets look as though they increase at the same rate, but there is a shift in mean.

Plot the data
A histogram of each data set would be helpful. Type:
par(mfrow = c(2, 1))   # this sets the graph screen to 2 rows, 1 column
hist(u.seed)
hist(seed)

Interpret the histograms
Note the histograms don't have the same scales, which makes them difficult to compare.
Each histogram has a long tail to the right, so the data sets are right skewed.
Confirm this by calculating a symmetry statistic for each. Type:
s(seed)
s(u.seed)
Note the 0.1 – this means we're using the 0.1 quantile and the 0.9 quantile to calculate the symmetry statistic.
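s() is a course-supplied function and its definition isn't shown on the slide; a minimal sketch of a quantile-based symmetry statistic of this kind (the exact formula used in the lecture is an assumption):
s <- function(x, p = 0.1) {
  # compare the upper-tail and lower-tail distances from the median,
  # using the p and 1-p quantiles: 0 for a symmetric sample,
  # positive for right skew, negative for left skew
  q <- quantile(x, c(p, 0.5, 1 - p))
  ((q[3] - q[2]) - (q[2] - q[1])) / (q[3] - q[1])
}
s(seed)
s(u.seed)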

Get summary statistics
Type:
summary(seed)
summary(u.seed)
This gives you a five-number summary (Min, LQ, Med, UQ, Max) plus the mean.
To get the standard deviations, type:
sd(seed)
sd(u.seed)
To get the skewness statistics, type:
skewness(seed)
skewness(u.seed)
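skewness() is not in base R; the lecture presumably loads it from the course workspace or a package. A minimal sketch of the usual moment-based version it is assumed to compute:
skewness <- function(x) {
  # third standardised sample moment: values > 0 indicate a right-skewed sample
  n <- length(x)
  z <- (x - mean(x)) / sd(x)
  sum(z^3) / n
}
skewness(seed)
skewness(u.seed)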

Descriptive statistics
Note that the skewness statistic confirms our histogram interpretation and our symmetry statistic.
We can plot the mean and median on each histogram:
hist(seed)
abline(v = median(seed), col = "red", lwd = 2)
abline(v = mean(seed), col = "blue", lwd = 2)
hist(u.seed)
abline(v = median(u.seed), col = "red", lwd = 2)
abline(v = mean(u.seed), col = "blue", lwd = 2)
Notice how the median lines up with the centre of the data much better than the mean.
If we want to compare the groups, it is going to be easier if we transform the data.

Transforming the data
Choose a sensible transformation, either using the sympowerplot function, e.g.
sympowerplot(u.seed)
sympowerplot(seed)
or by using the Box-Cox method:
bcp(u.seed)
bcp(seed)
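sympowerplot() and bcp() are course-supplied functions. If they are unavailable, a roughly equivalent Box-Cox plot can be produced with the MASS package; this substitute is an assumption, not the lecture's own code:
library(MASS)
# profile log-likelihood for the Box-Cox power of a single sample
# (an intercept-only model); the peak suggests the power p
boxcox(lm(u.seed ~ 1), lambda = seq(-1, 1, by = 0.05))
boxcox(lm(seed ~ 1), lambda = seq(-1, 1, by = 0.05))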

Transforming the data
Box-Cox recommends p = 0.047 and p = 0.118 – therefore a log transformation might work well.
If you're willing to wait a while, try bcp.bounds(seed) and bcp.bounds(u.seed). These give approximate 95% confidence intervals on the powers, and both intervals include 0.
Symmetry power plots recommend p = -0.1 and p = 0.16 for unseeded and seeded respectively.
Log is easiest to explain, so try that.
Look at the log-transformed histograms (see the sketch below): still lumpy, but the skewness has been reduced.
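The slide doesn't show the code that creates the logged variables; a minimal sketch, using the names log.seed and log.u.seed that the later slides rely on:
# natural log transformation of both samples
log.seed <- log(seed)
log.u.seed <- log(u.seed)
par(mfrow = c(2, 1))
hist(log.u.seed)
hist(log.seed)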

Compare the data
Boxplot the transformed data. Join the data into one vector:
rainfall <- c(log.u.seed, log.seed)
Use the rep command to make a vector of labels corresponding to the unseeded and seeded data:
gp <- rep(c("Unseeded", "Seeded"), c(26, 26))
boxplot(split(rainfall, gp))
QQ-plot the data:
qqplot(log.u.seed, log.seed)
qq.plot(log.u.seed, log.seed)

Compare the data
The QQ-plot shows a linear trend, so the logged data come from similarly shaped distributions.
The slope is approximately equal to 1, so the two samples have similar spread.
Is the difference in the means significant? Use a two-sample t-test.
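A small optional addition (not from the lecture): overlay the line y = x on the QQ-plot to help judge the slope against 1:
qqplot(log.u.seed, log.seed)
abline(0, 1, lty = 2)   # points roughly parallel to this dashed line suggest similar spread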

Assumptions of the two-sample t-test
The data sets are statistically independent.
The data are normally distributed.
The data sets have the same variance.
This last assumption can be relaxed if we use Welch's modification to the t-test.
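In R the pooled and Welch versions of the test differ only in the var.equal argument; a minimal sketch using the logged variables from the earlier slides:
t.test(log.u.seed, log.seed, var.equal = TRUE)   # classical pooled-variance two-sample t-test
t.test(log.u.seed, log.seed)                     # Welch's modification (the default in R)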

Two-sample t-test
The control group (unseeded) and treatment group (seeded) were measured independently, so the data are independent.
The logged data are approximately unimodal and symmetric, so normality is okay. Check with normal quantile plots (see the sketch below).
Use t.test(log.u.seed, log.seed)
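A minimal sketch of the normality check using base R's normal quantile plots (the course's own plotting function may differ):
par(mfrow = c(1, 2))
qqnorm(log.u.seed, main = "Unseeded (log)")
qqline(log.u.seed)
qqnorm(log.seed, main = "Seeded (log)")
qqline(log.seed)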

Interpret the results
0.01 < P < 0.05, so the difference is significant at the 5% level.
The 95% C.I. does not contain zero, so again the difference is significant at the 5% level.
“Based on the data there is evidence to suggest that we reject the null hypothesis (of no difference).”
What does this mean?

Interpret the results
We have a 95% c.i. for the difference in the logged means. What does this tell us in terms of the original data?
Not much on this scale – transform the estimated difference and c.i. back to the original scale (see the sketch below).
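A minimal sketch of the back-transformation, assuming we call t.test with the seeded sample first so the difference is seeded minus unseeded on the log scale; exponentiating turns the difference and its c.i. into multiplicative effects on the original scale (a calculation like this gives the numbers on the next slide):
tt <- t.test(log.seed, log.u.seed)
exp(tt$estimate[1] - tt$estimate[2])   # estimated ratio of seeded to unseeded rainfall (about 3)
exp(tt$conf.int)                       # 95% c.i. for the ratio (roughly 1.3 to 7.7)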

Interpret the results
So now we have three numbers – 3.14, 1.27 and 7.74 – what do we do with them?
“The average amount of rainfall (in acre-feet) is approximately 3 times higher if the clouds are seeded with silver iodide than if they're not.”
“With 95% confidence we can say this figure could be as low as 1.3 times or as high as 7.7 times.”