Stats Questions We Are Often Asked

Stats questions we are often asked
- When can I use r and R²?
- When can I make a ‘causal-type’ claim?
- Why should I be careful with a media-reported margin of error?
- When can I say a confidence interval gives support to a claim?

r – little r – what is it?
- r is the correlation coefficient between y and x
- r measures the strength of a linear relationship
- r is a multiple of the slope of the fitted line

r – when can it be used?
- Only use r if the scatter plot is linear
- Don’t use r if the scatter plot is non-linear!
[Scatter plot of y versus x, r = 0.99]

r – what does it tell you?
- How close the points in the scatter plot come to lying on the line
[Scatter plots of y versus x: one with r = 0.99, one with r = 0.57]
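The computation behind r can be sketched in a few lines of plain Python. The (x, y) values below are made up purely for illustration:

```python
import math

# Hypothetical data with a roughly linear trend -- illustrative only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

def pearson_r(x, y):
    """Correlation coefficient r between paired lists x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    # The least-squares slope is sxy / sxx, so r = slope * sqrt(sxx / syy):
    # r really is "a multiple of the slope".
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(x, y), 3))  # -> 0.999, a strong linear relationship
```

As the slide warns, compute r only after checking that the scatter plot is linear.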

R² – big R² – what is it?
- R² is the coefficient of determination
- Measures how close the points in the scatter plot come to lying on the fitted line or curve
[Scatter plot with fitted curve]

R² – big R² – when can it be used?
- When the scatter plot of y versus x is linear or non-linear
[Scatter plot with fitted curve]

R² – what does it tell you?
[Dot plot of the y’s, showing the variation in the y’s; dot plot of the fitted values ŷ, showing the variation in the ŷ’s]

R² – what does it tell you?
- The variation in the fitted values ŷ is the amount of variation that can be explained by the model
- We see some additional variation in the y’s; the excess is not explained by the model
- R² = (variation in fitted values) / (variation in y values)

R² – what does it tell you?
- When expressed as a percentage, R² is the percentage of the variation in y that our regression model can explain
- R² near 100% ⇒ model fits well
- R² near 0% ⇒ model doesn’t fit well

R² – what does it tell you?
[Scatter plot with fitted model, R² = 90%]
90% of the variation in y is explained by our regression model.

R² – pearls of wisdom!
- R² and r² have the same value ONLY when using a linear model
- DON’T use R² to pick your model – use your eyes!
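Both pearls can be checked numerically. A sketch with hypothetical data, fitting the least-squares line by hand and computing R² as (variation in fitted values) / (variation in y):

```python
# Hypothetical data -- illustrative only.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

b = sxy / sxx                  # least-squares slope
a = my - b * mx                # intercept
fitted = [a + b * xi for xi in x]

var_fitted = sum((f - my) ** 2 for f in fitted)   # variation explained by the model
var_y = sum((yi - my) ** 2 for yi in y)           # total variation in the y's
R2 = var_fitted / var_y

r = sxy / (sxx * var_y) ** 0.5
print(round(R2, 4), round(r ** 2, 4))  # equal -- but only because the model is linear
```

For a non-linear fitted curve, R² is still defined this way, but r² no longer matches it.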

R² in Excel and on graphics calculators

Damaged for life by too much TV N Z Herald (04/10/2005)

Damaged for life by too much TV
Causal relationship?
[Scatter plot: Health Score versus TV watching, r = −0.93]

Causal relationships
- Two general types of studies: experiments and observational studies
- In an experiment, the experimenter determines which experimental units receive which treatments
- In an observational study, we simply compare units that happen to have received different levels of the factor of interest

Causal relationships
- Only well-designed and carefully executed experiments can reliably demonstrate causation
- An observational study is often useful for identifying possible causes of effects, but it cannot reliably establish causation

Causal relationships – summary
- In observational studies, strong relationships are not necessarily causal relationships
- Correlation does not imply causation
- Be aware of the possibility of lurking variables

Margin of Error
Sunday Star Times: National 44%, Labour 37.2%, NZ First 4.7% – margin of error 4.4% (n = 540)
Herald on Sunday: Labour 42%, National 38.5%, NZ First 5.5% – margin of error 4.9% (n = 400)

Margin of Error
Confidence interval: estimate ± margin of error
Herald on Sunday: Labour 42%, National 38.5%, NZ First 5.5% – margin of error 4.9% (n = 400)
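The "estimate ± margin of error" arithmetic, applied to the Herald on Sunday figure for Labour given above:

```python
# Herald on Sunday: Labour 42%, margin of error 4.9% (n = 400).
estimate = 0.42
moe = 0.049

lower, upper = estimate - moe, estimate + moe
print(f"Approx. 95% CI for Labour support: {lower:.1%} to {upper:.1%}")
# -> 37.1% to 46.9%
```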

Margin of Error – Survey Errors
Sampling error:
- caused by the act of sampling
- has the potential to be bigger in smaller samples
- we can determine how large it can be – the margin of error
- unavoidable (the price of sampling)

Margin of Error – Survey Errors
Nonsampling errors (e.g., nonresponse bias, behavioural effects, . . .):
- can be much larger than sampling errors
- impossible to correct for after completion of the survey
- impossible to determine how badly they affect the results

Margin of Error
Approximate 95% confidence interval for p:  p̂ ± 2 √( p̂(1 − p̂) / n )

Margin of Error
Margin of error (single proportion): 2 √( p̂(1 − p̂) / n ), which is largest when p̂ = 0.5, giving the quick approximation 1/√n
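A quick check of the 1/√n approximation against the two polls' reported margins of error (sample sizes as given in the slides):

```python
import math

# 1/sqrt(n) is the worst-case (p-hat = 0.5) margin of error for an
# approximate 95% confidence interval.
for n, reported in ((540, 0.044), (400, 0.049)):
    approx = 1 / math.sqrt(n)
    print(f"n = {n}: approx {approx:.1%}, reported {reported:.1%}")
```

Both approximations land within about 0.1 percentage points of the published figures.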

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6
With 95% confidence, the mean dissatisfaction score for Canterbury customers is somewhere between 0.5 and 20.7 larger than the mean dissatisfaction score for Auckland customers.

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6
With 95% confidence, the mean dissatisfaction score for Auckland customers is somewhere between 9.8 less than and 6.6 greater than the mean dissatisfaction score for Wellington customers.

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6
Does this confidence interval support the proposition that there is a difference between the two population means, i.e. μA − μW ≠ 0?
No, it doesn’t support the proposition. Since 0 is in the confidence interval, 0 is a believable value for the difference: there could be no difference between the two means.

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6
Does this confidence interval support the proposition that there is NO difference between the two population means, i.e. μA − μW = 0?
No, it doesn’t support the proposition. Since there are non-zero numbers in the interval, μA − μW could be non-zero: there could be a difference between the two means.

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6
Does this confidence interval support the proposition that there is a difference between the two population means, i.e. μC − μA ≠ 0?
Yes, it does support the proposition. Since zero is not in the interval, a difference of zero is not believable: “no difference between the means” is not believable.

Bank Dissatisfaction Scores – 95% CIs
μC − μA: 0.5 to 20.7
μA − μW: −9.8 to 6.6
Does this confidence interval support the proposition that there is NO difference between the two population means, i.e. μC − μA = 0?
No, it doesn’t support the proposition; in fact, it provides evidence against it. Since 0 is not in the interval, “no difference between the means” is not believable.
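All four answers come down to one check: is 0 inside the interval? Sketched directly, using the endpoints from the slides:

```python
# 95% CIs for differences in mean dissatisfaction scores (from the slides).
intervals = {
    "muC - muA": (0.5, 20.7),    # Canterbury minus Auckland
    "muA - muW": (-9.8, 6.6),    # Auckland minus Wellington
}

for name, (lo, hi) in intervals.items():
    if lo <= 0 <= hi:
        print(f"{name}: 0 is believable -- no evidence of a difference")
    else:
        print(f"{name}: 0 is not believable -- evidence of a difference")
```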