
SUMMARY

Two-sided t-test This is not an exact formula! It just demonstrates the main ingredients:

t \approx \frac{\text{difference between means, i.e. variability between samples}}{\text{variability within samples}}

Two-sided t-test The numerator indicates how much the means differ. This is the explained variation, because it most likely results from differences due to the treatment, or simply from differences between the populations (recall the beer prices: different brands are differently expensive). The denominator is a measure of error. It measures the individual differences between subjects. This is considered error variation because we don't know why individual subjects in the same group differ.
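A minimal sketch of a two-sided t-test in R (the two price vectors are invented for illustration; they are not the actual beer data from the lecture):

  brand_a <- c(21, 23, 25, 22, 24)   # hypothetical prices of brand A
  brand_b <- c(30, 28, 33, 31, 29)   # hypothetical prices of brand B
  t.test(brand_a, brand_b, alternative = "two.sided")   # Welch two-sample t-test

The reported t statistic is exactly the ratio sketched above: a difference of means divided by a measure of within-sample variability.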

Explained variation

Error variation

3 samples

ANOVA Compare as many means as you want with just one test.

Total variability

Hypothesis

Multiple comparisons problem And there is another, more serious problem with many t-tests: the multiple comparisons problem. Each test has some chance of a false positive, so the more tests you run, the more likely at least one of them rejects a true null hypothesis purely by chance.
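A quick back-of-the-envelope calculation in R shows how fast this grows (assuming independent tests, each at significance level 0.05):

  k <- choose(6, 2)    # all pairwise t-tests among 6 groups: 15 tests
  1 - (1 - 0.05)^k     # P(at least one false positive) is about 0.54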

NEW STUFF

Post hoc tests The F-test in ANOVA is the so-called omnibus test. It tests the means globally; it says nothing about which particular means are different. For that, use post hoc tests (multiple comparison tests), e.g. Tukey's Honestly Significant Difference: TukeyHSD(fit) # where fit comes from aov()
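Put together, a sketch assuming a data frame beer_brands with columns Price and Brand, as in the aov() example on the next slide:

  fit <- aov(Price ~ Brand, data = beer_brands)   # omnibus F-test across all brands
  summary(fit)                                    # is there any difference at all?
  TukeyHSD(fit)                                   # which pairs of brands differ?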

ANOVA assumptions normality – all populations the samples come from are normal; homogeneity of variance – the variances are equal; independence of observations – the results found in one sample don't affect the others. The most influential is the independence assumption; otherwise ANOVA is relatively robust. We can sometimes violate normality (with a large sample size) and variance homogeneity (with equal sample sizes, provided the ratio of any two variances does not exceed four).

ANOVA kinds one-way ANOVA (single-factor analysis of variance): aov(beer_brands$Price~beer_brands$Brand) two-way ANOVA (two-factor analysis of variance). Example: engagement ratio; measure two educational methods (with and without a song) for men and women independently: aov(engagement~method+sex). Here engagement is the dependent variable; method and sex are the independent variables. Two-way ANOVA can also include interactions between factors, as sketched below.
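In R's formula interface an interaction is written with * (or explicitly with :); a sketch assuming the engagement variables from the slide:

  aov(engagement ~ method + sex)   # main effects only
  aov(engagement ~ method * sex)   # main effects plus the method:sex interaction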

CORRELATION

Introduction Up to this point we've been working with only one variable. Now we are going to focus on two variables – two variables that are probably related. Can you think of some examples? Weight and height; time spent studying and your grade; temperature outside and ankle injuries.

Car data [table of miles on a car vs. value of the car; the numbers did not survive the transcript] x – predictor, explanatory, independent variable. What do you think y is called? Think about opposites of the x names: outcome, determiner, response, stand-alone, dependent.

Car data [the same miles vs. value table] How may we show these variables have a relationship? Tell me some of your ideas. A scatterplot.

Scatterplot
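In R a scatterplot is one line; the numbers below are invented, since the original table did not survive the transcript:

  miles <- c(20000, 45000, 70000, 95000, 120000)   # hypothetical odometer readings
  value <- c(16000, 13000, 10000, 8000, 6000)      # hypothetical car values in $
  plot(miles, value, xlab = "Miles on a car", ylab = "Value of the car ($)")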

Stronger relationship?

Correlation A relation between two variables = correlation; a strong relationship = strong correlation, high correlation. Match these to the scatterplots shown: strong positive, strong negative, weak positive, weak negative.

Correlation coefficient
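The formula itself did not survive the transcript; the standard Pearson sample correlation coefficient, which this slide presumably showed, is

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}

where s_x and s_y are the sample standard deviations; r always lies between -1 and +1.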

Covariance Divide by n−1 for a sample but by n for a population.
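Written out (the standard definitions, matching the n−1 vs. n remark on the slide):

sample: s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad population: \sigma_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)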

Coefficient of determination [example scatterplots with r² = 0, r² = 0.25, and r² = 0.81; figure and source not reproduced]

If X is age in years and Y is age in months, what will the correlation coefficient be? +1.0. And if X is hours you're awake a day and Y is hours you're asleep a day? −1.0 (the two always sum to 24 hours, a perfect negative linear relationship).

Crickets Find a cricket, count the number of its chirps in 15 seconds, add 37, and you have just approximated the outside temperature in degrees Fahrenheit. National Weather Service Forecast Office: [table of chirps in 15 sec vs. temperature not reproduced]
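A sketch of how you might check the chirp rule against a least-squares line in R (the chirp/temperature pairs are invented, since the slide's table did not survive):

  chirps <- c(30, 35, 40, 45, 50)   # hypothetical chirps per 15 seconds
  temp   <- c(67, 71, 77, 83, 86)   # hypothetical temperatures in deg F
  fit <- lm(temp ~ chirps)          # fit a straight line
  coef(fit)                         # the rule of thumb predicts intercept ~ 37, slope ~ 1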

Hypothesis testing [four example panels labeled A, B, C, D not reproduced]

Confidence intervals Try to guess for each case: 95% CI = ( , ) – reject the null, or fail to reject the null? (The interval endpoints did not survive the transcript.) The rule: if the 95% CI excludes the value stated by the null hypothesis, reject the null; if it contains it, fail to reject the null.

Hypothesis testing
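The standard R test of H0: rho = 0 for a correlation is cor.test(); a minimal self-contained sketch with invented vectors:

  x <- c(1, 2, 3, 4, 5)             # hypothetical predictor values
  y <- c(2.1, 3.9, 6.2, 8.1, 9.8)   # hypothetical response values
  cor.test(x, y)   # Pearson by default; reports r, a t statistic, p-value, and a 95% CI for rho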

Correlation vs. causation causation – one variable causes another to happen, e.g. the fact that it is raining causes people to take their umbrellas to work. correlation – just means there is a relationship, e.g. do happy people have more friends? Are they just happy because they have more friends? Or do they act a certain way which causes them to have more friends?

Correlation vs. causation There is a strong relationship between ice cream consumption and the crime rate. How could this be true? The two variables must have something in common with one another – something that relates to both the level of ice cream consumption and the level of crime. Can you guess what that is? Outside temperature. from causeweb.org

Correlation vs. causation If you stop selling ice cream, does the crime rate drop? What do you think? It doesn't, because of the simple principle that correlations express the association that exists between two or more variables; they have nothing to do with causality. In other words, just because the level of ice cream consumption and the crime rate increase/decrease together does not mean that a change in one necessarily results in a change in the other. You can't interpret associations as being causal.

Correlation vs. causation In the ice cream example there exists a variable (outside temperature) we did not realize we should control for. Such a variable is called a third variable, confounding variable, or lurking variable. The methodology of a scientific study therefore needs to control for these factors, to avoid a 'false positive' conclusion that the dependent variable is in a causal relationship with the independent variable. Let's have a look at the dependence of the murder rate on temperature.

[figure: assault rate plotted against temperature] from Journal of Personality and Social Psychology, 2005, Vol. 89, No. 1, 62–66

[figure: the same data with a high assault period and a low assault period marked] from Journal of Personality and Social Psychology, 2005, Vol. 89, No. 1, 62–66

Correlation and regression analysis Correlation analysis investigates the relationships between variables using graphs or correlation coefficients. Regression analysis answers questions like: what kind of relationship exists between variables X and Y (linear, quadratic, …)? Is it possible to predict Y using X, and with what error?

Simple linear regression

Data set Students in higher grades carry more textbooks, and the weight of the textbooks depends on the weight of the student.

[scatterplot of student weight vs. textbook weight with one outlier; strong positive correlation, the r value did not survive the transcript] from Intermediate Statistics for Dummies

Build a model Find a straight line y = a + bx
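Finding a and b by least squares is one call in R; a sketch assuming vectors student_weight and textbook_weight from the data set above:

  fit <- lm(textbook_weight ~ student_weight)   # fits y = a + b*x
  coef(fit)                                     # a = intercept, b = slope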

Interpretation y-intercept (3.69 in our case) – it may or may not have a practical meaning. Does x = 0 fall within the actual x-values in the data set? If yes, that is a clue the intercept may have a practical meaning. Does it fall in negative territory where negative y-values are not possible (e.g. weights can't be negative)? Does the value x = 0 itself have practical meaning (a student weighing 0)? However, even if the intercept has no practical meaning, it may still be necessary in the model (i.e. significantly different from zero)! slope – the change in y due to a one-unit increase in x (i.e. if a student's weight increases by 1 pound, the textbook weight increases by … pounds). Now you can use the regression line to estimate the y value for a new x.

Regression model conditions After building a regression model, you need to check whether the required conditions are met. What are these conditions? The y's have to have a normal distribution for each value of x. The y's have to have a constant spread (standard deviation) for each value of x.

Normal y's for every x For any value of x, the population of possible y-values must have a normal distribution. (from Intermediate Statistics for Dummies)

Homoscedasticity condition As you move from left to right on the x-axis, the spread of the y-values around the line remains the same. source: wikipedia.org

Confidence and prediction limits 95% confidence limits – this interval covers the true regression line with 95% probability (the confidence band). 95% prediction limits – this interval represents the 95% probability for the values of the dependent variable, i.e. 95% of the data points lie within these lines (the prediction band).
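Both bands come from predict() on an lm fit in R; a sketch assuming the fit from the textbook example above:

  new <- data.frame(student_weight = 100)      # a hypothetical new x value
  predict(fit, new, interval = "confidence")   # interval for the mean response: confidence band
  predict(fit, new, interval = "prediction")   # interval for an individual y: prediction band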

Residuals

[figure: a data point (actual value), its predicted value on the line, and the residual between them] from Intermediate Statistics for Dummies

Residuals The residuals are data just like any other, so you can find their mean (which is zero!) and their standard deviation. Residuals can be standardized, i.e. converted to Z-scores, so you can see where each one falls on the standard normal distribution. Plotting the residuals on a graph gives residual plots.
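In R (assuming fit from lm() as above):

  residuals(fit)                      # raw residuals; mean(residuals(fit)) is zero up to rounding
  rstandard(fit)                      # standardized residuals, comparable to Z-scores
  plot(fitted(fit), residuals(fit))   # a basic residual plot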

Using r² to measure model fit r² measures what percentage of the variability in y is explained by the model. The y-values in the data you collect have a great deal of variability in and of themselves. You look for another variable (x) that helps explain that variability in the y-values. After you put that x variable into the model and find it is highly correlated with y, you want to find out how well the model did at explaining why the values of y differ.

Interpreting r² high r² (80–90% is extremely high, 70% is fairly high) – a high percentage of explained variability means that the line fits well, because there is not much left to explain about the value of y other than x and its relationship to y. small r² (0–30%) – the model containing x doesn't help much in explaining the differences in the y-values; the model does not fit well, and you need a different variable to explain y than the one you already tried. middle r² (30–70%) – x does help somewhat in explaining y, but it doesn't do the job well enough on its own; add one or more variables to the model to help explain y more fully as a group. Textbook example: r = 0.93, so r² = 0.86; approximately 86% of the variability in textbook weights is explained by the average student weight. A fairly good model.
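The 86% is just the square of the correlation; in R:

  r <- 0.93
  r^2                      # 0.8649, i.e. about 86% of the variability in y explained
  summary(fit)$r.squared   # the same quantity straight from an lm fit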