Lecture 7: Bivariate Statistics

2 Properties of Standard Deviation
Variance is the square of the S.D.
If a constant is added to all scores, it has no impact on the S.D.
If all scores are multiplied by a constant, the dispersion changes: the S.D. is multiplied by the absolute value of that constant and the variance by its square.
Notation: S = standard deviation; X = individual score; M = mean of all scores; n = sample size (number of scores).
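These properties can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module. The scores are made up for illustration, and the sketch assumes the sample (n - 1) form of the S.D., which the slide's notation does not confirm.

```python
import statistics

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(scores)        # sample S.D. (n - 1 denominator, an assumption)
var = statistics.variance(scores)    # sample variance

shifted = [x + 10 for x in scores]   # add a constant to every score
scaled = [x * 3 for x in scores]     # multiply every score by a constant

sd_shifted = statistics.stdev(shifted)
sd_scaled = statistics.stdev(scaled)
var_scaled = statistics.variance(scaled)

print(f"S.D. = {sd:.4f}, variance = {var:.4f}")
print(f"after adding 10:      S.D. = {sd_shifted:.4f}")
print(f"after multiplying x3: S.D. = {sd_scaled:.4f}, variance = {var_scaled:.4f}")
```

Adding 10 leaves the S.D. unchanged, while multiplying by 3 triples the S.D. and multiplies the variance by 9.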

4 Distributions and Standard Deviations
Example: A normal distribution has a mean of 40 and a standard deviation of 5.
68% of the distribution can be found between what two values?
95% of the distribution can be found between what two values?
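One way to work this example, assuming the distribution is normal (which the 68% and 95% figures imply), is to take one and two standard deviations on either side of the mean. A short sketch using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

dist = NormalDist(mu=40, sigma=5)   # mean and S.D. from the example

# Roughly 68% of a normal distribution lies within 1 S.D. of the mean,
# and roughly 95% lies within 2 S.D.
low_68, high_68 = dist.mean - 1 * dist.stdev, dist.mean + 1 * dist.stdev
low_95, high_95 = dist.mean - 2 * dist.stdev, dist.mean + 2 * dist.stdev

# Check the shares against the normal cumulative distribution function.
share_68 = dist.cdf(high_68) - dist.cdf(low_68)
share_95 = dist.cdf(high_95) - dist.cdf(low_95)

print(f"about 68%: {low_68} to {high_68} (exact share {share_68:.4f})")
print(f"about 95%: {low_95} to {high_95} (exact share {share_95:.4f})")
```

The intervals are 35 to 45 and 30 to 50. The 68% and 95% figures are rounded rules of thumb.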

5 Standard Error of the Mean
Standard Error is an estimate of how much the mean would vary over many samples drawn from the same population.
It is calculated from a single sample; it is an estimate of the standard deviation of the sampling distribution of the mean.
A smaller S.E. suggests that our sample mean is likely a good estimate of the population mean.
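The slide gives no formula, but the usual estimate is the sample S.D. divided by the square root of the sample size. A minimal sketch under that assumption, with made-up data:

```python
import math
import statistics

# Hypothetical sample of nine scores, for illustration only.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14]

n = len(sample)
s = statistics.stdev(sample)         # sample standard deviation
standard_error = s / math.sqrt(n)    # usual estimate: s / sqrt(n)

print(f"n = {n}, s = {s:.4f}, S.E. = {standard_error:.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.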

6 Common Data Representations
Histograms: simple graphs of the frequency of groups of scores.
Stem-and-Leaf Displays: another way of displaying dispersion, particularly useful when you do not have large amounts of data.
Box Plots: yet another way of displaying dispersion. Boxes show the 25th to 75th percentile range, the line within the box shows the median, and "whiskers" show the range of values (min and max).
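The numbers behind a box plot as described here can be sketched as a five-number summary. The data are hypothetical, and quartile conventions differ between packages; this sketch simply uses the default method of `statistics.quantiles`.

```python
import statistics

# Hypothetical scores, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three cut points that split the data into four equal-probability groups.
q1, median, q3 = statistics.quantiles(data, n=4)

summary = {
    "min": min(data),    # lower whisker
    "q1": q1,            # bottom of the box (25th percentile)
    "median": median,    # line within the box
    "q3": q3,            # top of the box (75th percentile)
    "max": max(data),    # upper whisker
}
print(summary)
```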

7 Estimation and Hypothesis Tests: The Normal Distribution
A key assumption for many variables (or specifically, their scores/values) is that they are normally distributed.
In large part, this is because the most common statistics (chi-square, t, F test) rest on this assumption.

8 Why do we make this assumption?
Central Limit Theorem: Errors can be viewed as a sum of many independent random effects, thus individual scores will tend to be normally distributed.
Even if Y is not normally distributed, the distribution of the sample mean will tend to be normal as the sample size increases.
Y = µ + ε: a given score (Y) is the sum of the mean of the population (µ) and some error (ε).
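The second claim can be illustrated with a small simulation. This is a sketch, not a proof: it draws skewed (exponential) scores, an arbitrary choice made here, and looks at how the means of many samples behave. The sample size, number of samples, and seed are all illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
sample_size = 50
num_samples = 2000

# Exponential scores (rate 1): strongly skewed, population mean 1 and S.D. 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / sample_size ** 0.5   # sigma / sqrt(n)

# For a normal distribution, about 68% of values fall within 1 S.D. of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"S.D. of sample means: {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within 1 S.D.:  {within_one_sd:.3f}")
```

Even though individual scores are far from normal, the sample means cluster around the population mean with a spread close to sigma/sqrt(n), and roughly two thirds of them fall within one S.D.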

9 The z-score
Infinitely many normal distributions are possible, one for each combination of mean and variance, but all are related to a single distribution, the standard normal.
Standardizing a group of scores changes the scale to one of standard deviation units.
This allows for comparisons with scores that were originally on a different scale.

10 z-scores (continued)
A z-score tells us where a score is located within a distribution: specifically, how many standard deviation units the score is above or below the mean.
Properties:
The mean of a set of z-scores is zero (why?)
The variance (and therefore standard deviation) of a set of z-scores is 1.
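Both properties can be verified on any set of scores. The sketch below uses made-up data and a hypothetical helper `z_scores`. It standardizes with the population S.D. (an assumption; using the sample S.D. consistently gives the same two properties).

```python
import statistics

def z_scores(scores):
    """Convert raw scores to standard deviation units."""
    m = statistics.fmean(scores)
    s = statistics.pstdev(scores)   # population S.D. (an assumption)
    return [(x - m) / s for x in scores]

# Hypothetical scores, for illustration only.
scores = [55, 60, 65, 70, 75, 80, 85]
z = z_scores(scores)

z_mean = statistics.fmean(z)
z_sd = statistics.pstdev(z)

print([round(v, 2) for v in z])
print(f"mean of z-scores = {z_mean:.6f}, S.D. of z-scores = {z_sd:.6f}")
```

The mean is zero because deviations from the mean always sum to zero, and dividing every deviation by the S.D. rescales the spread to exactly 1.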

11 Area under the normal curve
Example: you have a variable x with a mean of 500 and an S.D. of 15. How common is a score of 525?
Z = (525 - 500)/15 = 1.67
If we look up the z-statistic of 1.67 in a z-score table, we find that the proportion of scores less than our value is .9525.
Or, a score of 525 exceeds .9525 of the population (p < .05).
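The table lookup can be reproduced with the standard normal cumulative distribution function. A short sketch using `statistics.NormalDist`, with z rounded to two decimals as it would be for a printed z table:

```python
from statistics import NormalDist

mean, sd, score = 500, 15, 525

z = round((score - mean) / sd, 2)   # rounded to two decimals, as in a z table
below = NormalDist().cdf(z)         # proportion of scores below 525
above = 1 - below                   # proportion of scores above 525 (one-tailed)

print(f"z = {z}, proportion below = {below:.4f}, proportion above = {above:.4f}")
```

The proportion above the score is what is compared with .05 in the slide's "p < .05".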

12 Issues with Normal Distributions
Skewness
Kurtosis

13 Correlation
Hypothesis testing an association between two metric variables

14 Checking for simple linear relationships
Pearson’s correlation coefficient measures the extent to which two variables are linearly related.
Basically, the correlation coefficient is the average of the cross products of the corresponding z-scores.
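The "average of cross products of z-scores" description translates almost directly into code. The sketch below uses made-up data and hypothetical helper names. It standardizes with the population S.D. and averages over n; with the sample S.D. the sum would be divided by n - 1 instead, giving the same r.

```python
import statistics

def z_scores(values):
    """Standardize values using the population S.D."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """Average of the cross products of corresponding z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return statistics.fmean(a * b for a, b in zip(zx, zy))

# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
```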

15 Correlations
Pearson's r ranges from -1 to +1, where +1 or -1 = perfect linear relationship between the two variables and 0 = no linear relationship.
Remember: correlation ONLY measures linear relationships, not all relationships!
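A small made-up example shows why the warning matters: a perfectly predictable but curved relationship can still produce a correlation of zero. The function `pearson_r` below is a hypothetical helper written for this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]

r_up = pearson_r(x, [2 * v + 1 for v in x])   # perfect increasing line
r_down = pearson_r(x, [-3 * v for v in x])    # perfect decreasing line
r_curve = pearson_r(x, [v ** 2 for v in x])   # perfect curve, but not linear

print(f"increasing line: r = {r_up:.2f}")
print(f"decreasing line: r = {r_down:.2f}")
print(f"y = x squared:   r = {r_curve:.2f}")
```

Here y is completely determined by x in all three cases, yet only the two straight lines give a correlation of magnitude 1.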

16 Correlation Example: Education and Age (General Social Survey 1993)

17 The t-test
Hypothesis testing for the equality of means between two independent groups

18 Alternative Hypotheses Revisited
Alternative Hypotheses:
H1: μ1 < μc
H1: μ1 > μc
H1: μ1 ≠ μc
How do we test to see if the means of two sampled populations are, in fact, different?

19 The t-test
Where:
M = mean
SDM = standard error of the difference between means
N = number of subjects in group
s = standard deviation of group
df = degrees of freedom
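The formula itself did not survive in this transcript, so the following is only a sketch of one common form that matches the listed symbols: the pooled-variance (equal variance assumed) independent-samples t statistic, t = (M1 - M2) / SDM with df = N1 + N2 - 2. The function name and the data are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t statistic for two independent groups (an assumed form)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)

    df = n1 + n2 - 2
    # Pool the two group variances, weighting each by its own degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between means (SDM).
    sdm = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / sdm, df

# Hypothetical scores, for illustration only.
control = [1, 2, 3, 4, 5]
treatment = [2, 4, 6, 8, 10]

t, df = independent_t(control, treatment)
print(f"t = {t:.4f}, df = {df}")
```

The resulting t would then be compared with a critical value from the t distribution with the given degrees of freedom.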

20 Degrees of freedom
d.f. = the number of independent pieces of information from the data collected in a study.
Example: Choosing 10 numbers that add up to 100. We had 10 choices, but the restriction reduced our independent selections to N-1.
In statistics, further restrictions reduce the degrees of freedom.
In the t-test, since we deal with two means, our degrees of freedom are reduced by two (df = N1 + N2 - 2).
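The "10 numbers that add up to 100" example can be sketched directly: nine numbers are chosen freely, and the tenth is then forced by the restriction. The range of the random picks and the seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
total, n = 100, 10

# The first n - 1 numbers can be anything we like.
free_choices = [rng.randint(0, 20) for _ in range(n - 1)]

# The last number is not a free choice: the required total determines it.
last = total - sum(free_choices)

numbers = free_choices + [last]
degrees_of_freedom = n - 1

print(numbers, "sum =", sum(numbers))
print("independent choices:", degrees_of_freedom)
```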

21 Z-distribution versus t-distribution

22 t distribution
As the degrees of freedom increase (towards infinity), the t distribution approaches the z distribution (i.e., a normal distribution).
Because N plays such a prominent role in the calculation of the t-statistic, note that for very large N’s, the sample standard deviation (s) begins to closely approximate the population standard deviation (σ).
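The convergence can be seen by comparing the two density curves for increasing degrees of freedom. The standard library has no t distribution, so this sketch writes out the textbook t density by hand in a hypothetical helper `t_pdf`; the grid and the chosen df values are illustrative.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

z = NormalDist()                           # standard normal (z) distribution
grid = [i / 10 for i in range(-40, 41)]    # x values from -4.0 to 4.0

# Largest gap between the t and z density curves over the grid, per df.
max_gap = {
    df: max(abs(t_pdf(x, df) - z.pdf(x)) for x in grid)
    for df in (1, 5, 30, 1000)
}

for df, gap in max_gap.items():
    print(f"df = {df:>4}: largest gap between t and z densities = {gap:.5f}")
```

The gap shrinks steadily as the degrees of freedom grow, which is why z-based critical values become a reasonable approximation for large samples.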

23 Assumptions Underlying the Independent Sample t-test
Assumption of Normality
Assumption of Homogeneity of Variance
The outputs for the t-test in SPSS correspond to the standard t-test (equal variance assumed) and a separate variance t-test (equal variance not assumed).
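The separate variance version is commonly computed as Welch's t-test, which keeps each group's variance separate and adjusts the degrees of freedom (the Welch-Satterthwaite approximation). The sketch below assumes that is the form intended; the function name and data are hypothetical.

```python
import math
import statistics

def welch_t(group1, group2):
    """Separate-variance (Welch) t statistic with adjusted degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Squared standard error of each group mean, kept separate (not pooled).
    v1 = statistics.variance(group1) / n1
    v2 = statistics.variance(group2) / n2

    t = (statistics.fmean(group1) - statistics.fmean(group2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical scores with clearly unequal spread, for illustration only.
control = [1, 2, 3, 4, 5]
treatment = [2, 4, 6, 8, 10]

t, df = welch_t(control, treatment)
print(f"t = {t:.4f}, df = {df:.3f}")
```

With equal group sizes the t value matches the pooled version, but the degrees of freedom drop below N1 + N2 - 2 when the variances differ, making the test more conservative.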

24 Practical Example: Do men and women watch different amounts of TV per week? (General Social Survey 1993)