Chapter 5 Introduction to Inferential Statistics.

Definition
- infer - vt., to arrive at a decision or opinion by reasoning from known facts or evidence.

Sample A sample comprises a part of the population selected for a study.

Random Samples If every score in the population has an equal chance of being selected each time you choose a score, the result is a random sample. Random samples, and only random samples, are representative of the population from which they are drawn.

Q: ON WHAT MEASURES IS A RANDOM SAMPLE REPRESENTATIVE OF THE POPULATION?

Q: ON WHAT MEASURES IS A RANDOM SAMPLE REPRESENTATIVE OF THE POPULATION? A: ON EVERY MEASURE.

REPRESENTATIVE ON EVERY MEASURE
- The mean of the random sample's height will be similar to the mean of the population.
- The same holds for weight, IQ, ability to remember faces or numbers, the size of their livers, self-confidence, how many children their aunts had, etc., etc., etc. ON EVERY MEASURE THAT EVER WAS OR CAN BE.
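
A minimal simulation of this claim, using an invented population; the measures and their distributions (height, weight, IQ) are made up for illustration, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                                # hypothetical population size
population = {
    "height": rng.normal(170, 10, N),      # cm
    "weight": rng.normal(70, 12, N),       # kg
    "iq":     rng.normal(100, 15, N),
}

# One random sample of 500 people (the same people for every measure).
idx = rng.choice(N, size=500, replace=False)
for measure, scores in population.items():
    print(f"{measure:>6}: population mean = {scores.mean():7.2f}, "
          f"sample mean = {scores[idx].mean():7.2f}")
# The same 500 randomly chosen people land close to the population
# mean on every measure at once.
```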

All sample statistics are representative of their population parameters
- The sample mean is a least squares, unbiased, consistent estimate of the population mean.
- MS_W is a least squares, unbiased, consistent estimate of the population variance.

In Chapter 7, you will learn about the population correlation coefficient, rho, and its estimate based on a random sample, called r.
- Based on what you know about sample statistics so far, what would you say about the relationship of r and rho?

r should be (and is) a least squares, unbiased, consistent estimate of rho.

On what measures will sample statistics be least squares, unbiased, and consistent estimates of their population parameters?

REPRESENTATIVE ON measures of central tendency (the mean), on measures of variability (e.g., sigma²), and on all derivative measures
- For example, the way scores fall around the mean of a random sample (as indexed by MS_W) will be similar to the way scores fall around the mean of the population (as indexed by sigma²).

THERE ARE OCCASIONAL RANDOM SAMPLES THAT ARE POOR REPRESENTATIVES OF THEIR POPULATION
- But 1.) we will take that into account.
- And 2.) most samples are fairly to very good representatives of their populations.

Population Parameters and Sample Statistics: Nomenclature The characteristics of a population are called population parameters. They are usually represented by Greek letters (μ (mu), σ (sigma)). The characteristics of a sample are called SAMPLE STATISTICS. They are usually represented by the English alphabet (X̄, s).

Three things we can do with random samples
- Estimate population parameters. This is called estimation research.
- Estimate the relationship between variables in the population from their relationship in a random sample. This is called correlational research.
- Compare the responses of random samples drawn from the same population to different conditions. This is called experimental research.

Estimating population parameters
- Sample statistics are least squares, unbiased, consistent estimates of their population parameters.
- We'll get to this in a minute, in detail.

Correlational Research We observe the relationship among variables in a random sample. We are unlikely to find strong relationships purely by chance. When you study a sample and the relationship between two variables is strong enough, you can infer that a similar relationship between the variables will be found in the population as a whole. This is called correlational research. For example, height and weight are co-related.
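
A sketch of the idea in code. The height and weight data are invented, and Pearson's r (computed here with numpy's corrcoef) is not formally introduced until Chapter 7:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two variables measured on the same 200 randomly sampled people.
height = rng.normal(170, 10, 200)                        # X variable (cm)
weight = 0.9 * (height - 170) + rng.normal(70, 8, 200)   # Y, co-related with X

r = np.corrcoef(height, weight)[0, 1]   # Pearson's r
print(f"r = {r:.2f}")
# A relationship this strong is very unlikely to appear purely by
# chance, so we infer a similar relationship holds in the population.
```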

Another way to describe correlational research
- A key datum in psychology is that individuals differ from each other in fairly stable ways.
- For example, some people learn foreign languages easily, while others find it more difficult.
- Correlational research allows us to determine whether individual differences on one variable (called the X variable) are related to individual differences on a second variable (called Y).

What is to come - CH. 6 & 7
- In Chapter 6, you will learn to turn scores on different measures from a sample into t scores, scores that can be directly compared to each other. (You will also learn to use the t distribution to create confidence intervals and test hypotheses.)
- In Chapter 7, you will learn to compute a single number that describes the direction and consistency of the relationship between two variables. That number is called Pearson's r, the correlation coefficient.

What is to come - CH. 8
- In Chapter 8, you will learn to predict scores on one variable from scores on another variable when you know (or can estimate) the correlation coefficient.
- In Chapter 8, you will also learn when not to do that, and instead to go back to predicting that everyone will score at the mean of their distribution.

What is to come - CH 9 – 11 Experimental Research In Chapters 9 – 11 you will learn about experiments. In an experiment, we start with samples that can be assumed to be similar and then treat them differently. Then we measure response differences among the samples and make inferences about whether or not similar differences would occur in response to similar treatment in the whole population. For example, we might expose randomly selected groups of depressed patients to different doses of a new drug to see which dose produces the best result. If we got clear differences, we might suggest that all patients be treated with the dose that had the best results in the sample.

The logic of experimentation
- We start off with groups that are all random samples from a single population.
- As we add scores to each group, the group means become more and more similar to the population mean and to each other.
- The variation of each group's scores around its own mean becomes more and more similar to sigma². Thus, the mean squares in each group become more and more similar to each other.
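
A quick check of this logic under assumed population values (μ = 72, σ = 12, chosen arbitrarily): several random samples from one population grow more alike as scores are added:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 72.0, 12.0

for n in (5, 50, 500):
    groups = rng.normal(mu, sigma, size=(3, n))   # 3 untreated groups
    means = groups.mean(axis=1)
    # With equal group sizes, pooling SS_W over n - k df equals the
    # average of the per-group variance estimates (ddof=1).
    ms_w = groups.var(axis=1, ddof=1).mean()
    print(f"n={n:3d}  group means = {np.round(means, 2)}  MS_W = {ms_w:6.2f}")
# With larger n, the three means cluster tightly around mu = 72 and
# MS_W approaches sigma² = 144.
```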

The groups become more alike in every way, on every possible measure.
- At the beginning of an experiment, the different groups are alike.
- They then get treated differently.
- We then determine whether they differ more after being treated differently than they would if the different treatments did not have different effects.

Note that individual differences underlie our ability to do experimental research as well as correlational research
- Many people believe that a science of psychology is impossible because people differ too much to be the subject of a science.

It is obvious that individual differences underlie our ability to do correlational research, but why does that work for experimental research as well?

- To do experiments, we must start with groups that are similar to each other in every way.
- Why are the groups alike at the beginning of an experiment? Because each person's score is different, and each different, randomly selected score tends to correct each group's sample statistics back towards their population parameters and toward those of the other groups.
- This happens on every conceivable measure.
- Thus, as you add people to your samples, individual differences make random samples from the same population similar in every way.

In summary, individual differences underlie our ability to compose groups that are the same, not different, at the beginning of the experiment.
- This is important enough for us to go over it at the end of class.

In this chapter, we will focus on estimating population parameters from sample statistics.

Definition A least squares estimate is a number that, on average, has the minimum possible squared distance from the number it estimates. We will study sample statistics that are least squares estimates of their population parameters.

Definition An unbiased estimate is one around which deviations sum to zero. We will study sample statistics that are unbiased estimates of their population parameters.

Definition A consistent estimator is one where the larger the number of randomly selected scores underlying the sample statistic, the closer the statistic will tend to come to the population parameter. We will study sample statistics that are consistent estimates of their population parameters.
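
A small demonstration of consistency, assuming a normal population with arbitrary μ = 72 and σ = 12:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 72.0, 12.0

for n in (4, 16, 64, 256, 1024):
    # Average absolute error of the sample mean over many random samples.
    errors = [abs(rng.normal(mu, sigma, n).mean() - mu) for _ in range(2000)]
    print(f"n={n:5d}  mean |X-bar - mu| = {np.mean(errors):.3f}")
# The typical error shrinks steadily as n grows: the sample mean tends
# to come closer to the population parameter, i.e., it is consistent.
```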

The sample mean The sample mean is called X-bar and is written X̄. X̄ is the best estimate of μ, because it is a least squares, unbiased, consistent estimate. X̄ = ΣX / n

Estimated variance The estimate of σ² is called the mean squared error and is represented by MS_W. Like our other statistics, MS_W is a least squares, unbiased, consistent estimate of its population parameter, σ².
SS_W = Σ(X - X̄)²
MS_W = SS_W / (n - k) = Σ(X - X̄)² / (n - k)

Estimated standard deviation The estimate of σ is called s. s = √MS_W

Estimating mu and sigma - single sample

S#    X    (X - X̄)    (X - X̄)²
A     6      0.00       0.00
B     8      2.00       4.00
C     4     -2.00       4.00

ΣX = 18    Σ(X - X̄) = 0.00    Σ(X - X̄)² = 8.00 = SS_W
n = 3, X̄ = ΣX/n = 18/3 = 6.00
MS_W = SS_W/(n - k) = 8.00/2 = 4.00
s = √MS_W = 2.00
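
The same computation in code, reproducing the numbers above:

```python
import numpy as np

x = np.array([6.0, 8.0, 4.0])   # scores for subjects A, B, C
n, k = len(x), 1                # one group, so k = 1
x_bar = x.sum() / n             # X-bar = 18 / 3 = 6.00
dev = x - x_bar                 # deviations sum to 0.00
ss_w = (dev ** 2).sum()         # SS_W = 8.00
ms_w = ss_w / (n - k)           # MS_W = 8.00 / 2 = 4.00
s = np.sqrt(ms_w)               # s = 2.00
print(x_bar, dev.sum(), ss_w, ms_w, s)
```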

Estimating sigma from multiple samples

[Worked table: twelve scores in three groups (n = 12, k = 3). Within each group, the deviations around that group's mean (X̄₁, X̄₂, X̄₃) sum to 0.00. SS_W pools the squared deviations across the three groups; MS_W = SS_W/(n - k) = SS_W/9; s = √MS_W = 12.32.]
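
The procedure in code. Since the slide's individual scores are not recoverable, the three groups below are hypothetical; only the method (pool SS_W across groups, divide by n - k = 9) matches the example:

```python
import numpy as np

# Hypothetical data: three groups of four scores each.
groups = [np.array([60., 75., 84., 69.]),
          np.array([55., 80., 71., 66.]),
          np.array([90., 62., 77., 73.])]

k = len(groups)                         # 3 groups
n = sum(len(g) for g in groups)         # 12 scores in all
# Pool the squared deviations around each group's own mean.
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_w = ss_w / (n - k)                   # df = 12 - 3 = 9
s = np.sqrt(ms_w)
print(f"SS_W = {ss_w:.2f}, MS_W = SS_W/9 = {ms_w:.2f}, s = {s:.2f}")
```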

Why n-k?
- This has to do with "degrees of freedom."
- As you saw last chapter, each time you add a score to a sample, you pull the sample statistic toward the population parameter.

Any score that isn't free to vary does not tend to pull the sample statistic toward the population parameter.
- One deviation in each group is constrained by the rule that deviations around the mean must sum to zero. So one deviation in each group is not free to vary.
- Deviation scores underlie our computation of SS_W, which in turn underlies our computation of MS_W.

n-k is the number of degrees of freedom for MS_W
- You use the deviation scores as the basis of estimating sigma² with MS_W.
- Scores that are free to vary are called degrees of freedom.
- Since one deviation score in each group is not free to vary, you lose one degree of freedom for each group; with k groups you lose k degrees of freedom.
- There are n deviation scores in total. k are not free to vary. That leaves n-k that are free to vary: n-k degrees of freedom for MS_W, your estimate of sigma².
- The precision or "goodness" of an estimate is based on degrees of freedom. The more df, the closer the estimate tends to get to its population parameter.
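
A simulation of why the divisor matters, under an assumed population with σ² = 144: dividing SS_W by n systematically underestimates σ², while dividing by n - k does not:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 144.0
biased, unbiased = [], []
for _ in range(20_000):
    g = rng.normal(72.0, 12.0, 5)        # one group: n = 5, k = 1
    ss = ((g - g.mean()) ** 2).sum()     # SS_W for this sample
    biased.append(ss / 5)                # divide by n
    unbiased.append(ss / 4)              # divide by n - k
print(f"true sigma² = {sigma2}")
print(f"SS/n     averages {np.mean(biased):7.2f}  (too small)")
print(f"SS/(n-k) averages {np.mean(unbiased):7.2f}  (on target)")
```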

More scores that are free to vary = better estimates: the mean as an example. Each time you add a randomly selected score to your sample, it is most likely to pull the sample mean closer to mu, the population mean. Any particular score may pull it further from mu. But, on the average, as you add more and more scores, the odds are that you will be getting closer to mu.

Book example Population is 1,320 students taking a test. μ is 72.00, σ = 12. Unlike estimating the variance (where df = n-k), when estimating the mean all the scores are free to vary. So each score in the sample will tend to make the sample mean a better estimate of mu. Let's randomly sample one student at a time and see what happens.

[Figure: frequency histogram of the test scores, with the sampled scores marked and the running sample means and standard deviations shown as students are added one at a time.]
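
A sketch of the book example. The population is simulated from μ = 72.00 and σ = 12, so the particular scores drawn differ from the book's:

```python
import numpy as np

rng = np.random.default_rng(5)
population = rng.normal(72.0, 12.0, 1320)   # the 1,320 test scores

# Draw students one at a time (without replacement) and watch the
# running sample mean.
draws = rng.choice(population, size=30, replace=False)
for n in (1, 2, 5, 10, 30):
    print(f"after {n:2d} students: sample mean = {draws[:n].mean():6.2f}")
# Any single score may be far from 72, but the running mean tends to
# settle toward mu as students are added.
```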

Consistent estimators This tendency to pull the sample mean back to the population mean is called “regression to the mean”. We call estimates that improve when you add scores to the sample consistent estimators. Recall that the statistics that we will learn are: consistent, least squares, and unbiased.

A philosophical point
- Many intro psych students wonder about psychologists' efforts to understand and predict how the average person will respond to specific situations or conditions.
- They feel that people are too different for us to really determine such laws.

While psychology as a "science" can be criticized on a number of grounds, this is not one of them.
- The random differences among individuals form one of the bases of our science. Let's go over the importance of individual differences again.

The effect of individual differences
- As you know, each time you add a score that is free to vary to a sample, the sample statistics become better estimates of their population parameters.
- This happens, in part, because individuals differ randomly. So, each person's score corrects the sample statistics back towards their population parameters.
- These include the mean, the standard deviation, and other statistics that you have yet to learn, such as r, the correlation coefficient that describes the strength and direction of a relationship between two variables.

Individual differences are central in correlational research
- In correlational research, we compare individuals who differ on two variables.
- We see whether differences on one variable are related to differences on the other.
- If individuals did not have (reasonably) stable differences from each other, there could be no such correlational research.

Individual differences underlie experimental research
- In experimental work, we need to do two things.
- First, we need to compose groups that are similar to each other.
- The key to this is randomly selecting members for each experimental group.
- That way, as you add individuals who randomly differ to each group, the groups increasingly resemble the population (and, therefore, each other) in all possible regards.

Second, in experimental work, we examine the differences among the means of experimental groups. Before extrapolating to the population from differences observed among group means, we must be sure that we are not simply seeing the results of sampling fluctuation.

We must have an index of how much variation among the means simply reflects sampling fluctuation. In most of the designs we will study in Chapters 9-11, MS_W tells us how much variation among the means we should have simply from sampling fluctuation. Individual differences play a large part in determining MS_W.

Thus, rather than making a science of human behavior impossible, the fact that individuals differ plays a critical role in the research designs and statistical tools that have been developed.