Week 6. Statistics etc. GRS LX 865 Topics in Linguistics.

Update on our sentence processing experiment… Quick graph of reaction time per region.

Update Seems nice; there’s a difference in region 5 (where the NP, I, they, John were) and also in region 6. From slowest to fastest: John, he, NP, I. Something like what we expected—but wait…

Update Two further things that this didn’t account for: Different people read at different speeds. I is a lot shorter than the photographer. Might it go faster? To take account of people’s reading speeds, tried average RT per character on the fillers.

Subject RT/c Average RT per character was pretty much all over the map. So at least it seemed worth factoring out. Overhead?
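
The reading-speed correction described above can be sketched roughly like this. All of the numbers and subject labels here are made up for illustration; the real per-subject rates would come from the filler trials in the experiment.

```python
# Sketch of the reading-speed normalization: subtract from each region's RT
# what the subject's own filler reading rate predicts for a region that long.
# All numbers below are hypothetical.

# Average raw RT (ms) per character on the fillers, per subject:
filler_rt_per_char = {"s1": 35.0, "s2": 50.0, "s3": 28.0}

# Raw RT (ms) for one critical region, and that region's length in characters:
region_rt = {"s1": 420.0, "s2": 700.0, "s3": 392.0}
region_len = 14

# Residual RT: observed RT minus the subject's predicted RT for this length.
residual = {
    s: region_rt[s] - filler_rt_per_char[s] * region_len
    for s in region_rt
}
# s1: 420 - 35*14 = -70.0  (faster than this subject's own baseline)
# s2: 700 - 50*14 =   0.0  (exactly at baseline)
# s3: 392 - 28*14 =   0.0
```

A negative residual means the subject read that region faster than their own filler baseline predicts, so subject-level speed differences drop out of the comparison.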

Items? It’s also important to look at the items. Were any always incorrect? Those might have been too hard or had something else wrong with them. (Not clear that we actually care whether the answer was right.)

End result so far? So, taking that all into account, I ended up with this… Not what we were going for.

So… There’s still work to be done. Since I’m not sure exactly what work that is, once again… no lab work to do. Instead, we’ll talk about statistics generally… Places to go:

Measuring things When we go out into the world and measure something like reaction time for reading a word, we’re trying to investigate the underlying phenomenon that gives rise to the reaction time. When we measure reaction time of reading I vs. they, we are trying to find out if there is a real, systematic difference between them (such that I is generally faster).

Measuring things So, suppose for any given person, it takes A ms to read I and B ms to read they. If our measurement worked perfectly, we’d get A whenever we measure for I and B whenever we measure for they. But it’s a noisy world.

Measuring things Measurement never works perfectly. There is always additional noise of some kind or another. You’re likely to get a value near A when you measure I, but you’re not guaranteed to get A. Similarly, there are differences between subjects, differences between items, differences of still other sorts…

A common goal Commonly what we’re after is an answer to the question: are these two things that we’re measuring actually different? So, we measure for I and for they. Of the measurements we’ve gotten, I seems to be around A, they seems to be around B, and B is a bit longer than A. The question is: given the inherent noise of measurement, how likely is it that we got that difference just by chance?

Some stats talk There are two major uses for statistics: describing a set of data in some comprehensible way, and drawing inferences from a sample about a population. That last one is the useful one for us; by picking some random representative sample of the population, we can estimate characteristics of the whole population by measuring things in our sample.

Normally… Many things we measure, with their noise taken into account, can be described (at least to a good approximation) by the “bell-shaped” normal distribution. Often as we do statistics, we implicitly assume that this is the case…

First some descriptive stuff Central tendency: What’s the usual value for this thing we’re measuring? There are various ways to measure it; the most common is the arithmetic mean (“average”). The average is determined by adding up the measurements and dividing by the number of measurements.

Descriptive stats Spread: How often is the measurement right around the mean? How far out does it get? Range (maximum - minimum), kind of basic. Variance, standard deviation: a more sophisticated measure of the width of the measurement distribution. You describe a normal distribution in terms of two parameters, mean and standard deviation.
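
All of the descriptive measures just listed are in Python’s standard library; here they are on a small set of made-up RTs (not data from our experiment):

```python
import statistics

rts = [480, 500, 520, 510, 490]   # hypothetical RTs in ms

mean = statistics.mean(rts)       # arithmetic mean: sum / count = 500
rng = max(rts) - min(rts)         # range: maximum minus minimum = 40
var = statistics.variance(rts)    # sample variance (n - 1 denominator) = 250
sd = statistics.stdev(rts)        # sample standard deviation = sqrt(250) ≈ 15.81
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n − 1) denominator, which is what you want when the data are a sample from a larger population.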

Interesting facts about stdev About 68% of the observations will be within one standard deviation of the mean. About 95% of the observations will be within two standard deviations of the mean. Percentile (mean 80, score 75, stdev 5): 15.9.
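
The percentile example on this slide can be reproduced from the normal cumulative distribution function, which the standard library’s error function gives us directly:

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the error function."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The slide's example: mean 80, score 75, stdev 5.
# A score of 75 is one standard deviation below the mean (z = -1).
pct = 100 * normal_cdf(75, 80, 5)
print(round(pct, 1))  # 15.9
```

This is also where the 68%/95% facts come from: `normal_cdf(1, 0, 1) - normal_cdf(-1, 0, 1)` is about 0.683, and the same for ±2 is about 0.954.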

So, more or less, … If we knew the actual mean of the variable we’re measuring and the standard deviation, we can be 95% sure that any given measurement we do will land within two standard deviations of that mean, and 68% sure that it will be within one. Of course, we can’t know the actual mean. But we’d like to.

Confidence intervals It turns out that you can run this logic in reverse as well, coming up with a confidence interval (I won’t tell you how precisely, but here’s the idea): given where you see the measurements coming up, they must be 68% likely to be within one standard deviation of the mean, and 95% likely to be within two, so the more measurements you have, the better guess you can make. A 95% CI written as “lower < µ < upper” means “we’re 95% confident that the real mean is in there”.
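
A minimal sketch of that reverse logic, on hypothetical data: the standard error of the mean shrinks as the sample grows, so the interval tightens with more measurements. This uses the large-sample z value 1.96; with only five observations a t critical value (about 2.78 for 4 degrees of freedom) would strictly be more accurate.

```python
import math
import statistics

rts = [480, 500, 520, 510, 490]   # hypothetical sample, in ms
m = statistics.mean(rts)          # 500

# Standard error of the mean: sample SD shrunk by sqrt(n).
se = statistics.stdev(rts) / math.sqrt(len(rts))

# 95% CI via the normal approximation (z = 1.96).
lo, hi = m - 1.96 * se, m + 1.96 * se
print(round(lo, 1), round(hi, 1))  # 486.1 513.9
```

Read the result as: 486.1 < µ < 513.9, i.e. we’re 95% confident the real mean is in there.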

Hypothesis testing Testing to see if the means generating two distributions are actually different. The idea is to determine how likely it is that we could get the difference we observe by chance. After all, you could roll 25 6’es in a row; it’s just very unlikely: (1/6)^25. (Null hypothesis = chance.) Once you estimate the sample means and standard deviations, this is something you basically look up (t-test, based on the number of observations you make). This is what you see reported as p. “p < 0.05” means there’s only a 5% chance this happened by accident.
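
Here is what that look-up is computing under the hood: a pooled two-sample t statistic, on made-up RTs for I vs. they (not our real data). The statistic is the observed difference in means divided by its estimated standard error; you then compare it against a t table for the right degrees of freedom.

```python
import math
import statistics

# Hypothetical RTs (ms) for "I" vs. "they":
a = [500, 510, 490, 505, 495]   # mean 500
b = [520, 535, 515, 530, 525]   # mean 525

n1, n2 = len(a), len(b)

# Pooled variance (assumes the two groups have similar spread):
sp2 = ((n1 - 1) * statistics.variance(a)
       + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)

# t = difference in means / standard error of that difference.
t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(sp2 * (1/n1 + 1/n2))
df = n1 + n2 - 2

print(round(t, 2), df)  # 5.0 8
```

With 8 degrees of freedom, the two-tailed 0.05 critical value is about 2.31, so a t of 5.0 would come back from the table as p < 0.05.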

Significance Generally, 0.05 is taken to be the level of “significance”—if the difference you measure only has a 5% chance of having arisen by pure accident, then that difference is significant. There’s no real magic about 0.05; it’s just a convention. Hard to say that a p-value just below 0.05 and one just above it are seriously qualitatively different.

ANOVA Analysis of variance—same idea as the t-test, except for more than two means at once. Still trying to discover if there are differences in the underlying distributions of several means that are unlikely to have arisen just by chance. I hope to come back to this. Perhaps it can be tacked on to a different lab.
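
To show what the t-test generalizes to, here is a hand-rolled one-way ANOVA F statistic on three made-up groups. The F statistic compares the variability between the group means to the variability within the groups.

```python
# One-way ANOVA F statistic, computed by hand on hypothetical data.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

all_obs = [x for g in groups for x in g]
grand_mean = sum(all_obs) / len(all_obs)

# Between-groups sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-groups sum of squares: spread of observations around their own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1            # 2
df_within = len(all_obs) - len(groups)  # 6
f = (ss_between / df_between) / (ss_within / df_within)
print(round(f, 2))  # 3.0
```

An F of 3.0 on (2, 6) degrees of freedom is below the 0.05 critical value (about 5.14), so these three means would not come out as significantly different.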

Statistical power In general, the more samples you get, the better off you are—the more statistical power your analysis has. Power also goes up as the variance goes down, and depends on the significance level you’ve chosen. Technically, statistical power has to do with how likely it is that you will correctly reject a false null hypothesis.

                   H0 true        H0 false
 Reject H0         Type I error   Correct
 Do not reject H0  Correct        Type II error
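
For a simple z-test with a known population SD, power can be computed in closed form; this sketch (all parameter values hypothetical) shows how power depends on the effect size, the noise, the sample size, and the alpha level:

```python
import math

def normal_cdf(z):
    """Standard normal P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

delta = 10.0    # hypothetical true difference, in ms
sigma = 20.0    # known population SD
n = 25          # sample size
z_crit = 1.96   # two-sided critical value at alpha = 0.05

# How many standard errors the true difference shifts the sampling distribution:
shift = delta / (sigma / math.sqrt(n))   # 2.5

# Power: probability the test statistic lands in either rejection region
# when the null hypothesis is in fact false.
power = (1 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)
# comes out around 0.705: roughly a 70% chance of correctly
# rejecting the false null hypothesis with this design
```

Increasing `n` or `delta`, or decreasing `sigma`, pushes `shift` up and power toward 1; the remaining ~30% here is the Type II error rate in the table above.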

Correlation and Chi square Correlation between two measured variables is often measured in terms of (Pearson’s) r. If r is close to 1 or -1, the value of one variable can predict quite accurately the value of the other. If r is close to 0, predictive power is low. The chi-square test is supposed to help us decide if two conditions/factors are independent of one another or not. (Does knowing one help predict the effect of the other?)
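
Both measures are short computations; here is each one by hand on made-up data (a perfectly linear pair of variables for r, and a 2×2 table of counts for chi-square):

```python
import math

# Pearson's r: covariance scaled by the two standard deviations.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
print(round(r, 6))  # 1.0  -- one variable predicts the other exactly

# Chi-square statistic for a 2x2 table of hypothetical counts:
observed = [[20, 30], [30, 20]]
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
total = sum(row_tot)

# Sum of (observed - expected)^2 / expected over all four cells,
# where expected = row total * column total / grand total.
chi2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / total) ** 2
           / (row_tot[i] * col_tot[j] / total)
           for i in range(2) for j in range(2))
print(round(chi2, 1))  # 4.0
```

A chi-square of 4.0 on 1 degree of freedom is above the 0.05 critical value (about 3.84), so this table would come out as evidence that the two factors are not independent.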

Much more to it… Mainly I just wanted you to see some terminology. I hope to get some workable data from some experiment or lab we do that we can put into a stats program, perhaps just WebStat. …