Statistics for Linguistics Students Michaelmas 2004 Week 3 Bettina Braun www.phon.ox.ac.uk/~bettina.



Overview Discussion of last assignment Z-scores Sampling distributions Confidence intervals Hypothesis testing Type I and Type II errors

General comments Please let every file you submit contain your initials and the week the assignment was given! Please put your name somewhere on the page. Paste figures into the doc-file (or rtf-file) and only submit the .sav-files. Name the x- and y-axis of the figures and give them a title. Do not work with var0001 (name and label variables). Scale figures so that numbers are readable.

Manipulating figures If you want to copy SPSS-figures into your document, it is sensible to increase the font sizes (otherwise they'll be too difficult to read). You might also want to change the title or legend, ... Double-click on any figure to do so.

Measures of central tendency Interval data, roughly normally distributed (less appropriate for skewed distributions) → mean (although mode and median should give the same result!) Interval data, strongly skewed → mode, median. Categorical data (different versions, …) → mode

Sentence lengths It is very likely that most sentences do not exceed 20 to 30 words, but there will be a few sentences that are very long… N.B. It is likely that the distribution of sentence lengths in Th. Mann is skewed to the left…

Preference for 3 resynthesised versions Suppose this were the outcome:
version   # subjects
a         20
b         37
c         18
Coding: a=1, b=2, c=3 → mean = 1.97. Mode is more meaningful! If you are reporting a mean, one might think there is a normal distribution
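A minimal sketch of why the mean is misleading here, using the counts and the coding from the slide. The variable names are made up for illustration.

```python
from statistics import mean

# Counts from the slide: version -> number of subjects
counts = {"a": 20, "b": 37, "c": 18}
# Arbitrary numeric coding used on the slide
coding = {"a": 1, "b": 2, "c": 3}

# Expand the counts into one coded response per subject
responses = [coding[v] for v, n in counts.items() for _ in range(n)]

coded_mean = mean(responses)                  # depends on the arbitrary coding
modal_version = max(counts, key=counts.get)   # most frequently chosen version

print(round(coded_mean, 2), modal_version)    # 1.97 b
```

Recoding the versions (say a=3, b=1, c=2) would change the mean but not the mode, which is why the mode is the safer summary for categorical data.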

Merging datasets [Table contents not preserved; the surviving column headers are "Year", "results", "Year 90", "year …".] This is how to organise observations from the same person in different years

Describe this distribution

Normal distribution (Gaussian distribution) Example: IQ scores, mean=100, sd=16 Mean = Median = Mode

z-scores Z-score: deviation of given score from the mean in terms of standard deviations
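The definition can be written as a one-line function. This is a small sketch; the IQ scale (mean 100, sd 16) comes from the normal-distribution example, while the score of 132 is a made-up value.

```python
def z_score(x, mean, sd):
    """Deviation of x from the mean, expressed in standard deviations."""
    return (x - mean) / sd

# IQ scale from the slides: mean = 100, sd = 16.
# The score 132 is a hypothetical value chosen for illustration.
print(z_score(132, 100, 16))  # 2.0
```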

How likely is a given event? Example: time to utter a particular sentence: mean = 3.45 s and sd = 0.84 s. Questions: –What proportion of the population of utterance times will fall below 3 s? –What proportion would lie between 3 s and 4 s? –What is the time value below which we will find 1% of the data?
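One way to answer the three questions, assuming the utterance times are normally distributed with the mean and sd given on the slide. The sketch uses Python's standard-library `statistics.NormalDist` instead of a z-table.

```python
from statistics import NormalDist

# Utterance times from the slide, assumed to be normally distributed
times = NormalDist(mu=3.45, sigma=0.84)

below_3s = times.cdf(3.0)                             # proportion below 3 s
between_3s_and_4s = times.cdf(4.0) - times.cdf(3.0)   # proportion between 3 s and 4 s
first_percentile = times.inv_cdf(0.01)                # time below which 1% of the data lie

print(f"{below_3s:.3f} {between_3s_and_4s:.3f} {first_percentile:.2f}")
```

Under this assumption roughly 30% of the times fall below 3 s, roughly 45% lie between 3 s and 4 s, and the 1% cut-off is at about 1.5 s.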

Sample mean and sd as parameter estimators Mean and standard deviation of the population are unknown But we can use the sample mean and sd as estimators for the parameters of the unknown population

Sample mean and sd as estimators
                     Population parameter   Sample statistic
mean                 μ                      x̄
standard deviation   σ                      s
Degrees of freedom (n − 1): the number of scores that contain new information; dividing by n − 1 rather than n gives a better estimator of the population parameter
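A short sketch of the two estimators, with the sample standard deviation computed on n − 1 degrees of freedom. The durations are made-up values, not data from the course.

```python
from math import sqrt

def sample_mean(scores):
    """Estimator of the population mean."""
    return sum(scores) / len(scores)

def sample_sd(scores):
    """Estimator of the population sd, using n - 1 degrees of freedom."""
    m = sample_mean(scores)
    squared_deviations = sum((x - m) ** 2 for x in scores)
    return sqrt(squared_deviations / (len(scores) - 1))

# Hypothetical utterance durations in seconds, for illustration only
durations = [3.1, 2.8, 4.0, 3.6, 3.5]
print(round(sample_mean(durations), 2), round(sample_sd(durations), 2))
```

Dividing by n instead of n − 1 would systematically underestimate the population variance, because the deviations are taken from the sample mean rather than from the unknown population mean.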

From sample statistics to population parameters We only know the statistics of our sample. Sample statistics will differ from population parameters. Knowledge about the sampling distribution of the statistic (i.e. how it would behave if a large number of samples were taken) tells us how well the statistic estimates the parameter (degree of confidence)

Sampling distribution Population (mean 4.9, sd 3.1); 100 samples with n=50; 3 examples (taken from www.fw.umn.edu/FW5601/ALAB/Lab5/LAB4_BA2.HTM)

Sampling distribution Relative frequency of the 100 sample means: mean of the sample means: 4.9; sd of the sample means: 0.46. Note: –Shape of sampling distribution roughly normal –Mean of sampling distribution is population mean –sd of the sampling distribution is smaller than the population sd
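The experiment can be imitated with a small simulation. This is a sketch under an explicit assumption: the population is taken to be normal with mean 4.9 and sd 3.1, although the slide does not state its shape. The seed is arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed, arbitrary seed so the run is repeatable

POP_MEAN, POP_SD = 4.9, 3.1   # population values from the slide
N_SAMPLES, N = 100, 50        # 100 samples of n = 50, as on the slide

# Assumption: a normal population; the slide does not state its shape
sample_means = [
    mean([rng.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(N_SAMPLES)
]

mean_of_means = mean(sample_means)   # should lie close to the population mean
sd_of_means = stdev(sample_means)    # should be much smaller than the population sd

print(round(mean_of_means, 2), round(sd_of_means, 2))
```

The exact numbers vary with the seed, but the mean of the sample means stays near 4.9 and their spread stays well below 3.1.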

Central limit theorem Terminology: Standard deviation of the sampling distribution of the means is called standard error of the mean (SE) n=30
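As a worked formula (standard result, not shown in the transcript): the standard error of the mean is the population standard deviation divided by the square root of the sample size. Plugging in the population sd of 3.1 and n = 50 from the sampling-distribution example gives a value close to the 0.46 observed for the 100 sample means.

```latex
SE = \frac{\sigma}{\sqrt{n}}
\qquad\text{e.g.}\qquad
SE = \frac{3.1}{\sqrt{50}} \approx 0.44
```

When σ is unknown, the sample standard deviation s is used in its place.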

Experimental research Often, we are interested in whether human behaviour depends on certain factors. E.g. –Is the speech rate dependent on the dialectal region? –Do foreigners and native speakers produce sentences with the same number of words?

Dependent and independent variables Independent variable: –Variable(s) manipulated by the experimenter –experimenter determines the values it will assume –Independent variables may have a number of different levels Dependent variable: –Measure of behaviour (not manipulated or controlled by experimenter)

Examples What are the dependent and independent variables in the following questions? –Is the speech rate dependent on the dialectal region of the speakers? –Do foreigners and native speakers produce sentences with the same number of words? –Is the articulatory precision dependent on the part-of-speech? –Do different word orders influence the grammaticality judgements of subjects?

Null-hypothesis H0 Generally phrased to negate the possibility of a relationship between the independent and dependent variables. If the null-hypothesis is true, there is no relationship between dependent and independent variables. The alternative hypothesis contradicts the null-hypothesis

Statistical tests of significance Allow us to evaluate the probability that the observed sample values would occur if the null hypothesis were true. If that probability is sufficiently low, the null hypothesis can be rejected. In other words: they provide evidence for concluding (with a specified risk of error) that there are or are no real differences between conditions in the population

p-value Probability that values of the statistic at least as extreme as the one observed would occur if the null hypothesis were true. In other words: how unusual is the observed test statistic compared to what H0 predicts? The smaller p, the more unusual the observed data if H0 were true (e.g. p=0.45 very usual, compared to p=0.001)
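A hedged sketch of how a p-value is obtained from a test statistic, assuming the statistic follows a standard normal distribution when H0 is true (a z-test). The two z values are hypothetical, chosen so that the results land near the slide's p=0.45 and p=0.001.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability of a z statistic at least as extreme as the observed one, if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test statistics, for illustration only
print(round(two_sided_p(0.76), 3))   # unremarkable under H0
print(round(two_sided_p(3.29), 4))   # very unusual under H0
```

The first value is close to 0.45 and the second close to 0.001; other tests (t, chi-square, ...) follow the same logic with a different reference distribution.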

Type I error Type I error: –Rejection of a true null hypothesis –That is, in reality, there is no relationship between independent and dependent variable but you conclude there is –Probability of type I error is called α –α is usually determined before you run an experiment (often set at 5% or 1%)
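The meaning of α can be illustrated by simulation: if H0 is true and we test at α = 0.05, about 5% of experiments will still reject it. This is a sketch with made-up settings (a z-test on samples of 30 from a standard normal population, 2000 simulated experiments, arbitrary seed).

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(1)                          # arbitrary fixed seed
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)    # about 1.96
N, N_EXPERIMENTS = 30, 2000                     # hypothetical settings

false_alarms = 0
for _ in range(N_EXPERIMENTS):
    # H0 is true by construction: the population mean really is 0 (sd 1)
    sample = [rng.gauss(0, 1) for _ in range(N)]
    z = mean(sample) / (1 / sqrt(N))            # z statistic with known sd = 1
    if abs(z) > Z_CRIT:
        false_alarms += 1                       # a true H0 was rejected: Type I error

type_i_rate = false_alarms / N_EXPERIMENTS
print(type_i_rate)
```

The printed rate fluctuates around 0.05 from seed to seed.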

Type II error Type II error: –Failure to reject a false null hypothesis –That is, in reality, there is a relation between the independent variable and the dependent one(s) but you conclude there is none –Probability of type II error is called β –In contrast to α, β cannot be precisely controlled

Reducing the Type II error β can be reduced by –Using an α-level of .05 (instead of a more stringent one) –Using as many subjects as can be reasonably obtained –Selecting the levels of the independent variable so as to maximise the size of the effect –Reducing variability (e.g. controlling more variables)
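The first three points can be checked numerically. The sketch below assumes a two-sided one-sample z-test with known sd, and expresses the effect size in standard deviations; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta(effect_size, n, alpha=0.05):
    """Type II error probability of a two-sided one-sample z-test.

    effect_size is the true difference from H0 in units of the sd.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)   # expected value of z under the true effect
    # Probability that z still falls inside the non-rejection region
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

print(round(beta(0.5, 10), 2))               # baseline
print(round(beta(0.5, 10, alpha=0.01), 2))   # stricter alpha: larger beta
print(round(beta(0.5, 40), 2))               # more subjects: smaller beta
print(round(beta(0.8, 10), 2))               # larger effect: smaller beta
```

Reducing variability works through the same route: a smaller sd makes the same raw difference a larger effect size.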

Organise SPSS tables Every independent variable and every dependent variable has its own column Independent variables are often found before dependent variables It is wise to compare the distributions of the conditions before statistical tests of significance (histograms, boxplots) –Either select the condition you are interested in –Or split the output according to the different levels –You can also compare boxplots for the different conditions

Data exploration Error bars show the 95% confidence interval for the mean (i.e. the sample mean and the range around it that is expected to contain the population mean with 95% confidence; this is not the range in which 95% of the data fall). One independent variable –Error bar (simple, groups of variables) Two independent variables –Error bar (clustered, groups of variables)
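A rough sketch of what such an error bar represents, using a normal approximation (mean ± about 1.96 standard errors). Statistics packages typically use the t distribution instead, which gives slightly wider intervals for small samples. The mean and sd are taken from the utterance-time example; the sample size of 50 is made up.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sample_sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z * sample_sd / sqrt(n)        # z times the standard error
    return sample_mean - half_width, sample_mean + half_width

# Mean and sd from the utterance-time example; n = 50 is a hypothetical sample size
low, high = mean_ci(3.45, 0.84, 50)
print(f"{low:.2f} to {high:.2f}")
```

The interval is far narrower than the spread of the individual data points, because it describes the uncertainty about the mean and not the variability of single observations.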