S3: Chapter 4 – Goodness of Fit and Contingency Tables Dr J Frost Last modified: 30 th August 2015.

Slides:



Advertisements
Similar presentations
Hypothesis Testing. To define a statistical Test we 1.Choose a statistic (called the test statistic) 2.Divide the range of possible values for the test.
Advertisements

Statistical Techniques I
Hypothesis testing Another judgment method of sampling data.
Presentation on Probability Distribution * Binomial * Chi-square
S2 Chapter 7: Hypothesis Testing Dr J Frost Last modified: 3 rd October 2014.
Inferential Statistics & Hypothesis Testing
Sections 4.1 and 4.2 Overview Random Variables. PROBABILITY DISTRIBUTIONS This chapter will deal with the construction of probability distributions by.
Chapter 4 Probability Distributions
Chapter Goals After completing this chapter, you should be able to:
BHS Methods in Behavioral Sciences I
Probability & Statistics for Engineers & Scientists, by Walpole, Myers, Myers & Ye ~ Chapter 10 Notes Class notes for ISE 201 San Jose State University.
Class notes for ISE 201 San Jose State University
Chapter Topics Confidence Interval Estimation for the Mean (s Known)
CHAPTER 6 Statistical Analysis of Experimental Data
Inference about a Mean Part II
Part III: Inference Topic 6 Sampling and Sampling Distributions
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 15, Slide 1 Chapter 15 Random Variables.
Aaker, Kumar, Day Seventh Edition Instructor’s Presentation Slides
Lecture Slides Elementary Statistics Twelfth Edition
BCOR 1020 Business Statistics
Quiz 10  Goodness-of-fit test  Test of independence.
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 15 The.
S1: Chapter 9 The Normal Distribution Dr J Frost Last modified: 4 th March 2014.
S1: Chapter 8 Discrete Random Variables Dr J Frost Last modified: 25 th October 2013.
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
Inference for regression - Simple linear regression
©The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin Probability Distributions Chapter 6.
What is a probability distribution? It is the set of probabilities on a sample space or set of outcomes.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Review and Preview This chapter combines the methods of descriptive statistics presented in.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Chapter 11: Applications of Chi-Square. Count or Frequency Data Many problems for which the data is categorized and the results shown by way of counts.
Chapter 11 Chi-Square Procedures 11.3 Chi-Square Test for Independence; Homogeneity of Proportions.
Chi-squared Tests. We want to test the “goodness of fit” of a particular theoretical distribution to an observed distribution. The procedure is: 1. Set.
A Course In Business Statistics 4th © 2006 Prentice-Hall, Inc. Chap 9-1 A Course In Business Statistics 4 th Edition Chapter 9 Estimation and Hypothesis.
Binomial Experiment A binomial experiment (also known as a Bernoulli trial) is a statistical experiment that has the following properties:
Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.
Week 21 Conditional Probability Idea – have performed a chance experiment but don’t know the outcome (ω), but have some partial information (event A) about.
EMIS 7300 SYSTEMS ANALYSIS METHODS FALL 2005 Dr. John Lipp Copyright © Dr. John Lipp.
S2 Chapter 6: Populations and Samples
S2 Chapter 2: Poisson Distribution Dr J Frost Last modified: 20 th September 2015.
Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we.
© Copyright McGraw-Hill 2004
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
1 Chi-square Test Dr. T. T. Kachwala. Using the Chi-Square Test 2 The following are the two Applications: 1. Chi square as a test of Independence 2.Chi.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial.
THE NORMAL DISTRIBUTION
CHI SQUARE DISTRIBUTION. The Chi-Square (  2 ) Distribution The chi-square distribution is the probability distribution of the sum of several independent,
©The McGraw-Hill Companies, Inc. 2008McGraw-Hill/Irwin Probability Distributions Chapter 6.
Probability Distributions ( 확률분포 ) Chapter 5. 2 모든 가능한 ( 확률 ) 변수의 값에 대해 확률을 할당하는 체계 X 가 1, 2, …, 6 의 값을 가진다면 이 6 개 변수 값에 확률을 할당하는 함수 Definition.
Test of Goodness of Fit Lecture 41 Section 14.1 – 14.3 Wed, Nov 14, 2007.
Tests of hypothesis Contents: Tests of significance for small samples
Continuous Probability Distributions
Chapter 15 Random Variables.
Statistical Analysis Professor Lynne Stokes
The Chi-Squared Test Learning outcomes
S2 Chapter 6: Populations and Samples
Distribution functions
Chapter 11 Goodness-of-Fit and Contingency Tables
Chi-Square Test Dr Kishor Bhanushali.
Econometric Models The most basic econometric model consists of a relationship between two variables which is disturbed by a random error. We need to use.
Discrete Event Simulation - 4
Further Stats 1 Chapter 6 :: Chi-Squared Tests
STAT 312 Introduction Z-Tests and Confidence Intervals for a
Lecture Slides Elementary Statistics Twelfth Edition
Lecture 41 Section 14.1 – 14.3 Wed, Nov 14, 2007
Analyzing the Association Between Categorical Variables
Lecture Slides Elementary Statistics Twelfth Edition
Goodness of Fit.
Further Stats 1 Chapter 5 :: Central Limit Theorem
Chapter 9 Hypothesis Testing: Single Population
Presentation transcript:

S3: Chapter 4 – Goodness of Fit and Contingency Tables Dr J Frost Last modified: 30 th August 2015

Testing a Model Going back to Chapter 1 of S1 (that chapter that every teacher skips), we had the idea of modelling: DataModel Simplifying assumptions e.g. Collected heights of people in the population Why might we want to use a model for a data? It often makes calculations from the data easier, e.g. for heights in the population, if we assume a Normal Distribution, we could then calculate probabilities of someone having a given height range. This might be difficult if we used the raw data. This chapter mostly concerns how well a chosen model fits the observed data. If our simplifying assumptions were justified, we should find the model is a good fit. ?

Expected Frequency vs Observed Frequencies Number I throw a die (which may be fair) 120 times and observe the counts of each possible number. An obvious thing we might want to do is hypothesise whether or not the die is fair based on the counts seen. We need some sensible way to measure the difference between the observed and expected frequencies. Why the squared? It ensures difference is positive. ?? ?? ?

Number  Possible observed counts given that expected count is 20. Suppose we standardised this normal distribution (representing the possible observed frequencies for one particular outcome), so that 0 means the observed frequency is equal to the expected frequency, and that we square this random variable to ensure the difference is positive. Possible observed counts (now standardised and squared) i.e. possible deviation of the observed frequency from the expected frequency

Degrees of Freedom Number So when in combining the normal distributions for each outcome to give some kind of total measure of possible deviation of observed frequencies from expected frequencies, it doesn’t make sense to add another normal distribution for the last outcome, because the observed frequency can’t actually vary! (which goes against the notion of a “random variable”) ?

Example: Hypothesis Testing Number Test, at the 5% significance level, whether or not the observed frequencies could be modelled by a discrete uniform distribution. Number123456Total ? ? ? ? ? ? Critical region 5%

Test Your Understanding A 3-sided spinner is spun 150 times, and counts of the three outcomes are shown. Test, at the 1% significance level, whether or not spinner is fair. Number123Total Number123Total Observed ?

Exercise 4A

General Method for Goodness of Fit We have so far tested against a discrete uniform distribution, but we can obviously test against any other distribution in exactly the same way.

Testing a Binomial Distribution as Model Expected freq Bro Tip: You can use tables and find differences to retrieve probabilities ? ? ????????? ????????? ? ? ? ? ? ?

A study of the number of girls in families with five children was done on 100 such families. The results are summarised in the following table. Test, at the 5% significance level, whether or not a binomial distribution is a good model >3Total ? ? ? ? ? ? ? ? ? ?

? ? ? ?

Test Ye Understanding S3 May 2012 Q6 ? ?

Testing a Poisson Distribution as Model The numbers of telephone calls arriving at an exchange in six-minute periods were recorded over a period of 8 hours, with the following results. Can these results be modelled by a Poisson distribution? Test at the 5% significance level ? ? ? ? ?? ? ? ?? Just 1- the rest. ? ?

Exercise 4B

Goodness of Fit Tests for Continuous Distributions We might want to test how our data fits a normal distribution. ? ? ?

Example During observations on the height of 200 male students the following data were observed: a.Test at the 0.05 level to see if the height of male students could be modelled by a normal distribution with mean 172 and standard deviation 6. b.Describe how you would modify this test if the mean and variance were unknown. Height (cm) Freq Notice that by calculating the z- probability for the upper bound each time, we can reuse it as the lower bound in the next range. ? ? ? ? ? ? ? ? ? ? ? ? ? ?

Example During observations on the height of 200 male students the following data were observed: a.Test at the 0.05 level to see if the height of male students could be modelled by a normal distribution with mean 172 and standard deviation 6. b.Describe how you would modify this test if the mean and variance were unknown. ? Height (cm) Freq

Test Your Understanding June 2013 Q4 a ? b ? c ? (Note that this table does NOT have gaps)

Continuous Uniform Distribution ?

Example Question In a study on the habits of a flock of starlings, the direction in which they headed when they left their roost in the mornings was recorded over 240 days. The direction was found by recording if they headed between certain features of the landscape. The compass bearings of these features were than measured. The results are given below. Suggest a suitable distribution, and test to see if the data supports this model. Direction (degrees) Frequency Why possibly suitable ? ? ? ? ? ??? ???

Test Your Understanding June 2010 Q6 ?

Exercise 4C

Contingency Tables Grade Totals School Totals So far, we have repeated a single event to get counts, e.g. throwing a single die multiple times, or in this case sampling grades from a single school and taking counts of each grade. We then determined how well this fit a particular distribution (uniform, binomial, etc.)

Contingency Tables Grade Totals School Totals ? i.e. there is not any association between the two criterion Determine to the 5% significance level whether school and grade are dependent. ? ?

Grade Totals School Totals Contingency Tables Grade Totals School50 70 Totals Expected Frequencies ? ? ? ? ? ?

Contingency Tables Grade Totals School Totals ? ?

Contingency Tables ? ? ?

Test Your Understanding June 2010 Q5 ?

Exercise 4D Question 4 onwards.