S2 Chapter 6: Populations and Samples

Presentation transcript:

S2 Chapter 6: Populations and Samples Dr J Frost (jfrost@tiffin.kingston.sch.uk) www.drfrostmaths.com Last modified: 2nd November 2015

Populations and samples
A population is the full collection of people or things.
A sample is some subset of the population intended to represent the population.
Data obtained from all members of the population is known as a census.
Advantages of sampling:
- Cheaper and quicker than taking a census.
- Useful when testing items results in their destruction (e.g. the lifetime of a light bulb).
Disadvantages of sampling:
- Potential for bias.
- Natural variation between any two samples due to variation in the data.

Sampling key terms
Each individual thing in the population that can be sampled is known as a sampling unit.
The list of all those within the population that can be sampled is known as the sampling frame.

Random sampling
Suppose that the heights of people in a population are represented using a random variable $X$, where $X$ is (as you might expect) normally distributed, e.g. $X \sim N(1.5, 0.3)$.
[Figure: normal density $f(x)$ plotted against height $x$.]
Bro Helping Hand: This might seem conceptually confusing, as a population is a list of things. The population can be represented as a distribution where the outcomes are possible samples. For example, if a population is all possible lottery tickets, then the distribution representing it is a uniform distribution whose outcomes are all the possible tickets.
We want a sample with $n$ things in it.
Q: How could we represent the possible choice of the 1st member of our sample?
A: A random variable $X_1$ where $X_1 \sim N(1.5, 0.3)$.
Bro Helping Hand: Notice we’re representing the possible choice of the item in the sample, not the item itself. $X_1$ must have the same distribution as $X$, because our sample item is drawn from the population.
Q: How could we represent the possible choice of the $n$th member of our sample?
A: A random variable $X_n$ where $X_n \sim N(1.5, 0.3)$.

Random sampling
A simple random sample of size $n$ is one taken so that every possible sample of size $n$ has an equal chance of being selected. It consists of the observations $X_1, X_2, \ldots, X_n$ from a population, where the $X_i$:
- are independent random variables;
- have the same distribution as the population.
This means, for example, that if the first person chosen for our sample is Indian, that doesn’t make it any less or more likely that our second choice will be Indian, i.e. our second choice is independent of the first.
This will all become a lot clearer once we do an example…
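As a rough illustration of the definition above, the sketch below draws a sample as $n$ independent observations that all share the population distribution. It assumes the second parameter of $N(1.5, 0.3)$ is the variance (so the standard deviation is $\sqrt{0.3}$); the function name `draw_sample` and the sample size of 5 are hypothetical.

```python
import math
import random

MU = 1.5                 # population mean height, from the slide
SIGMA = math.sqrt(0.3)   # assumption: 0.3 is the variance, so sd = sqrt(0.3)

def draw_sample(n, rng):
    """Return n independent observations X_1..X_n, each distributed like X."""
    # Each call to gauss() is independent of the previous ones,
    # and every draw uses the same population distribution.
    return [rng.gauss(MU, SIGMA) for _ in range(n)]

rng = random.Random(0)   # fixed seed only so the run is repeatable
sample = draw_sample(5, rng)
print(sample)
```

Every draw uses the same `MU` and `SIGMA` regardless of what was drawn before, which is what "independent" and "same distribution as the population" mean in practice.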

Random sampling
We might wish to calculate some numerical property of a population or a sample, e.g. mean, variance, mode, range.
A population parameter is a quantity calculated from the population.
A statistic is a quantity calculated (solely) from the observations in a sample.
e.g. $\frac{X_2 + X_5 + X_8}{3}$ is a statistic (the average of the 2nd, 5th and 8th items in the sample).
$\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$ is a statistic.
But $\frac{\sum X}{n} - \mu^2$ is not, as it involves the population mean $\mu$, which is not known purely from the sample.
The idea of a statistic is that we hope it resembles the equivalent population parameter. For example, if we’re trying to find the mean age in England, we might take a sample, calculate the sample mean age $\bar{X}$, and hope this represents the ‘true’ unknown population mean age $\mu$. (Recall that sample mean $= \bar{X}$ and population mean $= \mu$.)
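To make the distinction concrete, here is a minimal sketch of the first two expressions as functions that take only the sample as input. The function names and the list `ages` are hypothetical, chosen purely for illustration.

```python
def mean_of_items(sample, positions):
    """Average of the items at the given 1-based positions, e.g. (X_2 + X_5 + X_8) / 3."""
    return sum(sample[i - 1] for i in positions) / len(positions)

def sample_variance_formula(sample):
    """Sum(X^2)/n - (Sum(X)/n)^2, which uses only the observations."""
    n = len(sample)
    return sum(x * x for x in sample) / n - (sum(sample) / n) ** 2

# Hypothetical sample of ages, for illustration only.
ages = [23, 35, 41, 18, 52, 29, 64, 47]

print(mean_of_items(ages, [2, 5, 8]))   # uses X_2, X_5 and X_8
print(sample_variance_formula(ages))
```

Both functions can be evaluated from `ages` alone. An expression involving $\mu$ would need an extra argument that the sample cannot supply, which is why it is not a statistic.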

Sampling Distribution of a Statistic
The sampling distribution of a statistic gives all the values of a statistic and the probability that each would happen by chance alone.
Suppose we had 10 families which form the population of an island (The Isle of Bob), for which we know the number of children in each family: 0, 1, 1, 0, 1, 1, 2, 0, 0, 0.
Suppose we took a (very small!) sample of 2 families. Statistics for this sample could be the mode number of children, median, maximum, mean, …
Possible samples:
$X_1$   $X_2$   $P(X_1, X_2)$        Median   Max
0       0       0.5 × 0.5 = 0.25     0        0
0       1       0.5 × 0.4 = 0.2      0.5      1
0       2       0.5 × 0.1 = 0.05     1        2
1       0       0.4 × 0.5 = 0.2      0.5      1
1       1       0.4 × 0.4 = 0.16     1        1
1       2       0.4 × 0.1 = 0.04     1.5      2
2       0       0.1 × 0.5 = 0.05     1        2
2       1       0.1 × 0.4 = 0.04     1.5      2
2       2       0.1 × 0.1 = 0.01     2        2
Sampling distribution for the sample maximum $M$:
$M$      0      1      2
$P(M)$   0.25   0.56   0.19
Note: Because each thing in the sample is independently drawn from the population, we technically have sampling with replacement, and hence the same item could be in the sample twice. In practice however (and in exams) you won’t have to worry about this, as the population in exams is assumed to be infinitely large.
Let’s reflect on what we did.
#1: We considered all possible samples, and the probability of each sample occurring.
#2: We’re interested in some statistic for each sample (let’s say the sample maximum).
#3: Thus we now have a distribution over possible values of the statistic across all possible samples we could have had, i.e. the ‘sampling distribution’.
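The three steps for the Isle of Bob can be sketched in code: list every possible ordered sample, compute the statistic for each, and add up the probabilities. This is only an illustration; the helper name `sampling_distribution` is hypothetical, and it assumes independent draws (sampling with replacement), as in the note on this slide.

```python
from fractions import Fraction
from itertools import product

# Number of children in each of the 10 families on the Isle of Bob.
population = [0, 1, 1, 0, 1, 1, 2, 0, 0, 0]

def sampling_distribution(population, n, statistic):
    """Exact distribution of `statistic` over all ordered samples of size n,
    drawn independently (i.e. with replacement) from the population."""
    dist = {}
    weight = Fraction(1, len(population) ** n)   # every ordered draw is equally likely
    for sample in product(population, repeat=n):
        value = statistic(sample)
        dist[value] = dist.get(value, Fraction(0)) + weight
    return dist

max_dist = sampling_distribution(population, 2, max)
for value in sorted(max_dist):
    print(value, float(max_dist[value]))
```

Passing a different function as `statistic` (for example `min`, or a function returning the sample mean) gives the sampling distribution of that statistic from the same enumeration.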

Exam Example: Edexcel S2 May 2013 Q1
Key Points:
a) Ensure you don’t forget possibilities through other possible orderings, etc.
b) If we know all the possible values of the statistic, we can find the probability of the last by just subtracting from 1 (as it’s a probability distribution!)
Answer (a): As per the tip, ordering matters!
(1p, 5p, 5p), (5p, 1p, 5p), (5p, 5p, 1p)
(2p, 5p, 5p), (5p, 2p, 5p), (5p, 5p, 2p)
(5p, 5p, 5p)
Answer (b): For the first three possibilities, the probability is $3 \times 0.5 \times 0.3^2 = 0.135$. For the next three: $3 \times 0.2 \times 0.3^2 = 0.054$. Last: $0.3^3 = 0.027$. So $P(M = 5) = 0.135 + 0.054 + 0.027 = 0.216$.
Answer (c): The possible values of the statistic $M$ (the median) are 1p, 2p, 5p.
$P(M = 1) = (3 \times 0.5^2 \times 0.2) + (3 \times 0.5^2 \times 0.3) + 0.5^3 = 0.5$
$P(M = 2) = 1 - 0.5 - 0.216 = 0.284$ (since this is the only other possibility)
$m$          1     2       5
$P(M = m)$   0.5   0.284   0.216
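One way to sanity-check an answer like this is by simulation rather than by listing samples. The sketch below is not part of the mark scheme; it assumes the coin proportions 0.5, 0.2 and 0.3 implied by the working above, and the function name, trial count and seed are arbitrary choices.

```python
import random
from collections import Counter

coins = [1, 2, 5]            # coin values in pence
weights = [0.5, 0.2, 0.3]    # proportions implied by the working above

def estimate_median_distribution(trials, rng):
    """Estimate P(M = m) for the median M of 3 independent draws."""
    counts = Counter()
    for _ in range(trials):
        sample = rng.choices(coins, weights=weights, k=3)
        counts[sorted(sample)[1]] += 1   # middle value of three is the median
    return {m: counts[m] / trials for m in coins}

rng = random.Random(2013)    # fixed seed only so the run is repeatable
estimate = estimate_median_distribution(100_000, rng)
print(estimate)
```

The estimated proportions should land close to 0.5, 0.284 and 0.216, though a simulation only approximates the exact values.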

Test Your Understanding: Edexcel S2 June 2007 Q4
Step 1: List possible samples (and the statistic for each if possible).
Step 2: Use this to work out the probability of obtaining each value of the statistic.
Answer:
$m$          5         10
$P(M = m)$   0.15625   0.84375

Sampling Distribution by Inspection
Sometimes it is not practical to list out all the possible samples, but we can tell what the sampling distribution is by thinking about what the statistic represents.
Q: A school wishes to introduce a school uniform and is seeking to find out the support this idea has among the students at the school. The random variable $X$ is defined as: $X = \begin{cases} 1, & \text{if student would support idea} \\ 0, & \text{otherwise} \end{cases}$
a) Suggest a suitable population and the parameter of interest.
b) A random sample of 15 students is asked if they would support the idea. The random sample is represented by $X_1, X_2, \ldots, X_{15}$. Write down the sampling distribution of the statistic $Y = \sum_{i=1}^{15} X_i$.
Answer (a): The population is the responses of the people, represented as 0s and 1s (note, it is not the people themselves!). The population parameter of interest (based on the original question posed by the school) is the proportion $p$ of students who support the idea.
Answer (b): Think what $Y$ actually represents… $Y$ = the number of students who support the idea. Since the sample is random, the observations are independent, each with a constant probability $p$ of “success”. These are the conditions for a Binomial Distribution! $Y \sim B(15, p)$
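The "by inspection" answer can be checked against brute force. The sketch below lists all $2^{15}$ possible 0/1 samples, multiplies probabilities (using independence), and compares the result with the binomial formula. The value `p = 0.6` is a hypothetical stand-in, since the real $p$ is unknown, and both function names are illustrative.

```python
from itertools import product
from math import comb

def binomial_pmf(n, p, k):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pmf_by_enumeration(n, p):
    """Distribution of Y = X_1 + ... + X_n found by listing every 0/1 sample."""
    dist = [0.0] * (n + 1)
    for sample in product([0, 1], repeat=n):
        y = sum(sample)
        dist[y] += p**y * (1 - p) ** (n - y)   # independence: multiply probabilities
    return dist

p = 0.6   # hypothetical support level; the true p is unknown
enumerated = pmf_by_enumeration(15, p)
for k in range(16):
    print(k, round(binomial_pmf(15, p, k), 5), round(enumerated[k], 5))
```

The two columns agree, which is the point of the slide: recognising what $Y$ counts saves listing 32,768 samples.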

More Wordy Questions
Bob has a cupcake factory. He is trying to establish the proportion of cupcakes that are poisonous; he assumes 15%. He has ID numbers for all the cupcakes. He takes a sample of 20 cupcakes.
Q: Identify the sampling frame.
A: The list of ID numbers of all the cupcakes.
Bro Note: The mark schemes like the idea of a ‘list’ and the idea that things in the sampling frame can be clearly identified.
Q: Identify the sampling distribution of the number of poisonous cupcakes in the sample.
A: If $C$ is the number of poisonous cupcakes in the sample, then $C \sim B(20, 0.15)$.
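A small sketch of what "sampling frame" means in practice for the cupcake question: the frame is literally a list of IDs, and the sample is drawn from that list. The factory size of 5000, the seed, and the simulated set of poisonous cupcakes are all hypothetical and not part of the question.

```python
import random

# Hypothetical factory: 5000 cupcakes with IDs 1..5000.
# The list of IDs is the sampling frame.
sampling_frame = list(range(1, 5001))

rng = random.Random(1)   # fixed seed only so the run is repeatable

# Hypothetical ground truth for the simulation:
# each cupcake is poisonous with probability 0.15.
poisonous = {cupcake_id for cupcake_id in sampling_frame if rng.random() < 0.15}

# Draw 20 distinct sampling units from the frame.
sample_ids = rng.sample(sampling_frame, 20)
c = sum(1 for cupcake_id in sample_ids if cupcake_id in poisonous)
print(sample_ids)
print("poisonous in sample:", c)
```

Note that `rng.sample` draws without replacement, so this is not exactly the independent-draws model. With a frame this much larger than the sample, the count `c` should behave approximately like $B(20, 0.15)$.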

Exercise 6B
1. A forester wants to estimate the height of the trees in a forest. He measures the heights of 50 randomly selected trees and works out the mean height. Is this a statistic?
Answer: Yes, as it is based only on the sample.
5. A bag contains a large number of coins. 50% are 50p coins, 25% are 20p coins and 25% are 10p coins.
a) Find the mean $\mu$ and variance $\sigma^2$ for the value of this population of coins.
Answer: $\mu = 32.5$, $\sigma^2 = 318.75$
b) A random sample of 2 coins is chosen from the bag. List all the possible samples that can be chosen.
Answer: (50, 50), (50, 20), (20, 50), (50, 10), (10, 50), (20, 20), (20, 10), (10, 20), (10, 10)
c) Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2}{2}$.
Answer:
$m$                50     35     30     20       15      10
$P(\bar{X} = m)$   0.25   0.25   0.25   0.0625   0.125   0.0625
7. A supermarket sells a large number of 3-litre and 2-litre cartons of milk. They are sold in the ratio 3:2.
- Find the mean and variance of the milk content of this population of cartons.
- A random sample of 3 cartons is taken from the shelves ($X_1, X_2, X_3$). List all of the possible samples.
- Find the sampling distribution of the mean $\bar{X}$.
- Find the sampling distribution of the mode $M$.
- Find the sampling distribution of the median $N$ of these samples.
Continue onto Exercise 6C if you’re done.
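For checking answers to the coin question, a short sketch using exact fractions: it computes $\mu = \sum x\,P(x)$ and $\sigma^2 = \sum x^2 P(x) - \mu^2$ from the stated proportions, then builds the sampling distribution of the mean of two independent draws. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Coin value in pence -> proportion of the bag (Exercise 6B, question 5).
coin_probs = {50: Fraction(1, 2), 20: Fraction(1, 4), 10: Fraction(1, 4)}

mu = sum(x * p for x, p in coin_probs.items())
sigma_sq = sum(x * x * p for x, p in coin_probs.items()) - mu**2

# Sampling distribution of the mean of two independent draws.
mean_dist = {}
for (x1, p1), (x2, p2) in product(coin_probs.items(), repeat=2):
    m = Fraction(x1 + x2, 2)
    mean_dist[m] = mean_dist.get(m, Fraction(0)) + p1 * p2

print(float(mu), float(sigma_sq))
for m in sorted(mean_dist, reverse=True):
    print(float(m), float(mean_dist[m]))
```

Swapping in `{3: Fraction(3, 5), 2: Fraction(2, 5)}` and `repeat=3` would give a starting point for the milk carton question, under the same independence assumption.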