Econ 140, Lecture 3: Univariate Populations
Today’s Plan
Univariate statistics: the distribution of a single variable
Making inferences about population parameters from sample statistics
  – (For future reference: how can we relate the ‘a’ and ‘b’ parameters from last lecture to sample data?)
Dealing with two types of probability
  – ‘A priori’ classical probability
  – Empirical classical probability
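A Priori Classical Probability
Characterized by a finite number of known outcomes
The expected value of Y can be defined as
  \[ E(Y) = \sum_{k=1}^{K} p_k Y_k \]
  where p_k is the probability of outcome Y_k and K is the number of possible outcomes
The expected value will always be the mean value: \( E(Y) = \mu_Y \)
  – \( \mu_Y \) is the population mean
  – \( \bar{Y} \) is the sample mean
The outcome of an experiment is a randomized trial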
Flipping Coins
Example: flipping 2 fair coins
  – Possible outcomes are: HH, TT, HT, TH
  – We know there are only 4 possible outcomes
  – We get discrete outcomes because there are a finite number of possible outcomes
  – We can represent the known outcomes in a matrix
Flipping Coins (2)
The probability of some event A is
  \[ P(A) = \frac{m}{n} \]
  – where m is the number of outcomes consistent with event A and n is the total number of possible outcomes
  – If A is the number of heads when flipping 2 coins, we can represent the probability distribution function (PDF) like this:

    A (number of heads)    P(A)
    0                      0.25
    1                      0.50
    2                      0.25
Flipping Coins (3)
If we graph the PDF we get
  [Figure: graph of the PDF of A]
The expected value is
  \[ E(A) = 0(0.25) + 1(0.5) + 2(0.25) = 1 \]
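The coin-flipping example can be checked by brute force. The sketch below enumerates the four outcomes, counts heads to build the PDF as m/n, and then takes the probability-weighted sum. The variable names are illustrative only and do not come from the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the equally likely outcomes of flipping 2 fair coins.
outcomes = list(product("HT", repeat=2))

# A = number of heads. P(A = a) = m / n, where m is the number of
# outcomes with a heads and n is the total number of outcomes.
counts = Counter(outcome.count("H") for outcome in outcomes)
n = len(outcomes)
pdf = {a: Fraction(m, n) for a, m in sorted(counts.items())}

# Expected value: each value of A times its probability, summed.
expected_heads = sum(a * p for a, p in pdf.items())

for a, p in pdf.items():
    print(a, p)
print("E(A) =", expected_heads)
```

Exact fractions are used instead of floats so that the probabilities sum to exactly 1.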
Empirical Classical Probability
Characterized by an infinite number of possible outcomes
With empirical classical probability, we use sample data to make inferences about underlying population parameters
  – Most of the time, we don’t know what the population values are, so we need to use a sample
Example: GPAs in the Econ 140 population
  – We can take a sample of every 5th person in the room
  – Assuming that our sample is random, we’ll have a representative sample of the population
Empirical Classical Probability
Statisticians and economists collect sample data for many other purposes
The CPS (Current Population Survey) is another example: sampling occurs at the household level
The CPS uses weights to correct the data for oversampling
  – Oversampling would be if we picked 1 in 3 people in the front of the room and only 1 in 5 in the back of the room. In that case we would oversample the front
  – There’s a spreadsheet example on the course website (the weighted mean is our best guess of the population mean, whereas the unweighted mean is the sample mean)
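To see what weights do in the front-of-room example, here is a minimal sketch. It assumes the common rule that each sampled person gets a weight equal to the inverse of their sampling rate (3 for the front, 5 for the back). The sample counts are made up for illustration, and the lecture does not say how the CPS builds its weights.

```python
# Sampling scheme from the example: 1 in 3 at the front, 1 in 5 at the back.
sampled_one_in = {"front": 3, "back": 5}

# Hypothetical number of people who ended up in the sample.
sample_counts = {"front": 5, "back": 3}

# Assumed rule: each sampled person stands in for 1 / (sampling rate) people.
weights = dict(sampled_one_in)

# Share of each group in the raw (unweighted) sample.
sample_total = sum(sample_counts.values())
sample_share = {g: c / sample_total for g, c in sample_counts.items()}

# Weighted counts estimate how many people each group has in the room.
estimated_pop = {g: sample_counts[g] * weights[g] for g in sample_counts}
pop_total = sum(estimated_pop.values())
weighted_share = {g: c / pop_total for g, c in estimated_pop.items()}

print(sample_share)
print(weighted_share)
```

With these made-up counts the front is 62.5% of the raw sample but 50% of the estimated population, which is the distortion the weights are meant to undo.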
Empirical Classical Probability
On the course website you’ll find an Excel spreadsheet that we will use to:
  – Calculate the expected value
  – Calculate the PDF and CDF
  – Apply weights to translate sample data into population estimates
  – Examine the difference between the sample (unweighted) mean and the estimated population (weighted) mean:
      Weighted mean = sum(EARNWKE*EARNWT)/sum(EARNWT)
    The weighted mean is our estimate of the population mean
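The weighted-mean formula translates directly into code. In the sketch below, `earnwke` and `earnwt` stand in for the spreadsheet columns EARNWKE and EARNWT (assumed here to hold weekly earnings and the earnings weight). The four rows are invented for illustration and are not taken from the course spreadsheet.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical rows standing in for the spreadsheet columns.
earnwke = [250.0, 400.0, 650.0, 900.0]    # assumed: weekly earnings
earnwt = [1500.0, 2000.0, 1000.0, 500.0]  # assumed: sampling weights

unweighted = sum(earnwke) / len(earnwke)   # the sample mean
weighted = weighted_mean(earnwke, earnwt)  # population mean estimate

print(unweighted, weighted)
```

Here the low earners carry larger weights, so the weighted mean (455) falls below the unweighted mean (550).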
Empirical Classical Probability(3)
So how do we construct a PDF for our spreadsheet example?
  – Pick sensible earnings bands (e.g. 10 bands of $100)
  – We can pick as many bands as we want: the greater the number of bands, the more closely the shape of the PDF follows the ‘true’ population distribution. More bands = more calculation!
Empirical Classical Probability(2)
Constructing PDFs:
  – Count the number of observations in each band to get an absolute frequency
  – Use weights to translate sample frequencies into estimates of the population frequencies
  – Calculate relative frequencies for each band by dividing the absolute frequency for the band by the total frequency
Empirical Classical Probability(4)
  – An alternative way to approximate the PDF: [formula missing]
  – When we have K bands, always check that
      \[ \sum_{k=1}^{K} p_k = 1 \]
    If the probabilities don’t sum to 1, we’ve made a mistake!
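The band-counting steps can be sketched as a small function. This is one possible implementation under stated assumptions: 10 bands of $100, earnings above the top band folded into the last band, and invented observations. The function name `band_pdf` is hypothetical and is not from the course spreadsheet.

```python
def band_pdf(earnings, weights, band_width=100, n_bands=10):
    """Estimate each band's probability from (weighted) frequencies."""
    band_totals = [0.0] * n_bands
    for y, w in zip(earnings, weights):
        if y < 0:
            raise ValueError("earnings must be non-negative")
        # Band index; values beyond the top band go into the last band
        # (an assumption of this sketch).
        k = min(int(y // band_width), n_bands - 1)
        band_totals[k] += w  # weighted absolute frequency
    total = sum(band_totals)
    return [t / total for t in band_totals]  # relative frequencies


# Hypothetical observations and weights.
earnings = [80, 120, 180, 310, 350, 390, 420, 760, 1200]
weights = [2, 1, 1, 1, 1, 1, 1, 1, 1]

unweighted_pdf = band_pdf(earnings, [1] * len(earnings))
weighted_pdf = band_pdf(earnings, weights)

# The check from the slide: probabilities must sum to 1.
assert abs(sum(unweighted_pdf) - 1.0) < 1e-12
assert abs(sum(weighted_pdf) - 1.0) < 1e-12
```

Passing a weight of 1 for every observation gives the unweighted (sample) PDF, so the same function covers both cases.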
Empirical Classical Probability(5)
Going back to our expected value…
The expected value of Y will be:
  \[ E(Y) = \sum_{k=1}^{K} p_k Y_k \]
  – The p_k are relative frequencies, and they can be weighted or not
  – The Y_k are the earnings band midpoints (50, 150, 250, and so on in the spreadsheet)
From our spreadsheet example, our weighted mean was $[value missing] and the unweighted mean was $[value missing]
  – Since the sample is so large, there is little difference between the sample (unweighted) mean and the population (weighted) mean
Empirical Classical Probability(6)
We can also calculate the weighted and unweighted expected values:
  E(weighted value): $[value missing]
  E(unweighted value): $[value missing]
Why are the expected values different from the means?
  – We lose some information when we group the wage data into bands to calculate the expected values!
So why would we want to weight the observations?
  – With a small sample of what we think is a large population, we might not have sampled randomly. We use weights to make the sample more closely resemble the population.
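A small numerical sketch shows why the banded expected value differs from the mean. It uses invented earnings, $100 bands, and no weights. Every observation is replaced by the midpoint of its band, so the within-band detail is lost.

```python
# Hypothetical weekly earnings (unweighted, for simplicity).
earnings = [80, 120, 180, 310, 350, 390, 420, 760, 910]
band_width = 100
n = len(earnings)

# Absolute frequency of each band, keyed by band index.
counts = {}
for y in earnings:
    k = int(y // band_width)
    counts[k] = counts.get(k, 0) + 1

# Expected value from the banded PDF: sum of p_k * Y_k, where Y_k is
# the band midpoint and p_k is the relative frequency of the band.
expected_value = sum(
    (c / n) * (k + 0.5) * band_width for k, c in counts.items()
)

# Sample mean from the raw observations.
sample_mean = sum(earnings) / n

print(round(expected_value, 2), round(sample_mean, 2))
```

The two numbers are close but not equal, because 310, 350 and 390 are all treated as 350 once they are grouped into the $300 to $400 band.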
Empirical Classical Probability(7)
The mean is the first moment of the distribution of earnings
We may also want to consider how variable earnings are
  – We can do this by finding the variance, or the standard deviation
Calculate the variance:
  \[ \mathrm{Var}(Y) = \sum_{k=1}^{K} p_k \left( Y_k - E(Y) \right)^2 \]
  – In our example, the unweighted variance is: [value missing]
  – The weighted variance is: [value missing]
  – The difference between the two is: [value missing]
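As a rough illustration, the variance of a banded distribution can be computed from the band midpoints and their probabilities using the standard definition for a discrete distribution. The five-band PDF below is invented and is not the spreadsheet's.

```python
from math import sqrt

# Hypothetical banded PDF: band midpoints and their probabilities.
midpoints = [50, 150, 250, 350, 450]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
assert abs(sum(probs) - 1.0) < 1e-12  # probabilities must sum to 1

# First moment: E(Y) = sum of p_k * Y_k.
mean = sum(p * y for p, y in zip(probs, midpoints))

# Variance: probability-weighted squared deviations from the mean.
variance = sum(p * (y - mean) ** 2 for p, y in zip(probs, midpoints))
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

Running the same calculation with weighted and unweighted probabilities gives the two variances being compared.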
Empirical Classical Probability(8)
[Figure: weighted and unweighted PDFs of earnings; the weighted PDF is shown in pink]
It’s tough to see, but the weighting scheme makes the population distribution tighter
Empirical Classical Probability(9)
We can use our PDF to answer:
  – What is the probability that someone earns between $300 and $400?
But we can’t use this PDF to answer:
  – What is the probability that someone earns between $253 and $316?
Why?
  – The second question can’t be answered using our PDF because $253 and $316 fall somewhere within the earnings bands, not at the endpoints
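The endpoint restriction can be made explicit in code. The sketch below assumes a PDF stored as a list of band probabilities for 10 bands of $100. The probabilities are invented and the helper name `band_probability` is hypothetical. It answers a range question only when both bounds are band endpoints.

```python
def band_probability(pdf, low, high, band_width=100):
    """P(low <= earnings < high), for integer bounds at band endpoints."""
    if low % band_width or high % band_width:
        raise ValueError("bounds must coincide with band endpoints")
    if not 0 <= low < high <= band_width * len(pdf):
        raise ValueError("bounds must lie within the banded range")
    # Add up the probabilities of the whole bands between the bounds.
    return sum(pdf[low // band_width : high // band_width])


# Hypothetical PDF over the bands $0-100, $100-200, ..., $900-1000.
pdf = [0.05, 0.10, 0.15, 0.25, 0.20, 0.10, 0.05, 0.05, 0.03, 0.02]

print(band_probability(pdf, 300, 400))

try:
    band_probability(pdf, 253, 316)
except ValueError as err:
    print("cannot answer:", err)
```

Answering the $253 to $316 question would need either narrower bands or an extra assumption about how earnings are spread within each band.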
What we’ve done
‘A priori’ classical probability
  – There are a finite number of possible outcomes
  – Flipping coins example
Empirical classical probability
  – There are an infinite number of possible outcomes
  – Difference between sample and population means
  – Difference between sample and population expected values
  – Difference in calculating PDFs of a univariate population