A REVIEW OF QUME 232: The Statistical Analysis of Economic (and related) Data

1 A REVIEW OF QUME 232: The Statistical Analysis of Economic (and related) Data

2 Brief Overview of the Course

3 This course is about using data to measure causal effects.

4 In this course you will:

5 Types of Data – Cross Sectional
• Cross-sectional data is a random sample.
• Each observation is a new individual, firm, etc., with information at a point in time.
• If the data is not a random sample, we have a sample-selection problem.

6 Types of Data – Time Series
• Time series data has a separate observation for each time period, e.g. stock prices.
• Because a time series is not a random sample, there are different problems to consider.
• Trends and seasonality will be important.

7 Types of Data – Panel
• We can pool random cross sections and treat them like a normal cross section; we just need to account for time differences.
• We can follow the same randomly sampled individuals over time; this is known as panel data or longitudinal data.

8 Summations
Copyright © 2006 Pearson Addison-Wesley. All rights reserved.
• The Σ symbol is a shorthand notation for discussing sums of numbers.
• It works just like the + sign you learned about in elementary school.

9 Algebra of Summations

10 Summations: A Useful Trick

11 Double Summations
• The “Secret” to Double Summations: keep a close eye on the subscripts.
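
The advice about subscripts can be made concrete with a small sketch. The 2 x 3 table of numbers and the variable names below are hypothetical, chosen only for illustration; each Σ becomes one call to `sum`, and the index inside the call plays the role of the subscript.

```python
# Hypothetical 2 x 3 table of numbers x[i][j]: i indexes rows, j indexes columns.
x = [
    [1, 2, 3],
    [4, 5, 6],
]

# Single summation: sum over j for the fixed row i = 0.
row_0_total = sum(x[0][j] for j in range(3))

# Double summation, inner index j first: sum over i of (sum over j of x[i][j]).
rows_then_cols = sum(sum(x[i][j] for j in range(3)) for i in range(2))

# The same double summation with the order of the two sums swapped.
cols_then_rows = sum(sum(x[i][j] for i in range(2)) for j in range(3))

print(row_0_total, rows_then_cols, cols_then_rows)
```

Swapping the order of the two sums changes which subscript is held fixed in the inner sum, but for a finite table the total is the same.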

12 Descriptive Statistics
• How can we summarize a collection of numbers?
• Mean: the arithmetic average. The mean is highly sensitive to a few large values (outliers).
• Median: the midpoint of the data. Half of the observed numbers lie above the median and half lie below it. The median is not sensitive to outliers.

13 Descriptive Statistics (cont.)
• Mode: the most frequently occurring value.
• Variance: the mean squared deviation of the numbers from their own mean. The variance is a measure of the “spread” of the data.
• Standard deviation: the square root of the variance. The standard deviation provides a measure of a typical deviation from the mean.
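
A minimal sketch of these summary measures, using Python's standard `statistics` module. The two data lists are hypothetical. The variance here divides by the number of observations, matching the "mean squared deviation" wording above; that is an assumption, since a sample variance would divide by n - 1 instead.

```python
import math
import statistics

data = [2, 3, 3, 5, 7]            # hypothetical observations
with_outlier = [2, 3, 3, 5, 70]   # same data, with one extreme value

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # midpoint of the sorted data
mode = statistics.mode(data)            # most frequently occurring value
variance = statistics.pvariance(data)   # mean squared deviation (divides by n)
std_dev = math.sqrt(variance)           # square root of the variance

# The outlier moves the mean a lot but leaves the median unchanged.
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean, median, mode, variance, std_dev)
print(mean_outlier, median_outlier)
```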

14 Descriptive Statistics (cont.)
• Covariance: the covariance of two sets of numbers, X and Y, measures how much the two sets tend to “move together.” If Cov(X,Y) > 0, then if X is above its mean, we would expect that Y would also be above its mean.

15 Descriptive Statistics (cont.)
• Correlation Coefficient: the correlation coefficient between X and Y “norms” the covariance by the standard deviations of X and Y. You can think of this adjustment as a unit correction. The correlation coefficient will always fall between -1 and 1.
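
The "unit correction" can be seen in a short sketch. The data and function names below are hypothetical, and the covariance divides by n (a population-style formula, assumed here for simplicity). Rescaling X by 100 multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Average product of deviations from the means (divides by n).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    # Covariance divided by the product of the two standard deviations.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]   # same variable in different units

cov_xy = covariance(x, y)
corr_xy = correlation(x, y)
cov_rescaled = covariance(x_rescaled, y)
corr_rescaled = correlation(x_rescaled, y)

print(cov_xy, cov_rescaled)     # the covariance depends on the units
print(corr_xy, corr_rescaled)   # the correlation does not
```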

16 A Quick Example

17 A Quick Example (cont.)

18 A Quick Example (cont.)

19 Populations and Samples
• Two uses for statistics:
  ◦ Describe a set of numbers
  ◦ Draw inferences from a set of numbers we observe to a larger population
• The population is the underlying structure which we wish to study.
  ◦ Surveyors might want to relate 6000 randomly selected voters to all the voters in the United States.
  ◦ Macroeconomists might want to relate data about unemployment and inflation from 1958–2004 to the underlying process linking unemployment and inflation, to predict future realizations.

20 Populations and Samples (cont.)
• We cannot observe the entire population.
• Instead, we observe a sample drawn from the population of interest.
• In the Monte Carlo demonstration from last time, an individual dataset was the sample and the Data Generating Process described the population.

21 Populations and Samples (cont.)
• The descriptive statistics we use to describe data can also describe populations.
• What is the mean income in the United States?
• What is the variance of mortality rates across countries?
• What is the covariance between gender and income?

22 Populations and Samples (cont.)
• In a sample, we know exactly the mean, variance, covariance, etc. We can calculate the sample statistics directly.
• We must infer the statistics for the underlying population.
• Means in populations are also called expectations.

23 Populations and Samples (cont.)
• If the true mean income in the United States is μ, then we expect a simple random sample to have sample mean μ.
• In practice, any given sample will also include some “sampling noise.” We will observe not μ, but μ + ε.
• If we have drawn our sample correctly, then on average the sampling error over many samples will be 0.
• We write this as E(ε) = 0
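
A small Monte Carlo sketch of the sampling-noise idea. The population (normal, with the mean and standard deviation below), the sample size, and the number of samples are all hypothetical choices made for illustration. Each sample mean misses the true mean by some error, but the errors average out to roughly zero across many samples.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

MU = 50.0           # hypothetical true population mean
SIGMA = 10.0        # hypothetical population standard deviation
N = 25              # observations per sample
NUM_SAMPLES = 2000  # number of independent samples

def sample_mean(n):
    # Draw a simple random sample of size n and return its mean.
    draws = [random.gauss(MU, SIGMA) for _ in range(n)]
    return sum(draws) / n

# Sampling error of each sample: observed sample mean minus the true mean.
errors = [sample_mean(N) - MU for _ in range(NUM_SAMPLES)]

average_error = sum(errors) / NUM_SAMPLES
largest_error = max(abs(e) for e in errors)

print(f"average error over all samples: {average_error:.3f}")
print(f"largest single-sample error:    {largest_error:.3f}")
```

Individual samples can miss by several units, while the average error stays close to 0.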

24 Expectations
• Expectations are means over all possible samples (think “super” Monte Carlo).
• Means are sums.
• Therefore, expectations follow the same algebraic rules as sums.
• See the Statistics Appendix for a formal definition of Expectations.

25 Algebra of Expectations
• k is a constant.
• E(k) = k
• E(kY) = kE(Y)
• E(k+Y) = k + E(Y)
• E(Y+X) = E(Y) + E(X)
• E(ΣY_i) = ΣE(Y_i), where each Y_i is a random variable.
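
These rules can be checked exactly on a small discrete example. The setup below is hypothetical: Y is a fair six-sided die, X is a fair coin coded 0 or 1, and the two are drawn independently. Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions: value -> probability.
die = {y: Fraction(1, 6) for y in range(1, 7)}   # Y
coin = {x: Fraction(1, 2) for x in (0, 1)}       # X

def expectation(dist, f=lambda v: v):
    # Probability-weighted sum of f(value) over all outcomes.
    return sum(p * f(v) for v, p in dist.items())

# Joint distribution of (X, Y), assuming the two are independent.
joint = {(x, y): coin[x] * die[y] for x, y in product(coin, die)}

k = 3
e_y = expectation(die)
e_x = expectation(coin)

e_k = expectation(die, lambda y: k)              # E(k)
e_ky = expectation(die, lambda y: k * y)         # E(kY)
e_k_plus_y = expectation(die, lambda y: k + y)   # E(k + Y)
e_sum = expectation(joint, lambda xy: xy[0] + xy[1])  # E(X + Y)

print(e_k, e_ky, e_k_plus_y, e_sum)
```

The rule E(Y+X) = E(Y) + E(X) does not need independence; independence is assumed here only to make the joint distribution easy to write down.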

26 Law of Iterated Expectations
• The expected value of the expected value of Y conditional on X is the expected value of Y.
• If we take expectations separately for each subpopulation (each value of X), and then take the expectation of this expectation, we get back the expectation for the whole population.
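
Written out for a discrete X, the law takes the form below. The numerical illustration uses hypothetical figures (two subpopulations with made-up shares and conditional means), not values from the course.

```latex
% Law of iterated expectations for a discrete X
\[
  E\bigl[\,E(Y \mid X)\,\bigr]
  \;=\; \sum_{x} P(X = x)\, E(Y \mid X = x)
  \;=\; E(Y)
\]

% Hypothetical illustration: 40% of the population has X = 1 with
% E(Y | X = 1) = 60, and 60% has X = 0 with E(Y | X = 0) = 40.
\[
  E(Y) \;=\; 0.4 \times 60 \;+\; 0.6 \times 40 \;=\; 48
\]
```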

27 Variances
• Population variances are also expectations.

28 Algebra of Variances
• One benefit of independent observations is that Cov(Y_i, Y_j) = 0 for i ≠ j, which eliminates all the cross-terms in the variance of the sum.
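
The two statements above can be written out as follows. These are the standard textbook forms, supplied here as a sketch because the slide's own formulas did not survive in the transcript.

```latex
% The population variance is an expectation.
\[
  \mathrm{Var}(Y) \;=\; E\bigl[(Y - E(Y))^{2}\bigr]
\]

% Variance of a sum: own variances plus all cross-terms.
\[
  \mathrm{Var}\Bigl(\sum_{i=1}^{n} Y_i\Bigr)
  \;=\; \sum_{i=1}^{n} \mathrm{Var}(Y_i)
  \;+\; \sum_{i \neq j} \mathrm{Cov}(Y_i, Y_j)
\]

% With independent observations, Cov(Y_i, Y_j) = 0 for i \neq j,
% so only the own-variance terms remain.
\[
  \mathrm{Var}\Bigl(\sum_{i=1}^{n} Y_i\Bigr)
  \;=\; \sum_{i=1}^{n} \mathrm{Var}(Y_i)
\]
```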

29 Review of Probability and Statistics

30 The California Test Score Data Set

31 Initial look at the data: (You should already know how to interpret this table)
• This table doesn’t tell us anything about the relationship between test scores and the STR.

32 Do districts with smaller classes have higher test scores?
• Scatterplot of test score v. student-teacher ratio
• What does this figure show?

33 We need to get some numerical evidence on whether districts with low STRs have higher test scores – but how?

34 Initial data analysis: Compare districts with “small” (STR < 20) and “large” (STR ≥ 20) class sizes:
1. Estimation of Δ = difference between group means
2. Test the hypothesis that Δ = 0
3. Construct a confidence interval for Δ

Class size   Average score (Ȳ)   Standard deviation (s_Y)   n
Small        657.4               19.4                       238
Large        650.0               17.9                       182

35 1. Estimation

36 2. Hypothesis testing

37 Compute the difference-of-means t-statistic:

38 3. Confidence interval
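
The three steps can be sketched using the summary statistics from the class-size table (means 657.4 and 650.0, standard deviations 19.4 and 17.9, group sizes 238 and 182). The formulas on the original slides were not preserved in the transcript, so this sketch assumes the usual unpooled standard error for a difference of means and a large-sample 95% interval with critical value 1.96. Variable names are illustrative.

```python
import math

# Summary statistics from the class-size table.
mean_small, sd_small, n_small = 657.4, 19.4, 238
mean_large, sd_large, n_large = 650.0, 17.9, 182

# 1. Estimation: difference between the two group means.
diff = mean_small - mean_large

# Standard error of the difference (unpooled; assumed formula).
se_diff = math.sqrt(sd_small**2 / n_small + sd_large**2 / n_large)

# 2. Hypothesis testing: t-statistic for the null hypothesis of zero difference.
t_stat = diff / se_diff

# 3. Confidence interval: large-sample 95% interval (critical value 1.96 assumed).
ci_low = diff - 1.96 * se_diff
ci_high = diff + 1.96 * se_diff

print(f"difference = {diff:.1f}, SE = {se_diff:.2f}, t = {t_stat:.2f}")
print(f"95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

Under these assumptions the t-statistic is well above 1.96 in absolute value, and the interval excludes zero.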

39 What comes next…

40 Review of Statistical Theory

41 (a) Population, random variable, and distribution

42 Population distribution of Y

43 (b) Moments of a population distribution: mean, variance, standard deviation, covariance, correlation

44 Moments, ctd.

45

46 The covariance between Test Score and STR is negative: so is the correlation…

47 The correlation coefficient is defined in terms of the covariance:
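
The formula on this slide was lost in the transcript. The standard definition, written here with generic variables X and Y and supplied as an assumption about what the slide showed, is:

```latex
\[
  \mathrm{corr}(X, Y)
  \;=\; \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}
  \;=\; \frac{\sigma_{XY}}{\sigma_X \, \sigma_Y},
  \qquad -1 \le \mathrm{corr}(X, Y) \le 1
\]
```

Because the standard deviations are positive, the correlation always has the same sign as the covariance.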

48 The correlation coefficient measures linear association

49 Sampling
• Statistical Inference
• Problems with Sampling:
  ◦ Selection bias
  ◦ Survivor bias
  ◦ Non-response bias
  ◦ Response bias

50 Distribution of Y_1, …, Y_n under simple random sampling

51

52 Things we want to know about the sampling distribution:

53 Mean and variance of sampling distribution of Ȳ, ctd.

54 The sampling distribution of Ȳ when n is large

55 The Law of Large Numbers:

56 The Central Limit Theorem (CLT):
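
Both results can be illustrated by simulation. The sketch below is not taken from the slides; it assumes draws from a Uniform(0, 1) population, with hypothetical sample sizes and repetition counts. The first part shows a large-sample mean landing close to the population mean; the second shows standardized sample means behaving approximately like a standard normal variable.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

MU = 0.5                    # mean of a Uniform(0, 1) population
SIGMA = math.sqrt(1 / 12)   # standard deviation of a Uniform(0, 1) population

def sample_mean(n):
    # Mean of n independent Uniform(0, 1) draws.
    return sum(random.random() for _ in range(n)) / n

# Law of Large Numbers: with a large n, the sample mean is close to MU.
big_sample_mean = sample_mean(100_000)

# Central Limit Theorem: standardized sample means are roughly N(0, 1),
# so about 95% of them should fall within +/- 1.96.
n = 30
reps = 4000
z_values = [(sample_mean(n) - MU) / (SIGMA / math.sqrt(n)) for _ in range(reps)]
share_within_196 = sum(abs(z) <= 1.96 for z in z_values) / reps

print(f"sample mean with n = 100000: {big_sample_mean:.4f}")
print(f"share of |z| <= 1.96:        {share_within_196:.3f}")
```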

57

58

59 Calculating the p-value, ctd.

60 Calculating the p-value with σ_Y known:
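
A minimal sketch of this calculation, assuming a two-sided test and the large-sample normal approximation for the sample mean (the slide's own formula was not preserved). The function name and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def p_value_known_sigma(y_bar, mu_0, sigma_y, n):
    """Two-sided p-value for H0: E(Y) = mu_0 when sigma_y is known."""
    # Standardize the sample mean using its standard deviation sigma_y / sqrt(n).
    z = (y_bar - mu_0) / (sigma_y / math.sqrt(n))
    # Probability, under H0, of a standard normal at least as extreme as |z|.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers: sample mean 52, null value 50, sigma_Y = 10, n = 100.
p = p_value_known_sigma(y_bar=52.0, mu_0=50.0, sigma_y=10.0, n=100)
print(f"p-value = {p:.4f}")
```

In this example the standardized statistic equals 2, so the p-value is a little under 0.05.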

61 Estimator of the variance of Y:

62 Computing the p-value with σ_Y² estimated:
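
When the variance is unknown, it can be replaced by an estimate. The sketch below assumes the usual sample variance (dividing by n - 1) and, for the p-value, the large-sample normal approximation. The data and null value are hypothetical, and the sample is kept tiny only so the arithmetic is easy to follow; with so few observations the normal approximation would not be reliable in practice.

```python
import math
from statistics import NormalDist, mean, variance

y = [4, 6, 8, 10, 12]   # hypothetical sample
mu_0 = 5                # hypothetical null value for E(Y)

n = len(y)
y_bar = mean(y)
s2 = variance(y)             # sample variance: sum of squared deviations / (n - 1)
se = math.sqrt(s2 / n)       # estimated standard error of the sample mean
t_stat = (y_bar - mu_0) / se

# Two-sided p-value using the normal approximation (large-n assumption).
p_normal = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"mean = {y_bar}, s^2 = {s2}, SE = {se:.3f}")
print(f"t = {t_stat:.3f}, p-value = {p_normal:.4f}")
```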

63 What is the link between the p-value and the significance level?

64 At this point, you might be wondering, ...

65 Comments on this recipe and the Student t-distribution

66 Comments on Student t distribution, ctd.

67 Comments on Student t distribution, ctd.

68 Comments on Student t distribution, ctd.

69 The Student-t distribution – summary

70

71 Confidence intervals, ctd.

72 Summary:

73 Let’s go back to the original policy question:

