1 A REVIEW OF QUME 232 The Statistical Analysis of Economic (and related) Data
2 Brief Overview of the Course
3 This course is about using data to measure causal effects.
4 In this course you will:
5 Types of Data – Cross Sectional: Cross-sectional data is a random sample. Each observation is a new individual, firm, etc., with information at a point in time. If the data is not a random sample, we have a sample-selection problem.
6 Types of Data – Time Series: Time series data has a separate observation for each time period, e.g. stock prices. Since it is not a random sample, there are different problems to consider. Trends and seasonality will be important.
7 Types of Data – Panel: We can pool random cross sections and treat them much like a normal cross section; we just need to account for time differences. We can also follow the same randomly selected individuals over time, which is known as panel data or longitudinal data.
8 Copyright © 2006 Pearson Addison-Wesley. All rights reserved. 4-8 Summations: The symbol $\Sigma$ is a shorthand notation for discussing sums of numbers. It works just like the + sign you learned about in elementary school.
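As a hedged illustration of the notation, the sketch below treats a summation as a plain running total in Python. The data values and the names `x`, `total`, `written_out`, and `k` are made up for this example and do not come from the slides.

```python
# Hypothetical data: x_1, ..., x_5
x = [2, 4, 6, 8, 10]

# Sigma notation: the sum over i = 1..n of x_i
total = sum(x)

# The same sum written out with the + sign
written_out = 2 + 4 + 6 + 8 + 10

# A constant factors out of a sum: sum(k * x_i) equals k * sum(x_i)
k = 3
scaled_total = sum(k * xi for xi in x)
```

Both `total` and `written_out` hold the same number, which is the sense in which the summation symbol "works just like the + sign."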
9 Algebra of Summations
10 Summations: A Useful Trick
11 Double Summations: The “secret” to double summations is to keep a close eye on the subscripts.
12 Descriptive Statistics: How can we summarize a collection of numbers? Mean: the arithmetic average. The mean is highly sensitive to a few large values (outliers). Median: the midpoint of the data, that is, the number above which lie half the observed numbers and below which lie the other half. The median is not sensitive to outliers.
13 Descriptive Statistics (cont.): Mode: the most frequently occurring value. Variance: the mean squared deviation of the numbers from their mean. The variance is a measure of the “spread” of the data. Standard deviation: the square root of the variance. The standard deviation provides a measure of a typical deviation from the mean.
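The definitions above can be sketched with Python's standard library. This is a minimal example on made-up numbers (the list `data` is an assumption, not data from the course), and it uses the "divide by n" form of the variance to match the "mean squared deviation" wording.

```python
import math
import statistics

# Hypothetical data with one large value (12) acting as an outlier.
data = [1, 2, 2, 3, 12]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # midpoint of the sorted data
mode = statistics.mode(data)        # most frequently occurring value

# Variance as the mean squared deviation from the mean (divides by n).
variance = sum((v - mean) ** 2 for v in data) / len(data)
std_dev = math.sqrt(variance)

# Dropping the outlier moves the mean a lot but leaves the median unchanged.
trimmed = [v for v in data if v != 12]
mean_without_outlier = statistics.mean(trimmed)
median_without_outlier = statistics.median(trimmed)
```

Here the mean falls from 4 to 2 once the outlier is removed, while the median stays at 2, which illustrates the sensitivity claim made for the mean.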
14 Descriptive Statistics (cont.): Covariance: the covariance of two sets of numbers, X and Y, measures how much the two sets tend to “move together.” If $\mathrm{Cov}(X,Y) > 0$, then if X is above its mean, we would expect that Y would also be above its mean.
15 Descriptive Statistics (cont.): Correlation Coefficient: the correlation coefficient between X and Y “norms” the covariance by the standard deviations of X and Y. You can think of this adjustment as a unit correction. The correlation coefficient will always fall between -1 and 1.
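To make the "unit correction" concrete, here is a small sketch with hypothetical helper functions `covariance` and `correlation` (names and data are assumptions chosen for this example). It uses the "divide by n" convention throughout.

```python
import math

def covariance(x, y):
    """Mean product of deviations from the means (divides by n)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov_xy = covariance(x, y)
corr_xy = correlation(x, y)

# Rescaling X (for example, dollars to cents) changes the covariance
# but leaves the correlation unchanged.
x_cents = [100 * a for a in x]
cov_scaled = covariance(x_cents, y)
corr_scaled = correlation(x_cents, y)
```

The covariance is multiplied by 100 when X is rescaled, while the correlation is unaffected and stays inside the interval from -1 to 1.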
16 A Quick Example
17 A Quick Example (cont.)
18 A Quick Example (cont.)
19 Populations and Samples: Two uses for statistics: (1) describe a set of numbers; (2) draw inferences from a set of numbers we observe to a larger population. The population is the underlying structure which we wish to study. Surveyors might want to relate 6000 randomly selected voters to all the voters in the United States. Macroeconomists might want to relate data about unemployment and inflation from 1958–2004 to the underlying process linking unemployment and inflation, to predict future realizations.
20 Populations and Samples (cont.): We cannot observe the entire population. Instead, we observe a sample drawn from the population of interest. In the Monte Carlo demonstration from last time, an individual dataset was the sample and the Data Generating Process described the population.
21 Populations and Samples (cont.): The descriptive statistics we use to describe data can also describe populations. What is the mean income in the United States? What is the variance of mortality rates across countries? What is the covariance between gender and income?
22 Populations and Samples (cont.): In a sample, we know exactly the mean, variance, covariance, etc. We can calculate the sample statistics directly. We must infer the statistics for the underlying population. Means in populations are also called expectations.
23 Populations and Samples (cont.): If the true mean income in the United States is $\mu$, then we expect a simple random sample to have sample mean $\mu$. In practice, any given sample will also include some “sampling noise.” We will observe not $\mu$, but $\mu + \varepsilon$. If we have drawn our sample correctly, then on average the sampling error over many samples will be 0. We write this as $E(\varepsilon) = 0$.
24 Expectations: Expectations are means over all possible samples (think “super” Monte Carlo). Means are sums. Therefore, expectations follow the same algebraic rules as sums. See the Statistics Appendix for a formal definition of Expectations.
25 Algebra of Expectations: k is a constant. $E(k) = k$; $E(kY) = kE(Y)$; $E(k + Y) = k + E(Y)$; $E(Y + X) = E(Y) + E(X)$; $E(\sum_i Y_i) = \sum_i E(Y_i)$, where each $Y_i$ is a random variable.
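These rules can be checked exactly on a small discrete distribution. The sketch below assumes a fair six-sided die as the random variable Y and a hypothetical helper `expect`; neither comes from the slides. Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical random variable: Y is the roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

def expect(g):
    """E[g(Y)]: probability-weighted sum of g over all outcomes."""
    return sum(prob * g(y) for y in outcomes)

k = 10
e_y = expect(lambda y: y)               # E(Y)
e_k = expect(lambda y: k)               # E(k)
e_ky = expect(lambda y: k * y)          # E(kY)
e_k_plus_y = expect(lambda y: k + y)    # E(k + Y)
```

Because an expectation is itself a weighted sum, each identity follows from the corresponding rule for sums.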
26 Law of Iterated Expectations: The expected value of the expected value of Y conditional on X is the expected value of Y. If we take expectations separately for each subpopulation (each value of X), and then take the expectation of this expectation, we get back the expectation for the whole population.
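A small exact calculation may help here. The joint distribution below is invented for illustration (two subpopulations, X = 0 and X = 1), and the variable names are assumptions; the point is only that averaging the subpopulation means with the subpopulation weights recovers the overall mean.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: keys are (x, y), values are P(X=x, Y=y).
joint = {
    (0, 5): F(1, 5), (0, 15): F(1, 5),
    (1, 10): F(3, 10), (1, 30): F(3, 10),
}

# Unconditional expectation E(Y).
e_y = sum(p * y for (x, y), p in joint.items())

# Marginal probabilities P(X = x).
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p

# Conditional expectations E(Y | X = x), one per subpopulation.
e_y_given_x = {
    x0: sum(p * y for (x, y), p in joint.items() if x == x0) / p_x[x0]
    for x0 in p_x
}

# Expectation of the conditional expectation: E(E(Y | X)).
e_of_e = sum(p_x[x0] * e_y_given_x[x0] for x0 in p_x)
```

In this example the subpopulation means are 10 and 20 with weights 2/5 and 3/5, and both routes give an overall expectation of 16.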
27 Variances: Population variances are also expectations.
28 Algebra of Variances: One benefit of independent observations is that $\mathrm{Cov}(Y_i, Y_j) = 0$ for $i \neq j$, which eliminates all the cross-terms in the variance of the sum.
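As a sketch of the standard textbook identities behind these two statements (the symbol $\mu_Y$ and the index notation are choices made here, not taken from the slides), the population variance and the variance of a sum can be written as:

```latex
\begin{align*}
\mathrm{Var}(Y) &= E\big[(Y - \mu_Y)^2\big], \qquad \mu_Y = E(Y) \\
\mathrm{Var}(X + Y) &= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y) \\
\mathrm{Var}\Big(\sum_{i=1}^{n} Y_i\Big) &= \sum_{i=1}^{n} \mathrm{Var}(Y_i) + \sum_{i \neq j} \mathrm{Cov}(Y_i, Y_j)
\end{align*}
```

With independent observations every covariance term in the last line is zero, so the variance of the sum reduces to the sum of the variances.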
29 Review of Probability and Statistics
30 The California Test Score Data Set
31 Initial look at the data (you should already know how to interpret this table): This table doesn’t tell us anything about the relationship between test scores and the student-teacher ratio (STR).
32 Do districts with smaller classes have higher test scores? [Figure: scatterplot of test score vs. student-teacher ratio] What does this figure show?
33 We need to get some numerical evidence on whether districts with low STRs have higher test scores – but how?
34 Initial data analysis: Compare districts with “small” (STR < 20) and “large” (STR ≥ 20) class sizes:
1. Estimation of $\Delta$ = difference between group means
2. Test the hypothesis that $\Delta = 0$
3. Construct a confidence interval for $\Delta$
Class size   Average score ($\bar{Y}$)   Standard deviation ($s_Y$)   n
Small        657.4                       19.4                         238
Large        650.0                       17.9                         182
35 1. Estimation
36 2. Hypothesis testing
37 Compute the difference-of-means t-statistic:
38 3. Confidence interval
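The three steps (estimation, hypothesis testing, confidence interval) can be sketched from the group means, standard deviations, and sample sizes reported for the small and large class-size groups. The variable names are assumptions, and the sketch uses the usual large-sample formulas: an unpooled standard error and the normal critical value 1.96 for a 95% interval.

```python
import math

# Group summaries from the "small" vs. "large" class-size comparison.
mean_small, sd_small, n_small = 657.4, 19.4, 238
mean_large, sd_large, n_large = 650.0, 17.9, 182

# 1. Estimation: difference between the group means.
diff = mean_small - mean_large

# 2. Hypothesis test of "difference = 0": the standard error combines
#    the two group variances without assuming they are equal.
se = math.sqrt(sd_small**2 / n_small + sd_large**2 / n_large)
t_stat = diff / se

# 3. 95% confidence interval, using the large-sample normal critical value.
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

print(f"diff={diff:.1f}  se={se:.2f}  t={t_stat:.2f}  CI=({ci_low:.1f}, {ci_high:.1f})")
```

Under these assumptions the difference is 7.4 points with a standard error of about 1.83, giving a t-statistic of about 4.05 and an interval of roughly (3.8, 11.0). Since the t-statistic exceeds 1.96 in absolute value and the interval excludes zero, the hypothesis of no difference is rejected at the 5% level.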
39 What comes next…
40 Review of Statistical Theory
41 (a) Population, random variable, and distribution
42 Population distribution of Y
43 (b) Moments of a population distribution: mean, variance, standard deviation, covariance, correlation
44 Moments, ctd.
46 The covariance between Test Score and STR is negative, and so is the correlation.
47 The correlation coefficient is defined in terms of the covariance:
48 The correlation coefficient measures linear association
49 Sampling. Statistical Inference. Problems with sampling: selection bias, survivor bias, non-response bias, and response bias.
50 Distribution of $Y_1, \dots, Y_n$ under simple random sampling
52 Things we want to know about the sampling distribution:
53 Mean and variance of the sampling distribution of $\bar{Y}$, ctd.
54 The sampling distribution of $\bar{Y}$ when n is large
55 The Law of Large Numbers:
56 The Central Limit Theorem (CLT):
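The Law of Large Numbers and the Central Limit Theorem can both be illustrated with a short simulation. This is a sketch under stated assumptions: the population is taken to be Uniform(0, 1), and the sample sizes, replication count, and seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Population: Uniform(0, 1), so the mean is 0.5 and the variance is 1/12.
mu, sigma = 0.5, math.sqrt(1 / 12)

# Law of Large Numbers: the sample mean approaches mu as n grows.
big_sample = [random.random() for _ in range(100_000)]
big_sample_mean = sum(big_sample) / len(big_sample)

# Central Limit Theorem: standardized sample means are approximately N(0, 1).
n, reps = 50, 2_000
z_values = []
for _ in range(reps):
    draws = [random.random() for _ in range(n)]
    y_bar = sum(draws) / n
    z_values.append((y_bar - mu) / (sigma / math.sqrt(n)))

# For a standard normal, about 95% of draws fall within +/- 1.96.
share_within = sum(abs(z) <= 1.96 for z in z_values) / reps
```

The large-sample mean should land very close to 0.5, and roughly 95% of the standardized sample means should fall within 1.96 of zero even though the underlying population is not normal.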
59 Calculating the p-value, ctd.
60 Calculating the p-value with $\sigma_Y$ known:
61 Estimator of the variance of Y:
62 Computing the p-value with $\sigma_Y^2$ estimated:
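A minimal sketch of this calculation follows, assuming the standard sample variance (squared deviations divided by n − 1) as the estimator and the large-sample normal approximation for the p-value. The sample values and the null value `mu_0` are hypothetical.

```python
import math
import statistics

# Hypothetical sample of test scores and a hypothetical null value.
sample = [652, 648, 661, 655, 649, 657, 660, 646, 653, 659]
mu_0 = 650

n = len(sample)
y_bar = statistics.mean(sample)

# Sample variance: squared deviations divided by n - 1.
s2 = statistics.variance(sample)

# Standard error of the sample mean and the t-statistic.
se = math.sqrt(s2 / n)
t_stat = (y_bar - mu_0) / se

# Two-sided p-value from the standard normal approximation.
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
```

For these made-up numbers the t-statistic is 2.4 and the p-value is about 0.016, so the null would be rejected at the 5% level but not at the 1% level. With only 10 observations the normal approximation is rough; it is used here only because the approach relies on large-sample results.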
63 What is the link between the p-value and the significance level?
64 At this point, you might be wondering,...
65 Comments on this recipe and the Student t-distribution
66 Comments on Student t distribution, ctd.
67 Comments on Student t distribution, ctd.
68 Comments on Student t distribution, ctd.
69 The Student-t distribution – summary
71 Confidence intervals, ctd.
72 Summary:
73 Let’s go back to the original policy question: