1
Advanced Quantitative Techniques
Lab 4
2
Sept 29th Two-sample test of means – unpaired & paired
Intro to covariance & variance
3
Data! https://www.icpsr.umich.edu/icpsrweb/ICPSR/
Inegi.gov
4
Two-sample test of means
BEFORE: Single-sample hypothesis test. Test one population mean compared to a particular value.
NOW: Two-sample hypothesis test. Test the difference between TWO population means.
Primarily interested in whether the difference in the means equals zero. If so, there is no difference between the means of the two populations.
5
Paired vs. Unpaired Data
The data may be either paired (dependent) OR not paired (independent).
PAIRED DATA: Same group, two trials. The two samples are the same size. One-to-one correspondence between values in the two samples.
UNPAIRED DATA: Different groups. The two samples may or may not be the same size. No one-to-one correspondence between values in the two samples.
6
Key Steps
- Choose the two variables to compare.
- Determine the α error allowance.
- Determine whether you're working with paired or unpaired (independent) data. If the samples are the same size, the data could be paired or unpaired. If the samples are not the same size, the data are definitely unpaired.
- Identify whether you're working with small or large samples.
- State assumptions as necessary. Note that Stata does not check them for you: ttest var1=var2 treats the data as paired unless you add the unpaired option, and it runs the same way whatever the sample size. If you're working with small sample(s), you must assume that the population data are normal and (if unpaired) that the variances are equal.
7
Large Samples: Central Limit Theorem (CLT) applies
The CLT applies and NO extra distributional assumptions are necessary. The samples must still be random and representative: n1 > 30 and n2 > 30, and the two samples are selected randomly from the two populations.
Small Samples: TWO Assumptions
If one or both samples are small, whether the data are paired or unpaired, the following assumption is needed:
1) Both populations of data are normally distributed.
If the samples are unpaired, the following additional assumption is needed:
2) The variances of the two populations are equal to each other (even though the sample variances may differ).
8
Two-sample Hypothesis Tests: STATA and Assumptions Matrix
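Paired Random Samples
- Small Sample (<= 30). STATA Command: ttest var1=var2. ASSUMPTIONS: the population of differences is normal; the sample is random.
- Large Sample (> 30). STATA Command: ttest var1=var2. ASSUMPTIONS: the sample is random; CLT applies.
Unpaired Random Samples
- Small Sample(s) (Either or Both <= 30). STATA Command: ttest var1=var2, unpaired. ASSUMPTIONS: both populations of data are normal; both population (data) variances are equal; the samples are random.
- Large Samples (Both > 30). STATA Command: ttest var1=var2, unpaired unequal. ASSUMPTIONS: the samples are random; CLT applies.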
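For reference, the three ttest variants in the matrix correspond to the standard textbook test statistics sketched below. These formulas are not on the slide; the notation (sample means \bar{x}, sample standard deviations s, sample sizes n, paired differences d_i) is assumed here. The degrees of freedom shown for the unequal-variance case are Satterthwaite's approximation, which is what Stata uses by default with the unequal option.

```latex
% Paired: d_i = x_{1i} - x_{2i}, with n pairs
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad df = n - 1

% Unpaired, equal population variances (pooled standard deviation s_p)
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}, \qquad
df = n_1 + n_2 - 2

% Unpaired, unequal population variances
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad
df \approx \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}
{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}
```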
9
Q1: gender & earnings
Do men make more than women?
use gss2002_chapter7
*recode variable of interest
*limit by full-time employees
tab rincom98
mvdecode rincom98, mv(24)
gen inc = rincom98
replace inc = 500 if rincom98 == 1
replace inc = 2500 if rincom98 == 2
replace inc = 3500 if rincom98 == 3
replace inc = 4500 if rincom98 == 4
replace inc = 5500 if rincom98 == 5
replace inc = 6500 if rincom98 == 6
replace inc = 7500 if rincom98 == 7
replace inc = 9000 if rincom98 == 8
replace inc = if rincom98 == 9
replace inc = if rincom98 == 10
replace inc = if rincom98 == 11
replace inc = if rincom98 == 12
replace inc = if rincom98 == 13
replace inc = if rincom98 == 14
replace inc = if rincom98 == 15
replace inc = if rincom98 == 16
replace inc = if rincom98 == 17
replace inc = if rincom98 == 18
replace inc = if rincom98 == 19
replace inc = if rincom98 == 20
replace inc = if rincom98 == 21
replace inc = if rincom98 == 22
replace inc = if rincom98 == 23
label variable inc "Income Category 1 to 23"
tabulate rincom98 inc, missing
ttest inc, by(sex)
ttest inc if wrkstat == 1, by(sex)
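The same kind of category-to-dollar mapping can be written more compactly with Stata's recode command. The sketch below is meant to be run as a do-file and uses simulated codes 1 to 8 only, with the dollar values the slide gives for those codes. The simulated data, the seed, and the sample size are assumptions for illustration, not the GSS file.

```stata
* Illustrative only: simulated category codes stand in for the GSS variable rincom98
clear
set seed 12345
set obs 200
generate rincom98 = ceil(8 * runiform())

* recode maps each category code to its dollar value in one statement
* and writes the result to a new variable, leaving rincom98 unchanged
recode rincom98 (1 = 500) (2 = 2500) (3 = 3500) (4 = 4500) ///
                (5 = 5500) (6 = 6500) (7 = 7500) (8 = 9000), generate(inc)

* Cross-check the mapping, as on the slide
tabulate rincom98 inc, missing
```

Because recode creates inc directly, the initial gen inc = rincom98 step is not needed in this version.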
10
Q2: Target’s International Expansion
The American retail store Target has hired you to decide if their upcoming international expansion should focus on the Asian market or the Latin American market. Use retail spending as a proxy for (to represent) market size. Based on the data, what advice would you give Target? Is average retail spending the same in these two regions considering an alpha of 5%?
11
Q1: The Hypothesis
Paired or Unpaired? Unpaired (independent).
State the Null and Alternative Hypotheses:
H0: mean(LatinAmerica) - mean(AsiaPacific) = 0
Ha: mean(LatinAmerica) - mean(AsiaPacific) != 0
What are the sample sizes?
import excel Lab_4_Data.xls, firstrow
sum AsiaPacific LatinAmerica
Asia: n = 57
Latin America: n = 31
12
Q1: The Stata Command
State appropriate assumptions:
- Samples are random and representative
- CLT applies
ttest LatinAmerica= AsiaPacific, unpaired unequal
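A minimal sketch of how this command behaves, run on simulated data rather than the Lab 4 spreadsheet. The variable names and sample sizes (57 and 31) follow the slides; the means, standard deviations, and seed are invented for illustration, so the p-value here says nothing about the real data.

```stata
* Simulated stand-in data: the means and SDs below are invented for illustration
clear
set seed 2024
set obs 57
generate AsiaPacific  = rnormal(120, 30)
generate LatinAmerica = rnormal(100, 25) if _n <= 31

* Unpaired test that does not assume equal population variances
ttest LatinAmerica == AsiaPacific, unpaired unequal

* Decision rule at alpha = 0.05, using the two-sided p-value stored in r(p)
display "two-sided p-value = " r(p)
display "reject H0 at 5%?    " (r(p) < 0.05)
```

With the unpaired option the two columns do not need the same number of non-missing values, which is why LatinAmerica can stop at row 31.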
13
Q1: The Solution
Which p-value is appropriate? The two-sided one, because the alternative hypothesis is != 0.
Compared to which alpha? Alpha = 0.05.
Reject the null hypothesis? The p-value is less than alpha, so reject the null hypothesis. The Asian retail market and the Latin American retail market are not the same size.
What about errors? Rejecting the null risks a Type I error: if the two means were actually equal, there would be a 5% chance of rejecting the null anyway.
The Asian market is larger than the Latin American market. Therefore, Target should invest in Asia.
14
Q2: David Ricardo Fan Club
The David Ricardo Fan Club just published a report claiming that increased globalization means everybody can buy more. See the Yearly Retail Sales spreadsheet (second sheet in Lab 4 Data). Based on this data, did retail spending increase between 1999 and 2003? Use alpha of 2%. Note: use the “clear” command to get rid of prior data, then paste the new data into STATA.
15
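Q2: The Hypothesis
Paired or Unpaired? Paired (dependent): each observation is measured in both 1999 and 2003.
State the Null and Alternative Hypotheses:
H0: mean(Year2003 - Year1999) = 0
Ha: mean(Year2003 - Year1999) > 0
What are the sample sizes?
sum Year1999 Year2003
1999: n = 42
2003: n = 42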
16
Q2: The Stata Command
State appropriate assumptions:
- Samples are random and representative
- CLT applies
ttest Year2003= Year1999, level(98)
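A minimal sketch of the paired version, again on simulated data. The variable names and n = 42 follow the slides; the means, standard deviations, and seed are invented for illustration. It assumes the question "did spending increase?" is read as a one-sided alternative, mean(Year2003 - Year1999) > 0, whose p-value Stata stores in r(p_u).

```stata
* Simulated stand-in data: 42 paired observations, values invented for illustration
clear
set seed 2024
set obs 42
generate Year1999 = rnormal(100, 15)
generate Year2003 = Year1999 + rnormal(2, 6)

* Paired test (the default for var1 == var2) with a 98% confidence interval
ttest Year2003 == Year1999, level(98)

* r(p_u) is the upper one-sided p-value, Ha: mean(Year2003 - Year1999) > 0
display "one-sided p-value (increase) = " r(p_u)
display "two-sided p-value            = " r(p)
display "reject H0 at 2%?               " (r(p_u) < 0.02)
```

The level(98) option only changes the width of the reported confidence interval. The p-values are the same at any level, so the comparison with alpha is still made by hand.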
17
Q2: The Solution
Which p-value is appropriate? The one-sided p-value for an increase, Ha: mean(Year2003 - Year1999) > 0.
Compared to which alpha? Alpha = 0.02.
Reject the null hypothesis? The p-value is more than alpha, so fail to reject the null hypothesis. The fans of David Ricardo cannot claim that retail spending increased between 1999 and 2003 at a 2% level of error.
What about errors? There is a possibility of Type II error.
What if the alpha was 3%? With 3% alpha, I could reject the null hypothesis.
18
Variance and Covariance
Variance measures how much the values of x vary around their mean.
Covariance measures how two variables vary linearly with each other: how they change together.
- Positive when the paired values tend to differ from their respective means in the same direction
- Negative when the paired values tend to differ from their respective means in opposite directions
19
Variance and Covariance
Population Sample
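The formulas under the Population and Sample headings did not survive in this transcript. The standard textbook definitions are sketched below as a stand-in; the notation (N and n for population and sample size, \mu and \bar{x} for population and sample means) is assumed and may differ from the original slide.

```latex
% Population variance and covariance (N = population size)
\sigma_x^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)^2, \qquad
\sigma_{xy} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)

% Sample variance and covariance (n = sample size)
s_x^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```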
20
Correlation: What is it?
Correlation measures the strength of the linear relationship between two variables, x and y.
- The closer to +1 or to -1, the stronger the correlation.
- 'y' usually represents the dependent variable and 'x' usually represents the independent variable(s).
- You can test correlation between any variables, independent or dependent, or any combination.
- Correlation can be positive or negative.
- Correlation is 'scaleless' (unit-less) and takes a value between -1 and 1.
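One way to see why correlation is unit-less and bounded: the sample correlation is the sample covariance divided by the product of the two sample standard deviations. The formula below is the standard definition of Pearson's r, written in assumed notation rather than taken from the slide.

```latex
% Sample covariance s_{xy} and sample standard deviations s_x, s_y
r = \frac{s_{xy}}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

The units of x and y cancel between numerator and denominator, which is why rescaling either variable (for example, dollars to thousands of dollars) leaves r unchanged.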
21
Correlation: Important Points
- Both variables must be quantitative, not categorical.
- Correlation does not imply causation!
- Correlation is usually written as r (Pearson's r).
- One basic rule when interpreting the correlation coefficient is to look first at the scatterplot to see if the relationship between the variables is linear. If it is, you may calculate the correlation coefficient.
- There are many situations where it is not sensible to calculate the correlation coefficient.
22
Non-Linear Examples
[Figure: example scatterplots where a fitted line is misleading. Panel notes: Separate (look at individually); Not linear (meaningless); Cycles (line says nothing about cycles; meaningless).]
23
Command: Scatterplot
Relation between 311 calls & vacancy rate?
Variables: calls_per_thousand & vacant
generate vacant_rate = vacant / HSE_UNIT * 100
twoway (scatter calls_per_thousand vacant_rate) (lfit calls_per_thousand vacant_rate)
24
Command: correlate (corr)
corr calls_per_thousand vacant_rate
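A minimal sketch tying corr back to covariance, run on simulated data. The variable names follow the slides, but the values, the seed, and the built-in positive relationship are all invented for illustration. It checks that the coefficient reported by corr equals the covariance divided by the product of the standard deviations.

```stata
* Simulated stand-in data: all values below are invented for illustration
clear
set seed 2024
set obs 100
generate HSE_UNIT = 1000 + floor(4000 * runiform())
generate vacant = floor(0.15 * HSE_UNIT * runiform())
generate vacant_rate = vacant / HSE_UNIT * 100
generate calls_per_thousand = 50 + 2 * vacant_rate + rnormal(0, 10)

* Pearson's r, stored in r(rho)
corr calls_per_thousand vacant_rate
local rho = r(rho)

* Same coefficient rebuilt from the covariance matrix: cov / (sd_x * sd_y)
corr calls_per_thousand vacant_rate, covariance
local rho_check = r(cov_12) / sqrt(r(Var_1) * r(Var_2))
display "r from corr       = " `rho'
display "r from covariance = " `rho_check'
```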