Stata Tutorial, Lecture 4: Comparing Two Samples ©Ming-chi Chen, Social Statistics (社會統計)
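Open 85q1family.dta, the Stata data file for the family module of the Taiwan Social Change Survey, Phase 3, Wave 2. Because of Chinese character-encoding incompatibilities, some text appears garbled and is hard to read.
Open 85q1_format.txt to see the variable names and value labels. Take j2 and j3 as examples: j2 asks the respondent "Q10.2. On average, about how many hours per week do you usually spend on housework? ____ hours", and j3 asks "Q10.3. On average, about how many hours per week does your spouse usually spend on housework? ____ hours".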
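Our data include variable labels, but the encoding problem garbles them. To check whether a label is garbled, open Data > Data Editor.
Click the variable name j2 once and the whole column below it is highlighted. Right-click > Variable > Properties > Label. The Chinese label shows up as "通常您平均牢週大約花多少時間做家務工作︺", where 牢 should be 每 and ︺ should be a question mark. Correct the garbled characters, and fix the garbled label of j3 the same way.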
Checking variables for abnormal values: close the Data Editor window and use a box plot to look for extreme values.
Graphics > Easy graphs > Box plot > Main, then type j2 in the Variable box.
Using a box plot to look for extreme values (figure: box plot of j2).
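Stata's box plot draws as separate points the values beyond the adjacent values, that is, more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The rule can be sketched in Python on hypothetical data; the function name boxplot_outliers is our own, and statistics.quantiles uses a percentile definition that may differ slightly from Stata's.

```python
from statistics import quantiles

def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences
    Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is the usual convention)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical weekly housework hours: 168 is impossible, 999 is a survey code
hours = [0, 2, 5, 7, 10, 14, 14, 20, 21, 28, 35, 168, 999]
print(boxplot_outliers(hours))  # [168, 999]
```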
The same method shows the extreme values of j3. You can also type the command directly in the Command window.
This is the Command window.
In the Command window, type graph box j2 and press Enter. (Stata commands are case-sensitive, so Graph with a capital G would not be recognized.)
summarize varname, detail
In the Command window, type summarize j2, detail, or use Statistics > Summaries, tables, & tests > Summary statistics > Summary statistics.
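. summarize j2, detail (variable label: "On average, about how many hours per week do you usually spend on housework?")
Output: percentiles from 1% to 99%, the smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis. Annotation on the largest values: "Way too fond of housework! Implausibly high."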
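Recoding extreme values. In 85q1_format.txt we find that j2 and j3 use the codes 996 "don't know", 998 "not applicable" and 999 "refused".
So every value of 995 or above should be set to system missing: recode j2 995/max=. Here the period (.) is Stata's system missing value.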
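The recode rule can be mimicked in a few lines of Python. This is only a sketch on made-up answers: None stands in for Stata's system missing value, and recode_to_missing is a hypothetical helper, not a Stata or library function.

```python
def recode_to_missing(values, cutoff=995):
    """Mimic Stata's `recode j2 995/max=.`: codes at or above `cutoff`
    (996 don't know, 998 not applicable, 999 refused) become None (missing)."""
    return [None if v is not None and v >= cutoff else v for v in values]

raw = [0, 14, 996, 35, 998, 999, 168]  # hypothetical answers to j2
print(recode_to_missing(raw))  # [0, 14, None, 35, None, None, 168]
```

Note that 168 survives this step; it is a valid code but an implausible value, which the next step deals with.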
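A week has only 168 hours, so implausible values should be converted to something reasonable: at 16 waking hours a day, a week has 16 × 7 = 112 hours.
. summarize j2, detail (same question as above). Output: percentiles from 1% to 99%, the smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis.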
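Use inspect to see the rough distribution and the number of missing cases: Data > Describe data > Inspect variables.
. inspect j2 (j2: On average, about how many hours per week do you usually spend on housework). Output: the number of observations split into Total, Integers and Nonintegers for the Negative, Zero and Positive values, the Total, and the number Missing, next to a small text histogram; j2 has 47 unique values.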
recode j2 168=112
. inspect j2 after the recode: same layout as above; j2 now has 46 unique values.
. sum j2, detail (same question). Output: percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis.
. inspect j3 (j3: On average, about how many hours per week does your spouse usually spend on housework). Output: same layout as for j2; j3 has 54 unique values.
. summarize j3, detail (variable label: "On average, about how many hours per week does your spouse usually spend on housework?"). Output: percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis.
Missing values and recoding:
recode j3 990/max=.
recode j3 168=112
. recode j3 168=112 reports (j3: 4 changes made).
. inspect j3: same layout as before; j3 now has 50 unique values.
. summarize j3, detail after the recodes. Output: percentiles, smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness and Kurtosis.
recode j3 112/max=112
tabulate j3
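The two top-coding steps (168 to 112, then anything above 112 to 112) reduce to one rule: replace every value above 16 × 7 = 112 by 112. A Python sketch of that rule, with hypothetical data and a hypothetical helper topcode:

```python
WAKING_HOURS_PER_WEEK = 16 * 7  # 16 waking hours a day -> 112 hours a week

def topcode(values, cap=WAKING_HOURS_PER_WEEK):
    """Mimic `recode j3 112/max=112` (which also covers `recode j3 168=112`):
    every value above `cap` is replaced by `cap`; missing (None) stays missing."""
    return [None if v is None else min(v, cap) for v in values]

hours = [0, 14, None, 120, 168]  # hypothetical cleaned answers
print(topcode(hours))  # [0, 14, None, 112, 112]
```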
Bottom of the tabulate j3 output: the highest values are 70, 80, 84, 85, 90, 98, 100, 105 and 112, and the Total row reads 1,407 (100.00%).
Comparing men and women. Question A1 is sex: 1 = male, 2 = female.
Data > Data Editor, find the variable a1, right-click > Variable > Properties, and change its label to 性別 (sex). Then Value label > Define/Modify > Define, enter gender as the label name > OK; enter value 1 with text 男 (male) > OK; enter value 2 with text 女 (female) > OK; Cancel > Close; choose gender as the value label > OK. Close the Data Editor window.
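Do men and women share housework differently? Statistics > Summaries, tables, & tests > Tables > One/two-way table of summary statistics. In the dialog, the independent variable (a1) defines the groups and the dependent variable (j2) is the variable summarized.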
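What this table reports (mean, standard deviation and frequency of the dependent variable within each group) can be sketched in Python. The data below are made up, and summarize_by is a hypothetical helper; in Stata the same table comes from tabulate a1, summarize(j2).

```python
from statistics import mean, stdev

def summarize_by(groups, values):
    """Mean, sample standard deviation and frequency of `values` within each
    level of `groups`, skipping missing (None) values; similar in spirit to
    Stata's `tabulate a1, summarize(j2)`."""
    table = {}
    for g in sorted(set(groups)):
        xs = [v for gg, v in zip(groups, values) if gg == g and v is not None]
        table[g] = (mean(xs), stdev(xs), len(xs))
    return table

sex = [1, 1, 1, 2, 2, 2, 2]            # 1 = male, 2 = female (hypothetical)
hours = [2, 4, 6, 10, 14, None, 18]    # weekly housework hours (hypothetical)
for g, (m, sd, n) in summarize_by(sex, hours).items():
    print(g, m, sd, n)
```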
Summary of j2 (On average, about how many hours per week do you usually spend on housework) by 性別 (sex): Mean, Std. Dev. and Freq. for 男 (male), 女 (female) and Total.
Is the difference large?
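Population variances unknown but assumed equal: Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests. The dialog takes the dependent variable (j2), the independent grouping variable (a1) and the confidence level.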
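Under the equal-variance assumption, the t statistic divides the difference in means by a standard error built from the pooled variance, and the test has n1 + n2 - 2 degrees of freedom. A minimal Python sketch of that formula on hypothetical data (pooled_t is our own name); it stops at t and the degrees of freedom, because the p-value needs the t distribution, which the Python standard library does not provide (a library such as SciPy does).

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances, the default
    of Stata's `ttest depvar, by(groupvar)`. Returns (diff, t, df) with
    diff = mean(x) - mean(y) and df = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(x) - mean(y)
    return diff, diff / se, n1 + n2 - 2

men = [2, 4, 6]        # hypothetical weekly hours
women = [10, 14, 18]
print(pooled_t(men, women))
```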
. ttest j2, by(a1) level(99)
Output, Two-sample t test with equal variances: Obs, Mean, Std. Err., Std. Dev. and the 99% confidence interval for 男, 女, combined and diff, where diff = mean(男) - mean(女); the t statistic and degrees of freedom for Ho: diff = 0; and the p-values Pr(T < t) for Ha: diff < 0, Pr(|T| > |t|) for Ha: diff != 0, and Pr(T > t) for Ha: diff > 0.
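Population variances unknown and assumed unequal. The method above assumes that the population variances are unknown but equal; whatever the sample size, statistical software generally uses a t test here.
What if the population variances are unknown and assumed to be unequal?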
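Population variances unknown and assumed unequal: Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests, with the unequal-variances option selected. The degrees of freedom then require a more complicated approximation; here we use the one proposed by Welch.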
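To make "more complicated degrees of freedom" concrete, here is a Python sketch of the unequal-variance t statistic with two df approximations: Satterthwaite's, which Stata uses when only unequal is specified, and Welch's, which the welch option requests (formulas as we understand them from the Stata ttest documentation). The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def unequal_var_t(x, y):
    """t statistic without assuming equal variances, plus two df approximations:
    Satterthwaite (Stata's `unequal` default) and Welch (the `welch` option)."""
    n1, n2 = len(x), len(y)
    a, b = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(a + b)
    satterthwaite = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    welch = (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1)) - 2
    return t, satterthwaite, welch

men = [2, 4, 6]        # hypothetical weekly hours
women = [10, 14, 18]
print(unequal_var_t(men, women))
```

With equal group sizes the t statistic matches the pooled one; only the degrees of freedom change.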
Difference between men and women in housework hours, with population variances unknown and assumed unequal:
. ttest j2, by(a1) unequal welch level(99)
Output, Two-sample t test with unequal variances: the same table as above (男, 女, combined and diff, with diff = mean(男) - mean(女)), the t statistic for Ho: diff = 0 with Welch's degrees of freedom, and the p-values Pr(T < t), Pr(|T| > |t|) and Pr(T > t) for Ha: diff < 0, diff != 0 and diff > 0.
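Testing whether the variances are equal: Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group variance comparison tests, with the dependent variable (j2) and the independent grouping variable (a1). This menu runs sdtest, the F test of the variance ratio; Levene's test proper is available through the robvar command.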
Variance-ratio test: sd(男) / sd(女) is not equal to 1, and the p-value shows that we can reject the null hypothesis of equal variances.
. sdtest j2, by(a1) level(99)
Output, Variance ratio test: Obs, Mean, Std. Err., Std. Dev. and the 99% confidence interval for 男, 女 and combined; ratio = sd(男) / sd(女), the f statistic for Ho: ratio = 1 with degrees of freedom = 967, 880; and the p-values Pr(F < f) for Ha: ratio < 1, 2*Pr(F < f) for Ha: ratio != 1, and Pr(F > f) for Ha: ratio > 1.
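The statistic behind sdtest is the ratio of the two sample variances, compared with an F distribution with (n1 - 1, n2 - 1) degrees of freedom; this is why the output reports degrees of freedom = 967, 880. Levene's test instead runs a one-way ANOVA on absolute deviations from the group means. Both statistics are sketched below in Python on hypothetical data (function names are our own); p-values again require an F distribution from a library such as SciPy.

```python
from statistics import mean, variance

def variance_ratio(x, y):
    """F = s_x^2 / s_y^2 with (n_x - 1, n_y - 1) degrees of freedom,
    the statistic behind Stata's `sdtest depvar, by(groupvar)`."""
    return variance(x) / variance(y), (len(x) - 1, len(y) - 1)

def levene_w0(*groups):
    """Levene's test statistic: a one-way ANOVA F computed on the absolute
    deviations |x - group mean| (reported as W0 by Stata's `robvar`)."""
    z = [[abs(v - mean(g)) for v in g] for g in groups]
    k, n = len(z), sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in z for v in g) / (n - k)
    return between / within, (k - 1, n - k)

men = [2, 4, 6]        # hypothetical weekly hours
women = [10, 14, 18]
print(variance_ratio(men, women))
print(levene_w0(men, women))
```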
Based on the variance-ratio test, the unequal-variances assumption is the more appropriate choice. The result: men spend significantly fewer hours on housework than women.
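Comparing the housework burden of married and unmarried respondents. A5 is the respondent's marital status: 1 = never married, 2 = married, 3 = other. Do married people carry a heavier housework load?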
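Comparing the housework burden of married and unmarried respondents. Following the men/women comparison produces this error:
. ttest j2, by(a5) level(99)
more than 2 groups found, only 2 allowed r(420);
This is because a5 has three values: never married, married and other. Use a condition (an if qualifier) to restrict the comparison to never-married and married respondents.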
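Restricting the sample before a two-group test amounts to filtering the observations and then checking that exactly two groups remain. A Python sketch of that logic with hypothetical data; restrict_to_two_groups is a made-up helper whose error message only imitates Stata's.

```python
def restrict_to_two_groups(groups, values, exclude):
    """Drop observations whose group code equals `exclude` (like `if a5!=3`),
    then check that exactly two groups remain, as a two-sample test requires."""
    kept = [(g, v) for g, v in zip(groups, values) if g != exclude]
    levels = sorted({g for g, _ in kept})
    if len(levels) != 2:
        raise ValueError(f"{len(levels)} groups found, only 2 allowed")
    return kept, levels

marital = [1, 2, 2, 3, 1, 2]   # 1 = never married, 2 = married, 3 = other
hours = [3, 20, 15, 8, 5, 25]  # hypothetical weekly housework hours
kept, levels = restrict_to_two_groups(marital, hours, exclude=3)
print(levels, len(kept))  # [1, 2] 5
```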
Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests
Equal variances:
. ttest j2 if a5!=3, by(a5) level(99)
Output, Two-sample t test with equal variances: the same layout as before for the groups 未婚 (never married), 已婚 (married), combined and diff, where diff = mean(未婚) - mean(已婚); t and degrees of freedom for Ho: diff = 0; p-values for Ha: diff < 0, diff != 0 and diff > 0.
Unequal variances:
. ttest j2 if a5!=3, by(a5) unequal welch level(99)
Output, Two-sample t test with unequal variances: the same layout, diff = mean(未婚) - mean(已婚), with Welch's degrees of freedom and p-values for Ha: diff < 0, diff != 0 and diff > 0.
Variance-ratio test: we cannot reject the null hypothesis of equal variances.
. sdtest j2 if a5!=3, by(a5) level(99)
Output, Variance ratio test: Obs, Mean, Std. Err., Std. Dev. and the 99% confidence interval for 未婚, 已婚 and combined; ratio = sd(未婚) / sd(已婚), Ho: ratio = 1, degrees of freedom = 305, 1530; p-values for Ha: ratio < 1, ratio != 1 and ratio > 1.
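Comparisons within subgroups: do married men and women differ, and do unmarried men and women differ? Is marriage disadvantageous to women, at least in terms of time spent on housework?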
Equal variances: Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group mean comparison tests
By-group comparison, equal variances:
. by a5, sort : ttest j2 if a5!=3, by(a1) level(99)
-> a5 = 未婚 (never married). Output, Two-sample t test with equal variances: the usual table for 男, 女, combined and diff, with diff = mean(男) - mean(女), t, degrees of freedom and the three p-values.
By-group comparison, equal variances, continued: -> a5 = 已婚 (married). Output, Two-sample t test with equal variances: the same table, diff = mean(男) - mean(女), t, degrees of freedom and the three p-values.
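The by a5, sort : prefix repeats the same two-group comparison within each marital-status group. The core of that logic, reduced to the difference in means between men and women per stratum, can be sketched in Python on hypothetical data (mean_diff_by is a hypothetical helper; it does not compute the t test itself).

```python
from collections import defaultdict
from statistics import mean

def mean_diff_by(strata, groups, values, skip=()):
    """For each stratum (like Stata's `by a5, sort:` prefix), return
    mean(values | group 1) - mean(values | group 2), ignoring strata in `skip`
    and missing (None) values."""
    cells = defaultdict(list)
    for s, g, v in zip(strata, groups, values):
        if s not in skip and v is not None:
            cells[(s, g)].append(v)
    strata_seen = sorted({s for s, _ in cells})
    return {s: mean(cells[(s, 1)]) - mean(cells[(s, 2)]) for s in strata_seen}

marital = [1, 1, 1, 1, 2, 2, 2, 2, 3]  # 1 = never married, 2 = married, 3 = other
sex = [1, 1, 2, 2, 1, 1, 2, 2, 1]      # 1 = male, 2 = female
hours = [2, 4, 5, 7, 3, 5, 20, 30, 9]  # hypothetical weekly housework hours
print(mean_diff_by(marital, sex, hours, skip=(3,)))  # {1: -3, 2: -21}
```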
By-group comparison, unequal variances:
. by a5, sort : ttest j2 if a5!=3, by(a1) unequal welch level(99)
-> a5 = 未婚 (never married). Output, Two-sample t test with unequal variances: the same table, diff = mean(男) - mean(女), with Welch's degrees of freedom and the three p-values.
By-group comparison, unequal variances, continued: -> a5 = 已婚 (married). Output, Two-sample t test with unequal variances: the same table, diff = mean(男) - mean(女), with Welch's degrees of freedom and the three p-values.
By-group variance-ratio tests:
. by a5, sort : sdtest j2 if a5!=3, by(a1) level(99)
-> a5 = 未婚 (never married). Output, Variance ratio test: ratio = sd(男) / sd(女), Ho: ratio = 1, degrees of freedom = 176, 128; p-values Pr(F < f) for Ha: ratio < 1, 2*Pr(F > f) for Ha: ratio != 1, and Pr(F > f) for Ha: ratio > 1.
By-group variance-ratio tests, continued: -> a5 = 已婚 (married). Output, Variance ratio test: ratio = sd(男) / sd(女), Ho: ratio = 1, degrees of freedom = 783, 746; p-values Pr(F < f) for Ha: ratio < 1, 2*Pr(F < f) for Ha: ratio != 1, and Pr(F > f) for Ha: ratio > 1.
Comparing the groups with box plots.
Do single men and married men differ? Do single women and married women differ?
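Paired samples: is marriage bad for women? In the previous analysis we compared the housework time of married and unmarried respondents and used the difference to infer what might change from before to after marriage.
But the "before marriage" and "after marriage" groups are independent samples made up of different respondents. If whether one marries is related to certain personality traits, we cannot tell whether marriage itself changes behavior or whether people with certain behavioral tendencies are more likely to choose marriage. In other words, our analysis may conceal a self-selection problem.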
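Paired samples: to show that the effect of marriage on housework time does not come from self-selection, a better sample is longitudinal data, which follow the same respondents and track how their behavior changes from before to after marriage. Collecting such data, however, takes a great deal of time and effort.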
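Paired samples: do husbands and wives differ significantly in the time they spend on housework? We can analyze this in two ways.
First, treat married men and married women as two independent samples and ask whether the mean for all husbands differs from the mean for all wives.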
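Paired samples: but husbands' and wives' housework time are not independent events; if the husband takes on more, the wife can naturally do less.
So we should instead compare husband and wife within the same household, rather than the mean of all husbands with the mean of all wives.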
Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Mean comparison tests, paired data. The difference is computed as the first variable minus the second.
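A paired t test is a one-sample t test on the within-pair differences: t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. In Stata the command form is ttest j2 == j3. A Python sketch on hypothetical couples (paired_t is our own name):

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t test of mean(first - second) = 0, as in Stata's
    `ttest j2 == j3`: a one-sample t test on the within-pair differences.
    Pairs with a missing member are dropped. Returns (mean diff, t, df)."""
    d = [a - b for a, b in zip(first, second) if a is not None and b is not None]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return mean(d), t, n - 1

own = [5, 10, 7, None, 12]     # j2: respondent's weekly hours (hypothetical)
spouse = [8, 14, 9, 20, 18]    # j3: spouse's weekly hours (hypothetical)
print(paired_t(own, spouse))
```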
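Division of housework between spouses. Paired t test
Output: Obs, Mean, Std. Err., Std. Dev. and the 99% confidence interval for j2, j3 and diff; mean(diff) = mean(j2 - j3), the t statistic and degrees of freedom for Ho: mean(diff) = 0, and the p-values Pr(T < t), Pr(|T| > |t|) and Pr(T > t) for Ha: mean(diff) < 0, != 0 and > 0.
Annotations: the test subtracts one spouse's hours from the other's, but is it wife minus husband or husband minus wife? We only learn that spouses differ: respondents report doing less than their spouses, and the difference is significant.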
Paired samples: to compare whether husbands or wives do more housework, how should we analyze the data? Treat male and female respondents separately.
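Generate a new variable and define how it is computed:
generate h_work = j3 - j2
replace h_work = j2 - j3 if a1==2
For a male respondent, j2 is the husband's hours and j3 the wife's; for a female respondent it is the other way round. So in both cases h_work is the wife's weekly housework hours minus the husband's.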
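The generate/replace pair can be checked with a small Python sketch on hypothetical respondents (wife_minus_husband is a made-up helper). It reproduces the rule that h_work is always the wife's hours minus the husband's, and leaves h_work missing whenever j2 or j3 is missing.

```python
def wife_minus_husband(sex, own, spouse):
    """Rebuild h_work from the generate/replace pair: for a male respondent
    (sex 1) own = husband and spouse = wife, so take spouse - own; for a female
    respondent (sex 2) take own - spouse. Either way h_work = wife - husband."""
    h_work = []
    for s, a, b in zip(sex, own, spouse):
        if a is None or b is None:
            h_work.append(None)   # Stata would give a missing value here too
        elif s == 2:
            h_work.append(a - b)  # replace h_work = j2 - j3 if a1==2
        else:
            h_work.append(b - a)  # generate h_work = j3 - j2
    return h_work

sex = [1, 2, 1, 2]             # a1: 1 = male, 2 = female
own = [5, 30, None, 25]        # j2 (hypothetical)
spouse = [20, 4, 10, None]     # j3 (hypothetical)
print(wife_minus_husband(sex, own, spouse))  # [15, 26, None, None]
```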
One-sample mean comparison test
Statistics > Summaries, tables, & tests > Classical tests of hypotheses > One-sample mean comparison test
The burden on married women
. ttest h_work == 0, level(99)
Output, One-sample t test: Obs, Mean, Std. Err., Std. Dev. and the 99% confidence interval for h_work; mean = mean(h_work), the t statistic and degrees of freedom for Ho: mean = 0, and the p-values Pr(T < t), Pr(|T| > |t|) and Pr(T > t) for Ha: mean < 0, != 0 and > 0.
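Comparing a qualitative variable (proportions). K1 asks: "If a mother works outside the home, it is bad for children who are not yet in school."
Codes: 1 strongly agree, 2 agree, 3 disagree, 4 strongly disagree, 5 no opinion, 6 don't know, 7 did not understand the question, 9 refused, 0 no answer. recode k1 (1 2=1)(3 4=0)(else=.) turns this dependent variable into one with only two values, 1 (agree) and 0 (disagree).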
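The recode rule maps each survey code to 1, 0 or missing. A Python sketch with a hypothetical helper recode_k1, using None for Stata's missing value:

```python
def recode_k1(code):
    """Mimic `recode k1 (1 2=1)(3 4=0)(else=.)`: agree -> 1, disagree -> 0,
    everything else (no opinion, don't know, refused, no answer) -> None."""
    if code in (1, 2):
        return 1
    if code in (3, 4):
        return 0
    return None

codes = [1, 2, 3, 4, 5, 6, 7, 9, 0]
print([recode_k1(c) for c in codes])  # [1, 1, 0, 0, None, None, None, None, None]
```

The mean of the recoded variable within a group is then the proportion who agree, which is what the proportion test compares.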
Statistics > Summaries, tables, & tests > Classical tests of hypotheses > Group proportion test
Two-sample test of proportion
. prtest k1, by(a1) level(99)
Output: Number of obs for 男 (male) = 935 and for 女 (female); Mean, Std. Err., z, P>|z| and the 99% confidence interval for 男, 女 and diff, with the standard error of diff also shown under Ho; diff = prop(男) - prop(女), the z statistic for Ho: diff = 0, and the p-values Pr(Z < z) for Ha: diff < 0, Pr(|Z| > |z|) for Ha: diff != 0, and Pr(Z > z) for Ha: diff > 0.
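The two-sample proportion test compares the difference in proportions with its standard error. As we understand prtest, the z statistic uses the pooled proportion under Ho, while the confidence interval uses the unpooled standard error. A Python sketch with hypothetical counts (two_prop_ztest is our own name); x1 and x2 are the numbers of respondents coded 1 in each group.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2, level=0.99):
    """Two-sample test of proportions, similar to `prtest y, by(g)`.
    z uses the pooled proportion under H0: p1 = p2; the confidence interval
    uses the unpooled standard error. x1, x2 are counts of 1s."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    p = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    z = diff / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    zc = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 2.576 for 99%
    return diff, z, p_two_sided, (diff - zc * se, diff + zc * se)

print(two_prop_ztest(60, 100, 40, 100))  # hypothetical counts
```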