Part IB. Descriptive Statistics: Multivariate Statistics. Focus: Multiple Regression. Spring 2007.

Presentation transcript:

1 Part IB. Descriptive Statistics: Multivariate Statistics. Focus: Multiple regression. Spring 2007.

2 Regression Analysis. Y = f(X): Y is a function of X. Regression analysis is a method of determining the specific function relating Y to X. Linear regression is a popular model in social science. A brief review is offered here; see the ppt files on the course website.

3 Example: summarize the relationship with a straight line

4 Draw a straight line, but how?

5 Notice that some predictions are not completely accurate.

6 How to draw the line? Purpose: draw the regression line that gives the most accurate predictions of y given x. Criterion for "accurate": sum of (observed y - predicted y)^2 = sum of (prediction errors)^2, that is, the sum of squared differences between the observed and predicted values. This is called the sum of squared errors, or sum of squared residuals (SSE).
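The SSE criterion can be sketched in a few lines of Python. The data and the two candidate lines are made up for illustration and do not come from the slides; the function name `sse` is likewise only a label chosen here.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared errors for the candidate line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_line_a = sse(xs, ys, intercept=2.2, slope=0.6)  # candidate line A
sse_line_b = sse(xs, ys, intercept=0.0, slope=1.0)  # candidate line B

# The line with the smaller SSE predicts these points more accurately.
print(sse_line_a, sse_line_b)
```

On this toy data, line A has the smaller SSE, so by the criterion it is the more accurate of the two candidates.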

7 Ordinary Least Squares (OLS) Regression. The regression line is drawn so as to minimize the sum of the squared vertical distances from the points to the line, that is, to make SSE as small as possible. This line minimizes squared prediction error and passes through the middle of the point cloud, which makes it a sensible choice for describing the relationship.

8 To describe a regression line (equation): algebraically, a line is described by its intercept and slope. Notation: y = the dependent variable; x = the independent variable; y_hat = predicted y, based on the regression line; β = slope of the regression line; α = intercept of the regression line.

9 The meaning of slope and intercept: slope = change in y_hat for a 1-unit change in x; intercept = value of y_hat when x is 0. When interpreting the intercept and slope, pay attention to the units of x and y.

10 General equation of a regression line: y_hat = α + βx, where α and β are chosen to minimize the sum of (observed y - predicted y)^2. The formulas for the α and β that minimize this sum are built into statistical programs and calculators.
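For reference, the usual closed-form least-squares solution for one predictor is β = Σ(x - x_bar)(y - y_bar) / Σ(x - x_bar)^2 and α = y_bar - β·x_bar. The slides do not print these formulas, so the sketch below is the standard textbook version rather than anything specific to this course; the function name `ols_fit` and the data are hypothetical.

```python
def ols_fit(xs, ys):
    """Return (alpha, beta): the intercept and slope that minimize SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares of x about their means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx            # slope
    alpha = y_bar - beta * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return alpha, beta


# Hypothetical toy data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

alpha, beta = ols_fit(xs, ys)
print(alpha, beta)
```

A statistical package would normally be used in practice; the point here is only that α and β follow directly from the sample means and sums of squares.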

11 An example of a regression line

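12 Fit: how much can regression explain? (How much of the variation in y does the regression account for?) Look at the regression equation again: y_hat = α + βx, and y = α + βx + ε. Data = what we explain + what we don't explain; Data = predicted + residual. That is, the data split into a part we can explain (the prediction) and a part we cannot (the error).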

13 In regression, we can think of "fit" this way: total variation = sum of squares of y (about its mean); explained variation = the part of the total variation explained by our predictions; unexplained variation = sum of squares of residuals. R^2 = (explained variation) / (total variation) is the coefficient of determination: the share of the total variation in y that the regression explains.
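The decomposition can be checked numerically. This is a minimal sketch on made-up data, with sums of squares taken about the mean of y; the function name `variation_decomposition` is hypothetical.

```python
def variation_decomposition(xs, ys):
    """Return (total, explained, unexplained, r_squared) for a simple OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    predictions = [intercept + slope * x for x in xs]

    total = sum((y - y_bar) ** 2 for y in ys)                         # variation in y
    explained = sum((p - y_bar) ** 2 for p in predictions)            # due to the line
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # residuals
    return total, explained, unexplained, explained / total


# Hypothetical toy data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

total, explained, unexplained, r_squared = variation_decomposition(xs, ys)
print(total, explained, unexplained, r_squared)
```

For an OLS fit with an intercept, explained and unexplained variation add up to the total, so R^2 always lies between 0 and 1.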

14 R^2 = r^2. NOTE: this is a special feature of simple regression (OLS); it does not hold for multiple regression or other regression methods.
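A short sketch of why the identity holds in simple regression, using the deck's α and β for the fitted intercept and slope. The shorthand S_xy = Σ(x_i - x_bar)(y_i - y_bar), S_xx = Σ(x_i - x_bar)^2 and S_yy = Σ(y_i - y_bar)^2 is introduced here and does not appear in the slides.

```latex
\begin{aligned}
\hat{y}_i - \bar{y} &= \beta\,(x_i - \bar{x}), \qquad \beta = \frac{S_{xy}}{S_{xx}},\\
R^2 &= \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
     = \frac{\beta^2 S_{xx}}{S_{yy}}
     = \frac{S_{xy}^2}{S_{xx}\,S_{yy}}
     = r^2 .
\end{aligned}
```

The first line uses α = y_bar - β·x_bar, so the argument depends on there being exactly one predictor and an intercept.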

15 Some cautions about regression and R^2. It's dangerous to use R^2 to judge how "good" a regression is: the "appropriateness" of regression is not a function of R^2. When to use regression? –Not suitable for non-linear shapes (although you can modify non-linear shapes). –Regression is appropriate when r (correlation) is appropriate as a measure.

16 Supplement: Proportional Reduction of Error (PRE). PRE measures compare the errors of predictions made under different prediction rules, contrasting a naïve rule with a sophisticated one. R^2 is a PRE measure: naïve rule = predict y_bar; sophisticated rule = predict y_hat. R^2 measures the reduction in prediction error from using the regression predictions rather than predicting the mean of y.

17 Cautions about correlation and regression. Extrapolation is not appropriate. Regression: pay attention to lurking or omitted variables. –Lurking (omitted) variables influence the relationship between two variables but are not included among the variables studied. –They are a problem in establishing causation. Association does not imply causation. –Association alone is weak evidence about causation. –Experiments with random assignment are the best way to establish causation.

18 Inference for Simple Regression

19 Regression Equation. Equation of a regression line: y_hat = α + βx; y = α + βx + ε. y = dependent variable; x = independent variable; β = slope = predicted change in y for a one-unit change in x; α = intercept = predicted value of y when x is 0; y_hat = predicted value of the dependent variable.

20 Global test (F test): tests whether the regression equation has any explanatory power (H0: β = 0).
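The slide does not give the F formula. Assuming the standard ANOVA form for simple regression, F = (explained variation / 1) / (unexplained variation / (n - 2)), the statistic could be computed as in the sketch below. The data and the function name `f_statistic` are hypothetical.

```python
def f_statistic(xs, ys):
    """Global F statistic for a simple regression with one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    predictions = [intercept + slope * x for x in xs]

    explained = sum((p - y_bar) ** 2 for p in predictions)
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predictions))

    mean_square_regression = explained / 1      # one predictor, so 1 degree of freedom
    mean_square_error = unexplained / (n - 2)   # n - 2 residual degrees of freedom
    return mean_square_regression / mean_square_error


# Hypothetical toy data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

f_value = f_statistic(xs, ys)
print(f_value)
```

The value would then be compared with a critical value of the F distribution with 1 and n - 2 degrees of freedom, taken from a table or a statistics package. In simple regression this F equals the square of the slope's t statistic.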


22 The regression model. Note: the slope and intercept of the regression line are statistics (i.e., computed from the sample data). To do inference, we have to think of the sample intercept and slope as estimates of unknown population parameters.

23 Inference for regression. Population regression line: μ_y = α + βx, estimated from the sample by y_hat = a + bx. b is an unbiased estimator of the true slope β, and a is an unbiased estimator of the true intercept α.

24 Sampling distribution of a (intercept) and b (slope): the mean of the sampling distribution of a is α; the mean of the sampling distribution of b is β.

25 Sampling distribution of a (intercept) and b (slope): the mean of the sampling distribution of a is α; the mean of the sampling distribution of b is β. The standard errors of a and b are related to the amount of spread about the regression line (σ). The sampling distributions are normal; with σ estimated from the data, use the t distribution for inference.
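A small simulation can illustrate these statements about the sampling distribution of b. The sketch assumes normally distributed errors with constant σ; the "true" values α = 1, β = 2, σ = 1, the x values, and the function name `simulate_slopes` are all arbitrary choices made here for illustration.

```python
import random
import statistics


def simulate_slopes(alpha, beta, sigma, xs, n_sims, seed=0):
    """Draw n_sims samples from y = alpha + beta*x + error and return the OLS slopes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x_bar = sum(xs) / len(xs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slopes = []
    for _ in range(n_sims):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        y_bar = sum(ys) / len(ys)
        s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        slopes.append(s_xy / s_xx)
    return slopes


xs = list(range(30))  # hypothetical fixed x values
slopes = simulate_slopes(alpha=1.0, beta=2.0, sigma=1.0, xs=xs, n_sims=2000)

mean_b = statistics.mean(slopes)  # should sit close to the true slope beta
sd_b = statistics.stdev(slopes)   # spread of b across repeated samples
print(mean_b, sd_b)
```

Rerunning with a larger σ widens the spread of the simulated slopes, which is the sense in which the standard error of b depends on the spread about the regression line.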

26 The standard error of the least-squares line. Estimate σ, the spread about the regression line, using the residuals from the regression; recall that residual = (y - y_hat). That is, estimate the population standard deviation about the regression line (σ) using the sample estimates.

27 Estimate σ from sample data
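The usual estimator of σ in simple regression divides the sum of squared residuals by n - 2, which matches the n - 2 degrees of freedom used for the t distribution later in the deck. It is given here as the standard textbook form, on the assumption that this is the estimator the slide refers to.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
```

Two degrees of freedom are lost because both the intercept and the slope are estimated from the same data.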

28 Standard Error of Slope (b). The slope b has a sampling distribution whose standard error is SE_b. A small standard error means that b is a precise estimate of β. SE_b is directly related to s, and inversely related to the sample size (n) and to S_x.
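The standard textbook form consistent with this description is SE_b = s / (S_x · sqrt(n - 1)), where s is the residual standard error and S_x is the sample standard deviation of x. The sketch below assumes that form; the data and the function name `slope_standard_error` are hypothetical.

```python
import math


def slope_standard_error(xs, ys):
    """Return (se_b, s, s_x) for a simple OLS fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar

    sum_sq_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sum_sq_residuals / (n - 2))  # residual standard error
    s_x = math.sqrt(s_xx / (n - 1))            # sample standard deviation of x
    se_b = s / (s_x * math.sqrt(n - 1))        # equivalently s / sqrt(s_xx)
    return se_b, s, s_x


# Hypothetical toy data, invented for illustration (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

se_b, s, s_x = slope_standard_error(xs, ys)
print(se_b, s, s_x)
```

Written this way, the three relationships are visible in one line: s is in the numerator, while S_x and n are in the denominator.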

29 Confidence interval for the regression slope. A level C confidence interval for the slope β of the "true" regression line is b ± t* SE_b, where t* is the upper (1-C)/2 critical value from the t distribution with n-2 degrees of freedom. To test the hypothesis H0: β = 0, compute the t statistic t = b / SE_b and refer it to the t distribution with n-2 degrees of freedom.
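The interval and the test statistic are simple arithmetic once b, SE_b and t* are known. In this sketch the values of b and SE_b are invented, and t* is passed in by hand (2.306 is the tabled two-sided 95% critical value for 8 degrees of freedom, i.e. n = 10); the function name `slope_inference` is hypothetical.

```python
def slope_inference(b, se_b, t_star):
    """Return the t statistic for H0: beta = 0 and the confidence interval b ± t* SE_b."""
    t_stat = b / se_b
    margin = t_star * se_b
    return t_stat, (b - margin, b + margin)


# Hypothetical values, invented for illustration (not from the slides).
b = 0.6          # estimated slope
se_b = 0.2       # standard error of the slope
t_star = 2.306   # upper 0.025 critical value of t with n - 2 = 8 degrees of freedom

t_stat, (ci_low, ci_high) = slope_inference(b, se_b, t_star)
reject_h0 = abs(t_stat) > t_star  # two-sided test at the 5% level

print(t_stat, ci_low, ci_high, reject_h0)
```

The two-sided test at level 1 - C rejects H0: β = 0 exactly when the level C confidence interval excludes 0, so the two calculations always agree.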

30 Significance tests for the slope. Test hypotheses about the slope β. Usually H0: β = 0 (no linear relationship between the independent and dependent variables). Alternatives: HA: β > 0, or HA: β < 0, or HA: β ≠ 0.


32 Statistical inference for the intercept. We could also do statistical inference for the regression intercept, α. Possible hypotheses: H0: α = 0 versus HA: α ≠ 0. The t-test is based on a and is very similar to the t-tests we have done before. In most substantive applications we are interested in the slope (β), and not usually in α.

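33 Example: SPSS regression procedures and output. To get a scatterplot: 統計圖 (G) → 散佈圖 (S) → 簡單 → 定義 (Graphs → Scatter → Simple → Define), then select x and y. To get a correlation coefficient: 分析 (A) → 相關 (C) → 雙變量 (Analyze → Correlate → Bivariate). To perform simple regression: 分析 (A) → 迴歸方法 (R) → 線性 (L) (Analyze → Regression → Linear), then select x and y; you can also choose to save the predicted values and residuals.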

34 SPSS Example: Infant mortality vs. Female Literacy, 1995 UN Data

35 Example: correlation between infant mortality and female literacy

36 Regression: infant mortality vs. female literacy, 1995 UN Data

37 Regression: infant mortality vs. female literacy, 1995 UN Data

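38 Hypothesis test example. Dahua (大華) is analyzing generational differences in educational attainment and has collected data on the education levels of 117 father-son pairs. The father's education level is the independent variable and the son's education level is the dependent variable. His regression equation is: y_hat = *x. The standard error of the regression slope is: 1. At α = 0.05, can Dahua conclude that fathers' and sons' education levels are related? 2. For all boys whose fathers are college graduates, what is the predicted mean education level of these boys? 3. A boy's father is a college graduate; what is the predicted future education level of this boy?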