Class 3: Relationship Between Variables
SKEMA Ph.D programme 2010-2011
Lionel Nesta, Observatoire Français des Conjonctures Economiques



Relationship Between Variables
Which variables are we looking at? Qualitative × Quantitative: ANOVA

ANOVA: ANalysis Of VAriance
- ANOVA is a generalization of the Student t test.
- The Student t test applies to two categories only:
  H0: μ1 = μ2
  H1: μ1 ≠ μ2
- ANOVA is a method to test whether the means of k groups are equal or not:
  H0: μ1 = μ2 = μ3 = ... = μk
  H1: At least one mean differs significantly.

ANOVA
The method takes its name from the fact that it is based on measures of variance. The F-statistic is a ratio comparing the variance due to group differences (explained variance) with the variance due to other phenomena (unexplained variance). A higher F means that group membership explains more of the variance, and hence stronger evidence that the group means differ.

Revenues (in millions of US $) by sector
[Table: revenues of five firms in each of Sector 1, Sector 2 and Sector 3. The cell values are missing from the transcript, apart from a single figure, 534.8.]
Do sectors differ significantly in their revenues?
H0: μ1 = μ2 = μ3 = ... = μk
H1: At least one mean differs significantly.

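ANOVA
The total sum of squares splits into a between-group part and a within-group (residual) part, each with its own degrees of freedom (k groups, n observations):

$$SS_{total} = SS_{between} + SS_{within}, \qquad df_{between} = k - 1, \quad df_{within} = n - k, \quad df_{total} = n - 1$$

This decomposition produces Fisher's statistic as follows:

$$F = \frac{SS_{between} / (k - 1)}{SS_{within} / (n - k)}$$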
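The decomposition can be sketched in a few lines of Python. This is a minimal illustration only: the function name `one_way_anova` and the revenue figures are made up for the example and are not the data shown on the slides.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (residual) variation: observations around their group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Hypothetical revenues for three sectors (illustrative numbers only)
sectors = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(sectors)
print(ss_b, ss_w, df_b, df_w, f_stat)  # F is approximately 13 here
```

The sketch stops at the F-statistic; turning it into Prob>F requires the F distribution with (k − 1, n − k) degrees of freedom, which Stata reports directly.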

The ANOVA decomposition on Revenues
[Table: columns Origin of variation, SS, d.f., MSS, F-Stat, Prob>F; rows SS-between, SS-within (residual), SS-total. The values are missing from the transcript.]
The result tells me that I can reject the null hypothesis H0 with a 0.03% chance of being wrong, that is, of rejecting H0 while H0 holds true. I WILL TAKE THE CHANCE!!!

Comparison of Means Using Student t with STATA
We still use the same command, ttest:
    ttest var1, by(varcat)
For example:
    ttest lnassets, by(type)
    ttest lnrd, by(year)
    ttest lnrdi, by(type)
Beware! Unlike ANOVA, the Student t test can only be performed to compare two categories.
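For readers who want to see what the two-group comparison computes, here is a rough Python sketch of the equal-variance (pooled) Student t statistic, which is the default form of a two-group ttest in Stata. The function name `pooled_t` and the two samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample Student t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical values of a variable for two categories
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, df = pooled_t(group_a, group_b)
print(t_stat, df)  # t is about -1.22 with 4 degrees of freedom
```

With more than two categories this statistic is not defined, which is why ANOVA is needed.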

ANOVA under STATA
We still use the same command, anova:
    anova var1 varcat
For example:
    anova lnassets isic
    anova lnrd isic
    anova lnrdi isic
    anova cours titype

STATA Application: ANOVA
[Screenshot: Stata output annotated with the Stata instruction, the sum of squares, the F-stat and the p-value.]

ANOVA Example in a Published Paper

SPSS Application: ANOVA
- Verify with an ANOVA that US companies are larger than those from the rest of the world.
- Are there systematic sectoral differences in terms of labour, R&D and sales?
- Write out H0 and H1 for each variable.
- Analyze → Compare Means → One-Way ANOVA
- What do you conclude at the 5% level?
- What do you conclude at the 1% level?

SPSS Application: t test comparing means

Relationship Between Variables
Which variables are we looking at? Qualitative × Qualitative: Chi-Square Independence Test

Introduction to Chi-Square
This part is devoted to the study of whether two qualitative (categorical) variables are independent:
- H0: Independent: the two qualitative variables do not exhibit any systematic association.
- H1: Dependent: the category of one qualitative variable is associated with the category of another qualitative variable in some systematic way which departs significantly from randomness.

The Four Steps Towards The Test
1. Build the cross tabulation to compute observed joint frequencies.
2. Compute expected joint frequencies under the assumption of independence.
3. Compute the Chi-square (χ²) distance between observed and expected joint frequencies.
4. Compute the significance of the χ² distance and conclude on H0 and H1.

1. Cross Tabulation
- A cross tabulation displays the joint distribution of two or more variables. Cross tabulations are usually referred to as contingency tables.
- A contingency table describes the distribution of two (or more) variables simultaneously. Each cell shows the number of respondents that gave a specific combination of responses, that is, each cell contains a single cross tabulation.

1. Cross Tabulation
We have data on two qualitative, categorical dimensions and we wish to know whether they are related:
- Region (AM, ASIA, EUR)
- Type of company (DBF, LDF)


1. Cross Tabulation
Crossing Region (AM, ASIA, EUR) × Type of company (DBF, LDF):
    tabulate continent type

2. Expected Joint Frequencies
To say something about the relationship between two categorical variables, we compute the expected (also called theoretical) frequencies under the assumption that the two variables are independent.
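Under independence, the expected count of a cell is its row total times its column total, divided by the grand total. A small Python sketch of that rule follows; the function name `expected_frequencies` and the counts are hypothetical, not the values from the course data set.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Hypothetical counts: rows are regions (AM, ASIA, EUR), columns are types (DBF, LDF)
observed = [[30, 10],
            [5, 15],
            [15, 25]]
expected = expected_frequencies(observed)
print(expected)  # [[20.0, 20.0], [10.0, 10.0], [20.0, 20.0]]
```

The row and column totals of the expected table match those of the observed table; only the allocation across cells changes.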

2. Expected Joint Frequencies
    tabulate continent type, expected

3. Computing the χ² statistic
We can now compare what we observe with what we should observe if the two variables were independent. The larger the difference, the less independent the two variables. This difference is termed a Chi-Square distance. With a contingency table of n rows and m columns, the statistic follows a χ² distribution with (n − 1) × (m − 1) degrees of freedom, provided that the lowest expected frequency is at least 5.
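The distance is the sum, over all cells, of (observed − expected)² / expected. The sketch below computes it in Python on hypothetical counts; `chi_square` is an illustrative name. The p-value line relies on a special case: for exactly 2 degrees of freedom the upper-tail probability of a χ² variable is exp(−χ²/2). Other degrees of freedom need a χ² table or statistical software.

```python
from math import exp

def chi_square(observed):
    """Return (statistic, degrees of freedom, smallest expected count)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp_count = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, exp_count)
            stat += (obs - exp_count) ** 2 / exp_count
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, min_expected

# Hypothetical 3 x 2 table: regions (AM, ASIA, EUR) by type (DBF, LDF)
observed = [[30, 10], [5, 15], [15, 25]]
stat, df, min_expected = chi_square(observed)
# Closed form valid only when df == 2 (assumption stated in the text above)
p_value = exp(-stat / 2) if df == 2 else None
print(stat, df, min_expected, p_value)  # statistic 17.5 with 2 degrees of freedom
```

Returning the smallest expected count makes it easy to check the "at least 5" rule before trusting the result.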

3. Computing the χ² statistic
    tabulate continent type, expected chi2

4. Conclusion on H0 versus H1
We reject H0 with a reported 0.00% chance of being wrong. I will take the chance, and I tentatively conclude that the type of company and the regional origin are not independent. Using our background knowledge of biotechnology, this makes sense: biotechnology was born in the USA, with European companies following, while Asian (i.e. Japanese) companies are mainly large pharmaceutical companies. Most DBFs are found in the US, then in Europe. This is less true now.

2. SPSS: Expected Joint Frequencies
Analyze → Descriptive Statistics → Crosstabs → Cells → Observed & Expected

3. SPSS: Computing the χ² statistic
Analyze → Descriptive Statistics → Crosstabs → Statistics → Chi-square

Relationship Between Variables
Which variables are we looking at? Quantitative × Quantitative: Correlations

Introduction to Correlations
This part is devoted to the study of whether, and the extent to which, two or more quantitative variables are related:
- Positively correlated: the values of one continuous variable "varying somewhat in step" with the values of another variable.
- Negatively correlated: the values of one continuous variable "varying somewhat in opposite step" with the values of another variable.
- Not correlated: the values of one continuous variable "varying randomly" when the values of another variable vary.

Scatter Plot of R&D and Patents (log)

Pearson's Linear Correlation Coefficient r
- The Pearson product-moment correlation coefficient is a measure of the co-relation between two variables x and y.
- Pearson's r reflects the intensity of the linear relationship between two variables. It ranges from +1 to -1.
- r near 1: Positive correlation
- r near -1: Negative correlation
- r near 0: No or poor correlation

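Pearson's Linear Correlation Coefficient r

$$r = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

- Cov(x,y): covariance between x and y
- σx and σy: standard deviation of x and standard deviation of y
- n: number of observations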
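As a rough illustration, Pearson's r can be computed directly from deviations around the means. The Python sketch below uses made-up data, and `pearson_r` is an illustrative name; the common 1/n (or 1/(n − 1)) factor cancels between the covariance and the standard deviations, so it is omitted.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # about 0.775
```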

corr lpat lassets lrd lrdi lpat_assets

Pearson's Linear Correlation Coefficient r
Is r significantly different from 0?
- H0: r(x,y) = 0
- H1: r(x,y) ≠ 0
- Test statistic:

$$t^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

- If |t*| > tα, the critical value of Student's t with (n − 2) degrees of freedom and critical probability α (5%), we reject H0 and conclude that r is significantly different from 0.
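A short Python sketch of this test, under stated assumptions: the values r = 0.5 and n = 27 are invented for the example, `correlation_t` is an illustrative name, and the critical value 2.060 is the two-sided 5% value for 25 degrees of freedom taken from a standard Student t table.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical sample: r = 0.5 measured on n = 27 observations
r, n = 0.5, 27
t_star = correlation_t(r, n)

# Two-sided 5% critical value for 25 degrees of freedom (from a t table)
T_CRITICAL_25_DF = 2.060
reject_h0 = abs(t_star) > T_CRITICAL_25_DF
print(round(t_star, 3), reject_h0)  # about 2.887, so H0 is rejected at 5%
```

The same r on a much smaller sample would give a smaller t*, which is why significance must be read together with n.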

pwcorr lpat lassets lrd lrdi lpat_assets, sig

Pearson's Linear Correlation Coefficient r
Assumptions of Pearson's r:
- There is a linear relationship between x and y.
- Both x and y are continuous random variables.
- Both variables are normally distributed.
- Equal differences between measurements represent equivalent intervals.
We may want to relax (one of) these assumptions.

Spearman's Rank Correlation Coefficient ρ
- Spearman's rank correlation is a non-parametric measure of the intensity of a correlation between two variables, without making any assumptions about the distribution of the variables, i.e. about the linearity, normality or scale of the relationship.
- ρ near 1: Positive correlation
- ρ near -1: Negative correlation
- ρ near 0: No or poor correlation

Spearman's Rank Correlation Coefficient ρ

$$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$$

- d: the difference between the ranks of paired values of x and y
- n: number of observations
- ρ is simply a special case of the Pearson product-moment coefficient in which the data are converted to ranks before calculating the coefficient. The shortcut formula above is exact when there are no tied ranks.
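The rank-based calculation can be sketched as follows. This is a simplified Python illustration that assumes no tied values (ties would require average ranks); the names `ranks` and `spearman_rho` and the data are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical paired observations
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(round(rho, 3))  # about 0.8

# A monotone but non-linear relationship still gives rho = 1
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
```

The second call illustrates why ρ is useful when linearity is in doubt: only the ordering of the values matters.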

spearman lpat lassets lrd lrdi lpat_assets

Spearman's Rank Correlation Coefficient ρ
    spearman lpat lassets lrd lrdi lpat_assets, stats(rho p)

Pearson's r or Spearman's ρ?
- Relationship between tastes and levels of consumption on a large sample? (ρ)
- Relationship between income and consumption on a large sample? (r)
- Relationship between income and consumption on a small sample? Both (ρ) and (r)

SPSS: Pearson's Linear Correlation Coefficient r
Analyze → Correlate → Bivariate → click on "Pearson"

SPSS: Spearman's Rank Correlation Coefficient ρ
Analyze → Correlate → Bivariate → click on "Spearman"