Basic Statistics – 1
POGO8096/8196: Research Methods, Semester 1, 2009
Crawford School of Economics and Government, 19 May


This week
- Introduction
- Data and variables
- Statistics and statistical analysis
- Univariate analysis
- Bivariate analysis
- Relationships between variables
- Regression analysis
- Correlational analysis

Data and variables – 1
- Data are observed numerical facts collected for analysis. Common types:
  - Survey data
  - Time-series data
  - Cross-section data
- Q. What is the unit of analysis (i.e., what are the observations)?
- A variable is an empirical property that can take on two or more different values.

Data and variables – 2
- Levels of measurement (review)
  - Nominal variable (categorical)
  - Ordinal variable (categorical)
  - Interval variable (continuous)
- Dichotomous variable (or “dummy variable”)
  - A variable that has two, and only two, possible values or categories.
  - e.g., {voted, abstained}, {male, female}, {yes, no}.

Data and variables – 3
- Most questions in a survey are nominal, ordinal or dichotomous.
- Interval variables are common in time-series and cross-section data.
- Dichotomous variables can be used to measure institutional differences in cross-section data and structural changes in time-series data.
  - e.g., in cross-national data: 0 if democracy, 1 otherwise.
  - e.g., in yearly data: 0 if before 1995, 1 if 1995 onward.
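The two dummy codings above can be sketched in a few lines. This is a minimal illustration that assumes hypothetical country and year records. The function names `institutional_dummy` and `structural_break_dummy` are made up for this example. The coding rules themselves (0 if democracy, 1 otherwise; 0 before 1995, 1 from 1995 onward) are the ones given on the slide.

```python
# Hypothetical cross-national and yearly records (illustration only).
countries = [
    {"name": "A", "democracy": True},
    {"name": "B", "democracy": False},
    {"name": "C", "democracy": True},
]
years = [1992, 1993, 1994, 1995, 1996, 1997]


def institutional_dummy(is_democracy: bool) -> int:
    """Cross-section coding from the slide: 0 if democracy, 1 otherwise."""
    return 0 if is_democracy else 1


def structural_break_dummy(year: int, break_year: int = 1995) -> int:
    """Time-series coding from the slide: 0 before break_year, 1 from break_year onward."""
    return 1 if year >= break_year else 0


regime_dummy = [institutional_dummy(c["democracy"]) for c in countries]
period_dummy = [structural_break_dummy(y) for y in years]

print(regime_dummy)  # [0, 1, 0]
print(period_dummy)  # [0, 0, 0, 1, 1, 1]
```

Because a dummy takes only the values 0 and 1, it can enter a regression alongside interval variables. Its coefficient then reads as the difference between the two groups or periods.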

Statistics
- A statistic is a numerical summary of data.
- Univariate statistics
  - Numerical summaries of a single variable.
  - e.g., the “proportion” of respondents in a survey who support a proposed policy change.
- Bivariate/multivariate statistics
  - Numerical summaries of relationships between variables.
  - e.g., the “correlation” between inequality and growth.

Statistical analysis
Statistical analysis includes two main activities:
- Statistical measurement: measuring statistics (the plural of statistic), including measuring relationships between variables.
- Statistical inference: estimating how likely it is that a particular result (e.g., a correlation between variables) could be due to chance.

Univariate statistics – 1
- Measures of central tendency
  - Mean
  - Median (the middle value)
  - Mode (the most frequently occurring value)
- Measures of dispersion
  - Range (the distance from the lowest to the highest value)
  - Concentration (the relative frequency with which a score occurs)
  - Standard deviation
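The measures listed above can be computed with Python's standard `statistics` module. This is a small sketch on a hypothetical set of interval scores. "Concentration" is computed here as the relative frequency of each score, following the slide's definition. The sample standard deviation uses the n − 1 denominator.

```python
from collections import Counter
import statistics

# Hypothetical scores on an interval variable (illustration only).
scores = [2, 3, 3, 4, 5, 5, 5, 9]

mean = statistics.mean(scores)            # arithmetic mean
median = statistics.median(scores)        # middle value (average of the two middle values here)
mode = statistics.mode(scores)            # most frequently occurring value
value_range = max(scores) - min(scores)   # distance from the lowest to the highest value
std_dev = statistics.stdev(scores)        # sample standard deviation (n - 1 denominator)

# Concentration: relative frequency with which each score occurs.
counts = Counter(scores)
concentration = {value: n / len(scores) for value, n in sorted(counts.items())}

print(mean, median, mode, value_range)  # 4.5 4.5 5 7
print(concentration)                    # {2: 0.125, 3: 0.25, 4: 0.125, 5: 0.375, 9: 0.125}
```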

Univariate statistics – 2
Which measures do we (usually) use for each type of variable? Check each measure (mean, median, mode, range, concentration, standard deviation) against each level of measurement (nominal, ordinal, interval, dichotomous). For a dichotomous variable the mean is a proportion. Check both if two different measures are available.
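It helps to see why the mean of a dichotomous variable is listed as a "proportion". When the categories are coded 0 and 1, the mean equals the share of cases coded 1. This is a minimal sketch with hypothetical survey answers. It also shows a standard identity: the population standard deviation of a 0/1 variable is determined entirely by that proportion p, as √(p(1 − p)).

```python
import statistics

# Hypothetical survey answers coded as a dummy: 1 = supports the policy, 0 = does not.
supports_policy = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

proportion = statistics.mean(supports_policy)                     # mean of a 0/1 variable
share_of_ones = supports_policy.count(1) / len(supports_policy)  # the same number, counted directly

# Population standard deviation of a 0/1 variable equals sqrt(p * (1 - p)).
sd = statistics.pstdev(supports_policy)

print(proportion, share_of_ones)  # 0.6 0.6
```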

Example – Education
[Figure: distribution of education level among Japanese voters, observations = 1,307; categories include 2 (Primary), 3 (Secondary) and 4 (University)]

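Example – Population
[Figure: distribution of population across the 50 US states]
- Observations = 50 US states
- Mean = 5.5, Median = 4.0, Minimum = 0.5, Maximum = 33.0, Std. dev. = 6.0
- The mean lies above the median and the maximum lies far above both, so the distribution is skewed to the right: a few very populous states pull the mean up.
- Q. Other examples of skewed variables?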
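The direction of skew can be quantified from the reported summary statistics alone. The sketch below uses Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation. The slides do not use this coefficient; it is a swapped-in, commonly used rule of thumb. The inputs are the figures reported for the 50 US states.

```python
def median_skewness(mean: float, median: float, std_dev: float) -> float:
    """Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd.
    Positive values indicate a right (upper) tail; negative values a left tail."""
    return 3 * (mean - median) / std_dev


# Summary statistics reported on the slide for the 50 US states.
skew = median_skewness(mean=5.5, median=4.0, std_dev=6.0)
print(skew)  # 0.75
```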

Bivariate relationships
- A variable is related or unrelated to another.
- A variable is positively or negatively related to another.
- A variable is strongly or weakly related to another.
- A variable has a large or small effect on another.
- A variable is significantly or insignificantly related to another [“statistical inference”].

Related or unrelated?

Positively or negatively related?

Strongly or weakly related?

Large or small effect?

The level of measurement matters
Depending on the level of measurement:
- You cannot measure whether the relationship between variables is positive or negative if one of the variables is nominal.
- You can measure whether a variable has a large or small effect on another only if both variables are interval.
- You can always measure whether variables are strongly or weakly related, regardless of their levels of measurement.

Bivariate analysis with categorical variables
- A visual presentation
  - Data on two nominal or ordinal variables are customarily presented in a “cross tabulation” or “contingency table”.
- Bivariate statistics for categorical variables?
  - There are several, such as Lambda, Gamma, Phi and Tau-b. None of them is entirely satisfactory; each has its drawbacks.
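A cross tabulation counts the cases that fall into each combination of categories. The sketch below builds one from raw paired observations. It assumes hypothetical (education, income) pairs and a made-up helper name, `cross_tab`. Rows are education levels and columns are income levels.

```python
from collections import Counter

# Hypothetical paired observations (education, income), illustration only.
observations = [
    ("Low", "Low"), ("Low", "Low"), ("Low", "Middle"),
    ("Middle", "Middle"), ("Middle", "Low"), ("Middle", "High"),
    ("High", "High"), ("High", "High"), ("High", "Middle"),
]
categories = ["Low", "Middle", "High"]


def cross_tab(pairs, row_levels, col_levels):
    """Return a nested list where counts[i][j] is the number of cases in row i, column j."""
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = cross_tab(observations, categories, categories)
print("Education \\ Income:", categories)
for level, row in zip(categories, table):
    print(f"{level:>6}", row)
```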

Cross tabulation – 1
[Table: cross tabulation of Education by Income, each with categories Low, Middle and High]
- Numbers in cells are the numbers of observations.
- There is a positive correlation between the two variables, but you cannot say how much change in one variable is produced by a change in the other.
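For two ordinal variables like these, Gamma (one of the measures named earlier) summarises the direction and strength of association. Take any pair of cases that differ on both variables. The pair is concordant if the case ranked higher on one variable is also higher on the other, and discordant otherwise. Gamma = (C − D) / (C + D). This is a minimal sketch of Goodman and Kruskal's gamma on a hypothetical 3 × 3 table ordered Low → High. Consistent with the slide, the result says nothing about how much income changes per step in education.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns are both
    ordered from low to high. Pairs tied on either variable are ignored."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cases ranked higher on the row variable
                for m in range(n_cols):
                    if m > j:                   # also higher on the column variable
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower on the column variable
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows = education (Low, Middle, High), columns = income (Low, Middle, High).
table = [
    [2, 1, 0],
    [1, 1, 1],
    [0, 1, 2],
]
print(goodman_kruskal_gamma(table))  # 0.8
```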

Cross tabulation – 2
[Table: cross tabulation of candidate voted for (Mr. A, Mr. B, Mr. C) by party support (Labor, Liberal, Others)]
- There is a correlation between the two variables, but you can say neither whether the correlation is positive or negative, nor how much change in one variable is produced by a change in the other.

Bivariate analysis with interval variables
- A visual presentation
  - A “scattergram” or “scatterplot”.
  - The horizontal axis is used for the independent variable (X) and the vertical axis for the dependent variable (Y).
- Bivariate statistics for interval variables
  - The “effect-descriptive” characteristic of a scattergram is the “regression coefficient.”
  - The “correlational” characteristic of a scattergram is the “correlation coefficient.”

Regression analysis – 1
- Find the single line that best approximates the pattern of the dots in a scattergram.
- The standard method, ordinary least squares (OLS), chooses the line that minimizes the sum of squared differences between the observed values of the dependent variable and its predicted values.
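For one independent variable, the OLS line has a closed form: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and a = ȳ − b·x̄. This is a minimal sketch on hypothetical data; the helper names are illustrative. The last lines check the defining property: nudging either coefficient away from the OLS values increases the sum of squared errors.

```python
def ols_fit(x, y):
    """Return (a, b) for y-hat = a + b*x that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope: the regression coefficient
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared differences between observed and predicted y."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b = ols_fit(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6

best = sum_squared_errors(x, y, a, b)
print(best < sum_squared_errors(x, y, a, b + 0.1))  # True
print(best < sum_squared_errors(x, y, a + 0.1, b))  # True
```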

Regression analysis – 2
The regression equation: y = a + bx
- y is the predicted value of the dependent variable.
- x is the value of the independent variable.
- a is the “intercept” of the regression line.
- b is the “slope” of the regression line: the main quantity of interest!

Regression analysis – 3
- The slope, often simply called the “regression coefficient,” is for most purposes the most valuable part of this equation in empirical research.
- Why? Because it provides a single, precise summary of how great an impact the independent variable has on the dependent variable: a one-unit increase in x is associated with a change of b units in the predicted y.
- Note that researchers must assume the direction of causation.

Regression analysis – 4
- Residuals
  - Some observations lie above or below the predicted values on the regression line.
  - The “residual” = the observed value − the predicted value.
  - Examining the residuals often helps us find other factors affecting the dependent variable. (See Figure 8-8 in Shively, p. 121, for an example.)
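Residuals follow directly from the definition above. This sketch uses hypothetical data together with the OLS line fitted to it, ŷ = 2.2 + 0.6x, with the coefficients typed in. It computes each residual as observed minus predicted. It also confirms that OLS residuals sum to (numerically) zero when the equation has an intercept. Finally, it picks out the case with the largest absolute residual, the kind of case worth examining for other explanatory factors.

```python
# Hypothetical data and the OLS line fitted to it (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]  # observed - predicted

# With an intercept in the equation, OLS residuals sum to (numerically) zero.
print(abs(sum(residuals)) < 1e-9)  # True

# The case furthest from the line is a candidate for closer inspection.
worst = max(range(len(x)), key=lambda i: abs(residuals[i]))
print(worst, round(residuals[worst], 2))  # 2 1.0
```

A negative residual means the line over-predicts that case. A positive residual means it under-predicts.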

An Example – 1
- Arend Lijphart, Patterns of Democracy, Chapter 5 (Party Systems).
- 36 democracies.
- X = the effective number of political parties.
- Y = the number of issue dimensions.
- X is expected to have a positive impact on Y.
- A regression equation: Y = a + bX.
- Estimate a and b using OLS.

An Example – 2
[Figure: scattergram of the 36 democracies with the estimated regression line Y = a + bX; the US is labelled]
- Prediction (e.g., US): X = 2.4; Y (observed) = 1; Y (predicted) = 1.71.
- Residual = 1 − 1.71 = −0.71, i.e., the equation over-predicts for the US.

Correlation analysis – 1
- The “regression coefficient” measures how much difference the independent variable makes to the dependent variable.
- The “correlation coefficient” (or “r”) measures how widely the data spread around the regression line: the more tightly the points cluster around the line, the closer r is to 1 or −1.

Correlation analysis – 2
- A complete lack of (linear) relationship: r = 0
- A perfect negative relationship: r = −1
- A perfect positive relationship: r = 1
- Some positive relationship: 0 < r < 1
- Some negative relationship: −1 < r < 0
- An example (Lijphart): r = 0.84.
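Pearson's r can be computed as the covariation of x and y divided by the product of their spreads: r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). This is a minimal sketch on hypothetical data. It also shows how r differs from the regression coefficient. Rescaling y (for example, changing its units) multiplies the slope by the same factor but leaves r unchanged, because r measures fit around the line and not the size of the effect.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; always lies between -1 and 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def slope(x, y):
    """OLS regression coefficient b for y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]              # hypothetical data
y_rescaled = [10 * v for v in y]  # same data in different units

print(round(pearson_r(x, y), 3), round(slope(x, y), 3))                    # 0.775 0.6
print(round(pearson_r(x, y_rescaled), 3), round(slope(x, y_rescaled), 3))  # 0.775 6.0
```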

An Example – 1
- X1: % of respondents who agree with the US military action in Afghanistan.
- X2: % of respondents who agree that [their country] should take part with the US in military action against Afghanistan.
- X3: % of respondents who think American foreign policy has a positive effect on [their country].
- X4: % of respondents who are worried that the war between the US and its allies against terrorism may grow into a broader war against Islam.

An Example – 2
[Table: correlation matrix of X1, X2, X3 and X4]
Source: Gallup International, End of Year Terrorism Poll. Number of countries included in the sample = 59.
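A correlation matrix like this one is simply Pearson's r computed for every pair of variables. The sketch below builds one from hypothetical country-level percentages for four items labelled X1 to X4. The numbers are invented for illustration and are not the Gallup figures. The diagonal is always 1, since each variable correlates perfectly with itself. The matrix is symmetric, which is why such tables often print only one triangle.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical percentages for five countries (NOT the Gallup data).
data = {
    "X1": [60, 45, 30, 20, 70],
    "X2": [40, 30, 15, 10, 55],
    "X3": [50, 35, 20, 15, 60],
    "X4": [30, 45, 60, 70, 25],
}
names = list(data)
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

print("      " + "  ".join(f"{n:>5}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:>5} " + "  ".join(f"{r:5.2f}" for r in row))
```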

Remarks
- If you are interested in causal relationships between variables, regression analysis is superior to correlation analysis.
- Correlation analysis is often done as a first-cut analysis prior to regression analysis.
- In regression analysis, you need to decide the direction of causation (i.e., the impact of X on Y) and control for the effects of other variables.

Next week
- Statistical inference
- Multivariate analysis
- More topics (if we have time)