Data Analysis.


In most social research, data analysis involves three major steps, done in roughly this order:
1. Cleaning and organizing the data for analysis (Data Preparation)
2. Describing the data (Descriptive Statistics)
3. Testing hypotheses and models (Inferential Statistics)

Data Preparation involves checking or logging the data in; checking the data for accuracy; entering the data into the computer; transforming the data; and developing and documenting a database structure that integrates the various measures.

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is: what the data show.

Inferential statistics investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population thinks. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.

Types of Statistical Analysis
- Univariate statistical analysis: tests of hypotheses involving only one variable; testing of statistical significance.
- Bivariate statistical analysis: tests of hypotheses involving two variables.
- Multivariate statistical analysis: statistical analysis involving three or more variables or sets of variables.

Statistical Analysis: Key Terms
- Hypothesis: an unproven proposition; a supposition that tentatively explains certain facts or phenomena. An assumption about the nature of the world.
- Null hypothesis: no difference in sample and population.
- Alternative hypothesis: a statement that indicates the opposite of the null hypothesis.

Choosing the Appropriate Statistical Technique
Choosing the correct statistical technique requires considering:
- Type of question to be answered
- Number of variables involved
- Level of scale measurement

Univariate analysis
Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:
- the distribution
- the central tendency
- the dispersion
In most situations, we would describe all three of these characteristics for each of the variables in our study.

The Distribution
The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value.

Distributions may also be displayed using percentages. For example, you could use percentages to describe the:
- percentage of people in different income levels
- percentage of people in different age ranges
- percentage of people in different ranges of standardized test scores
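As a minimal sketch of a frequency distribution expressed both as counts and as percentages, the following uses Python's standard library. The age-range labels and the ten responses are hypothetical values chosen only for illustration; they do not come from the slides.

```python
from collections import Counter

# Hypothetical age ranges reported by ten respondents (illustrative only).
age_ranges = ["18-29", "30-44", "30-44", "45-64", "18-29",
              "30-44", "65+", "45-64", "30-44", "18-29"]

# Frequency distribution: each distinct value and how many cases have it.
counts = Counter(age_ranges)
total = sum(counts.values())

# The same distribution expressed as percentages of all cases.
percentages = {value: 100 * n / total for value, n in counts.items()}

for value in sorted(counts):
    print(f"{value:>6}  n={counts[value]}  {percentages[value]:.1f}%")
```

The same pattern applies to income levels or test-score ranges once the raw values have been grouped into categories.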

Central Tendency
The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency:
- Mean
- Median
- Mode

Consider the following 8 scores: 15, 20, 21, 20, 36, 15, 25, 15. The sum of these 8 values is 167, so the mean is 167/8 = 20.875. If we order the 8 scores, we get: 15, 15, 15, 20, 20, 21, 25, 36. There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode.
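The three estimates of central tendency for the eight example scores can be checked with a short sketch. It assumes Python's standard `statistics` module, which is a tool choice made here and not something the slides prescribe.

```python
import statistics

# The eight example scores from the slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # average of the two middle scores once sorted
mode = statistics.mode(scores)      # most frequently occurring value

print(sorted(scores))
print("mean:", mean, "median:", median, "mode:", mode)
```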

Dispersion
Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.

The Standard Deviation is a more accurate and detailed estimate of dispersion because an outlier can greatly exaggerate the range (as was true in this example, where the single outlier value of 36 stands apart from the rest of the values). The Standard Deviation shows the relation that a set of scores has to the mean of the sample.

Deviation of each score from the mean:
15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = +15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875

Summary statistics:
N               8
Mean            20.8750
Median          20.0000
Mode            15.00
Std. Deviation  7.0799
Variance        50.1250
Range           21.00
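The dispersion figures in the summary can be reproduced step by step. This sketch assumes the sample formula (dividing the sum of squared deviations by N - 1), which is the version that matches the reported variance of 50.125.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
mean = statistics.mean(scores)

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Sample variance: sum of squared deviations divided by N - 1.
variance = sum(d ** 2 for d in deviations) / (len(scores) - 1)

# Standard deviation: square root of the variance.
std_dev = variance ** 0.5

print("range:", value_range)
print("variance:", variance)
print("std. deviation:", round(std_dev, 4))
```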

Bivariate analysis
The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem.

Person  Height  Self Esteem
1       68      4.1
2       71      4.6
3       62      3.8
4       75      4.4
5       58      3.2
6       60      3.1
7       67      3.8
8       68      4.1
9       71      4.3
10      69      3.7
11      68      3.5
12      67      3.2
13      63      3.7
14      62      3.3
15      60      3.4
16      63      4.0
17      65      4.1
18      67      3.8
19      63      3.4
20      61      3.6

Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem).

Variable     Mean   StDev   Variance  Sum   Minimum  Maximum  Range
Height       65.4   4.4057  19.4105   1308  58       75       17
Self Esteem  3.755  0.4261  0.18155   75.1  3.1      4.6      1.5

Calculating the Correlation
So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship.
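A minimal sketch of the Pearson correlation calculation follows. The twenty height and self-esteem pairs in the code are an assumption: the data table is incomplete in this transcript, so the values were filled in to agree with the reported sums (1308 and 75.1), variances, minimums, and maximums. The helper name `pearson_r` is also hypothetical.

```python
import math

# Assumed data: twenty (height, self-esteem) pairs chosen to match the
# summary statistics reported in the slides; not a verbatim copy of the table.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(heights, esteem)
print(round(r, 2))
```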

Testing the Significance of a Correlation
Once you've computed a correlation, you can determine the probability that the observed correlation occurred by chance. That is, you can conduct a significance test. Most often you are interested in determining whether the correlation is a real one and not a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:
Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

To test the significance of the correlation, you first need to determine the significance level. Here, we use the common significance level of alpha = .05. The degrees of freedom (df) is simply equal to N - 2 or, in this example, 20 - 2 = 18. Finally, decide whether you are doing a one-tailed or two-tailed test. In this example, since there is no strong prior theory to suggest whether the relationship between height and self esteem would be positive or negative, we opt for the two-tailed test.

With these three pieces of information, the significance level (alpha = .05), degrees of freedom (df = 18), and type of test (two-tailed), the critical value from a table of critical values of r is .4438. This means that if our correlation is greater than .4438 or less than -.4438 (remember, this is a two-tailed test), we can conclude that the odds are less than 5 out of 100 that this is a chance occurrence. Since the correlation of .73 is higher than the critical value, we conclude that it is not a chance finding and that the correlation is "statistically significant". The null hypothesis is rejected and the alternative is accepted.
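The critical value of .4438 can be recovered from the t distribution, as the sketch below shows. The two-tailed critical t of 2.1009 for df = 18 at alpha = .05 is taken from a standard t table and is an assumption here, since the slides quote only the critical r.

```python
import math

r = 0.73          # observed correlation from the slides
n = 20            # number of cases
df = n - 2        # degrees of freedom for a correlation

# Assumption: two-tailed critical t for alpha = .05 and df = 18,
# copied from a standard t table (not given in the slides).
t_critical = 2.1009

# Convert the critical t into the equivalent critical value of r.
r_critical = t_critical / math.sqrt(t_critical ** 2 + df)

# Equivalent t statistic for the observed correlation.
t_statistic = r * math.sqrt(df / (1 - r ** 2))

# Two-tailed decision: reject the null hypothesis if |r| exceeds the critical r.
significant = abs(r) > r_critical

print("critical r:", round(r_critical, 4))
print("t statistic:", round(t_statistic, 2))
print("reject null hypothesis:", significant)
```

Comparing |r| with the critical r and comparing the t statistic with the critical t are two forms of the same test, so they always lead to the same decision.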

Pearson Product-Moment Correlation Matrix for Salesperson

Other Correlations
The specific type of correlation illustrated here is known as the Pearson Product Moment Correlation. It is appropriate when both variables are measured at an interval level. However, there are a wide variety of other types of correlations for other circumstances. For instance, if you have two ordinal variables, you could use the Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau). When one measure is a continuous interval-level one and the other is dichotomous (i.e., two-category), you can use the Point-Biserial Correlation. For other situations, consult the web-based statistics selection program, Selecting Statistics, at http://trochim.human.cornell.edu/selstat/ssstart.htm.

Regression Analysis
Simple (bivariate) linear regression is a measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.

The Regression Equation: Y = α + βX
- Y = the continuous dependent variable
- X = the independent variable
- α = the Y intercept (where the regression line intercepts the Y axis)
- β = the slope coefficient (rise over run)
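As an illustration of how α and β in Y = α + βX can be estimated, here is a minimal least-squares sketch. The function name `fit_line` and the five data points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (alpha, beta) for Y = alpha + beta * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx
    # Intercept: forces the fitted line through the point of means.
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [3, 5, 8, 9, 11]

alpha, beta = fit_line(x, y)
print(f"Y = {alpha:.2f} + {beta:.2f} X")
```

Here the slope says that Y rises by about 2 units for each one-unit increase in X, and the intercept is the predicted Y when X is 0.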

The Regression Equation: Parameter Estimate Choices
- β is indicative of the strength and direction of the relationship between the independent and dependent variable.
- α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).
- Standardized regression coefficient (β): estimated coefficient of the strength of relationship between the independent and dependent variables, expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from -1 to 1).
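A short sketch of the standardized coefficient, under the assumption of a single independent variable: rescaling the raw slope by the ratio of the two standard deviations gives a value between -1 and 1 that, in simple regression, coincides with the Pearson correlation. The data are hypothetical.

```python
import statistics

# Hypothetical data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [3, 5, 8, 9, 11]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Raw (unstandardized) slope, in units of Y per unit of X.
b_raw = sxy / sxx

# Standardized slope: raw slope rescaled by the standard deviations.
beta_std = b_raw * statistics.stdev(x) / statistics.stdev(y)

# Pearson correlation, for comparison.
r = sxy / (sxx * syy) ** 0.5

print(round(b_raw, 3), round(beta_std, 3), round(r, 3))
```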

Simple Regression Results Example

What is Multivariate Data Analysis?
Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis. These methods analyze multiple variables or even multiple sets of variables simultaneously. Many business or economic problems involve multivariate data analysis:
- most employee motivation research
- customer psychographic profiles
- research that seeks to identify viable market segments

Which Multivariate Approach Is Appropriate?

Classifying Multivariate Techniques
Dependence techniques explain or predict one or more dependent variables. They are needed when hypotheses involve a distinction between independent and dependent variables. Types:
- Multiple regression analysis
- Multiple discriminant analysis
- Multivariate analysis of variance

Classifying Multivariate Techniques (cont’d)
Interdependence techniques give meaning to a set of variables or seek to group things together. They are used when researchers examine questions that do not distinguish between independent and dependent variables. Types:
- Factor analysis
- Cluster analysis
- Multidimensional scaling

Classifying Multivariate Techniques (cont’d)
Influence of measurement scales: the nature of the measurement scales will determine which multivariate technique is appropriate for the data. Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
- Nominal and ordinal scales are nonmetric.
- Interval and ratio scales are metric.

Which Multivariate Dependence Technique Should I Use?

Which Multivariate Interdependence Technique Should I Use?

Interpreting Multiple Regression
- Multiple regression analysis: an analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.
- Dummy variable: the way a dichotomous (two-group) independent variable is represented in regression analysis, by assigning a 0 to one group and a 1 to the other.
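To make the dummy-variable idea concrete, the sketch below codes a two-group variable as 0/1 and regresses a dependent variable on it. With a single dummy predictor, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The six stores and their sales figures are hypothetical.

```python
# Hypothetical data: 0 = no sales office in the community, 1 = sales office present.
sales_office = [0, 0, 0, 1, 1, 1]
sales = [20, 22, 24, 30, 33, 36]   # store sales, illustrative units

n = len(sales)
mean_d = sum(sales_office) / n
mean_s = sum(sales) / n

# Least-squares slope and intercept with the dummy as the only predictor.
sxy = sum((d - mean_d) * (s - mean_s) for d, s in zip(sales_office, sales))
sxx = sum((d - mean_d) ** 2 for d in sales_office)
slope = sxy / sxx
intercept = mean_s - slope * mean_d

# Group means, for comparison with the regression estimates.
mean_group0 = sum(s for d, s in zip(sales_office, sales) if d == 0) / sales_office.count(0)
mean_group1 = sum(s for d, s in zip(sales_office, sales) if d == 1) / sales_office.count(1)

print("intercept:", intercept, "= mean of group 0:", mean_group0)
print("slope:", slope, "= difference in means:", mean_group1 - mean_group0)
```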

Multiple Regression Analysis: A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe. Several hypotheses are offered:
- H1: Competitor’s sales are related negatively to sales.
- H2: Sales are higher in communities with a sales office than when no sales office is present.
- H3: Grammar school enrollment in a community is related positively to sales.

Multiple Regression Analysis (cont’d)
Regression coefficients in multiple regression:
- Partial correlation: the correlation between two variables after taking into account the fact that they are correlated with other variables too.
- R² in multiple regression: the coefficient of multiple determination indicates the percentage of variation in Y explained by all independent variables.
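The following is a rough sketch of a multiple regression with two independent variables and its R², written with the standard library only. It solves the normal equations directly, which is adequate for a small illustration but is not how statistical packages usually do it. All function names and the six data rows (competitor sales and a sales-office dummy predicting store sales) are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variation in y explained by all independent variables together."""
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    return 1 - sse / sst


# Hypothetical stores: (competitor sales, sales-office dummy) and store sales.
rows = [(10, 0), (12, 1), (15, 0), (18, 1), (20, 0), (25, 1)]
y = [48, 52, 41, 44, 30, 31]

coefs = fit_ols(rows, y)
r2 = r_squared(rows, y, coefs)
print("coefficients:", [round(c, 3) for c in coefs])
print("R squared:", round(r2, 3))
```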

Interpreting Multiple Regression Results

ANOVA (n-way) and MANOVA
Multivariate Analysis of Variance (MANOVA): a multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.

ANOVA (n-way) and MANOVA (cont’d)
Interpreting N-way (Univariate) ANOVA:
1. Examine the overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by examining the group means.
4. For each significant, continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination.
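As a simplified illustration of the F-test in the first step, the sketch below computes the F statistic for the one-way case only (a single categorical independent variable), not the n-way model with covariates and interactions described above. The function name `one_way_f` and the three groups of values are hypothetical.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for three groups (illustrative only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(groups)
print("F =", round(f_stat, 2))
```

A large F relative to the critical value for (2, 6) degrees of freedom would suggest that the group means differ, at which point the group means themselves are examined, as in step 3.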

Discriminant Analysis
A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables. Discriminant scores are calculated with a linear function of the independent variables.

Factor Analysis
A type of analysis used to discern the underlying dimensions or regularity in phenomena. Its general purpose is to summarize the information contained in a large number of variables into a smaller number of factors.

Multidimensional Scaling
Measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.