Data Analysis.


Presentation on theme: "Data Analysis."— Presentation transcript:

1 Data Analysis

2 In most social research the data analysis involves three major steps, done in roughly this order:
1. Cleaning and organizing the data for analysis (Data Preparation)
2. Describing the data (Descriptive Statistics)
3. Testing Hypotheses and Models (Inferential Statistics)

3 Data Preparation involves checking or logging the data in; checking the data for accuracy; entering the data into the computer; transforming the data; and developing and documenting a database structure that integrates the various measures.

4 Descriptive Statistics
Used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is, what the data shows.

5 Inferential statistics
investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population thinks. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.

6 Types of Statistical Analysis
Univariate Statistical Analysis: tests of hypotheses involving only one variable; testing of statistical significance.
Bivariate Statistical Analysis: tests of hypotheses involving two variables.
Multivariate Statistical Analysis: statistical analysis involving three or more variables or sets of variables.

7 Statistical Analysis: Key Terms
Hypothesis: an unproven proposition; a supposition that tentatively explains certain facts or phenomena. An assumption about the nature of the world.
Null Hypothesis: no difference between sample and population.
Alternative Hypothesis: a statement that indicates the opposite of the null hypothesis.

9 Choosing the Appropriate Statistical Technique
Choosing the correct statistical technique requires considering:
Type of question to be answered
Number of variables involved
Level of scale measurement

10 Univariate analysis
Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:
the distribution
the central tendency
the dispersion
In most situations, we would describe all three of these characteristics for each of the variables in our study.

11 The Distribution
The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value.

12 Distributions may also be displayed using percentages
For example, you could use percentages to describe the:
percentage of people in different income levels
percentage of people in different age ranges
percentage of people in different ranges of standardized test scores
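As a rough illustration of a distribution displayed as percentages, the sketch below groups a small sample into age ranges. The ages are hypothetical values invented for the example, and the `age_range` helper and the 10-year bracket width are likewise assumptions, not part of the presentation.

```python
from collections import Counter

# Hypothetical sample of ages, invented purely for illustration.
ages = [23, 37, 41, 29, 52, 34, 45, 61, 38, 27]


def age_range(age):
    """Map an age to a labelled 10-year bracket, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Frequency distribution: how many people fall into each bracket.
counts = Counter(age_range(a) for a in ages)
total = sum(counts.values())

# The same distribution expressed as percentages of the sample.
percentages = {bracket: 100 * n / total for bracket, n in sorted(counts.items())}

for bracket, pct in percentages.items():
    print(f"{bracket}: {counts[bracket]} people, {pct:.0f}%")
```

The same pattern applies to income levels or test-score ranges: only the bracketing function changes.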

13 Central Tendency
The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency:
Mean
Median
Mode

14 The sum of these 8 values is 167, so the mean is 167/8 = 20.875.
Consider the following eight scores: 15, 20, 21, 20, 36, 15, 25, 15. The sum of these 8 values is 167, so the mean is 167/8 = 20.875. If we order the 8 scores shown above, we would get: 15, 15, 15, 20, 20, 21, 25, 36. There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

15 To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode.
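The three estimates of central tendency for the eight example scores can be checked with a short sketch. It uses Python's standard `statistics` module, which is a choice made here for illustration; the presentation itself does not name any software.

```python
import statistics

# The eight example scores from the slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)      # sum of the scores divided by their count: 167 / 8
median = statistics.median(scores)  # average of the two middle scores after sorting
mode = statistics.mode(scores)      # the most frequently occurring value

print(mean, median, mode)
```

With an even number of scores, `statistics.median` averages the two middle values, which is the interpolation step mentioned above.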

16 Dispersion
Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.

17 The Standard Deviation
is a more accurate and detailed estimate of dispersion because an outlier can greatly exaggerate the range (as was true in this example, where the single outlier value of 36 stands apart from the rest of the values). The Standard Deviation shows the relation that a set of scores has to the mean of the sample.

18
N                8
Mean             20.875
Median           20.00
Mode             15.00
Std. Deviation   7.0799
Variance         50.125
Range            21.00
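The dispersion figures for the eight example scores can be reproduced with the sketch below. It assumes the sample formulas (dividing by N - 1), which is the convention that yields the standard deviation of 7.0799 reported above; the use of Python's `statistics` module is an illustrative choice.

```python
import statistics

# The eight example scores from the slides.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

value_range = max(scores) - min(scores)  # highest value minus lowest value
variance = statistics.variance(scores)   # sample variance (divides by N - 1)
std_dev = statistics.stdev(scores)       # square root of the sample variance

print(f"range={value_range}, variance={variance:.3f}, std dev={std_dev:.4f}")
```

Dropping the outlier of 36 from `scores` and rerunning shows how strongly a single extreme value drives the range.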

19 Bivariate analysis
The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship between two variables. Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem.

20
Person  Height  Self Esteem
1       68      4.1
2       71      4.6
3       62      3.8
4       75      4.4
5       58      3.2
6       60      3.1
7       67      --
8       --      --
9       --      4.3
10      69      3.7
11      --      3.5
12      --      --
13      63      --
14      --      3.3
15      --      3.4
16      --      --
17      65      --
18      --      --
19      --      --
20      61      3.6
Height is measured in inches. Self esteem is measured based on the average of 10 1-to-5 rating items (where higher scores mean higher self esteem).

21
Variable      Mean    StDev    Variance  Sum    Minimum  Maximum  Range
Height        65.4    4.4057   19.41     1308   58       75       17
Self Esteem   3.755   0.4261   0.1816    75.1   3.1      4.6      1.5

22 Calculating the Correlation
So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship.
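The calculation behind that figure can be sketched with the usual computational formula for the Pearson correlation. Several cells of the height and self-esteem table did not survive in this transcript, so the two lists below fill those gaps with assumed values; treat them as an assumption rather than as the presenter's data. Under that assumption the lists reproduce the sums reported in the summary table (1308 and 75.1). The function name `pearson_r` is hypothetical.

```python
import math

# Heights (inches) and self-esteem scores for the 20 cases.
# ASSUMPTION: cells missing from the transcript's table are filled in here;
# the filled lists reproduce the reported sums of 1308 and 75.1.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson_r(x, y):
    """Pearson product-moment correlation via the sums-of-products formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(heights, self_esteem)
print(f"r = {r:.2f}")
```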

23 Testing the Significance of a Correlation
Once you've computed a correlation, you can determine the probability that the observed correlation occurred by chance. That is, you can conduct a significance test. Most often you are interested in determining the probability that the correlation is a real one and not a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:
Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

24 With these three pieces of information
To test the hypothesis, you first need to determine the significance level. Here, use the common significance level of alpha = .05. The degrees of freedom (df) are simply equal to N-2 or, in this example, 20-2 = 18. Finally, decide whether you are doing a one-tailed or two-tailed test. In this example, since there is no strong prior theory to suggest whether the relationship between height and self esteem would be positive or negative, we opt for the two-tailed test. With these three pieces of information -- the significance level (alpha = .05), degrees of freedom (df = 18), and type of test (two-tailed),

25 The null hypothesis is rejected and the alternative is accepted
the critical value is .4438. This means that if our correlation is greater than .4438 or less than -.4438 (remember, this is a two-tailed test), we can conclude that the odds are less than 5 out of 100 that this is a chance occurrence. Since our correlation of .73 is higher than the critical value, we conclude that it is not a chance finding and that the correlation is "statistically significant". The null hypothesis is rejected and the alternative is accepted.
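The decision rule can be sketched in a few lines. The sketch assumes a two-tailed critical t of 2.101 for alpha = .05 and df = 18, taken from a standard t table rather than computed, and converts it to a critical correlation with r = t / sqrt(t^2 + df). The variable names are illustrative.

```python
import math

r = 0.73    # correlation reported for the twenty cases
n = 20      # number of cases
df = n - 2  # degrees of freedom for a correlation

# ASSUMPTION: two-tailed critical t for alpha = .05 and df = 18,
# read from a standard t table (not computed here).
T_CRITICAL = 2.101

# Convert the critical t into the equivalent critical correlation.
r_critical = T_CRITICAL / math.sqrt(T_CRITICAL ** 2 + df)

# Equivalent t statistic for the observed correlation.
t_observed = r * math.sqrt(df / (1 - r ** 2))

# Two-tailed test: reject the null if |r| exceeds the critical correlation.
significant = abs(r) > r_critical

print(f"critical r = {r_critical:.4f}, observed t = {t_observed:.2f}, "
      f"significant = {significant}")
```

Comparing |r| with the critical r and comparing the observed t with the critical t are the same test expressed on two scales.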

26 Pearson Product-Moment Correlation Matrix for Salesperson

27 Other Correlations
The specific type of correlation illustrated here is known as the Pearson Product Moment Correlation. It is appropriate when both variables are measured at an interval level. However, there are a wide variety of other types of correlations for other circumstances. For instance, if you have two ordinal variables, you could use the Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau). When one measure is a continuous interval-level one and the other is dichotomous (i.e., two-category), you can use the Point-Biserial Correlation. For other situations, consulting the web-based statistics selection program, Selecting Statistics at

28 Regression Analysis Simple (Bivariate) Linear Regression
A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.
The Regression Equation: Y = α + βX
Y = the continuous dependent variable
X = the independent variable
α = the Y intercept (where the regression line intercepts the Y axis)
β = the slope coefficient (rise over run)
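As a minimal sketch of how α and β in Y = α + βX can be estimated, the code below uses ordinary least squares on a small data set. The data points are hypothetical, and the helper name `fit_line` is an assumption made for the example; the presentation does not prescribe an estimation routine.

```python
def fit_line(x, y):
    """Least-squares estimates of (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x around their means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx                  # slope: rise over run
    alpha = mean_y - beta * mean_x    # intercept: value of y when x is 0
    return alpha, beta


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = fit_line(x, y)
print(f"Y = {alpha:.2f} + {beta:.2f} X")
```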

29 The Regression Equation
Parameter Estimate Choices
β is indicative of the strength and direction of the relationship between the independent and dependent variable.
α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).
Standardized Regression Coefficient (β): estimated coefficient of the strength of relationship between the independent and dependent variables. Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from -1 to 1).
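To illustrate the standardized scale, the sketch below rescales a raw slope by the ratio of the standard deviations of X and Y. In simple (one-predictor) regression the result coincides with the Pearson correlation, which is why it stays between -1 and 1. The data are hypothetical and the variable names are assumptions for the example.

```python
import statistics

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Raw (unstandardized) slope, in units of Y per unit of X.
b_raw = sxy / sxx

# Standardized slope: the raw slope rescaled by the two standard deviations.
beta_std = b_raw * statistics.stdev(x) / statistics.stdev(y)

# Pearson correlation, computed directly for comparison.
r = sxy / (sxx * syy) ** 0.5

print(f"raw slope = {b_raw:.3f}, standardized = {beta_std:.4f}, r = {r:.4f}")
```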

30 Simple Regression Results Example

31 What is Multivariate Data Analysis?
Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis. Methods analyze multiple variables or even multiple sets of variables simultaneously.
Business or economic problems involve multivariate data analysis:
most employee motivation research
customer psychographic profiles
research that seeks to identify viable market segments

32 Which Multivariate Approach Is Appropriate?

33 Classifying Multivariate Techniques
Dependence Techniques
Explain or predict one or more dependent variables. Needed when hypotheses involve distinction between independent and dependent variables.
Types:
Multiple regression analysis
Multiple discriminant analysis
Multivariate analysis of variance

34 Classifying Multivariate Techniques (cont’d)
Interdependence Techniques
Give meaning to a set of variables or seek to group things together. Used when researchers examine questions that do not distinguish between independent and dependent variables.
Types:
Factor analysis
Cluster analysis
Multidimensional scaling

35 Classifying Multivariate Techniques (cont’d)
Influence of Measurement Scales
The nature of the measurement scales will determine which multivariate technique is appropriate for the data. Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
Nominal and ordinal scales are nonmetric.
Interval and ratio scales are metric.

36 Which Multivariate Dependence Technique Should I Use?

37 Which Multivariate Interdependence Technique Should I Use?

38 Interpreting Multiple Regression
Multiple Regression Analysis: an analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.
Dummy variable: the way a dichotomous (two-group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.
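A small sketch may make dummy coding concrete. With a single 0/1 predictor, the least-squares intercept is the mean of the group coded 0 and the slope is the difference between the two group means. The sales figures, the `has_office` coding, and the helper `fit_line` are all hypothetical, introduced only for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates of (intercept, slope) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: 0 = no sales office, 1 = sales office present.
has_office = [0, 0, 0, 1, 1, 1]
sales = [10.0, 12.0, 11.0, 15.0, 17.0, 16.0]

intercept, slope = fit_line(has_office, sales)

# Group means, for comparison with the regression estimates.
mean_without = sum(s for d, s in zip(has_office, sales) if d == 0) / has_office.count(0)
mean_with = sum(s for d, s in zip(has_office, sales) if d == 1) / has_office.count(1)

print(f"intercept = {intercept}, slope = {slope}")
print(f"mean without office = {mean_without}, difference = {mean_with - mean_without}")
```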

39 Multiple Regression Analysis
A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe. Several hypotheses are offered:
H1: Competitor’s sales are related negatively to sales.
H2: Sales are higher in communities with a sales office than when no sales office is present.
H3: Grammar school enrollment in a community is related positively to sales.

40 Multiple Regression Analysis (cont’d)
Regression Coefficients in Multiple Regression
Partial correlation: the correlation between two variables after taking into account the fact that they are correlated with other variables too.
R² in Multiple Regression: the coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.
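For the special case of one dependent variable and two predictors, both quantities can be written in terms of the three pairwise correlations. The sketch below uses the standard first-order partial correlation formula and the two-predictor formula for R²; the correlation values are hypothetical and the function names are assumptions made for the example.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for a third variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for Y regressed on two predictors, from the pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical correlations: Y with X1, Y with X2, and X1 with X2.
r_y1, r_y2, r_12 = 0.6, 0.4, 0.5

# Correlation between Y and X1 once X2 is taken into account.
partial = partial_correlation(r_y1, r_12, r_y2)

# Share of the variation in Y explained by X1 and X2 together.
r2 = r_squared_two_predictors(r_y1, r_y2, r_12)

print(f"partial r = {partial:.3f}, R^2 = {r2:.3f}")
```

When the two predictors are uncorrelated (`r_12 = 0`), R² reduces to the sum of the two squared correlations.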

41 Interpreting Multiple Regression Results

42 ANOVA (n-way) and MANOVA
Multivariate Analysis of Variance (MANOVA): a multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.

43 ANOVA (n-way) and MANOVA (cont’d)
Interpreting N-way (Univariate) ANOVA
1. Examine the overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by examining the group means.
4. For each significant, continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination.

44 Discriminant Analysis
A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables. To calculate discriminant scores, the linear function used is:

45 Factor Analysis
A type of analysis used to discern the underlying dimensions or regularity in phenomena. Its general purpose is to summarize the information contained in a large number of variables into a smaller number of factors.

46 Multidimensional Scaling
Measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.

