Slide 1: Mgt 540 Research Methods - Data Analysis
Slide 2: Additional Sources
- Compilation of sources: http://lrs.ed.uiuc.edu/tse-portal/datacollectionmethodologies/jin-tselink/tselink.htm
- http://web.utk.edu/~dap/Random/Order/Start.htm
- Data Analysis Brief Book (glossary): http://rkb.home.cern.ch/rkb/titleA.html
- Exploratory Data Analysis: http://www.itl.nist.gov/div898/handbook/eda/eda.htm
- Statistical Data Analysis: http://obelia.jde.aca.mmu.ac.uk/resdesgn/arsham/opre330.htm
Slide 3: Figure 12.1
Copyright © 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E
Slide 4: Data Analysis
Get a "feel" for the data:
- Get the mean, variance, and standard deviation of each variable.
- Check that responses to each item range across the whole scale rather than clustering at one end.
- Obtain Pearson correlations among the variables under study.
- Get frequency distributions for all variables.
- Tabulate your data.
- Describe your sample's key characteristics (demographic details such as sex composition, education, age, length of service, etc.).
- Examine histograms, frequency polygons, etc.
Slide 5: Quantitative Data
Each type of data requires different analysis method(s):
- Nominal: labeling; no inherent "value" basis; categorization purposes only.
- Ordinal: ranking, sequence.
- Interval: relationship basis (e.g., age).
Slide 6: Descriptive Statistics
Describing key features of data:
- Central tendency: mean, median, mode.
- Spread: variance, standard deviation, range.
- Distribution (shape): skewness, kurtosis.
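The three families of descriptive statistics listed above can be sketched with the standard library alone; the sample scores below are invented for illustration.

```python
from statistics import mean, median, mode, variance, stdev

scores = [3, 4, 4, 5, 5, 5, 6, 7]  # hypothetical Likert-style responses

# Central tendency
central = {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}

# Spread (variance and stdev here are the n-1 sample versions)
spread = {"variance": variance(scores),
          "stdev": stdev(scores),
          "range": max(scores) - min(scores)}

# Shape: Fisher skewness as the mean of the cubed standardized scores
m, s = central["mean"], spread["stdev"]
skewness = sum(((x - m) / s) ** 3 for x in scores) / len(scores)
```

With real survey data, these summaries are the quick check suggested on slide 4: do responses spread across the scale, or pile up at one end?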
Slide 7
- Nominal: identification/categorization only.
- Ordinal (example on pg. 139): non-parametric statistics; do not assume equal intervals; frequency counts; averages (median and mode).
- Interval: parametric; mean, standard deviation, variance.
Slide 8: Testing "Goodness of Fit"
- Reliability: internal consistency, split-half.
- Validity: discriminant, convergent, factorial.
- Involves correlations and factor analysis.
Slide 9: Testing Hypotheses
Use the appropriate statistical analysis:
- t-test (single- or two-tailed): tests the significance of the difference between the means of two groups.
- ANOVA: tests the significance of differences among the means of more than two groups, using the F test.
- Regression (simple or multiple): establishes the variance explained in the DV by the variance in the IVs.
Slide 10: Statistical Power
Claiming a significant difference: errors in methodology.
- Type 1 error (an "alpha" error): rejecting the null hypothesis when you should not.
- Type 2 error (a "beta" error): failing to reject the null hypothesis when you should.
- Statistical power refers to the ability to detect true differences, i.e., to avoid Type 2 errors.
Slide 11: Statistical Power
See the discussion at http://my.execpc.com/4A/B7/helberg/pitfalls/
Power depends on four issues:
- Sample size
- The effect size you want to detect
- The alpha (Type 1 error rate) you specify
- The variability of the sample
Too little power: effects are overlooked. Too much power: any difference is significant.
Slide 12: Parametric vs. Nonparametric
Parametric characteristics refer to specific population parameters.
Parametric assumptions:
- Independent samples
- Homogeneity of variance
- Normally distributed data
- Interval or better scale
Nonparametric assumptions:
- Sometimes independence of samples
Slide 13: t-tests (look at t tables; p. 435)
- Used to compare two means, or one observed mean against a hypothesized mean.
- For large samples, t and z can be considered equivalent.
- Calculate t = (x̄ - μ) / s_x̄, where s_x̄ is the standard error of the mean, s/√n, and df = n - 1.
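The one-sample t statistic on this slide can be sketched directly from its formula; the sample values and the hypothesized mean of 50 are invented for the example.

```python
from statistics import mean, stdev
from math import sqrt

sample = [52, 48, 55, 51, 49, 53, 50, 54]
mu = 50  # hypothesized population mean (assumption for this example)

n = len(sample)
x_bar = mean(sample)
se = stdev(sample) / sqrt(n)  # standard error of the mean, s/sqrt(n)
t = (x_bar - mu) / se
df = n - 1                    # degrees of freedom, as on the slide
```

The resulting t (about 1.73 here, with df = 7) would then be compared against the t table referenced on the slide.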
Slide 14: t-tests
- Statistical programs will give you a choice between a matched-pair and an independent t-test.
- Your sample and research design determine which you will use.
Slide 15: z-test for Proportions (look at t tables; p. 435)
- When data are nominal, describe them by counting occurrences of each value.
- From the counts, calculate proportions.
- Compare the proportion of occurrence in the sample to the proportion of occurrence in the population.
- Hypothesis testing allows only one of two outcomes: success or failure.
Slide 16: z-test for Proportions
Comparing the sample proportion to the population proportion:
- H0: π = k, where k is a value between 0 and 1
- H1: π ≠ k
- z = (p - π) / √(π(1 - π)/n)
- Equivalent to χ² for df = 1
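The proportion z test above is a one-liner once the counts are in hand; the counts (40 "successes" out of 100) and the hypothesized π = 0.35 are invented for the sketch.

```python
from math import sqrt

pi = 0.35           # hypothesized population proportion, H0: pi = 0.35
successes, n = 40, 100
p = successes / n   # observed sample proportion

# z = (p - pi) / sqrt(pi * (1 - pi) / n), as on the slide
z = (p - pi) / sqrt(pi * (1 - pi) / n)
```

Since the slide notes that z² is equivalent to χ² at df = 1, squaring this z reproduces the corresponding chi-square value.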
Slide 17: Chi-Square Test (Sampling Distribution), One Sample
- Measures sample variance: squared deviations from the mean, based on the normal distribution.
- Nonparametric: compares expected with observed proportions.
- H0: observed proportion = expected proportion.
- df = number of categories (cells), k, minus 1.
- χ² = Σ (O - E)² / E
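The goodness-of-fit statistic on this slide can be sketched in a few lines; the observed counts are invented, and the expected counts assume equal proportions across categories (an assumption for the example, not a requirement of the test).

```python
# One-sample chi-square: chi2 = sum((O - E)^2 / E), df = k - 1
observed = [25, 40, 35]          # invented category counts
total = sum(observed)
k = len(observed)
expected = [total / k] * k       # equal expected proportions (assumption)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1
```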
Slide 18: Univariate z Test
Test a guess about a proportion against an observed sample; e.g., MBAs constitute 35% of the managerial population:
- H0: π = .35
- H1: π ≠ .35 (a two-tailed test is suggested)
Slide 19: Univariate Tests
- Some univariate tests are different in that they are among the statistical procedures where you, the researcher, set the null hypothesis.
- In many other statistical tests, the null hypothesis is implied by the test itself.
Slide 20: Contingency Tables
Relationships between nominal variables: http://www.psychstat.smsu.edu/introbook/sbk28m.htm
- Examines the relationship between subjects' scores on two qualitative or categorical variables (e.g., early childhood intervention).
- If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is called the chi-square test of independence. The null hypothesis is that there is no relationship between row and column frequencies.
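The chi-square test of independence described above compares each observed cell count with the count expected under independence, (row total × column total) / grand total. A minimal sketch on an invented 2×2 table:

```python
table = [[20, 30],   # invented counts: rows and columns are two
         [30, 20]]   # categorical variables with two levels each

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand  # expected under H0
        chi_sq += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (cols - 1)
```

A large chi-square relative to its df leads to rejecting the null hypothesis of no relationship between the row and column variables.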
Slide 21: Correlations
- A statistical summary of the degree and direction of association between two variables.
- Correlation itself does not distinguish between independent and dependent variables.
- Most common: Pearson's r.
Slide 22: Correlations
- You believe that a linear relationship exists between the two variables.
- The range is from -1 to +1.
- R², the coefficient of determination, is the percentage of variance in each variable explained by the other.
Slide 23: Correlations
r = S_xy / (S_x S_y), or the covariance between x and y divided by the product of their standard deviations.
Calculations needed:
- The means, x̄ and ȳ
- The deviations from the means, (x - x̄) and (y - ȳ), for each case
- The squares of the deviations from the means for each case, (x - x̄)² and (y - ȳ)², to ensure positive distance measures when summed
- The cross product for each case, (x - x̄)(y - ȳ)
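Pearson's r can be sketched exactly from the quantities listed above: the deviations, their squares, and the cross products. The data pairs are invented.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

dx = [xi - x_bar for xi in x]               # deviations from the means
dy = [yi - y_bar for yi in y]
ss_x = sum(d ** 2 for d in dx)              # squared deviations
ss_y = sum(d ** 2 for d in dy)
cross = sum(a * b for a, b in zip(dx, dy))  # cross products

r = cross / sqrt(ss_x * ss_y)               # Pearson's r
```

Dividing the sums by n - 1 before taking the ratio gives the covariance-over-standard-deviations form on the slide; the n - 1 factors cancel, so the result is identical.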
Slide 24: Correlations
The null hypothesis for correlations is H0: ρ = 0, and the alternative is usually H1: ρ ≠ 0. However, if you can justify it prior to analyzing the data, you might also use H1: ρ > 0 or H1: ρ < 0, a one-tailed test.
Slide 25: Correlations
Alternative measures:
- Spearman rank correlation, r_ranks: r_ranks and r are nearly always equivalent measures for the same data (and when they are not, the differences are trivial).
- Phi coefficient, r_Φ, when both variables are dichotomous; again, it is equivalent to Pearson's r.
Slide 26: Correlations
Alternative measures (continued):
- Point-biserial, r_pb, when correlating a dichotomous with a continuous variable.
- If a scatterplot shows a curvilinear relationship, there are two options: a data transformation, or the correlation ratio, η² (eta-squared) = 1 - (SS_within / SS_total).
Slide 27: ANOVA
- For two groups only, the t-test and ANOVA yield the same results.
- With three or more groups, you must do paired comparisons to know where the mean differences lie.
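The F statistic behind the ANOVA comparisons above is the ratio of between-group to within-group mean squares. A minimal sketch on three invented groups:

```python
from statistics import mean

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # invented data, k = 3 groups

grand_mean = mean(x for g in groups for x in g)
n_total = sum(len(g) for g in groups)
k = len(groups)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)        # df_between = k - 1
ms_within = ss_within / (n_total - k)    # df_within = N - k
F = ms_between / ms_within
```

A significant F says only that at least one group mean differs; as the slide notes, paired comparisons are still needed to locate which means differ.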
Slide 28: Multivariate Techniques
Dependent-variable techniques:
- Regression in its various forms
- Discriminant analysis
- MANOVA
Classificatory or data-reduction techniques:
- Cluster analysis
- Factor analysis
- Multidimensional scaling
Slide 29: Linear Regression
We would like to be able to predict y from x. Simple linear regression with raw scores:
- y = dependent variable
- x = independent variable
- b = regression coefficient = r_xy (s_y / s_x)
- c = a constant term
The general model is y = bx + c (+ e).
Slide 30: Linear Regression
The statistic for assessing the overall fit of a regression model is R², the overall percentage of variance explained by the model:
R² = predictable variance / total variance = 1 - (unpredictable variance / total variance) = 1 - (s²_e / s²_y), where s²_e is the variance of the error, or residual.
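The "1 minus unexplained over total" definition above translates directly into code; the data and the fitted coefficients (b = 0.6, c = 2.2 for these invented pairs) are illustrative assumptions.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, c = 0.6, 2.2  # slope and constant fitted to these invented data

y_bar = mean(y)
residuals = [yi - (b * xi + c) for xi, yi in zip(x, y)]

ss_error = sum(e ** 2 for e in residuals)      # unpredictable variation
ss_total = sum((yi - y_bar) ** 2 for yi in y)  # total variation
r_squared = 1 - ss_error / ss_total
```

For simple regression this R² equals the square of Pearson's r between x and y, tying the slide back to the correlation slides.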
Slide 31: Linear Regression
Multiple regression: more than one predictor.
- y = b₁x₁ + b₂x₂ + c
- Each regression coefficient b is assessed independently for its statistical significance: H0: b = 0.
- So, in a statistical program's output, a statistically significant b rejects the notion that the variable associated with b contributes nothing to predicting y.
Slide 32: Linear Regression
Multiple regression:
- R² still tells us the amount of variation in y explained by all of the predictors (x) together.
- The F statistic tells us whether the model as a whole is statistically significant.
- Several other types of regression models are available for data that do not meet the assumptions of least-squares models (such as logistic regression for dichotomous dependent variables).
Slide 33: Regression in SPSS and Other Programs
Methods for developing the model:
- Stepwise: lets the computer try to fit all chosen variables, leaving out those that are not significant and re-examining the variables in the model at each step.
- Enter: the researcher specifies that all variables will be used in the model.
- Forward, backward: begins with none (forward) or all (backward) of the variables and automatically adds or removes variables without reconsidering variables already in the model.
Slide 34: Multicollinearity
- The best regression model has uncorrelated IVs.
- Model stability is low when IVs are excessively correlated.
- Collinearity diagnostics identify problems, suggesting variables to be dropped.
- High tolerance and a low variance inflation factor are desirable.
Slide 35: Discriminant Analysis
- Regression requires the DV to be interval or ratio scaled.
- If the DV is categorical (nominal), discriminant analysis can be used.
- IVs should be interval or ratio scaled.
- The key result is the number of cases classified correctly.
Slide 36: MANOVA
- Compares means on two or more DVs (ANOVA is limited to one DV).
- Pure MANOVA is available in SPSS only from command syntax.
- The general linear model can be used instead, though.
Slide 37: Factor Analysis
- A data-reduction technique: a large set of variables can be reduced to a smaller set while retaining the information from the original data set.
- Data must be on an interval or ratio scale.
- E.g., a variable called socioeconomic status might be constructed from variables such as household income, educational attainment of the head of household, and average per capita income of the census block in which the person resides.
Slide 38: Cluster Analysis
- Cluster analysis seeks to group cases rather than variables; it too is a data-reduction technique.
- Data must be on an interval or ratio scale.
- E.g., a marketing group might want to classify people into psychographic profiles regarding their tendencies to try or adopt new products: pioneers or early adopters, early majority, late majority, laggards.
Slide 39: Factor vs. Cluster Analysis
- Factor analysis focuses on creating linear composites of variables, reducing the number of variables with which we must work; the technique begins with a correlation matrix to seed the process.
- Cluster analysis focuses on cases.
Slide 40: Potential Biases
- Asking inappropriate or wrong research questions.
- An insufficient literature survey and hence an inadequate theoretical model.
- Measurement problems.
- Samples not being representative.
- Problems with data collection: researcher biases, respondent biases, instrument biases.
- Data analysis biases: coding errors, data punching and input errors, inappropriate statistical analysis.
- Biases (subjectivity) in the interpretation of results.
Slide 41: Figure 11.2
Copyright © 2003 John Wiley & Sons, Inc. Sekaran/RESEARCH 4E
Questions to ask (adapted from Robert Niles):
- Where did the data come from?
- How, and by whom, were the data reviewed, verified, or substantiated?
- How were the data collected?
- How are the data presented?
- What is the context?
- Is there cherry-picking?
- Be skeptical when dealing with comparisons: watch for spurious correlations.