E 45 Johan Brink, IIE 24 November Quantitative data analysis Lecture
Agenda
Chapter 14
Univariate analysis
Bivariate analysis
Multivariate analysis
Contingency
Pearson’s correlation
t-test
Chi square
Factor & cluster analysis
Univariate analysis
One variable at a time
Frequency tables - Bar charts
Grouping of ratio & interval variables: 20-29, 30-39… - Histograms
Arithmetic mean = Sum of all values / number of values = 33.6
Median = Midpoint of the distribution of values
Mode = The most frequent value in the distribution

          Interval/Ratio        Ordinal             Nominal
          (scale & distance)    (ranked)            (can't be ranked)
Mean      Yes                   No (used anyhow)    No
Median    Yes                   Yes                 No
Mode      Yes                   Yes                 Yes
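The grouping and the three measures of central tendency above can be sketched in a few lines of Python. The ages below are made up for illustration and are not the lecture's data, so the mean differs from the 33.6 on the slide; `age_group` is a hypothetical helper name.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical ages, invented for illustration (not the lecture's data).
ages = [22, 25, 25, 31, 34, 38, 41, 45, 52]


def age_group(age):
    """Return a decade label such as '20-29' for a given age."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Frequency table over the grouped (interval/ratio) variable.
frequency = Counter(age_group(a) for a in ages)
for group in sorted(frequency):
    print(group, frequency[group])

# Measures of central tendency on the ungrouped values.
print("mean:", round(mean(ages), 1))
print("median:", median(ages))
print("mode:", mode(ages))
```

The frequency table is what a histogram would plot; the mean, median and mode are computed on the raw values rather than on the groups.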
Dispersion
Range: Min to Max (3 to 7)
Variance = standard deviation²
s² = Σ(x-M)² / (n-1)
Example with n = 9 points: Σx = 45, Σ(x-M)² = 12
Variance = 12/(9-1) = 1,5 -> Standard deviation = √1,5 = 1,22
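A minimal sketch of the variance calculation in Python. The slide only gives the summary figures, so the nine data points below are an assumption: they were chosen to reproduce a minimum of 3, a maximum of 7, a sum of 45 and a sum of squared deviations of 12.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical points chosen to match the slide's summary figures
# (min 3, max 7, sum 45, sum of squared deviations 12).
points = [3, 4, 4, 5, 5, 5, 6, 6, 7]

m = mean(points)
value_range = (min(points), max(points))

# Sample variance: sum of squared deviations divided by n - 1.
squared_deviations = [(x - m) ** 2 for x in points]
sample_variance = sum(squared_deviations) / (len(points) - 1)
standard_deviation = sqrt(sample_variance)

print("range:", value_range)
print("variance:", sample_variance)
print("standard deviation:", round(standard_deviation, 2))

# Cross-check against the standard library's sample variance.
assert abs(sample_variance - variance(points)) < 1e-12
```

Note that Python prints decimal points where the slides use decimal commas.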
Multivariate analysis
Spurious correlation: The relationship between two variables is caused by a third, underlying factor
Intervening variable: Chain of relationships
Moderating variable: The relationship between A & B only exists if C is present
[Diagrams: three path diagrams of A, B and C, one per case]
Analyzing data
Correlation & relationship
Between variables (questions, groups, items)
Do the answers to question 1 correlate with the answers to question 2?
- Different questions/items for the same construct, or does it capture a relationship?
Test: significant differences
Between variables - questions, groups, items
- Across time/treatments
Is the mean different enough given the standard deviations? t-test
Chi-squared (nominal scales): Differences from the expected value?
Constructs & Items
Cronbach α is a measure of how well variables measure the same underlying phenomenon
[Diagram: Variables 1-6 are the items (Questions 1-6) grouped under Construct 1 and Construct 2; example constructs: Org. Culture, Performance]
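The slide does not give a formula for Cronbach α. A minimal sketch, assuming the standard definition α = k/(k-1) × (1 - Σ item variances / variance of the summed score), could look as follows. The function name `cronbach_alpha` and the three items with five respondents are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists, each holding one score per respondent
    (same respondent order in every list).
    """
    k = len(items)
    sum_of_item_variances = sum(variance(scores) for scores in items)
    # Total score per respondent across all items.
    totals = [sum(respondent) for respondent in zip(*items)]
    return k / (k - 1) * (1 - sum_of_item_variances / variance(totals))


# Hypothetical answers of five respondents to three items of one construct.
question_1 = [4, 3, 5, 2, 4]
question_2 = [5, 3, 4, 2, 4]
question_3 = [4, 2, 5, 3, 5]

alpha = cronbach_alpha([question_1, question_2, question_3])
print("Cronbach alpha:", round(alpha, 3))
```

Values close to 1 suggest that the items move together and plausibly capture the same construct; what counts as acceptable is a judgment call the slide does not specify.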
Hypothesis testing
H0: There is no difference between group A and group B
H1: There is a difference between group A and group B
H0: There is no connection between variable X and Y
H1: There is a connection between variable X and Y

                              Real relationship (unknown)
Result of statistical test    H0 true         H0 false
H0 rejected                   Type 1 error    Correct
H0 not rejected               Correct         Type 2 error

Reducing the risk of a type 1 error by tightening the level of significance from 5% to 1% increases the risk of committing a type 2 error!
Bivariate analysis
Types of variable:
Nominal - can't be ranked
Ordinal - ranked
Interval/ratio - scale & distance
Dichotomous - Yes/No

Method by pair of variable types:
Nominal with any type: Contingency table, Chi square, Cramer's V
Ordinal with ordinal, interval/ratio or dichotomous: Spearman's rho
Interval/ratio with interval/ratio: Pearson's r
Interval/ratio with dichotomous: Spearman's rho
Dichotomous with dichotomous: Phi
Contingency tables

Reasons           Male          Female
                  #     %       #     %
Relaxation        3     7       6     13
Fitness
Lose weight
Build strength
Total
Pearson's correlation
For interval/ratio variables
Measure of the strength of association between two variables
r lies between -1 and +1: 0 = no correlation, +1 = perfect positive correlation, -1 = perfect negative correlation
r² × 100% = share of the variation explained
Example (X, Y): r = 0,969
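The slide's X and Y values are not recoverable, so the following is only a sketch of how r and r² can be computed, using the usual product-moment formula and made-up data. `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical interval/ratio data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print("r:", round(r, 3))
print("variation explained (%):", round(r ** 2 * 100, 1))
```

With this made-up data r is about 0.775, so roughly 60% of the variation in one variable is explained by the other.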
t-test: A statistical test to see if there is a difference between two samples
Hypothesis H0: the true means are equal
Alternative H1: there is a difference
n1 = 25, n2 = 24
Mean1 = 64, Mean2 = 56
s1 = 10, s2 = 8
df = n1 + n2 - 2 = 47
t = (Mean1 - Mean2) / √[ ((n1-1)s1² + (n2-1)s2²)/(n1+n2-2) × (1/n1 + 1/n2) ]
t = (64 - 56) / √[ ((25-1)10² + (24-1)8²)/(25+24-2) × (1/25 + 1/24) ] = 3,08
Statistic table: t(df=47, 0,05) = 2,012
3,08 > 2,012, thus reject H0: there is a difference!
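The worked example can be checked with a few lines of Python. This is a sketch of the pooled-variance two-sample t-test using only the summary statistics from the slide; the critical value 2.012 is copied from the slide's table rather than computed, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the slide.
n1, n2 = 25, 24
mean1, mean2 = 64, 56
s1, s2 = 10, 8

# Degrees of freedom for the pooled two-sample test.
df = n1 + n2 - 2

# Pooled variance, then the standard error of the difference in means.
pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))

t = (mean1 - mean2) / standard_error

# Critical value for df = 47 at the 5% level, taken from the slide.
t_critical = 2.012
reject_h0 = abs(t) > t_critical

print("df:", df)
print("t:", round(t, 2))
print("reject H0:", reject_h0)
```

The pooled form assumes the two groups have similar variances; the slide does not discuss what to do when they differ.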
Chi square
Is there a difference between age groups (young, middle and old) in preference for A or B?
Nominal scales!
40 respondents
Chi² = Σ (observed - expected)² / expected
Chi square
Observed frequencies:

      Young    Middle age    Old    Σ
A     20       5             0      25
B     5        5             5      15
Σ     25       10            5      40
Chi square
Observed frequencies, with expected frequencies in parentheses:

      Young          Middle age    Old          Σ
A     20 (15,625)    5 (6,25)      0 (3,125)    25
B     5 (9,375)      5 (3,75)      5 (1,875)    15
Σ     25             10            5            40
Chi square
Chi² = (20-15,625)²/15,625 + (5-6,25)²/6,25 + (0-3,125)²/3,125 + (5-9,375)²/9,375 + (5-3,75)²/3,75 + (5-1,875)²/1,875 = 12,27
df = (r-1)×(c-1) = (2-1)×(3-1) = 2
Critical value: Chi²(0,05, df=2) = 5,99
12,27 > 5,99, thus reject H0: there is an association between age group and preference for A or B.
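The same calculation as a short Python sketch. The observed counts come from the slides, each expected count is derived as row total × column total / grand total, and the critical value 5.99 is copied from the slide rather than computed. Variable names are illustrative.

```python
# Observed counts from the slides: columns are Young, Middle age, Old.
observed = {
    "A": [20, 5, 0],
    "B": [5, 5, 5],
}

row_totals = {label: sum(row) for label, row in observed.items()}
column_totals = [sum(column) for column in zip(*observed.values())]
n = sum(column_totals)

# Chi² = sum over all cells of (observed - expected)² / expected.
chi2 = 0.0
for label, row in observed.items():
    for j, count in enumerate(row):
        expected = row_totals[label] * column_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) × (columns - 1).
df = (len(observed) - 1) * (len(column_totals) - 1)

# Critical value at the 5% level for df = 2, taken from the slide.
chi2_critical = 5.99
reject_h0 = chi2 > chi2_critical

print("Chi²:", round(chi2, 2))
print("df:", df)
print("reject H0:", reject_h0)
```

Several expected counts in this example are below 5, a situation in which the chi-square approximation is commonly considered less reliable; the slides do not address this.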
Factor analysis and Cluster analysis
Reduce the data
Variables which measure the same thing
Underlying factors