Statistical Analysis: A Quick Overview
The Scientific Method: establishing a hypothesis (idea); collecting evidence (often in the form of numerical data); analysing the evidence (using statistical and other techniques); accepting or rejecting the hypothesis, coming to conclusions and providing an explanation; evaluating the process and making recommendations for the future.
Basic Stuff: ‘X-bar’ is the mean; ‘n’ is the number in the sample (sample size); ‘∑x’ is the “sum of” x (in other words, add up all the individual values in the sample).
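As a small illustration of this notation, the mean is the sum of the values divided by the sample size (x-bar = ∑x / n). The sketch below uses made-up values, not data from the survey, and the variable names are chosen only to mirror the symbols.

```python
# Hypothetical sample values, for illustration only.
sample = [4, 7, 6, 5, 8]

n = len(sample)        # 'n': the sample size
sum_x = sum(sample)    # '∑x': add up all the individual values
x_bar = sum_x / n      # 'X-bar': the mean

print(n, sum_x, x_bar)
```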
The Null Hypothesis: The opposite of a hypothesis (in other words, that there is no relationship between variables). Thus, rather than proving the hypothesis, we try to disprove the null hypothesis. If the evidence does not support the null hypothesis, it is rejected and an alternative hypothesis can be accepted.
Choosing a Method: The statistical method chosen will largely depend on what type of test you need, e.g. a test of association, a test of difference, probability testing, diversity, degree of clustering, etc. Other factors such as sample size, data type (categorical vs. ordinal) and distribution are also important.
Normal Distribution: If the values of a continuous variable are plotted against frequency, the result is likely to be a bell-shaped curve called the ‘normal’ curve. The curve shows that most individuals are clustered around the average, or mean (which in theory should be the middle of the curve).
Significance and Confidence Limits: Significance concerns the reliability of the results and is expressed as a percentage value. The 95% level of significance is usually appropriate for field data. This means that results like these would occur by chance only 5 times out of 100. In significance tables the 95% level is indicated by 0.05 (i.e. a 5% likelihood of the results occurring by chance). If the calculated value exceeds the theoretical (critical) value from the table, then the result is significant. Thus we can say that we are confident, at the 95% level, of the reliability of the results.
So where do we go from here? 2. Take each hypothesis in turn: "There is a positive correlation between Place Utility and happiness (subjective appreciation of life)" and "There is a significant difference between the happiness (subjective appreciation of life) of residents in two contrasting areas of Bratislava". 3. Use the flow chart to determine the appropriate test. 4. Plot the scatter graphs and crunch the numbers. 5. Test for significance. 6. Draw your conclusions about the relationships and associations.
Normal Distribution: Is our data normally distributed? To check, we could plot a frequency graph. However, as we have a large sample size, we will assume that it is approximately normally distributed.
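A rough way to carry out the frequency-graph check is to count values per bin and print a text histogram. This is only a sketch: the scores are invented, `frequency_table` is a hypothetical helper name, and a roughly symmetric single-peaked shape is merely a visual hint of normality, not a formal test.

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical happiness scores (whole numbers), for illustration only.
scores = [3, 4, 5, 5, 6, 6, 6, 7, 7, 8, 9]
width = 2

# One row per bin; the number of '#' marks is the frequency.
for bin_start, count in frequency_table(scores, width).items():
    print(f"{bin_start}-{bin_start + width - 1}: {'#' * count}")
```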
Hypothesis 1: There is a positive correlation between Place Utility and happiness (subjective appreciation of life). So what is correlation? It is an association between two data sets. What should we do first to test for correlation?
Plot a scatter graph
What statistical test would be appropriate to test correlation (association)? Use the flow chart.
Pearson Product-Moment Correlation Coefficient: A statistical technique to quantify the degree of association (correlation) between two sets of data.
Step 1 – State the null hypothesis. What will our null hypothesis be? There is no relationship between place utility and happiness.
Step 2 – The Table (in Excel)
Step 3 – The Formula (in Excel): $r = \dfrac{\sum (d_x \, d_y)}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}}$
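The same calculation can be sketched outside Excel. Here d_x and d_y are taken to be the deviations of each x and y value from their respective means, which is the usual reading of this formula. The function name `pearson_r` and the paired scores are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r as sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists of paired values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]   # deviations of x from its mean
    dy = [y - mean_y for y in ys]   # deviations of y from its mean
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    # Note: fails with ZeroDivisionError if either list is constant.
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Made-up paired scores, not the survey data.
place_utility = [1, 2, 3, 4, 5]
happiness = [2, 4, 5, 4, 5]
r = pearson_r(place_utility, happiness)
print(round(r, 3))
```

Values of r close to +1 or -1 indicate a strong positive or negative association; values near 0 indicate little linear association.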
Step 4 – Test for Significance: You need to consult significance tables to test for significance. First work out the degrees of freedom, which for Pearson's r is N − 2, where N is the number of paired observations. To reject the null hypothesis we require 95% confidence; thus, at significance level p = 0.05, the r value must exceed the critical value from the table for those degrees of freedom.
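When the sample is larger than the printed table covers, one common alternative is to convert r to a t statistic, t = r·√(df / (1 − r²)), and compare it with a critical value. The sketch below makes several assumptions that are not in the slides: it uses N − 2 degrees of freedom (the usual choice for Pearson's r), a two-tailed test, and a normal approximation to the critical t value that is only reasonable for large samples. The function name and example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def r_significance_large_sample(r, n_pairs, alpha=0.05):
    """Return (t, critical, significant) for Pearson's r, large samples only."""
    df = n_pairs - 2                       # degrees of freedom for Pearson's r
    # Undefined for r = +1 or -1 (division by zero).
    t = r * sqrt(df / (1 - r * r))
    # Assumption: two-tailed test, normal approximation to the t distribution.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return t, critical, abs(t) > critical

# Made-up figures: r = 0.5 from 102 paired observations.
t, critical, significant = r_significance_large_sample(0.5, 102)
print(round(t, 2), round(critical, 2), significant)
```

For small samples the printed critical-value table remains the safer choice, since the normal approximation understates the critical value.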
Any problems with these significance tables and our data? The internet has the solution...
Step 5 - Accept or reject the null hypothesis and summarize the results
Summarize the Results: E.g. With a Pearson product-moment correlation coefficient of 0.98, a strong positive correlation between place utility and happiness has been identified. The results were successfully tested for significance at the 95% confidence level, thus it is possible to reject the null hypothesis and accept the hypothesis “there is a positive correlation between Place Utility and happiness”. The scatter graph of the same data (figure 1) further supports these conclusions.
Hypothesis 2: There is a significant difference between the happiness (subjective appreciation of life) of residents in two contrasting areas of Bratislava. So we have happiness data for 2 samples (Petrzalka and Stupava). We need to establish whether the happiness results are significantly different between the 2 samples. Which test?
The Comparison of Two Means: A statistical technique that enables you to discover whether there are in fact two distinct populations by calculating the standard error of the difference in means.
Step 1 – Null Hypothesis: What will the null hypothesis be this time? There is no significant difference between the happiness of residents in Petrzalka and Stupava.
Step 2 – The Table (in Excel)
Step 3 – The Formula (in Excel): $SE_{\text{difference}} = \sqrt{\dfrac{SD_1^2}{n_1} + \dfrac{SD_2^2}{n_2}}$
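A minimal sketch of this calculation follows, with invented scores standing in for the two samples. It assumes SD means the sample standard deviation (n − 1 in the denominator), which the slides do not specify. It also assumes the decision rule of comparing the difference in means with about 1.96 standard errors at the 95% level, which is one common convention and may differ from what the spreadsheet does.

```python
from math import sqrt
from statistics import mean, variance

def se_of_difference(sample_1, sample_2):
    """Standard error of the difference: sqrt(SD1^2/n1 + SD2^2/n2)."""
    # statistics.variance is the sample variance, i.e. SD squared (n - 1 form).
    return sqrt(variance(sample_1) / len(sample_1)
                + variance(sample_2) / len(sample_2))

# Made-up happiness scores for the two areas, not the survey data.
petrzalka = [5, 6, 7, 6, 6]
stupava = [8, 9, 7, 8, 8]

se = se_of_difference(petrzalka, stupava)
difference = abs(mean(petrzalka) - mean(stupava))
ratio = difference / se   # how many standard errors apart the means are

# Assumed rule: a ratio above about 1.96 suggests a significant difference
# at the 95% level (a large-sample rule; these tiny samples are illustrative).
print(round(se, 3), round(ratio, 2), ratio > 1.96)
```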
Step 4 – Accept or reject the null hypothesis (see spreadsheet) and summarize the results