Analysing your evidence
What sort of evidence/data will you have?
  You need to plan how you will analyse the data before you collect it
  If not, you may produce data that you cannot easily analyse
  If not, you may not make the most of the evidence you have at your disposal
  If you are going to use statistical methods, do it properly
Statistical methods
  Are used to:
  Describe a set of data in an efficient and meaningful manner
  Make decisions about a larger population of potential observations of which the data are a sample
  Test hypotheses
Descriptive statistics
  Describe data and events, and refer to:
  Frequency distributions
  Central tendencies or averages
  Variability of the data or dispersion, examined through the range or standard deviation of scores
  Graphical representations: useful to convey information
  It is often good to look at a graphic representation prior to further analysis so you can see patterns in the data
  Bar charts, histograms, pie charts etc.
  Different formats can make the data look more or less significant
  Can help you to tell the story
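The descriptive measures listed above can be sketched in a few lines of Python. This is only an illustration: the scores are invented, and the variable names are hypothetical. It builds a frequency distribution, two averages, two measures of dispersion, and a rough text bar chart using only the standard library.

```python
from collections import Counter
import statistics

# Hypothetical test scores, invented purely for illustration.
scores = [4, 7, 7, 8, 5, 7, 9, 5, 6, 7]

# Frequency distribution: how often each score occurs.
frequencies = Counter(scores)

# Central tendency.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)

# Dispersion: the range and the sample standard deviation.
score_range = max(scores) - min(scores)
std_dev = statistics.stdev(scores)

# A crude text bar chart, to look for patterns before further analysis.
for value in sorted(frequencies):
    print(f"{value:>2} | {'#' * frequencies[value]}")

print(f"mean={mean_score}, median={median_score}, "
      f"range={score_range}, sd={std_dev:.2f}")
```

Looking at the bar chart first makes it easier to spot bunching or outliers before choosing which average to report.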
Inferential statistics
  Are concerned with making inferences about populations and hypotheses
  Are values which are calculated from a sample and used to estimate the same values for a population
  Types: mean and standard deviation, chi-square, correlation, t-tests, analysis of variance (ANOVA)
Variables
  Any property that may vary, i.e. that may take different values
  Qualitative variables: variables which differ only in kind
  E.g. gender (male, female), nationality (English, French), occupation (nurse, teacher)
Variables
  Quantitative variables: variables which differ only in amount
  E.g. height (1.62 metres, 3 inches), time (2.58 seconds, 5 hours), IQ (98, 124)
  Continuity versus discreteness
  A continuous scale can take any value within a range, e.g. length
  A discrete scale has only a finite number of values, e.g. dress sizes, test scores, degree classifications
Types of numerical data
  Two main kinds: frequencies and measurements
  Frequencies: count the number of events occurring in particular categories
  E.g. 12 right-handed people in the room (category = right-handed people in the room, frequency = 12)
  Measurements (metric data): the results of giving scores to individual people, objects or events on the basis of an underlying scale of measurement
  The scale of measurement may be one that already exists or one that you design/apply
Levels of measurement
  Nominal level
  Ordinal level
  Interval level
  Ratio level
Levels of measurement
  Nominal level: use of numbers or letters to classify events differing only in kind
  E.g. BBC radio stations
  Ordinal level: use of numbers or letters to indicate an ordered relationship between events
  E.g. finishing positions in a race, grades awarded to essays, degree classifications
Levels of measurement
  Interval level: indicates not only the relative position of events but also the size of the differences between events
  There is a constant unit of measurement, which means that the arithmetic difference between two scores accurately represents the size of the actual difference measured
  E.g. temperature in degrees Celsius or Fahrenheit
Levels of measurement
  Ratio level: simply interval measurement with an absolute zero (i.e. a score of 0 really indicates the total absence of the property being measured)
  A score of 60 represents twice as much of a property as does a score of 30
  E.g. length, mass, time and volume
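The difference between interval and ratio measurement can be checked with a small calculation. The sketch below assumes temperature measured in degrees Celsius as the interval example and converts to kelvin, which has an absolute zero. The choice of units and the values 30 and 60 are assumptions made for illustration.

```python
# Temperatures in degrees Celsius: an interval scale, because 0 °C is an
# arbitrary point and does not mean "no temperature at all".
cold_c, warm_c = 30.0, 60.0


def celsius_to_kelvin(celsius):
    """Convert to kelvin, a ratio scale with an absolute zero."""
    return celsius + 273.15


# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale.
ratio_c = warm_c / cold_c  # 2.0, but 60 °C is not "twice as hot" as 30 °C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"difference: {diff_c:.1f} °C, {diff_k:.1f} K")
print(f"ratio on Celsius scale: {ratio_c:.2f}, on kelvin scale: {ratio_k:.2f}")
```

The two differences agree, but the Celsius ratio of 2 disappears once the scale has a true zero, which is why "twice as much" claims need ratio-level data.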
Measurement data: examining relationships
  When observing continuous variables, e.g. age or tenure, a correlation can be used to make inferences about the relationship between the variables
  Correlations estimate the extent to which changes in one variable are related to or associated with changes in another variable
Measurement data: examining relationships
  A correlation examines the degree to which two or more variables are related
  A correlation coefficient is calculated, ranging from +1, indicating a perfect positive relationship, to -1, indicating a perfect negative relationship
  A coefficient near 0 indicates little or no linear relationship
  Scattergrams or plots are used to identify visually whether there is likely to be any form of relationship, prior to statistical testing
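As a rough illustration of how a correlation coefficient is obtained, the sketch below computes Pearson's r by hand. Pearson's r is one common coefficient; the slides do not say which one is intended, so treat that choice as an assumption. The age and tenure figures are invented, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sp_xy / math.sqrt(ss_x * ss_y)


# Hypothetical data: age in years and tenure in years for five employees.
age = [25, 32, 41, 48, 56]
tenure = [1, 4, 9, 12, 20]

r = pearson_r(age, tenure)
print(f"r = {r:.3f}")
```

A value of r close to +1 here would only show that age and tenure rise together in this sample; it says nothing about cause, and a scattergram should still be inspected first.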
Examining group differences
  Descriptive or explanatory research may involve trying to determine whether two groups differ according to a specific quality
  This may involve examining the central tendency of results or scores in one group, and how this compares to another
  T-test: used to examine the values/scores of two groups
  ANOVA: used to examine the values/scores of more than two groups
  These tests are used to determine whether groups have different mean values or scores
  These tests carry assumptions about the type of data, e.g. a normal distribution and equal variance in scores between the groups
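To show what these tests actually compute, here is a minimal sketch of the test statistics only: an independent-samples t statistic using a pooled variance (which relies on the equal-variance assumption mentioned above) and a one-way ANOVA F statistic. The group scores are invented and the function names are hypothetical. Turning the statistic into a p-value needs statistical tables or a statistics package, which is left out here.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Independent-samples t statistic, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / std_error


def one_way_f(*groups):
    """One-way ANOVA F statistic for two or more groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for three groups.
group_a = [12, 15, 14, 10, 13]
group_b = [18, 20, 17, 19, 21]
group_c = [14, 16, 15, 13, 17]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f(group_a, group_b, group_c)
print(f"t = {t_stat:.3f} (df = {len(group_a) + len(group_b) - 2})")
print(f"F = {f_stat:.3f}")
```

With exactly two groups the F statistic equals the square of the pooled t statistic, which is one way to see that ANOVA extends the t-test to more than two groups.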
Symmetry
  Frequency distributions are not always symmetrical about the middle of the distribution
  Many of the group difference tests rely on data having a normal distribution
  Skew: when you get bunching of scores at one end of the distribution
Averages: Mode
  Mode: the most frequent value
  When data is grouped into class intervals, it is estimated by taking the midpoint of the interval that has the greatest frequency
  Easy to calculate
  It may be used for data at any level of measurement
  It is the only average that can be reported when data consists of frequencies in categories
  In such cases the mode is the category having the highest frequency
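Both uses of the mode described above can be sketched briefly: the modal category for frequency data, and the midpoint estimate for data grouped into class intervals. The occupations, intervals and counts below are invented for illustration.

```python
from collections import Counter

# Nominal data: the mode is simply the category with the highest frequency.
occupations = ["nurse", "teacher", "nurse", "engineer", "nurse", "teacher"]
modal_category, modal_count = Counter(occupations).most_common(1)[0]

# Grouped data: hypothetical class intervals (lower, upper) with frequencies.
class_frequencies = {(0, 10): 3, (10, 20): 8, (20, 30): 5}

# Find the interval with the greatest frequency and take its midpoint.
modal_interval = max(class_frequencies, key=class_frequencies.get)
estimated_mode = (modal_interval[0] + modal_interval[1]) / 2

print(f"modal category: {modal_category} ({modal_count} people)")
print(f"modal interval: {modal_interval}, estimated mode: {estimated_mode}")
```

If two categories or intervals tie for the highest frequency, this sketch reports only the first one it meets, so ties should be checked by eye.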
Averages: Median
  Median: midway point in a series of scores (i.e. the 50th percentile point)
  To calculate, sort the scores in order of increasing value
  If there is an odd number of scores, median = middle point
  If there is an even number of scores, median = halfway point between the two middle values
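The calculation steps above translate directly into code. This is a minimal sketch with invented scores; in practice Python's built-in `statistics.median` does the same job.

```python
def median(scores):
    """Median of a list of scores, following the steps on the slide."""
    if not scores:
        raise ValueError("median of an empty list is undefined")
    # Step 1: sort the scores in order of increasing value.
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    # Step 2: odd number of scores, so take the middle point.
    if n % 2 == 1:
        return ordered[middle]
    # Step 3: even number of scores, so take the halfway point
    # between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = median([7, 3, 9])       # sorted: 3, 7, 9
even_example = median([7, 3, 9, 5])   # sorted: 3, 5, 7, 9
print(odd_example, even_example)
```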
Averages: Median
  Advantages of the median:
  It is the most appropriate average when data is measured at the ordinal level (because the median is based on rank order position)
  It is unaffected by extreme values
  Therefore, with skewed distributions, the median usually describes the most “typical” value much better than does the mean (which is greatly affected by extreme scores)
Averages: Mean
  Ordinary average that most people use
  Calculate by adding up all the scores and dividing by the total number of scores
  Advantages: more stable from sample to sample; uses more information than the median or mode
  Disadvantages: affected by extreme scores, and not the best average to report when the data is very skewed or truncated
  Strictly speaking, it requires data measured at the interval or ratio level
Averages: summary of differences
  With a symmetrical, normal distribution, the mode, median and mean all coincide exactly
  With skewed distributions, the mean is pulled towards the tail (the pointed end) relative to the mode and median
  In such cases, the different averages can give very different impressions of the data
  The mean, in particular, can be very misleading if it is reported as reflecting a “typical” score
  It is often informative to report more than one average
Averages: summary of differences
  The mode indicates the most common score
  The median indicates the score that is exactly in the middle of the distribution
  The mean indicates the “centre of gravity” of the distribution
  If in a particular set of data the median is very different from the mean, this will generally indicate that the distribution is skewed or truncated
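A small worked example shows how far the three averages can drift apart when a distribution is skewed. The salary figures below are invented (in thousands) and include one deliberately extreme value.

```python
import statistics

# Hypothetical salaries in thousands; one extreme value skews the data.
salaries = [21, 22, 22, 23, 24, 25, 26, 95]

mode_salary = statistics.mode(salaries)      # most common score
median_salary = statistics.median(salaries)  # middle of the distribution
mean_salary = statistics.mean(salaries)      # "centre of gravity"

print(f"mode={mode_salary}, median={median_salary}, mean={mean_salary}")

# A large gap between mean and median is a warning sign of skew.
gap = mean_salary - median_salary
print(f"mean exceeds median by {gap}")
```

Here the single high salary pulls the mean well above every other score, while the mode and median stay close to what most people in the sample earn. Reporting all three gives a fairer picture.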
Validity
  “The extent to which a test, questionnaire or other method or operation is really measuring what the researcher intends to measure”
  Internal validity: whether procedures are standardised or controlled
  External validity: generalisability, i.e. whether the findings can be applied to the wider population
Triangulation
  Helps with validity because findings are judged valid when different and contrasting methods of data collection yield identical findings on the same participants and setting
Reliability
  Refers to the consistency of the findings
  Concerned with whether the results can be replicated
  Consistency over time: examined by administering a measure more than once
  Internal consistency: usually concerned with the internal coherence of a scale or measure, i.e. whether different components link together, perhaps to produce an overall score
Qualitative data
  How are we going to analyse more qualitative, often “free text”, information?
  This type of information is often rich, adding important social information
  Need to plan how to identify and draw out themes and strands
Which parts of the analysis go in the dissertation, and where?
  The appendix should be used for:
  Examples of questionnaires (blank)
  Examples of completed questionnaires that illustrate particular themes/strands (not all)
  The body of the dissertation should contain the stages and conclusions of the analysis, e.g. the scattergrams which first identified trends, followed by the graphic representations of the results
Any questions? Preparation for presentation session