Introduction to Statistical Terms Dr Bryan Mills
Contents
Some key statistical terms
What makes useful output
Sampling
Statistics – turning data into information
Inferential statistics – using a sample to talk about the whole population
Variables – things that can vary, e.g. student grades, height
Empirical data – data collected from observation or measurement
The Problem
Measurements: The basis of both models and statistics is being able to measure a variable numerically (quantitatively).
Statistics: Usually describe either a set of data or the strength of a relationship.
Mathematical models: Something along the lines of "this = that + something else * something other". These are often expressed as x = f(a, b, c), or income = f(age, social class, qualifications). In other words, x is a function of other variables.
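As a minimal sketch of the x = f(a, b, c) idea, the function below treats income as a function of other variables. The linear form and every coefficient are hypothetical, chosen only to illustrate the notation; they are not estimates from any data. Social class is left out because it is a named category rather than a number.

```python
def predicted_income(age, qualifications):
    """income = f(age, qualifications): an invented linear model."""
    base = 12000              # hypothetical starting income
    per_year_of_age = 300     # hypothetical effect of each year of age
    per_qualification = 2500  # hypothetical effect of each qualification
    return base + per_year_of_age * age + per_qualification * qualifications


# "this = that + something else * something other"
print(predicted_income(age=30, qualifications=2))  # 26000
```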
Types of Data (Discrete)
Nominal - differences only, e.g. voting preference, towns, types of beach (sandy, rocky, etc.), occupations, named groups (discrete categories). Uses cross-tabulation (contingency tables) and chi-square (χ²) as a means of display/analysis (non-parametric).
Ordinal - differences and magnitude, e.g. ratings in order, A, B, C grades, small - medium - large (non-parametric). Use Mann-Whitney, Kruskal-Wallis, Spearman's.
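To illustrate cross-tabulation of nominal data, the sketch below counts a small invented data set into a contingency table and computes the chi-square statistic by hand. The town names, party labels and counts are all hypothetical, and the cells are too small for a real test; the point is only to show the arithmetic.

```python
from collections import Counter

# Hypothetical nominal data: (town, voting preference) for 12 respondents
responses = (
    [("North", "Party A")] * 4 + [("North", "Party B")] * 2 +
    [("South", "Party A")] * 2 + [("South", "Party B")] * 4
)

observed = Counter(responses)                        # cell counts
row_totals = Counter(town for town, _ in responses)  # totals per town
col_totals = Counter(vote for _, vote in responses)  # totals per party
n = len(responses)

# chi-square = sum over cells of (observed - expected)^2 / expected
chi_square = 0.0
for town in row_totals:
    for vote in col_totals:
        expected = row_totals[town] * col_totals[vote] / n
        chi_square += (observed[(town, vote)] - expected) ** 2 / expected

print(dict(observed))
print(round(chi_square, 3))  # 1.333
```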
Types of Data (Continuous)
Interval - differences, magnitude and equal intervals, e.g. centimetres above and below an average height, Centigrade. On the IQ scale the gap from 110 to 115 is the same as the gap from 100 to 105, but 120 is not twice 60. An interval scale has no true zero; height measured from 0 would therefore be a ratio scale (parametric).
Ratio - differences, magnitude and equal intervals, plus the ability to say this is twice that, etc. MPH, size, Kelvin (parametric).
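The difference between interval and ratio scales can be sketched with temperature. The two temperatures below are hypothetical; the example only shows why "twice as hot" is not meaningful in Centigrade but is meaningful in Kelvin.

```python
def celsius_to_kelvin(c):
    """Shift from the Centigrade (interval) scale to Kelvin (ratio)."""
    return c + 273.15


cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales
diff_c = warm_c - cold_c                                        # 10 degrees
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)  # 10 kelvin

# Ratios are only meaningful where zero is a true zero
ratio_c = warm_c / cold_c   # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(ratio_c)              # 2.0
print(round(ratio_k, 3))    # 1.035
```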
Type of analysis
Between groups - comparing different groups (e.g. independent group t-test)
Within groups - repeated measures on the same group, before and after an experiment (e.g. related samples t-test)
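A rough sketch of why the distinction matters: the same ten invented scores are analysed once as two independent groups and once as before/after measurements on the same five people. The scores and function names are hypothetical, and the independent version assumes equal variances (pooled Student's t).

```python
import math
import statistics

first = [50, 55, 60, 65, 70]    # hypothetical scores
second = [54, 58, 65, 69, 74]   # hypothetical scores


def independent_t(x, y):
    """Between groups: Student's t with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) +
                  (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(y) - statistics.mean(x)) / se


def related_t(before, after):
    """Within groups: t computed on each person's own change."""
    diffs = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


t_between = independent_t(first, second)
t_within = related_t(first, second)
print(round(t_between, 2))  # 0.79
print(round(t_within, 2))   # 12.65
```

The mean difference is 4 points in both cases, but the related-samples version removes the variation between people, so its t value is much larger.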
Number of Variables
Univariate - 1 variable
Bivariate - 2 variables
Multivariate - more than 2 variables
Meaningless Mean
Mean grade = 56%, but 7 of the 10 students are below this.
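One way such a result can arise is sketched below. The ten grades are invented so that they reproduce the figures above (mean of 56 with 7 students below it); they are not the original data. The median is shown for comparison.

```python
import statistics

# Hypothetical grades: a few high marks pull the mean upwards
grades = [40, 42, 44, 45, 46, 48, 50, 75, 80, 90]

mean_grade = statistics.mean(grades)
median_grade = statistics.median(grades)
below_mean = sum(g < mean_grade for g in grades)

print(mean_grade)     # 56
print(median_grade)   # 47.0
print(below_mean)     # 7
```

Here the median (47) describes the typical student better than the mean does.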
A Reminder
Approach | Qualitative / Quantitative | Sample Size | Validity | Reliability
Positivist | Both, but mostly quantitative | Represents a large population | Often low | High
Phenomenology | Qualitative | Small and rich in data | High | Often low
What Makes Good Output
There are 2 main points to consider:
Your audience
The data
Sampling
Statistics rely on having gathered enough data from a sample to be able to represent the population. A sample is a subset of the main population.
Stratification
Population stratification
–Age
–Gender
–Ethnicity
–Other known characteristics
Ideal Response Size
Sample size = Ideal Response Size / Estimated Response Rate (%)
(with the response rate entered as a proportion, e.g. 40% = 0.40)
Where:
n = Number of usable questionnaires returned
p = Proportion being estimated
Z = Confidence coefficient (1.96 by convention)
E = Error in proportion (<5% by convention)
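These symbols match the usual sample-size formula for estimating a proportion, n = Z² × p × (1 − p) / E². The sketch below assumes that formula, uses p = 0.5 as the most cautious guess, and takes a hypothetical 40% response rate; the function names are illustrative only.

```python
import math


def ideal_response_size(p=0.5, z=1.96, e=0.05):
    """Assumed formula: n = Z^2 * p * (1 - p) / E^2, rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


def sample_size(n, response_rate):
    """Questionnaires to send out; response_rate is a proportion."""
    return math.ceil(n / response_rate)


n = ideal_response_size()                  # usable returns needed
to_send = sample_size(n, response_rate=0.40)

print(n)        # 385
print(to_send)  # 963
```

With these assumptions, about 385 usable questionnaires are needed, so roughly 963 would have to be sent out.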
Types of Sample (probability)
Simple Random Sampling
Stratified Random Sampling
–Proportional or quota
–Divide into sub-groups and take a random sample from each
Cluster (Area) Random Sampling
–Narrow down to an area (e.g. districts)
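The three probability samples can be sketched with Python's standard `random` module. The population of 100 people, the age bands, the district names and the function names are all hypothetical. The stratified version is the proportional variant, and the cluster version takes everyone in the chosen districts.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 people under 30, 40 aged 30 and over,
# spread evenly over 5 districts
population = [
    {"id": i,
     "age_band": "under 30" if i < 60 else "30 and over",
     "district": f"District {i % 5 + 1}"}
    for i in range(100)
]


def simple_random_sample(pop, k):
    """Every member has the same chance of selection."""
    return rng.sample(pop, k)


def stratified_sample(pop, k, key):
    """Split into sub-groups, then sample each in proportion to its size."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        share = round(k * len(members) / len(pop))
        sample.extend(rng.sample(members, share))
    return sample


def cluster_sample(pop, n_clusters, key):
    """Pick whole areas at random and take everyone in them."""
    clusters = sorted({person[key] for person in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [person for person in pop if person[key] in chosen]


srs = simple_random_sample(population, 10)
strat = stratified_sample(population, 10, key="age_band")
clus = cluster_sample(population, 2, key="district")

print(len(srs), len(strat), len(clus))  # 10 10 40
```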
Types of Sample (non-probability)
Convenience Sampling
Purposive Sampling
–Modal Instance Sampling (target the 'typical')
–Expert Sampling (Delphi)
–Quota Sampling (work to a quota)
–Heterogeneity Sampling (diversity of views)
–Snowball Sampling