Practice & Communication of Science Measurement
What is Measurement?
Assigning comparative labels to things to help explain their relationships…
  sounds a bit abstract…
  …but things/relationships is all there is…
  …and that’s all that science is about!
  so measurement is rather central to science
Measurements are typically, but not exclusively, numerical
  not all types of measurement are equivalent
  four different levels of measurement: nominal, ordinal, interval and ratio
Nominal Scales
The ‘lowest’ level of measurement
  nominal implies ‘names’
Just labels to stick things into categories and separate them
No implicit order
  eg yes/no
  shirt numbers of footballers
    1 – goalie, 10 – striker (but not ‘10x better’!)
    can serve to separate and provide some info
    if you refer to #10, it’s likely to be about a striker, not a goalie
  blue, yellow, red, green etc
    no implicit order (though the underlying spectrum has one)
Ordinal Scales
The measurements can be ‘ordered’ (ranked)
  the order of finishers in a race (first, second, etc)
    but the time between each can vary dramatically
    equal gaps are not implied
  the Likert scale (1 to 5)
    agree strongly, agree, neutral, disagree, disagree strongly
    again, the ‘gaps’ between each are not equal
    agree – neutral doesn’t ‘equal’ neutral – disagree
Interval Scales
The ‘gaps’ (intervals) between units of measurement are equal
  the Centigrade scale
    the temp difference between 20 and 30 °C is the same as between 10 and 20 °C
    though they might not ‘feel’ that way!
  there is no absolute reference point
    0 °C is arbitrary
    water’s freezing point is used to define the baseline
Much more common in science
Ratio Scales
An interval scale that has an absolute reference point
  the Kelvin temperature scale
    0 K is –273.15 °C
    the reference point is absolute
    absolute zero (0 K) is, well, absolute!
  for our everyday lives, time is a ratio scale
    zero time is absolute
  like interval scales, ratio scales are common in science
These measurement scales are important as they determine the types of data-handling/statistics that can be performed
Summaries of Data
Scientists seldom take single measurements
  need repeated measurements to…
    minimise error
    permit extrapolation to the general case
  eg my eyes are blue
    32 out of 100 subjects studied had blue eyes
    32% of the general population have blue eyes
Data is plural (datum is singular)
Could just report all measurements…
  contains unadulterated ‘info’ about what you did
  but doesn’t carry a ‘message’ about the findings
  can’t see the wood for the trees
Summaries of Data
Here is a set of ordered data…
  17, 18, 18, 18, 19, 19, 20, 21, 21
Mode
  the most common value(s) in a list of data (18)
Median
  the central value in the ordered data (19)
Mean
  sum of values / sample size (171 / 9 = 19)
Range (or maybe Maximum and Minimum)
  highest minus lowest (21 – 17 = 4)
  starts to indicate variability, but biased by extremes
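The four summaries on this slide can be checked with a few lines of Python. This is only a sketch using the standard library's statistics module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean, median, multimode

# The ordered data set from the slide
data = [17, 18, 18, 18, 19, 19, 20, 21, 21]

modes = multimode(data)             # most common value(s); a list, as there may be ties
centre = median(data)               # central value of the ordered data
average = mean(data)                # sum of values / sample size
data_range = max(data) - min(data)  # highest minus lowest

print("mode(s):", modes)
print("median:", centre)
print("mean:", average)
print("range:", data_range)
```

multimode is used rather than mode because a data set can have more than one most common value, which is what the slide's "value(s)" allows for.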
Indicating Variability
This is an important aspect of measurement
  17, 18, 18, 18, 19, 19, 20, 21, 21
  18, 19, 19, 19, 19, 19, 19, 19, 20
  19, 19, 19, 19, 19, 19, 19, 19, 19
  all have the same mean (19)
Need a way to summarise data both in terms of ‘central tendency’ and ‘spread’
  mean and standard deviation
  median and quartiles
Measures of variation are covered in detail elsewhere
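To see how a measure of spread separates the three data sets, the sketch below reports the mean and standard deviation of each. It assumes the sample standard deviation (dividing by n – 1), which the slide does not specify; the labels "wide", "narrow" and "constant" are made up for illustration.

```python
from statistics import mean, stdev

# The three data sets from the slide; the dictionary keys are illustrative labels
datasets = {
    "wide": [17, 18, 18, 18, 19, 19, 20, 21, 21],
    "narrow": [18, 19, 19, 19, 19, 19, 19, 19, 20],
    "constant": [19, 19, 19, 19, 19, 19, 19, 19, 19],
}

# (mean, sample standard deviation) for each data set
summary = {name: (mean(values), stdev(values)) for name, values in datasets.items()}

for name, (m, s) in summary.items():
    print(f"{name:9s} mean = {m}, standard deviation = {s:.3f}")
```

All three means are 19, but the standard deviations differ, which is the information the mean alone throws away.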
Summary
Measurements are labels assigned to things to explain relationships
Four levels…
  Nominal – names; no inherent order
  Ordinal – ordered; ‘gaps’ not equal
  Interval – ordered; ‘gaps’ are equal
  Ratio – ordered; equal gaps; absolute ref point
Summaries of data needed to ease interpretation
  eg mode, mean, median, range
Need indicators of ‘spread’ as well as ‘centre’
  eg range, max, min, standard deviation
Practice & Communication of Science Probability
What is Probability?
We cannot know everything about everything
  we cannot measure everything
  our measurements are prone to error
So uncertainty is a central feature of science
  uncertainty in observations/measurements
  uncertainty in explanations
  uncertainty in predictions
Probability is a way of quantifying (un)certainty
  scale of 0 to 1 (or 0% to 100%)
Probability reflects random influences
  ‘randomness’ reflects our lack of knowledge
Randomness: Predictable Rules
Individual outcomes cannot be predicted, but repeated runs are very predictable
  eg individual coin-toss: H or T
  ‘infinite’ repeats: 50:50 H:T (if fair)
Modelling a system in terms of probabilities can be done through observation or from theory
  Throwing dice: theory
  Red vs Blue in sport: observation (Hill & Barton)
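The idea that single tosses are unpredictable while long runs settle near 50:50 can be illustrated by simulation. This is a rough sketch, assuming a fair coin modelled with Python's pseudo-random generator; the function name heads_fraction and the fixed seed are illustrative choices.

```python
import random

def heads_fraction(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the fraction that were heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Small runs wander; large runs settle close to 0.5
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```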
Frequencies and Probabilities
Red and Blue football teams
  in 140 matches, Red won 60 and drew 30
  relative frequency of a Red win = 60/140
  probability of Red winning (in the future) is also 60/140 = 0.429
For throwing a die
  relative frequency of getting a ‘3’ is 1/6
  probability is also 1/6 = 0.167
Combining Probabilities
For independent events, eg probability of throwing a five and then a two
  multiply the individual probabilities
  P(A and B) = P(A) x P(B)
  eg 1/6 x 1/6 = 1/36 = 0.028
For incompatible (mutually exclusive) events, eg probability of throwing a five or a two
  add the individual probabilities
  P(A or B) = P(A) + P(B)
  eg 1/6 + 1/6 = 1/3 = 0.333
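Both rules can be checked against a brute-force count of die outcomes. The sketch below assumes a fair six-sided die and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_face = Fraction(1, 6)  # probability of any one face of a fair die

# Multiplication rule: a five and then a two (independent throws)
p_five_then_two = p_face * p_face

# Addition rule: a five or a two on a single throw (cannot both happen)
p_five_or_two = p_face + p_face

# Cross-check both rules by counting equally likely outcomes
two_throws = list(product(faces, repeat=2))
counted_and = Fraction(sum(1 for t in two_throws if t == (5, 2)), len(two_throws))
counted_or = Fraction(sum(1 for f in faces if f in (5, 2)), len(faces))

print(p_five_then_two, float(p_five_then_two))
print(p_five_or_two, float(p_five_or_two))
```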
Probability and common sense
In playing the lottery, which choice of numbers is more likely to win?
  3, 5, 15, 27, 29, 44
  1, 2, 3, 4, 5, 6
I won the lottery last week; the chances of me winning the lottery this week are…
  less, the same, greater?
I won the lottery last week; the chances of me winning the lottery twice in a row are…
Probability and common sense
What are the chances of two people on a football field sharing the same birthday?
  1%, 11%, 21%, 31%, 41%, 51%
For 23 people, prob of not sharing b’day with previously considered people is…
  person 1: 365/365
  person 2: 364/365
  …
  person 23 (includes ref!): 343/365
Multiply them all together: 0.493
  1 – 0.493 = 0.507 = 51%
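The product on this slide is tedious by hand but short in code. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years; the function name p_shared_birthday is illustrative.

```python
def p_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday."""
    p_all_different = 1.0
    for k in range(n_people):
        # Person k+1 must avoid the k birthdays already taken
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

print(round(p_shared_birthday(23), 3))  # 22 players plus the referee
print(round(p_shared_birthday(22), 3))  # players only
```

With 23 people the probability just passes one half, whereas with 22 it is still a little below.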
Probability and common sense
Linda is thirty-one, single, outspoken and very bright
She studied political science at Uni; she was concerned with discrimination and social justice, and took part in CND demonstrations
Which of the following statements about Linda is more likely?
  Linda works as an estate agent
  Linda works as an estate agent and is active in the feminist movement
Probability is context-sensitive
In tossing a coin ten times, which sequence is most probable?
  H T T H H T H T H T
  H H H H H H H H H H
In coin-tossing, which sequence is more probable?
  A mix of heads and tails
  10 heads in a row: 1 in 1024
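The difference between the two questions shows up when all ten-toss sequences are counted. This sketch assumes a fair coin and reads "a mix" as any sequence containing at least one head and at least one tail; that reading is an assumption, not something the slide states.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of ten tosses of a fair coin
sequences = list(product("HT", repeat=10))
total = len(sequences)

# Any one named sequence (HTTHHTHTHT or HHHHHHHHHH) is a single entry in the list
p_specific = Fraction(1, total)

# Categories of sequence: all heads versus a mix of heads and tails
p_all_heads = Fraction(sum(1 for s in sequences if set(s) == {"H"}), total)
p_mixed = Fraction(sum(1 for s in sequences if set(s) == {"H", "T"}), total)

print(total, p_specific, p_all_heads, p_mixed)
```

Any specific sequence is as likely as any other, but the category "a mix" contains almost all of them.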
Probability is context-sensitive
Derren Brown can toss a coin heads 10x in a row
Incredible motor control over ‘random’ variables?
Probability can be counter-intuitive
Flip a coin three times to get HH or TH
Are the two outcomes equally probable?
  HHH  HH first
  HHT  HH first
  HTH  TH first
  HTT  (neither)
  TTT  (neither)
  TTH  TH first
  THT  TH first
  THH  TH first
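The tally of which pair turns up first can be reproduced by enumerating the eight sequences. The sketch below takes the two competing pairs to be HH and TH, as in the labels on the listed outcomes; the helper name first_pattern is illustrative.

```python
from collections import Counter
from itertools import product

def first_pattern(seq, patterns=("HH", "TH")):
    """Return whichever pattern appears first in seq, or None if neither appears."""
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in patterns:
            return pair
    return None

# All eight equally likely sequences of three tosses
outcomes = {"".join(s): first_pattern("".join(s)) for s in product("HT", repeat=3)}
counts = Counter(outcomes.values())

for seq, winner in outcomes.items():
    print(seq, winner)
print(counts)
```

HH needs the first two tosses to be heads, while TH can still turn up after an early tail, so it comes first twice as often.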
Probability can be counter-intuitive
Defendant’s DNA match was 1 in 1 billion
Lab’s false positive error rate (not disclosed) is 1%
What is the probability of the defendant being falsely convicted on that evidence?
  1 in 1 billion
  1 billion x 1% = 1 in 10 million
  1 billion x 99% = 1 in 990,000,000
  1 in 100.000000001
Probability can be counter-intuitive
The Monty Hall conundrum
  You are on a game show
  You have a choice of three doors
  Behind one is a car, behind the other two are goats
  You choose a door
  The host (who knows where the goats are) opens one to show you a goat
  Should you now change the door you have chosen?
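One way to test an intuition about the Monty Hall conundrum is to play it many times. This is a simulation sketch under the usual assumptions, which the slide only partly states: the host always opens a goat door that is not your pick, and chooses at random when two such doors are available. The names play and win_rate are illustrative.

```python
import random

def play(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=1):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stick: ", win_rate(False))
print("switch:", win_rate(True))
```

Under these assumptions switching wins about two times in three and sticking about one time in three.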
Probability can be counter-intuitive
A mother gives birth to twins (not identical)
What is the chance they will be of different sexes?
  25% 33% 50%
One is a girl
What is the chance that they will both be girls?
Conditional Probability
Previous questions: not conditional vs conditional…
  Not conditional: A mum gives birth to twins (not identical). What is the chance that they will both be girls?
  Conditional: A mum gives birth to twins (not identical). What is the chance that they will both be girls if one is a girl?
The ‘if’ clause makes all the difference
  It provides extra information that alters the odds
The influence of additional info on odds was developed by Thomas Bayes (b 1702)
  Bayesian odds
  Prior prob (1 in 4) and posterior prob (1 in 3)
Three variations on a theme…
A family has two children; what are the chances that both children are girls?
  1 in 4, 1 in 3, 1 in 2?
A family has two children; what are the chances that both children are girls if one is a girl?
A family has two children; what are the chances that both children are girls, if one is a girl named Florida?
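Three variations on a theme…
2 children; P that both children are girls?
  GB, BG, BB, GG: 1 in 4
2 children; P both girls if one is a girl?
  GB, BG, GG (BB is ruled out): 1 in 3
2 children; P both girls if 1 girl named Florida?
  B = boy, GF = girl named Florida, GNF = girl not named Florida
  BB, BGF, BGNF, GFB, GNFB, GFGNF, GNFGF, GNFGNF, GFGF
  only outcomes containing GF remain: BGF, GFB, GFGNF, GNFGF (GFGF is negligibly rare)
  2 of those 4 are two girls: 1 in 2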
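The three answers can be reproduced by weighting each family type by its probability. This sketch assumes boys and girls are equally likely, the two children are independent, and each girl is named Florida with some small probability f; the value 1/1000 used below is purely hypothetical.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Variations 1 and 2: four equally likely families
families = list(product("BG", repeat=2))
p_gg = Fraction(sum(1 for fam in families if fam == ("G", "G")), len(families))
with_girl = [fam for fam in families if "G" in fam]
p_gg_given_girl = Fraction(sum(1 for fam in with_girl if fam == ("G", "G")), len(with_girl))

def p_gg_given_florida(f):
    """P(both girls | at least one child is a girl named Florida), for name frequency f."""
    child = {"B": half, "GF": half * f, "GNF": half * (1 - f)}
    p_condition = Fraction(0)  # families containing a girl named Florida
    p_both = Fraction(0)       # ...that also consist of two girls
    for a, b in product(child, repeat=2):
        p = child[a] * child[b]
        if "GF" in (a, b):
            p_condition += p
            if a != "B" and b != "B":
                p_both += p
    return p_both / p_condition

p_florida = p_gg_given_florida(Fraction(1, 1000))  # hypothetical name frequency
print(p_gg, p_gg_given_girl, p_florida, float(p_florida))
```

The rarer the name, the closer the third answer gets to exactly 1 in 2.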
Conspiracy Theories and Probability
Are these equivalent?
  1) The P of a series of events happening if due to a huge conspiracy
  2) The P of a huge conspiracy existing if a series of events happened
P of 1) > P of 2)
  a ‘single’ explanation vs ‘many’ other explanations
  think 9/11, moon landings, paranoia, confabulation
Bayes’ theorem supports this non-correspondence
Conditional Probability and Testing
A test for disease ‘X’ comes back positive
And the false-positive rate is low at only 1 in 1000
Only a 0.1% chance of not having the disease?!
But. Is…
  the chance of not having the disease if I tested positive
…the same as…
  the chance of testing positive if I didn’t have the disease?
No. Think of the ‘sample space’
  ‘categories’ of people tested
The Test’s Sample Space
1) tested +ve and have ‘X’ (true positives)
2) tested +ve but don’t have ‘X’ (false positives)
3) tested –ve and don’t have ‘X’ (true negatives)
4) tested –ve and have ‘X’ (false negatives)
Reported false positive rate is 1 in 1000
Incidence rate: say 1 in 10,000 tested have the disease (and false neg effectively 0)
  incidence rate not usually mentioned/considered
For 10,000 tested, there will be 10 false positives and only 1 true positive
  so 10/11 chance of not having the disease!
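The same calculation can be written out with exact fractions. This sketch uses the slide's figures (false positive rate 1 in 1000, incidence 1 in 10,000, false negative rate taken as exactly 0); the variable names are illustrative.

```python
from fractions import Fraction

incidence = Fraction(1, 10_000)          # 1 in 10,000 tested have the disease
false_positive_rate = Fraction(1, 1000)  # P(test +ve | no disease)
false_negative_rate = Fraction(0)        # assumed exactly 0, as on the slide

# Share of everyone tested falling into each positive category
p_true_positive = incidence * (1 - false_negative_rate)
p_false_positive = (1 - incidence) * false_positive_rate

# P(no disease | test +ve): false positives as a share of all positives
p_healthy_given_positive = p_false_positive / (p_true_positive + p_false_positive)

print(p_healthy_given_positive, float(p_healthy_given_positive))
```

The exact figure is 9999/10999, about 0.909; the slide's 10/11 comes from rounding 9.999 false positives per 10,000 tested up to 10.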
Summary
Probability estimates the odds of future events based on theory or observation
  Probability cannot predict an individual event
  Probability can predict the pattern of events
  Probability, P, runs from 0 to 1 or 0% to 100%
Probability is often not ‘intuitive’, it fools us
Combining probabilities
  Independent events: P(A and B) = P(A) x P(B)
  Incompatible events: P(A or B) = P(A) + P(B)
Conditional probabilities (prob of A if B)
Bayesian probability
  Prior probability + extra info gives posterior probabilities
  prob of A if B often different to prob of B if A