
PART 1: Models, metrics and the demystification of statistical significance FRSS!!!!


1 PART 1: Models, metrics and the demystification of statistical significance David.Wastell@nottingham.ac.uk FRSS!!!!

2 Causal modelling (theory-based metrics)
Key terminology:
– Independent variables (IV) = causal factors (+ve or –ve)
– Dependent variables (DV) = effects/outcomes
– Moderating variables (MV) = modify cause-effect relations
– Treatment/intervention effect = impact of the IV on the DV
[Diagram relating IV, DV and MV]

3 Group exercise 1
Reducing alcohol-related violence is a priority area of social policy.
We need to understand its “epidemiology”, i.e. the factors that influence its prevalence.
– In your groups, produce a causal model for AVC, identifying the socio-demographic factors you think are key.

4 Evidence-based policy: crime control
Preston street drinking ban

Ambulance incidents (monthly)
              Target zone   County demand
Before ban        9.1           516.2
After ban         8.3           541.1
Change            -9%           +5%

Share of total serious violent crime committed within the target zone: reduced from 14.8% to 12.3%.
All effects statistically significant, BUT … any validity concerns or alternative explanations? Guess what…
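The "Change" row can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures on the slide; the helper name `percent_change` is an illustrative choice, not part of the workshop materials.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100


# Monthly ambulance incidents, as given on the slide.
target_zone = percent_change(9.1, 8.3)
county = percent_change(516.2, 541.1)

print(f"Target zone:   {target_zone:+.0f}%")
print(f"County demand: {county:+.0f}%")

# The crime figure is a share of a total, so the fall from 14.8% to 12.3%
# is 2.5 percentage points, which is a larger relative change.
share_points = 12.3 - 14.8
share_relative = percent_change(14.8, 12.3)
print(f"Share of serious violent crime: {share_points:+.1f} points "
      f"({share_relative:+.1f}% relative)")
```

Comparing the target zone against county demand over the same period is what gives the before/after figures some meaning; the calculation says nothing about the validity concerns the slide raises.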

5 Statistical testing & the null hypothesis (H0)
A street drinking ban is being implemented in an effort to reduce alcohol-related violence (AVC).
– A randomised controlled trial (RCT) is used to evaluate it: 6 towns are chosen, 3 randomly picked for the drinking ban
– Why randomisation?
After three months, levels of AVC have fallen in the three “treatment” sites, with no change in the “controls”.
– Has the intervention been effective? How strong is the evidence?
H0 = the intervention has not changed anything
– How might we explain the results if H0 is true?
– What is the probability of getting the observed evidence on this assumption?
Statistical inference:
– If the probability of getting results as extreme as those obtained, assuming H0 to be true, is less than some threshold value (typically 1 in 20), then reject H0 and conclude that the effects are genuine, i.e. unlikely to have occurred by chance
This is the principle of STATISTICAL SIGNIFICANCE
– NB. Not the same as substantive significance! With enough data, even a very small treatment effect can reach statistical significance
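One way to make the "probability on the assumption of H0" concrete is a randomisation (permutation) test. The sketch below uses invented figures: the town labels and the changes in AVC are hypothetical and do not come from the slides. If H0 is true, the ban did nothing, so each town's change would have been the same whichever three towns had been picked; we can therefore list every possible pick and count how many look at least as favourable as the one observed.

```python
from itertools import combinations

# HYPOTHETICAL change in monthly AVC incidents per town (after minus before).
# These numbers are invented for illustration only.
changes = {"A": -12, "B": -9, "C": -15, "D": 0, "E": 1, "F": -1}
treated = ("A", "B", "C")  # the three towns that received the ban


def mean_diff(treated_towns):
    """Mean change in treated towns minus mean change in control towns."""
    t = [changes[town] for town in treated_towns]
    c = [changes[town] for town in changes if town not in treated_towns]
    return sum(t) / len(t) - sum(c) / len(c)


observed = mean_diff(treated)

# Under H0 every way of picking 3 towns out of 6 was equally likely.
assignments = list(combinations(changes, 3))
as_extreme = sum(1 for a in assignments if mean_diff(a) <= observed)
p_value = as_extreme / len(assignments)

print(f"Possible assignments: {len(assignments)}")
print(f"Assignments at least as extreme as observed: {as_extreme}")
print(f"One-sided p-value: {p_value:.2f}")
```

With only six towns there are just 20 possible assignments, so the smallest one-sided p-value this design can produce is 1 in 20. Even the cleanest possible result therefore sits exactly at the conventional threshold rather than below it, which is a useful prompt for the "how strong is the evidence?" question.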

6 PART 2: Exploratory data analysis
The term EDA was coined by John Tukey (1977), who likened EDA to “detective work”.
In EDA, the role of the researcher is to explore the data in as many ways as possible until a plausible "story" of the data emerges.
– A detective does not collect just any information. Instead he collects evidence and clues related to the central question of the case.
Some tools of the trade, using Excel:
– Histograms (stem-and-leaf diagrams)
– Scatter-plots
– Correlation coefficients

7 Exercise 1: The Humble Histogram
1) Open Workshop Spreadsheet
2) Select Violent Crime Incidents worksheet
3) From Data Analysis on the Tools menu select Histogram

8 Exercise 2: the not-so-scatty scatterplot!
Select the “Ward Profile” worksheet
– Inspect it carefully!
– What relationships should we look at in terms of the causal modelling exercise?
Draw a scatterplot relating crime incidence to the no. of liquor outlets:
– Highlight “No pubs” and the crimes column
– Click Chart on the Insert menu
– Select XY (scatter) and press Next
– Hey Presto!

9 Exercise 3: correlation
Correlation coefficients (r) measure the strength of the linear relationship between two variables
– r = 1: perfect positive correlation; r = 0: no linear relation
– What would r = -1 mean?
What is the correlation between the no. of pubs per ward and the rate of AVC?
– Select Correlation from the Data Analysis tools
– Identify the variables of interest
– Click OK and the correlation matrix appears in a new sheet
How can the model be improved?
– Create a new variable (pub density) and repeat the correlational analysis
– Is the correlation higher?
NB: correlation does not mean causality!
[Screenshot callouts: “Highlight the appropriate columns”; “Click here when columns identified”]
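For anyone who wants to see what the Correlation tool is calculating, here is a minimal Python sketch of Pearson's r. The ward figures are hypothetical, and "pub density" is assumed here to mean pubs per square kilometre; the workshop spreadsheet may define it differently (for example, pubs per head of population).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL ward data, invented for illustration only.
pubs = [2, 5, 1, 12, 7, 3]                  # no. of pubs per ward
area_km2 = [4.0, 2.5, 6.0, 1.5, 3.0, 5.0]   # ward area (assumed measure)
avc_rate = [10, 22, 6, 60, 30, 12]          # AVC incidents per ward

# Derived variable: pubs per square kilometre (an assumed definition).
pub_density = [p / a for p, a in zip(pubs, area_km2)]

r_count = pearson_r(pubs, avc_rate)
r_density = pearson_r(pub_density, avc_rate)

print(f"r (pub count vs AVC):   {r_count:.2f}")
print(f"r (pub density vs AVC): {r_density:.2f}")
```

As a quick check on the "what would r = -1 mean?" question, `pearson_r([1, 2, 3], [6, 4, 2])` gives -1: a perfect linear relationship in which one variable falls as the other rises.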

10 Exercise 4: More truffling…
Draw a histogram showing the distribution of crime:
– Use automatic “bin widths”
– Choose a more meaningful set
– Is the distribution “normal”?
Use basic Excel functions (sorting) to explore relationships between crime rates and contextual factors:
– Sort the table by crime rate, highest crime first
– What stands out?
– What percentage of crimes occur in the top ten wards?
If time allows, investigate other possible correlations concerning crime, and also between other variables…
– How strong is the link with deprivation?
– Is this what you’d expect?
– Would the relationship be stronger with other sorts of crime?
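The same two explorations (binning the counts, then sorting to see how concentrated crime is) can be sketched outside Excel. The incident counts below are hypothetical, and the function names `histogram` and `top_share` are illustrative only.

```python
from collections import Counter

# HYPOTHETICAL incident counts per ward, invented for illustration only.
incidents = [3, 5, 8, 2, 40, 7, 4, 95, 6, 1, 12, 9]


def histogram(values, bin_width):
    """Count values per bin; each bin is keyed by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def top_share(values, n):
    """Percentage of the total accounted for by the n largest values."""
    ranked = sorted(values, reverse=True)
    return sum(ranked[:n]) / sum(values) * 100


# A text histogram with a bin width of 10 incidents.
bins = histogram(incidents, 10)
for start, count in bins.items():
    print(f"{start:3d}-{start + 9:3d} | {'*' * count}")

# Share of all incidents found in the three highest-crime wards.
print(f"Top 3 wards: {top_share(incidents, 3):.0f}% of incidents")
```

Changing `bin_width` plays the same role as choosing a more meaningful set of bins in Excel, and a long right-hand tail in the output is a hint that the distribution is not "normal".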

11 Any other business
Data-mining example: Govmetric
Statistics packages
– Mostly rather expensive, e.g. SPSS
Open source:
– R is brilliant, though hard work to learn
– R Commander provides a GUI for R

