Why smart data is better than big data (Queen Mary University of London)

Presentation transcript:

Bayesian Networks: why smart data is better than big data. Bayesian Seminar, 16 October 2015. Norman Fenton, Queen Mary University of London and Agena Ltd. It is a pleasure to have the opportunity to talk in this series. I tried to get into the first one: I arrived a couple of minutes after 2.00 to find the room so packed that there was not even standing room left, and I physically could not get in. That suggests to me there is a huge appetite to learn more about Bayesian methods. What I am going to talk about today is influenced by years of research and practical experience, largely in the area of risk assessment and decision analysis. As a director of Agena I declare an interest up front: Agena is in the business of applying Bayesian methods to risk assessment and has an established proprietary BN tool (for which a completely free version is available).

Outline: From Bayes to Bayesian networks; Why pure machine learning is insufficient; Applications; Way forward. I am going to introduce BNs and explain why, thanks to relatively recent algorithmic breakthroughs, they have become an increasingly popular technique for risk assessment and decision analysis. I will explain why Bayesian networks 'learnt' purely from data, even when 'big data' is available, generally do not work well. I will provide an overview of successful applications (including transport safety, medical, law/forensics, operational risk, and football prediction); what is common to all of these applications is that the Bayesian network models are built using a combination of expert judgment and (often very limited) data. I will finally give an overview of the challenges ahead and conclusions.

From Bayes to Bayesian networks

Introducing Bayes. H (Person has disease?): we have a hypothesis H. E (Positive test?): we get some evidence E. The prior is 1 in 1,000; the test is 100% accurate for those with the disease and 95% accurate for those without. Although I am assuming most people here know what Bayes' theorem is, I want to introduce it using the graphical formalism of BNs. We start with some hypothesis H (disease); for simplicity assume it is Boolean, true or false. We get some evidence E about H (e.g. the result of a diagnostic test); again for simplicity assume this outcome is true or false. We have a prior probability for H, say 1/1000, so here is the prior probability table for H. We also know the probability of the evidence given the hypothesis: this is the test accuracy. Suppose, for example, the test is always positive if a person has the disease, so P(E|H) = 1, and P(E|not H) = 0.05. Here is its probability table, which you can see is conditioned on the state of H. This, incidentally, is a complete specification of a BN. But what we want to know is: what is the probability a person has the disease if they test positive? I am sure most people here know the answer, but it is worth pointing out that when this problem was presented to staff and students at Harvard medical school most said the answer was 95%.

Bayes' theorem. We have a prior P(H) = 0.001; we know the likelihood values P(E|H); but what we want is the posterior P(H|E). In more familiar terms to most people here, I suspect: we have a prior for H; we know P(E|H), the likelihood of E; but what we really want to know is the posterior probability of the hypothesis given the evidence. Bayes' theorem gives us the necessary formula: P(H|E) = P(E|H)P(H) / P(E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|not H)P(not H)] = (1 × 0.001) / (1 × 0.001 + 0.05 × 0.999) = 0.001 / 0.05095 ≈ 0.0196, i.e. about 2%, which is of course very different from the 95% assumed by most doctors. This suggests that Bayes is counterintuitive to most lay people and domain experts. But worse, showing them the formula and calculations is a waste of time for most people: it neither makes them understand it nor convinces them the answer is correct. It might be easy for statisticians and mathematically literate people in this simple case, but for most people, and in my personal experience this includes highly intelligent barristers, judges and surgeons, it is completely hopeless, and it is no good us arguing that it is not.
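For anyone who wants to check the arithmetic, here is the same two-node calculation in plain Python, using exactly the numbers above (nothing here is specific to any BN tool):

```python
# Two-node disease/test example from the slide: prior 1 in 1,000, a test that is
# always positive for those with the disease and positive 5% of the time otherwise.
p_h = 0.001             # prior P(H): person has the disease
p_e_given_h = 1.0       # P(E | H): test positive given disease
p_e_given_not_h = 0.05  # P(E | not H): false positive rate

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # total probability of a positive test
posterior = p_e_given_h * p_h / p_e                     # Bayes' theorem

print(f"P(disease | positive test) = {posterior:.4f}")  # about 0.0196, i.e. roughly 2%
```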

Imagine 1,000 people. To explain it you have to use diagrammatic methods like this.

One has the disease

But about 5% of the remaining 999 people without the disease test positive. That is about 50 people

So about 1 out of 50 who test positive actually have the disease: that's about 2%, which is very different from the 95% assumed by most medics. (The same reasoning exposes the classic prosecution error in legal cases: if roughly 100 people have the matching blood type and only one is guilty, then, if there is no other evidence against the defendant, there is a 99% chance that a person with the matching blood type is innocent, which is very different from the prosecution claim.)

A more realistic scenario. Cause 1, Cause 2, Disease X, Disease Y, Disease Z, Test A, Test B, Symptom 1, Symptom 2: this is a Bayesian network. The problem is that neither the formulaic nor the diagrammatic approach scales up. In any realistic scenario, our problem will involve more than a single unknown hypothesis and a single piece of evidence. There may be more than one disease that leads to a positive result of Test A. So we introduce a Test B that is more specific to Disease X, but which also has a separate dependency on Test A. There might also be observable symptoms of Disease X, some of which may also be more or less likely with the other diseases. Then we might know of some common cause of the diseases, and another which is influenced or caused by the first. This is a Bayesian network. The LACK of arcs represents conditional independence, so the BN is a simplified version of the full joint probability distribution over all the variables. That is good for two reasons: 1) having a visual representation improves understanding and communication, and 2) it makes the Bayesian inference simpler. Unfortunately, despite this, the necessary Bayesian propagation calculations quickly become extremely complex. Not only is it almost impossible to do the calculations manually even for small BNs, but the problem of producing an efficient exact algorithm is known to be NP-hard (i.e. intractable) in general.
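To make the kind of structure being described concrete, here is a minimal sketch of such a diagnosis graph as an edge list in the open-source pgmpy library. The node names mirror the slide, but the exact set of arcs is a plausible reading of the description above rather than a published model, and every node would still need its own conditional probability table:

```python
# Structure only: each edge points from cause to effect. The arcs below are a
# plausible reading of the scenario described above, not a published model, and a
# full model would also need a conditional probability table for every node.
from pgmpy.models import BayesianNetwork

diagnosis_net = BayesianNetwork([
    ('Cause 1', 'Cause 2'),
    ('Cause 1', 'Disease X'), ('Cause 2', 'Disease Y'), ('Cause 2', 'Disease Z'),
    ('Disease X', 'Test A'), ('Disease Y', 'Test A'), ('Disease Z', 'Test A'),
    ('Disease X', 'Test B'), ('Test A', 'Test B'),
    ('Disease X', 'Symptom 1'), ('Disease X', 'Symptom 2'), ('Disease Y', 'Symptom 2'),
])

print(sorted(diagnosis_net.nodes()))
```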

The usual big mistake: Combined Hypothesis, Combined Evidence/data. Now, although, as I will show, that problem has to a large part been resolved, there are many researchers, including even Bayesian statisticians, who are unaware of these developments, and this is one reason why BNs have so far been relatively under-exploited. The ramification is that it is very common for researchers to try to solve Bayesian inference problems by collapsing the 'real model' into effectively a two-node model. The results of doing that can be misleading or simply wrong.

The Barry George case. This flawed simplification is especially common when dealing with statistical forensic evidence in legal cases. Barry George was convicted in 2001 of the murder of the TV presenter Jill Dando. A critical part of the prosecution case was the discovery in George's coat pocket of a tiny particle of gunpowder residue that matched that of the gun which killed Dando. In 2007 George's lawyers used what was essentially a Bayesian argument to argue successfully that this evidence was 'neutral' and therefore should not have been presented at trial. A retrial was ordered with the gunpowder evidence inadmissible, and George was found not guilty.

The Barry George case: hypothesis "George fired gun", evidence. But the claim that the evidence was neutral was simply an artefact of collapsing a complex BN model into a two-node model. By collapsing it into a two-node model we can use the likelihood ratio (LR) of the evidence to determine the extent to which the evidence favours the defence hypothesis or the prosecution hypothesis; we do not even have to assume a prior probability for the hypothesis. But by transforming the model, crucial information is lost and incorrect assumptions are made. In this case it was clear that while the evidence was neutral on H in the simple model, it was NOT neutral on H in the full model. I was not involved in the Barry George case but have published work about it. I have, however, been involved as an expert in several cases involving forensic and statistical evidence, and this type of mistake is common. Only by using BNs can flaws in the forensic scientists' claims be exposed. This is especially worrying for DNA evidence. Continued avoidance of BNs is silly because a solution has been available for 30 years now.
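For readers unfamiliar with the likelihood-ratio form of Bayes being referred to here, the relationship is the standard odds form of Bayes' theorem (textbook material, not anything specific to the Barry George model):

```latex
% Odds form of Bayes' theorem: the likelihood ratio multiplies the prior odds.
% H_p = prosecution hypothesis, H_d = defence hypothesis, E = the evidence.
\frac{P(H_p \mid E)}{P(H_d \mid E)}
  = \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{likelihood ratio (LR)}}
    \times \frac{P(H_p)}{P(H_d)}
```

An LR of 1 leaves the odds between the two stated hypotheses unchanged. The point above is that an LR of 1 in the collapsed two-node model does not make the evidence neutral with respect to the hypotheses of the full model.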

Late 1980s breakthrough: Pearl; Lauritzen and Spiegelhalter. The real breakthrough came in the late 1980s when the AI researchers Pearl and Lauritzen & Spiegelhalter discovered fast exact Bayesian algorithms that work not for all BNs (that is impossible) but for a very large class of practical BNs. Since then, increasingly sophisticated and widely available tools that implement these algorithms have become available, meaning that nobody should ever do Bayes' theorem calculations manually or write their own programs to do complex inference.

A Classic BN. As an illustration of a classic BN model in action I can show you the famous 'Asia' model. The idea is that there is a chest clinic where patients arrive with different symptoms and we have to diagnose what is wrong with them. For simplicity all the nodes here are Boolean.

Marginals. Before entering evidence in the model, the algorithm computes the marginal probabilities based on the user-defined priors. So, for example, the marginal for the dyspnoea symptom (shortness of breath) is calculated from the user-defined conditional probability table of that node. The 50-50 for smoker simply means we provided a prior suggesting that 50% of the people who have come to the clinic in the past are smokers. Only 1% had a recent visit to Asia. The marginals tell us that 45% of previous patients had bronchitis; very few had TB or cancer.

Dyspnoea observed. As we enter evidence into the model, all the uncertain nodes get updated probability distributions. So if the patient has shortness of breath, the effect is that the chance of bronchitis increases massively, to 83%. Although the other two diseases also increase, bronchitis is now much more likely than not to be the problem.
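For concreteness, the sketch below rebuilds the standard published Asia network in the open-source pgmpy library (the talk itself uses AgenaRisk) and reproduces the two steps just described: the prior marginals and the update when dyspnoea is observed. The probability tables are the commonly quoted textbook (Lauritzen & Spiegelhalter, 1988) values, which may differ in detail from the model shown on the slides:

```python
# A hand-built sketch of the classic "Asia" chest-clinic network using pgmpy.
# Structure is the standard published one; the numbers are the usual textbook
# tables, used here purely for illustration.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([
    ('Asia', 'TB'), ('Smoker', 'Cancer'), ('Smoker', 'Bronchitis'),
    ('TB', 'TBorCancer'), ('Cancer', 'TBorCancer'),
    ('TBorCancer', 'XRay'), ('TBorCancer', 'Dyspnoea'), ('Bronchitis', 'Dyspnoea'),
])

# Root priors: 1% of past patients visited Asia, 50% are smokers. Row 0 = no, row 1 = yes.
asia = TabularCPD('Asia', 2, [[0.99], [0.01]])
smoker = TabularCPD('Smoker', 2, [[0.5], [0.5]])

# Single-parent tables; columns are parent = no, parent = yes.
tb = TabularCPD('TB', 2, [[0.99, 0.95], [0.01, 0.05]], evidence=['Asia'], evidence_card=[2])
cancer = TabularCPD('Cancer', 2, [[0.99, 0.90], [0.01, 0.10]], evidence=['Smoker'], evidence_card=[2])
bronchitis = TabularCPD('Bronchitis', 2, [[0.70, 0.40], [0.30, 0.60]], evidence=['Smoker'], evidence_card=[2])
xray = TabularCPD('XRay', 2, [[0.95, 0.02], [0.05, 0.98]], evidence=['TBorCancer'], evidence_card=[2])

# Two-parent tables; columns enumerate parent states with the LAST parent varying fastest:
# (no, no), (no, yes), (yes, no), (yes, yes).
tb_or_cancer = TabularCPD('TBorCancer', 2,
                          [[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 1.0, 1.0]],          # deterministic OR of TB and Cancer
                          evidence=['TB', 'Cancer'], evidence_card=[2, 2])
dyspnoea = TabularCPD('Dyspnoea', 2,
                      [[0.9, 0.2, 0.3, 0.1],
                       [0.1, 0.8, 0.7, 0.9]],
                      evidence=['TBorCancer', 'Bronchitis'], evidence_card=[2, 2])

model.add_cpds(asia, smoker, tb, cancer, bronchitis, xray, tb_or_cancer, dyspnoea)
assert model.check_model()

infer = VariableElimination(model)
print(infer.query(['Bronchitis']))                            # prior marginal: about 45% yes
print(infer.query(['Bronchitis'], evidence={'Dyspnoea': 1}))  # roughly 83% yes once dyspnoea is observed
```

Further observations (non-smoker, positive X-ray, recent visit to Asia) can be added to the evidence dictionary in the same way to reproduce the rest of the walkthrough.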

Also non-smoker. If the person is NOT a smoker, the probability of bronchitis drops a bit but it is still overwhelmingly the most likely.

Positive X-ray. So we send the patient for an X-ray and it comes back positive (a bad thing). Although bronchitis is still the most likely, TB and cancer are both up to about 25%.

…but recent visit to Asia. We then find out the patient had a recent visit to Asia and everything changes: TB is now easily the most likely disease. So, the power of BNs: explicitly model causal factors; reason from effect to cause and vice versa; 'explaining away'; overturn previous beliefs; make predictions with incomplete data; combine diverse types of evidence; visible, auditable reasoning. But first-generation tools have significant limitations…

How to develop complex models. Can we really LEARN this kind of model from data? The most obvious limitation of BN tools is that, while they are able to do the calculations in a BN model, they provide minimal support for actually building the BN model. This is an actual Bayesian network model that colleagues in my research group built for risk assessment and risk management of offending behaviour in released prisoners with a serious background of violent behaviour. How do we build a model like this? Building a BN requires us first to build the graph structure and then to define the probability tables for each node. To see how difficult this can be, look at a node with 5 states having 2 parents, each with 5 states. There are 5 times 5 parent state combinations, and each of these has to be defined for each of the 5 states: that is 125 table entries. Now imagine a node with 5 parents. Many people who use BNs assume that the only sensible way to build them is to learn both the structure and the tables from data, but the data requirements for this are huge. Even when vast amounts of data are available, structure learning is largely a waste of time. For table learning it can be fine, but many of the problems we deal with simply do not have the data, and we have to rely at least in part on expert judgment. That has been the focus of our research and applications for several years.

How to develop complex models: idioms. The same Bayesian network model for risk assessment and risk management of offending behaviour in released prisoners with a serious background of violent behaviour, viewed in terms of idioms: the definitional idiom, the cause-consequence idiom, the induction idiom and the measurement idiom.

How to develop complex models: Bayesian net objects. The same Bayesian network model for risk assessment and risk management of offending behaviour in released prisoners with a serious background of violent behaviour, structured as Bayesian net objects.

How to develop complex models: ranked nodes. When it comes to building node probability tables, we have developed and implemented the notion of ranked nodes, which makes it very easy to define large tables for an important class of variables.
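The published ranked-node idea is to generate such a table from a doubly truncated Normal whose mean is a weighted average of the parents' values mapped onto a 0-1 scale. The sketch below illustrates that idea for a 5-state child with two 5-state parents; the weights and variance are invented illustrative numbers, not values from any model mentioned in the talk:

```python
# A rough sketch of the ranked-node idea: instead of eliciting all 125 entries of a
# 5-state child with two 5-state parents, generate the table from a truncated Normal
# whose mean is a weighted average of the parents' values on a 0-1 scale.
# The weights and variance below are invented illustrative numbers.
import itertools
import numpy as np
from scipy.stats import norm

STATES = 5
midpoints = (np.arange(STATES) + 0.5) / STATES   # ordinal states mapped to points in [0, 1]
edges = np.arange(STATES + 1) / STATES           # interval boundaries for the child's states
weights = np.array([0.7, 0.3])                   # parent 1 matters more than parent 2
sigma = 0.15                                     # smaller sigma = stronger parent-child relationship

def ranked_npt():
    """Return a 5 x 25 node probability table, one column per parent state combination."""
    columns = []
    for p1, p2 in itertools.product(range(STATES), repeat=2):
        mean = weights @ midpoints[[p1, p2]]      # weighted average of the parents
        cdf = norm.cdf(edges, loc=mean, scale=sigma)
        probs = np.diff(cdf)
        probs /= probs.sum()                      # truncate the Normal to [0, 1]
        columns.append(probs)
    return np.array(columns).T

npt = ranked_npt()
print(npt.shape)   # (5, 25): all 125 entries generated from a handful of elicited numbers
```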

Static discretisation: marginals. But the most important development is the work on numeric variables, pioneered by my colleague Martin Neil, that deals with a critical limitation of the first-generation BN tools: their inability to handle continuous variables properly and accurately. Because the algorithms only apply to discrete nodes, any continuous variables have to be manually discretised. This is not only incredibly time consuming but also very inaccurate, as there is generally no way of knowing in advance which ranges require the finest discretisations. One of the most important applications we worked on was software reliability and defect prediction, which involved model fragments like this. With standard static discretisation this is the kind of result you get: note how, for example, we cannot differentiate between an observation of 2,000 KLOC and one of 20,000 KLOC. In 2007 my colleague Martin Neil developed a dynamic discretisation algorithm, implemented in AgenaRisk, which largely resolves this critical problem.

Dynamic discretisation: marginals. With dynamic discretisation there is no need to do any manual discretisation: the algorithm works on the whole range and dynamically discretises as necessary, based on where most of the probability mass lies. This is the same model using dynamic discretisation (which, incidentally, can be built in a couple of minutes).

Static discretisation with observations. Now compare what happens when you enter observations. Here is the result with static discretisation when KLOC = 50 and p = 0.2.

Dynamic discretisation with observations. Compare this with the far more accurate results obtained with dynamic discretisation. I should point out that AgenaRisk is the only BN tool that has implemented dynamic discretisation.

Why pure machine learning is insufficient What I will now try to explain is why good BN models inevitably require expert judgment to build and cannot be learnt from data alone – no matter how much data you have.

A typical data-driven study. The dataset has columns for Age, Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation and Outcome (death y/n), with one row per patient (sample values from the slide: 17, 25, A, N, L, Y; 39, 20, B, M; 23, 65; 21, 80, C, H; 68, 22, 30; …). In a typical data-driven approach we have observations from a large number of patients; the example here is taken from a study attempting to build a model to predict at-risk patients arriving in A&E with head injuries. We have a set of variables representing observable factors about the patient and a record of the outcome. The idea is that we want to use the data to learn a model that helps identify the patients most at risk of death.

BN model learnt purely from data (nodes: Age, Brain scan result, Injury type, Outcome, Delay in arrival, Arterial pressure, Pupil dilation). What you tend to end up with (and this is based on a published study) is a meaningless, illogical structure and poor predictive accuracy. A BN used in this way is no better than any other pure machine-learning technique.
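To see what "learning purely from data" involves in practice, here is a rough sketch using score-based structure search in the open-source pgmpy library. The random stand-in data below replaces the real head-injury dataset (which is not reproduced here), so the learnt arcs will be arbitrary, which is rather the point:

```python
# Score-based structure learning over whatever columns happen to be in the dataset.
# With no causal knowledge supplied, the search will happily return arcs that make
# no clinical sense. The DataFrame below is random stand-in data, not the real study.
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch

rng = np.random.default_rng(0)
columns = ['Age', 'Delay', 'InjuryType', 'BrainScan',
           'ArterialPressure', 'PupilDilation', 'Outcome']
df = pd.DataFrame(rng.integers(0, 3, size=(500, len(columns))), columns=columns)

search = HillClimbSearch(df)
learnt_dag = search.estimate()         # default score; an explicit scoring method can be passed
print(sorted(learnt_dag.edges()))      # prints whatever arcs the search happened to find
```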

Regression model learnt purely from data (variables: Delay in arrival, Brain scan result, Arterial pressure, Pupil dilation, Age, Injury type; dependent variable: Outcome). Of course the classic statistical approach is to build a regression model. This is actually a special case of expert contribution (because there is a prior assumption about the structure): all variables except the outcome are treated as independent risk factors affecting the dependent outcome variable. It often produces counterintuitive results, like a predicted OK outcome for the 'worst' combination of risk factors, and the classic 70% maximum classification accuracy.

Expert causal BN with hidden explanatory and intervention variables (Brain scan result, Arterial pressure, Pupil dilation, Delay in arrival, Injury type, Seriousness of injury, Age, Ability to recover, Treatment, Outcome). What an expert can provide is the following causal and explanatory information: delay in arrival and injury type determine the seriousness of the injury; arterial pressure, pupil dilation and the brain scan result are symptoms of the seriousness of the injury; ability to recover is influenced by seriousness and age; and, most crucially, the outcome is influenced not just by the ability to recover but by whether or not the patient receives treatment. What the learnt models were missing were crucial variables like seriousness of injury and treatment. Especially at-risk patients are of course more likely to get urgent treatment to avoid the worst outcomes; hence the anomalies and inaccuracies of the data-learnt and regression models. By relying on the data available rather than the data that is necessary, I continue to see very poor BN models learnt from data. Such models in fact perform no better than any of the other multitude of ML models, ranging from regression models through to neural networks.

Danger of pure data-driven decision making: example of a bank database on loans. The database has columns for Customer, Age, Marital status, Employment status, Home owner, Salary, Loan, … and Defaulted, with one row per customer (the first row, for example, reads: customer 1, age 37, marital status M, employed, home owner Y, salary 50000, loan 10000, defaulted N; customer numbers run past 100,000). OK, so we might need expert judgment when we have missing data, but with good experimental design and lots of good-quality data we can surely remove the dependency on experts… Because too many people default on loans, the bank wants to use machine-learning techniques on this database to help decide whether or not to offer credit to new applicants. In other words, they expect to 'learn' when to refuse loans on the basis that the customer profile is too 'risky'. These are the problem customers. The fundamental problem with such an approach is that it can learn nothing about those customers who were refused credit precisely because the bank decided they were likely to default: any causal knowledge about such (potential) customers is missing from the data. Suppose, for example, that the bank normally refuses credit to people under 20, unless their parents are existing high-income customers known to a bank manager. Such special cases (like customers 9, 15 and 100003 above) show up in the database and they never default. Any pure data-driven learning algorithm will 'learn' that unemployed people under 20 never default, the exact opposite of reality in almost all cases. Pure machine learning will therefore recommend giving credit to the people known to be most likely to default.
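A small simulation makes the selection-bias mechanism concrete. All the numbers below (the default rates, the under-20 refusal rule, the proportion of vouched-for exceptions) are invented purely to illustrate the point; nothing comes from a real bank database:

```python
# Simulate a bank that refuses loans to under-20s unless a manager vouches for them.
# The recorded data then contains only the vouched-for under-20s, who rarely default,
# so a learner using that data concludes under-20s are a good risk.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
under_20 = rng.random(n) < 0.10                       # 10% of applicants are under 20
true_default_prob = np.where(under_20, 0.40, 0.05)    # under-20s are actually far riskier

vouched_for = rng.random(n) < 0.02                    # rare exceptions known to a manager
granted = ~under_20 | vouched_for                     # bank policy: refuse under-20s otherwise
# Assume the vouched-for under-20s really are safe customers.
default_prob = np.where(under_20 & vouched_for, 0.01, true_default_prob)
defaulted = rng.random(n) < default_prob

recorded = granted                                    # only granted loans appear in the database
obs_rate = defaulted[recorded & under_20].mean()
true_rate = true_default_prob[under_20].mean()
print(f"default rate of under-20s in the database : {obs_rate:.1%}")   # about 1%
print(f"true default rate of under-20s overall    : {true_rate:.1%}")  # about 40%
```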

Other examples: "Massive databases cannot learn even tiny models"; "The massive shadow cast by Simpson's paradox". See: probabilityandlaw.blogspot.co.uk

Applications

Legal arguments and forensics. As mentioned earlier, we have developed many BNs to capture complex legal evidence, such as forensic evidence, and especially to expose the many hidden and incorrect assumptions made by both forensic scientists and lawyers. This is from a real (ongoing) case that has exposed fundamental flaws in the way DNA evidence was interpreted in a rape case. The models use basic statistical assumptions about DNA together with expert judgment.

Football prediction overview. We have used BNs extensively in different areas of football analysis, including predicting Premiership results. The models use historical data and expert judgment; this is primarily the work of Anthony Constantinou. This is the high-level view of such a model.

Parameter learning from past data. One component learns parameters from previous seasons' data, but with current adjustments from experts.
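As a toy illustration of how a component can combine an expert prior with limited historical data, here is a generic conjugate (Beta-Binomial) update in Python. This is not the actual football model; the "expert" numbers and the match record are invented:

```python
# A team's probability of winning a home match, encoded as a Beta prior elicited from
# an expert and then updated with last season's record. All numbers are invented.
from scipy.stats import beta

expert_mean, expert_strength = 0.55, 20      # expert says "about 55%", worth roughly 20 matches of data
a0 = expert_mean * expert_strength           # Beta prior parameters implied by that judgment
b0 = (1 - expert_mean) * expert_strength

wins, matches = 9, 19                        # last season's home record (invented)
a, b = a0 + wins, b0 + (matches - wins)      # conjugate Beta-Binomial update

posterior = beta(a, b)
print(f"posterior mean win probability: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

With little data the expert prior dominates; with lots of data the data dominates, which is exactly the behaviour wanted when only limited historical data is available.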

Game specific information

Taking account of fatigue

Incorporating recent match data

Final prediction

Final prediction www.pi-football.com Constantinou, A., N. E. Fenton and M. Neil (2013): "Profiting from an Inefficient Association Football Gambling Market: Prediction, Risk and Uncertainty Using Bayesian Networks". Knowledge-Based Systems. Vol 50, 60-86

Trauma care case study: QM RIM Group, The Royal London Hospital, US Army Institute of Surgical Research. Our case study was to provide decision support for the treatment of lower-extremity injuries. The surgeons provided the clinical knowledge and data for the BN models developed by William and Barbaros. There were actually two major models developed: one concerning patient physiology in trauma care, focused on coagulopathy risk, and the other, which also involved the US Army Institute of Surgical Research, concerned with predicting limb viability.

Improving on the MESS Score method. The motivation for the work was to improve on the state of the art with respect to decision making for amputations. The most prominent existing model was a scoring system based on data about whether amputations were made, so that model essentially helps to predict whether an amputation was done rather than whether or not it should be done. It failed to incorporate causal features of the process and failed to take account of the patient's physiological state.

Life Saving: Prediction of Physiological Disorders. Treatment of lower extremities involves multiple decision-making stages, and the priorities in these stages change as the treatment progresses. The first BN model we developed is aimed at the life-saving stage of lower-extremity treatment. Surgeons follow a 'life over limb' strategy in the treatment of lower extremities: if the patient's life is in danger, they postpone the definitive reconstruction operations until the patient's physiology stabilises and the risk of death decreases. An important physiological disorder in this phase is acute traumatic coagulopathy (ATC). We built a BN model that accurately predicts ATC using the observations available in the first 10 minutes of treatment. Our model was validated on three different datasets from three different hospitals in two countries, and it had good results in all validations (external and temporal).

Limb Saving: Prediction of Limb Viability. Our second model aims to predict the viability of lower extremities after salvage is attempted. The model uses injury and treatment information to predict a non-viable lower extremity and a failed salvage attempt. It was based on a large systematic review and meta-analysis of the literature, and a large dataset from the US Army Institute of Surgical Research containing information about the lower-extremity injuries of injured US military personnel. The model's results were accurate: it outperformed a well-known scoring model and multiple data-driven algorithms. In summary, both of the models built in our case studies made significant contributions to the clinical literature. They provide accurate predictions in two important clinical areas where previous models have failed, and both analyse the data based on clinical knowledge and evidence from the literature. DATA + KNOWLEDGE.

www.traumamodels.com You can actually run the models online at this website

This interface hides the underlying model complexity and allows users to enter basic patient information and get updated risk probabilities in real-time.

Operational Risk

Way forward

Big Data … or Smart Data? (machine learning, causal models, knowledge). There are many who are unaware of the Bayesian developments and who feel that the real solution to the problems I have spoken about will come with the advent of big data and increasingly powerful pure machine-learning algorithms. I feel very strongly that much of this big-data drive is an unnecessary waste of effort. Big data churned through pure machine learning more often than not delivers rubbish. It is the combination of knowledge and smart data which generates causal models that make sense, and the Bayesian approach is the most effective method for this smart-data approach.

Challenges: building good models with minimal data; tackling resistance to subjective priors; making BN models easier to use and understand. BAYES-KNOWLEDGE: bayes-knowledge.com

Conclusions. Bayesian calculations can and should be done with BN tools. Some of the most serious limitations of BN tools and algorithms have been resolved. BNs have been used effectively in a range of real-world problems, and most of these BNs involve expert judgment and not just data. Indeed, the subjective approach and Bayes is the only rational way to reason under uncertainty, and the only rational way to do risk assessment. BNs in real use have been under-reported: they are not just an academic research tool. Many of the traditional genuine barriers have now been removed. Manual model building has been revolutionised by improvements in tool design and advances in methods for generating tables from minimal user input. The Achilles heel of continuous nodes has essentially been fixed. There are issues of computational complexity, but these are even worse in alternative approaches such as Monte Carlo. So the remaining problems are largely perceptual. To gain trust in Bayes we need visual, non-mathematical arguments. There should never be any need for discussion about the Bayesian calculations, just as there should not be any need to discuss or challenge how a calculator is used to compute a long division. Under no circumstances should we assume that decision-makers can do the calculations or understand the way such calculations are done. I have indicated how BN tools have already been used with some effect. I believe that in 50 years' time professionals of all types, including those in insurance, law and even medicine, will look back in total disbelief that they could have ignored these available techniques of reasoning about risk for so long.

Follow up: get the papers (eecs.qmul.ac.uk/~norman); get the book (BayesianRisk.com); propose a case study for BAYES-KNOWLEDGE (bayes-knowledge.com); try the free software and models (AgenaRisk.com).