Medical Decision-Support Systems Probabilistic Reasoning in Diagnostic Systems Yuval Shahar, M.D., Ph.D.



Reasoning Under Uncertainty in Medicine
Uncertainty is inherent to medical reasoning:
– The relation of diseases to clinical and laboratory findings is probabilistic
– Patient data itself is often uncertain with respect to value and time
– Patient preferences regarding outcomes vary
– The cost of interventions and therapy can change

Probability: A Quick Introduction
Probability function, range: [0, 1]
Prior probability of A, P(A): with no new information (e.g., no patient information)
Posterior probability of A: P(A) given certain information (e.g., laboratory tests)
Conditional probability: P(B|A)
Independence of A, B: P(B) = P(B|A)
Conditional independence of B, C, given A: P(B|A) = P(B|A & C)
– (e.g., two symptoms, given a specific disease)

Probabilistic Calculus
P(not(A)) = 1-P(A)
In general:
– P(A & B) = P(A) * P(B|A)
If A, B are independent:
– P(A & B) = P(A) * P(B)
If A, B are mutually exclusive:
– P(A or B) = P(A) + P(B)
If A, B are not mutually exclusive, but independent:
– P(A or B) = 1-P(not(A) & not(B)) = 1-(1-P(A))(1-P(B))
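The rules above can be checked numerically. The following is a minimal sketch; the function names and the probabilities 0.3 and 0.2 are illustrative assumptions, not values from the slides.

```python
def p_not(p_a):
    """P(not(A)) = 1 - P(A)."""
    return 1.0 - p_a

def p_and(p_a, p_b_given_a):
    """General product rule: P(A & B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a

def p_or_independent(p_a, p_b):
    """P(A or B) for independent A, B: 1 - (1 - P(A)) * (1 - P(B))."""
    return 1.0 - p_not(p_a) * p_not(p_b)

# Illustrative (assumed) probabilities of two independent events.
p_a, p_b = 0.3, 0.2

# Under independence P(B|A) = P(B), so the product rule reduces to P(A) * P(B).
joint = p_and(p_a, p_b)
either = p_or_independent(p_a, p_b)

# The independent-events formula agrees with inclusion-exclusion:
# P(A or B) = P(A) + P(B) - P(A & B).
assert abs(either - (p_a + p_b - joint)) < 1e-12
print(f"P(A & B) = {joint:.2f}, P(A or B) = {either:.2f}")
```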

Test Characteristics
Test result | Disease present | Disease absent | Total
Positive | True positive (TP) | False positive (FP) | TP+FP
Negative | False negative (FN) | True negative (TN) | FN+TN
Total | TP+FN | FP+TN |

Test Performance Measures
The gold standard test: the procedure that defines presence or absence of a disease (often very costly)
The index test: the test whose performance is examined
True positive rate (TPR) = Sensitivity:
– P(Test is positive|patient has disease) = P(T+|D+)
– Ratio of the number of diseased patients with positive tests to the total number of diseased patients: TP/(TP+FN)
True negative rate (TNR) = Specificity:
– P(Test is negative|patient has no disease) = P(T-|D-)
– Ratio of the number of nondiseased patients with negative tests to the total number of nondiseased patients: TN/(TN+FP)

Test Predictive Values
Positive predictive value (PV+) = P(D+|T+) = TP/(TP+FP)
Negative predictive value (PV-) = P(D-|T-) = TN/(TN+FN)
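As a minimal sketch, the four measures can be computed directly from the cells of the 2x2 table. The function name `diagnostic_measures` and the counts below are hypothetical, chosen only to illustrate the formulas.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Compute test performance measures from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # TPR = P(T+|D+)
        "specificity": tn / (tn + fp),  # TNR = P(T-|D-)
        "pv_pos": tp / (tp + fp),       # PV+ = P(D+|T+)
        "pv_neg": tn / (tn + fn),       # PV- = P(D-|T-)
    }

# Hypothetical study: 100 diseased and 900 nondiseased patients.
measures = diagnostic_measures(tp=90, fp=40, fn=10, tn=860)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are computed within the columns of the table (diseased or nondiseased patients), whereas the predictive values are computed within its rows (positive or negative results), which is why only the latter depend on disease prevalence.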

Lab Tests: What is “Abnormal”?

The Cut-off Value Trade-off
Sensitivity and specificity depend on the cut-off value between what we define as normal and abnormal
Assume high test values are abnormal; then, moving the cut-off value to a higher one increases FN results and decreases FP results (i.e., the test becomes more specific but less sensitive), and vice versa
There is always a trade-off in setting the cut-off point

Receiver Operating Characteristic (ROC) Curves: Examples

Receiver Operating Characteristic (ROC) Curves: Interpretation
ROC curves summarize the trade-off between the TPR (sensitivity) and the false positive rate (FPR) (1-specificity) for a particular test, as we vary the cut-off threshold
The greater the area under the ROC curve, the better (more sensitive, more specific)
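A minimal sketch of how an ROC curve arises from sweeping the cut-off, assuming that high test values are abnormal and that a result counts as positive when it is at or above the cut-off. The test values and helper names are made up for illustration, and the area is computed with the trapezoidal rule.

```python
def roc_points(diseased, healthy):
    """Return (FPR, TPR) pairs obtained by lowering the cut-off step by step."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above every value: nobody tests positive
    for cutoff in cutoffs:
        tpr = sum(v >= cutoff for v in diseased) / len(diseased)
        fpr = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((fpr, tpr))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical test values for diseased and nondiseased patients.
diseased = [7.1, 6.4, 5.9, 5.2, 4.8]
healthy = [5.5, 4.9, 4.1, 3.8, 3.0]

curve = roc_points(diseased, healthy)
print("ROC points:", curve)
print("AUC:", round(area_under_curve(curve), 3))
```

Each lower cut-off moves the operating point up (more true positives) or to the right (more false positives), which is the trade-off the curve summarizes.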

Bayes Theorem
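The formula on this slide did not survive in the transcript. For reference, the standard form of Bayes Theorem for a positive test result, written in the notation used on the earlier slides (D+/D- for disease present/absent, T+ for a positive test), is:

```latex
P(D^{+} \mid T^{+}) =
  \frac{P(T^{+} \mid D^{+})\, P(D^{+})}
       {P(T^{+} \mid D^{+})\, P(D^{+}) + P(T^{+} \mid D^{-})\, P(D^{-})}
```

Here P(T+|D+) is the sensitivity, P(T+|D-) is 1-specificity, and P(D+) is the prior (pre-test) probability of the disease.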

Odds-Likelihood (Odds Ratio) Form of Bayes Theorem
Odds = P(A)/(1-P(A)); P = Odds/(1+Odds)
Post-test odds = pretest odds * likelihood ratio
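A minimal sketch of the odds-likelihood update. It assumes the usual definitions of the likelihood ratios, LR+ = sensitivity/(1-specificity) for a positive result and LR- = (1-sensitivity)/specificity for a negative one; the function names and numbers are illustrative.

```python
def prob_to_odds(p):
    """Odds = P / (1 - P)."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """P = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)

def post_test_probability(pretest_p, sensitivity, specificity, positive=True):
    """Post-test odds = pretest odds * likelihood ratio, converted back to a probability."""
    if positive:
        likelihood_ratio = sensitivity / (1.0 - specificity)   # LR+
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity   # LR-
    return odds_to_prob(prob_to_odds(pretest_p) * likelihood_ratio)

# Illustrative values: 10% pre-test probability, sensitivity 90%, specificity 80%.
print(post_test_probability(0.10, 0.90, 0.80, positive=True))
print(post_test_probability(0.10, 0.90, 0.80, positive=False))
```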

Application of Bayes Theorem
Needs reliable pre-test probabilities
Needs reliable post-test likelihood ratios
Assumes one disease only (mutual exclusivity of diseases)
Can be used in sequence for several tests, but only if they are conditionally independent given the disease; then we use the post-test probability of T_i as the pre-test probability for T_i+1 (Simple, or Naïve, Bayes)
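The sequential use of Bayes Theorem can be sketched as a loop in which each post-test probability becomes the next pre-test probability. This sketch assumes the tests are conditionally independent given the disease; the test characteristics and the prior below are hypothetical.

```python
def bayes_update(p, sensitivity, specificity, positive):
    """One Bayes update of the disease probability p after a single test result."""
    like_diseased = sensitivity if positive else 1.0 - sensitivity
    like_healthy = (1.0 - specificity) if positive else specificity
    return like_diseased * p / (like_diseased * p + like_healthy * (1.0 - p))

def sequential_update(prior, results):
    """Chain updates: the post-test probability of test i is the pre-test probability of test i+1."""
    p = prior
    for sensitivity, specificity, positive in results:
        p = bayes_update(p, sensitivity, specificity, positive)
    return p

# Hypothetical tests, each given as (sensitivity, specificity, result was positive?).
results = [(0.90, 0.80, True), (0.70, 0.95, True)]
posterior = sequential_update(0.05, results)
print(round(posterior, 4))

# Under conditional independence the order of the tests does not matter.
assert abs(posterior - sequential_update(0.05, list(reversed(results)))) < 1e-12
```

If the tests are not conditionally independent, chaining them this way counts shared evidence twice and overstates the certainty of the result.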

Relation of Pre-Test and Post-Test Probabilities

Example: Computing Predictive Values
Assume P(Down Syndrome):
– (A) 0.1% (age 30)
– (B) 2% (age 45)
Assume amniocentesis with a sensitivity of 99% and a specificity of 99% for Down Syndrome
PV+ = P(DS|Amnio+)
PV- = P(DS-|Amnio-) = %
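The numeric results for this example are not preserved in the transcript, but they follow from the stated prevalence, sensitivity, and specificity. A minimal sketch of the computation (the function name `predictive_values` is an assumption):

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PV+, PV-) for a test applied to a population with the given prevalence."""
    tp = prevalence * sensitivity                  # diseased, test positive
    fn = prevalence * (1.0 - sensitivity)          # diseased, test negative
    fp = (1.0 - prevalence) * (1.0 - specificity)  # nondiseased, test positive
    tn = (1.0 - prevalence) * specificity          # nondiseased, test negative
    return tp / (tp + fp), tn / (tn + fn)

# Prevalences from the slide: (A) 0.1% at age 30, (B) 2% at age 45.
for label, prevalence in [("age 30", 0.001), ("age 45", 0.02)]:
    pv_pos, pv_neg = predictive_values(prevalence, 0.99, 0.99)
    print(f"{label}: PV+ = {pv_pos:.1%}, PV- = {pv_neg:.3%}")
```

With these inputs PV+ comes out at roughly 9% for case (A) and roughly 67% for case (B), while PV- stays above 99.9% in both: even a very accurate test yields mostly false positives when the condition is rare.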

Predictive Values: Down Syndrome

Example: de Dombal’s System (1972)
Domain: Acute abdominal pain (7 possible diagnoses)
Input: Signs and symptoms of patient
Output: Probability distribution of diagnoses
Method: Naïve Bayesian classification
Evaluation: an eight-center study involving 250 physicians and 16,737 patients
Results:
– Diagnostic accuracy rose from 46 to 65%
– The negative laparotomy rate fell by almost half
– Perforation rate among patients with appendicitis fell by half
– Mortality rate fell by 22%
Results using survey data were consistently better than the clinicians’ opinions, and even better than the results using human probability estimates!

Decision Trees
A convenient way to explicitly show the order and relationships of possible decisions, uncertain outcomes of decisions, and outcome utilities
Enable computation of the decision that maximizes expected utility

Decision Trees Conventions Decision node Chance node Information link Influence link

A Generic Decision Tree

Decision Trees: an HIV Example Decision node Chance node

Computation With Decision Trees
Decision trees are “folded back” to the topmost (leftmost, or initial) decision
Computation is performed by averaging expected utility recursively over tree branches from right to left (bottom up), maximizing utility at every decision node and taking that maximum as the expected utility of the subtree that follows the decision
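The fold-back procedure can be sketched as a short recursion: chance nodes average over their branches, decision nodes take the maximum over their options. The dictionary-based tree representation, the treat/do-not-treat example, and all probabilities and utilities below are hypothetical.

```python
def fold_back(node):
    """Return the expected utility of a (sub)tree, computed from the leaves back to the root."""
    kind = node["type"]
    if kind == "utility":
        return node["value"]
    if kind == "chance":
        # Average the expected utilities of the branches, weighted by probability.
        return sum(p * fold_back(child) for p, child in node["branches"])
    if kind == "decision":
        # Assume the decision maker picks the option with maximal expected utility.
        return max(fold_back(child) for _label, child in node["options"])
    raise ValueError(f"unknown node type: {kind}")

def leaf(value):
    return {"type": "utility", "value": value}

def chance(*branches):
    return {"type": "chance", "branches": list(branches)}

# Hypothetical tree: P(disease) = 0.3; each leaf is the utility of an outcome.
tree = {
    "type": "decision",
    "options": [
        ("treat", chance((0.3, leaf(0.90)), (0.7, leaf(0.95)))),
        ("do not treat", chance((0.3, leaf(0.40)), (0.7, leaf(1.00)))),
    ],
}

best_label, _subtree = max(tree["options"], key=lambda option: fold_back(option[1]))
print(best_label, round(fold_back(tree), 3))
```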

Influence Diagrams: Node Conventions Chance node Decision node Utility node

Link Semantics in Influence Diagrams Dependence link Information link Influence link

Influence Diagrams: An HIV Example

The Structure of Influence Diagram Links

Belief Networks (Bayesian/Causal Probabilistic/Probabilistic Networks, etc.)
Influence diagrams without decision and utility nodes
[Figure: example belief network with the nodes Gender, Disease, Fever, Sinusitis, Runny nose, Headache]

Link Semantics in Belief Networks
[Figure: link patterns among nodes A, B, and C illustrating dependence, independence, and conditional independence of B and C, given A]

Advantages of Influence Diagrams and Belief Networks
Excellent modeling tool that supports acquisition from domain experts
– Intuitive semantics (e.g., information and influence links)
– Explicit representation of dependencies
– Very concise representation of large decision models
“Anytime” algorithms available (using probability theory) to compute the distribution of values at any node given the values of any subset of the nodes (e.g., at any stage of information gathering)
Explicit support for value of information computations

Disadvantages of Influence Diagrams and Belief Networks
Explicit representation of dependencies often requires acquisition of joint probability distributions (P(A|B,C))
Computation is, in general, intractable (NP-hard)
Order of decisions and relations between decisions and available information might be obscured

Value of Information (VI)
We often need to decide what would be the next best piece of information to gather (e.g., within a diagnostic process); that is, what is the best next question to ask (e.g., what would be the result of a urine culture?)
The Value of Information (VI) of feature f is the marginal expected utility of an optimal decision made knowing f, compared to making it without knowing f
The net value of information (NVI) of f = VI(f)-cost(f)
NVI is highly useful for a hypothetico-deductive diagnostic approach to decide what would be the next information item, if any, to investigate
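A minimal sketch of VI and NVI for a single binary test that precedes a treat/do-not-treat decision. The utilities, test characteristics, prior, and cost are hypothetical, the cost is assumed to be expressed in the same utility units, and the sketch assumes both test results have nonzero probability.

```python
def best_expected_utility(p_disease, utilities):
    """Expected utility of the optimal action; utilities[action] = (U if diseased, U if healthy)."""
    return max(p_disease * u_d + (1.0 - p_disease) * u_h
               for u_d, u_h in utilities.values())

def value_of_information(prior, sensitivity, specificity, utilities):
    """VI = EU of deciding after seeing the test result - EU of deciding without it."""
    p_pos = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    p_neg = 1.0 - p_pos
    post_pos = sensitivity * prior / p_pos          # P(D+|T+)
    post_neg = (1.0 - sensitivity) * prior / p_neg  # P(D+|T-)
    eu_with = (p_pos * best_expected_utility(post_pos, utilities)
               + p_neg * best_expected_utility(post_neg, utilities))
    return eu_with - best_expected_utility(prior, utilities)

# Hypothetical utilities: action -> (utility if diseased, utility if healthy).
utilities = {"treat": (0.90, 0.95), "do not treat": (0.40, 1.00)}

vi = value_of_information(prior=0.3, sensitivity=0.9, specificity=0.9, utilities=utilities)
cost = 0.005  # assumed cost of the test, in utility units
nvi = vi - cost
print(f"VI = {vi:.4f}, NVI = {nvi:.4f}")
```

The test is worth gathering only if NVI is positive; a test whose result could never change the chosen action has a VI of zero.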

Examples of Successful Belief-Network Applications
In clinical medicine:
– Pathological diagnosis at the level of a subspecialized medical expert (Pathfinder)
– Endocrinological diagnosis (NESTOR)
In bioinformatics:
– Recognition of meaningful sites and features in DNA sequences
– Educated guess of tertiary structure of proteins

The Pathfinder Project (Heckerman, Horvitz, Nathwani 1992)
Task and domain: Diagnosis of lymph node biopsy, an important medical problem
– Large difference between expert and general pathologist opinions (almost 65%!)
Problems in the domain include
– Misrecognition of features (information gathering)
– Misintegration of evidence (information processing)
The Pathfinder project focused mainly on assistance in information processing
A Stanford/USC collaboration; eventually commercialized as Intellipath, marketed by the ACP, used as early as 1992 by at least 200 pathology sites

Pathfinder Domain
More than 60 diseases
More than 130 findings, such as:
– Microscopic
– Immunological
– Molecular biology
– Laboratory
– Clinical
Commercial product extended to at least 10 more medical domains

Pathfinder I/O Behavior
Input: a set of (feature, instance) pairs
– Instances are mutually exclusive values of each feature
– Prior probability of each disease D_k is known
– P(F_1 I_1, F_2 I_2, …, F_t I_t | D_k, ξ) is in the acquired knowledge base
Output: P(D_k | F_1 I_1, F_2 I_2, …, F_m I_m, ξ)
– ξ = background knowledge (context)
User can ask what is the next best (cost-effective) feature to investigate or enter
– Probabilistic (decision-theoretic) hypothetico-deductive approach
Distribution of each D_k is updated dynamically

Pathfinder Methodology: Probabilities and Utilities
Decision-theoretic computation
Bayesian approach: Probabilities represent beliefs of experts (data can update beliefs)
Utilities are represented as a matrix over all pairs of diseases
A matrix entry U(D_j, D_k) encodes the (patient) utility of diagnosing D_k when the patient really has D_j
Since no therapeutic recommendations are made, the model can use the utilities of one representative patient (the expert), expressed in micromorts and willingness to pay to avoid the risk of each outcome

Pathfinder Computation
Normally we would use the general form of Bayes Theorem
But that involves an exponential number of probabilities to be acquired and represented
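The formula on this slide is missing from the transcript. In the notation of the I/O slide (F_i I_i for an observed feature-instance pair, D_k for a disease, ξ for background knowledge), and assuming mutually exclusive and exhaustive diseases, the general form of Bayes Theorem would read:

```latex
P(D_k \mid F_1 I_1, \ldots, F_m I_m, \xi) =
  \frac{P(F_1 I_1, \ldots, F_m I_m \mid D_k, \xi)\, P(D_k \mid \xi)}
       {\sum_{j} P(F_1 I_1, \ldots, F_m I_m \mid D_j, \xi)\, P(D_j \mid \xi)}
```

The term P(F_1 I_1, …, F_m I_m | D_k, ξ) is a joint distribution over all combinations of feature values, which is what makes the number of probabilities to acquire exponential in the number of features.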

Pathfinder 1: The Simple Bayes Version
Assuming conditional independence of features (Simple or Naïve Bayes)
Assuming mutual exclusivity and exhaustiveness of diseases, the overall computation is tractable
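Under the simple Bayes assumption the joint likelihood factors into a product of per-feature terms, P(F_1 I_1, …, F_m I_m | D_k) = Π_i P(F_i I_i | D_k), so only one small table per feature and disease is needed. The sketch below illustrates that computation; it is not Pathfinder's actual code, and the diseases, features, and probabilities are invented.

```python
def naive_bayes_posterior(priors, likelihoods, findings):
    """Posterior over mutually exclusive, exhaustive diseases under simple Bayes.

    priors:      {disease: P(D)}
    likelihoods: {disease: {feature: {instance: P(feature = instance | D)}}}
    findings:    {feature: observed instance}
    """
    unnormalized = {}
    for disease, prior in priors.items():
        score = prior
        for feature, instance in findings.items():
            # Conditional independence: multiply the per-feature likelihoods.
            score *= likelihoods[disease][feature][instance]
        unnormalized[disease] = score
    total = sum(unnormalized.values())
    return {disease: score / total for disease, score in unnormalized.items()}

# Hypothetical knowledge base with three diseases and two features.
priors = {"D1": 0.5, "D2": 0.3, "D3": 0.2}
likelihoods = {
    "D1": {"F1": {"present": 0.8, "absent": 0.2}, "F2": {"large": 0.1, "small": 0.9}},
    "D2": {"F1": {"present": 0.4, "absent": 0.6}, "F2": {"large": 0.5, "small": 0.5}},
    "D3": {"F1": {"present": 0.1, "absent": 0.9}, "F2": {"large": 0.9, "small": 0.1}},
}

posterior = naive_bayes_posterior(priors, likelihoods, {"F1": "present", "F2": "large"})
print({disease: round(p, 3) for disease, p in posterior.items()})
```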

Pathfinder 2: The Belief Network Version
Mutual exclusivity and exhaustiveness of diseases is reasonable in lymph node pathology
– Single disease per examined lymph node
– Large, exhaustive knowledge base
Conditional independence is less reasonable and can lead to erroneous conclusions
The simple Bayes representation of Pathfinder 1 was therefore enhanced to a belief network in Pathfinder 2, which included explicit dependencies between different features, still taking advantage of any explicit global and conditional independencies

Decision-Theoretic Diagnosis
Using the utility matrix and given observations φ, the expected diagnostic utility using φ is averaged over all diagnoses:
– EU(D_k(φ)) = Σ_j P(D_j|φ) U(D_j, D_k)
Thus, Dx(φ) = ARGMAX_k [EU(D_k(φ))]
However, since the diagnosis is sensitive to the utility model, Pathfinder does not recommend it, only the probabilities P(D_k|φ)
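A minimal sketch of the expected-utility computation, with the utility matrix indexed as utility[true disease][diagnosis]. The two-disease posterior and the utility values are hypothetical.

```python
def expected_utilities(posterior, utility):
    """EU(D_k) = sum over j of P(D_j | observations) * U(D_j, D_k)."""
    diseases = list(posterior)
    return {k: sum(posterior[j] * utility[j][k] for j in diseases) for k in diseases}

def best_diagnosis(posterior, utility):
    """Return the ARGMAX_k diagnosis together with all expected utilities."""
    eu = expected_utilities(posterior, utility)
    return max(eu, key=eu.get), eu

# Hypothetical posterior and utility matrix: utility[true disease][diagnosis].
posterior = {"benign": 0.7, "malignant": 0.3}
utility = {
    "benign": {"benign": 1.0, "malignant": 0.8},
    "malignant": {"benign": 0.0, "malignant": 1.0},
}

diagnosis, eu = best_diagnosis(posterior, utility)
print(diagnosis, {k: round(v, 2) for k, v in eu.items()})
```

In this example the utility-maximizing diagnosis is the less probable one, because missing a malignancy is penalized heavily. It illustrates how strongly the recommended diagnosis depends on the utility model.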

Pathfinder: Gathering Information
The next best feature to observe is recommended using a myopic approximation, which considers observing only a single feature at a time
The feature chosen maximizes EU given that a diagnosis would be made after observing it
Feature f is chosen that maximizes NVI(f)
Although the myopic approximation could backfire, in practice it works well
– Especially when U(D_j, D_k) is set to 0 if one of the diseases is malignant and the other benign, and set to 1 if they are both malignant or both benign

Pathfinder 2: Knowledge Acquisition
To facilitate acquisition of multiple probabilities, a Similarity Network model was developed
Using similarity networks, an expert creates multiple small belief networks, each representing 2 or more diseases that are difficult to distinguish
The local belief networks are then unified into a global belief network, preserving soundness
The graphical interface also allows partitioning of diseases into sets such that, relative to each set, some feature is independent, thus further assisting in the construction

Pathfinder 1 and 2: Evaluation
Pathfinder 1 was compared to Pathfinder 2 using 53 cases, a new user, and a thorough analysis of each case
– Diagnostic accuracy of PF2 is greater than that of PF1 (gold standard: the main domain expert’s distribution and his assessment on a scale of 1 to 10)
– Difference is due to better probabilistic representation (better acquisition and inference)
– Cost of constructing PF2 rather than PF1 is justified by the improvements (measure: the utility of the diagnosis)
– PF2 is at least as good as the main domain expert, with respect to diagnostic accuracy