
A gentle introduction to the mathematics of biosurveillance: Bayes Rule and Bayes Classifiers
Andrew W. Moore, Professor, The Auton Lab, School of Computer Science, Carnegie Mellon University; Associate Member, The RODS Lab, University of Pittsburgh and Carnegie Mellon University.
Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: Comments and corrections gratefully received.

2 What we're going to do: We will review the concept of reasoning with uncertainty, also known as probability. This is a fundamental building block. It's really going to be worth it.

3 Discrete Random Variables A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples A = The next patient you examine is suffering from inhalational anthrax A = The next patient you examine has a cough A = There is an active terrorist cell in your city

4 Probabilities We write P(A) as “the fraction of possible worlds in which A is true” We could at this point spend 2 hours on the philosophy of this. But we won’t.

5 Visualizing A: the event space of all possible worlds has total area 1. Worlds in which A is true form an oval region (the reddish oval); worlds outside it are those in which A is false. P(A) = area of the reddish oval.

6 The Axioms of Probability

7 Interpreting the axioms: 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B). The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.

8 Interpreting the axioms: 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B). The area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true.

9 Interpreting the axioms: 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B). [Venn diagram of two overlapping events A and B.]

10 Interpreting the axioms: 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B). [Venn diagram: the area of "A or B" is the area of A plus the area of B minus the area of "A and B".] Simple addition and subtraction.
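A quick way to ground the "fraction of possible worlds" reading of these axioms is to enumerate a tiny world set and check them numerically. The sketch below is illustrative only; the six-world space and the events A and B are my own toy example, not from the slides.

```python
from fractions import Fraction

# Toy event space: six equally likely "worlds" (think of one roll of a fair die).
worlds = [1, 2, 3, 4, 5, 6]

def prob(event):
    """P(event) = fraction of worlds in which the event is true."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))

A = lambda w: w % 2 == 0   # "the roll is even"
B = lambda w: w >= 4       # "the roll is at least 4"

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda w: A(w) and B(w))
p_a_or_b  = prob(lambda w: A(w) or B(w))

assert 0 <= p_a <= 1                         # first axiom
assert p_a_or_b == p_a + p_b - p_a_and_b     # addition rule
print(p_a, p_b, p_a_and_b, p_a_or_b)         # 1/2 1/2 1/3 2/3
```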

11 These Axioms are Not to be Trifled With. There have been attempts to build alternative methodologies for handling uncertainty: Fuzzy Logic, Three-valued logic, Dempster-Shafer, Non-monotonic reasoning. But the axioms of probability are the only system with this property: if you gamble using them, you can't be unfairly exploited by an opponent using some other system [de Finetti 1931].

12 Another important theorem. From the axioms (0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B)) we can prove: P(A) = P(A and B) + P(A and not B).

13 Conditional Probability P(A|B) = Fraction of worlds in which B is true that also have A true F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2 “Headaches are rare and flu is rarer, but if you’re coming down with ‘flu there’s a chance you’ll have a headache.”

14 Conditional Probability F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2. P(H|F) = fraction of flu-inflicted worlds in which you have a headache = (#worlds with flu and headache) / (#worlds with flu) = (area of “H and F” region) / (area of “F” region) = P(H and F) / P(F).

15 Definition of Conditional Probability: P(A|B) = P(A and B) / P(B). Corollary (the Chain Rule): P(A and B) = P(A|B) P(B).

16 Probabilistic Inference F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2 One day you wake up with a headache. You think: “Drat! 50% of flus are associated with headaches so I must have a 50% chance of coming down with flu.” Is this reasoning good?

17 Probabilistic Inference F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2 P(F and H) = … P(F|H) = …

18 Probabilistic Inference F H H = “Have a headache” F = “Coming down with Flu” P(H) = 1/10 P(F) = 1/40 P(H|F) = 1/2
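Filling in the blanks with the numbers given above: P(F and H) = P(H|F) × P(F) = (1/2) × (1/40) = 1/80, and P(F|H) = P(F and H) / P(H) = (1/80) / (1/10) = 1/8. So the headache raises the probability of flu from 2.5% to 12.5%, not to 50%: the tempting reasoning two slides back confuses P(H|F) with P(F|H).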

19 What we just did… P(B|A) = P(A ^ B) / P(A) = P(A|B) P(B) / P(A). This is Bayes Rule. Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:

20 Menu. You are a health official, deciding whether to investigate a restaurant. You lose a dollar if you get it wrong; you win a dollar if you get it right. Half of all restaurants have bad hygiene. In a bad restaurant, ¾ of the menus are smudged. In a good restaurant, 1/3 of the menus are smudged. You are allowed to see a randomly chosen menu. [Figure: a menu from a Bad Hygiene restaurant and a menu from a Good Hygiene restaurant.]

21

22 Menu

23 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?

24 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.

25 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.
Evidence: some symptom, or other thing you can observe. In our example: Smudge.

26 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.
Evidence: some symptom, or other thing you can observe. In our example: Smudge.
Conditional: probability of seeing the evidence if you did know the true state. In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3.

27 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.
Evidence: some symptom, or other thing you can observe. In our example: Smudge.
Conditional: probability of seeing the evidence if you did know the true state. In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3.
Posterior: the Prob(true state = x | some evidence). In our example: P(Bad|Smudge) = 9/13.

28 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.
Evidence: some symptom, or other thing you can observe. In our example: Smudge.
Conditional: probability of seeing the evidence if you did know the true state. In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3.
Posterior: the Prob(true state = x | some evidence). In our example: P(Bad|Smudge) = 9/13.
Inference, Diagnosis, Bayesian Reasoning: getting the posterior from the prior and the evidence.

29 Bayesian Diagnosis: buzzwords, their meanings, and their values in our example.
True State: the true state of the world, which you would like to know. In our example: is the restaurant bad?
Prior: Prob(true state = x). In our example: P(Bad) = 1/2.
Evidence: some symptom, or other thing you can observe. In our example: Smudge.
Conditional: probability of seeing the evidence if you did know the true state. In our example: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3.
Posterior: the Prob(true state = x | some evidence). In our example: P(Bad|Smudge) = 9/13.
Inference, Diagnosis, Bayesian Reasoning: getting the posterior from the prior and the evidence.
Decision theory: combining the posterior with known costs in order to decide what to do.
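The 9/13 posterior above can be reproduced directly from the prior and the two conditionals using Bayes Rule. A minimal sketch (the variable names are mine, not from the slides):

```python
from fractions import Fraction

# Prior and conditionals, taken from the table above.
p_bad              = Fraction(1, 2)   # P(Bad)
p_smudge_given_bad = Fraction(3, 4)   # P(Smudge | Bad)
p_smudge_given_ok  = Fraction(1, 3)   # P(Smudge | not Bad)

# Total probability of the evidence: P(Smudge)
p_smudge = p_smudge_given_bad * p_bad + p_smudge_given_ok * (1 - p_bad)

# Bayes Rule: P(Bad | Smudge) = P(Smudge | Bad) P(Bad) / P(Smudge)
p_bad_given_smudge = p_smudge_given_bad * p_bad / p_smudge
print(p_bad_given_smudge)   # 9/13, matching the posterior in the table
```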

30 Many Pieces of Evidence

31 Many Pieces of Evidence Pat walks in to the surgery. Pat is sore and has a headache but no cough

32 Many Pieces of Evidence P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Pat walks in to the surgery. Pat is sore and has a headache but no cough Priors Conditionals

33 Many Pieces of Evidence P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Pat walks in to the surgery. Pat is sore and has a headache but no cough What is P( F | H and not C and S ) ? Priors Conditionals

34 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 The Naïve Assumption

35 If I know Pat has Flu… …and I want to know if Pat has a cough… …it won’t help me to find out whether Pat is sore P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 The Naïve Assumption

36 If I know Pat has Flu… …and I want to know if Pat has a cough… …it won’t help me to find out whether Pat is sore Coughing is explained away by Flu P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 The Naïve Assumption

37 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 The Naïve Assumption: General Case If I know the true state… …and I want to know about one of the symptoms… …then it won’t help me to find out anything about the other symptoms Other symptoms are explained away by the true state

38 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 The Naïve Assumption: General Case If I know the true state… …and I want to know about one of the symptoms… …then it won’t help me to find out anything about the other symptoms Other symptoms are explained away by the true state What are the good things about the Naïve assumption? What are the bad things?

39 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

40 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

41 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

42 How do I get P(H and not C and S and F)? P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

43 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

44 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Chain rule: P( █ and █ ) = P( █ | █ ) × P( █ )

45 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Naïve assumption: lack of cough and soreness have no effect on headache if I am already assuming Flu

46 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Chain rule: P( █ and █ ) = P( █ | █ ) × P( █ )

47 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Naïve assumption: Sore has no effect on Cough if I am already assuming Flu

48 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Chain rule: P( █ and █ ) = P( █ | █ ) × P( █ )

49 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

50 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

51 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3

52 P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 = (11% chance of Flu, given symptoms)
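The ~11% figure can be checked end to end from the table. A minimal sketch of the whole naive Bayes calculation for Pat (the variable names are mine); the key step is that, under the naive assumption, the joint probability of the evidence and a state factors into the prior times one conditional per symptom:

```python
from fractions import Fraction

# Priors from the table
p_flu, p_not_flu = Fraction(1, 40), Fraction(39, 40)

# Conditionals given Flu and given not Flu
p_h_flu, p_h_not = Fraction(1, 2), Fraction(7, 78)   # Headache
p_c_flu, p_c_not = Fraction(2, 3), Fraction(1, 6)    # Cough
p_s_flu, p_s_not = Fraction(3, 4), Fraction(1, 3)    # Sore

# Naive assumption: symptoms are independent given the true state, so
# P(H and not C and S and Flu) = P(H|Flu) P(not C|Flu) P(S|Flu) P(Flu)
joint_flu     = p_h_flu * (1 - p_c_flu) * p_s_flu * p_flu
joint_not_flu = p_h_not * (1 - p_c_not) * p_s_not * p_not_flu

# Bayes Rule: condition on the evidence Pat actually shows
posterior = joint_flu / (joint_flu + joint_not_flu)
print(posterior, float(posterior))   # 9/79, about 0.114: the ~11% on the slide
```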

53 Building A Bayes Classifier P(Flu)=1/40P(Not Flu)=39/40 P( Headache | Flu )=1/2P( Headache | not Flu )=7 / 78 P( Cough | Flu )=2/3P( Cough | not Flu )=1/6 P( Sore | Flu )=3/4P( Sore | not Flu )=1/3 Priors Conditionals

54 The General Case

55 Building a naïve Bayesian Classifier P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Assume: True state has N possible values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i possible values: 1, 2,.. M i

56 Building a naïve Bayesian Classifier P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i Example: P( Anemic | Liver Cancer) = 0.21

57 P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___

58 P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Coming Soon: How this is used in Practical Biosurveillance Also coming soon: Bringing time and space into this kind of reasoning. And how to not be naïve.
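In code, the table of priors and conditionals above is all a naive Bayes classifier needs at classification time. A minimal sketch of the general case (the data structures and names are mine, not from the slides):

```python
def classify(priors, conditionals, observed):
    """Return P(State = s | observed symptoms) for every state s.

    priors:        {state: P(State = state)}
    conditionals:  {state: {symptom: {value: P(symptom = value | state)}}}
    observed:      {symptom: value} for the symptoms that were seen
    """
    # Naive assumption: symptoms are independent given the state, so the joint
    # probability is the prior times a product of per-symptom conditionals.
    joint = {}
    for state, prior in priors.items():
        p = prior
        for symptom, value in observed.items():
            p *= conditionals[state][symptom][value]
        joint[state] = p

    # Bayes Rule: normalize by the total probability of the evidence.
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}
```

Plugging in the flu table and the evidence Headache=yes, Cough=no, Sore=yes should reproduce the roughly 11% posterior worked out earlier.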

59 Conclusion You will hear lots of “Bayesian” this and “conditional probability” that this week. It's simple: don't let wooly academic types trick you into thinking it is fancy. You should know what Bayesian Reasoning, Conditional Probabilities, Priors, and Posteriors are; appreciate how conditional probabilities are manipulated; and understand why the Naïve Bayes Assumption is good, and why it is evil.

60 Text mining Motivation: an enormous (and growing!) supply of rich data Most of the available text data is unstructured… Some of it is semi-structured: Header entries (title, authors’ names, section titles, keyword lists, etc.) Running text bodies (main body, abstract, summary, etc.) Natural Language Processing (NLP) Text Information Retrieval

61 Text processing Natural Language Processing: Automated understanding of text is a very, very, very challenging Artificial Intelligence problem. It aims at extracting the semantic content of the processed documents, and involves extensive research into semantics, grammar, automated reasoning, … Several factors making it tough for a computer include: Polysemy (the same word having several different meanings) and Synonymy (several different ways to describe the same thing).

62 Text processing Text Information Retrieval: Search through collections of documents in order to find objects: relevant to a specific query similar to a specific document For practical reasons, the text documents are parameterized Terminology: Documents (text data units: books, articles, paragraphs, other chunks such as messages,...) Terms (specific words, word pairs, phrases)

63 Text Information Retrieval Typically, the text databases are parametrized with a document-term matrix. Each row of the matrix corresponds to one of the documents; each column corresponds to a different term. Example documents: “Shortness of breath”, “Difficulty breathing”, “Rash on neck”, “Sore neck and difficulty breathing”, “Just plain ugly”.

64 Text Information Retrieval Typically, the text databases are parametrized with a document-term matrix. Each row of the matrix corresponds to one of the documents; each column corresponds to a different term. Terms: breath, difficulty, just, neck, plain, rash, short, sore, ugly. Documents: “Shortness of breath”, “Difficulty breathing”, “Rash on neck”, “Sore neck and difficulty breathing”, “Just plain ugly”.

65 Parametrization for Text Information Retrieval Depending on the particular method of parametrization the matrix entries may be: binary (telling whether a term Tj is present in the document Di or not) counts (frequencies) (total number of repetitions of a term Tj in Di) weighted frequencies (see the slide following the next)
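As a concrete illustration, here is one way the five example chief complaints could be turned into a binary document-term matrix. The tiny tokenizer and the two stemming rules are my own simplification, just enough to map "breathing"/"shortness" onto the terms "breath"/"short" listed on the earlier slide:

```python
docs = [
    "Shortness of breath",
    "Difficulty breathing",
    "Rash on neck",
    "Sore neck and difficulty breathing",
    "Just plain ugly",
]
terms = ["breath", "difficulty", "just", "neck", "plain", "rash",
         "short", "sore", "ugly"]

def tokens(text):
    # Crude normalization so that "breathing" matches the term "breath", etc.
    stems = {"breathing": "breath", "shortness": "short"}
    return {stems.get(w, w) for w in text.lower().split()}

# Binary entries: 1 if term j occurs in document i, 0 otherwise.
matrix = [[1 if t in tokens(d) else 0 for t in terms] for d in docs]
for d, row in zip(docs, matrix):
    print(f"{d:<40} {row}")
```

Counts would simply replace the 0/1 test with the number of occurrences of the term, and weighted frequencies would rescale those counts.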

66 Typical applications of Text IR Document indexing and classification (e.g. library systems) Search engines (e.g. the Web) Extraction of information from textual sources (e.g. profiling of personal records, consumer complaint processing)


68 Building a naïve Bayesian Classifier P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i

69 Building a naïve Bayesian Classifier P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i prodrome GI, Respiratory, Constitutional …

70 Building a naïve Bayesian Classifier P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i wordsword 1 word 2 word K prodrome GI, Respiratory, Constitutional …

71 Building a naïve Bayesian Classifier P(State=1)=___P(State=2)=___…P(State=N)=___ P( Sym 1 =1 | State=1 )=___P( Sym 1 =1 | State=2 )=___…P( Sym 1 =1 | State=N )=___ P( Sym 1 =2 | State=1 )=___P( Sym 1 =2 | State=2 )=___…P( Sym 1 =2 | State=N )=___ ::::::: P( Sym 1 =M 1 | State=1 )=___P( Sym 1 =M 1 | State=2 )=___…P( Sym 1 =M 1 | State=N )=___ P( Sym 2 =1 | State=1 )=___P( Sym 2 =1 | State=2 )=___…P( Sym 2 =1 | State=N )=___ P( Sym 2 =2 | State=1 )=___P( Sym 2 =2 | State=2 )=___…P( Sym 2 =2 | State=N )=___ ::::::: P( Sym 2 =M 2 | State=1 )=___P( Sym 2 =M 2 | State=2 )=___…P( Sym 2 =M 2 | State=N )=___ ::: P( Sym K =1 | State=1 )=___P( Sym K =1 | State=2 )=___…P( Sym K =1 | State=N )=___ P( Sym K =2 | State=1 )=___P( Sym K =2 | State=2 )=___…P( Sym K =2 | State=N )=___ ::::::: P( Sym K =M K | State=1 )=___P( Sym K =M 1 | State=2 )=___…P( Sym K =M 1 | State=N )=___ Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i wordsword 1 word 2 word K prodrome GI, Respiratory, Constitutional … word i is either present or absent i

72 Building a naïve Bayesian Classifier Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i wordsword 1 word 2 word K prodrome GI, Respiratory, Constitutional … word i is either present or absent i P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

73 Building a naïve Bayesian Classifier Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i wordsword 1 word 2 word K prodrome GI, Respiratory, Constitutional … word i is either present or absent i P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___ Example: Prob( Chief Complaint contains “Blood” | Prodrome = Respiratory ) = 0.003

74 Building a naïve Bayesian Classifier Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i wordsword 1 word 2 word K prodrome GI, Respiratory, Constitutional … word i is either present or absent i P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___ Example: Prob( Chief Complaint contains “Blood” | Prodrome = Respiratory ) = Q: Where do these numbers come from?

75 Building a naïve Bayesian Classifier Assume: True state has N values: 1, 2, 3.. N There are K symptoms called Symptom 1, Symptom 2, … Symptom K Symptom i has M i values: 1, 2,.. M i wordsword 1 word 2 word K prodrome GI, Respiratory, Constitutional … word i is either present or absent i P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___ Example: Prob( Chief Complaint contains “Blood” | Prodrome = Respiratory ) = Q: Where do these numbers come from? A: Learn them from expert-labeled data

76 Learning a Bayesian Classifier breathdifficultyjustneckplainrashshortsoreugly Shortness of breath Difficulty breathing Rash on neck Sore neck and difficulty breathing Just plain ugly Before deployment of classifier,

77 Learning a Bayesian Classifier EXPERT SAYS breathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data

78 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

79 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

80 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

81 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

82 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) 3.During deployment, apply classifier P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

83 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) 3.During deployment, apply classifier New Chief Complaint: “Just sore breath” P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___

84 Learning a Bayesian Classifier EXPERT SAYSbreathdifficultyjustneckplainrashshortsoreugly Shortness of breath Resp Difficulty breathing Resp Rash on neck Rash Sore neck and difficulty breathing Resp Just plain ugly Other Before deployment of classifier, get labeled training data 2.Learn parameters (conditionals, and priors) 3.During deployment, apply classifier New Chief Complaint: “Just sore breath” P(Prod'm=GI)=___P(Prod'm=respir)=___…P(Prod'm=const)=___ P( angry | Prod'm=GI )=___P( angry | Prod'm=respir )=___…P( angry | Prod'm=const )=___ P( ~angry | Prod'm=GI )=___P(~angry | Prod'm=respir )=___…P(~angry | Prod'm=const )=___ P( blood | Prod'm=GI )=___P( blood | Prod'm=respir )=___…P( blood | Prod'm=const )=___ P( ~blood | Prod'm=GI )=___P( ~blood | Prod'm=respir)=___…P( ~blood | Prod'm=const )=___ ::: P( vomit | Prod'm=GI )=___P( vomit | Prod'm=respir )=___…P( vomit | Prod'm=const )=___ P( ~vomit | Prod'm=GI )=___P( ~vomit |Prod'm=respir )=___…P( ~vomit | Prod'm=const )=___
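A minimal end-to-end sketch of steps 1-3 with the five labeled complaints above: estimate the priors and per-word conditionals from the expert labels, then score the new chief complaint "Just sore breath". The add-one (Laplace) smoothing is my addition, a standard trick so that unseen word/class combinations don't get probability zero; it is not something the slides specify.

```python
from collections import Counter

labeled = [
    ("Shortness of breath", "Resp"),
    ("Difficulty breathing", "Resp"),
    ("Rash on neck", "Rash"),
    ("Sore neck and difficulty breathing", "Resp"),
    ("Just plain ugly", "Other"),
]
vocab = ["breath", "difficulty", "just", "neck", "plain", "rash",
         "short", "sore", "ugly"]

def tokens(text):
    stems = {"breathing": "breath", "shortness": "short"}
    return {stems.get(w, w) for w in text.lower().split()}

# Steps 1-2: learn priors P(class) and conditionals P(word present | class),
# with add-one smoothing (my addition) to avoid zero probabilities.
class_counts = Counter(label for _, label in labeled)
priors = {c: n / len(labeled) for c, n in class_counts.items()}
cond = {}
for c in class_counts:
    docs_c = [tokens(text) for text, label in labeled if label == c]
    cond[c] = {w: (sum(w in d for d in docs_c) + 1) / (len(docs_c) + 2)
               for w in vocab}

# Step 3: during deployment, apply the classifier to a new chief complaint.
def classify(text):
    seen = tokens(text)
    scores = {}
    for c in class_counts:
        p = priors[c]
        for w in vocab:
            p *= cond[c][w] if w in seen else 1 - cond[c][w]
        scores[c] = p
    total = sum(scores.values())
    return {c: p / total for c, p in scores.items()}

print(classify("Just sore breath"))   # posterior over Resp / Rash / Other
```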

85 CoCo Performance (AUC scores): Botulism 0.78; rash 0.91; neurological 0.92; hemorrhagic 0.93; constitutional 0.93; gastrointestinal 0.95; other 0.96; respiratory 0.96.

86 Conclusion Automated text extraction is increasingly important. There is a very wide world of text extraction outside biosurveillance. The field has changed very fast, even in the past three years. Warning: although Bayes Classifiers are the simplest to implement, Logistic Regression or other discriminative methods often learn more accurately. Consider using off-the-shelf methods, such as William Cohen's successful “Minorthird” open-source libraries: Real systems (including CoCo) have many ingenious special-case improvements.

87 Discussion
1. What new data sources should we apply algorithms to? (e.g. self-reporting?)
2. What are related surveillance problems to which these kinds of algorithms can be applied?
3. Where are the gaps in the current algorithms world?
4. Are there other spatial problems out there?
5. Could new or pre-existing algorithms help in the period after an outbreak is detected?
6. Other comments about favorite tools of the trade.