Kansas State University, Department of Computing and Information Sciences
CIS 830: Advanced Topics in Artificial Intelligence
Lecture 21: Uncertain Reasoning Discussion (2 of 4): Learning Bayesian Network Structure
Friday, March 10, 2000
William H. Hsu, Department of Computing and Information Sciences, KSU
Readings: "Learning Bayesian Network Structure from Massive Datasets", Friedman, Nachman, and Pe'er (reference); Section 6.11, Mitchell

Lecture Outline
Suggested Reading: Section 6.11, Mitchell
Overview of Bayesian Learning (Continued)
Bayes's Theorem (Continued)
– Definition of conditional (posterior) probability
– Ramifications of Bayes's Theorem: answering probabilistic queries, MAP hypotheses
Generating Maximum A Posteriori (MAP) Hypotheses
Generating Maximum Likelihood (ML) Hypotheses
Later
– Applications of probability in KDD: learning over text, learning over hypermedia documents, general HCII (Yuhui Liu: March 13, 2000)
– Causality (Yue Jiao: March 17, 2000)

Probability: Basic Definitions and Axioms
Sample Space (Ω): Range of a Random Variable X
Probability Measure Pr(Ω)
– Ω denotes a range of outcomes; X: Ω
– Probability P: measure over 2^Ω (the power set of the sample space, aka event space)
– In a general sense, Pr(X = x), x ∈ Ω, is a measure of belief in X = x
  P(X = x) = 0 or P(X = x) = 1: plain (aka categorical) beliefs (can't be revised)
  All other beliefs are subject to revision
Kolmogorov Axioms
– 1. ∀x ∈ Ω. 0 ≤ P(X = x) ≤ 1
– 2. P(Ω) ≡ Σ_{x ∈ Ω} P(X = x) = 1
– 3. P(X₁ ∨ X₂) = P(X₁) + P(X₂) − P(X₁ ∧ X₂)
Joint Probability: P(X₁ ∧ X₂) ≡ probability of the joint event X₁ ∧ X₂
Independence: P(X₁ ∧ X₂) = P(X₁) · P(X₂)
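As a minimal illustration (not from the original slides), the axioms can be checked mechanically for a small discrete distribution; the outcome names and values below are made up.

```python
# Minimal sketch: checking the Kolmogorov axioms for a hypothetical
# discrete distribution over a small sample space Omega.
P = {"sunny": 0.6, "rain": 0.3, "snow": 0.1}   # assumed example values

assert all(0.0 <= p <= 1.0 for p in P.values())   # axiom 1: 0 <= P(x) <= 1
assert abs(sum(P.values()) - 1.0) < 1e-12          # axiom 2: probabilities sum to 1

# Axiom 3 for disjoint events: P(sunny or rain) = P(sunny) + P(rain)
print(P["sunny"] + P["rain"])                      # 0.9
```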

Choosing Hypotheses
Bayes's Theorem: P(h | D) = P(D | h) P(h) / P(D)
MAP Hypothesis
– Generally want the most probable hypothesis given the training data
– Define: argmax_{x ∈ Ω} f(x) ≡ the value of x in the sample space Ω with the highest f(x)
– Maximum a posteriori hypothesis: h_MAP ≡ argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h)
ML Hypothesis
– Assume that P(hᵢ) = P(hⱼ) for all pairs i, j (uniform priors, i.e., P_H ~ Uniform)
– Can then further simplify and choose the maximum likelihood hypothesis: h_ML ≡ argmax_{h ∈ H} P(D | h)
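The sketch below (not from the slides) makes the two argmax criteria concrete; the hypothesis names, priors, and likelihoods are invented for illustration.

```python
# Hypothetical priors P(h) and likelihoods P(D | h) for three candidate hypotheses.
prior = {"h1": 0.7, "h2": 0.2, "h3": 0.1}
likelihood = {"h1": 0.02, "h2": 0.05, "h3": 0.04}   # P(D | h) for the observed data D

h_map = max(prior, key=lambda h: likelihood[h] * prior[h])   # argmax_h P(D | h) P(h)
h_ml = max(prior, key=lambda h: likelihood[h])               # argmax_h P(D | h)

print(h_map, h_ml)   # h1 h2: h_MAP weighs the prior, h_ML ignores it
```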

Bayes's Theorem: Query Answering (QA)
Answering User Queries
– Suppose we want to perform intelligent inferences over a database DB
  Scenario 1: DB contains records (instances), some "labeled" with answers
  Scenario 2: DB contains probabilities (annotations) over propositions
– QA: an application of probabilistic inference
QA Using Prior and Conditional Probabilities: Example
– Query: Does the patient have cancer or not?
– Suppose: the patient takes a lab test and the result comes back positive
  Correct + result in only 98% of the cases in which the disease is actually present
  Correct − result in only 97% of the cases in which the disease is not present
  Only 0.008 of the entire population has this cancer
– P(false negative for H₀ ≡ Cancer) = 0.02 (NB: for 1-point sample)
– P(false positive for H₀ ≡ Cancer) = 0.03 (NB: for 1-point sample)
– P(+ | H₀) P(H₀) = 0.98 × 0.008 = 0.0078; P(+ | H_A) P(H_A) = 0.03 × 0.992 = 0.0298 ⇒ h_MAP = H_A ≡ ¬Cancer
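A short sketch (not part of the original slides) that reproduces this computation, assuming the numbers above from Mitchell's example.

```python
# Cancer diagnosis example: posterior over {cancer, no cancer} given a positive test.
p_cancer = 0.008
p_pos_given_cancer = 0.98       # sensitivity: correct + result when disease present
p_pos_given_no_cancer = 0.03    # false-positive rate

joint_cancer = p_pos_given_cancer * p_cancer               # P(+ | cancer) P(cancer)
joint_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)   # P(+ | ¬cancer) P(¬cancer)
p_pos = joint_cancer + joint_no_cancer                      # P(+), by total probability

print(joint_cancer, joint_no_cancer)   # 0.00784 vs. 0.02976 -> h_MAP is ¬cancer
print(joint_cancer / p_pos)            # P(cancer | +) ≈ 0.21
```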

Basic Formulas for Probabilities
Product Rule (Alternative Statement of Bayes's Theorem): P(A ∧ B) = P(A | B) P(B)
– Proof: requires axiomatic set theory, as does Bayes's Theorem
Sum Rule: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
– Sketch of proof (immediate from axiomatic set theory): draw a Venn diagram of two sets denoting events A and B; let A ∨ B denote the event corresponding to A ∪ B
Theorem of Total Probability
– Suppose events A₁, A₂, …, Aₙ are mutually exclusive and exhaustive
  Mutually exclusive: i ≠ j ⇒ Aᵢ ∧ Aⱼ = ∅
  Exhaustive: Σᵢ P(Aᵢ) = 1
– Then P(B) = Σᵢ P(B | Aᵢ) P(Aᵢ)
– Proof: follows from the product rule and the 3rd Kolmogorov axiom
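A small numeric check (mine, not the slides'): with assumed values for P(Aᵢ) and P(B | Aᵢ), the theorem of total probability and the product rule can be exercised directly.

```python
# Assumed example: three mutually exclusive, exhaustive events A1..A3.
p_A = [0.5, 0.3, 0.2]            # P(A_i); sums to 1 (exhaustive)
p_B_given_A = [0.9, 0.4, 0.1]    # P(B | A_i)

# Theorem of total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(p_B)                            # 0.59

# Product rule P(A_1 ∧ B) = P(B | A_1) P(A_1), then Bayes: P(A_1 | B) = that / P(B)
print(p_B_given_A[0] * p_A[0] / p_B)  # ≈ 0.76
```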

MAP and ML Hypotheses: A Pattern Recognition Framework
Pattern Recognition Framework
– Automated speech recognition (ASR), automated image recognition
– Diagnosis
Forward Problem: One Step in ML Estimation
– Given: model h, observations (data) D
– Estimate: P(D | h), the "probability that the model generated the data"
Backward Problem: Pattern Recognition / Prediction Step
– Given: model h, observations D
– Maximize: P(h(X) = x | h, D) for a new X (i.e., find the best x)
Forward-Backward (Learning) Problem
– Given: model space H, data D
– Find: h ∈ H such that P(h | D) is maximized (i.e., the MAP hypothesis)
More Info
– HiddenMarkovModels.html
– Emphasis on a particular H (the space of hidden Markov models)
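As a concrete, purely illustrative instance of the forward problem, the sketch below computes P(D | h) for a hypothetical two-state hidden Markov model using the forward algorithm; all parameter values are made up.

```python
import numpy as np

# Hypothetical 2-state HMM h = (initial, transition, emission); observations are 0 or 1.
initial = np.array([0.6, 0.4])                  # P(state at t=0)
transition = np.array([[0.7, 0.3],
                       [0.4, 0.6]])             # transition[i, j] = P(state j | state i)
emission = np.array([[0.9, 0.1],
                     [0.2, 0.8]])               # emission[i, o] = P(obs o | state i)

def likelihood(obs):
    """Forward algorithm: return P(D | h) for an observation sequence D."""
    alpha = initial * emission[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ transition) * emission[:, o]
    return alpha.sum()

print(likelihood([0, 1, 1, 0]))   # the probability this model assigns to the data
```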

Bayesian Learning Example: Unbiased Coin [1]
Coin Flip
– Sample space: Ω = {Head, Tail}
– Scenario: the given coin is either fair or has a 60% bias in favor of Head
  h₁ ≡ fair coin: P(Head) = 0.5
  h₂ ≡ 60% bias towards Head: P(Head) = 0.6
– Objective: to decide between the default (null) and alternative hypotheses
A Priori (aka Prior) Distribution on H
– P(h₁) = 0.75, P(h₂) = 0.25
– Reflects the learning agent's prior beliefs regarding H
– Learning is revision of the agent's beliefs
Collection of Evidence
– First piece of evidence: d ≡ a single coin toss, comes up Head
– Q: What does the agent believe now?
– A: Compute P(d) = P(d | h₁) P(h₁) + P(d | h₂) P(h₂)

Bayesian Learning Example: Unbiased Coin [2]
Bayesian Inference: Compute P(d) = P(d | h₁) P(h₁) + P(d | h₂) P(h₂)
– P(Head) = 0.5 × 0.75 + 0.6 × 0.25 = 0.375 + 0.15 = 0.525
– This is the probability of the observation d = Head
Bayesian Learning
– Now apply Bayes's Theorem
  P(h₁ | d) = P(d | h₁) P(h₁) / P(d) = 0.375 / 0.525 ≈ 0.714
  P(h₂ | d) = P(d | h₂) P(h₂) / P(d) = 0.15 / 0.525 ≈ 0.286
  Belief has been revised downwards for h₁, upwards for h₂
  The agent still thinks that the fair coin is the more likely hypothesis
– Suppose we were to use the ML approach (i.e., assume equal priors)
  Belief in h₂ is revised upwards from 0.5 (to about 0.545), and belief in h₁ downwards
  The data then supports the biased coin better
More Evidence: Sequence D of 100 coin flips with 70 heads and 30 tails
– P(D | h₁) = (0.5)⁷⁰ (0.5)³⁰, P(D | h₂) = (0.6)⁷⁰ (0.4)³⁰
– Now P(h₁ | D) << P(h₂ | D)
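A short reproduction (mine, not from the slides) of both updates, using the priors and per-flip probabilities given above; the 70/30 sequence is scored in log space to avoid underflow.

```python
import math

prior = {"fair": 0.75, "biased": 0.25}
p_head = {"fair": 0.5, "biased": 0.6}

# Single observation d = Head
joint = {h: p_head[h] * prior[h] for h in prior}
p_d = sum(joint.values())                           # 0.525
print({h: joint[h] / p_d for h in prior})           # fair ≈ 0.714, biased ≈ 0.286

# Sequence D: 70 heads and 30 tails, scored in log space
log_post = {h: math.log(prior[h])
               + 70 * math.log(p_head[h])
               + 30 * math.log(1 - p_head[h]) for h in prior}
shift = max(log_post.values())
unnorm = {h: math.exp(lp - shift) for h, lp in log_post.items()}
total = sum(unnorm.values())
print({h: unnorm[h] / total for h in prior})        # P(fair | D) << P(biased | D)
```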

Bayesian Concept Learning and Version Spaces
Assumptions
– Fixed set of instances <x₁, x₂, …, xₘ>
– Let D denote the set of classifications: D = <c(x₁), c(x₂), …, c(xₘ)>
Choose P(D | h)
– P(D | h) = 1 if h is consistent with D (i.e., ∀xᵢ. h(xᵢ) = c(xᵢ))
– P(D | h) = 0 otherwise
Choose P(h) ~ Uniform
– Uniform distribution: P(h) = 1 / |H|
– Uniform priors correspond to "no background knowledge" about h
– Recall: maximum entropy
MAP Hypothesis
– P(h | D) = 1 / |VS_{H,D}| if h is consistent with D, and 0 otherwise
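The sketch below (not from the slides) illustrates this on a toy hypothesis space of threshold classifiers: with uniform priors and the 0/1 likelihood, every hypothesis consistent with D gets posterior 1 / |VS_{H,D}|. The data values are invented.

```python
# Toy hypothesis space: h_t(x) = 1 iff x >= t, for thresholds t in {0, 1, ..., 10}.
H = list(range(11))
data = [(2, 0), (4, 0), (7, 1), (9, 1)]          # hypothetical labeled instances (x, c(x))

def consistent(t):
    return all((1 if x >= t else 0) == label for x, label in data)

version_space = [t for t in H if consistent(t)]  # thresholds 5, 6, 7 here

# P(h | D) = 1/|VS| for consistent h, 0 otherwise (uniform prior, 0/1 likelihood)
posterior = {t: (1 / len(version_space) if t in version_space else 0.0) for t in H}
print(version_space, posterior[6])               # [5, 6, 7]  0.333...
```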

Evolution of Posterior Probabilities
Start with Uniform Priors
– Equal probabilities assigned to each hypothesis
– Maximum uncertainty (entropy), minimum prior information
Evidential Inference
– Introduce data (evidence) D₁: belief revision occurs
  The learning agent revises the conditional probability of inconsistent hypotheses to 0
  Posterior probabilities for the remaining h ∈ VS_{H,D} are revised upward
– Add more data (evidence) D₂: further belief revision
[Figure: bar charts of P(h), P(h | D₁), and P(h | D₁, D₂) over the hypothesis space]

Most Probable Classification of New Instances
MAP and MLE: Limitations
– Problem so far: "find the most likely hypothesis given the data"
– Sometimes we just want the best classification of a new instance x, given D
A Solution Method
– Find the best (MAP) h and use it to classify
– This may not be optimal, though!
– Analogy: estimating a distribution using the mode versus the integral; one finds the maximum, the other the area
Refined Objective
– Want to determine the most probable classification
– Need to combine the predictions of all hypotheses
– Predictions must be weighted by their conditional probabilities
– Result: Bayes Optimal Classifier (see CIS 798 Lecture 10)
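A minimal sketch (not from the slides) of the Bayes optimal classifier this points to: each hypothesis votes for a label, weighted by its posterior P(h | D). The posteriors and per-hypothesis predictions below are assumed values.

```python
# Hypothetical posteriors P(h | D) and each hypothesis's prediction for a new instance x.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
prediction = {"h1": "+", "h2": "-", "h3": "-"}

# Bayes optimal classification: argmax over labels v of sum_h P(v | h) P(h | D)
vote = {}
for h, p in posterior.items():
    vote[prediction[h]] = vote.get(prediction[h], 0.0) + p

print(max(vote, key=vote.get), vote)   # "-" wins (0.6) even though h_MAP = h1 predicts "+"
```

This is exactly the situation the slide warns about: classifying with the single MAP hypothesis and combining all hypotheses' weighted predictions can disagree.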

Midterm Review: Topics Covered
Review: Inductive Learning Framework
– Search in hypothesis space H
– Inductive bias: preference for some hypotheses over others
– Search in the space of hypothesis languages: bias optimization
Analytical Learning
– Learning architecture components: hypothesis languages, domain theory
– Learning algorithms: EBL, hybrid (analytical and inductive) learning
Artificial Neural Networks (ANN)
– Architectures (hypothesis languages): MLP, Boltzmann machine, GLIM hierarchy
– Algorithms: backpropagation (gradient), MDL, EM
– Tradeoffs and improvements: momentum, wake-sleep, modularity / HME
Bayesian Networks
– Learning architecture: BBN (graphical model of probability)
– Learning algorithms: CPT (e.g., gradient); structure (polytree, K2)
– Tradeoffs and improvements: polytrees vs. multiply-connected BBNs, etc.

Midterm Review: Applications and Concepts
Methods for Multistrategy (Integrated Inductive and Analytical) Learning
– Analytical learning to drive inductive learning: EBNN, Phantom Induction, advice-taking agents
– Interleaved analytical and inductive learning: Chown and Dietterich
Artificial Neural Networks in KDD
– Tradeoffs and improvements
  Reinforcement learning models: temporal differences, ANN methods
  Wake-sleep
  Modularity (mixture models and hierarchical mixtures of experts)
  Combining classifiers
– Applications to KDD: learning for pattern (e.g., image) recognition, planning
Bayesian Networks in KDD
– Advantages of probability, causal networks (BBNs)
– Applications to KDD: learning to reason

Terminology
Introduction to Bayesian Learning
– Probability foundations
– Definitions: subjectivist, frequentist, logicist, objectivist
– (3) Kolmogorov axioms
Bayes's Theorem
– Prior probability of an event
– Joint probability of an event
– Conditional (posterior) probability of an event
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
– MAP hypothesis: highest conditional probability given observations (data)
– ML hypothesis: highest likelihood of generating the observed data
– ML estimation (MLE): estimating parameters to find the ML hypothesis
Bayesian Inference: computing conditional probabilities (CPs) in a model
Bayesian Learning: searching the model (hypothesis) space using CPs

Summary Points
Introduction to Bayesian Learning
– Framework: using probabilistic criteria to search H
– Probability foundations
  Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
  Kolmogorov axioms
Bayes's Theorem
– Definition of conditional (posterior) probability
– Product rule
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
– Bayes's Rule and MAP
– Uniform priors: allow use of MLE to generate MAP hypotheses
– Relation to version spaces, candidate elimination
Next Class: Presentation on Learning Bayesian (Belief) Network Structure
– For more on Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
– Soon: user modeling using BBNs, causality