Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3

Lecture 13: Learning Bayesian Belief Networks
– Taxonomy of methods
– Learning BBNs for the fully observable data, known structure case

So why are BBNs relevant to Cognitive CV?
– They provide a well-founded methodology for reasoning with uncertainty
– These methods are the basis for our model of perception guided by expectation
– We can develop well-founded methods of learning rather than being stuck with hand-coded models

Reminder: what is a BBN?
– A compact representation of the joint probability
– Each variable is represented as a node
– Conditional independence assumptions are encoded using a set of arcs
– Different types of graph exist; the one shown is a Directed Acyclic Graph (DAG)

[Figure: DAG over the nodes A, B, O, N and C, with A and B as parents of O, and O as the parent of both N and C]

p(a=detect) = 0.2    p(b=detect) = 0.1

  a  b  | p(o=T | A, B)      o | p(n=T | O)      o | p(c=T | O)
  T  T  | 0.95               T | 0.7             T | 0.7
  T  F  | 0.6                F | 0.2             F | 0.1
  F  T  | 0.5
  F  F  | 0.01
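As a concrete illustration of this compact representation, here is a minimal sketch in plain Python (the variable and function names are ours, not from the slides) that evaluates a joint probability using the factorisation implied by the DAG and the CPD values above:

# Joint probability encoded by the DAG:
#   P(A, B, O, N, C) = P(A) P(B) P(O | A, B) P(N | O) P(C | O)
# CPD values are taken from the slide; True corresponds to "detect" / T.

P_A = {True: 0.2, False: 0.8}            # p(a = detect)
P_B = {True: 0.1, False: 0.9}            # p(b = detect)
P_O_given_AB = {(True, True): 0.95, (True, False): 0.6,
                (False, True): 0.5, (False, False): 0.01}   # p(o=T | a, b)
P_N_given_O = {True: 0.7, False: 0.2}    # p(n=T | o)
P_C_given_O = {True: 0.7, False: 0.1}    # p(c=T | o)

def bernoulli(p_true, value):
    """Probability of a binary variable taking `value`, given p(value=True)."""
    return p_true if value else 1.0 - p_true

def joint(a, b, o, n, c):
    """Joint probability of one full assignment, using the DAG factorisation."""
    return (bernoulli(P_A[True], a) *
            bernoulli(P_B[True], b) *
            bernoulli(P_O_given_AB[(a, b)], o) *
            bernoulli(P_N_given_O[o], n) *
            bernoulli(P_C_given_O[o], c))

# Example: P(a=T, b=F, o=T, n=T, c=F) = 0.2 * 0.9 * 0.6 * 0.7 * 0.3
print(joint(True, False, True, True, False))   # ≈ 0.02268

Any joint entry is obtained the same way, so the five small CPDs stand in for the full 2^5-entry joint table.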

Why is learning important in the context of BBNs?
– Knowledge acquisition can be an expensive process
– Experts may not be readily available (scarce knowledge) or simply not exist
– But you might have a lot of data from (say) case studies
– Learning allows us to construct BBN models from the data, and in the process gain insight into the nature of the problem domain

The process of learning
[Figure: the learning process takes the model structure (if known) and data (which may be full or partial) as inputs]

What do we mean by "partial" data?
Training data where there are missing values, e.g. for a discrete-valued BBN with 3 nodes A, B and O:

  a  b  o
  T  T  F
  F  T  F
  F  ?  T
  T  F  ?
  …
  F  T  F

What do we mean by "known" and "unknown" structure?
[Figure: the three nodes A, B and O shown twice – once with a fixed set of arcs (known structure) and once with the arcs left unspecified (unknown structure)]

Taxonomy of learning methods

                               Model structure
  Observability   Known                              Unknown
  Full            Maximum likelihood estimation      Search through model space
  Partial         Expectation Maximisation (EM)      EM + search through model space
                  or gradient descent                (structural EM)

In this lecture we will look at the full observability and known model structure case in detail. In the next lecture we will take an overview of the other three cases.

Full observability & known structure: getting the notation right
– The model parameters (CPDs) are represented as Θ (example later)
– Training data set D
– We want to find parameters to maximise P(Θ | D)
– The LIKELIHOOD function L(Θ : D) is P(D | Θ)

Full observability & known structure: getting the notation right
For the example structure over the nodes A, B and O, the training data D is a set of M complete joint observations (a[m], b[m], o[m]), and the likelihood is

  L(Θ : D) = ∏_m P(a[m], b[m], o[m] : Θ)

Factorising the likelihood expression
With A and B as parents of O, each training case factorises node-by-node, so the likelihood splits into one product per node:

  L(Θ : D) = ∏_m P(a[m] : Θ_A) P(b[m] : Θ_B) P(o[m] | a[m], b[m] : Θ_O|A,B)
           = [ ∏_m P(a[m] : Θ_A) ] [ ∏_m P(b[m] : Θ_B) ] [ ∏_m P(o[m] | a[m], b[m] : Θ_O|A,B) ]

Decomposition in general
The likelihood decomposes into one local term per node:

  L(Θ : D) = ∏_i ∏_m P(x_i[m] | pa_i[m] : Θ_i)

so all the parameters for each node can be estimated separately.
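To make the decomposition concrete, here is a small sketch (the data, parameter values and helper names are illustrative assumptions, not taken from the lecture) that checks numerically that the whole-model log-likelihood for the A → O ← B structure equals the sum of three independent per-node terms:

import math

# Records are (a, b, o); values and parameters are made up for illustration.
D = [(True, True, False), (False, True, False), (False, False, True),
     (True, False, False), (False, True, False)]

theta_A = 0.4                                              # P(a = T)
theta_B = 0.6                                              # P(b = T)
theta_O = {(True, True): 0.9, (True, False): 0.5,
           (False, True): 0.5, (False, False): 0.1}        # P(o = T | a, b)

def logp(p_true, value):
    """Log-probability of a binary value under a Bernoulli parameter."""
    return math.log(p_true if value else 1.0 - p_true)

# Whole-model log-likelihood, record by record.
ll_joint = sum(logp(theta_A, a) + logp(theta_B, b) + logp(theta_O[(a, b)], o)
               for a, b, o in D)

# The same quantity, regrouped as three per-node local log-likelihoods.
ll_A = sum(logp(theta_A, a) for a, _, _ in D)
ll_B = sum(logp(theta_B, b) for _, b, _ in D)
ll_O = sum(logp(theta_O[(a, b)], o) for a, b, o in D)

assert abs(ll_joint - (ll_A + ll_B + ll_O)) < 1e-9
print(ll_joint, ll_A + ll_B + ll_O)

Because the terms never interact, maximising the total is the same as maximising each per-node term on its own.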

Example: estimating the parameter for a root node
– Let's say our training data D contains these values for A: {T, F, T, T, F, T, T, T}
– We represent our single parameter Θ as the probability that a = T
– The likelihood for the sequence is then

  L(Θ : D) = Θ (1−Θ) Θ Θ (1−Θ) Θ Θ Θ = Θ^6 (1−Θ)^2
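A short sketch (plain Python, names are ours) that evaluates this likelihood over a grid of Θ values; the maximum sits at Θ = 6/8 = 0.75, anticipating the MLE quoted on the next slide:

# Likelihood of the training sequence for root node A as a function of Theta.
D = [True, False, True, True, False, True, True, True]

def likelihood(theta):
    n_true = sum(D)                      # 6 observations of a = T
    n_false = len(D) - n_true            # 2 observations of a = F
    return theta ** n_true * (1.0 - theta) ** n_false

# Evaluate on a coarse grid and pick the maximiser.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=likelihood)
print(best, likelihood(best))            # best ≈ 0.75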

So what about the prior on Θ?
– We have an expression for P(a[1], …, a[M] : Θ); all we need to do now is to say something about P(Θ)
– If all values of Θ were equally likely at the outset, then we have a MAXIMUM LIKELIHOOD ESTIMATE (MLE) for P(Θ | a[1], …, a[M]), which for our example is Θ = 0.75, i.e. p(a=T) = 0.75

So what about the prior on Θ?
– If P(Θ) is not uniform, we need to take that into account when computing our estimate for a model parameter
– In that case, the Θ that maximises P(Θ | x[1], …, x[M]) is a MAXIMUM A POSTERIORI (MAP) estimate
– There are many different forms of prior; one of the more common ones in this application is the DIRICHLET prior …

The Dirichlet prior
For our binary-valued parameter Θ, a Dirichlet prior takes the form

  p(Θ) = Dirichlet(α_T, α_F) ∝ Θ^(α_T − 1) (1 − Θ)^(α_F − 1)

where the hyper-parameters α_T and α_F act as pseudo-counts for the two values of A.
[Figure: plot of the Dirichlet density p(Θ) against Θ]
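As a rough illustration of how the prior interacts with the data, the sketch below combines an assumed Dirichlet(5, 5) prior (the hyper-parameter values are our choice, not the lecture's) with the counts from the running example; the posterior is Dirichlet(α_T + 6, α_F + 2), and its mode gives a MAP estimate pulled from the MLE towards the prior:

# Combining a Dirichlet prior with the counts 6 x T, 2 x F from the example.
alpha_T, alpha_F = 5.0, 5.0      # assumed prior pseudo-counts (illustrative)
N_T, N_F = 6, 2                  # observed counts of a = T and a = F

mle = N_T / (N_T + N_F)
# Mode of the Dirichlet(alpha_T + N_T, alpha_F + N_F) posterior.
map_estimate = (alpha_T + N_T - 1) / (alpha_T + alpha_F + N_T + N_F - 2)

print(mle)            # 0.75
print(map_estimate)   # 10/16 = 0.625, shrunk towards the 0.5 prior mean

With more data the observed counts dominate the pseudo-counts and the MAP estimate approaches the MLE.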

Semantic priors
– If the training data D is sorted into known classes, the priors can be estimated beforehand; these are called "semantic priors"
– This involves an element of hand coding and loses the advantage of gaining insight into the problem domain through learning
– It does, however, give the advantage of mapping onto expert knowledge of the classes in the problem

Summary
– Estimation relies on sufficient statistics
– For the ML estimate for discrete-valued nodes, we use counts #:

  Θ_ML(x | pa) = #(x, pa) / #(pa)

– For the MAP estimate, we have to account for the prior as well as the counts
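The counting view translates directly into code. This sketch (data and helper names are illustrative) estimates p(o = T | a, b) for a conditional node by ML, and also shows one common way of folding in a Dirichlet prior by adding its pseudo-counts to the observed counts before normalising (treat that exact form as an assumption rather than the slide's own formula):

from collections import Counter

# Fully observed records (a, b, o); made-up data for illustration.
D = [(True, True, True), (True, True, False), (True, True, True),
     (False, True, True), (False, False, False), (False, False, False)]

pair_counts = Counter((a, b) for a, b, _ in D)          # #(a, b)
true_counts = Counter((a, b) for a, b, o in D if o)     # #(a, b, o = T)

def ml_estimate(a, b):
    """ML estimate: #(a, b, o=T) / #(a, b)."""
    return true_counts[(a, b)] / pair_counts[(a, b)]

def smoothed_estimate(a, b, alpha_T=1.0, alpha_F=1.0):
    """Prior-smoothed estimate: add Dirichlet pseudo-counts before normalising."""
    return (true_counts[(a, b)] + alpha_T) / (pair_counts[(a, b)] + alpha_T + alpha_F)

print(ml_estimate(True, True))         # 2/3
print(smoothed_estimate(True, True))   # (2 + 1) / (3 + 2) = 0.6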

Next time …
Overview of methods for learning BBNs:
– Full data and unknown structure
– Partial data and known structure
– Partial data and unknown structure

There is an excellent tutorial by Koller and Friedman; some of today's slides were adapted from that tutorial.