Slide 1: Machine Learning – Lecture 21: Learning Bayesian Networks & Extensions
27.01.2014, Bastian Leibe, RWTH Aachen
http://www.vision.rwth-aachen.de – leibe@vision.rwth-aachen.de
Slide 2: Announcements
Exam
- 1st date: Monday, 17.02., 13:30–17:30h
- 2nd date: Monday, 17.03., 09:30–12:30h
- Closed-book exam; the core exam time will be 2h.
- Admission requirement: 50% of the exercise points or a passed test exam.
- We will send around an announcement with the exact starting times and places by email.
Test exam
- Date: Thursday, 06.02., 10:15–11:45h, room 5056
- Core exam time will be 1h.
- Purpose: prepare you for the questions you can expect.
- Possibility to collect bonus exercise points!
Slide 3: Announcements (2)
Last lecture next Tuesday: Repetition
- Summary of all topics in the lecture
- "Big picture" and current research directions
- Opportunity to ask questions
Please use this opportunity and prepare questions!
Slide 4: Course Outline
- Fundamentals (2 weeks): Bayes Decision Theory, Probability Density Estimation
- Discriminative Approaches (5 weeks): Linear Discriminant Functions, Statistical Learning Theory & SVMs, Ensemble Methods & Boosting, Decision Trees & Randomized Trees
- Generative Models (4 weeks): Bayesian Networks, Markov Random Fields, Exact Inference, Applications & Extensions
Slide 5: Recap: Graph Cuts for Binary Problems
[Figure: s-t graph over the image pixels with n-links between neighbors, t-links to the terminals s and t, and a cut separating the two]
EM-style optimization: the "expected" intensities of object and background can be re-estimated. [Boykov & Jolly, ICCV'01]
Slide credit: Yuri Boykov – see Exercise 5.3
Slide 6: Recap: s-t Mincut Is Equivalent to Maxflow
[Figure: small example network with source, sink, intermediate nodes v1 and v2, edge capacities 2, 5, 9, 4, 2, 1; initial flow = 0]
Augmenting-path based algorithms:
1. Find a path from source to sink with positive capacity.
2. Push the maximum possible flow through this path.
3. Repeat until no such path can be found.
These algorithms assume non-negative capacities. (A minimal augmenting-path sketch follows below.)
Slide credit: Pushmeet Kohli – see Exercise 5.2
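To make the three-step recipe above concrete, here is a small, hedged Python sketch of the classic augmenting-path scheme (Edmonds-Karp, with BFS to find the shortest augmenting path). It only illustrates the idea; graph-cut codes in vision use the much faster Boykov-Kolmogorov algorithm, and the example capacities below are made up rather than taken from the slide's figure.

    from collections import deque

    def max_flow(capacity, source, sink):
        """Edmonds-Karp: repeatedly find augmenting paths via BFS and push flow.

        capacity: dict-of-dicts, capacity[u][v] = capacity of edge u -> v.
        Illustrative sketch only, not the Boykov-Kolmogorov algorithm."""
        # residual graph: copy capacities and add reverse edges with capacity 0
        residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
        for u, nbrs in capacity.items():
            for v in nbrs:
                residual.setdefault(v, {}).setdefault(u, 0)

        flow = 0
        while True:
            # step 1: BFS for a source->sink path with positive residual capacity
            parent = {source: None}
            queue = deque([source])
            while queue and sink not in parent:
                u = queue.popleft()
                for v, cap in residual[u].items():
                    if cap > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if sink not in parent:           # step 3: no augmenting path left -> done
                return flow
            # step 2: push the bottleneck capacity along the found path
            path, v = [], sink
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            bottleneck = min(residual[u][v] for u, v in path)
            for u, v in path:
                residual[u][v] -= bottleneck   # use up forward capacity
                residual[v][u] += bottleneck   # add residual capacity backwards
            flow += bottleneck

    # toy network with the same node names as the figure (capacities are illustrative)
    caps = {'s': {'v1': 9, 'v2': 5}, 'v1': {'v2': 4, 't': 2}, 'v2': {'t': 6}, 't': {}}
    print(max_flow(caps, 's', 't'))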
Slide 7: Recap: When Can s-t Graph Cuts Be Applied?
Binary energy with unary potentials (t-links) and pairwise potentials (n-links):
    E(L) = \sum_p E_p(L_p) + \sum_{(p,q)} E_{pq}(L_p, L_q)
s-t graph cuts can only globally minimize binary energies E(L) that are submodular:
    E_{pq}(0,0) + E_{pq}(1,1) \le E_{pq}(0,1) + E_{pq}(1,0)
Submodularity is the discrete equivalent of convexity: it implies that every local energy minimum is a global minimum, so the solution will be globally optimal.
[Boros & Hammer, 2002; Kolmogorov & Zabih, 2004]
Slide 8: Recap: α-Expansion Move
Basic idea: break the multi-way cut computation into a sequence of binary s-t cuts (label α vs. all other labels).
The result is no longer globally optimal, but the approximation quality is guaranteed, and the algorithm typically converges in a few iterations.
Slide credit: Yuri Boykov
Slide 9: Recap: Converting an MRF to an s-t Graph
Pseudocode (unary costs become t-links, pairwise costs become n-links):

    Graph *g;
    /* add one graph node per pixel */
    for all pixels p
        nodeID(p) = g->add_node();
        /* set the costs of the terminal edges (t-links) */
        set_weights(nodeID(p), fgCost(p), bgCost(p));
    end
    /* set the costs of the n-links between neighboring pixels */
    for all adjacent pixel pairs p, q
        add_weights(nodeID(p), nodeID(q), cost(p,q));
    end
    g->compute_maxflow();
    /* the side of the cut a node ends up on is the label of pixel p (0 or 1) */
    label(p) = g->is_connected_to_source(nodeID(p));

[Figure: two-pixel example a1, a2 with terminals source (label 0) and sink (label 1), t-link weights fgCost(a_i) and bgCost(a_i), an n-link of weight cost(p,q), and the resulting labeling a1 = bg, a2 = fg]
(A runnable version using the PyMaxflow library is sketched below.)
Slide credit: Pushmeet Kohli – see Exercise 5.4
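The following is a hedged, runnable counterpart of the pseudocode above using the PyMaxflow library (pip install PyMaxflow). The cost arrays are made-up illustrative numbers, and the exact source/sink orientation of the t-link capacities is a convention you should verify against the library documentation on a toy example.

    import numpy as np
    import maxflow

    # illustrative unary costs for a 2x2 "image" (not the values from the slide)
    fg_cost = np.array([[3.0, 1.0],
                        [0.5, 4.0]])    # cost of labeling each pixel foreground
    bg_cost = np.array([[0.5, 2.0],
                        [3.0, 0.5]])    # cost of labeling each pixel background
    pairwise = 1.0                      # Potts smoothness weight for the n-links

    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(fg_cost.shape)      # one node per pixel
    g.add_grid_tedges(nodes, fg_cost, bg_cost)   # t-links carry the unary costs
    g.add_grid_edges(nodes, pairwise)            # 4-connected n-links, constant weight

    g.maxflow()
    labels = g.get_grid_segments(nodes)          # which side of the cut each pixel is on
    print(labels)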
Slide 10: Topics of This Lecture
- Learning Bayesian Networks: learning with known structure and full observability; learning with known structure and partial observability; structure learning
- Models for Sequential Data: independence assumptions; Markov models
- Hidden Markov Models (HMMs): traditional view; graphical model view; extensions
Slide 11: Bayesian Networks
What we have learned so far:
- They are directed graphical models.
- Their joint probability factorizes into conditional probabilities, p(x_1, \dots, x_K) = \prod_k p(x_k \mid \mathrm{pa}_k).
- We know how to convert them into undirected graphs.
- We know how to perform inference in them: Sum-/Max-Product BP for exact inference in (poly)tree-shaped BNs; Loopy BP for approximate inference in arbitrary BNs; the Junction Tree algorithm for converting arbitrary BNs into trees.
But what are they actually good for? How do we apply them in practice? And how do we learn their parameters?
Image source: C. Bishop, 2006
Slide 12: Parameter Learning in Bayesian Networks
We need to specify two things:
- the structure of the Bayesian network (graph topology), and
- the parameters of each conditional probability table (CPT).
It is possible to learn both from training data, but learning structure is much harder than learning parameters. Also, learning when some nodes are hidden is much harder than when everything is observable. Four cases:

    Structure | Observability | Method
    Known     | Full          | Maximum likelihood estimation
    Known     | Partial       | EM (or gradient ascent)
    Unknown   | Full          | Search through model space
    Unknown   | Partial       | EM + search through model space
Slide 13: Learning Parameters
Example: assume each variable x_i is discrete and can take K_i values. The parameters of this model can be represented with 4 tables (called conditional probability tables, CPTs):
- p(x_1 = k) = \theta_{1,k};  \theta_1 has K_1 entries.
- p(x_2 = k' \mid x_1 = k) = \theta_{2,k,k'};  \theta_2 has K_1 \times K_2 entries.
- p(x_3 = k' \mid x_1 = k) = \theta_{3,k,k'}
- p(x_4 = k' \mid x_2 = k) = \theta_{4,k,k'}
- Note that each CPT row must normalize: \sum_{k'} \theta_{i,k,k'} = 1.
Slide credit: Zoubin Ghahramani
Slide 14: Case 1: Known Structure, Full Observability
Assume a training data set D = \{x^{(1)}, \dots, x^{(N)}\}. How do we learn \theta from D?
Maximum likelihood:
    L(\theta) = p(D \mid \theta) = \prod_n \prod_i p(x_i^{(n)} \mid x_{\mathrm{pa}(i)}^{(n)}, \theta_i)
Maximum log-likelihood:
    \log L(\theta) = \sum_n \sum_i \log p(x_i^{(n)} \mid x_{\mathrm{pa}(i)}^{(n)}, \theta_i)
Slide credit: Zoubin Ghahramani
Slide 15: Case 1: Known Structure, Full Observability
The maximum log-likelihood
    \log L(\theta) = \sum_i \sum_n \log p(x_i^{(n)} \mid x_{\mathrm{pa}(i)}^{(n)}, \theta_i)
decomposes into a sum of functions of the individual \theta_i, so each \theta_i can be optimized separately:
    \hat{\theta}_{i,k,k'} = n_{i,k,k'} / \sum_{k''} n_{i,k,k''}
where n_{i,k,k'} is the number of times in D that x_i = k' and x_{\mathrm{pa}(i)} = k.
ML solution: simply calculate frequencies! (A small counting sketch follows below.)
Slide credit: Zoubin Ghahramani
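A minimal sketch of this counting recipe for a small discrete network; the structure, state counts, and toy data below are illustrative assumptions, not taken from the lecture.

    import numpy as np

    # structure of a 4-node example: parent of each node (None = no parent)
    parents = {0: None, 1: 0, 2: 0, 3: 1}
    K = {0: 2, 1: 3, 2: 2, 3: 2}          # number of states per variable

    # toy dataset: one row per sample, one column per variable
    D = np.array([[0, 1, 0, 1],
                  [1, 2, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 1],
                  [0, 2, 0, 0]])

    def ml_cpts(D, parents, K, alpha=0.0):
        """Return {i: CPT}, where CPT[k, k'] = p(x_i = k' | x_pa(i) = k).
        alpha > 0 adds Laplace smoothing (not part of the plain ML estimate)."""
        cpts = {}
        for i, pa in parents.items():
            if pa is None:
                counts = np.bincount(D[:, i], minlength=K[i]).astype(float) + alpha
                cpts[i] = counts / counts.sum()
            else:
                counts = np.full((K[pa], K[i]), alpha)
                for n in range(D.shape[0]):
                    counts[D[n, pa], D[n, i]] += 1.0   # accumulate n_{i,k,k'}
                cpts[i] = counts / counts.sum(axis=1, keepdims=True)
        return cpts

    for i, cpt in ml_cpts(D, parents, K, alpha=1.0).items():
        print(f"CPT of x{i}:\n{cpt}")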
Slide 16: Case 2: Known Structure, Hidden Variables
ML learning with hidden variables: assume a model parameterized by \theta with observed variables X and hidden (latent) variables Z.
Goal: maximize the parameter log-likelihood given the observed data,
    L(\theta) = \log p(X \mid \theta) = \log \sum_Z p(X, Z \mid \theta).
EM algorithm: iterate between two steps:
- E-step: fill in the hidden / missing variables.
- M-step: apply complete-data learning to the filled-in data.
[Figure: example network with hidden variables Z1, Z2, Z3 and an observed variable X]
Slide adapted from Zoubin Ghahramani
Slide 17: Learning with Hidden Variables: EM Algorithm
Goal: maximize the parameter log-likelihood given the observed data.
Derivation: we do not know the values of the latent variables Z, but we can express their posterior distribution given X and (an initial guess for) \theta.
- E-step: evaluate p(Z \mid X, \theta^{\mathrm{old}}).
- M-step: since we cannot use the complete-data log-likelihood directly, we maximize its expected value under the posterior distribution of Z:
    \theta^{\mathrm{new}} = \arg\max_\theta Q(\theta, \theta^{\mathrm{old}}),  where  Q(\theta, \theta^{\mathrm{old}}) = \sum_Z p(Z \mid X, \theta^{\mathrm{old}}) \log p(X, Z \mid \theta).
Slide 18: Learning with Hidden Variables: EM Algorithm
Note on the E-step:
- The E-step requires solving the inference problem, i.e., finding the distribution over the hidden variables given the current model parameters.
- This can be done using belief propagation or the junction tree algorithm.
- Since inference becomes a subroutine of the learning procedure, fast inference algorithms are crucial!
Slide adapted from Bernt Schiele
Slide 20: Example Application: Mixture-of-Gaussians Fitting with EM
Standard application of EM. Corresponding Bayesian network: [Figure: plate diagram with latent variables z_n, observed variables x_n, and parameters \pi, \mu, \Sigma]
Important point here: Bayesian networks can be treacherous! They hide the true complexity in a very simple-looking diagram.
- E.g., the diagram here only encodes that there are latent variables z_n connected to the observed variables x_n and the parameters \theta = (\pi, \mu, \Sigma) (parameter arrows are optional).
- The information that p(x_n \mid z_n, \theta) encodes a mixture of Gaussians needs to be communicated additionally!
- On the other hand, this general framework can also be used to apply EM to other types of distributions or latent variables. (A compact EM sketch for this model follows below.)
Image source: C.M. Bishop, 2006
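A minimal, hedged EM sketch for a 1-D mixture of Gaussians; the data and initialization are illustrative, not the lecture's reference implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(3.0, 1.0, 300)])

    K = 2
    pi = np.full(K, 1.0 / K)                 # mixing coefficients
    mu = rng.choice(x, K)                    # random initial means
    sigma = np.full(K, x.std())              # initial standard deviations

    def gauss(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    for it in range(100):
        # E-step: responsibilities gamma[n, k] = p(z_n = k | x_n, theta_old)
        gamma = pi * gauss(x[:, None], mu, sigma)        # shape (N, K)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: re-estimate theta = (pi, mu, sigma) from the expected counts
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
        pi = Nk / len(x)

    print("pi:", pi, "mu:", mu, "sigma:", sigma)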
Slide 21: Summary: Learning with Known Structure
ML learning with complete data (no hidden variables):
- The log-likelihood decomposes into a sum of functions of the \theta_i, so each \theta_i can be optimized separately.
- ML solution: simply calculate frequencies.
ML learning with incomplete data (hidden variables): iterative EM algorithm.
- E-step: compute the expected counts E[n_{i,j,k} \mid D, \theta^{(t)}] given the previous parameter setting \theta^{(t)}.
- M-step: re-estimate the parameters \theta using the expected counts.
Slide credit: Bernt Schiele
Slide 22: Cases 3+4: Unknown Structure
Goal: learn a directed acyclic graph (DAG) that best explains the data.
- Constraint-based learning: use statistical tests of marginal and conditional independence; find the set of DAGs whose d-separation relations match the results of those tests.
- Score-based learning: use a global score such as the BIC (Bayesian Information Criterion); find a structure that maximizes this score.
Slide adapted from Zoubin Ghahramani
Slide 23: Cases 3+4: Unknown Structure
This is an extremely hard (NP-hard) problem:
- The number of DAGs on N variables is super-exponential in N: 4 nodes already give 543 DAGs, 10 nodes on the order of O(10^18) DAGs.
- We need heuristics to prune the search space and efficient methods to evaluate candidate structures. (A small sketch of a BIC structure score follows below.)
- Additional problem: often not enough data is available, yet we need to make decisions about statistical conditional independence.
- Typically only feasible if the structure is relatively simple and a lot of data is available...
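To make the score-based idea concrete, here is a hedged sketch of the BIC score, log-likelihood minus (d/2) log N, for one candidate structure with at most one parent per node. It reuses the toy data D and state counts K from the counting sketch above; the search over structures (e.g., greedy edge additions, removals, reversals) is not shown.

    import numpy as np

    def bic_score(D, parents, K):
        """BIC of a candidate structure; parents[i] is None or a single parent index."""
        N = D.shape[0]
        loglik, n_params = 0.0, 0
        for i, pa in parents.items():
            if pa is None:
                counts = np.bincount(D[:, i], minlength=K[i]).astype(float)
                probs = counts / N
                n_params += K[i] - 1
            else:
                counts = np.zeros((K[pa], K[i]))
                for n in range(N):
                    counts[D[n, pa], D[n, i]] += 1.0
                probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
                n_params += K[pa] * (K[i] - 1)
            loglik += np.sum(counts * np.log(np.maximum(probs, 1e-12)))
        return loglik - 0.5 * n_params * np.log(N)

    # compare two candidate structures on the toy data from the earlier sketch:
    # print(bic_score(D, {0: None, 1: 0, 2: 0, 3: 1}, K))
    # print(bic_score(D, {0: None, 1: None, 2: None, 3: None}, K))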
Slide 24: Example Application
Analyzing gene expression from micro-array data:
- Thousands of measurement spots (probes) on a micro-array, each sensitive to a specific DNA marker (e.g., a section of a gene). The probes measure whether the corresponding gene is expressed (= active).
- Collect samples from patients with a certain disease or condition and monitor thousands of genes simultaneously.
Interesting questions: Is there a statistical relationship between certain gene expressions? If so, can we derive the structure by which they influence each other?
[Figure: micro-array with ~40k probes. Image source: Wikipedia]
Slide 25: Topics of This Lecture (agenda repeated; up next: Models for Sequential Data)
Slide 26: Sequential Data
Many real-world problems involve sequential data:
- speech recognition
- visual object tracking
- robot planning
- DNA sequencing
- financial forecasting
- ...
In the following, we will look at sequential problems from a graphical-models perspective.
Image source: C.M. Bishop, 2006
Slide 27: Models for Sequential Data
Simplest model: treat all observations as independent (i.i.d.). [Figure: graphical model with unconnected observation nodes]
What can we infer from such a model? Only relative frequencies of certain events, so it is of limited use in practice. In practice, the data often exhibits trends that help prediction!
Image source: C.M. Bishop, 2006
Slide 28: Markov Models
Markov assumption: each observation depends only on the most recent previous observation(s).
First-order Markov chain:
    p(x_1, \dots, x_N) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1})
Second-order Markov chain:
    p(x_1, \dots, x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})
Image source: C.M. Bishop, 2006
Slide 29: Markov Models
We can generalize this further to M-th order Markov chains. However, this does not scale well: if each x_n can take on K possible values, the conditional p(x_n \mid x_{n-1}, \dots, x_{n-M}) needs K^M (K-1) parameters, so the complexity grows sharply (exponentially in M). (A quick numeric check follows below.)
Goal: we want a model that is not as limited by the Markov assumption, but that can still be specified by few parameters. We can achieve this by introducing a state-space model.
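A quick numeric check of the K^M (K-1) growth (values chosen purely for illustration):

    # number of free parameters of p(x_n | x_{n-1}, ..., x_{n-M}) for K-valued observations
    for K in (10, 100):
        for M in (1, 2, 3):
            print(f"K={K}, M={M}: {K**M * (K - 1):,} parameters")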
Slide 30: Topics of This Lecture (agenda repeated; up next: Hidden Markov Models)
Slide 31: Hidden Markov Models (HMMs)
Traditional view:
- At each time step, the system is in a certain state k.
- The (Markovian) state transition probabilities are given by the matrix A.
- We cannot observe the states directly; they are hidden. We only know the initial probability distribution over states, \pi.
- Each state produces a characteristic output, given by a probability distribution over output symbols, \phi.
Image source: C.M. Bishop, 2006
Slide 32: Hidden Markov Models (HMMs)
HMMs are widely used in speech recognition, natural language modelling, handwriting recognition, biological sequence analysis (DNA, proteins), financial forecasting, ... They really changed the field.
They are often used in special forms, e.g., the left-to-right HMM.
How can we encode them as graphical models?
Image source: C.M. Bishop, 2006
Slide 33: Hidden Markov Models (HMMs)
Graphical model view:
- Introduce latent variables z_n for the current system state; the observed output x_n is conditioned on this state.
- The state transition probabilities are given by the entries of A:
    A_{jk} \equiv p(z_{n,k} = 1 \mid z_{n-1,j} = 1),  with 0 \le A_{jk} \le 1 and \sum_k A_{jk} = 1.
Image source: C.M. Bishop, 2006
Slide 34: Hidden Markov Models (HMMs)
State transitions. [Figure: state-transition diagram over K states]
Image source: C.M. Bishop, 2006
Slide 35: Interpretation as a Generative Model
Ancestral sampling from an HMM:
- Choose the initial latent variable z_1 according to \pi_k.
- Sample the corresponding observation x_1.
- Choose the state of variable z_2 by sampling from p(z_2 \mid z_1).
- ... (A small sampling sketch follows below.)
[Figure: model (left) and generated samples (right)]
Image source: C.M. Bishop, 2006
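A hedged sketch of this ancestral-sampling procedure for a small discrete-output HMM; the values of \pi, A, and \phi are made-up illustrative numbers.

    import numpy as np

    rng = np.random.default_rng(1)
    pi  = np.array([0.6, 0.4])                     # initial state distribution
    A   = np.array([[0.9, 0.1],                    # A[j, k] = p(z_n = k | z_{n-1} = j)
                    [0.2, 0.8]])
    phi = np.array([[0.7, 0.2, 0.1],               # phi[k, v] = p(x_n = v | z_n = k)
                    [0.1, 0.3, 0.6]])

    def sample_hmm(T):
        z = rng.choice(len(pi), p=pi)              # z_1 ~ pi
        states, obs = [], []
        for _ in range(T):
            obs.append(rng.choice(phi.shape[1], p=phi[z]))   # x_n ~ p(x | z_n)
            states.append(z)
            z = rng.choice(len(pi), p=A[z])        # z_{n+1} ~ p(z | z_n)
        return states, obs

    print(sample_hmm(10))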
Slide 37: Three Main Tasks in HMMs
1. Likelihood estimation. Given an observation sequence and a model: what is the likelihood of this sequence given the model? → "Forward-backward algorithm", a special case of Sum-Product!
2. Finding the most probable state sequence. Given an observation sequence and a model: what is the most likely sequence of states? → "Viterbi algorithm", a special case of Max-Sum!
3. Learning HMMs. Given several observation sequences: how can we learn/adapt the model parameters? → "Baum-Welch algorithm", an application of EM!
(Minimal sketches of tasks 1 and 2 follow below.)
Image source: C.M. Bishop, 2006
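Hedged sketches of the first two tasks for the toy HMM (pi, A, phi) defined in the sampling sketch above; log-space scaling of the forward pass is omitted for brevity, so this is only suitable for short sequences.

    import numpy as np

    def forward_likelihood(obs, pi, A, phi):
        alpha = pi * phi[:, obs[0]]                     # alpha_1(k) = pi_k * p(x_1 | k)
        for x in obs[1:]:
            alpha = (alpha @ A) * phi[:, x]             # sum_j alpha_{n-1}(j) A[j,k] p(x_n | k)
        return alpha.sum()                              # p(x_1, ..., x_N)

    def viterbi(obs, pi, A, phi):
        K, T = len(pi), len(obs)
        delta = np.log(pi) + np.log(phi[:, obs[0]])
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + np.log(A)         # scores[j, k]: best path ending j -> k
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + np.log(phi[:, obs[t]])
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):                   # backtrack through the stored pointers
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    obs = [0, 0, 1, 2, 2]
    print(forward_likelihood(obs, pi, A, phi))
    print(viterbi(obs, pi, A, phi))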
Slide 38: Topics of This Lecture (agenda repeated; up next: HMM extensions)
Slide 39: Discussion
Limitations of HMMs:
- Representation of the time for which the system remains in a given state: the implied duration distribution decays exponentially, p(T) = (A_{kk})^T (1 - A_{kk}), which is not appropriate for certain applications.
- Poor at capturing long-range correlations between the observed variables, due to the restriction to Markov chains.
- As a generative model, it spends much of its effort on modeling the data distribution p(X, Z), even though the data will be given at test time. → Extensions with discriminative training, CRFs.
With the graphical-model background, we can now think of extensions...
Slide 41: HMM Extensions
Autoregressive HMM:
- Can better capture longer-range correlations between observed variables.
- Still a simple probabilistic structure.
Input-Output HMM:
- Targeted at supervised learning for sequential data.
- Used, e.g., for articulated tracking. [Gammeter et al., ECCV'08]
Image source: C.M. Bishop, 2006
Slide 42: HMM Extensions (2)
Factorial HMM:
- Multiple independent chains of latent variables that jointly influence the observations.
- Trades off a larger number of latent variables against a smaller number of latent states per chain.
- However, it is more complex to train (needs approximate inference).
Image source: C.M. Bishop, 2006
Slide 43: Other Extensions
Up to now we assumed that the z_n are discrete. We can also make them continuous and model the state with a parametric distribution.
Using Gaussian distributions and linear dynamics, we get Kalman filters.
Image source: C.M. Bishop, 2006
Slide 44: Tracking with Linear Dynamic Systems
Idea: track the state of an object or variable over time by making predictions and applying corrections. (A minimal predict/correct sketch follows below.)
Image source: C.M. Bishop, 2006
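A hedged sketch of the predict/correct cycle of a Kalman filter for a 1-D constant-velocity model; the matrices and noise levels are illustrative assumptions, not values from the lecture.

    import numpy as np

    F = np.array([[1.0, 1.0], [0.0, 1.0]])    # state transition (position, velocity)
    H = np.array([[1.0, 0.0]])                # we only observe the position
    Q = 0.01 * np.eye(2)                      # process noise covariance
    R = np.array([[0.5]])                     # measurement noise covariance

    def kalman_step(mu, P, z):
        # predict: propagate the state estimate through the linear dynamics
        mu_pred = F @ mu
        P_pred = F @ P @ F.T + Q
        # correct: blend the prediction with the new measurement z
        S = H @ P_pred @ H.T + R              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        mu_new = mu_pred + K @ (z - H @ mu_pred)
        P_new = (np.eye(2) - K @ H) @ P_pred
        return mu_new, P_new

    mu, P = np.zeros(2), np.eye(2)
    for z in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:
        mu, P = kalman_step(mu, P, z)
        print(mu)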
Slide 45: Tracking with Linear Dynamic Systems
Many applications in different domains, e.g.:
- radar-based tracking of multiple targets
- visual tracking of articulated objects (L. Sigal et al., 2006)
Slide adapted from Erik Sudderth
Slide 46: Extensions: Conditional Random Fields
The HMM is a generative model:
- Goal: model the joint distribution p(X, Z).
- Interpretation of the directed links: conditional probabilities.
Limitations:
- To define the joint probability, one needs to enumerate all possible observation sequences.
- It is not practical to represent multiple interacting features or long-range dependencies.
Image source: C.M. Bishop, 2006
Slide 47: Conditional Random Fields
Alternative: a conditional model.
- Idea: model the conditional distribution p(Z \mid X) instead, i.e., specify the probabilities of possible label sequences given an observation sequence.
Advantages:
- Does not expend effort on the observations, which are fixed anyway at test time.
- Interpretation of the factors as feature functions: more flexible. (The linear-chain form is written out below.)
Image source: C.M. Bishop, 2006
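For reference, the linear-chain CRF of Lafferty et al. (cited on the last slide) writes this conditional distribution as follows (notation mine, a sketch rather than the slide's own formula):

    p(z \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{n} \sum_{k} \lambda_k \, f_k(z_{n-1}, z_n, x, n) \Big),
    \qquad Z(x) = \sum_{z'} \exp\Big( \sum_{n} \sum_{k} \lambda_k \, f_k(z'_{n-1}, z'_n, x, n) \Big)

where the f_k are feature functions of neighboring labels and the whole observation sequence, and the weights \lambda_k are learned.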
Slide 48: Conditional Random Fields
Applications:
- First used for natural-language interpretation, where CRFs replaced HMMs for some tasks; learned using iterative estimation algorithms.
- In the meantime, also many applications in vision: image segmentation, object recognition, context modeling, scene categorization, gesture recognition, ...
- In vision, they are used mainly instead of MRFs (solved with graph cuts). [He, Zemel, Carreira-Perpinan, 2004]
Slide 49: References and Further Reading
- A thorough introduction to graphical models in general and Bayesian networks in particular can be found in Chapter 8 of Bishop's book; HMMs and their interpretation as graphical models are described in detail in Chapter 13.
  Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
- Original CRF paper: J. Lafferty, A. McCallum, F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001.