Training Conditional Random Fields using Virtual Evidence Boosting
Lin Liao, Tanzeem Choudhury†, Dieter Fox, and Henry Kautz
University of Washington; †Intel Research

Introduction
- Goal: develop an efficient feature selection and parameter estimation technique for Conditional Random Fields (CRFs).
- Application domain: learn human activity models from continuous, multi-modal sensory inputs.

Application: Human Activity Recognition
- Model human activities and select discriminatory features from multimodal sensor data; sensors include accelerometer, audio, light, temperature, etc.
- The model infers a context sequence and an activity sequence.

Approaches to Training Conditional Random Fields (CRFs)

Maximum Likelihood (ML)
- Runs numerical optimization to find the optimal weights, which requires inference at each iteration.
- Inefficient for complex structures.
- Inadequate for continuous observations and feature selection.

Maximum Pseudo-Likelihood (MPL)
- Converts the CRF into separate patches; each consists of a hidden node and the true values of its neighbors.
- Runs ML learning on the separate patches.
- Efficient, but may over-estimate inter-dependencies.
- Inadequate for continuous observations and feature selection.

Our Approach: Virtual Evidence Boosting (VEB)
- Converts the CRF into separate patches; each consists of a hidden node and the virtual evidence of its neighbors.
- Alternates between boosting (to select features) and belief propagation (to update the virtual evidence).
- An efficient, unified approach to feature selection and parameter estimation.
- Suitable for both discrete and continuous observations.

Algorithms

Extension of LogitBoost with Virtual Evidence
- Traditional boosting algorithms assume feature values are deterministic.
- We extend the LogitBoost algorithm to handle virtual evidence, i.e., a feature can also be a likelihood value or a probability distribution.

  INPUTS:  training samples
  OUTPUT:  F (linear combination of features)
  FOR each iteration
      FOR each sample
          Compute likelihood
          Compute sample weight
          Compute working response
      END
      Obtain the best weak learner by solving a weighted least-squares problem
      Add the weak learner to F
  END

Virtual Evidence Boosting for CRFs

  INPUTS:  structure of the CRF and training samples
  OUTPUT:  F (linear combination of features)
  FOR each iteration
      Run belief propagation using the current F to get virtual evidence ve(x_i, n(y_i))
      FOR each sample
          Compute likelihood
          Compute sample weight
          Compute working response
      END
      Obtain the best weak learner by solving a weighted least-squares problem
      Add the weak learner to F
  END

Experiments

Boosted Random Fields versus VEB
- The closest related work to VEB is Boosted Random Fields (BRFs) (Torralba et al. 2004).
- BRFs combine boosting and belief propagation, but assume a dense graph structure and weak pairwise influence.
- We compare the two approaches as the pairwise influence changes; VEB performs significantly better when the relations are strong.

Feature Selection
- VEB can be used to extract sparse structure from complex models. In this experiment it finds the exact order of a high-order HMM, and thus outperforms the other learning alternatives.

Indoor Activities
- Activities: computer usage, meal, TV, meeting, and sleeping.
- Linear-chain CRF with 315 continuous input features.
- 1100 minutes of data over 12 days.

  Training algorithm        Average accuracy
  VEB                       94.1%
  BRF                       88.0%
  ML + all observations     87.7%
  ML + boosting             88.5%
  MPL + all observations    87.9%
  MPL + boosting            88.5%

Physical Activities and Spatial Contexts
- Contexts: indoors, outdoors, and vehicles.
- Activities: stationary, walking, running, driving, and going up/down stairs.
- Approximately 650 continuous input features.
- 400 minutes of data over 12 episodes.

  Training algorithm        Average accuracy
  VEB                       88.8%
  MPL + all observations    72.1%
  MPL + boosting            70.9%
  HMM + AdaBoost            85.8%
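To make the boxed procedures concrete, here is a minimal, self-contained sketch (not the authors' code) of LogitBoost extended to accept virtual evidence for a single binary label. Each feature value is supplied as a distribution over values rather than a hard observation, and the weak learner sees the expected (soft) indicator under that distribution. The belief-propagation step of full VEB is omitted, so this corresponds to the first algorithm box only; all names (`logitboost_ve`, `_wls`, `predict`) and the stump-style weak learner are illustrative assumptions.

```python
import math

def _wls(x, z, w):
    """Weighted least squares for z ~ a*x + b."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    var = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    cov = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, x, z))
    a = cov / var if var > 1e-12 else 0.0
    return a, mz - a * mx

def logitboost_ve(samples, labels, n_iters=50):
    """LogitBoost with virtual evidence (sketch, binary labels 0/1).

    Each sample is a list of features; a feature is a dict mapping
    feature value -> probability (virtual evidence). A hard observation
    is the special case {value: 1.0}. Weak learners are linear stumps
    f(x) = a * P(feature == v) + b on one feature value.
    Returns the ensemble F as a list of (feat_idx, value, a, b).
    """
    n = len(samples)
    ensemble = []
    F = [0.0] * n  # current score for each training sample
    for _ in range(n_iters):
        # LogitBoost step: likelihood p, sample weight w, working response z.
        p = [1.0 / (1.0 + math.exp(-2.0 * Fi)) for Fi in F]
        w = [max(pi * (1.0 - pi), 1e-9) for pi in p]
        z = [(labels[i] - p[i]) / w[i] for i in range(n)]
        # Best weak learner by weighted least squares; the learner's input
        # is the expected indicator under virtual evidence, E[1{feat == v}].
        best = None
        for j in range(len(samples[0])):
            values = set()
            for s in samples:
                values.update(s[j].keys())
            for v in values:
                x = [s[j].get(v, 0.0) for s in samples]  # soft indicator
                a, b = _wls(x, z, w)
                err = sum(w[i] * (z[i] - (a * x[i] + b)) ** 2 for i in range(n))
                if best is None or err < best[0]:
                    best = (err, j, v, a, b)
        _, j, v, a, b = best
        ensemble.append((j, v, a, b))
        # Add the weak learner to F (halved, as in standard LogitBoost).
        for i in range(n):
            F[i] += 0.5 * (a * samples[i][j].get(v, 0.0) + b)
    return ensemble

def predict(ensemble, sample):
    F = sum(0.5 * (a * sample[j].get(v, 0.0) + b) for j, v, a, b in ensemble)
    return 1 if F > 0 else 0
```

The one design point worth noting: virtual evidence enters only through the expectation `s[j].get(v, 0.0)` inside the weighted least-squares fit, which is exactly where a hard feature value would otherwise appear, so hard and soft observations are handled by the same update.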