Graphical Models for Segmenting and Labeling Sequence Data
Manoj Kumar Chinnakotla
NLP-AI Seminar

Outline
Introduction
Directed Graphical Models
–Hidden Markov Models (HMMs)
–Maximum Entropy Markov Models (MEMMs)
Label Bias Problem
Undirected Graphical Models
–Conditional Random Fields (CRFs)
Summary

The Task
Labeling
–Given sequence data, mark an appropriate tag for each data item
Segmentation
–Given sequence data, segment it into non-overlapping groups such that related entities fall in the same group

Applications
Computational Linguistics
–POS Tagging
–Information Extraction
–Syntactic Disambiguation
Computational Biology
–DNA and Protein Sequence Alignment
–Sequence Homologue Searching
–Protein Secondary Structure Prediction

Example: POS Tagging
–e.g., "The/DT dog/NN ran/VBD ./." – each word in the sequence is assigned its part-of-speech tag

Directed Graphical Models
Hidden Markov Models (HMMs)
–Assign a joint probability to paired observation and label sequences
–The parameters are trained to maximize the joint likelihood of the training examples

Hidden Markov Models (HMMs)
Generative model – models the joint distribution P(W, T) of words and tags
Generation process – a probabilistic finite state machine
–Set of states – correspond to tags
–Alphabet – the set of words
–Transition probability – P(t_i | t_{i-1})
–State (emission) probability – P(w_i | t_i)

HMMs (contd.)
For a given word/tag sequence pair: P(W, T) = Π_i P(t_i | t_{i-1}) · P(w_i | t_i)
Why hidden?
–The sequence of tags which generated the word sequence is not visible
Why Markov?
–Based on the Markov assumption: the current tag depends only on the previous 'n' tags
–Solves the "sparsity problem"
Training – learning the transition and emission probabilities from data
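A minimal sketch of this factorization, using hypothetical toy transition/emission tables (the tag set, words, and probabilities below are invented for illustration, not trained values):

```python
# Toy bigram HMM: P(W, T) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i)
transition = {                      # P(tag_i | tag_{i-1}); 'START' begins every sequence
    ("START", "DT"): 0.8, ("START", "NN"): 0.2,
    ("DT", "NN"): 0.9,    ("DT", "DT"): 0.1,
    ("NN", "VBD"): 0.7,   ("NN", "NN"): 0.3,
}
emission = {                        # P(word_i | tag_i)
    ("DT", "the"): 0.6, ("NN", "dog"): 0.1, ("VBD", "ran"): 0.05,
}

def joint_probability(words, tags):
    """P(W, T) under the bigram HMM factorization."""
    prob, prev = 1.0, "START"
    for word, tag in zip(words, tags):
        prob *= transition.get((prev, tag), 0.0) * emission.get((tag, word), 0.0)
        prev = tag
    return prob

print(joint_probability(["the", "dog", "ran"], ["DT", "NN", "VBD"]))
# 0.8 * 0.6 * 0.9 * 0.1 * 0.7 * 0.05 = 0.001512
```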

HMMs – Tagging Process
Given a string of words w, choose the tag sequence t* such that t* = argmax_t P(t | w) = argmax_t P(w, t)
Computationally expensive – naively requires evaluating all possible tag sequences!
–For 'n' possible tags and 'm' positions, there are n^m candidate sequences
Viterbi Algorithm
–Used to find the optimal tag sequence t*
–Efficient dynamic programming based algorithm, O(m·n²) time
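A compact Viterbi sketch over the same kind of toy tables as above (`transition`, `emission`, and the tag list are assumptions carried over from the previous example); it fills a table of best log-scores and backtracks:

```python
import math

def viterbi(words, tags, transition, emission, start="START"):
    """Find argmax_t P(w, t) for a bigram HMM; log-space avoids underflow.
    `transition[(prev, tag)]` / `emission[(tag, word)]` are probabilities;
    missing entries are treated as probability 0. `tags` is a list."""
    def lp(p):  # log-probability, with log(0) = -inf
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t] = best log-score of a tag sequence for words[:i+1] ending in t
    best = [{t: lp(transition.get((start, t), 0.0))
                + lp(emission.get((t, words[0]), 0.0)) for t in tags}]
    back = []
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p]
                                           + lp(transition.get((p, t), 0.0)))
            back[-1][t] = prev
            best[i][t] = (best[i - 1][prev]
                          + lp(transition.get((prev, t), 0.0))
                          + lp(emission.get((t, words[i]), 0.0)))
    # Backtrack from the best final tag
    t = max(tags, key=lambda x: best[-1][x])
    path = [t]
    for b in reversed(back):
        t = b[t]
        path.append(t)
    return list(reversed(path))
```

With the toy tables from the previous sketch, `viterbi(["the", "dog", "ran"], ["DT", "NN", "VBD"], transition, emission)` returns `['DT', 'NN', 'VBD']` after examining O(m·n²) transitions rather than all n^m sequences.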

Disadvantages of HMMs
–Need to enumerate all possible observation sequences
–Not possible to represent multiple interacting features
–Difficult to model long-range dependencies of the observations
–Very strict independence assumptions on the observations

Maximum Entropy Markov Models (MEMMs)
Conditional exponential models
–Assume the observation sequence is given (it need not be modeled)
–Train the model to maximize the conditional likelihood P(Y | X)

MEMMs (contd.)
For a new data sequence x, the label sequence y which maximizes P(y | x, Θ) is assigned (Θ – parameter set)
Arbitrary, non-independent features on the observation sequence are possible
Conditional models are known to perform better than generative ones
Performs per-state normalization
–The total mass which arrives at a state must be distributed among all possible successor states
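A minimal sketch of per-state normalization, assuming a hypothetical feature function and label set: each state has its own softmax over successor labels, conditioned on the current observation.

```python
import math

LABELS = ["A", "B"]                       # hypothetical label set

def features(prev_label, label, obs):
    """Hypothetical binary features over (previous label, label, observation)."""
    return {f"prev={prev_label},cur={label}": 1.0, f"obs={obs},cur={label}": 1.0}

def next_label_distribution(prev_label, obs, weights):
    """MEMM local model: P(y_i | y_{i-1}, x_i) via a per-state softmax.
    The normalization runs only over the successors of `prev_label`."""
    scores = {}
    for label in LABELS:
        scores[label] = sum(weights.get(name, 0.0) * v
                            for name, v in features(prev_label, label, obs).items())
    z = sum(math.exp(s) for s in scores.values())   # per-state normalizer
    return {label: math.exp(s) / z for label, s in scores.items()}
```

P(y | x) is then the product of these local distributions along the sequence; it is exactly this local normalizer z that causes the label bias problem discussed next.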

Label Bias Problem
Bias towards states with fewer outgoing transitions
Due to per-state normalization
An example MEMM (figure)
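A small numeric sketch of the effect (toy scores, invented numbers): a state with a single outgoing transition must give that transition probability 1 under per-state normalization, so the observation is effectively ignored there.

```python
import math

def per_state_softmax(scores):
    """Normalize a dict of successor scores into a probability distribution."""
    z = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / z for label, s in scores.items()}

# State S1 has two successors, so the observation can discriminate...
print(per_state_softmax({"A": 2.0, "B": 0.0}))   # {'A': ~0.88, 'B': ~0.12}

# ...but state S2 has a single successor: whatever score the observation
# produces, normalization forces P(successor | S2, obs) = 1.0.
print(per_state_softmax({"C": -5.0}))            # {'C': 1.0}
print(per_state_softmax({"C": +5.0}))            # {'C': 1.0}
```

Paths through low-branching states therefore accumulate probability mass regardless of the observations, biasing decoding toward them.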

Undirected Graphical Models
Random Fields

Conditional Random Fields (CRFs)
Conditional exponential model, like the MEMM
Has all the advantages of MEMMs without the label bias problem
–An MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state
–A CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence
Allows some transitions to "vote" more strongly than others, depending on the corresponding observations

Definition of CRFs
Let G = (V, E) be a graph such that Y = (Y_v)_{v∈V} is indexed by the vertices of G. Then (X, Y) is a conditional random field if, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph:
P(Y_v | X, Y_w, w ≠ v) = P(Y_v | X, Y_w, w ~ v)
where w ~ v means that w and v are neighbours in G.

CRF Distribution Function
p_θ(y | x) ∝ exp( Σ_{e∈E, k} λ_k f_k(e, y|_e, x) + Σ_{v∈V, k} μ_k g_k(v, y|_v, x) )
where:
V = set of label random variables (vertices), E = set of edges
f_k and g_k are features: g_k – state features, f_k – edge features
θ = (λ_1, λ_2, …; μ_1, μ_2, …) are the parameters to be estimated
y|_e = set of components of y defined by edge e
y|_v = set of components of y defined by vertex v
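A brute-force sketch of this distribution for a linear chain, with hypothetical feature names and weights; the enumeration of Z(x) is exponential and only viable for tiny sequences (real implementations compute Z(x) with the forward algorithm):

```python
import math
from itertools import product

LABELS = ["A", "B"]                     # hypothetical label set

def score(y, x, weights):
    """Global (unnormalized) log-score: sum of state and edge feature weights.
    The feature names below are invented for illustration."""
    s = 0.0
    for i, label in enumerate(y):
        s += weights.get(f"state:obs={x[i]},y={label}", 0.0)      # g_k terms
        if i > 0:
            s += weights.get(f"edge:{y[i-1]}->{label}", 0.0)      # f_k terms
    return s

def crf_probability(y, x, weights):
    """p(y | x) = exp(score(y, x)) / Z(x), Z(x) by brute-force enumeration."""
    z = sum(math.exp(score(cand, x, weights))
            for cand in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, weights)) / z
```

Because the single normalizer Z(x) runs over whole label sequences, probability mass is not forced to sum to one at each state, which is what removes the label bias.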

CRF Training
Parameters are estimated by maximizing the conditional log-likelihood of the training data:
L(θ) = Σ_i log p_θ(y^(i) | x^(i))

CRF Training (contd.)
Condition for maximum likelihood: for each feature, the expected feature count under the model equals the empirical feature count from the training data, i.e., E_{p_θ}[f_k] = Ẽ[f_k]
A closed-form solution for the parameters is not possible
Iterative algorithms are employed – they improve the log likelihood in successive iterations
Examples:
–Generalized Iterative Scaling (GIS)
–Improved Iterative Scaling (IIS)
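The same condition falls out of the log-likelihood gradient, ∂L/∂λ_k = Ẽ[f_k] − E_{p_θ}[f_k], so plain gradient ascent is a common alternative to GIS/IIS in modern implementations. Below is a brute-force sketch of that gradient for one training pair, reusing the invented features of the toy CRF above (enumeration only for tiny label spaces):

```python
import math
from itertools import product

LABELS = ["A", "B"]                     # hypothetical label set

def feature_counts(y, x):
    """Counts of the same invented state/edge features used in `score` above."""
    counts = {}
    for i, label in enumerate(y):
        k = f"state:obs={x[i]},y={label}"
        counts[k] = counts.get(k, 0.0) + 1.0
        if i > 0:
            k = f"edge:{y[i-1]}->{label}"
            counts[k] = counts.get(k, 0.0) + 1.0
    return counts

def log_likelihood_gradient(y, x, weights):
    """grad_k = empirical count of f_k minus its expectation under p_theta(.|x)."""
    def score(cand):
        return sum(weights.get(name, 0.0) * v
                   for name, v in feature_counts(cand, x).items())
    cands = list(product(LABELS, repeat=len(x)))
    z = sum(math.exp(score(c)) for c in cands)
    grad = dict(feature_counts(y, x))               # empirical counts
    for c in cands:                                 # subtract model expectation
        p = math.exp(score(c)) / z
        for name, v in feature_counts(c, x).items():
            grad[name] = grad.get(name, 0.0) - p * v
    return grad
```

At the maximum-likelihood solution every entry of this gradient is zero, which is exactly the expected-equals-empirical condition on the slide; GIS and IIS reach the same fixed point with multiplicative updates instead of gradient steps.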

Graphical Comparison: HMMs, MEMMs, CRFs (figure)

POS Tagging Results

Summary
HMMs
–Directed, generative graphical models
–Cannot be used to model overlapping features on observations
MEMMs
–Directed, conditional models
–Can model overlapping features on observations
–Suffer from the label bias problem due to per-state normalization
CRFs
–Undirected, conditional models
–Avoid the label bias problem
–Efficient training possible

Thanks!
Acknowledgements
Some slides in this presentation are from Rongkun Shen's (Oregon State University) presentation on CRFs.