Introduction to Conditional Random Fields
John Osborne
Sept 4, 2009

Overview
Useful Definitions
Background
– HMM
– MEMM
Conditional Random Fields
– Statistical and Graph Definitions
Computation (Training and Inference)
Extensions
– Bayesian Conditional Random Fields
– Hierarchical Conditional Random Fields
– Semi-CRFs
Future Directions

Useful Definitions
Random Field (Wikipedia)
– In probability theory, let S = {X_1, ..., X_n}, with each X_i taking values in {0, 1, ..., G − 1}, be a set of random variables on the sample space Ω = {0, 1, ..., G − 1}^n. A probability measure π is a random field if π(ω) > 0 for all ω in Ω.
Markov Process (chain if finite sequence)
– Stochastic process with the Markov property
Markov Property
– The probability that a random variable assumes a value depends on the other random variables only through the ones that are its immediate neighbors; "memoryless"
Hidden Markov Model (HMM)
– Markov model where the current state is unobserved
Viterbi Algorithm
– Dynamic programming technique to discover the most likely sequence of hidden states that explains the observations in an HMM; used to determine labels
Potential Function == Feature Function
– In a CRF, the potential function scores the compatibility of y_t, y_{t-1}, and w_t(X)
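
A minimal sketch of the Viterbi algorithm just defined, for a discrete-observation HMM in log space; the function name and array layout are illustrative assumptions, not from the slides:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state sequence for a discrete HMM.

    obs: observation indices, length T
    pi:  initial state distribution, shape (S,)
    A:   transitions, A[i, j] = P(state j at t | state i at t-1)
    B:   emissions,   B[i, k] = P(observation k | state i)
    """
    T, S = len(obs), len(pi)
    delta = np.log(pi) + np.log(B[:, obs[0]])   # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)          # back-pointers for path recovery
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]                # best final state, then trace back
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

The same dynamic program, run over CRF potentials instead of HMM log-probabilities, performs the CRF inference discussed later.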

Background
Interest in CRFs arose from Richa’s work with gene expression
Current literature shows them performing better on NLP tasks than other commonly used NLP approaches such as support vector machines (SVMs), neural networks, and HMMs
– Term coined by Lafferty et al. in 2001
Predecessors were HMMs and maximum entropy Markov models (MEMMs)

HMM – Definition
Markov model where the current state is unobserved
– Generative model
– Examining all of the input X would be prohibitive, hence the Markov property: look only at the current element of the sequence
– No multiple interacting features or long-range dependencies
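
Written out, the generative factorization a first-order HMM assumes is the standard one (notation assumed, not shown on the slide):

```latex
P(X, Y) = P(y_1)\,P(x_1 \mid y_1)\,\prod_{t=2}^{T} P(y_t \mid y_{t-1})\,P(x_t \mid y_t)
```

Because each x_t must be generated from y_t alone, rich overlapping features of the whole input are hard to accommodate, which is exactly the limitation noted above.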

MEMMs
– McCallum et al., 2000
– Non-generative finite-state model based on a next-state classifier
– Directed graph
– P(Y|X) = ∏_t P(y_t | y_{t-1}, w_t(X)), where w_t(X) is a sliding window over the X sequence
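
A minimal sketch of that next-state classifier: each factor P(y_t | y_{t-1}, w_t(X)) is a softmax over candidate next states. The feature-function signature and names are illustrative assumptions, not from the slides:

```python
import numpy as np

def memm_next_state_probs(prev_state, window, weights, feats, n_states):
    """P(y_t | y_{t-1}, w_t(X)) as a softmax over candidate next states.

    feats(prev_state, state, window) -> feature vector (hypothetical signature);
    weights is a learned vector of the same dimension.
    """
    scores = np.array([weights @ feats(prev_state, s, window)
                       for s in range(n_states)])
    scores -= scores.max()      # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()          # normalized locally, per position
```

Note that the normalization happens independently at every position; this local normalization is what produces the label bias problem described next.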

Label Bias Problem
Transitions leaving a given state compete only against each other, rather than against all other transitions in the model
Implies “conservation of score mass” (Bottou, 1991)
Observations can be ignored; Viterbi decoding can’t downgrade a branch
CRFs solve this problem by having a single exponential model for the joint probability of the ENTIRE SEQUENCE OF LABELS given the observation sequence
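
The contrast can be made explicit with the linear-chain forms (consistent with the Ψ and w_t(X) notation used elsewhere in these slides, though the equations themselves are assumed rather than quoted):

```latex
\text{MEMM:}\;\; P(Y \mid X) = \prod_{t}
  \frac{\exp \Psi(y_t, y_{t-1}, w_t(X))}
       {\sum_{y'} \exp \Psi(y', y_{t-1}, w_t(X))}
\qquad
\text{CRF:}\;\; P(Y \mid X) =
  \frac{\exp \sum_{t} \Psi(y_t, y_{t-1}, w_t(X))}{Z(X)}
```

In the MEMM each position’s scores must sum to one, so low-entropy states pass their probability mass along regardless of the observations; the CRF’s single normalizer Z(X) lets whole label sequences compete against each other.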

Big Picture Definition
Wikipedia definition (Aug 2009)
– A conditional random field (CRF) is a type of discriminative probabilistic model most often used for the labeling or parsing of sequential data, such as natural language text or biological sequences.
A probabilistic model is a statistical model, in mathematical terms “a pair (Y, P) where Y is the set of possible observations and P the set of possible probability distributions on Y”
– In statistical terms this means the objective is to infer (or pick) the distinct element (probability distribution) in the set P given your observation Y
A discriminative model models the conditional probability distribution P(y|x), which can predict y given x
– It cannot do it the other way around (produce x from y), since it is not a generative model (capable of generating sample data given a model); it does not model a joint probability distribution
– Similar to other discriminative models such as support vector machines and neural networks
When analyzing sequential data, a conditional model specifies the probabilities of possible label sequences given an observation sequence
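
As a worked identity (standard probability, not from the slides), the two model families target different sides of Bayes’ rule:

```latex
P(y \mid x) \;=\; \frac{P(x, y)}{\sum_{y'} P(x, y')}
           \;=\; \frac{P(x \mid y)\,P(y)}{P(x)}
```

A generative model estimates the joint P(x, y) and derives the conditional; a discriminative model such as a CRF fits P(y | x) directly and never commits to a model of P(x).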

CRF Graphical Definition
Definition from Lafferty
Undirected graphical model
Let G = (V, E) be a graph such that Y = (Y_v)_{v∈V}, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field in case, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are neighbors in G
(Figure: CRF undirected graph)

Computation of CRF
Training
– Conditioning
– Calculation of feature functions
– P(Y|X) = (1/Z(X)) exp ∑_t Ψ(y_t, y_{t-1}, w_t(X))
– Z is a normalizing factor
– Ψ, the potential function, is the term in parentheses inside the exponential
Inference
– Viterbi decoding
– Approximate model averaging
– Others?
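
A minimal sketch of the normalizer: Z(X) sums over all label sequences, but for a linear chain it is computed in O(T·S²) with the forward algorithm. The array layout here is an illustrative assumption, not from the slides:

```python
import numpy as np

def crf_log_partition(start, psi):
    """log Z(X) for a linear-chain CRF via the forward algorithm.

    start: shape (S,), start[j] = potential for y_1 = j
    psi:   shape (T-1, S, S), psi[t, i, j] = potential for label i at
           position t followed by label j at position t+1 (hypothetical layout)
    """
    alpha = start.copy()                        # alpha[j]: log-mass of prefixes ending in j
    for t in range(psi.shape[0]):
        scores = alpha[:, None] + psi[t]        # (S, S): previous state i -> next state j
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))  # stable logsumexp over i
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())  # log Z(X)
```

The same trellis recursion, with max in place of logsumexp, gives the Viterbi decoder used for inference.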

Training Approaches
CRF training is supervised learning, so one can train using:
– Maximum Likelihood (original paper)
– Used an iterative scaling method; was very slow
– Gradient Ascent
– Also slow when done naively
– Mallet implementation used the BFGS algorithm
– Broyden–Fletcher–Goldfarb–Shanno, an approximate 2nd-order algorithm
– Stochastic Gradient Method (2006), accelerated via Stochastic Meta-Descent
– Gradient Tree Boosting (a variant in which the potential functions are sums of regression trees)
– Decision trees using real values
– Published 2008
– Competitive with Mallet
– Bayesian (estimate the posterior probability)
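
All of the gradient-based options above climb the same objective; for a linear-chain CRF with weights λ_k and feature functions f_k, the log-likelihood gradient takes the classic observed-minus-expected form (a standard result, stated here rather than derived on the slides):

```latex
\frac{\partial \log P(Y \mid X)}{\partial \lambda_k}
  = \sum_{t} f_k(y_t, y_{t-1}, w_t(X))
  \;-\; \sum_{t} \sum_{y,\,y'} P(y_t = y,\; y_{t-1} = y' \mid X)\, f_k(y, y', w_t(X))
```

The pairwise marginals in the second term come from forward–backward, which is why each gradient step costs roughly one inference pass per training sequence.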

Conditional Random Field Extensions: Semi-CRF
– Instead of assigning labels to each member of the sequence, labels are assigned to sub-sequences (segments)
– Advantage: “features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian”
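
In equation form (standard semi-CRF notation, assumed rather than quoted from the slide): a segmentation s of X into segments (t_j, u_j, y_j) with start t_j, end u_j, and label y_j is scored as

```latex
P(s \mid X) = \frac{1}{Z(X)}
  \exp\Big( \sum_{j} \Psi\big(y_j,\, y_{j-1},\, X,\, t_j,\, u_j\big) \Big)
```

so each potential may inspect the entire segment x_{t_j..u_j}, e.g. its length or internal content, rather than a single position.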

Bayesian CRF
Qi et al. (2005) – ers/Qi-Bayesian-CRF-AIstat05.pdf
Replacement for the maximum-likelihood (ML) training method of Lafferty
Reduces over-fitting
“Power EP method”

Hierarchical CRF (HCRF)
– cripts/places-isrr-05.pdf
GPS motion data, used for surveillance, tracking, and dividing people’s workdays into labels such as work, travel, sleep, etc.
Less work has been done on HCRFs

Future Directions
Less work on conditional random fields in biology
– PubMed hits
– “Conditional Random Field” – 21
– “Conditional Random Fields” – 43
– CRF variants & promoter/regulatory element searches show no hits
– CRF and ontology show no hits
Plan
– Implement a CRF in Java, apply it to biology problems, and try to find ways to extend it

Useful Papers
Link to the original paper and a review paper
– Review paper:
Another review
Review slides
– Tutorial%20CRF%20Lafferty.pdf
The boosting paper has a nice review
– ch08a.pdf