A random field…

An Introduction to Conditional Random Fields
Charles Sutton (Edinburgh) and Andrew McCallum (UMass)
Foundations and Trends in Machine Learning, Vol. 4, No. 4 (2011)

Additional Tutorial Sources
Hanna M. Wallach (2004). “Conditional Random Fields: An Introduction.” Technical Report MS-CIS, Department of Computer and Information Science, University of Pennsylvania.
– Easy to follow, provides high-level intuition. Presents CRFs as undirected graphical models (as opposed to undirected factor graphs).
Charles Sutton and Andrew McCallum (2006). “An Introduction to Conditional Random Fields for Relational Learning.” In Introduction to Statistical Relational Learning, edited by Lise Getoor and Ben Taskar. MIT Press, 2006.
– Shorter version of the book.
Rahul Gupta (2006). “Conditional Random Fields.” Unpublished report, IIT Bombay.
– Provides detailed derivations of the important equations for CRFs.
Roland Memisevic (2006). “An Introduction to Structured Discriminative Learning.” Technical Report, University of Toronto.
– Places CRFs in the context of other methods for learning to predict complex outputs, esp. SVM-inspired large-margin methods.
Charles Elkan (2013). “Log-linear models and CRFs.”

Code Internet country code for the Cocos (Keeling) Islands, an Australian territory of 5.4 square miles and about 600 inhabitants. Administered by VeriSign (through subsidiary eNIC), which promotes .cc for international registration as “the next .com”.

A Canonical Example: POS Tagging
“I’ll be long gone before some smart person ever figures out what happened inside this Oval Office.” (George W. Bush, Washington D.C., May 12, 2008)
I’ll/PRP be/VB long/RB gone/VBN before/IN some/DT smart/JJ person/NN ever/RB figures/VBZ out/RP what/WP happened/VBD inside/IN this/DT Oval/NNP Office/NNP

Two Views
The Generative Picture: model the joint of X and Y, P(X,Y) = P(X|Y) P(Y). Can infer [label, latent state, cause] from evidence using Bayes' theorem: P(Y|X) = P(X|Y) P(Y) / P(X).
The Discriminative Picture: model P(Y|X) directly.
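As a small illustration of the generative route, the sketch below builds a joint table from P(Y) and P(X|Y) and inverts it with Bayes' theorem. The numbers and variable names are hypothetical toy values, not taken from the slides.

```python
import numpy as np

# Hypothetical toy model: Y in {0, 1} is the label, X in {0, 1, 2} is the evidence.
p_y = np.array([0.6, 0.4])                    # P(Y)
p_x_given_y = np.array([[0.7, 0.2, 0.1],      # P(X | Y = 0)
                        [0.1, 0.3, 0.6]])     # P(X | Y = 1)

# Generative picture: the joint factorizes as P(X, Y) = P(X | Y) P(Y).
joint = p_y[:, None] * p_x_given_y            # shape (2, 3), indexed [y, x]

# Bayes' theorem: P(Y | X) = P(X | Y) P(Y) / P(X), with P(X) = sum_y P(X, Y).
p_x = joint.sum(axis=0)
posterior = joint / p_x                       # each column is P(Y | X = x)

print(posterior[:, 2])                        # posterior over Y after observing X = 2
```

A discriminative model would skip the joint table and parameterize the columns of `posterior` directly.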

Graphical Models
Factorization (local functions)
Conditional independence
Graphical structure (relational structure of factors)
Undirected graphical models and directed graphical models
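One common way to write the two factorizations named on this slide is sketched below. The symbols ($\Psi_a$ for local functions, $Z$ for the normalizer, $\pi(s)$ for the parents of node $s$) are assumptions chosen for illustration, not notation recovered from the slide.

```latex
% Undirected graphical model: a normalized product of local functions (factors)
p(\mathbf{y}) = \frac{1}{Z} \prod_{a} \Psi_a(\mathbf{y}_a),
\qquad
Z = \sum_{\mathbf{y}} \prod_{a} \Psi_a(\mathbf{y}_a)

% Directed graphical model: a product of local conditional distributions
p(\mathbf{y}) = \prod_{s} p\left(y_s \mid \mathbf{y}_{\pi(s)}\right)
```

In the directed case each local function is already a normalized conditional distribution, so no global constant $Z$ is needed.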

Factor Graphs Distinguish “input” (always observed) from “output” (wish to predict)

Generative-Discriminative Pairs

Binary Logistic Function
The logistic likelihood is formally derived as a result of modeling the log-odds ratio (aka the logit). There are no constraints on this value: it can take any real value, from large negative to large positive.

Binary Logistic Function
Now, derive the logistic (likelihood) function from the logit. Note: the binary logistic function is really modeling the log-odds ratio with a linear model! This is an example of a generalized linear model: a linear model passed through a transformation to model a quantity of interest.
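The derivation can be sketched as follows, assuming a weight vector $\mathbf{w}$ and bias $b$ (symbols chosen here for illustration). Setting the logit equal to a linear function of $\mathbf{x}$ and solving for the probability yields the logistic function.

```latex
% The logit (log-odds ratio) is modeled by a linear function of x
\log \frac{p(y = 1 \mid \mathbf{x})}{1 - p(y = 1 \mid \mathbf{x})}
  = \mathbf{w}^{\top} \mathbf{x} + b

% Exponentiate both sides and solve for p(y = 1 | x)
\frac{p(y = 1 \mid \mathbf{x})}{1 - p(y = 1 \mid \mathbf{x})}
  = \exp\left(\mathbf{w}^{\top} \mathbf{x} + b\right)
\quad\Longrightarrow\quad
p(y = 1 \mid \mathbf{x})
  = \frac{1}{1 + \exp\left(-(\mathbf{w}^{\top} \mathbf{x} + b)\right)}
```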

Binary Logistic Likelihood
When the target is 1, the probability is the logistic (or sigmoid) function applied to the linear component. When the target is 0, it is one minus that value. Combine both into a single probability function (note: a function of x).

Binary Logistic Likelihood
Substitute in the component likelihoods to get the final likelihood function.
“Multinomial” Logistic Likelihood:
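A minimal sketch of both likelihoods, assuming a weight vector `w` and bias `b` for the binary case and a weight matrix `W` with one row per class for the multinomial case. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_likelihood(y, x, w, b):
    """p(y | x) for y in {0, 1}: sigma^y * (1 - sigma)^(1 - y)."""
    p1 = sigmoid(x @ w + b)            # probability that the target is 1
    return p1 ** y * (1.0 - p1) ** (1 - y)

def multinomial_likelihood(x, W, b):
    """Softmax over per-class linear scores; returns p(y = k | x) for all k."""
    scores = W @ x + b
    scores = scores - scores.max()     # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

x = np.array([1.0, -2.0])
w, b = np.array([0.5, 0.25]), 0.1
print(binary_likelihood(1, x, w, b), binary_likelihood(0, x, w, b))
```

With two classes, fixing one class's weights to zero reduces the softmax to the binary logistic function, which is why the multinomial form is a direct generalization.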

Generative-Discriminative Pairs

Feature Functions
One set of feature functions for the bias, and one for the feature weights.
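As a sketch, multinomial logistic regression can be written with feature functions in the following way. The notation ($\theta_k$, $f_k$, the indicator $\mathbf{1}\{\cdot\}$) follows the usual log-linear convention and is an assumption here rather than a transcription of the slide.

```latex
% Log-linear form with feature functions f_k and weights theta_k
p(y \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\left\{ \sum_{k} \theta_k f_k(y, \mathbf{x}) \right\},
\qquad
Z(\mathbf{x}) = \sum_{y'} \exp\left\{ \sum_{k} \theta_k f_k(y', \mathbf{x}) \right\}

% One feature function per class for the bias
f_{y'}(y, \mathbf{x}) = \mathbf{1}\{y' = y\}

% One feature function per (class, input dimension) pair for the feature weights
f_{y', j}(y, \mathbf{x}) = \mathbf{1}\{y' = y\}\, x_j
```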

Section Read pp for a nice discussion comparing the strengths and weaknesses of generative and discriminative approaches.

From HMM to Linear-Chain CRF
The conditional distribution p(y|x) that follows from an HMM's joint distribution is in fact a CRF with a particular choice of feature functions. Every homogeneous HMM can be written in this form by setting…
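The correspondence can be checked numerically. The sketch below assumes a small hypothetical HMM, uses indicator features for the initial state, each transition, and each emission, and sets every weight to the log of the matching HMM probability. With that choice, the exponentiated log-linear score equals the HMM joint probability.

```python
import itertools
import numpy as np

# Hypothetical homogeneous HMM with 2 states and 3 output symbols.
init = np.array([0.5, 0.5])                   # p(y_1 = i)
trans = np.array([[0.9, 0.1],                 # trans[i, j] = p(y_t = j | y_{t-1} = i)
                  [0.2, 0.8]])
emit = np.array([[0.6, 0.3, 0.1],             # emit[i, o] = p(x_t = o | y_t = i)
                 [0.1, 0.2, 0.7]])

def hmm_joint(y, x):
    """p(y, x) computed directly from the HMM parameters."""
    p = init[y[0]] * emit[y[0], x[0]]
    for t in range(1, len(y)):
        p *= trans[y[t - 1], y[t]] * emit[y[t], x[t]]
    return p

# CRF weights: one per indicator feature, set to the log of the HMM probability.
theta_init, theta_trans, theta_emit = np.log(init), np.log(trans), np.log(emit)

def feature_counts(y, x):
    """Counts of how often each indicator feature fires along the sequence."""
    n_init, n_trans, n_emit = (np.zeros_like(a) for a in (init, trans, emit))
    n_init[y[0]] = 1
    for t in range(len(y)):
        n_emit[y[t], x[t]] += 1
        if t > 0:
            n_trans[y[t - 1], y[t]] += 1
    return n_init, n_trans, n_emit

def loglinear_score(y, x):
    """Sum over features of weight times feature count."""
    n_init, n_trans, n_emit = feature_counts(y, x)
    return ((theta_init * n_init).sum() + (theta_trans * n_trans).sum()
            + (theta_emit * n_emit).sum())

def conditional(y, x):
    """p(y | x) = exp(score(y, x)) / Z(x), with Z(x) by brute-force enumeration."""
    all_y = itertools.product(range(len(init)), repeat=len(x))
    z = sum(np.exp(loglinear_score(yp, x)) for yp in all_y)
    return np.exp(loglinear_score(y, x)) / z

y, x = (0, 1, 1), (0, 2, 2)
print(hmm_joint(y, x), np.exp(loglinear_score(y, x)), conditional(y, x))
```

Once the weights are allowed to take values other than log-probabilities, the same functional form describes a more general linear-chain CRF.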

Rewrite with Feature Functions Now, the conditional distribution:

The Linear Chain CRF
As a factor graph… where each factor has this functional form
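To make the chain structure concrete, here is a hedged sketch of computing the normalizer Z(x) of a linear-chain CRF with the forward recursion, checked against brute-force enumeration. It assumes the log-factors have already been evaluated into a `node` array (one row per position) and a shared `edge` matrix. These names and the random scores are hypothetical.

```python
import itertools
import numpy as np

def log_sum_exp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def sequence_score(y, node, edge):
    """Unnormalized log-score: sum of node and edge log-factors along the chain."""
    score = node[0, y[0]]
    for t in range(1, len(y)):
        score += edge[y[t - 1], y[t]] + node[t, y[t]]
    return score

def log_partition(node, edge):
    """log Z(x) via the forward recursion, O(T * K^2) instead of O(K^T)."""
    alpha = node[0]                                   # log-forward scores at t = 0
    for t in range(1, len(node)):
        # alpha_new[j] = node[t, j] + log sum_i exp(alpha[i] + edge[i, j])
        alpha = node[t] + log_sum_exp(alpha[:, None] + edge, axis=0)
    return log_sum_exp(alpha)

def log_partition_brute_force(node, edge):
    """log Z(x) by enumerating every label sequence (only feasible for tiny chains)."""
    T, K = node.shape
    scores = [sequence_score(y, node, edge)
              for y in itertools.product(range(K), repeat=T)]
    return log_sum_exp(np.array(scores))

rng = np.random.default_rng(0)
node = rng.normal(size=(4, 3))    # T = 4 positions, K = 3 labels
edge = rng.normal(size=(3, 3))    # edge[i, j]: log-factor for label i followed by j
print(log_partition(node, edge), log_partition_brute_force(node, edge))
```

The log-probability of a labeling is then `sequence_score(y, node, edge) - log_partition(node, edge)`.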

Variants of the Linear Chain CRF The “HMM-like” LCCRF

General CRFs

Clique Templating

Feature Engineering (1) Label-observation features discrete
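One common form for a label-observation feature with discrete labels is an indicator on the label multiplied by an observation function of the input. The sketch below assumes that form. The helper names, the capitalization test, and the example tokens are hypothetical.

```python
def make_label_observation_feature(label, observation_fn):
    """Build f(y_t, x, t) = 1{y_t == label} * q(x, t) for a given observation function q."""
    def feature(y_t, x, t):
        return float(y_t == label) * float(observation_fn(x, t))
    return feature

def is_capitalized(x, t):
    """Observation function: does the token at position t start with an uppercase letter?"""
    return x[t][0].isupper()

# Feature that fires only when a capitalized token is labeled as a proper noun.
f_nnp_cap = make_label_observation_feature("NNP", is_capitalized)

tokens = ["this", "Oval", "Office"]
print(f_nnp_cap("NNP", tokens, 1))   # capitalized token, matching label
print(f_nnp_cap("DT", tokens, 1))    # capitalized token, different label
print(f_nnp_cap("NNP", tokens, 0))   # lowercase token, matching label
```

Because the feature is zero unless the label matches, each label effectively gets its own weight for the same observation function.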

Feature Engineering (2) Unsupported Features
Explicitly represent when a rare feature is not present, and assign it a negative weight. An early large-scale CRF application had 3.8 million binary features. Results in a slight increase in accuracy but permits many more features.

Feature Engineering (3) Edge-Observation / Node-Observation

Feature Engineering (4) Boundary Labels

Feature Engineering (5) Feature Induction (extends the “unsupported features trick”)

Feature Engineering (6) Categorical Features
Text applications: CRF features are typically binary. Vision and speech: features are typically real-valued. For real-valued features, it helps to normalize (mean 0, standard deviation 1).
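The normalization step for real-valued features can be sketched as below. The function name, the `eps` guard for constant columns, and the example values are assumptions made for illustration.

```python
import numpy as np

def standardize(features, eps=1e-12):
    """Scale each column to mean 0 and standard deviation 1.

    Returns the scaled array plus the statistics, so the same
    transformation can be reapplied to held-out data.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    scaled = (features - mean) / np.maximum(std, eps)   # eps guards constant columns
    return scaled, mean, std

# Hypothetical real-valued features: 4 observations, 2 feature dimensions.
train = np.array([[1.0, 200.0],
                  [2.0, 220.0],
                  [3.0, 180.0],
                  [4.0, 240.0]])
scaled, mean, std = standardize(train)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

At prediction time the stored `mean` and `std` from training would be reused rather than recomputed on the new inputs.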

Feature Engineering (7) Features from Different Time Steps

Feature Engineering (8) Features as Backoff

Feature Engineering (9) Features as Model Combination

Feature Engineering (10) Input-Dependent Structure