Conditional Random Fields   A form of discriminative modelling   Has been used successfully in various domains such as part of speech tagging and other.

Presentation transcript:

Conditional Random Fields   A form of discriminative modelling   Has been used successfully in various domains such as part of speech tagging and other Natural Language Processing tasks   Processes evidence bottom-up   Combines multiple features of the data   Builds the probability P( sequence | data)

Conditional Random Fields   CRFs are based on the idea of Markov Random Fields   Modelled as an undirected graph connecting labels with observations   Observations in a CRF are not modelled as random variables /k/ /iy/ XXXXX Transition functions add associations between transitions from one label to another State functions help determine the identity of the state

Conditional Random Fields
- State feature function f([x is stop], /t/): one possible state feature function for our attributes and labels
- State feature weight λ = 10: one possible weight value for this state feature (strong)
- Transition feature function g(x, /iy/, /k/): one possible transition feature function, indicating /k/ followed by /iy/
- Transition feature weight μ = 4: one possible weight value for this transition feature
- The Hammersley-Clifford theorem states that a random field is an MRF iff it can be described in the exponential form above (the slide's equation did not survive transcription; a reconstruction follows below)
- The exponential is the sum of the clique potentials of the undirected graph
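The equation these bullets refer to was lost in transcription. As a reconstruction (following the standard linear-chain CRF of Lafferty, McCallum and Pereira, 2001, which these slides build on), the conditional probability has the form:

```latex
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\!\left( \sum_{t} \left[ \sum_{i} \lambda_i\, f_i(y_t, \mathbf{x})
             + \sum_{j} \mu_j\, g_j(y_{t-1}, y_t, \mathbf{x}) \right] \right)
```

where the f_i are state feature functions with weights λ_i, the g_j are transition feature functions with weights μ_j, and Z(x) normalizes over all possible label sequences. The exponent is exactly the sum of clique potentials that the Hammersley-Clifford theorem guarantees for an MRF.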

Conditional Random Fields   Conceptual Overview   Each attribute of the data we are trying to model fits into a feature function that associates the attribute and a possible label   A positive value if the attribute appears in the data   A zero value if the attribute is not in the data   Each feature function carries a weight that gives the strength of that feature function for the proposed label   High positive weights indicate a good association between the feature and the proposed label   High negative weights indicate a negative association between the feature and the proposed label   Weights close to zero indicate the feature has little or no impact on the identity of the label

Experimental Setup   Attribute Detectors   ICSI QuickNet Neural Networks   Two different types of attributes   Phonological feature detectors   Place, Manner, Voicing, Vowel Height, Backness, etc.   Features are grouped into eight classes, with each class having a variable number of possible values based on the IPA phonetic chart   Phone detectors   Neural networks output based on the phone labels – one output per label   Classifiers were applied to 2960 utterances from the TIMIT training set

Experimental Setup   Output from the Neural Nets are themselves treated as feature functions for the observed sequence – each attribute/label combination gives us a value for one feature function   Note that this makes the feature functions non-binary features.

Experiment 1   Goal: Implement a Conditional Random Field Model on ASAT-style phonological feature data   Perform phone recognition   Compare results to those obtained via a Tandem HMM system

Experiment 1 - Results

Model                 Phone Accuracy   Phone Correct
Tandem [monophone]    61.48%           63.50%
Tandem [triphone]     66.69%           72.52%
CRF [monophone]       65.29%           66.81%

- A CRF system trained on monophones with these features achieves accuracy superior to the HMM trained on monophones
- The CRF comes close to achieving the HMM's triphone accuracy

Experiment 2   Goals:   Apply CRF model to phone classifier data   Apply CRF model to combined phonological feature classifier data and phone classifier data   Perform phone recognition   Compare results to those obtained via a Tandem HMM system

Experiment 2 - Results

Model                          Phone Accuracy   Phone Correct
Tandem [mono] (phones)         60.48%           63.30%
Tandem [tri] (phones)          67.32%           73.81%
CRF [mono] (phones)            66.89%           68.49%
Tandem [mono] (phones/feas)    61.78%           63.68%
Tandem [tri] (phones/feas)     67.96%           73.40%
CRF [mono] (phones/feas)       68.00%           69.58%

Note that the Tandem HMM results are the best obtained using only the top 39 features after a principal components analysis.

Experiment 3   Goal:   Previous CRF experiments used phone posteriors for CRF, and linear outputs transformed via a Karhunen-Loeve (KL) transform for the HMM sytem   This transformation is needed to improve the HMM performance through decorellation of inputs   Using the same linear outputs as the HMM system, do our results change?

Experiment 3 - Results

Model                           Phone Accuracy   Phone Correct
CRF (phones) posteriors         67.27%           68.77%
CRF (phones) linear KL          66.60%           68.25%
CRF (phones) post. + linear     68.18%           69.87%
CRF (features) posteriors       65.25%           66.65%
CRF (features) post. + linear   66.89%           68.48%
CRF (features) linear KL        66.32%           67.95%
CRF (features) linear (no KL)   65.89%           68.46%

Also shown: adding both feature sets together, giving the system supposedly redundant information, leads to a gain in accuracy.

Experiment 4   Goal:   Previous CRF experiments did not allow for realignment of the training labels   Boundaries for labels provided by TIMIT hand transcribers used throughout training   HMM systems allowed to shift boundaries during EM learning   If we allow for realignment in our training process, can we improve the CRF results?

Experiment 4 - Results

Model                        Phone Accuracy   Phone Correct
Tandem [tri] (phones)        67.32%           73.81%
CRF (phones) no realign      67.27%           68.77%
CRF (phones) realign         69.63%           72.40%
Tandem [tri] (features)      66.69%           72.52%
CRF (features) no realign    65.25%           66.65%
CRF (features) realign       67.52%           70.13%

Allowing realignment gives accuracy results for a monophone-trained CRF that are superior to a triphone-trained HMM, with fewer parameters.