Bidirectional CRF for NER

Presentation transcript:

Bidirectional CRF for NER
Jin Mao, Postdoc, School of Information, University of Arizona
Sept 14th, 2016

AGENDA
- CRF Interpretation
- Sequence Orders
- Feature Styles
- Model Integration

CRF Interpretation: CRF
Let X := (x_1, ..., x_T) be a sequence of tokens from a tokenized input sentence and Y := (y_1, ..., y_T) be a sequence of tags. A CRF tagger selects the Y that maximizes its conditional probability given X:

P(Y | X) = (1 / Z(X)) · exp( Σ_{t=1..T} Σ_α θ_α · f_α(y_{t−1}, y_t, X) )

where the f_α are a vector of binary features, the θ_α are their weights, and Z(X) normalizes over all tag sequences.
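To make the selection rule concrete, here is a minimal brute-force sketch in Python (not the paper's implementation): the tag set, feature functions, weights, and toy sentence are all invented for illustration, and Z(X) is enumerated exhaustively instead of with dynamic programming.

```python
import itertools
import math

# Toy linear-chain CRF: score(Y, X) = sum_t sum_a theta[a] * f_a(y_{t-1}, y_t, X, t).
# Tag set, features, and weights below are hypothetical, for illustration only.
TAGS = ["O", "B-GENE", "I-GENE"]

def features(y_prev, y, X, t):
    """Names of the binary features that fire at position t."""
    fired = [f"trans={y_prev}>{y}"]               # transition predicate q(y_{t-1}, y_t)
    fired.append(f"word={X[t].lower()}|tag={y}")  # observation predicate p(X, t) with tag
    if X[t][0].isupper():
        fired.append(f"isupper|tag={y}")
    return fired

def score(Y, X, theta):
    s = 0.0
    for t in range(len(X)):
        y_prev = Y[t - 1] if t > 0 else "<S>"     # special head tag
        for name in features(y_prev, Y[t], X, t):
            s += theta.get(name, 0.0)
    return s

def conditional_prob(Y, X, theta):
    """P(Y | X), with the partition function Z(X) enumerated by brute force."""
    Z = sum(math.exp(score(cand, X, theta))
            for cand in itertools.product(TAGS, repeat=len(X)))
    return math.exp(score(Y, X, theta)) / Z

X = ["BRCA1", "is", "a", "gene"]
theta = {"isupper|tag=B-GENE": 2.0, "trans=B-GENE>O": 0.5}
print(conditional_prob(("B-GENE", "O", "O", "O"), X, theta))
```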

CRF Interpretation: CRF
Given the entire sentence X, the probability of a tag y_t depends only on its neighboring tags. The tagging produced by the CRF model is therefore mainly determined by the features f_α(y_{t−1}, y_t, X) and their corresponding weights θ_α.

CRF Interpretation: Feature function
Features are defined in a factored representation (Sha and Pereira, 2003):

f_α(y_{t−1}, y_t, X) = p(X, t) · q(y_{t−1}, y_t)

where p(X, t) is a predicate (i.e., a boolean function) on X and the current position t, and q(y_{t−1}, y_t) is a predicate on pairs of tags (the transition predicate).
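A small sketch of the factored representation: every (p, q) pair whose product evaluates to true yields one fired feature. The predicate inventory here is hypothetical.

```python
# Sketch of the factored representation: each feature is the product
# p(X, t) * q(y_prev, y) of an observation predicate and a tag predicate.
P_PREDICATES = {
    "word=gene":   lambda X, t: X[t].lower() == "gene",
    "isupper":     lambda X, t: X[t][0].isupper(),
    "prev_word=a": lambda X, t: t > 0 and X[t - 1] == "a",
}

Q_PREDICATES = {
    "O>B-GENE":   lambda yp, y: (yp, y) == ("O", "B-GENE"),
    "cur=B-GENE": lambda yp, y: y == "B-GENE",
}

def factored_features(y_prev, y, X, t):
    """Every (p, q) pair that both hold yields one fired feature."""
    return [f"{pn}&{qn}"
            for pn, p in P_PREDICATES.items() if p(X, t)
            for qn, q in Q_PREDICATES.items() if q(y_prev, y)]

print(factored_features("O", "B-GENE", ["a", "BRCA1"], 1))
# -> ['isupper&O>B-GENE', 'isupper&cur=B-GENE',
#     'prev_word=a&O>B-GENE', 'prev_word=a&cur=B-GENE']
```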

CRF Sequence Orders: Forward vs. Backward
(Diagram: backward parsing reverses both X and Y, so the token at forward position t sits at backward position T − t + 1.)
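To make the position mapping explicit (my illustration, not code from the paper), reversing both sequences sends the 1-based forward position t to backward position T − t + 1:

```python
# Backward parsing reverses both the token and the tag sequence, so the
# 1-based forward position t maps to backward position T - t + 1.
X = ["BRCA1", "is", "a", "gene"]
Y = ["B-GENE", "O", "O", "O"]
XB, YB = X[::-1], Y[::-1]

T = len(X)
for t in range(1, T + 1):          # 1-based positions
    tB = T - t + 1
    assert X[t - 1] == XB[tB - 1]  # same token, mirrored position
    assert Y[t - 1] == YB[tB - 1]
```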

CRF Sequence Orders: Forward vs. Backward
For the t-th token, the Kullback-Leibler (KL) divergence is used to measure the information gained about y_t when either the previous or the next tag becomes available, i.e., the divergence from the prior distribution P(y_t) to the posterior:

D_KL( P(y_t | y_{t−1}) ‖ P(y_t) )  vs.  D_KL( P(y_t | y_{t+1}) ‖ P(y_t) )

On the training corpus from BioCreative 2, the next tag turns out to be more helpful than the previous tag.
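A sketch of how such a comparison can be computed from tag bigram counts. The toy sequences and the exact estimator (the expected KL divergence from prior to posterior, i.e., the mutual information between y_t and its neighbor) are my assumptions, not the paper's code.

```python
import math
from collections import Counter

# Estimate how much knowing the previous vs. the next tag tells us about y_t,
# as E_neighbor[ D_KL( P(y_t | neighbor) || P(y_t) ) ].
corpus = [["O", "B-GENE", "I-GENE", "O"], ["O", "O", "B-GENE", "O"]]

def expected_kl(pairs):
    """Expected KL from prior P(y) to posterior P(y | neighbor), over (neighbor, y) pairs."""
    joint = Counter(pairs)
    total = sum(joint.values())
    p_y = Counter(y for _, y in pairs)
    p_n = Counter(n for n, _ in pairs)
    kl = 0.0
    for (n, y), c in joint.items():
        post = c / p_n[n]            # P(y | neighbor = n)
        prior = p_y[y] / total       # P(y)
        kl += (p_n[n] / total) * post * math.log(post / prior)
    return kl

prev_pairs = [(s[t - 1], s[t]) for s in corpus for t in range(1, len(s))]
next_pairs = [(s[t + 1], s[t]) for s in corpus for t in range(len(s) - 1)]
print("previous tag:", expected_kl(prev_pairs))
print("next tag:    ", expected_kl(next_pairs))
```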

CRF Sequence Orders: Forward vs. Backward
(Table: the prediction accuracy of tag bigrams.)

CRF Feature Styles: Feature function
- MALLET-style: observation predicates are conjoined with the tag pair.
- HMM-style, as described in Lafferty et al. (2001): observation features fire on the current tag, and state transition features are observation-independent.
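A sketch of the contrast between the two styles, with hypothetical predicate and tag names: HMM-style keeps transitions observation-independent, while MALLET-style conjoins each observation predicate with the full tag pair.

```python
# HMM-style (Lafferty et al., 2001): observation features fire on the
# current tag only; transition features ignore the observation.
def hmm_style_features(y_prev, y, X, t):
    feats = [f"trans={y_prev}>{y}"]          # observation-independent transition
    if X[t][0].isupper():
        feats.append(f"isupper|tag={y}")     # observation with current tag only
    return feats

# MALLET-style: observation predicates are conjoined with the tag *pair*.
def mallet_style_features(y_prev, y, X, t):
    feats = []
    if X[t][0].isupper():
        feats.append(f"isupper|trans={y_prev}>{y}")
    return feats

X = ["BRCA1", "is", "a", "gene"]
print(hmm_style_features("O", "B-GENE", X, 0))
print(mallet_style_features("O", "B-GENE", X, 0))
```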

CRF Feature Styles: Symmetric Property
A CRF model is symmetric if parsing a sequence forward and backward always yields the same result.

CRF Feature Styles: Symmetric Property
A CRF with HMM-style features is symmetric if:
- two special tags are attached to the head and tail of Y;
- the training set is the same under both orders;
- all p predicates are defined either on a single token or symmetrically with regard to the current position t.

CRF Feature Styles: Symmetric Property
Thus, the transition term at forward position t corresponds to g2(Y_B; T − (t−1) + 1) = g2(Y_B; T − t + 2) in the backward model. But are g1 + g2 for the t-th token not identical? (They need not match token by token; it is the sum over the whole sequence that agrees.)

CRF Feature Styles: Symmetric Property
(Diagram: alignment of forward positions t−1, t with backward positions T − t + 2, T − t + 1.)
Mallet-style: g1 is not symmetric any more, because observation predicates are conjoined with the tag pair:

g1(X, Y; t) ≠ g1(X_B, Y_B; t_B)

CRF Feature Styles: Tools
- CRF++: HMM-style (default); Mallet-style (optional)
- Mallet: Mallet-style (default); HMM-style and others (optional)

CRF Model Integration
(Diagram: each model labels the input and outputs its 10 best results with scores; integration then finds matches across the lists.)

CRF Model Integration: Integration Method
Simple set operations, intersection and union, failed to improve performance because they merely trade recall against precision.

Heuristic method:
(1) Compute the intersection of the bi-directional parsing results and select the solution in the intersection that minimizes the sum of its output scores.
(2) For the other 18 solutions, select the labeled terms longer than three characters that appear in a dictionary of approved gene symbols and aliases obtained from HUGO (Eyre et al., 2006); see the sketch below for this filter.
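A sketch of the dictionary filter in step (2); the dictionary entries are placeholders, not the actual HUGO data.

```python
# Keep labeled terms longer than three characters that appear in the
# gene-symbol dictionary. Entries below are placeholders, not HUGO data.
HUGO_DICT = {"BRCA1", "TP53", "EGFR"}

def dictionary_filter(labeled_terms):
    return [t for t in labeled_terms
            if len(t) > 3 and t in HUGO_DICT]

print(dictionary_filter(["BRCA1", "p53", "EGFR", "gene"]))
# -> ['BRCA1', 'EGFR']
```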

CRF Model Integration: Integration Method
Extended heuristic method (a sketch of steps (2)-(3) follows this list):
(1) Add more orders, 1, 2, 3, to Mallet.
(2) Find the intersection with the lowest scores.
(3) If no tagging result appears in the top-10 lists of all models, simply select the best tagging result of the order-1 backward model.
(4) Find the intersection of the two CRF++ models.
(5) Take the union of the CRF++ results and the Mallet results.
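A rough sketch of steps (2)-(3), assuming each model's output is a 10-best list of (tagging, score) pairs with lower scores being better; the data layout and helper names are my assumptions, not the authors' code.

```python
# Intersect the models' n-best lists and pick the shared tagging with the
# lowest summed score, falling back to the order-1 backward model's best.
def integrate(nbest_per_model, backward_order1_best):
    """nbest_per_model: list of lists of (tagging, score), one per model.
    A tagging is a tuple of tags so it can serve as a set/dict key."""
    shared = set.intersection(*(set(t for t, _ in nb) for nb in nbest_per_model))
    if not shared:
        return backward_order1_best           # step (3): fallback
    def total_score(tagging):
        return sum(dict(nb)[tagging] for nb in nbest_per_model)
    return min(shared, key=total_score)       # step (2): lowest summed score

model_a = [(("B-GENE", "O"), 1.2), (("O", "O"), 2.0)]
model_b = [(("O", "O"), 1.5), (("B-GENE", "O"), 1.9)]
print(integrate([model_a, model_b], ("O", "O")))
# -> ('B-GENE', 'O')
```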

CRF Model Integration: Integration Method
(Results figure.)

Conclusions
(1) The type of feature construction determines whether a CRF model is symmetric.
(2) Backward parsing models enjoy a slight advantage over forward parsing according to the information gain analysis.
(3) Combining different models can achieve higher F-scores.

Reference
This presentation is based on: Hsu, C.-N., Chang, Y.-M., Kuo, C.-J., Lin, Y.-S., Huang, H.-S., & Chung, I.-F. (2008). Integrating high dimensional bi-directional parsing models for gene mention tagging. Bioinformatics, 24(13), i286-i294.

Thank you!