Fast Full Parsing by Linear-Chain Conditional Random Fields
Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou
The University of Manchester
Outline
– Motivation
– Parsing algorithm
  – Chunking with conditional random fields
  – Searching for the best parse
– Experiments
  – Penn Treebank
– Conclusions
Motivation
Parsers are useful in many NLP applications
– Information extraction, summarization, MT, etc.
But parsing is often the most computationally expensive component in the NLP pipeline.
Fast parsing is useful when
– The document collection is large
  – e.g. MEDLINE corpus: 70 million sentences
– Real-time processing is required
  – e.g. web applications
Parsing algorithms
History-based approaches
– Bottom-up & left-to-right (Ratnaparkhi, 1997)
– Shift-reduce (Sagae & Lavie, 2006)
Global modeling
– Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008)
– Reranking (Collins, 2000; Charniak & Johnson, 2005)
– Forest (Huang, 2008)
Chunk parsing
Parsing algorithm:
1. Identify phrases in the sequence.
2. Convert the recognized phrases into new non-terminal symbols.
3. Go back to 1.
Previous work
– Memory-based learning (Tjong Kim Sang, 2001), F-score:
– Maximum entropy (Tsuruoka and Tsujii, 2005), F-score: 85.9
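The three-step loop above can be sketched in code. This is a toy illustration, not the paper's implementation: `chunk` stands in for the trained CRF chunkers and is hard-coded with the decisions from the example slides that follow.

```python
# Sketch of the cascaded chunking loop (steps 1-3 above).
# `chunk` is a hypothetical stand-in for the trained CRF chunkers;
# here its decisions are hard-coded from the example slides.

def chunk(symbols):
    """Return one (start, end, label) phrase found in `symbols`, or []."""
    rules = {
        ("VBN", "NN"): "NP",
        ("CD", "CD"): "QP",
        ("DT", "JJ", "QP", "NNS"): "NP",
        ("VBD", "NP"): "VP",
        ("NP", "VP", "."): "S",
    }
    for length in (4, 3, 2):
        for i in range(len(symbols) - length + 1):
            pattern = tuple(symbols[i:i + length])
            if pattern in rules:
                return [(i, i + length, rules[pattern])]
    return []

def parse(symbols):
    """Repeatedly chunk and collapse phrases until no phrase is found."""
    while True:
        phrases = chunk(symbols)
        if not phrases:
            return symbols
        start, end, label = phrases[0]
        symbols = symbols[:start] + [label] + symbols[end:]

tags = ["VBN", "NN", "VBD", "DT", "JJ", "CD", "CD", "NNS", "."]
print(parse(tags))  # the symbol sequence is reduced to the root: ['S']
```

Unlike the real parser, this sketch collapses one phrase at a time rather than one whole layer per iteration, but it converges to the same root symbol.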
Parsing a sentence
Estimated volume was a light 2.4 million ounces .
VBN NN VBD DT JJ CD CD NNS .
Target: the tree with QP ("2.4 million"), NP ("Estimated volume", "a light 2.4 million ounces"), VP, and S.
1st iteration
Estimated volume was a light 2.4 million ounces .
VBN NN VBD DT JJ CD CD NNS .
Recognized: NP = "Estimated volume", QP = "2.4 million"

2nd iteration
volume was a light million ounces .
NP VBD DT JJ QP NNS .
(each recognized chunk is replaced by its label, keeping its head word)
Recognized: NP = "a light million ounces" (DT JJ QP NNS)

3rd iteration
volume was ounces .
NP VBD NP .
Recognized: VP = "was ounces" (VBD NP)

4th iteration
volume was .
NP VP .
Recognized: S = "volume was ." (NP VP .)

5th iteration
was
S
The sequence has been reduced to a single symbol; parsing is complete.
Complete parse tree
Estimated volume was a light 2.4 million ounces .
(S (NP (VBN Estimated) (NN volume))
   (VP (VBD was)
       (NP (DT a) (JJ light) (QP (CD 2.4) (CD million)) (NNS ounces)))
   (. .))
Chunking with CRFs
Conditional random fields (CRFs):
p(y|x) = (1/Z(x)) exp( Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t) )
– f_k: feature functions, defined on states and state transitions
– λ_k: feature weights
Example sequence:
Estimated volume was a light 2.4 million ounces .
VBN NN VBD DT JJ CD CD NNS .
Chunks: NP = "Estimated volume", QP = "2.4 million"
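As a sanity check on the CRF definition, a brute-force toy model confirms that exponentiated feature scores, normalized by Z(x), form a probability distribution over tag sequences. All tags, features, and weights below are made up for illustration and are not the paper's.

```python
# Toy linear-chain CRF: score(y, x) = sum of state and transition
# feature weights; p(y|x) = exp(score) / Z(x). All weights here are
# invented numbers, not learned values.
import itertools
import math

TAGS = ["B", "I", "O"]
# State feature weights: (observation, tag) -> weight
state_w = {("NN", "B"): 1.2, ("NN", "I"): 0.8, ("NN", "O"): -0.5}
# Transition feature weights: (previous tag, tag) -> weight
trans_w = {("B", "I"): 1.0, ("O", "B"): 0.5, ("I", "O"): 0.2}

def score(tags, obs):
    s = sum(state_w.get((o, t), 0.0) for o, t in zip(obs, tags))
    s += sum(trans_w.get(pair, 0.0) for pair in zip(tags, tags[1:]))
    return s

def prob(tags, obs):
    Z = sum(math.exp(score(list(y), obs))
            for y in itertools.product(TAGS, repeat=len(obs)))
    return math.exp(score(tags, obs)) / Z

obs = ["NN", "NN"]
total = sum(prob(list(y), obs) for y in itertools.product(TAGS, repeat=2))
print(round(total, 6))  # probabilities over all taggings sum to 1.0
```

Brute-force enumeration is exponential in sentence length; a real CRF computes Z(x) with the forward algorithm instead.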
Chunking with “IOB” tagging
Estimated volume was a light 2.4 million ounces .
VBN NN VBD DT JJ CD CD NNS .
B-NP I-NP O O O B-QP I-QP O O
B: beginning of a chunk
I: inside (continuation) of the chunk
O: outside of chunks
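A small sketch of the IOB encoding on this slide; `chunks_to_iob` is a hypothetical helper, with the chunk spans taken from the first-iteration example.

```python
# Sketch: convert recognized chunks into an "IOB" tag sequence.
def chunks_to_iob(n_tokens, chunks):
    """chunks: list of (start, end, label); returns one tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label          # beginning of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # continuation of the chunk
    return tags

tokens = ["Estimated", "volume", "was", "a", "light",
          "2.4", "million", "ounces", "."]
chunks = [(0, 2, "NP"), (5, 7, "QP")]       # from the 1st-iteration slide
print(chunks_to_iob(len(tokens), chunks))
# ['B-NP', 'I-NP', 'O', 'O', 'O', 'B-QP', 'I-QP', 'O', 'O']
```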
Features for base chunking
Estimated volume was a light 2.4 million ounces .
VBN NN VBD DT JJ CD CD NNS .
Features for the position being tagged (“?”) are drawn from the surrounding words and POS tags.
Features for non-base chunking
volume was a light million ounces .
NP VBD DT JJ QP NNS .
For a chunk symbol such as the NP covering “Estimated volume” (VBN NN), features can also look inside the chunk at its words and original POS tags.
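One plausible window-based feature template, as a sketch only — the slides do not show the paper's actual templates, so the window size and feature names here are assumptions.

```python
# Hypothetical feature template for chunking at position i: the
# symbols (POS tags or phrase labels) in a +/-2 window, plus one
# bigram feature. The real templates are not shown on the slide.
def features(symbols, i):
    feats = []
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        sym = symbols[j] if 0 <= j < len(symbols) else "<PAD>"
        feats.append(f"sym[{offset}]={sym}")
    prev = symbols[i - 1] if i > 0 else "<PAD>"
    feats.append(f"bigram={prev}|{symbols[i]}")
    return feats

# Feature vector for "DT" in the 2nd-iteration sequence:
print(features(["NP", "VBD", "DT", "JJ", "QP", "NNS", "."], 2))
```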
Finding the best parse
Scoring the entire parse tree: the score of a tree is the product of the probabilities of the chunking hypotheses at every level of the derivation.
The best derivation can be found by depth-first search.
Depth-first search
(figure: a search tree over derivation steps — POS tagging, then base chunking, then alternating chunking levels — explored depth-first)
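A minimal sketch of depth-first, branch-and-bound search over derivations, assuming each level's chunker returns alternative hypotheses with probabilities. The `hyps` table and all probabilities below are invented for illustration.

```python
# Depth-first search over chunking derivations. The tree score is the
# product of per-level hypothesis probabilities; a branch is pruned
# when its partial product cannot beat the best complete parse so far.
def best_parse(symbols, hypotheses_for, prod=1.0, best=None):
    if best is None:
        best = [0.0, None]                  # [best score, best root]
    if len(symbols) == 1:                   # single symbol: complete parse
        if prod > best[0]:
            best[0], best[1] = prod, tuple(symbols)
        return best
    for next_symbols, p in hypotheses_for(symbols):
        if prod * p > best[0]:              # branch-and-bound pruning
            best_parse(next_symbols, hypotheses_for, prod * p, best)
    return best

# Mock chunker output: alternative next levels with probabilities.
table = {
    ("NP", "VBD", "NP", "."): [(("NP", "VP", "."), 0.7), (("S",), 0.2)],
    ("NP", "VP", "."): [(("S",), 0.9)],
}
hyps = lambda s: table.get(tuple(s), [])

score, tree = best_parse(("NP", "VBD", "NP", "."), hyps)
print(round(score, 2), tree)   # best derivation: 0.7 * 0.9 via VP, then S
```

The second hypothesis (collapsing straight to S with probability 0.2) is pruned because 0.2 is already below the completed score 0.63.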
Extracting multiple hypotheses from a CRF
A* search
– uses a priority queue
– suitable when the top n hypotheses are needed
Branch-and-bound
– depth-first
– suitable when a probability threshold is given
Example hypotheses from the CRF:
BIOOOB 0.3
BIIOOB 0.2
BIOOOO 0.18
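Branch-and-bound extraction can be sketched as follows. For simplicity the model is reduced to independent per-position tag distributions — an assumption made only for this illustration; a real CRF would bound the remaining probability mass with forward/backward scores instead.

```python
# Branch-and-bound enumeration of tag sequences whose probability
# exceeds a threshold. Probabilities multiply along the sequence, so
# once a prefix falls below the threshold no extension can recover.
def sequences_above(dists, threshold):
    results = []

    def expand(prefix, prob):
        if prob < threshold:            # bound: prune this branch
            return
        i = len(prefix)
        if i == len(dists):
            results.append(("".join(prefix), prob))
            return
        for tag, p in dists[i].items():
            expand(prefix + [tag], prob * p)

    expand([], 1.0)
    return sorted(results, key=lambda r: -r[1])

# Invented per-position distributions for a 3-token example.
dists = [{"B": 0.9, "O": 0.1},
         {"I": 0.6, "O": 0.4},
         {"O": 0.8, "B": 0.2}]
print(sequences_above(dists, 0.25))     # only "BIO" and "BOO" survive
```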
Experiments
Penn Treebank corpus
– Training: sections 2–21
– Development: section 22
– Evaluation: section 23
Training
– Three CRF models: part-of-speech tagger, base chunker, non-base chunker
– Took 2 days on an AMD Opteron 2.2 GHz
Training the CRF chunkers
Maximum likelihood + L1 regularization
– L1 regularization helps avoid overfitting and produces compact models
– OWL-QN algorithm (Andrew and Gao, 2007)
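OWL-QN itself is a quasi-Newton method and is not reproduced here; the following sketch only illustrates why L1 regularization yields compact models, via the soft-thresholding operator that sets small weights exactly to zero.

```python
# Why L1 gives compact models: the proximal step of the L1 penalty is
# soft-thresholding, which maps small weights to exactly 0.0 (unlike
# L2, which only shrinks them). Example weights are invented.
def soft_threshold(w, c):
    """One L1 proximal step with regularization strength c."""
    return [0.0 if abs(x) <= c else (x - c if x > 0 else x + c)
            for x in w]

weights = [0.9, -0.05, 0.02, -1.3, 0.001]
sparse = soft_threshold(weights, 0.1)
print(sparse)                                     # three weights become 0.0
print(sum(1 for x in sparse if x == 0.0), "of", len(sparse), "zeroed")
```

Only the non-zero weights need to be stored and evaluated at parse time, which is one reason the resulting parser is fast.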
Chunking performance
Section 22, all sentences.
Symbol | # Samples | Recall | Precision | F-score
NP | 317,… | | |
VP | 76,… | | |
PP | 66,… | | |
S | 33,… | | |
ADVP | 21,… | | |
ADJP | 14,… | | |
… | … | | |
All | 579,… | | |
(recall/precision/F-score values not preserved in this copy)
Beam width and parsing performance
Section 22, all sentences (1,700 sentences).
Beam | Recall | Precision | F-score | Time (sec)
(numeric results not preserved in this copy)
Comparison with other parsers
Section 23, all sentences (2,416 sentences).
Parser | Recall | Prec. | F-score | Time (min)
This work (deterministic) | | | |
This work (beam = 4) | | | |
Huang (2008) | | | 91.7 | Unk
Finkel et al. (2008) | | | | >250
Petrov & Klein (2008) | | | 88.33 |
Sagae & Lavie (2006) | | | |
Charniak & Johnson (2005) | | | | Unk
Charniak (2000) | | | |
Collins (1999) | | | |
(remaining values not preserved in this copy)
Discussions
Improving chunking accuracy
– Semi-Markov CRFs (Sarawagi and Cohen, 2004)
– Higher-order CRFs
Increasing the size of the training data
– Create a treebank by parsing a large number of sentences with an accurate parser
– Train the fast parser on that treebank
Conclusion
Full parsing by cascaded chunking
– Chunking with CRFs
– Depth-first search
Performance (Penn Treebank, section 23)
– F-score = 86.9 (12 msec/sentence)
– F-score = 88.4 (42 msec/sentence)
Available soon