Fast Full Parsing by Linear-Chain Conditional Random Fields
Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou
The University of Manchester
Outline
– Motivation
– Parsing algorithm
– Chunking with conditional random fields
– Searching for the best parse
– Experiments (Penn Treebank)
– Conclusions
Motivation
Parsers are useful in many NLP applications
– Information extraction, summarization, MT, etc.
But parsing is often the most computationally expensive component in the NLP pipeline.
Fast parsing is useful when
– The document collection is large (e.g. MEDLINE corpus: 70 million sentences)
– Real-time processing is required (e.g. web applications)
Parsing algorithms
History-based approaches
– Bottom-up & left-to-right (Ratnaparkhi, 1997)
– Shift-reduce (Sagae & Lavie, 2006)
Global modeling
– Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008)
– Reranking (Collins, 2000; Charniak & Johnson, 2005)
– Forest (Huang, 2008)
Chunk parsing
Parsing algorithm:
1. Identify phrases in the sequence.
2. Convert the recognized phrases into new non-terminal symbols.
3. Go back to 1.
Previous work
– Memory-based learning (Tjong Kim Sang, 2001): F-score 80.49
– Maximum entropy (Tsuruoka and Tsujii, 2005): F-score 85.9
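To make the loop concrete, here is a minimal, hypothetical Python sketch of the cascaded chunking idea; the `chunker` interface and the tuple-based tree representation are assumptions for illustration, not the authors' implementation:

```python
def chunk_parse(words, pos_tags, chunker, max_iters=50):
    """Cascaded chunk parsing: repeatedly chunk the sequence and
    replace each recognized phrase with its non-terminal symbol."""
    # Each item is (symbol, subtree); leaves are (POS tag, word).
    items = [(tag, word) for tag, word in zip(pos_tags, words)]
    for _ in range(max_iters):
        symbols = [sym for sym, _ in items]
        # `chunker` is assumed to return phrases as (start, end, label) spans.
        spans = chunker(symbols)
        if not spans:
            break  # no more phrases found: parsing is finished
        new_items, i = [], 0
        while i < len(items):
            span = next((s for s in spans if s[0] == i), None)
            if span is None:
                new_items.append(items[i])
                i += 1
            else:
                start, end, label = span
                # Collapse the phrase into a single new non-terminal node.
                new_items.append((label, items[start:end]))
                i = end
        items = new_items
        if len(items) == 1:
            break  # the whole sentence has been reduced to one symbol (e.g. S)
    return items
```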
Parsing a sentence: "Estimated volume was a light 2.4 million ounces." with POS tags VBN NN VBD DT JJ CD CD NNS . ; the target is the complete parse tree containing QP, NP, VP, and S nodes.
1st iteration: the base chunker recognizes "Estimated volume" (VBN NN) as NP and "2.4 million" (CD CD) as QP.
2nd iteration: the sequence is now volume/NP was/VBD a/DT light/JJ million/QP ounces/NNS ./. ; "a light million ounces" is recognized as NP.
3rd iteration: the sequence is volume/NP was/VBD ounces/NP ./. ; "was ounces" is recognized as VP.
4th iteration: the sequence is volume/NP was/VP ./. ; the whole sequence is recognized as S.
5th iteration: only was/S remains, so parsing is complete.
Complete parse tree for "Estimated volume was a light 2.4 million ounces.", assembled from the NP, QP, VP, and S nodes recognized in the iterations above.
Chunking with CRFs
Conditional random fields (CRFs): features are defined on states and state transitions; each feature function has an associated feature weight.
Example: the POS-tagged sentence "Estimated volume was a light 2.4 million ounces." (VBN NN VBD DT JJ CD CD NNS .), in which the NP and QP chunks are to be recognized.
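The CRF formula itself did not survive this transcript; for reference, the standard linear-chain CRF (consistent with the slide's mention of feature functions and feature weights) defines the conditional probability of a label sequence y given an input sequence x as:

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right)
```

where the f_k are feature functions over the current state, the previous state, and the input, the λ_k are their weights, and Z(x) is the normalization constant summing over all label sequences.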
Chunking with “IOB” tagging
Estimated/B-NP volume/I-NP was/O a/O light/O 2.4/B-QP million/I-QP ounces/O ./O
(POS tags: VBN NN VBD DT JJ CD CD NNS .)
B: beginning of a chunk
I: inside (continuation) of the chunk
O: outside of any chunk
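As an illustration of how such IOB output maps back to chunks, here is a minimal Python sketch; the helper name and interface are assumptions, not the authors' code:

```python
def iob_to_spans(tags):
    """Decode an IOB tag sequence into (start, end, label) chunk spans;
    `end` is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue  # the current chunk continues
        else:  # "O" (or an inconsistent I- tag) closes any open chunk
            if start is not None:
                spans.append((start, i, label))
            start, label = None, None
    if start is not None:
        spans.append((start, len(tags), label))
    return spans

# Example from the slide:
tags = ["B-NP", "I-NP", "O", "O", "O", "B-QP", "I-QP", "O", "O"]
print(iob_to_spans(tags))  # [(0, 2, 'NP'), (5, 7, 'QP')]
```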
Features for base chunking: the word and POS-tag context around the current position (marked "?") in the POS-tagged sentence "Estimated volume was a light 2.4 million ounces."
Features for non-base chunking: the current sequence is volume/NP was/VBD a/DT light/JJ million/QP ounces/NNS ./., where the NP node keeps its internal structure (Estimated/VBN volume/NN); features at the position marked "?" can refer both to the non-terminal symbols and to the words inside the chunks already built.
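The exact feature templates are not recoverable from this transcript; the following hypothetical Python sketch only illustrates the kind of symbol- and word-based templates a chunker at either level might use (all names are illustrative):

```python
def chunk_features(symbols, words, i, window=2):
    """Hypothetical feature templates for tagging position i.
    `symbols` are POS tags (base chunking) or non-terminal labels
    (non-base chunking); `words` are the corresponding (head) words."""
    feats = []
    n = len(symbols)
    for d in range(-window, window + 1):
        j = i + d
        if 0 <= j < n:
            feats.append(f"sym[{d}]={symbols[j]}")   # unigram symbol features
            feats.append(f"word[{d}]={words[j]}")    # unigram (head) word features
    # symbol bigrams around the current position
    if i > 0:
        feats.append(f"sym[-1,0]={symbols[i-1]}|{symbols[i]}")
    if i + 1 < n:
        feats.append(f"sym[0,+1]={symbols[i]}|{symbols[i+1]}")
    return feats
```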
Finding the best parse
– Scoring the entire parse tree
– The best derivation can be found by depth-first search
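The scoring formula on the slide is not legible in this transcript. A natural reading, consistent with the cascaded chunking model (this is an assumption based on the surrounding slides), is to score a derivation as the product of the CRF probabilities of the chunking decisions at each level:

```latex
\mathrm{score}(T) \;=\; \prod_{i=1}^{d} p\big(y^{(i)} \mid x^{(i)}\big)
```

where x^{(i)} is the symbol sequence at level i of the cascade and y^{(i)} is the IOB chunking chosen at that level.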
Depth-first search: a search tree whose levels are POS tagging, base chunking, and the subsequent chunking steps; alternative outputs at each level form the branches explored depth-first.
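A minimal sketch of what such a depth-first search over the cascade could look like, assuming a hypothetical `nbest_chunkings(symbols)` function that returns alternative chunkings of one level with their probabilities; the interface and helper names are assumptions for illustration, not the authors' implementation:

```python
def apply_spans(symbols, spans):
    """Collapse each (start, end, label) span into its label symbol."""
    out, i = [], 0
    for start, end, label in sorted(spans):
        out.extend(symbols[i:start])
        out.append(label)
        i = end
    out.extend(symbols[i:])
    return out

def best_parse_dfs(symbols, nbest_chunkings, beam=4):
    """Depth-first search over cascaded chunking decisions.
    `nbest_chunkings(symbols)` is assumed to return (probability, spans)
    alternatives for one chunking level, best first."""
    best = {"score": 0.0, "derivation": None}

    def dfs(syms, score, derivation):
        if score <= best["score"]:
            return  # prune: this branch cannot beat the best parse found so far
        if len(syms) == 1:  # reduced to a single symbol, e.g. S
            best["score"], best["derivation"] = score, derivation
            return
        for prob, spans in nbest_chunkings(syms)[:beam]:
            if not spans:
                continue
            dfs(apply_spans(syms, spans), score * prob, derivation + [spans])

    dfs(symbols, 1.0, [])
    return best
```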
Extracting multiple hypotheses from a CRF
A* search
– Uses a priority queue
– Suitable when the top n hypotheses are needed
Branch-and-bound
– Depth-first
– Suitable when a probability threshold is given
Example output: BIOOOB (0.3), BIIOOB (0.2), BIOOOO (0.18)
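A simplified branch-and-bound sketch in Python: it enumerates tag sequences whose probability exceeds a threshold, pruning with an upper bound on the best possible completion. For brevity it treats per-position tag probabilities as independent, unlike a real linear-chain CRF, so it only illustrates the search idea, not the authors' implementation:

```python
import numpy as np

def hypotheses_above_threshold(tag_probs, threshold):
    """Branch-and-bound (depth-first) enumeration of tag sequences whose
    probability exceeds `threshold`.  `tag_probs` is a (length, n_tags)
    array of per-position tag probabilities."""
    n, k = tag_probs.shape
    # Upper bound on the best achievable completion from each position onward.
    suffix_max = np.ones(n + 1)
    for t in range(n - 1, -1, -1):
        suffix_max[t] = suffix_max[t + 1] * tag_probs[t].max()

    results = []

    def dfs(t, prob, prefix):
        if prob * suffix_max[t] <= threshold:
            return  # even the best completion cannot exceed the threshold
        if t == n:
            results.append((prob, prefix))
            return
        for tag in range(k):
            dfs(t + 1, prob * tag_probs[t, tag], prefix + [tag])

    dfs(0, 1.0, [])
    return sorted(results, reverse=True)
```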
Experiments
Penn Treebank corpus
– Training: sections 2-21
– Development: section 22
– Evaluation: section 23
Training
– Three CRF models: part-of-speech tagger, base chunker, non-base chunker
– Took 2 days on AMD Opteron 2.2 GHz
Training the CRF chunkers
Maximum likelihood + L1 regularization
– L1 regularization helps avoid overfitting and produces compact models
– OWL-QN algorithm (Andrew and Gao, 2007)
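Written out in the standard form (the value of the regularization constant C is not given in this transcript), the objective maximized with OWL-QN is:

```latex
\mathcal{L}(\Lambda) \;=\; \sum_{j} \log p\big(y_j \mid x_j; \Lambda\big) \;-\; C \sum_{k} |\lambda_k|
```

where the first term is the conditional log-likelihood of the training data and the L1 penalty drives many feature weights to exactly zero, which is what yields compact models.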
Chunking performance Symbol# SamplesRecallPrecisonF-score NP317,59794.7994.1694.47 VP76,28191.4691.9891.72 PP66,97992.8492.6192.72 S33,73991.4890.6491.06 ADVP21,68684.2585.8685.05 ADJP14,42277.2778.4677.86 ::::: All579,25392.6392.6292.63 Section 22, all sentences
Beam width and parsing performance BeamRecallPrecisionF-scoreTime (sec) 186.7287.8387.2716 288.5088.8588.6741 388.6989.0888.8861 488.7289.1388.9292 588.7389.1488.93119 1088.6889.1988.93179 Section 22, all sentences (1,700 sentences)
Comparison with other parsers RecallPrec.F-scoreTime (min) This work (deterministic)86.387.586.90.5 This work (beam = 4)88.288.788.41.7 Huang (2008)91.7Unk Finkel et al. (2008)87.888.288.0>250 Petrov & Klein (2008)88.33 Sagae & Lavie (2006)87.888.187.917 Charniak & Johnson (2005)90.691.391.0Unk Charniak (2000)89.689.5 23 Collins (1999)88.188.388.239 Section 23, all sentences (2,416 sentences)
Discussions
Improving chunking accuracy
– Semi-Markov CRFs (Sarawagi and Cohen, 2004)
– Higher-order CRFs
Increasing the size of training data
– Create a treebank by parsing a large number of sentences with an accurate parser
– Train the fast parser using that treebank
Conclusion
Full parsing by cascaded chunking
– Chunking with CRFs
– Depth-first search
Performance
– F-score = 86.9 (12 msec/sentence)
– F-score = 88.4 (42 msec/sentence)
Available soon