Inside-Outside Reestimation from Partially Bracketed Corpora
F. Pereira and Y. Schabes, ACL 30, 1992
CS730b, 김병창, NLP Lab., 1998. 10. 29.


Contents
- Motivation
- Partially Bracketed Text
- Grammar Reestimation
  - The Inside-Outside Algorithm
  - The Extended Algorithm
  - Complexity
- Experimental Evaluation
  - Inferring the Palindrome Language
  - Experiments on the ATIS Corpus
- Conclusions and Further Work

Motivation I
- A very simple method for learning SCFGs [Charniak]:
  - Generate all possible SCFG rules
  - Assign some initial probabilities
  - Run the training algorithm on a sample of raw text
  - Remove the rules that end up with zero probability
- Difficulties in using SCFGs:
  - Time complexity: O(n^3 |w|^3), where n is the number of nonterminals and w is a training sentence
    (cf. O(s^2 |w|) for training an HMM with s states)
  - Bad convergence properties: the larger the number of nonterminals, the worse
  - Linguistically plausible grammars are inferred only by chance

Motivation II
- Extend the Inside-Outside algorithm to infer grammars from a partially parsed (bracketed) corpus
- Advantages:
  - Constituent-boundary information is reflected in the inferred grammar
  - Fewer training iterations are needed
  - Better time complexity

Partially Bracketed Text
- Example:
  - (((VB (DT NNS (IN ((NN) (NN CD)))))) .)
  - (((List (the fares (for ((flight) (number 891)))))) .)
- Notations:
  - Corpus C = { c | c = (w, B) }, where w is a string and B is a bracketing of w
  - w = w_1 w_2 ... w_i w_{i+1} ... w_j ... w_{|w|}
  - The pair (i, j) delimits the substring w_{i+1} ... w_j
  - consistent: no two spans in a bracketing overlap
  - compatible: the union of two bracketings is consistent
  - valid: a span is valid for a bracketing B if it is compatible with B
  - Spans in a derivation α_0 ⇒ α_1 ⇒ ... ⇒ α_m = w:
    if j = m, the span of w_i in α_j is (i-1, i);
    if j < m, α_j = βAγ, and α_{j+1} = βX_1...X_kγ, the span of A in α_j is (i_1, j_k),
    where (i_1, j_1), ..., (i_k, j_k) are the spans of X_1, ..., X_k in α_{j+1}
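A minimal sketch (not from the paper) of these definitions, assuming spans are encoded as (i, j) pairs with 0 <= i < j as above:

```python
def overlap(s1, s2):
    """Two spans overlap (cross) if neither contains the other yet they intersect."""
    (i1, j1), (i2, j2) = s1, s2
    return (i1 < i2 < j1 < j2) or (i2 < i1 < j2 < j1)

def consistent(bracketing):
    """A bracketing is consistent if no two of its spans overlap."""
    spans = list(bracketing)
    return not any(overlap(spans[a], spans[b])
                   for a in range(len(spans))
                   for b in range(a + 1, len(spans)))

def compatible(b1, b2):
    """Two bracketings are compatible if their union is consistent."""
    return consistent(set(b1) | set(b2))

def valid(span, bracketing):
    """A span is valid for a bracketing if it overlaps none of its spans."""
    return all(not overlap(span, s) for s in bracketing)
```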

Grammar Reestimation
- Uses of the reestimation algorithm:
  - Refining parameter estimates for an SCFG derived by other means
  - Inferring a grammar from scratch
- Grammar inference:
  - Given a set N of nonterminals and a set Σ of terminals:
    n = |N|, t = |Σ|, N = {A_1, ..., A_n}, Σ = {b_1, ..., b_t}
  - A CNF SCFG over N, Σ has n^3 + nt probabilities:
    B_{p,q,r} on binary rules A_p → A_q A_r (n^3 of them)
    U_{p,m} on unary rules A_p → b_m (nt of them)
- Meaning of the rule probabilities: the intuition of context-freeness
  (the probability of expanding a nonterminal does not depend on its context)
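A sketch of this parameterization (the table names B and U follow the slide; the random initialization and the helper name init_params are assumptions for illustration):

```python
import numpy as np

def init_params(n, t, seed=0):
    """Randomly initialize a CNF SCFG: B[p, q, r] for A_p -> A_q A_r and
    U[p, m] for A_p -> b_m, normalized so that each nonterminal's rule
    probabilities sum to one (n^3 + n*t parameters in total)."""
    rng = np.random.default_rng(seed)
    B = rng.random((n, n, n))
    U = rng.random((n, t))
    for p in range(n):
        z = B[p].sum() + U[p].sum()
        B[p] /= z
        U[p] /= z
    return B, U

# e.g. the ATIS setting below: n = 15 nonterminals, t = 48 POS tags
B, U = init_params(15, 48)
```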

The Inside-Outside Algorithm
- Definition of inner (e) and outer (f) probabilities
  [Figure: for a nonterminal A_i spanning positions s..t of the sentence w_1...w_T,
  the inner probability covers the derivation of w_s...w_t from A_i, and the outer
  probability covers the surrounding material w_1...w_{s-1} and w_{t+1}...w_T under
  the start symbol S.]
- Special thanks to ohwoog
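For reference, the standard definitions and recurrences in Lari & Young's formulation, which the paper builds on (using the B_{p,q,r}, U_{p,m} notation from the previous slide and taking A_1 as the start symbol):

```latex
\begin{align*}
\text{Inner: } & e_{s,t}(i) = P(A_i \Rightarrow^{*} w_s \cdots w_t)\\
& e_{s,s}(i) = U_{i,m} \text{ where } b_m = w_s\\
& e_{s,t}(i) = \sum_{q,r}\sum_{k=s}^{t-1} B_{i,q,r}\, e_{s,k}(q)\, e_{k+1,t}(r)\\
\text{Outer: } & f_{1,T}(i) = \delta_{i,1}\\
& f_{s,t}(i) = \sum_{p,q}\Bigl[\sum_{k=1}^{s-1} B_{p,q,i}\, f_{k,t}(p)\, e_{k,s-1}(q)
  \;+\; \sum_{k=t+1}^{T} B_{p,i,q}\, f_{s,k}(p)\, e_{t+1,k}(q)\Bigr]
\end{align*}
```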

The Extended Algorithm
- The compatible function: c(s, t) = 1 if the span (s, t) is valid for the sentence's bracketing, 0 otherwise
- Extended algorithm (see Table 1 of the paper):
  - Inside probabilities: equations (1), (2); the compatible function is used in (2)
  - Outside probabilities: equations (3), (4); the compatible function is used in (4)
  - Parameter reestimation: equations (5), (6); same as in the original algorithm
- Stopping criterion: when the decrease in the cross-entropy estimate becomes negligible
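A minimal sketch of how the compatible function gates the inside recursion (assuming the valid helper and the B, U tables from the earlier snippets; every span's inside score is simply multiplied by the 0/1 compatibility indicator, so spans that cross a bracket contribute nothing):

```python
import numpy as np

def inside(words, B, U, bracketing):
    """Bracketing-gated inside pass for a CNF SCFG. words is a list of
    terminal indices; e[s, t] is a vector over nonterminals for the span
    (s, t) covering words[s:t]."""
    T, n = len(words), B.shape[0]
    c = lambda s, t: 1.0 if valid((s, t), bracketing) else 0.0  # compatible function
    e = np.zeros((T + 1, T + 1, n))
    for s in range(T):                                # base case: A_p -> b_m
        e[s, s + 1] = U[:, words[s]] * c(s, s + 1)
    for length in range(2, T + 1):
        for s in range(T - length + 1):
            t = s + length
            total = np.zeros(n)
            for k in range(s + 1, t):                 # split point
                # sum_{q,r} B[p,q,r] * e[s,k][q] * e[k,t][r]
                total += np.einsum('pqr,q,r->p', B, e[s, k], e[k, t])
            e[s, t] = total * c(s, t)
    return e  # P(w) = e[0, T][0] with A_1 as the start symbol
```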

Complexity
- Complexity of the original algorithm: O(|w|^3) per sentence
  - Computing inside probabilities, computing outside probabilities, and reestimating
    rule probabilities each cost O(|w|^3) per sentence
- Complexity of the extended algorithm: O(|w|) in the best case
  - For a full binary bracketing B of a string w:
    there are O(|w|) spans in B,
    each span (i, k) has only one split point,
    and each valid span must be a member of B
  - Preprocessing: enumerate the valid spans and their split points
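A small illustration of the best case (the nested-pair tree encoding is an assumption for this sketch, not the paper's representation): a full binary bracketing of a length-|w| string has exactly 2|w| - 1 spans, each with at most one split point, which is what makes the gated recursion linear.

```python
def spans_and_splits(tree, start=0):
    """For a full binary bracketing encoded as nested pairs (leaves are words),
    return (end, table) where table lists (span, split) entries."""
    if not isinstance(tree, tuple):          # leaf: a single word
        return start + 1, [((start, start + 1), None)]
    mid, left = spans_and_splits(tree[0], start)
    end, right = spans_and_splits(tree[1], mid)
    return end, left + right + [((start, end), mid)]

_, table = spans_and_splits((("a", "b"), ("b", "a")))
# spans: (0,1) (1,2) (0,2) (2,3) (3,4) (2,4) (0,4) -> 7 = 2*4 - 1
```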

Experimental Evaluation
- Two experiments:
  - An artificial language: palindromes
  - Natural language: the Penn Treebank (ATIS)
- Evaluation metric:
  - Bracketing accuracy: the proportion of phrases in the parses that are compatible
    with the tree-bank bracketings
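A sketch of this metric under the span conventions above, reusing the overlap helper from the earlier snippet (hypothetical illustration, not the paper's evaluation code):

```python
def bracketing_accuracy(predicted_spans, treebank_spans):
    """Proportion of predicted phrases that cross no tree-bank bracket."""
    ok = sum(1 for s in predicted_spans
             if all(not overlap(s, b) for b in treebank_spans))
    return ok / len(predicted_spans)
```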

Inferring the Palindrome Language
- L = { w w^R | w ∈ {a, b}* }
- Initial grammar: 135 rules (= 5^3 + 5·2; n = 5 nonterminals, t = 2 terminals)
- Training with 100 sentences
- Inferred grammar: a correct grammar for the palindrome language
- Bracketing accuracy: above 90% (100% in several cases)
  - With unbracketed training: 15% - 69%

Experiments on the ATIS Corpus
- ATIS (Air Travel Information System) corpus: 770 sentences (7812 words)
  - 700-sentence training set, 70-sentence test set (901 words)
- Initial grammar: 4095 rules (= 15^3 + 15·48)
  - 15 nonterminals, 48 terminal symbols for POS tags
- Bracketing accuracy: 90.36% after 75 iterations
  - With unbracketed training: 37.35%
- In case (A):
  - (Delta flight number): not compatible with the tree-bank bracketing
  - (the cheapest): linguistically wrong, due to lack of information
  - 16 incompatibles in G_R
- In case (B):
  - fully compatible
  - 9 incompatibles in G_R

Conclusions and Further Work
- The use of a partially bracketed corpus can:
  - reduce the number of iterations needed for convergence
  - find good solutions
  - infer grammars whose constituent boundaries are linguistically reasonable
  - reduce time complexity (linear in the best case)
- Further extensions:
  - determine sensitivity to the initial probability assignments, to the training corpus,
    and to the lack or misplacement of brackets
  - larger terminal vocabularies