Statistical Decision-Tree Models for Parsing. NLP Lab, POSTECH. 김 지 협.

Contents
- Abstract
- Introduction
- Decision-Tree Modeling
- SPATTER Parsing
- Statistical Parsing Models
- Decision-Tree Growing & Smoothing
- Decision-Tree Training
- Experiment Results
- Conclusion

Abstract
- Syntactic NL parsers are not adequate for highly ambiguous, large-vocabulary text (e.g., the Wall Street Journal).
- Premises for developing a new parser:
  - Grammars are too complex to develop manually for most domains.
  - Parsing models must rely heavily on contextual information.
  - Existing n-gram models are inadequate for parsing.
- SPATTER: a statistical parser based on decision-tree models, which performs better than a grammar-based parser.

Introduction
- Parsing is treated as making a sequence of disambiguation decisions.
- The probability of a complete parse tree T of a sentence S decomposes over those decisions (see the formula below).
- The rules for disambiguation are discovered automatically.
- A parser is produced without a complicated grammar.
- Long-distance lexical information is crucial for disambiguating interpretations accurately.
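The formula for this probability appeared as an image on the original slide. As a sketch of the standard formulation from the paper, the parse probability is the product of the probabilities of the n disambiguation decisions d_1, ..., d_n in the derivation of T:

```latex
P(T \mid S) = \prod_{i=1}^{n} P(d_i \mid d_1, d_2, \ldots, d_{i-1}, S)
```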

Decision-Tree Modeling
- Comparison
  - Grammarian: two crucial tasks for parsing:
    - identifying the features relevant to each decision
    - deciding which choice to select based on the values of those features
  - Decision tree: the two tasks above, plus a third:
    - assigning a probability distribution to the possible choices, providing a ranking system

Continued
- What is a statistical decision tree?
  - A decision-making device that assigns a probability to each possible choice based on the context of the decision: P(f | h), where f is an element of the future vocabulary and h is a history (the context of the decision).
  - The probability is determined by asking a sequence of questions.
  - The i-th question is determined by the answers to the i-1 previous questions.
  - Example: the part-of-speech tagging problem (Figure 1). A minimal sketch of such a tree follows.
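As a minimal illustrative sketch (not the paper's implementation), a statistical decision tree can be represented as nodes that ask questions about the history h, with each leaf holding a distribution over futures f. All names below are hypothetical:

```python
class DTNode:
    """A statistical decision-tree node: internal nodes ask a question
    about the history h; leaves hold a distribution P(f | h)."""

    def __init__(self, question=None, children=None, distribution=None):
        self.question = question          # function: history -> answer
        self.children = children or {}    # answer -> DTNode
        self.distribution = distribution  # dict: future f -> probability

    def predict(self, history):
        """Follow the questions down the tree; return P(f | h) at the leaf."""
        node = self
        while node.question is not None:
            answer = node.question(history)
            node = node.children[answer]
        return node.distribution

# Toy POS-tagging example: a single question about the previous tag.
leaf_after_det = DTNode(distribution={"NN": 0.8, "JJ": 0.2})
leaf_other = DTNode(distribution={"VB": 0.5, "NN": 0.3, "JJ": 0.2})
root = DTNode(
    question=lambda h: h["prev_tag"] == "DT",
    children={True: leaf_after_det, False: leaf_other},
)
print(root.predict({"prev_tag": "DT"}))  # {'NN': 0.8, 'JJ': 0.2}
```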

Continued
- Decision trees vs. n-grams
  - Equivalent to an interpolated n-gram model in expressive power.
  - Model parameterization: an n-gram model can be represented by a decision-tree model with n-1 questions.
  - Example: part-of-speech tagging (see below).
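For example (an illustration consistent with the slide, not taken from it), a trigram tag model corresponds to a depth-2 decision tree that first asks "what is the previous tag t_{i-1}?" and then "what is the tag before that, t_{i-2}?":

```latex
P(t_i \mid h) = P(t_i \mid t_{i-2}, t_{i-1})
```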

Continued
- Model estimation
  - n-gram model (see the sketch below)
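The estimation formula on this slide was an image. A sketch of the standard deleted-interpolation estimate for a trigram model, with f denoting relative frequencies in the training data:

```latex
\tilde{P}(w_i \mid w_{i-2} w_{i-1})
  = \lambda_3\, f(w_i \mid w_{i-2} w_{i-1})
  + \lambda_2\, f(w_i \mid w_{i-1})
  + \lambda_1\, f(w_i),
\qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
```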

Continued
- Model estimation (continued)
  - decision-tree model: a decision-tree model can be represented by an interpolated n-gram model (see the sketch below)
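The corresponding formula was also an image. A sketch assuming the usual recursive smoothing of decision-tree leaves, where each node n interpolates its empirical distribution E(f | n) with the smoothed distribution of its parent:

```latex
\tilde{P}(f \mid n) = \lambda_n\, E(f \mid n) + (1 - \lambda_n)\, \tilde{P}(f \mid \mathrm{parent}(n))
```

Unfolding this recursion from a leaf up to the root yields an interpolated model, which is the sense in which a decision tree is equivalent to an interpolated n-gram.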

Continued
- Why use decision trees?
  - As n grows, the parameter space of an n-gram model grows exponentially.
  - The decision-tree learning algorithm, by contrast, grows the model only as the training data allows.
  - It can therefore take much more contextual information into account.
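To make the exponential growth concrete (an illustration, not from the slide): an n-gram model over a vocabulary V has on the order of |V|^n parameters, so a trigram model over the 195 part-of-speech tags used in the experiments below already has

```latex
|V|^{\,n} = 195^{3} \approx 7.4 \times 10^{6} \ \text{parameters.}
```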

SPATTER Parsing
- SPATTER representation
  - A parse is represented as a geometric pattern.
  - 4 features per node: words, tags, labels, and extensions (Figure 3).
- The parsing algorithm
  - Starts with the sentence's words as leaves (Figure 3).
  - Gradually tags, labels, and extends nodes.
  - Constraints:
    - bottom-up, left-to-right
    - no new node is constructed until its children are complete
    - derivational window constraints (DWC) restrict the number of active nodes
  - A single-rooted, labeled tree is constructed. (A sketch of this loop follows the list.)
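A loose, greedy sketch of the bottom-up, left-to-right loop, under the simplifying assumption that the extension model marks each node as opening ("right"), continuing ("up"), or closing ("left"/"unary") a constituent. The three models and their outputs here are toy stand-ins for the paper's decision-tree models:

```python
def parse(words, tag_model, ext_model, label_model):
    """Greedy sketch of SPATTER-style parsing: tag the leaves, then
    repeatedly close the leftmost complete constituent until one
    single-rooted tree covers the sentence."""
    # Leaves: one tagged node per word (the tagging model's decision).
    nodes = [{"label": tag_model(w), "children": [w]} for w in words]

    while len(nodes) > 1:
        start = 0
        for i, node in enumerate(nodes):
            ext = ext_model(node, nodes)
            if ext == "right":                 # first child: a constituent opens
                start = i
            elif ext in ("left", "unary"):     # last (or only) child: close it
                children = nodes[start:i + 1]
                parent = {"label": label_model(children), "children": children}
                nodes[start:i + 1] = [parent]  # replace the run by its parent
                break
    return nodes[0]

# Toy models, for illustration only.
tag_model = lambda w: {"the": "DT", "dog": "NN", "barks": "VB"}.get(w, "NN")
ext_model = lambda node, nodes: ("right" if node is nodes[0]
                                 else "left" if node is nodes[-1] else "up")
label_model = lambda children: "S" if len(children) > 2 else "NP"

print(parse(["the", "dog", "barks"], tag_model, ext_model, label_model))
```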

Statistical Parsing Models
- The tagging model
- The extension model
- The label model
- The derivation model
- The parsing model

(A sketch of how these compose follows.)
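The slide lists the component models without their formulas (images in the original). As a sketch of the paper's formulation: a derivation d of a tree T is a sequence of tagging, labeling, and extension decisions, each scored by the corresponding model, and the parsing model sums over the derivations consistent with T:

```latex
P(T \mid S) = \sum_{d \in \mathcal{D}(T)} P(d \mid S),
\qquad
P(d \mid S) = \prod_{i=1}^{|d|} P(d_i \mid d_1 \ldots d_{i-1}, S)
```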

Decision-Tree Growing & Smoothing
- 3 main models (tagging, extension, and label).
- The training corpus is divided into 2 sets: 90% for growing and 10% for smoothing.
- Growing & smoothing algorithm: Figure 3.5. (A sketch of the protocol follows.)
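A minimal sketch of the 90/10 protocol; grow_tree and estimate_lambdas are hypothetical stand-ins for the CART-style growing step and the held-out estimation of the interpolation weights:

```python
import random

def grow_and_smooth(events, grow_tree, estimate_lambdas, seed=0):
    """Grow the decision tree on 90% of the training events, then
    estimate the smoothing weights (lambdas) on the held-out 10%."""
    events = list(events)
    random.Random(seed).shuffle(events)
    split = int(0.9 * len(events))
    growing, heldout = events[:split], events[split:]
    tree = grow_tree(growing)                   # grow as far as the data allows
    lambdas = estimate_lambdas(tree, heldout)   # fit interpolation weights
    return tree, lambdas
```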

Decision-Tree Training
- The parsing model cannot be estimated by direct frequency counts because it contains a hidden component: the derivation model.
- The corpus contains no information about the order of derivations.
- So the training process must discover which derivations assign higher probability to the parses in the corpus.
- Forward-backward reestimation is used.

Continued
- Training algorithm (shown as a figure on the original slide; a sketch follows)
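An EM-style sketch of the reestimation loop described on the previous slide; derivations_of, model.prob, and reestimate are hypothetical helpers supplied by the caller:

```python
def train(corpus, model, derivations_of, reestimate, n_iterations=5):
    """EM-style sketch: the derivation order is hidden, so weight each
    derivation of a gold parse by its posterior probability under the
    current model, accumulate expected decision counts, and reestimate."""
    for _ in range(n_iterations):
        expected_counts = {}
        for sentence, gold_tree in corpus:
            derivs = derivations_of(gold_tree)     # derivations consistent with the parse
            total = sum(model.prob(d) for d in derivs)
            for d in derivs:
                weight = model.prob(d) / total     # posterior over derivations
                for decision in d:
                    expected_counts[decision] = expected_counts.get(decision, 0.0) + weight
        model = reestimate(expected_counts)        # refit the decision-tree models
    return model
```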

Experiment Results
- IBM Computer Manuals
  - annotated by the University of Lancaster
  - 195 part-of-speech tags and 19 non-terminal labels
  - trained on 30,800 sentences, tested on 1,473 new sentences
  - 0-crossing-brackets score:
    - IBM's rule-based, unification-style PCFG parser: 69%
    - SPATTER: 76%

Continued
- Wall Street Journal
  - tests the ability to accurately parse a highly ambiguous, large-vocabulary domain
  - annotated in the Penn Treebank, version 2
  - 46 part-of-speech tags and 27 non-terminal labels
  - trained on 40,000 sentences, tested on 1,920 new sentences
  - evaluated using PARSEVAL (defined below)
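The slide cites PARSEVAL without defining it. For reference (standard definitions, not from the slide), the PARSEVAL measures compare the constituents of the proposed parse against those of the treebank parse:

```latex
\text{Precision} = \frac{\#\ \text{correct constituents in the proposed parse}}{\#\ \text{constituents in the proposed parse}},
\qquad
\text{Recall} = \frac{\#\ \text{correct constituents in the proposed parse}}{\#\ \text{constituents in the treebank parse}}
```

Crossing brackets counts the constituents in the proposed parse that violate the boundaries of some treebank constituent.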

Conclusion
- Large amounts of contextual information can be incorporated into a statistical parsing model by applying decision-tree learning algorithms.
- Automatically discovering disambiguation rules is possible.