A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing
Remko Scha, Rens Bod, Khalil Sima'an
Institute for Logic, Language and Computation, University of Amsterdam

Outline of the lecture
- Introduction
- Disambiguation
- Data Oriented Parsing
- DOP1: computational aspects and experiments
- Memory Based Learning framework
- Conclusions

Introduction
- Human language cognition: analogy-based processes on a store of past experiences
- Modern linguistics: a set of rules
- Language processing algorithms
- A performance model of human language processing: competence grammar as a broad framework for performance models; memory/analogy-based language processing

The Problem of Ambiguity Resolution
- Every input string has an unmanageably large number of analyses
- Uncertain input: generate guesses and choose one
- Syntactic disambiguation may be a side effect of semantic disambiguation

The Problem of Ambiguity Resolution
- Frequency of occurrence of lexical items and syntactic structures:
  - People register frequencies
  - People prefer analyses they have already experienced over constructing new ones
  - More frequent analyses are preferred to less frequent ones

From Probabilistic Competence Grammars to Data-Oriented Parsing
- Probabilistic information derived from past experience
- Characterization of the possible sentence-analyses of the language
- Stochastic grammar:
  - Define: all sentences and all analyses
  - Assign: a probability to each
  - Achieve: the preferences people display when they choose between sentences or analyses

Stochastic Grammar
- These predictions are limited
- Platitudes and conventional phrases
- Allow redundancy
- Use a Tree Substitution Grammar

Stochastic Tree Substitution Grammar
- A set of elementary trees
- A tree-rewrite process
- A redundant model
- Statistically relevant phrases
- A memory-based processing model

Memory-based processing model
- Data-oriented parsing approach:
  - A corpus of utterances: past experience
  - An STSG to analyze new input
- To describe a specific DOP model we need:
  - A formalism for representing utterance-analyses
  - An extraction function
  - Combination operations
  - A probability model

A Simple Data Oriented Parsing Model: DOP1
- Our corpus: an imaginary corpus of two trees
- Possible subtrees of a corpus tree T: a tree t such that
  - t consists of more than one node
  - t is connected
  - except for the leaf nodes of t, each node in t has the same daughter nodes as the corresponding node in T
- Stochastic Tree Substitution Grammar: the set of these subtrees
- Generation process, composition: A ∘ B means that B is substituted at the leftmost non-terminal leaf node of A
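
As a concrete illustration of the composition operation, here is a minimal Python sketch. It assumes a hypothetical nested-list tree encoding in which a single-element list such as ["NP"] marks an open substitution site and bare strings are words; the encoding and names are illustrative, not from the paper.

```python
import copy

def leftmost_open_node(tree):
    """Return the leftmost non-terminal leaf (open substitution site) of `tree`,
    i.e. a node of the form [label] with no children, or None if there is none."""
    if isinstance(tree, str):              # terminal word
        return None
    if len(tree) == 1:                     # non-terminal leaf: open substitution site
        return tree
    for child in tree[1:]:
        site = leftmost_open_node(child)
        if site is not None:
            return site
    return None

def compose(a, b):
    """DOP1 composition a o b: substitute subtree `b` at the leftmost
    non-terminal leaf node of `a`; defined only if the root labels match."""
    result = copy.deepcopy(a)
    site = leftmost_open_node(result)
    if site is None or site[0] != b[0]:
        raise ValueError("composition undefined: no matching substitution site")
    site[:] = copy.deepcopy(b)             # replace the open node in place
    return result

# Toy fragments in the spirit of the 'she saw the dress' example:
s_frag = ["S", ["NP"], ["VP", ["V", "saw"], ["NP"]]]
np_she = ["NP", "she"]
np_dress = ["NP", ["Det", "the"], ["N", "dress"]]
print(compose(compose(s_frag, np_she), np_dress))
```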

Example of subtrees

DOP1 - Imaginary corpus of two trees

Derivation and parse #1 She saw the dress with the telescope.

Derivation and parse #2 She saw the dress with the telescope.

Probability Computations
- Probability of substituting a subtree t on a specific node
- Probability of a derivation
- Probability of a parse tree
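
Spelled out, these three quantities are the standard DOP1 definitions (writing |t| for the number of occurrences of subtree t among all subtrees extracted from the corpus):

```latex
P(t) = \frac{|t|}{\sum_{t'\,:\,\mathrm{root}(t')=\mathrm{root}(t)} |t'|}
\qquad
P(d = t_1 \circ \cdots \circ t_n) = \prod_{i=1}^{n} P(t_i)
\qquad
P(T) = \sum_{d \text{ derives } T} P(d)
```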

Computational Aspects of DOP1
- Parsing
- Disambiguation:
  - Most Probable Derivation (MPD)
  - Most Probable Parse (MPP)
- Optimizations

Parsing
- Chart-like parse forest
- Derivation forest:
  - Treat each elementary tree t as a context-free rule: root(t) → yield(t) (see the sketch below)
  - Label each phrase with its syntactic category and its full elementary tree
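
A small sketch of how an elementary tree can be handed to a standard chart parser as the rule root(t) → yield(t), reusing the nested-list encoding assumed earlier; the function name and return format are illustrative.

```python
def tree_to_rule(tree):
    """View an elementary tree as the CF rule root(t) -> yield(t), keeping the
    full tree as the rule's label so the derivation forest can be rebuilt."""
    def frontier(t):
        if isinstance(t, str):
            return [t]                     # terminal word
        if len(t) == 1:
            return [t[0]]                  # open substitution site (non-terminal)
        return [sym for child in t[1:] for sym in frontier(child)]
    return tree[0], tuple(frontier(tree)), tree

# tree_to_rule(["S", ["NP"], ["VP", ["V", "saw"], ["NP"]]])
# -> ("S", ("NP", "saw", "NP"), <the full elementary tree>)
```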

Elementary trees of an example STSG

Derivation forest for the string abcd

Derivations and parse trees for the string abcd

Disambiguation
- The derivation forest defines all derivations and parses
- The most likely parse must be chosen
- MPP in DOP1
- MPP vs. MPD

Most Probable Derivation
- Viterbi algorithm:
  - Eliminate low-probability sub-derivations in bottom-up fashion
  - Select the most probable sub-derivation at each chart entry and eliminate the other sub-derivations of that root node

Viterbi algorithm
- Two derivations for abc
- d1 > d2: eliminate the right-hand derivation (d2)

Algorithm 1 – Computing the probability of the most probable derivation
- Input: an STSG (S, R, P), with the elementary trees in R in CNF
- A →_t H: elementary tree t with root A and sequence of frontier labels H
- Notation: a non-terminal A in chart entry (i, j) after parsing the input w1, ..., wn
- P_MPD: the probability of the MPD of the input string w1, ..., wn

Algorithm 1 – Computing the probability of the most probable derivation
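
Since the slide with the algorithm itself is not reproduced in this transcript, the following is a minimal CKY-style Viterbi sketch of the same idea, under the assumption that the elementary trees have already been flattened to CNF rules carrying their tree probabilities; the dictionary formats (`lexical`, `binary`) and the start symbol "S" are illustrative choices, not the original notation.

```python
from collections import defaultdict

def mpd_probability(words, lexical, binary):
    """Viterbi/CKY sketch for the probability of the most probable derivation.
    lexical: word -> list of (root_label, P(t)) for trees yielding that word
    binary:  (label1, label2) -> list of (root_label, P(t)) for CNF trees
    Each rule stands for one elementary tree t with probability P(t)."""
    n = len(words)
    best = defaultdict(dict)                         # best[(i, j)][A] = max derivation prob
    for i, w in enumerate(words):
        for root, p in lexical.get(w, []):
            best[(i, i + 1)][root] = max(best[(i, i + 1)].get(root, 0.0), p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                # split point
                for b, pb in best[(i, k)].items():
                    for c, pc in best[(k, j)].items():
                        for root, p in binary.get((b, c), []):
                            cand = p * pb * pc
                            if cand > best[(i, j)].get(root, 0.0):
                                best[(i, j)][root] = cand
    return best[(0, n)].get("S", 0.0)                # P of the MPD of w1 ... wn
```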

The Most Probable Parse
- Computing the MPP in an STSG is NP-hard
- Monte Carlo method:
  - Sample derivations
  - Observe the most frequent parse tree
  - Estimate parse-tree probabilities
  - Random-first search
- The algorithm
- Law of Large Numbers

Algorithm 2: Sampling a random derivation
for length := 1 to n do
  for start := 0 to n - length do
    for each root node X in chart-entry (start, start + length) do:
      1. select at random a tree from the distribution of elementary trees with root node X
      2. eliminate the other elementary trees with root node X from this chart-entry
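
A Python sketch of this sampling step, assuming a hypothetical chart structure in which `chart[(start, end)]` maps each root label to a list of (elementary_tree, probability) pairs; the data format is illustrative.

```python
import random

def sample_random_derivation(chart, n):
    """Algorithm 2 sketch: for every chart entry and every root label X, draw one
    elementary tree according to the (re-normalized) probabilities and discard
    the rest; the surviving trees determine one random derivation."""
    kept = {}
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            for root, candidates in chart.get((start, start + length), {}).items():
                trees, probs = zip(*candidates)
                choice = random.choices(trees, weights=probs, k=1)[0]
                kept[(start, start + length, root)] = choice
    return kept
```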

Results of Algorithm 2
- A random derivation for the whole sentence
- A first guess for the MPP
- Compute the size of the sampling set:
  - The probability of error and its upper bound (0 = index of the MPP, i = index of parse i, N = number of sampled derivations)
  - No unique MPP: ambiguity

Reminder

Conclusions – lower bound for N
- p_i is the probability of parse i
- B is the probability estimated from the frequencies in N samples
- Var(B) = p_i (1 - p_i) / N, so 0 ≤ Var(B) ≤ 1/(4N)
- σ = sqrt(Var(B)) ≤ 1/(2 sqrt(N)), hence N ≥ 1/(4σ²)
- For example, N = 100 gives σ ≤ 0.05

Algorithm 3: Estimating the parse probabilities
Given a derivation forest of a sentence and a threshold σ_m for the standard error:
- N := the smallest integer larger than 1/(4 σ_m²)
- repeat N times:
  - sample a random derivation from the derivation forest
  - store the parse generated by this derivation
- for each parse i: estimate its conditional probability given the sentence by p_i := #(i) / N
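
A direct Python sketch of Algorithm 3; `sample_parse` stands in for "sample a random derivation from the derivation forest and read off its parse" (Algorithm 2 plus tree construction) and is assumed rather than implemented here.

```python
from collections import Counter

def estimate_parse_probabilities(sample_parse, sigma_max):
    """Monte Carlo estimation of conditional parse probabilities.
    sample_parse: callable returning the parse tree of one random derivation.
    sigma_max:    threshold on the standard error of the estimates."""
    n = int(1.0 / (4.0 * sigma_max ** 2)) + 1        # smallest integer > 1/(4*sigma_m^2)
    counts = Counter(sample_parse() for _ in range(n))
    return {parse: count / n for parse, count in counts.items()}   # p_i = #(i) / N
```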

Complexity of Algorithm 3
- Assumes a value for the maximum allowed standard error
- Samples a number of derivations that is guaranteed to achieve that error
- The number of samples needed grows quadratically in the inverse of the chosen error (N ∝ 1/σ²)

Optimizations
- Sima'an: MPD in time linear in the STSG size
- Bod: MPP on a small random corpus of subtrees
- Sekine and Grishman: use only subtrees rooted in S or NP
- Goodman: a different polynomial-time method (a reduction of DOP1 to an equivalent PCFG)

Experimental Properties of DOP1
- Experiments on the ATIS corpus:
  - MPP vs. MPD
  - Impact of fragment size
  - Impact of fragment lexicalization
  - Impact of fragment frequency
- Experiments on SRI-ATIS and OVIS:
  - Impact of subtree depth

Experiments on the ATIS corpus
- ATIS = Air Travel Information System
- 750 annotated sentence analyses
- Annotated in the Penn Treebank scheme
- Purpose: compare the accuracy obtained with undiluted DOP1 to the accuracy obtained with restricted STSGs

Experiments on the ATIS corpus
- Divide into training and test sets: 90% = 675 trees in the training set, 10% = 75 trees in the test set
- Convert the training set into fragments and enrich them with probabilities
- Parse the test-set sentences with subtrees from the training set
- The MPP was estimated from 100 sampled derivations
- Parse accuracy = the percentage of MPPs identical to the test-set parses
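
The evaluation metric written out as a trivial sketch, with hypothetical argument names:

```python
def parse_accuracy(mpp_parses, gold_parses):
    """Exact-match parse accuracy: percentage of most probable parses that are
    identical to the corresponding test-set (gold) parses."""
    matches = sum(p == g for p, g in zip(mpp_parses, gold_parses))
    return 100.0 * matches / len(gold_parses)
```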

Results
- On 10 random training/test splits of ATIS:
  - Average parse accuracy = 84.2%
  - Standard deviation = 2.9%

Impact of overlapping fragments: MPP vs. MPD
- Can MPD achieve parse accuracies similar to MPP?
- Can MPD do better than MPP? Overlapping fragments
- Accuracy of MPD on the test set: 69%
- Compared to the accuracy achieved with MPP on the test set: 69% vs. 85%
- Conclusion: overlapping fragments play an important role in predicting the appropriate analysis of a sentence

The impact of fragment size
- Large fragments capture more lexical/syntactic dependencies than small ones
- The experiment:
  - Use DOP1 with a restricted maximum fragment depth
  - With max depth 1, DOP1 reduces to an SCFG
  - Compute the accuracies for both MPD and MPP at each maximum depth

Impact of fragment size

Impact of fragment lexicalization
- Lexicalized fragments
- More words -> more lexical dependencies
- Experiment:
  - Different versions of DOP1
  - Restrict the maximum number of words per fragment
  - Check the accuracy for MPP and MPD

Impact of fragment lexicalization

Impact of fragment frequency
- Frequent fragments contribute more
- Large fragments are less frequent than small ones but might contribute more
- Experiment:
  - Restrict fragments to a minimum number of occurrences
  - No other restrictions
  - Check the accuracy for MPP

Impact of fragment frequency

Experiments on SRI-ATIS and OVIS
- Employ MPD because the corpora are bigger
- Tests performed on DOP1 and SDOP
- Use a set of heuristic criteria for selecting the fragments, i.e. constraints on the form of subtrees (see the sketch below):
  - d: upper bound on depth
  - n: upper bound on the number of substitution sites
  - l: upper bound on the number of terminals
  - L: upper bound on the number of consecutive terminals
- Apply the constraints to all subtrees except those of depth 1
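
A sketch of these fragment-selection constraints over the nested-list tree encoding used in the earlier sketches; the helper functions and the default bounds (the d4 n2 l7 L3 setting of the next slide) are illustrative.

```python
def is_open(t):                      # non-terminal leaf, i.e. a substitution site
    return isinstance(t, list) and len(t) == 1

def depth(t):
    if isinstance(t, str) or is_open(t):
        return 0
    return 1 + max(depth(c) for c in t[1:])

def frontier(t):
    if isinstance(t, str):
        return [t]                   # terminal word
    if is_open(t):
        return [None]                # placeholder for a substitution site
    return [x for c in t[1:] for x in frontier(c)]

def substitution_sites(t):
    return sum(x is None for x in frontier(t))

def terminals(t):
    return sum(x is not None for x in frontier(t))

def max_consecutive_terminals(t):
    best = run = 0
    for x in frontier(t):
        run = run + 1 if x is not None else 0
        best = max(best, run)
    return best

def select_fragments(subtrees, d=4, n=2, l=7, L=3):
    """Keep every depth-1 subtree; keep a deeper subtree only if it satisfies all
    upper bounds: depth <= d, substitution sites <= n, terminals <= l,
    consecutive terminals <= L."""
    return [t for t in subtrees
            if depth(t) == 1
            or (depth(t) <= d and substitution_sites(t) <= n
                and terminals(t) <= l and max_consecutive_terminals(t) <= L)]
```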

Experiments on SRI-ATIS and OVIS
- Constraint setting: d4 n2 l7 L3
- DOP(i): DOP1 with the subtree depth bounded by i
- Evaluation metrics:
  - Recognized; Tree Language Coverage (TLC)
  - Exact match
  - Labeled bracketing recall and precision

Experiments on SRI-ATIS
- Syntactically annotated utterances
- The annotation scheme originated from the Core Language Engine system
- Fixed parameters except the subtree depth bound: n2 l4 L3
- Training set of trees; test set of 1,000 trees
- Experiment: train and test with different upper bounds on depth (takes more than 10 days for DOP(4))

Impact of subtree depth: SRI-ATIS

Experiments on the OVIS corpus
- Syntactically and semantically annotated trees
- Both annotations treated as one
- More non-terminal symbols
- Utterances are answers to questions in a dialogue -> short utterances (avg. 3.43 words)
- Sima'an's results: sentences with at least 2 words
- Constraint setting: n2 l7 L3

Experiments on the OVIS corpus
- Experiment: check different upper bounds on subtree depth: 1, 3, 4, 5
- Test set of 1,000 trees
- Training set of 9,000 trees

Impact of subtree depth: OVIS

Summary of results
- ATIS:
  - Parse accuracy is 85%
  - Overlapping fragments have an impact on accuracy
  - Accuracy increases as fragment depth increases, both for MPP and MPD
  - The optimal lexicalization maximum for ATIS is 8
  - Accuracy decreases as the lower bound on fragment frequency increases (for MPP)

Summary of results
- SRI-ATIS:
  - The availability of more data is more crucial to the accuracy of MPD
  - Depth has an impact
  - Accuracy improves with memory-based parsing (DOP(2)) compared to an SCFG (DOP(1))

Summary of results
- OVIS:
  - Recognition power isn't affected by depth
  - No big difference in exact-match means and standard deviations between DOP1(1) and DOP1(4)

DOP: probabilistic recursive MBL
- The relationship between the present DOP framework and the Memory Based Learning (MBL) framework
- DOP extends MBL to deal with disambiguation
- MBL vs. DOP: flat or intermediate descriptions vs. hierarchical ones

Case Based Reasoning (CBR)
- Case-based learning: lazy learning, which doesn't generalize; lazy generalization
- Classify by means of a similarity function
- We refer to this paradigm as MBL
- CBR vs. other variants of MBL:
  - Task concept
  - Similarity function
  - Learning task

The DOP framework and CBR
- CBR method:
  - A formalism for representing utterance-analyses: the case description language
  - An extraction function: retrieve units
  - Combination operations: reuse and revision
- Missing in DOP: a similarity function
- Extending CBR: a probability model
- A DOP model defines a CBR system for natural language analysis

DOP1 and CBR methods
- DOP1 as an extension to a CBR system
- A sentence paired with its tree = a classified instance
- Retrieve subtrees and construct a tree
- Sentence = instance
- Tree = class
- Set of sentences = instance space
- Set of trees = class space
- Frontier, SSF
- An infinite runtime case-base containing instance-class-weight triples

DOP1 and CBR methods
- Task and similarity function:
  - Task = disambiguation
  - Similarity function:
    - Parsing -> a recursive string-matching procedure
    - Ambiguity -> computing probabilities and selecting the highest
- Conclusion: DOP1 is a lazy, probabilistic, recursive CBR classifier

DOP vs. other MBL approaches in NLP
- K-NN vs. DOP
- Memory Based Sequence Learning (MBSL):
  - DOP: a stochastic model for computing probabilities; MBSL: ad hoc heuristics for computing scores
  - DOP: a globally based ranking strategy over alternative analyses; MBSL: a locally based one
  - Different generalization power

Conclusions
- Memory-based aspects of the DOP model
- Disambiguation
- Probabilities to account for frequencies
- DOP as a probabilistic, recursive, memory-based model
- DOP1: properties, computational aspects, and experiments
- DOP and MBL: differences