A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing. Remko Scha, Rens Bod, Khalil Sima'an. Institute for Logic, Language and Computation.

1 A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing
Remko Scha, Rens Bod, Khalil Sima'an
Institute for Logic, Language and Computation, University of Amsterdam

2 Outline of the lecture
• Introduction
• Disambiguation
• Data Oriented Parsing
• DOP1: computational aspects and experiments
• Memory Based Learning framework
• Conclusions

3 Introduction
• Human language cognition: analogy-based processes operating on a store of past experiences
• Modern linguistics: a set of rules
• Language processing algorithms
• Performance model of human language processing:
  - Competence grammar as a broad framework for performance models
  - Memory/analogy-based language processing

4 The Problem of Ambiguity Resolution
• Every input string has an unmanageably large number of analyses
• Uncertain input: generate guesses and choose one
• Syntactic disambiguation might be a side effect of semantic disambiguation

5 The Problem of Ambiguity Resolution
• Frequency of occurrence of lexical items and syntactic structures:
  - People register frequencies
  - People prefer analyses they have already experienced over constructing new ones
  - More frequent analyses are preferred to less frequent ones

6 From Probabilistic Competence Grammars to Data-Oriented Parsing
• Probabilistic information derived from past experience
• Characterization of the possible sentence analyses of the language
• Stochastic grammar:
  - Define: all sentences and all analyses
  - Assign: a probability to each
  - Achieve: the preferences that people display when they choose sentences or analyses

7 Stochastic Grammar
• These predictions are limited
• Platitudes and conventional phrases
• Allow redundancy
• Use a Tree Substitution Grammar

8 Stochastic Tree Substitution Grammar
• Set of elementary trees
• Tree rewrite process
• Redundant model
• Statistically relevant phrases
• Memory-based processing model

9 Memory-based processing model
• Data-oriented parsing approach:
  - Corpus of utterances: past experience
  - STSG to analyze new input
• To describe a specific DOP model, we need:
  - A formalism for representing utterance analyses
  - An extraction function
  - Combination operations
  - A probability model

10 A Simple Data Oriented Parsing Model: DOP1
• Our corpus: DOP1, an imaginary corpus of two trees
• Possible subtrees: t is a subtree of a corpus tree T if
  - t consists of more than one node
  - t is connected
  - except for the leaf nodes of t, each node in t has the same daughter nodes as the corresponding node in T
• Stochastic Tree Substitution Grammar: the set of subtrees
• Generation process by composition: in A ∘ B, B is substituted on the leftmost nonterminal leaf node of A
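The three subtree conditions translate into a short enumeration. The sketch below is illustrative only, assuming a tuple encoding that is not from the slides: a nonterminal node is `(label, child, ...)`, a word is a plain string, and an open substitution site is a one-element tuple `(label,)`. The function names `fragments_at` and `all_fragments` are our own. At each node, every nonterminal daughter is either cut (left as an open site) or expanded with one of its own subtrees, while words are always kept, so every subtree has more than one node.

```python
from itertools import product

# Assumed encoding (not from the slides):
#   nonterminal node                  -> (label, child1, child2, ...)
#   word (terminal)                   -> "word"
#   open substitution site (frontier) -> (label,)


def fragments_at(node):
    """All DOP1 subtrees rooted at `node` (hypothetical helper).

    The root keeps all of its daughters, so every subtree has more than one
    node; each nonterminal daughter is either cut to an open site or expanded
    with one of the subtrees rooted at it. Words are always kept.
    """
    label, *children = node
    choices = []
    for child in children:
        if isinstance(child, str):
            choices.append([child])
        else:
            choices.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*choices)]


def all_fragments(tree):
    """Subtrees rooted at every nonterminal node of a corpus tree."""
    result = []
    stack = [tree]
    while stack:
        node = stack.pop()
        result.extend(fragments_at(node))
        stack.extend(c for c in node[1:] if not isinstance(c, str))
    return result


tree = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "it")))
frags = all_fragments(tree)
print(len(frags))  # 17
```

Even this five-word-free toy tree yields 17 subtrees; the count grows exponentially with tree size, which is why the optimizations listed later matter.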

11 Example of subtrees

12 DOP1 - Imaginary corpus of two trees

13 Derivation and parse #1 She saw the dress with the telescope.

14 Derivation and parse #2 She saw the dress with the telescope.

15 Probability Computations:
• Probability of substituting a subtree t on a specific node
• Probability of a derivation
• Probability of a parse tree
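The slide names the three quantities without formulas. In DOP1 as usually stated by Bod, the probability of substituting t is its frequency divided by the total frequency of all subtrees with the same root label; a derivation's probability is the product of the probabilities of the subtrees it uses; and a parse's probability is the sum over all derivations that yield it. A minimal sketch, assuming fragments are hashable nested tuples whose first element is the root label; the function names and the toy fragments are illustrative only.

```python
from collections import Counter
from math import prod


def substitution_probs(fragments):
    """P(t) = count(t) / total count of fragments with the same root label."""
    counts = Counter(fragments)
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}


def derivation_prob(derivation, probs):
    """Product of the substitution probabilities of the subtrees used."""
    return prod(probs[t] for t in derivation)


def parse_prob(derivations, probs):
    """Sum over all distinct derivations that yield the same parse tree."""
    return sum(derivation_prob(d, probs) for d in derivations)


# Toy fragment multiset (hypothetical, not the DOP1 corpus on the slides).
s_open = ("S", ("NP",), ("VP",))
s_she = ("S", ("NP", "she"), ("VP",))
np_she = ("NP", "she")
vp = ("VP", "sleeps")
probs = substitution_probs([s_open, s_open, s_she, np_she, vp])

d1 = [s_open, np_she, vp]  # leftmost substitution: NP first, then VP
d2 = [s_she, vp]           # a different derivation of the same parse
print(derivation_prob(d1, probs), derivation_prob(d2, probs), parse_prob([d1, d2], probs))
```

Because overlapping fragments give several derivations per parse, the parse probability is a sum; this is exactly the difference between the MPP and the MPD discussed later.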

16 Computational Aspects of DOP1
• Parsing
• Disambiguation:
  - Most Probable Derivation
  - Most Probable Parse
• Optimizations

17 Parsing
• Chart-like parse forest
• Derivation forest:
  - Elementary tree t as a context-free rule: root(t) -> yield(t)
  - Label each phrase with its syntactic category and its full elementary tree
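Reading an elementary tree as a context-free rule only needs its root label and its left-to-right frontier. A sketch under our own assumptions: trees are nested tuples (word = string, open substitution site = `(label,)`), and the hypothetical `as_cfg_rule` returns the tree itself as a third component, mirroring the slide's point that each phrase is labeled with its full elementary tree, since several elementary trees can share the same root and yield.

```python
def frontier(t):
    """Left-to-right frontier labels (the yield) of an elementary tree."""
    if isinstance(t, str):
        return [t]       # a word
    if len(t) == 1:
        return [t[0]]    # an open substitution site keeps its category
    return [sym for child in t[1:] for sym in frontier(child)]


def as_cfg_rule(t):
    """View elementary tree t as root(t) -> yield(t), keeping t as the rule's label."""
    return (t[0], tuple(frontier(t)), t)


rule = as_cfg_rule(("S", ("NP", "she"), ("VP",)))
print(rule[:2])  # ('S', ('she', 'VP'))
```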

18 Elementary trees of an example STSG

19 Derivation forest for the string abcd

20 Derivations and parse trees for the string abcd


22 Disambiguation
• The derivation forest defines all derivations and parses
• The most likely parse must be chosen
• MPP (Most Probable Parse) in DOP1
• MPP vs. MPD (Most Probable Derivation)

23 Most Probable Derivation
• Viterbi algorithm:
  - Eliminate low-probability subderivations in a bottom-up fashion
  - Select the most probable subderivation at each chart entry and eliminate the other subderivations with the same root node

24 Viterbi algorithm
• Two derivations for abc
• P(d1) > P(d2): eliminate the right derivation

25 Algorithm 1: Computing the probability of the most probable derivation
• Input: STSG, S, R, P
• Elementary trees in R are in CNF
• A ->t H: elementary tree t with root A and sequence of labels H
• Nonterminal A in chart entry (i, j) after parsing the input W1, ..., Wn
• PPMPD: probability of the MPD of the input string W1, ..., Wn

26 Algorithm 1: Computing the probability of the most probable derivation
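The chart body of Algorithm 1 is not in the transcript. What follows is a minimal Viterbi sketch over a CNF chart, not the authors' code, under assumptions of our own: each elementary tree has already been read as a rule `(A, rhs, tree_id, prob)` whose `rhs` is either a one-word tuple or a pair of nonterminals, and the start symbol is `"S"`. At every chart entry only the most probable subderivation per root label survives, as slide 23 describes; back-pointers recover the derivation as a list of tree ids in leftmost order. The toy grammar is hypothetical and mimics slide 24's two derivations for abc.

```python
from collections import defaultdict


def viterbi_mpd(words, rules, start="S"):
    """Return (probability, derivation) of the most probable derivation.

    rules: iterable of (A, rhs, tree_id, prob) with rhs = (word,) or (B, C).
    """
    rules = list(rules)
    n = len(words)
    best = defaultdict(dict)  # (i, j) -> {A: (prob, back-pointer)}

    for i, w in enumerate(words):  # spans of length 1
        for A, rhs, t, p in rules:
            if rhs == (w,) and p > best[i, i + 1].get(A, (0.0,))[0]:
                best[i, i + 1][A] = (p, (t,))

    for length in range(2, n + 1):  # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for A, rhs, t, p in rules:
                    if len(rhs) != 2:
                        continue
                    B, C = rhs
                    if B in best[i, k] and C in best[k, j]:
                        q = p * best[i, k][B][0] * best[k, j][C][0]
                        if q > best[i, j].get(A, (0.0,))[0]:  # keep only the best
                            best[i, j][A] = (q, (t, (i, k, B), (k, j, C)))

    if start not in best[0, n]:
        return 0.0, None

    def read(i, j, A):  # follow back-pointers, root first
        t, *kids = best[i, j][A][1]
        return [t] + [x for (a, b, X) in kids for x in read(a, b, X)]

    return best[0, n][start][0], read(0, n, start)


rules = [
    ("A", ("a",), "tA", 1.0), ("B", ("b",), "tB", 1.0), ("C", ("c",), "tC", 1.0),
    ("X", ("A", "B"), "tX", 1.0), ("Y", ("B", "C"), "tY", 1.0),
    ("S", ("X", "C"), "t1", 0.6), ("S", ("A", "Y"), "t2", 0.4),
]
print(viterbi_mpd(["a", "b", "c"], rules))  # (0.6, ['t1', 'tX', 'tA', 'tB', 'tC'])
```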

27 The Most Probable Parse
• Computing the MPP in an STSG is NP-hard
• Monte Carlo method:
  - Sample derivations
  - Observe the most frequent parse tree
  - Estimate parse tree probabilities
  - Random-first search
• The algorithm
• Law of Large Numbers

28 Algorithm 2: Sampling a random derivation
for length := 1 to n do
    for start := 0 to n - length do
        for each root node X in chart entry (start, start + length) do
            1. select at random a tree from the distribution of elementary trees with root node X
            2. eliminate the other elementary trees with root node X from this chart entry
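For the Monte Carlo estimate to be valid, derivations must be drawn in proportion to their probabilities. The sketch below does this with a standard technique that we substitute for the slide's literal bottom-up elimination: first compute inside probabilities (total probability of all subderivations of each label over each span) bottom-up, then choose expansions top-down in proportion to their inside mass, which draws each complete derivation with probability proportional to its own. The rule format `(A, rhs, tree_id, prob)`, the function names and the toy grammar are our assumptions, not the authors'.

```python
import random
from collections import defaultdict


def inside_chart(words, rules):
    """inside[i, j][A] = total probability of all derivations of A over words[i:j]."""
    n = len(words)
    inside = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for A, rhs, t, p in rules:
            if rhs == (w,):
                inside[i, i + 1][A] += p
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for A, rhs, t, p in rules:
                    if len(rhs) == 2:
                        B, C = rhs
                        inside[i, j][A] += p * inside[i, k][B] * inside[k, j][C]
    return inside


def sample_derivation(words, rules, inside, i, j, A, rng):
    """Draw one derivation of A over words[i:j] top-down, as a list of tree ids.

    Call only when inside[i, j][A] > 0, otherwise there is nothing to sample.
    """
    options, weights = [], []
    if j == i + 1:
        for X, rhs, t, p in rules:
            if X == A and rhs == (words[i],):
                options.append((t, None))
                weights.append(p)
    for k in range(i + 1, j):
        for X, rhs, t, p in rules:
            if X == A and len(rhs) == 2:
                B, C = rhs
                w = p * inside[i, k][B] * inside[k, j][C]
                if w > 0:
                    options.append((t, (k, B, C)))
                    weights.append(w)
    t, split = rng.choices(options, weights=weights)[0]
    if split is None:
        return [t]
    k, B, C = split
    return ([t] + sample_derivation(words, rules, inside, i, k, B, rng)
            + sample_derivation(words, rules, inside, k, j, C, rng))


rules = [
    ("A", ("a",), "tA", 1.0), ("B", ("b",), "tB", 1.0), ("C", ("c",), "tC", 1.0),
    ("X", ("A", "B"), "tX", 1.0), ("Y", ("B", "C"), "tY", 1.0),
    ("S", ("X", "C"), "t1", 0.6), ("S", ("A", "Y"), "t2", 0.4),
]
words = ["a", "b", "c"]
inside = inside_chart(words, rules)
rng = random.Random(0)
samples = [sample_derivation(words, rules, inside, 0, 3, "S", rng) for _ in range(1000)]
share_t1 = sum(d[0] == "t1" for d in samples) / len(samples)  # close to 0.6
```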

29 Results of Algorithm 2
• A random derivation for the whole sentence
• A first guess for the MPP
• Compute the size of the sampling set:
  - Probability of error
• Upper bound:
  - 0: index of the MPP; i: index of parse i; N: number of sampled derivations
  - No unique MPP: ambiguity

30 Reminder

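31 Conclusions: lower bound for N
• $p_i$: probability of parse i; B: estimate of $p_i$ from the relative frequency of parse i among N sampled derivations
• $\mathrm{Var}(B) = p_i(1 - p_i)/N$
• Since $0 \le p_i \le 1$, $p_i(1 - p_i) \le 1/4$, hence $\mathrm{Var}(B) \le 1/(4N)$
• Standard error $s = \sqrt{\mathrm{Var}(B)} \le 1/(2\sqrt{N})$
• So $s \le \sigma_m$ is guaranteed when $N \ge 1/(4\sigma_m^2)$; e.g. $s \le 0.05$ requires $N \ge 100$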

32 Algorithm 3: Estimating the parse probabilities
Given a derivation forest of a sentence and a threshold $\sigma_m$ for the standard error:
N := the smallest integer larger than $1/(4\sigma_m^2)$
repeat N times:
    sample a random derivation from the derivation forest
    store the parse generated by this derivation
for each parse i:
    estimate the conditional probability given the sentence by $p_i := \#(i)/N$
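Algorithm 3 reduces to counting parses over sampled derivations. A minimal sketch, assuming a hypothetical zero-argument `sample_parse` callable that draws one derivation and returns the hashable parse it generates. The sample size is the smallest N satisfying the slide-31 bound N ≥ 1/(4σ_m²), so σ_m = 0.05 gives exactly the 100 samples used in the ATIS experiments; the slide's "smallest integer larger than" would add one sample when the bound is a whole number. `Fraction(str(...))` keeps that bound free of floating-point rounding. Function names are our own.

```python
import math
import random
from collections import Counter
from fractions import Fraction


def required_samples(max_std_error):
    """Smallest N with 1 / (2 * sqrt(N)) <= max_std_error, i.e. N >= 1 / (4 s^2)."""
    s = Fraction(str(max_std_error))  # exact decimal value, no float rounding
    return math.ceil(1 / (4 * s * s))


def estimate_parse_probs(sample_parse, max_std_error):
    """Monte Carlo estimate p_i = #(i) / N over N sampled derivations."""
    n = required_samples(max_std_error)
    counts = Counter(sample_parse() for _ in range(n))
    return {parse: c / n for parse, c in counts.items()}


def most_probable_parse(estimates):
    """The parse with the highest estimated probability."""
    return max(estimates, key=estimates.get)


rng = random.Random(1)


def fake_sampler():
    # Hypothetical stand-in for sampling a derivation: parse "p1" with probability 0.7.
    return "p1" if rng.random() < 0.7 else "p2"


estimates = estimate_parse_probs(fake_sampler, 0.01)  # N = 2500
print(required_samples(0.05))  # 100
print(most_probable_parse(estimates))
```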

33 Complexity of Algorithm 3
• Assumes a value for the maximum allowed standard error
• Samples a number of derivations that is guaranteed to achieve that error
• The number of samples needed grows quadratically in the inverse of the chosen error ($N \propto 1/\sigma_m^2$)

34 Optimizations
• Sima'an: MPD in time linear in the STSG size
• Bod: MPP on a small random corpus of subtrees
• Sekine and Grishman: use only subtrees rooted with S or NP
• Goodman: a different polynomial-time approach

35 Experimental Properties of DOP1
• Experiments on the ATIS corpus:
  - MPP vs. MPD
  - Impact of fragment size
  - Impact of fragment lexicalization
  - Impact of fragment frequency
• Experiments on SRI-ATIS and OVIS:
  - Impact of subtree depth

36 Experiments on ATIS corpus
• ATIS = Air Travel Information System
• 750 annotated sentence analyses
• Annotations from the Penn Treebank
• Purpose: compare the accuracy obtained with undiluted DOP1 with that obtained with restricted STSGs

37 Experiments on ATIS corpus
• Divide into training and test sets:
  - 90% = 675 sentences in the training set
  - 10% = 75 sentences in the test set
• Convert the training set into fragments and enrich them with probabilities
• Parse the test set sentences with subtrees from the training set
• The MPP was estimated from 100 sampled derivations
• Parse accuracy = percentage of MPPs that are identical to the test set parses
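The evaluation protocol is a random 90/10 split plus an exact-match score. A sketch with hypothetical helper names; the shuffling seed and the use of plain `==` on parse objects are our assumptions, not details from the experiments.

```python
import random


def split_corpus(trees, test_fraction=0.1, seed=0):
    """Shuffle a treebank and split it into (training, test) lists."""
    trees = list(trees)
    random.Random(seed).shuffle(trees)
    n_test = round(len(trees) * test_fraction)
    return trees[n_test:], trees[:n_test]


def parse_accuracy(predicted, gold):
    """Share of test sentences whose predicted parse is identical to the gold parse."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


train, test = split_corpus(range(750))
print(len(train), len(test))  # 675 75
```

Repeating the split with different seeds gives the kind of average and standard deviation reported on the next slide.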

38 Results
• On 10 random training/test splits of ATIS:
  - Average parse accuracy = 84.2%
  - Standard deviation = 2.9%

39 Impact of overlapping fragments: MPP vs. MPD
• Can the MPD achieve parse accuracies similar to the MPP?
• Can the MPD do better than the MPP?
  - Overlapping fragments
• Accuracy of the MPD on the test set: 69%
• Compared with the accuracy of the MPP on the test set: 69% vs. 85%
• Conclusion: overlapping fragments play an important role in predicting the appropriate analysis of a sentence

40 The impact of fragment size
• Large fragments capture more lexical/syntactic dependencies than small ones
• The experiment:
  - Use DOP1 with a restricted maximum depth
  - Max depth 1 -> DOP1 = SCFG
  - Compute the accuracies for both the MPD and the MPP for each maximum depth

41 Impact of fragment size

42 Impact of fragment lexicalization
• Lexicalized fragments
• More words -> more lexical dependencies
• Experiment:
  - Different versions of DOP1
  - Restrict the maximum number of words per fragment
  - Check the accuracy for the MPP and the MPD

43 Impact of fragment lexicalization

44 Impact of fragment frequency
• Frequent fragments contribute more
• Large fragments are less frequent than small ones but might contribute more
• Experiment:
  - Restrict fragments to a minimum number of occurrences
  - No other restrictions
  - Check the accuracy for the MPP

45 Impact of fragment frequency

46 Experiments on SRI-ATIS and OVIS
• Employ the MPD because the corpora are larger
• Tests performed on DOP1 and SDOP
• Use a set of heuristic criteria for selecting the fragments, i.e. constraints on the form of subtrees:
  - d: upper bound on depth
  - n: number of substitution sites
  - l: number of terminals
  - L: number of consecutive terminals
• Apply the constraints to all subtrees except those of depth 1
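The four constraints can be computed directly from a subtree. This is a sketch with several assumptions not stated on the slides: subtrees are nested tuples (word = string, open substitution site = `(label,)`); depth counts edges from the root to the deepest frontier node, so a single CFG rule has depth 1; each constraint is an upper bound; and L is read as the longest run of adjacent words on the frontier. The default bounds follow the d4 n2 l7 L3 setting on slide 47; the function and parameter names are hypothetical. The same depth and word counts are what the ATIS fragment-size and lexicalization experiments restrict.

```python
def depth(t):
    """Edges on the longest root-to-frontier path; one CFG rule has depth 1."""
    if isinstance(t, str) or len(t) == 1:
        return 0
    return 1 + max(depth(child) for child in t[1:])


def frontier(t):
    """Frontier items left to right: words as str, open sites as (label,)."""
    if isinstance(t, str) or len(t) == 1:
        return [t]
    return [x for child in t[1:] for x in frontier(child)]


def fragment_stats(t):
    """d, n, l, L for one subtree (L = longest run of consecutive words, assumed)."""
    items = frontier(t)
    longest = run = 0
    for x in items:
        run = run + 1 if isinstance(x, str) else 0
        longest = max(longest, run)
    return {
        "d": depth(t),
        "n": sum(not isinstance(x, str) for x in items),  # substitution sites
        "l": sum(isinstance(x, str) for x in items),      # terminals (words)
        "L": longest,
    }


def keep_fragment(t, max_depth=4, max_sites=2, max_words=7, max_run=3):
    """Apply the constraints to every subtree except those of depth 1."""
    s = fragment_stats(t)
    if s["d"] == 1:
        return True
    return (s["d"] <= max_depth and s["n"] <= max_sites
            and s["l"] <= max_words and s["L"] <= max_run)


frag = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP",)))
print(fragment_stats(frag))  # {'d': 3, 'n': 1, 'l': 2, 'L': 2}
```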

47 Experiments on SRI-ATIS and OVIS
• d4 n2 l7 L3
• DOP(i)
• Evaluation metrics:
  - Recognized
  - Tree Language Coverage (TLC)
  - Exact match
  - Labeled bracketing recall and precision
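Exact match compares whole trees; labeled bracketing compares constituents. A minimal sketch of labeled precision and recall, assuming parses are nested tuples with words as strings and counting every nonterminal node, preterminals included, as a (label, start, end) bracket. Standard scorers such as evalb apply further conventions (for example leaving out part-of-speech brackets), so numbers from this sketch are not directly comparable with published ones. The helper names are our own.

```python
from collections import Counter


def brackets(tree):
    """Multiset of labeled spans (label, start, end) over the words of a tree."""
    spans = Counter()

    def walk(node, start):
        if isinstance(node, str):
            return start + 1  # a word covers one position
        end = start
        for child in node[1:]:
            end = walk(child, end)
        spans[(node[0], start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def labeled_precision_recall(predicted, gold):
    """Matched brackets divided by predicted (precision) and by gold (recall)."""
    p, g = brackets(predicted), brackets(gold)
    matched = sum((p & g).values())
    return matched / sum(p.values()), matched / sum(g.values())


gold = ("S", ("A", "a"), ("X", ("B", "b"), ("C", "c")))
pred = ("S", ("Y", ("A", "a"), ("B", "b")), ("C", "c"))
print(labeled_precision_recall(pred, gold))  # (0.8, 0.8)
```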

48 Experiments on SRI-ATIS
• 13335 syntactically annotated utterances
• Annotation scheme originating from the Core Language Engine system
• Fixed parameters except the subtree depth bound: n2 l4 L3
• Training set: 12335 trees
• Test set: 1000 trees
• Experiment: train and test with different upper bounds on depth (takes more than 10 days for DOP(4)!)

49 Impact of subtree depth: SRI-ATIS

50 Experiments on the OVIS corpus
• 10000 syntactically and semantically annotated trees
• Both annotations treated as one
• More nonterminal symbols
• Utterances are answers to questions in a dialogue, so they are short (avg. 3.43 words)
• Sima'an's results: sentences with at least 2 words (avg. 4.57 words)
• n2 l7 L3

51 Experiments on the OVIS corpus
• Experiment: check different subtree depths (1, 3, 4, 5)
• Test set: 1000 trees
• Training set: 9000 trees

52 Impact of subtree depth: OVIS

53 Summary of results
• ATIS:
  - Parse accuracy is 85%
  - Overlapping fragments have an impact on accuracy
  - Accuracy increases as fragment depth increases, for both the MPP and the MPD
  - The optimal lexical maximum for ATIS is 8
  - Accuracy decreases as the lower bound on fragment frequency increases (for the MPP)

54 Summary of results
• SRI-ATIS:
  - The availability of more data is more crucial to the accuracy of the MPD
  - Depth has an impact
  - Accuracy improves when using memory-based parsing (DOP(2)) rather than an SCFG (DOP(1))

55 Summary of results
• OVIS:
  - Recognition power isn't affected by depth
  - No big difference in exact match between DOP1(1) and DOP1(4) (mean and standard deviation)

56 DOP: probabilistic recursive MBL
• Relationship between the present DOP framework and the Memory Based Learning framework
• DOP extends MBL to deal with disambiguation
• MBL vs. DOP: flat or intermediate descriptions vs. hierarchical ones

57 Case Based Reasoning (CBR)
• Case-based learning:
  - Lazy learning: doesn't generalize
  - Lazy generalization
• Classify by means of a similarity function
• We refer to this paradigm as MBL
• CBR vs. other variants of MBL:
  - Task concept
  - Similarity function
  - Learning task

58 The DOP framework and CBR
• CBR method:
  - A formalism for representing utterance analyses: case description language
  - An extraction function: retrieve units
  - Combination operations: reuse and revision
• Missing in DOP: a similarity function
• Extension of CBR: a probability model
• A DOP model defines a CBR system for natural language analysis

59 DOP1 and CBR methods
• DOP1 as an extension of a CBR system
• = classified instance
• Retrieve subtrees and construct a tree
• Sentence = instance
• Tree = class
• Set of sentences = instance space
• Set of trees = class space
• Frontier, SSF
• Infinite runtime case base containing instance-class-weight triples:

60 DOP1 and CBR methods
• Task and similarity function:
  - Task = disambiguation
  - Similarity function:
    - Parsing -> recursive string matching procedure
    - Ambiguity -> computing probabilities and selecting the highest
• Conclusion: DOP1 is a lazy probabilistic recursive CBR classifier

61 DOP vs. other MBL approaches in NLP
• k-NN vs. DOP
• Memory Based Sequence Learning (MBSL):
  - DOP: a stochastic model for computing probabilities; MBSL: ad hoc heuristics for computing scores
  - DOP: a globally based ranking strategy for alternative analyses; MBSL: a locally based one
  - Different generalization power

62 Conclusions
• Memory-based aspects of the DOP model
• Disambiguation
• Probabilities to account for frequencies
• DOP as a probabilistic recursive memory-based model
• DOP1: properties, computational aspects and experiments
• DOP and MBL: differences
