Statistical Decision-Tree Models for Parsing
NLP Lab, POSTECH
김 지 협
Contents
- Abstract
- Introduction
- Decision-Tree Modeling
- SPATTER Parsing
- Statistical Parsing Models
- Decision-Tree Growing & Smoothing
- Decision-Tree Training
- Experiment Results
- Conclusion
Abstract
- Existing syntactic NL parsers are not adequate for highly ambiguous, large-vocabulary text (e.g. the Wall Street Journal)
- Premises for developing a new parser:
  - grammars are too complex to develop manually for most domains
  - parsing models must rely heavily on contextual information
  - existing n-gram models are inadequate for parsing
- SPATTER: a statistical parser based on decision-tree models
  - performs better than a grammar-based parser
Introduction
- Parsing as making a sequence of disambiguation decisions
- The probability of a complete parse tree T of a sentence S (sketched below)
- Automatically discovering the rules for disambiguation
- Producing a parser without a complicated grammar
- Long-distance lexical information is crucial to disambiguating interpretations accurately
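The decomposition behind that probability is not written out in this transcript; as a hedged sketch following the usual SPATTER-style formulation (my notation), a derivation d is a sequence of disambiguation decisions d_1, ..., d_n and

    P(T | S) = Σ_{d ∈ D(T)} P(d | S),  where  P(d | S) = Π_i P(d_i | d_1, ..., d_{i-1}, S)

with D(T) the set of derivations that yield the parse tree T.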
Decision-Tree Modeling
- Comparison
  - Grammarian: two crucial tasks for parsing
    - identifying the features relevant to each decision
    - deciding which choice to select based on the values of the features
  - Decision tree: the above 2 tasks + a 3rd task
    - assigning a probability distribution to the possible choices, and providing a ranking system
Continued
- What is a statistical decision tree?
  - A decision-making device that assigns a probability to each of the possible choices based on the context of the decision
  - P(f | h), where f is an element of the future vocabulary and h is a history (the context of the decision)
  - The probability is determined by asking a sequence of questions
  - The i-th question is determined by the answers to the i-1 previous questions
  - Example: part-of-speech tagging problem (Figure 1; a toy sketch follows below)
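To make the P(f | h) mechanism concrete, here is a minimal illustrative sketch, not taken from the slides or from SPATTER itself: internal nodes ask questions about the history h, and leaves hold probability distributions over the future vocabulary. All names and numbers are hypothetical.

    # Minimal sketch of a statistical decision tree: internal nodes ask a
    # question about the history h, leaves store a distribution P(f | h).
    # Illustrative only; all names and probabilities are hypothetical.

    class Leaf:
        def __init__(self, distribution):
            self.distribution = distribution      # dict: future element -> probability

        def prob(self, history, future):
            return self.distribution.get(future, 0.0)

    class Node:
        def __init__(self, question, children):
            self.question = question              # function: history -> answer
            self.children = children              # dict: answer -> Node or Leaf

        def prob(self, history, future):
            answer = self.question(history)       # ask the next question
            return self.children[answer].prob(history, future)

    # Toy part-of-speech example: a single question about the previous tag.
    tree = Node(
        question=lambda h: h["prev_tag"] == "DT",
        children={
            True:  Leaf({"NN": 0.7, "JJ": 0.25, "VB": 0.05}),
            False: Leaf({"VB": 0.4, "NN": 0.35, "JJ": 0.25}),
        },
    )

    print(tree.prob({"prev_tag": "DT"}, "NN"))    # 0.7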
Continued
- Decision trees vs. n-grams
  - Equivalent to an interpolated n-gram model in expressive power
  - Model parameterization
    - an n-gram model can be represented by a decision-tree model (n-1 questions; worked example below)
    - example: part-of-speech tagging
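A minimal worked example of this correspondence (my notation, not reproduced from the slide): a trigram part-of-speech model P(t_i | t_{i-1}, t_{i-2}) can be encoded as a decision tree that asks exactly n - 1 = 2 questions:

    question 1: what is t_{i-1}?
    question 2: what is t_{i-2}?

Each leaf reached after both answers stores the conditional distribution over t_i for that particular (t_{i-1}, t_{i-2}) pair, so the tree represents exactly the same set of distributions as the trigram table.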
Continued
- Model estimation
  - n-gram model (interpolated estimate sketched below)
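The estimation formula itself does not survive in this transcript; as a hedged sketch, the standard interpolated (deleted-interpolation) estimate for a trigram tag model has the form

    P(t_i | t_{i-1}, t_{i-2}) = λ_3 f(t_i | t_{i-1}, t_{i-2}) + λ_2 f(t_i | t_{i-1}) + λ_1 f(t_i)

where the f(·) terms are relative-frequency estimates from training data and the λ_k are smoothing weights chosen on held-out data, with λ_1 + λ_2 + λ_3 = 1.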
Continued
- Model estimation (continued)
  - decision-tree model
  - a decision-tree model can be represented by an interpolated n-gram model (see the sketch below)
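A hedged sketch of what that interpolation looks like (standard decision-tree smoothing, not copied from the slide): if node_1, ..., node_L is the path of nodes from the root to the leaf reached by history h, the smoothed distribution can be written recursively as

    q_1(f) = p_1(f)
    q_k(f) = λ_k p_k(f) + (1 - λ_k) q_{k-1}(f),  k = 2, ..., L
    P(f | h) = q_L(f)

where p_k is the empirical distribution of futures at node_k and the λ_k are estimated on held-out data; this is why the resulting model behaves like an interpolated n-gram.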
Continued
- Why use decision trees?
  - As n grows, the parameter space of an n-gram model grows exponentially (see the numbers below)
  - The decision-tree learning algorithm, by contrast, increases the size of the model only as the training data allows
  - So it can take much more contextual information into account
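A quick illustrative calculation (my numbers, not the slide's): with a tag vocabulary of size |V| = 50, a raw n-gram tag model needs on the order of |V|^n conditional parameters, i.e. 2,500 for n = 2, 125,000 for n = 3, and 6,250,000 for n = 4, whereas a decision tree only adds nodes where the training data justifies a distinction.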
SPATTER Parsing
- SPATTER representation
  - a parse is represented as a geometric pattern
  - 4 features per node: words, tags, labels, and extensions (Figure 3)
- The parsing algorithm (see the sketch after this list)
  - starts with the sentence's words as leaves (Figure 3)
  - gradually tags, labels, and extends nodes
  - constraints:
    - bottom-up, left-to-right
    - no new node is constructed until its children are complete
    - derivational window constraints (DWC) restrict the number of active nodes
  - a single-rooted, labeled tree is constructed
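A rough, non-authoritative sketch of the control flow just described. tag_model and label_model are hypothetical stand-ins for SPATTER's decision-tree models, and the extension step is reduced to merging the two leftmost adjacent nodes, which is far simpler than SPATTER's actual extension decisions and window constraints:

    # Highly simplified sketch of SPATTER-style bottom-up, left-to-right
    # parsing: words become tagged leaves, then parent nodes are built over
    # already-completed children until a single root remains.  The models
    # here are hypothetical stand-ins, not SPATTER's actual components.

    def parse(words, tag_model, label_model):
        # Start with the sentence's words as leaf nodes and tag each one.
        nodes = [{"word": w, "tag": tag_model(w), "children": []} for w in words]

        # Bottom-up, left-to-right: no parent is built until its children exist.
        while len(nodes) > 1:
            left, right = nodes[0], nodes[1]
            parent = {"label": label_model(left, right), "children": [left, right]}
            nodes = [parent] + nodes[2:]
        return nodes[0]

    # Toy usage with trivial stand-in models.
    tree = parse(["the", "dog", "barks"],
                 tag_model=lambda w: "DT" if w == "the" else "NN",
                 label_model=lambda l, r: "NP" if "word" in l else "S")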
Statistical Parsing Models
- The tagging model
- The extension model
- The label model
- The derivation model
- The parsing model (scoring sketch below)
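As a hedged illustration of how these component models combine, a derivation can be scored by dispatching each decision to the corresponding model; the function below is a hypothetical sketch, not SPATTER's actual code.

    # Hypothetical sketch: score one derivation by dispatching each decision
    # to the corresponding model; each model returns P(decision | context).
    import math

    def derivation_log_prob(derivation, models):
        # derivation: list of (kind, decision, context) triples,
        #   where kind is "tag", "label", or "extend"
        # models: dict mapping each kind to its scoring function
        return sum(math.log(models[kind](decision, context))
                   for kind, decision, context in derivation)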
Decision-Tree Growing & Smoothing
- 3 main models (tagging, extension, and label)
- The training corpus is divided into 2 sets: 90% for growing, 10% for smoothing
- Growing & smoothing algorithm
  - Figure 3.5 (a rough sketch of the splitting step follows below)
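A hedged sketch of the standard node-splitting step used when growing such a tree (a generic entropy-reduction criterion, not a reproduction of Figure 3.5); the held-out 10% would then be used to estimate the smoothing weights at each node. The data layout and question representation here are hypothetical.

    # Illustrative sketch of the node-splitting criterion typically used when
    # growing a statistical decision tree: pick the question that most reduces
    # the entropy of the futures on the growing data.  The data layout and
    # question representation are hypothetical, not SPATTER's.
    import math
    from collections import Counter

    def entropy(futures):
        counts = Counter(futures)
        total = len(futures)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def best_question(events, questions):
        # events: list of (history, future) pairs at this node
        # questions: candidate functions mapping a history to an answer
        base = entropy([f for _, f in events])
        best, best_gain = None, 0.0
        for q in questions:
            buckets = {}
            for h, f in events:
                buckets.setdefault(q(h), []).append(f)
            split = sum(len(fs) / len(events) * entropy(fs) for fs in buckets.values())
            if base - split > best_gain:
                best, best_gain = q, base - split
        # the winning question becomes the new internal node; its answer
        # buckets are then grown recursively in the same way
        return best, best_gain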
Decision-Tree Training
- The parsing model cannot be estimated by direct frequency counts because it contains a hidden component: the derivation model
- The corpus contains no information about the order of derivations
- So the training process must discover which derivations assign higher probability to the correct parses
- Forward-backward reestimation is used
Continued
- Training algorithm (see the sketch below)
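The algorithm figure is not preserved in this transcript; as a hedged, high-level sketch of the forward-backward style reestimation described on the previous slide (my wording, not the original figure):

    1. Initialize the tagging, label, and extension models, e.g. grown on the 90% set with uniform derivation weights.
    2. For each training parse, enumerate the derivations consistent with that parse under the derivational window constraints.
    3. Weight each derivation by its probability under the current models (the forward-backward step).
    4. Reestimate the model parameters from the weighted decision counts.
    5. Repeat steps 2-4 until the likelihood of the training parses stops improving.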
Experiment Results
- IBM Computer Manual
  - annotated by the University of Lancaster
  - 195 part-of-speech tags and 19 non-terminal labels
  - trained on 30,800 sentences, tested on 1,473 new sentences
  - 0-crossing-brackets score:
    - IBM's rule-based, unification-style PCFG parser: 69%
    - SPATTER: 76%
Continued
- Wall Street Journal
  - tests the ability to accurately parse a highly ambiguous, large-vocabulary domain
  - annotated in the Penn Treebank, version 2
  - 46 part-of-speech tags and 27 non-terminal labels
  - trained on 40,000 sentences, tested on 1,920 new sentences
  - evaluated using PARSEVAL
Conclusion
- Large amounts of contextual information can be incorporated into a statistical model by applying decision-tree learning algorithms
- Disambiguation rules can be discovered automatically