Slide 1: Statistical Decision-Tree Models for Parsing
NLP Lab, POSTECH, 김 지 협

Slide 2: Contents
- Abstract
- Introduction
- Decision-Tree Modeling
- SPATTER Parsing
- Statistical Parsing Models
- Decision-Tree Growing & Smoothing
- Decision-Tree Training
- Experiment Results
- Conclusion

Slide 3: Abstract
- Existing syntactic NL parsers are not adequate for highly ambiguous, large-vocabulary text (e.g., the Wall Street Journal)
- Premises for developing a new parser:
  - grammars are too complex to develop manually for most domains
  - parsing models must rely heavily on contextual information
  - existing n-gram models are inadequate for parsing
- SPATTER: a statistical parser based on decision-tree models
  - performs better than a grammar-based parser

Slide 4: Introduction
- Parsing is treated as making a sequence of disambiguation decisions
- The probability of a complete parse tree T of a sentence S is the product of the probabilities of those decisions (see the formula below)
- The rules for disambiguation are discovered automatically
- A parser is produced without a complicated grammar
- Long-distance lexical information is crucial for disambiguating interpretations accurately
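The slide states this only in words; written as a formula consistent with that description, with d_1, ..., d_n denoting the sequence of disambiguation decisions in the derivation of T:

P(T | S) = \prod_{i=1}^{n} P(d_i | d_1, \ldots, d_{i-1}, S)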

Slide 5: Decision-Tree Modeling
- Comparison
  - Grammarian: two crucial tasks for parsing
    - identifying the features relevant to each decision
    - deciding which choice to select based on the values of the features
  - Decision tree: the two tasks above, plus a third task
    - assigning a probability distribution to the possible choices, and thereby providing a ranking system

Slide 6: Decision-Tree Modeling (continued)
- What is a statistical decision tree?
  - A decision-making device that assigns a probability to each of the possible choices based on the context of the decision
  - P(f | h), where f is an element of the future vocabulary and h is a history (the context of the decision)
  - The probability is determined by asking a sequence of questions
  - The i-th question is determined by the answers to the i-1 previous questions
  - Example: the part-of-speech tagging problem (Figure 1); a small sketch follows this list
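A minimal Python sketch of such a device for a toy tagging setup. The questions (current word, previous tag), the tags, and the numbers at the leaves are invented for illustration and are not taken from the paper; only the mechanism, answering questions about the history h until a leaf distribution over futures f is reached, mirrors the slide.

```python
# Each internal node asks a question about the history; each leaf holds a
# probability distribution over the future vocabulary.
class Node:
    def __init__(self, question=None, children=None, distribution=None):
        self.question = question          # maps a history h to an answer
        self.children = children or {}    # answer -> child Node
        self.distribution = distribution  # at a leaf: dict f -> P(f | h)

    def prob(self, f, h):
        """Answer the questions in sequence until a leaf is reached, then
        read off the probability assigned to the future f."""
        node = self
        while node.question is not None:
            node = node.children[node.question(h)]
        return node.distribution.get(f, 0.0)

# Question 1: is the word being tagged "flies"?  Question 2: what is the
# previous tag?  (Hypothetical questions and probabilities.)
tree = Node(
    question=lambda h: h["word"] == "flies",
    children={
        True: Node(
            question=lambda h: h["prev_tag"],
            children={
                "DT": Node(distribution={"NNS": 0.8, "VBZ": 0.2}),
                "NN": Node(distribution={"VBZ": 0.7, "NNS": 0.3}),
            },
        ),
        False: Node(distribution={"NN": 0.5, "VB": 0.3, "JJ": 0.2}),
    },
)

print(tree.prob("VBZ", {"word": "flies", "prev_tag": "NN"}))  # 0.7
```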

Slide 7: Decision-Tree Modeling (continued)
- Decision trees vs. n-grams
  - Equivalent to an interpolated n-gram model in expressive power
  - Model parameterization
    - An n-gram model can be represented by a decision-tree model that asks n-1 questions
    - Example: part-of-speech tagging (see the note after this list)
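One way to read the tagging example, reconstructed from the bullet above rather than quoted from the slide: a trigram tagger conditions on the previous two tags, P(t_i | t_{i-2}, t_{i-1}). The equivalent decision tree asks n - 1 = 2 questions, "What is t_{i-1}?" and then "What is t_{i-2}?", and each leaf holds the conditional distribution over the current tag t_i, so the leaves enumerate exactly the parameters of the trigram table.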

Slide 8: Decision-Tree Modeling (continued)
- Model estimation
  - n-gram model (see the reconstruction below)
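The equation on this slide is not reproduced in the transcript. For the trigram tagging example, the standard estimate of the kind being described is the relative frequency

f(t_i | t_{i-2}, t_{i-1}) = Count(t_{i-2}, t_{i-1}, t_i) / Count(t_{i-2}, t_{i-1}),

smoothed by deleted interpolation with the lower-order estimates:

P(t_i | t_{i-2}, t_{i-1}) = \lambda_3 f(t_i | t_{i-2}, t_{i-1}) + \lambda_2 f(t_i | t_{i-1}) + \lambda_1 f(t_i), with \lambda_1 + \lambda_2 + \lambda_3 = 1.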

Slide 9: Decision-Tree Modeling (continued)
  - decision-tree model
  - a decision-tree model can be represented by an interpolated n-gram model (see below)
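The decision-tree estimation equation is likewise missing from the transcript; the idea it corresponds to is that the distribution at the leaf reached by a history h is interpolated with the empirical distributions at its ancestors on the path from the root,

P(f | h) = \sum_{i=0}^{d} \lambda_i q_i(f), with \sum_i \lambda_i = 1,

where q_0, ..., q_d are the empirical distributions at the nodes on that root-to-leaf path. Each leaf therefore behaves like an interpolated n-gram estimate, which is the sense in which a decision-tree model can be represented by one.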

Slide 10: Decision-Tree Modeling (continued)
- Why use decision trees?
  - As n grows, the parameter space of an n-gram model grows exponentially
  - The decision-tree learning algorithm, on the other hand, increases the size of the model only as the training data allows
  - It can therefore take much more contextual information into account (a worked example follows)
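A quick sense of the scale difference (the tag-set size is taken from slide 17; the rest is illustrative): an n-gram model over a vocabulary of V symbols has on the order of V^n parameters, one probability per context-outcome pair. For V = 46 part-of-speech tags and n = 4 that is already 46^4 ≈ 4.5 million parameters, far more than a treebank can reliably support, whereas a decision tree splits a context only when the training data justifies the split, so its size tracks the data rather than the context length.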

Slide 11: SPATTER Parsing
- SPATTER representation
  - a parse is encoded as a geometric pattern
  - each node carries 4 features: words, tags, labels, and extensions (Figure 3)
- The parsing algorithm (a sketch follows this list)
  - start with the sentence's words as leaves (Figure 3)
  - gradually tag, label, and extend nodes
  - constraints:
    - bottom-up, left-to-right
    - no new node is constructed until its children are complete
    - derivational window constraints (DWC) restrict the number of active nodes
  - a single-rooted, labeled tree is constructed
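A structural sketch of that loop in Python. The decisions are trivial stand-ins (a real SPATTER parser consults its statistical decision-tree models and also makes per-node extension decisions, which this sketch omits); only the control flow, words as leaves followed by bottom-up, left-to-right construction of labelled parents, is meant to follow the slide.

```python
import json

def parse(words, tag_model, label_model):
    # Leaves: one node per word, tagged left to right.
    nodes = [{"word": w, "tag": tag_model(w), "label": None, "children": []}
             for w in words]
    # Join active nodes until a single-rooted, labelled tree remains.
    while len(nodes) > 1:
        left, right = nodes[0], nodes[1]  # leftmost pair of completed nodes
        parent = {"word": None, "tag": None,
                  "label": label_model(left, right),
                  "children": [left, right]}
        nodes = [parent] + nodes[2:]
    return nodes[0]

# Toy stand-in models, purely for illustration.
toy_tag = lambda w: "NNP" if w[0].isupper() else "VBD"
toy_label = lambda left, right: "S" if left["label"] else "VP"

print(json.dumps(parse(["John", "saw", "Mary"], toy_tag, toy_label), indent=2))
```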

Slide 12: Statistical Parsing Models
- The tagging model
- The extension model
- The label model
- The derivation model
- The parsing model (how they combine is sketched below)
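The slide lists only the model names; the way they combine, roughly following Magerman's formulation, is

P(T | S) = \sum_{d \in derivations(T)} P(d | S),

where a derivation d is one permitted order of tagging, labeling, and extension decisions, its probability is the product of the probabilities the tagging, label, and extension models assign to those decisions, and the derivation model weights the choice of order.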

Slide 13: Decision-Tree Growing & Smoothing
- 3 main models (tagging, extension, and label)
- The training corpus is divided into 2 sets: 90% for growing, 10% for smoothing
- Growing & smoothing algorithm (Figure 3.5; an outline follows)
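An outline of the procedure, reconstructed from the paper's description since Figure 3.5 is not reproduced in this transcript:
1. Growing (90% of the corpus): starting from a single root node, repeatedly choose at each leaf the question whose split most reduces the entropy of the distribution over futures, and stop splitting when no question gives a useful reduction.
2. Smoothing (held-out 10%): estimate the interpolation weights that mix each leaf's empirical distribution with those of its ancestors (the interpolated form sketched under slide 9), so that decisions falling into sparsely trained leaves back off to coarser, more reliable distributions.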

Slide 14: Decision-Tree Training
- The parsing model cannot be estimated by direct frequency counts because it contains a hidden component: the derivation model
- The corpus contains no information about the order of derivations
- The training process must therefore discover which derivations assign higher probability to the correct parses
- Forward-backward reestimation is used (a sketch of the loop follows)
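Stated in general EM terms (the slide names forward-backward reestimation but gives no further detail): (1) using the current models, compute for each training sentence the posterior probability of each permitted derivation of its correct parse; (2) reestimate the tagging, label, extension, and derivation models from decision counts weighted by those posteriors; (3) iterate until the likelihood of the training parses stops improving. The forward-backward algorithm supplies the expected counts in step (1) efficiently.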

Slide 15: Decision-Tree Training (continued)
- Training algorithm

Slide 16: Experiment Results
- IBM computer manuals
  - annotated by the University of Lancaster
  - 195 part-of-speech tags and 19 non-terminal labels
  - trained on 30,800 sentences, tested on 1,473 new sentences
  - 0-crossing-brackets score:
    - IBM's rule-based, unification-style PCFG parser: 69%
    - SPATTER: 76%

Slide 17: Experiment Results (continued)
- Wall Street Journal
  - tests the ability to accurately parse a highly ambiguous, large-vocabulary domain
  - annotated in the Penn Treebank, version 2
  - 46 part-of-speech tags and 27 non-terminal labels
  - trained on 40,000 sentences, tested on 1,920 new sentences
  - evaluated with the PARSEVAL measures

Slide 18: Conclusion
- Large amounts of contextual information can be incorporated into a statistical parsing model by applying decision-tree learning algorithms
- Disambiguation rules can be discovered automatically

