Extracted TAGs and Aspects of Their Use in Stochastic Modeling

1 Extracted TAGs and Aspects of Their Use in Stochastic Modeling
John Chen, Department of Computer Science, Columbia University

2 Motivation (1/3)
Lexicalized stochastic models are important for NLP:
  Parsing (Collins 99; Charniak 00)
  Summarization (McKeown, et al. 01)
  Machine translation (Berger, et al. 94)

3 Motivation (2/3)
Tree-Adjoining Grammar (TAG) is a lexicalized formalism… (Joshi, et al. 75; Schabes et al. 88)
…but until recently, not much work on stochastic modeling of TAG (Srinivas 00; Chiang 00)
[Slide figure: TAG elementary trees composing the sentence "Creationism lost credibility"]

4 Motivation (3/3)
Problem: lack of large-scale corpora with TAG annotations
  Penn Treebank (Marcus, et al. 93) vs. a TAG-annotated corpus
[Slide figure: the Penn Treebank bracketing and the corresponding TAG elementary trees for "Bell increased its earnings"]

5 Introduction (1/2)
Approach: automatically extract a TAG from the Penn Treebank
  Given a bracketed sentence, derive a set of TAG trees out of which it is composed
  Extracted TAGs should conform to the principles that guide the formation of hand-crafted TAGs
[Slide figure: the Penn Treebank bracketing of "Bell increased its earnings" decomposed into TAG elementary trees]

6 Introduction (2/2)
Uses:
  To estimate parameters for statistical TAG models
  To avoid having to hand-craft your own grammar
  To evaluate TAGs extracted using different design methodologies
  To improve a hand-crafted grammar
  To do a comparative evaluation of grammars extracted from different kinds of treebanks, of different languages, of different sublanguages

7 Outline
Motivation, Introduction
Extraction of a TAG from a Treebank
  Tree-Adjoining Grammars
  Extraction Procedure
  Variations on Extraction
  Evaluation
  Experiment to increase coverage
Smoothing Models for TAG
Using Extracted TAG Features to Predict Semantic Roles
Conclusions and future work

8 Tree-Adjoining Grammar (TAG)
A TAG is a set of lexicalized trees
  Lexicalized tree == TAG elementary tree
  Anchor of an elementary tree == lexical item
Operations combine lexicalized trees into parse trees
[Slide figure: elementary trees for "Wet" and "paint" combining into the parse of "Wet paint"]

9 Kinds of Trees in TAG
Lexicalized tree vs. tree frame
  Lexicalized trees
  Tree frames (supertags)
Initial vs. auxiliary tree
  Initial trees, with substitution nodes
  Auxiliary trees, with a foot node
[Slide figure: lexicalized trees and tree frames for "Wet", "paint", "enjoys", and "thinks"]

10 TAG Operations
Substitution
Adjoining
[Slide figure: substitution and adjoining combine elementary trees to derive "Terry enjoys pea soup" and "Who everyone thinks enjoys pea soup"]
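The two operations can be made concrete with a small sketch. This is not code from the talk: the Node class, the toy trees, and the traversal helpers are illustrative assumptions, but the behavior matches the definitions above (substitution fills a marked leaf node; adjoining splices an auxiliary tree in at an internal node and hangs the displaced subtree off the foot node).

```python
# Minimal sketch of TAG substitution and adjoining over a toy tree type.

class Node:
    def __init__(self, label, children=None, subst=False, foot=False):
        self.label = label              # e.g. "NP", "VP", "V"
        self.children = children or []  # empty list = leaf / anchor
        self.subst = subst              # substitution node (NP↓ in TAG notation)
        self.foot = foot                # foot node of an auxiliary tree (NP*)

def substitute(target, initial_tree):
    """Replace a substitution node that shares the initial tree's root label."""
    for i, child in enumerate(target.children):
        if child.subst and child.label == initial_tree.label:
            target.children[i] = initial_tree
            return True
        if substitute(child, initial_tree):
            return True
    return False

def adjoin(target, aux_tree):
    """Splice an auxiliary tree in at an internal node with the same label;
    the old subtree at that node moves under the auxiliary tree's foot node."""
    def find_foot(node):
        if node.foot:
            return node
        for c in node.children:
            found = find_foot(c)
            if found:
                return found
        return None

    for i, child in enumerate(target.children):
        if child.children and child.label == aux_tree.label:
            foot = find_foot(aux_tree)
            foot.children = [child]      # displaced subtree hangs off the foot
            foot.foot = False
            target.children[i] = aux_tree
            return True
        if adjoin(child, aux_tree):
            return True
    return False

# Substitution: "Terry" fills the subject NP slot of the "enjoys" tree.
enjoys = Node("S", [Node("NP", subst=True),
                    Node("VP", [Node("V", [Node("enjoys")]),
                                Node("NP", subst=True)])])
substitute(enjoys, Node("NP", [Node("N", [Node("Terry")])]))

# Adjoining: the auxiliary tree for "wet" (NP -> A NP*) adjoins onto "paint".
paint = Node("NP", [Node("N", [Node("paint")])])
wet = Node("NP", [Node("A", [Node("wet")]), Node("NP", foot=True)])
root = Node("TOP", [paint])
adjoin(root, wet)
```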

11 Principles of TAG Formation (1/2)
Localization of dependencies: project a lexical head to include all of its complements
[Slide figure: elementary trees for "nectarines", "beside", "plays", "gives", and "put", each localizing the head with its complements]

12 Principles of TAG Formation (2/2)
Factoring recursion
  Modifier auxiliary trees. Example: The messenger ran [PP between the cars] [PP across the street] [PP towards the police station].
  Predicative auxiliary trees. Example: What [S everyone thought] [S the manager believed] [S the employees imagined] to be a time-saver.
[Slide figure: a modifier auxiliary tree anchored by "between" and a predicative auxiliary tree anchored by "thought"]

13 Extraction Procedure (1/4) (Chen 01; cf. Xia 99, Chiang 00)
Extraction of a particular tree
Step 1: Determine the path of projection of the TAG tree
  How far to go up starting from the lexical item?
  Find out using a head percolation table (Magerman 95)
  Heuristics look at the relationships between a parent and its children
[Slide figure: projecting from VBD "disputed" up through VP to S in "Mr. Lane vehemently disputed those estimates"]
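A rough sketch of Step 1 is given below. The head-table entries, the spine encoding, and the helper names are simplified assumptions for illustration, not the actual rules used in the extraction procedure; the point is only that a head percolation table picks each node's head child, and the path of projection follows the chain of head children upward from the anchor.

```python
# Toy head-percolation lookup and path-of-projection walk (after Magerman 95).

HEAD_TABLE = {
    # parent label -> (search direction, ordered list of preferred child labels)
    "VP": ("left",  ["VBD", "VBN", "VBZ", "VB", "VP"]),
    "S":  ("left",  ["VP", "S"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
}

def head_child(label, children):
    """Pick the head child of a node, given its label and its child labels."""
    direction, preferences = HEAD_TABLE.get(label, ("left", []))
    order = children if direction == "left" else list(reversed(children))
    for pref in preferences:
        for child in order:
            if child == pref:
                return child
    return order[0]  # fall back to the first child in search order

def path_of_projection(spine):
    """Walk up from a preterminal, keeping each parent only while the node
    reached so far is that parent's head child: this is the extracted spine."""
    path = [spine[0]]
    for parent_label, sibling_labels in spine[1:]:
        if head_child(parent_label, sibling_labels) == path[-1]:
            path.append(parent_label)
        else:
            break
    return path

# "disputed" projects VBD -> VP -> S in "Mr. Lane vehemently disputed those estimates".
print(path_of_projection(["VBD",
                          ("VP", ["ADVP", "VBD", "NP"]),
                          ("S",  ["NP", "VP"])]))
# -> ['VBD', 'VP', 'S']
```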

14 Extraction Procedure (2/4)
Extraction of a particular tree (continued)
Step 2: Distinguish complements and adjuncts
  Heuristics determine complements/adjuncts
  Complements become substitution nodes
  Adjuncts become modifier auxiliary trees
[Slide figure: in "Mr. Lane vehemently disputed those estimates", the NP-C nodes become substitution nodes and the ADVP "vehemently" becomes a modifier auxiliary tree]
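A toy version of the Step 2 decision is sketched below. Which Treebank function tags count as complement or adjunct markers here is an illustrative guess, not the actual heuristic set used in the extraction procedure.

```python
# Toy complement/adjunct classifier over Penn Treebank-style node labels.

COMPLEMENT_TAGS = {"SBJ", "OBJ", "CLR", "DTV", "PRD"}
ADJUNCT_TAGS    = {"MNR", "TMP", "LOC", "ADV", "DIR", "PRP"}

def classify_sibling(label):
    """Classify a sibling of the head projection as complement or adjunct.
    Complements become substitution nodes in the extracted elementary tree;
    adjuncts are factored out as modifier auxiliary trees."""
    tags = set(label.split("-")[1:])          # e.g. "ADVP-MNR" -> {"MNR"}
    if tags & COMPLEMENT_TAGS:
        return "complement"
    if tags & ADJUNCT_TAGS or label.startswith("ADVP"):
        return "adjunct"
    # bare NPs next to the head default to complements in this sketch
    return "complement" if label.split("-")[0] == "NP" else "adjunct"

print(classify_sibling("NP-SBJ"))    # complement -> substitution node
print(classify_sibling("ADVP-MNR"))  # adjunct    -> modifier auxiliary tree
```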

15 Extraction Procedure (3/4)
Other aspects of the extraction procedure:
  Extracting predicative auxiliary trees
  Localizing traces with their landing sites
  Detecting and extracting appropriate conjunction trees
  Extracting a TAG tree containing multiple lexical items

16 Extraction Procedure (4/4)
[Slide figure: extracted trees for "seems" (a predicative auxiliary tree) and "eat", localizing the wh-trace with its landing site in "what Jon seems to eat"]

17 Variations on Extracted Grammars
The extraction procedure can be parameterized
We want to study:
  Effects of parameterization on the resulting extracted grammars
  Effects on a statistical model based on different extracted grammars
Kinds of variation of extracted grammars:
  Detection of complements
  Empty elements
  Label set

18 Variation in Detection of Complements (1/2)
Recall that an important principle of TAG formation is that an elementary tree includes a lexical head and all of its complements (the principle of "domain of locality")
The notion of a complement of a lexical head is fuzzy
One way to extract grammars with different domains of locality is to vary the way complements are detected

19 Variation in Detection of Complements (2/2)
Kinds of ways to detect complements:
  CA1 (Chen, Vijay-Shanker 99): more nodes are complements
  CA2 (Xia 99): more nodes are adjuncts
[Slide figure: CA1 and CA2 elementary trees for "joined" in "Pierre Vinken joined the board as an executive director"; under CA1 the frame includes the PP-CLR as a complement, under CA2 it does not]

20 Variation of Treatment of Empty Elements
Kinds of empty elements:
  The Penn Treebank has many different kinds of empty elements
  Standard TAG analyses only treat a certain subset of these
Different treatments of empty elements:
  ALL: include all empty elements in the Penn Treebank in the extracted grammar
  SOME: include only those empty elements that do not violate TAG's domain of locality

21 Variation of Label Set
Kinds of label sets:
  The Penn Treebank has a detailed label set, especially for part of speech
  Standard TAG analyses assume a simplified label set
Extracted grammars based on different label sets:
  FULL: elementary trees labeled with the Penn Treebank label set
  MERGED: elementary trees labeled with the (simplified) XTAG label set

22 Evaluation of Extracted Grammars
Different ways to evaluate extracted grammars:
  Size
  Coverage
  Supertagging accuracy
  Trace localization
Each grammar variation is extracted from PTB Sections 02-21

23 Size of Grammar (1/3)
Ways to measure size:
  Number of lexicalized trees
  Number of tree frames
Importance of size:
  Efficiency of statistical models
  Impact on the sparse data problem
[Slide figure: a lexicalized tree for "disputed" and the corresponding (unlexicalized) tree frame]

24 Size of Grammar (2/3)

Comp  Empty  Label   #Frames  #LexTrees
CA1   ALL    FULL     8675     113456
CA1   ALL    MERGE    5953     109774
CA1   SOME   FULL     7446     110134
CA1   SOME   MERGE    5053     106457
CA2   ALL    FULL     6488     110034
CA2   ALL    MERGE    4358     107285
CA2   SOME   FULL     4723     106422
CA2   SOME   MERGE    3075     102679

Change in #Frames > Change in #LexTrees

25 Size of Grammar (3/3)

Comp  Empty  Label   #Frames  #LexTrees
CA1   ALL    FULL     8675     113456
CA1   ALL    MERGE    5953     109774
CA1   SOME   FULL     7446     110134
CA1   SOME   MERGE    5053     106457
CA2   ALL    FULL     6488     110034
CA2   ALL    MERGE    4358     107285
CA2   SOME   FULL     4723     106422
CA2   SOME   MERGE    3075     102679

Variance(#Frames): Label > Comp = Empty
Variance(#LexTrees): Empty > Label > Comp

26 Coverage of Grammar (1/3)
Measuring coverage:
  Extract grammar G from the training corpus
  Extract grammar G' from the test corpus (PTB Section 23)
  Compute the percentage of instances of (lexicalized tree / tree frame) in G' that are also in G
Importance of coverage:
  Impact on the sparse data problem
  A measure of the amount of linguistic generalization
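The coverage measure reduces to counting which test-corpus instances reuse an item already extracted from training. A direct sketch (file and variable names are placeholders, not the original tooling):

```python
# Coverage of a test grammar against a training grammar, measured over
# token instances of frames and of (word, frame) pairs.

from collections import Counter

def coverage(train_items, test_items):
    """Fraction of test-corpus item *instances* also found in training."""
    seen = set(train_items)
    test_counts = Counter(test_items)
    covered = sum(n for item, n in test_counts.items() if item in seen)
    return covered / sum(test_counts.values())

# One (word, frame) tuple per corpus token; these two lists are toy data.
train_pairs = [("disputed", "S(NP,VP(V,NP))"), ("joined", "S(NP,VP(V,NP))")]
test_pairs  = [("disputed", "S(NP,VP(V,NP))"), ("ate", "S(NP,VP(V,NP))")]

frame_cov   = coverage([f for _, f in train_pairs], [f for _, f in test_pairs])
lextree_cov = coverage(train_pairs, test_pairs)
print(frame_cov, lextree_cov)   # frames: 1.0, lexicalized trees: 0.5
```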

27 Coverage of Grammar (2/3)
Comp  Empty  Label   %FramesSeen  %LexTreesSeen
CA1   ALL    FULL      99.56        92.04
CA1   ALL    MERGE     99.69        92.35
CA1   SOME   FULL      99.65        92.41
CA1   SOME   MERGE     99.77        92.74
CA2   ALL    FULL                   92.26
CA2   ALL    MERGE     99.76        92.59
CA2   SOME   FULL      99.81        92.76
CA2   SOME   MERGE     99.88        93.08

Frame coverage > 99%: the extraction procedure is making good syntactic generalizations
LexTree coverage is poor: not surprising, given the number of (word x frame) combinations

28 Coverage of Grammar (3/3)
Comp  Empty  Label   W,T seen separately   W or T not seen
CA1   ALL    FULL      63.94                 36.06
CA1   ALL    MERGE                           36.08
CA1   SOME   FULL      63.24                 36.76
CA1   SOME   MERGE     63.09                 36.91
CA2   ALL    FULL      64.00                 36.00
CA2   ALL    MERGE     63.83                 36.17
CA2   SOME   FULL      63.54                 36.46
CA2   SOME   MERGE     62.86                 37.14

We can recover about 2/3 of the missing lexicalized-tree coverage if we can guess "valid" (word x frame) combinations from words and frames found in the training corpus

29 Supertagging Accuracy (1/5)
Input: the words of a sentence
Output: each word associated with a tree frame
[Slide figure: the words "Wet paint" each assigned a tree frame (supertag)]

30 Supertagging Accuracy (2/5)
Supertagging as "almost parsing"
[Slide figure: the supertags chosen for "Wet" and "paint" nearly determine the parse of "Wet paint"]

31 Supertagging Accuracy (3/5)
Importance of supertagging accuracy: measuring the impact of different kinds of grammars on a statistical model
Experimental design:
  Trigram model of supertagging (Srinivas 97; cf. Chen, et al. 99)
  Training set: PTB Sections 02-21
  Test set: PTB Section 23
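The trigram model named above scores a candidate frame sequence by the product of trigram transition and word-given-frame probabilities. The sketch below is illustrative only: the toy lexicon and probability tables are invented, and a small beam search stands in for whatever decoder the original implementation used.

```python
# Compressed sketch of trigram supertagging: choose the frame sequence
# maximizing prod_i P(t_i | t_{i-1}, t_{i-2}) * P(w_i | t_i).

import math

def supertag(words, lexicon, p_trans, p_emit, beam=5):
    """lexicon: word -> candidate frames; p_trans[(t2, t1, t)] and
    p_emit[(w, t)] hold (smoothed) probabilities; missing entries get a floor."""
    hyps = [(0.0, ("<s>", "<s>"))]          # (log-prob, frame sequence so far)
    for w in words:
        new_hyps = []
        for logp, seq in hyps:
            for t in lexicon.get(w, ["UNKNOWN_FRAME"]):
                trans = p_trans.get((seq[-2], seq[-1], t), 1e-8)
                emit = p_emit.get((w, t), 1e-8)
                new_hyps.append((logp + math.log(trans) + math.log(emit),
                                 seq + (t,)))
        hyps = sorted(new_hyps, reverse=True)[:beam]   # keep the best hypotheses
    return list(hyps[0][1][2:])                        # drop the <s> padding

# Toy run with two candidate frames for "paint".
lexicon = {"Wet": ["aux:NP->A NP*"], "paint": ["init:NP->N", "init:S->NP VP(V,NP)"]}
p_emit = {("Wet", "aux:NP->A NP*"): 0.9,
          ("paint", "init:NP->N"): 0.7,
          ("paint", "init:S->NP VP(V,NP)"): 0.3}
print(supertag(["Wet", "paint"], lexicon, {}, p_emit))
# -> ['aux:NP->A NP*', 'init:NP->N']
```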

32 Supertagging Accuracy (4/5)
Comp  Empty  Label   %correct supertags
CA1   ALL    FULL      78.55
CA1   ALL    MERGE     79.23
CA1   SOME   FULL      79.34
CA1   SOME   MERGE     80.09
CA2   ALL    FULL      79.07
CA2   ALL    MERGE     79.65
CA2   SOME   FULL      80.03
CA2   SOME   MERGE     80.62

Relatively low accuracy: a sparse data problem with extracted grammars
Variance(%correct): Empty > Label > Comp

33 Supertagging Accuracy (5/5)
Correlation between supertagging accuracy and other measures:
  Weak correlation between an extracted grammar's supertagging accuracy and its size in tree frames
  Very strong correlation (R = 0.98) between an extracted grammar's supertagging accuracy and its size in lexicalized trees
When designing a TAG to be modeled stochastically, it may pay to minimize the number of lexicalized trees in particular

34 Representation of Empty Elements in Extracted Grammars (1/3)
Kinds of empty elements in linguistic theory: traces and null elements
Recall that TAG analyses typically include certain kinds of empty elements
The Penn Treebank has these and other kinds of empty elements

35 Representation of Empty Elements in Extracted Grammars (2/3)
We measure:
  The number of traces localized with their landing sites
  The number of null elements
Importance:
  Theoretical: which kinds of traces can/cannot be localized by the TAG formalism
  Practical: some kinds of localization may improve the performance of statistical models (cf. Collins 99), and can ease the interface between an extracted TAG and semantics

36 Representation of Empty Elements in Extracted Grammars (3/3)
Comp  Empty  #Trace Types  #Null Types  #Trace Tokens (% of all TT)  #Null Tokens
CA1   ALL     2381          2560         21508 (60%)                  42593
CA1   SOME    1258          1392         18394 (50%)                  25305
CA2   ALL     1847          2130         21458 (59%)                  42643
CA2   SOME     589           655         16153 (44%)                  24035

Variance(%trace types): Comp > Empty
Examples of non-localizable traces:
  In TAG: traces across coordination
  CA1 vs. CA2: complements far from the head
  ALL vs. SOME: adverbial movement

37 Evaluation Reveals a Sparse Data Problem with Extracted Grammars
Lexicalized tree coverage is generally bad for extracted grammars
This is one major reason for poor supertagging accuracy

                    #LexTrees  %LexTree coverage on unseen  %supertag accuracy
Extracted Grammar    113456     92.04                        78.55

38 Feature Vector Decomposition of Extracted Grammars (1/3)
Motivation:
  Can help ameliorate the extracted grammar's sparse data problem
  Can help map the extracted grammar onto semantics and onto other grammars
Feature vector description:
  POS, subcat frame, modifyee, direction, co-anchors, root
  Transformations: declarative, subj-aux inversion, topicalization, wh-movement, complement, etc.
Example (frame anchored by "hurt"): POS = VB, Subcat = {NP}, Modifyee = S, Direction = left, Compl? = yes
[Slide figure: the tree frame for "hurt" alongside its feature vector]

39 Feature Vector Decomposition of Extracted Grammars (2/3)
Detection of features is based on pattern matching of structural relationships (after linguistic theory):
  POS: the preterminal of the lexical item
  Subcat: substitution nodes that are sisters of the preterminal
  Compl?
[Slide figure: the tree frame for "hurt" (POS = VB, Subcat = {NP}, Compl? = yes) with the matched structural configurations]
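A toy rendering of this pattern matching is sketched below for just the POS and Subcat features. The frame encoding (nested tuples, "!" for substitution nodes, "u" for the anchor) is an assumption for illustration, not the representation used in the thesis.

```python
# Read POS off the anchor's preterminal and Subcat off the substitution
# nodes that are sisters of that preterminal.

def decompose(frame):
    """frame: nested tuples (label, children...); leaves are label strings.
    Substitution nodes end in '!', the anchor's preterminal ends in 'u'."""
    pos, subcat = None, []

    def walk(node):
        nonlocal pos, subcat
        label, *children = node
        child_labels = [c if isinstance(c, str) else c[0] for c in children]
        if any(lab.endswith("u") for lab in child_labels):   # anchor's parent
            pos = next(lab for lab in child_labels if lab.endswith("u"))[:-1]
            subcat = [lab[:-1] for lab in child_labels if lab.endswith("!")]
        for c in children:
            if not isinstance(c, str):
                walk(c)

    walk(frame)
    return {"pos": pos, "subcat": subcat}

# Transitive-verb frame: S -> NP! VP, VP -> VBu NP!
print(decompose(("S", "NP!", ("VP", "VBu", "NP!"))))
# -> {'pos': 'VB', 'subcat': ['NP']}
```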

40 Feature Vector Decomposition of Extracted Grammars (3/3)
Determination of feature vector information allows annotation of tree frames with deep (syntactic) role information
  Examples of roles: subject (0), object (1)
  The passive transformation is activated for this tree frame; therefore, the deep roles for nodes in this tree frame differ from their surface roles
[Slide figure: the passive tree frame for "bitten … by", where the subject NP has surface role 0 but deep role 1, and the by-object NP has surface role 1 but deep role 0]

41 Procedure to Increase Coverage of Extracted Grammar (1/3)
Step 1: Induce tree families from the feature vector representation of the extracted grammar
A tree family (XTAG-Group 2001) is a set of tree frames
  Having the same POS and subcat features
  Representing the same predicate-argument structure
[Slide figure: four related tree frames of a transitive-verb tree family]

42 Procedure to Increase Coverage of Extracted Grammar (2/3)
Step 2: Augment the extracted grammar using tree families
[Slide figure: the extracted frame for "hurt" is matched to a tree family F, and the other frames of that family are added to the grammar for "hurt"]

43 Procedure to Increase Coverage of Extracted Grammar (3/3)
Results:
  26% reduction in misses in overall coverage
  62% reduction in misses in verb-only coverage

44 Outline
Motivation, Introduction
Extraction of TAG from a Treebank
Smoothing Models for TAG
  Sparse data problem using extracted grammars
  Supertagging
  Baselines for supertagging
  Smoothing approaches for supertagging
  Future work
Using Extracted TAG Features to Predict Semantic Roles
Conclusions and Future Work

45 Sparse Data in Statistical Models using Extracted Grammars
Sparse data in supertagging, and in other kinds of stochastic modeling as well
Focus here on smoothing supertagging models, but the procedure is applicable to others

                    #LexTrees  %LexTree coverage on unseen  %supertag accuracy
Extracted Grammar    113456     92.04                        78.55

46 Recall: Supertagging
Input: the words of a sentence
Output: each word associated with a tree frame
[Slide figure: the words "Wet paint" each assigned a tree frame (supertag)]

47 Trigram Model for Supertagging
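The equation on this slide is not preserved in the transcript; a standard statement of the trigram supertagging objective (cf. Srinivas 97), consistent with the two distributions named on the next slide, is the following reconstruction:

```latex
\hat{T} \;=\; \operatorname*{arg\,max}_{t_1 \ldots t_n} \;\prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \; P(w_i \mid t_i)
```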

48 Smoothing the Trigram Model
Trigram model for supertagging: probability distributions to smooth
  P(ti | ti-1, ti-2): use Katz backoff
  P(wi | ti): the focus of consideration here
Characterization of the different kinds of P(wi | ti):
  w unseen: smoothed following (Weischedel, et al.)
  w and t seen, but not together: the focus of smoothing here
  (Recall: w and t seen together is the majority of cases)

49 Experimental Setup Grammar: CA1-SOME-FULL
Train: PTB 02-21, Development: 22, Test: 23

50 Two Supertagging Baselines
Results (trigram supertagging accuracy):
Baseline 2: train on Sections 02-21 and 23 for p(w|t) only
Note the low score for Baseline 2 in the "w,t separate" case

             Overall   w,t together   w,t separate
No smooth    79.24%    84.96%          0%
Baseline 2   85.60%    87.36%         53.19%

51 Smoothing Equations (smoothed vs. unsmoothed probability)

52 Smoothing using Part of Speech
Results and issues:
  Correct prediction is hampered by the flatness of the POS probabilities
  This also causes efficiency problems

             Overall   w,t together   w,t separate
No smooth    79.24%    84.96%          0%
Baseline 2   85.60%    87.36%         53.19%
POS smooth   79.34%    85.00%          1.36%

53 Smoothing using Tree Families (1/2)
Training:
  Tree families are defined as before
  FAMILY-tag each word in training as follows:
    A word that is part of a tree family is tagged with its POS+SUBCAT features
    Otherwise, the word is tagged with its POS feature only
  Example: The//DT cat//NN eats//VB_NP lettuce//NN
  Compute p(w | FAMILY) given this markup
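A small sketch of this training step is given below: FAMILY-tag each token, collect counts, and estimate p(w | FAMILY). The family-membership test and the toy tags are assumptions for illustration; only the tagging scheme (POS+SUBCAT for tree-family words, bare POS otherwise) follows the slide.

```python
# FAMILY-tagging the training tokens and estimating p(w | FAMILY).

from collections import Counter, defaultdict

def family_tag(pos, subcat, tree_families):
    """Words anchoring a frame inside some tree family get POS+SUBCAT;
    everything else is tagged with its POS alone."""
    key = (pos, subcat)
    return f"{pos}_{'_'.join(subcat)}" if key in tree_families else pos

def estimate_p_w_given_family(tagged_tokens):
    counts = defaultdict(Counter)            # family tag -> word counts
    for word, tag in tagged_tokens:
        counts[tag][word] += 1
    return {tag: {w: n / sum(c.values()) for w, n in c.items()}
            for tag, c in counts.items()}

# Assume a transitive-verb family exists; other words keep their bare POS.
families = {("VB", ("NP",))}
tokens = [("The",     family_tag("DT", (), families)),
          ("cat",     family_tag("NN", (), families)),
          ("eats",    family_tag("VB", ("NP",), families)),
          ("lettuce", family_tag("NN", (), families))]
print(estimate_p_w_given_family(tokens)["VB_NP"])   # -> {'eats': 1.0}
```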

54 Smoothing using Tree Families (2/2)
Results: only a bit better
  Can't smooth two supertags that are both not in any tree family (the more common case)
  Can't leverage the fact that one non-tree-family supertag can be evidence for the existence of a tree-family supertag, and vice versa
  Flatness of the probability distribution (though less flat than POS)

              Overall   w,t together   w,t separate
No smooth     79.24%    84.96%          0%
Baseline 2    85.60%    87.36%         53.19%
POS smooth    79.34%    85.00%          1.36%
Tree Family   79.46%    85.10%          1.92%

55 Smoothing using Distributional Similarity (1/4)
(Dagan, et al.): originally used for predicting the next word in a sentence given the current word
Approximate PSIM(w|t) using PMLE(w|t') for t' "close to" t
SIM(t,t') is the distance between the distribution of words over t and the distribution of words over t'
Conjecture: if these two distributions are about the same, then t and t' belong in the same tree family
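The general recipe can be sketched as follows: estimate a smoothed P(w|t) as a similarity-weighted mixture of the MLE distributions of the supertags closest to t. The alpha-skew divergence here matches the "a-skew" measure mentioned on slide 57, but the mixing weights, the neighborhood size, and the toy counts are all illustrative assumptions rather than the thesis's exact formulation.

```python
# Distributional-similarity smoothing of P(w|t) over supertags.

import math
from collections import Counter

def skew_divergence(p, q, alpha=0.99):
    """s_alpha(q -> p) = KL(p || alpha*q + (1-alpha)*p); smaller = more similar."""
    return sum(p[w] * math.log(p[w] / (alpha * q.get(w, 0.0) + (1 - alpha) * p[w]))
               for w in p)

def word_dist(counts):
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def smoothed_p_w_given_t(w, t, tag_word_counts, k=3):
    """Back off P(w|t) to the k supertags whose word distributions are closest to t's."""
    dists = {tag: word_dist(c) for tag, c in tag_word_counts.items()}
    nearest = sorted((skew_divergence(dists[t], dists[t2]), t2)
                     for t2 in dists if t2 != t)[:k]
    weights = [(math.exp(-d), t2) for d, t2 in nearest]
    z = sum(wgt for wgt, _ in weights)
    return sum(wgt * dists[t2].get(w, 0.0) for wgt, t2 in weights) / z

# Toy counts: two transitive-verb frames share anchor words, a noun frame does not.
counts = {"S(NP,VP(VB,NP))":  Counter({"eat": 5, "see": 3}),
          "S(NP,VP(VBP,NP))": Counter({"eat": 2, "see": 4, "buy": 1}),
          "NP(NN)":           Counter({"paint": 6})}
print(smoothed_p_w_given_t("buy", "S(NP,VP(VB,NP))", counts, k=2))
```

Note how the noun frame contributes almost nothing to the mixture: its word distribution is far from the verb frame's, so its weight is tiny, which is exactly the intuition behind the conjecture above.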

56 Smoothing using Distributional Similarity (2/4)
Results (trigram supertagging accuracy):
Baseline 2: train on Sections 02-21 and 23 for p(w|t) only
DS-smooth: about a 7% reduction in error overall (statistically significant)

              Overall   w,t together   w,t separate
No smooth     79.24%    84.96%          0%
Baseline 2    85.60%    87.36%         53.19%
POS smooth    79.34%    85.00%          1.36%
Tree Family   79.46%    85.10%          1.92%
DS-smooth     80.65%    85.39%         21.34%

57 Smoothing using Distributional Similarity (3/4)
The most similar tree frames form automatically-induced tree families
[Slide figure: the tree frames a4, a5, a6 most similar to frame a3 according to the a-skew measure]

58 Smoothing using Distributional Similarity (4/4)
Error analysis:
  Low-frequency supertags are not smoothed well
    Obviously, because of the way we smooth
    There are a lot of these cases (tree frames have a Zipfian distribution)
  Errors due to high-frequency supertags are dramatically reduced, but much error persists
    Reasons for error tend to be idiosyncratic
    Mistaking capitalization for start of sentence instead of headlines
    Errors in Penn Treebank annotation

59 Towards Improving Smoothing
Smoothing using distributional similarity:
  Works well, especially for high- and medium-frequency supertags
  Problem with low-frequency supertags (and there are a lot of these cases): low-frequency supertags are never smoothed together
Smoothing using tree families:
  Low-frequency supertags will be smoothed together
  But they will be given too much probability mass on average (flatness of the distribution)
Handling low-frequency supertags:
  Have the tree-families approach suggest supertags if they are low frequency (if high frequency, use distributional similarity)
  Make the distribution less flat by taking more context into consideration (p(w | big context), not p(w|t))

60 Outline
Motivation, Introduction
Extraction of TAG from a Treebank
Smoothing Models for TAG
Using Extracted TAG Features to Predict Semantic Roles (joint work with Owen Rambow)
  PropBank semantic annotation
  Extracted TAG features in prediction models
Conclusions and Future Work

61 Motivation
Syntactic information in the form of TAG is useful for natural language applications
  A predicate is localized with its arguments
  Relations between words are disambiguated
[Slide figure: TAG elementary trees for "Mitsubishi increased its sales of automobiles", localizing "increased" with its arguments]

62 Motivation Sometimes, a purely syntactic annotation is insufficient
The subject argument in the first sentence stands in a different relationship with the predicate broke than the subject argument in the second sentence.
[Slide figure: parse trees for "WindowsXP broke" and "Hackers broke WindowsXP"]

63 Adding Semantic Labels to Arguments
We can solve the problem by labeling each argument with how it relates semantically to its predicate
Kinds of semantic labels:
  Domain specific
    Flight-travel: ORIG-CITY, DEST-CITY, …
    Terrorism: PERPETRATOR, POLITICAL-GROUP, …
  Semantic roles are more general
    AGENT (0): entity performing some action
    PATIENT (1): entity being acted upon
    Etc.

64 Example of Annotating Arguments with Semantic Roles
Semantic role information reifies the similarity between the subject of broke in the first sentence and the object of broke in the second sentence.
[Slide figure: "WindowsXP broke" with WindowsXP labeled sem-role 1, and "Hackers broke WindowsXP" with Hackers labeled sem-role 0 and WindowsXP labeled sem-role 1]

65 PropBank (Kingsbury, et al. 02)
PropBank adds a layer of semantic annotation to the Penn Treebank
Semantic information in the PropBank:
  Each predicate is annotated with a sense (word sense) and a roleset (the set of semantic roles associated with this predicate)
  Each argument is labeled with its semantic role

66 Incomplete State of PropBank Annotation
The initial release of the PropBank is scheduled for June 2003; we used a pre-release version
Not all predicates and arguments in the Penn Treebank are annotated, though the most frequently occurring ones are
Predicates are not annotated for word sense, so we will focus on semantic roles
  Word senses are also needed for semantic interpretation, but…
  65% of predicate tokens in the PropBank have only one sense
  In another 7%, the semantic roles on the arguments completely disambiguate the word sense

67 Our Problem: Predicting Semantic Roles
Goal: predict the semantic role of an argument given syntactic and lexical information
[Slide figure: given the parse of "WindowsXP broke", predict that WindowsXP has sem-role 1]

68 Previous Work (Gildea, Palmer 02)
Predict the semantic role given either a gold-standard parse or an automatic parse
Syntactic features: phrase, direction, path, voice
Lexical features: predicate headword, argument headword
[Slide figure: for the argument WindowsXP of broke: Phrase = NP, Direction = left, Path = V-VP-S-NP, Voice = active]
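A sketch of how these surface features can be read off a parse is given below. The toy parse encoding (a dict of node id to label and parent), the word-order test for direction, and the passive flag are all simplified assumptions, not the (Gildea, Palmer 02) implementation.

```python
# Surface features (phrase, direction, path, voice) for one argument node.

def ancestors(tree, node):
    """Labels from a node up to the root, inclusive."""
    labels = []
    while node is not None:
        label, parent = tree[node]
        labels.append(label)
        node = parent
    return labels

def path_feature(tree, pred, arg):
    """Label path predicate -> lowest common ancestor -> argument, e.g. 'V-VP-S-NP'."""
    up, down = ancestors(tree, pred), ancestors(tree, arg)
    common = next(i for i, lab in enumerate(up) if lab in down)
    return "-".join(up[:common + 1] + list(reversed(down[:down.index(up[common])])))

def surface_features(tree, pred, arg, pred_passive=False):
    return {"phrase": tree[arg][0],
            "direction": "left" if arg < pred else "right",   # toy word-order test
            "path": path_feature(tree, pred, arg),
            "voice": "passive" if pred_passive else "active"}

# "WindowsXP broke": leaf node ids are word positions, internal nodes get ids >= 10.
tree = {0: ("NP", 10), 1: ("V", 11), 10: ("S", None), 11: ("VP", 10)}
print(surface_features(tree, pred=1, arg=0))
# -> {'phrase': 'NP', 'direction': 'left', 'path': 'V-VP-S-NP', 'voice': 'active'}
```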

69 Some Results of Previous Work
Task: given automatically parsed text, mark
  The boundary of each argument in the input sentence
  The semantic role of each argument
Results: recall 50.0%, precision 57.7%

70 One Problem with Previous Work
(Gildea, Palmer 02) note that sparse data afflicts their path feature
They try to get around it by modifying the path feature, but with little improvement
[Slide figure: an example of how sparse data can be exacerbated: "WindowsXP broke" has path V-VP-S-NP, while "WindowsXP broke repeatedly" has path V-VP-VP-S-NP]

71 Conjecture
Surface-syntax features like path have limitations in identifying the semantic role
Features based on TAG ameliorate some of these limitations because TAG localizes the syntactically relevant information (see the previous example)
Deep-syntax features that are in our extracted TAG can also help because they abstract away from less relevant aspects of surface syntax (compare the use of path versus path+voice)

72 Our Deep-Syntax Features from Extracted TAG Feature Vectors
Deep-syntax features: deep-role, deep-subcat
Example annotated with surface- and deep-syntax features
[Slide figure: for the argument WindowsXP of broke: surface features Phrase = NP, Direction = left, Path = V-VP-S-NP, Voice = active; deep features deep-role = 0, deep-subcat = NP0]

73 Prediction using Features Based on Gold-Standard Parses
Corpora: Training: Sec of PropBank; Test: Sec 00 of PropBank
Use of C4.5 to train models

Feature Set                                          %Accuracy
Pred_hw + Arg_hw + Deep_role + Deep_subcat             93.2
Pred_hw + Arg_hw + Deep_role                            92.0
Pred_hw + Arg_hw + Phrase + Dir + Path + Voice          86.2
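The experiments train C4.5 decision trees over feature dictionaries like those above. As a stand-in, the rough sketch below uses scikit-learn's CART decision tree on the same kind of features; the three training rows and the test instance are invented for illustration, and nothing about them reflects the PropBank data.

```python
# Decision-tree semantic-role prediction from (pred_hw, arg_hw, deep features).

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

train_rows = [
    ({"pred_hw": "broke", "arg_hw": "WindowsXP",
      "deep_role": "0", "deep_subcat": "NP0"}, "ARG0"),
    ({"pred_hw": "broke", "arg_hw": "WindowsXP",
      "deep_role": "1", "deep_subcat": "NP0_NP1"}, "ARG1"),
    ({"pred_hw": "increased", "arg_hw": "earnings",
      "deep_role": "1", "deep_subcat": "NP0_NP1"}, "ARG1"),
]

vec = DictVectorizer()                                  # one-hot encode the features
X = vec.fit_transform([features for features, _ in train_rows])
y = [role for _, role in train_rows]

clf = DecisionTreeClassifier().fit(X, y)                # CART in place of C4.5

test = {"pred_hw": "broke", "arg_hw": "hackers",
        "deep_role": "0", "deep_subcat": "NP0_NP1"}
print(clf.predict(vec.transform([test]))[0])
```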

74 Prediction using Features Based on LDA (1/2)
LDA (Srinivas 97) is a deterministic partial parser which uses supertagging as a first step
Procedure to predict semantic roles:
  Partially parse the raw input text using LDA
  For each deep-syntax argument that LDA identifies:
    Extract the features corresponding to that argument
    Run the features through a C4.5-trained model to get the corresponding semantic role

75 Prediction using Features Based on LDA (2/2)
Results are close to (Gildea, Palmer 02) (0.50 R / 0.58 P)
Feature set: Pred_hw + Arg_hw + Deep_role + Deep_subcat

Task                        Recall   Precision
Sem_role + Arg_hw            0.64     0.74
Sem_role + Bnd               0.50     0.58
Sem_role + Bnd + Arg_hw      0.49     0.57

76 Conclusions (1/3) Extraction of TAG from a Treebank
Procedure to extract a linguistically motivated TAG: an error-free resource for statistical TAG models
Evaluation of variations in the extraction procedure: trade-offs between localizing dependencies and grammar size, supertagging accuracy
Feature vector decomposition of extracted TAGs: for increasing grammar coverage, and for mapping extracted TAGs onto semantics and other grammars

77 Conclusions (2/3) Smoothing Models for TAG
Distributional similarity smoothing:
  Significantly increases supertagging accuracy
  The similarity metric automatically induces tree families
Generally, TAG suffers from greater sparse data problems than other grammars (e.g., Collins 99), but from the TAG perspective we can try different smoothing techniques than have usually been employed

78 Conclusions (3/3) Using Extracted TAG Features to Predict Semantic Roles
TAG might help because it localizes dependencies
Deep-syntax features from our extracted TAG improve prediction over surface-syntax features
They may help to such an extent that a partial parser (LDA) can be used to annotate raw text with semantic roles with accuracy comparable to that of a full parser

79 Future Work (1/3) PropBank-inspired Work
Compare using deep-syntax versus surface-syntax features over LDA output
Wait for the final version of the PropBank to run experiments to predict word senses
Extract a TAG with a semantic rather than syntactic domain of locality

80 Future Work (2/3) Extraction of TAG from a Treebank
Replace heuristics with a statistical approach: learn that V projects to VP by looking at the distribution of VPs in the treebank
Try to minimize the number of lexicalized trees produced, to optimize the stochastic model on which the resulting TAG is based
Detailed comparison of an extracted TAG and a hand-written TAG:
  Pinpoint missing constructions in the hand-written TAG
  Find errors in treebank annotation
Examine differences between TAGs extracted from different sublanguage corpora

81 Future Work (3/3) Smoothing models for TAG
Combining smoothing using distributional similarity and smoothing using tree families
Smoothing using feature vectors, besides tree families
Comparison of smoothing methods to the smoothing traditionally employed for LCFGs
Smoothing probability distributions other than P(w|t)

