Published by Amelia Horn. Modified over 9 years ago.
1
GALE Banks 11/9/06
Parsing Arabic: Key Aspects of Treebank Annotation
Seth Kulick, Ryan Gabbard, Mitch Marcus
2
Outline
- Summary of recent results
- Part of Speech/Treebank “mismatches”
- Components of Flat NPs
- Test and Train Results
- Conclusion
3
Recent Results
- Effect of sentence splitting
  - S -> S (wa) S (wa) S
  - Breaking these up improves F-measure by 1.25%
  - Investigating the accuracy of automatic S splitting
- Effect of “spurious NPs” in coordination
  - (NP (NP x) and (NP y)) changed to (NP x and y)
  - Improves F-measure by 0.5%
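The coordination change on this slide can be pictured as a small tree transformation. The following is a minimal sketch, not the authors' implementation: trees are nested Python lists, and both the `CONJ` tag and the rule "splice in any conjunct NP that dominates only (POS word) leaves" are assumptions made for illustration.

```python
def is_preterminal(node):
    """A (POS word) leaf is represented as [pos, word]."""
    return len(node) == 2 and isinstance(node[1], str)


def flatten_coordination(node):
    """Rewrite (NP (NP x) CONJ (NP y)) as (NP x CONJ y), bottom-up.

    Assumption: a conjunct counts as "spurious" when it is an NP whose
    children are all (POS word) leaves. The slides do not give the exact rule.
    """
    if is_preterminal(node):
        return node
    label = node[0]
    children = [flatten_coordination(child) for child in node[1:]]
    has_conj = any(is_preterminal(c) and c[0] == "CONJ" for c in children)
    if label == "NP" and has_conj:
        spliced = []
        for child in children:
            flat_np = (
                child[0] == "NP"
                and not is_preterminal(child)
                and all(is_preterminal(g) for g in child[1:])
            )
            if flat_np:
                spliced.extend(child[1:])  # drop the inner NP bracket
            else:
                spliced.append(child)
        children = spliced
    return [label] + children


before = ["NP", ["NP", ["NOUN", "x"]], ["CONJ", "wa"], ["NP", ["NOUN", "y"]]]
after = flatten_coordination(before)
print(after)
```

Conjuncts with internal structure of their own are left alone, so only the extra single-level brackets disappear.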
4
POS/Treebank Mismatches
- “Ideal”: an XP projection is headed by X
- Ideal and reality in the PTB and ATB
- Ambiguities in (POS word) make the parser’s job harder
5
VP headed by noun
- 6% of VPs in ATB have a nonverbal head
- Changed these heads to a new POS tag, “DV”
- Temporary approximation to current annotation changes
- 0.7 increase in F-measure

(VP (NOUN mugAdar+at+i- [departure])
    (NP-SBJ (POSS_PRON -hi [his]))
    (NP-OBJ (DET+NOUN Al+bayot+a [the house])
            (DET+ADJ Al+>aboyaD+a [the white])))
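A retagging pass of the kind described here might look like the sketch below. It is only an illustration under stated assumptions: trees are nested lists, the head of a VP is taken to be its first child when that child is a (POS word) leaf, and a tag counts as verbal if it starts with one of the prefixes in the hypothetical `VERBAL_PREFIXES` tuple. The slides do not spell out the actual head rule.

```python
# Hypothetical list of verbal tag prefixes; adjust to the tag set in use.
VERBAL_PREFIXES = ("PV", "IV", "CV", "VERB")


def is_preterminal(node):
    """A (POS word) leaf is represented as [pos, word]."""
    return len(node) == 2 and isinstance(node[1], str)


def retag_vp_heads(node):
    """Give nonverbal VP heads the tag DV (assumes the head is the first child)."""
    if is_preterminal(node):
        return node
    label = node[0]
    children = [retag_vp_heads(child) for child in node[1:]]
    is_vp = label.split("-")[0] == "VP"
    if is_vp and children and is_preterminal(children[0]):
        pos, word = children[0]
        if not pos.startswith(VERBAL_PREFIXES):
            children[0] = ["DV", word]
    return [label] + children


# The tree from the slide, with glosses omitted.
vp = ["VP",
      ["NOUN", "mugAdar+at+i-"],
      ["NP-SBJ", ["POSS_PRON", "-hi"]],
      ["NP-OBJ", ["DET+NOUN", "Al+bayot+a"], ["DET+ADJ", "Al+>aboyaD+a"]]]
retagged = retag_vp_heads(vp)
print(retagged[1])
```

Only the head tag changes; the bracketing is untouched, which matches the slide's description of the change as a temporary approximation.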
6
NP headed by adj (#1)

(S (NP-SBJ (PRON_1S -niy [I]))
   (NP-PRD (ADJ saEiyd+N [happy])))

- ADJ heads NP-PRD, elsewhere ADJP-PRD

(VP (PV+PVSUFF_SUBJ kAn+a [be+he])
    (NP-SBJ-1 (-NONE- *T*))
    (ADJP-PRD (ADJ saEiyd+AF [happy])
              (PP … [with the voting])))
7
NP headed by adj (#2)

(VP (IV ta+Eomal+a [they work])
    (NP-SBJ rAbiT+ap+u Al+maxAtyr+i [league of the mukhtars (village chiefs)])
    (NP-ADV (ADJ dA}im+AF [always])))

- ADJ heads NP-ADV, elsewhere ADVP, ADJP

(VP (IV na+>omal+a [we hope for])
    (NP-SBJ (-NONE- *))
    (ADVP (ADJ dA}im+AF [always])))

(VP (IV ya+SiH~+u [he/it+be correct])
    (NP-SBJ-1 (-NONE- *T*))
    (ADJP (ADJ dA}im+AF [always])))
8
ADJP headed by noun

(S (NP-SBJ (NOUN >um~ah+At+u- [mothers])
           (POSS_PRON_3P -hum [their]))
   (ADJP-PRD (NOUN >amiyrokiy~+At+N [American])))

- Also as ADJ

(NP (NOUN >um~ah+At+K [mothers])
    (ADJ >amiyrokiy~+At+K [American]))
9
ADVP headed by conj

(S (ADVP (FOCUS_PART >am~A [as_for/concerning]))
   (NP-TPC-1 Haqiyb+ap+u Al+xArijiy~+ap+I [the foreign ministry’s portfolio])
   (ADVP (CONJ fa- [and/so]))
   (VP …))

- (CONJ fa-) also as child of S

(S (S …) (PUNC ,) (CONJ fa- [and/so]) (S …))
10
Mismatches in ATB and PTB

        ATB3      PTB2.0
VP      6.0%      0.5%
NP      5.0%      1.6%
ADJP    7.3%      23.4%
ADVP    45.37%    8.0%
PP      0.8%      1.8%
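Rates like those in the table could be gathered with a pass over bracketed trees. The sketch below is a rough approximation and not the procedure behind the reported numbers: it assumes an XP is a mismatch when it has at least one (POS word) child and none of those children carries a tag from the hypothetical `EXPECTED` families, and it skips empty categories tagged `-NONE-`.

```python
import re
from collections import Counter

# Hypothetical mapping from phrase label to the POS pieces that count as "X".
EXPECTED = {
    "VP": {"PV", "IV", "CV"},
    "NP": {"NOUN", "PRON"},
    "ADJP": {"ADJ"},
    "ADVP": {"ADV"},
    "PP": {"PREP"},
}


def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing bracket
        return [label] + children

    return node()


def mismatch_rates(trees):
    """Fraction of XPs, per category, with no child tagged from X's family."""
    total, bad = Counter(), Counter()

    def visit(n):
        label = n[0].split("-")[0]
        kids = [c for c in n[1:] if isinstance(c, list)]
        tags = [c[0] for c in kids
                if len(c) == 2 and isinstance(c[1], str) and c[0] != "-NONE-"]
        if label in EXPECTED and tags:
            total[label] += 1
            if not any(set(re.split(r"[+_]", t)) & EXPECTED[label] for t in tags):
                bad[label] += 1
        for c in kids:
            visit(c)

    for tree in trees:
        visit(tree)
    return {label: bad[label] / total[label] for label in total}


trees = [
    parse("(VP (NOUN mugAdar+at+i-) (NP-SBJ (POSS_PRON -hi)) "
          "(NP-OBJ (DET+NOUN Al+bayot+a) (DET+ADJ Al+>aboyaD+a)))"),
    parse("(VP (PV+PVSUFF_SUBJ kAn+a) (NP-SBJ-1 (-NONE- *T*)) "
          "(ADJP-PRD (ADJ saEiyd+AF)))"),
]
rates = mismatch_rates(trees)
print(rates)
```

Glosses in square brackets are left out of the example strings because the toy parser treats every non-bracket token as a word.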
11
XP/X mismatches: Summary
- This matters:
  - Headless VPs to “DV” modification: +0.7%
  - PTB: 23.4% mismatch for ADJP
    - Overall: 88.28, ADJP: 70.68
- Real-life linguistic complexity
- Need guidelines (visual prop time)
- Some automatic changes likely
- No guarantee of level of improvement, but should be a priority
12
Flat NPs
- Flat NPs: NPs with only (POS word) children
- Experiment: evaluate with flat NPs as a different bracket
- Affects overall score

(Gold)
(NP (NOUN -<ijorA'+i [conducting])
    (NP (NOUN {inotixAb+At+K [elections])
        (ADJ niyAbiy~+ap+K [representative])))
13
Flat NPs

(Gold)
(NP (NOUN -<ijorA'+i [conducting])
    (NP (NOUN {inotixAb+At+K [elections])
        (ADJ niyAbiy~+ap+K [representative])))

(Test)
(NP (NN -<ijorA'+i [conducting])
    (NNS {inotixAb+At+K [elections])
    (JJ niyAbiy~+ap+K [representative]))

- Under regular evaluation, the top NPs match
14
Flat NPs

(Gold)
(NP (NOUN -<ijorA'+i [conducting])
    (FLATNP (NOUN {inotixAb+At+K [elections])
            (ADJ niyAbiy~+ap+K [representative])))

(Test)
(FLATNP (NN -<ijorA'+i [conducting])
        (NNS {inotixAb+At+K [elections])
        (JJ niyAbiy~+ap+K [representative]))

- With FlatNP evaluation, no match
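The contrast between the two evaluations can be reproduced with a small labeled-bracket scorer. This is a simplified sketch, not the scoring tool used for the reported numbers: trees are nested lists, (POS word) leaves are not scored, and `relabel_flat_nps` and `f_measure` are names introduced here for illustration.

```python
from collections import Counter


def is_preterminal(node):
    """A (POS word) leaf is represented as [pos, word]."""
    return len(node) == 2 and isinstance(node[1], str)


def relabel_flat_nps(node):
    """Rename every NP whose children are all (POS word) leaves to FLATNP."""
    if is_preterminal(node):
        return node
    children = [relabel_flat_nps(child) for child in node[1:]]
    label = node[0]
    if label == "NP" and all(is_preterminal(c) for c in children):
        label = "FLATNP"
    return [label] + children


def brackets(tree):
    """Collect (label, start, end) for every phrasal node."""
    spans = []

    def walk(node, start):
        if is_preterminal(node):
            return start + 1
        end = start
        for child in node[1:]:
            end = walk(child, end)
        spans.append((node[0], start, end))
        return end

    walk(tree, 0)
    return spans


def f_measure(gold, test):
    """Harmonic mean of labeled bracket precision and recall."""
    g, t = Counter(brackets(gold)), Counter(brackets(test))
    matched = sum((g & t).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(t.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)


gold = ["NP", ["NOUN", "-<ijorA'+i"],
        ["NP", ["NOUN", "{inotixAb+At+K"], ["ADJ", "niyAbiy~+ap+K"]]]
test = ["NP", ["NN", "-<ijorA'+i"],
        ["NNS", "{inotixAb+At+K"], ["JJ", "niyAbiy~+ap+K"]]

regular = f_measure(gold, test)
flat = f_measure(relabel_flat_nps(gold), relabel_flat_nps(test))
print(regular, flat)
```

Under regular scoring the top NP span matches even though the parser missed the inner NP; after relabeling, the test tree's only bracket is a FLATNP over the whole span and nothing matches.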
15
Flat NPs
- Importance of flat NPs
  - 30% of brackets are flat NPs
  - Errors percolate up
- ATB3 score on flat NPs is not good enough
- Unclear why, but need some things from ATB

          Flat NPs    Overall
PTB2.0    94.20       87.54
ATB3      86.77       77.27
16
Flat NPs
- Clear statement of what can go in flat NPs
  - Regular expressions for each head
- Certain things fall out:
  - Questionable categories, e.g. (DET+NOUN DET+NOUN)
    (NP Al+baHor+i [the sea] Al+>aHomar+i [the red])
  - Nouns that occur before a head noun are limited to a small class: quantifiers
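One way to read "regular expressions for each head" is as a pattern over the sequence of POS tags inside a flat NP. The pattern below is purely hypothetical and covers only a noun head with an optional preceding noun (the quantifier slot) and trailing adjectives; the actual guidelines would have to define the real inventory.

```python
import re

# Hypothetical pattern for a NOUN-headed flat NP, written over space-joined tags:
# an optional bare NOUN before the head, the head (optionally with DET+),
# then any number of adjectives (optionally with DET+).
FLAT_NP_PATTERN = re.compile(r"(NOUN )?(DET\+)?NOUN( (DET\+)?ADJ)*")


def is_licensed(tags):
    """Return True if the tag sequence fits the illustrative pattern."""
    return FLAT_NP_PATTERN.fullmatch(" ".join(tags)) is not None


print(is_licensed(["NOUN", "DET+NOUN", "DET+ADJ"]))  # quantifier + noun + adjective
print(is_licensed(["DET+NOUN", "DET+NOUN"]))         # the questionable category
```

Sequences that fail the pattern, such as (DET+NOUN DET+NOUN), are the ones that would be queued for review rather than silently accepted.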
17
Flat NPs
- Quantifier as prenominal modifier in flat NP

(NP (NOUN kul~+a [every/all/each_one])
    (DET+NOUN Al+nuSuws+I [the texts])
    (DET+ADJ Al+tijAriy~+ap+I [the business]))

- Quantifier as taking NP complement

(NP (NOUN kul~+a [every/all/each_one])
    (NP (DET+NOUN Al+duwal+i [the countries])
        (DET+ADJ Al+Earabiy~+ap+I [the Arabic])))

- Quantifiers take NP complement: 15%
18
Flat NPs: Summary
- Real-life linguistic complexity
- Need guidelines for NP structure, quantifiers
- Some automatic changes likely
- Maybe a different POS tag for NOUNs with a different distribution?
- No guarantee of level of improvement, but should be a priority
19
Test on Train
- ATB3 lower, but not by much
- Analysis of dependency errors

          All      <=40
PTB2.0    96.80    97.10
ATB3      94.31    95.34
20
Dependency Analysis

                   PTB2.0               ATB3
Dependency         % all     Fmeas      % all     Fmeas
NPB -> head mod    31.08%    99.19%     16.33%    95.83%
NP -> head NP      0.0%      N/A        10.13%    97.08%

- % all = % of all dependencies
- NPB = “base NP”, non-recursive NP
- More evidence that minimal NPs matter a lot
21
Dependency Analysis

                       PTB2.0              ATB3
Dependency             % all    Fmeas      % all    Fmeas
PP adjoined to NPB     5.23%    94.78      5.74%    89.05
PP adjoined to NP      0.04%    30.40      1.28%    65.08

- Why the difference in PP adjoining to NP, and not just NPB?
22
PP attachment in PTB
- Adjuncts at the same level

Okay:      (NP (NP …) (PP …) (PP …))
Not okay:  (NP (NP (NP …) (PP …)) (PP …))

- This is true for ATB also
23
PP attachment in PTB

(NP (NP streets)
    (PP of (NP (NP the city)
               (PP of (NP Long Beach))
               (PP in (NP the state…)))))

(NP (NP streets)
    (PP of (NP (NP (NP the city)
                   (PP of (NP Long Beach)))
               (PP in (NP the state…)))))

- The first is okay, the second is not
- PPs in PTB do not adjoin to recursive NPs
- PPs in ATB do, because of Al<DAfp [the idafa construction]
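A consistency check for the "not okay" PTB pattern could be as simple as the sketch below. It is an illustration, not an existing tool: trees are nested lists with words as plain strings, and a "recursive NP" is assumed to mean an NP that has another NP as a direct child. For ATB the same structure is legitimate, so there the function would serve as a counter rather than an error detector.

```python
def is_phrase(node):
    """Phrases are lists; words are plain strings."""
    return isinstance(node, list)


def is_recursive_np(node):
    """Assumed definition: an NP with at least one NP child."""
    return (is_phrase(node) and node[0] == "NP"
            and any(is_phrase(c) and c[0] == "NP" for c in node[1:]))


def pp_on_recursive_np(tree):
    """Return every NP that has both a recursive NP child and a PP child."""
    hits = []

    def walk(node):
        if not is_phrase(node):
            return
        kids = node[1:]
        has_recursive_np = any(is_recursive_np(c) for c in kids)
        has_pp = any(is_phrase(c) and c[0] == "PP" for c in kids)
        if node[0] == "NP" and has_recursive_np and has_pp:
            hits.append(node)
        for child in kids:
            walk(child)

    walk(tree)
    return hits


# The two slide examples, abbreviated.
okay = ["NP", ["NP", "streets"],
        ["PP", "of", ["NP", ["NP", "the", "city"],
                      ["PP", "of", ["NP", "Long", "Beach"]],
                      ["PP", "in", ["NP", "the", "state"]]]]]
not_okay = ["NP", ["NP", "streets"],
            ["PP", "of", ["NP", ["NP", ["NP", "the", "city"],
                                 ["PP", "of", ["NP", "Long", "Beach"]]],
                          ["PP", "in", ["NP", "the", "state"]]]]]

print(len(pp_on_recursive_np(okay)), len(pp_on_recursive_np(not_okay)))
```

A check like this is cheap enough to run over a whole release, which fits the deck's suggestion of running consistency checks as part of each release.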
24
PP attachment in PTB and ATB

(NP (NP streets)
    (PP of (NP (NP (NP the city)
                   (PP of (NP Long Beach)))
               (PP in (NP the state…)))))

(NP ($awAriE [streets])
    (NP (NP (NP madinyn+ap [the city])
            (NP luwnog byt$ [Long Beach]))
        (PP fiy [in]
            (NP wilAy+ap [the state]..))))

- PTB: PP adjoining to recursive NP is a bad structure
- ATB: PP adjoining to recursive NP is a good structure
25
Dependency Analysis

                       PTB2.0              ATB3
Dependency             % all    Fmeas      % all    Fmeas
PP adjoined to NPB     5.23%    94.78      5.74%    89.05
PP adjoined to NP      0.04%    30.40      1.28%    65.08

- Parser distinguishes NPB, which helps for PTB
- A wider range of attachment possibilities for ATB
  - Challenge for the parser
26
Conclusion
- We need guidelines
  - We need to create the guidelines
- Interaction: parsing and treebank
  - Identify useful consistency checks
    - Run as part of each release
  - Better understanding of problematic areas
- What sort of changes are necessary?
  - Parsing: automatic transformations
  - Treebank: POS changes, etc.
- Proper time allocation?