GALE Banks 11/9/06 1 Parsing Arabic: Key Aspects of Treebank Annotation Seth Kulick Ryan Gabbard Mitch Marcus.


1 Parsing Arabic: Key Aspects of Treebank Annotation
Seth Kulick, Ryan Gabbard, Mitch Marcus
GALE Banks 11/9/06

2 Outline
• Summary of recent results
• Part of Speech/Treebank “mismatches”
• Components of Flat NPs
• Test and Train Results
• Conclusion

3 Recent Results
• Effect of Sentence Splitting
  - S -> S (wa) S (wa) S
  - Breaking these improves F-measure by 1.25%
  - Investigating the accuracy of automatic S splitting
• Effect of “Spurious NPs” in coordination
  - (NP (NP x) and (NP y)) changed to (NP x and y and z)
  - Improves F-measure by 0.5%
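The slides quote F-measure changes without defining the metric. Assuming it is the usual labeled-bracket score (an assumption, not stated in the deck), with G the set of labeled brackets in the gold trees and T the set in the parser output, precision, recall and F-measure are:

```latex
P = \frac{|G \cap T|}{|T|}, \qquad
R = \frac{|G \cap T|}{|G|}, \qquad
F = \frac{2PR}{P + R}
```

Under this definition, a transformation that removes brackets the parser rarely gets right (such as the extra NPs in coordination) changes both the numerator and the denominators, which is one way a small structural change could move F by a fraction of a point.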

4 POS/Treebank Mismatches
• “Ideal”: XP projection headed by X
  - Ideal and reality in the PTB and ATB
• Ambiguities for (POS word) make the parser’s job harder

5 VP headed by noun
• 6% of VPs in ATB have a nonverbal head
• Changed heads to have a new POS tag: “DV”
• Temporary approximation to current annotation changes
• 0.7 increase in F-measure
    (VP (NOUN mugAdar+at+i- [departure])
        (NP-SBJ (POSS_PRON -hi [his]))
        (NP-OBJ (DET+NOUN Al+bayot+a [the house])
                (DET+ADJ Al+>aboyaD+a [the white])))

6 NP headed by adj - #1
    (S (NP-SBJ (PRON_1S -niy [I]))
       (NP-PRD (ADJ saEiyd+N [happy])))
ADJ heads NP-PRD, elsewhere ADJP-PRD
    (VP (PV+PVSUFF_SUBJ kAn+a [be+he])
        (NP-SBJ-1 (-NONE- *T*))
        (ADJP-PRD (ADJ saEiyd+AF [happy])
                  (PP … [with the voting])))

7 NP headed by adj - #2
    (VP (IV ta+Eomal+a [they work])
        (NP-SBJ rAbiT+ap+u Al+maxAtyr+i [league of the mukhtars (village chiefs)])
        (NP-ADV (ADJ dA}im+AF [always])))
ADJ heads NP-ADV, elsewhere ADVP, ADJP
    (VP (IV na+>omal+a [we hope for])
        (NP-SBJ (-NONE- *))
        (ADVP (ADJ dA}im+AF [always])))
    (VP (IV ya+SiH~+u [he/it+be correct])
        (NP-SBJ-1 (-NONE- *T*))
        (ADJP (ADJ dA}im+AF [always])))

8 ADJP headed by noun
    (S (NP-SBJ (NOUN >um~ah+At+u- [mothers])
               (POSS_PRON_3P -hum [their]))
       (ADJP-PRD (NOUN >amiyrokiy~+At+N [American])))
Also as ADJ
    (NP (NOUN >um~ah+At+K [mothers])
        (ADJ >amiyrokiy~+At+K [American]))

9 ADVP headed by conj
    (S (ADVP (FOCUS_PART >am~A [as_for/concerning]))
       (NP-TPC-1 Haqiyb+ap+u Al+xArijiy~+ap+I [the foreign ministry’s portfolio])
       (ADVP (CONJ fa- [and/so]))
       (VP ….))
(CONJ fa-) also as child of S
    (S (S …) (PUNC ,) (CONJ fa- [and/so]) (S …))

10 Mismatches in ATB and PTB
          ATB3      PTB2.0
VP        6.0%      0.5%
NP        5.0%      1.6%
ADJP      7.3%      23.4%
ADVP      45.37%    8.0%
PP        0.8%      1.8%
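The slides do not say how the mismatch percentages were computed. As a rough sketch of one way to count XP/X mismatches, the code below treats a phrase as "matching" when at least one of its (POS word) children carries a tag of the expected category. The `HEAD_TAGS` table, the function names, and this head heuristic are all assumptions made for illustration; they are not the authors' procedure and will not reproduce the table's figures.

```python
import re
from collections import Counter

# Hypothetical table: POS prefixes accepted as the "X" of each XP.
HEAD_TAGS = {
    "VP": ("PV", "IV", "CV"),
    "NP": ("NOUN", "PRON", "POSS_PRON"),
    "ADJP": ("ADJ",),
    "ADVP": ("ADV",),
    "PP": ("PREP",),
}

def parse(text):
    """Parse one bracketed tree into (label, children); words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()

def is_preterminal(node):
    """True for a (POS word) node, i.e. one whose children are all words."""
    return all(isinstance(child, str) for child in node[1])

def count_mismatches(tree):
    """Return (checked, mismatched) Counters keyed by phrase category."""
    checked, mismatched = Counter(), Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        phrases = [c for c in children if not isinstance(c, str)]
        stack.extend(phrases)
        category = re.split(r"[-=]", label)[0]       # NP-SBJ-1 -> NP
        tags = [c[0] for c in phrases
                if is_preterminal(c) and c[0] != "-NONE-"]
        if category not in HEAD_TAGS or not tags:
            continue                                  # nothing to judge here
        checked[category] += 1
        wanted = HEAD_TAGS[category]
        if not any(part.startswith(wanted)
                   for tag in tags for part in tag.split("+")):
            mismatched[category] += 1
    return checked, mismatched

# The noun-headed VP from slide 5, with glosses dropped.
vp = parse("(VP (NOUN mugAdar+at+i-) (NP-SBJ (POSS_PRON -hi)) "
           "(NP-OBJ (DET+NOUN Al+bayot+a) (DET+ADJ Al+>aboyaD+a)))")
checked, mismatched = count_mismatches(vp)
print(dict(checked), dict(mismatched))
```

Phrases whose only children are other phrases or traces are skipped rather than counted, since this heuristic has no way to name their head.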

11 XP/X mismatches - Summary
• This matters:
  - Headless VPs to “DV” modification: +0.7%
  - PTB: 23.4% mismatch for ADJP
  - Overall: 88.28, ADJP: 70.68
• Real-life linguistic complexity
  - Need guidelines (visual prop time)
  - Some automatic changes likely
• No guarantee of level of improvement, but:
  - Should be a priority

12 Flat NPs
• Flat NPs: only (POS word) children
• Experiment: evaluate with Flat NPs as a different bracket
  - Affects overall score
(Gold)
    (NP (NOUN -<ijorA’+i [conducting])
        (NP (NOUN {inotixAb+At+K [elections])
            (ADJ niyAbiy~+ap+K [representative])))

13 Flat NPs
(Gold)
    (NP (NOUN -<ijorA’+i [conducting])
        (NP (NOUN {inotixAb+At+K [elections])
            (ADJ niyAbiy~+ap+K [representative])))
(Test)
    (NP (NN -<ijorA’+i [conducting])
        (NNS {inotixAb+At+K [elections])
        (JJ niyAbiy~+ap+K [representative]))
Under regular evaluation, top NPs match

14 Flat NPs
(Gold)
    (NP (NOUN -<ijorA’+i [conducting])
        (FLATNP (NOUN {inotixAb+At+K [elections])
                (ADJ niyAbiy~+ap+K [representative])))
(Test)
    (FLATNP (NN -<ijorA’+i [conducting])
            (NNS {inotixAb+At+K [elections])
            (JJ niyAbiy~+ap+K [representative]))
With FlatNP evaluation, no match
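The gold/test pair above can be scored both ways in a few lines. This is a minimal sketch, assuming labeled-bracket matching over (label, start, end) spans and a relabeling rule in which an NP whose children are all (POS word) pairs becomes FLATNP. The function names and the `mark_flat` switch are illustrative and are not taken from the authors' evaluation setup.

```python
import re
from collections import Counter

def parse(text):
    """Parse one bracketed tree into (label, children); words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()

def is_preterminal(node):
    """True for a (POS word) node, i.e. one whose children are all words."""
    return not isinstance(node, str) and all(isinstance(c, str) for c in node[1])

def brackets(node, start=0, mark_flat=False):
    """Return (spans, end); spans lists (label, start, end) for phrase nodes only."""
    label, children = node
    if is_preterminal(node):
        return [], start + 1              # each (POS word) pair covers one token
    spans, end = [], start
    for child in children:
        if isinstance(child, str):        # bare word directly under a phrase
            end += 1
            continue
        child_spans, end = brackets(child, end, mark_flat)
        spans.extend(child_spans)
    if mark_flat and label == "NP" and all(is_preterminal(c) for c in children):
        label = "FLATNP"                  # NP with only (POS word) children
    spans.append((label, start, end))
    return spans, end

def f_measure(gold, test, mark_flat=False):
    """Labeled-bracket F-measure between one gold tree and one test tree."""
    g = Counter(brackets(gold, 0, mark_flat)[0])
    t = Counter(brackets(test, 0, mark_flat)[0])
    matched = sum((g & t).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(t.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)

gold = parse("(NP (NOUN -<ijorA'+i) (NP (NOUN {inotixAb+At+K) (ADJ niyAbiy~+ap+K)))")
test = parse("(NP (NN -<ijorA'+i) (NNS {inotixAb+At+K) (JJ niyAbiy~+ap+K))")

print(f_measure(gold, test))                  # top NP matches, inner NP is missed
print(f_measure(gold, test, mark_flat=True))  # FLATNP spans differ, nothing matches
```

Because POS tags are not scored as brackets here, the NOUN/NN tag-set difference between gold and test has no effect; only the phrase spans and their labels do.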

15 Flat NPs
• Importance of Flat NPs
  - 30% of brackets are Flat NPs
  - Errors percolate up
• ATB3 score on Flat NPs not good enough
• Unclear why, but need some things from ATB
          Flat NPs    Overall
PTB2.0    94.20       87.54
ATB3      86.77       77.27

16 Flat NPs
• Clear statement of what can go in flat NPs
• Regular expressions for each head
• Certain things fall out:
  - Questionable categories, e.g. (DET+NOUN DET+NOUN)
      (NP Al+baHor+i [the sea] Al+>aHomar+i [the red])
  - Nouns that occur before a head noun are limited to a small class: quantifiers
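A "regular expression for each head" could look roughly like the following sketch, which checks the POS sequence of a flat NP against a template for noun-headed NPs. The template, the `QUANT` pseudo-tag, and the one-item quantifier list are hypothetical stand-ins; the actual inventory would have to come from the annotation guidelines the slides call for.

```python
import re

# Hypothetical quantifier stems; a real list would come from the guidelines.
QUANTIFIER_STEMS = ("kul~",)

# Hypothetical template for a noun-headed flat NP:
# optional quantifier, the head noun, then any number of adjectives.
FLAT_NP = re.compile(r"(QUANT )?(DET\+)?NOUN( (DET\+)?ADJ)*")

def tag_sequence(children):
    """Join the POS tags of (tag, word) pairs, renaming quantifier nouns to QUANT."""
    tags = []
    for tag, word in children:
        if tag == "NOUN" and word.startswith(QUANTIFIER_STEMS):
            tag = "QUANT"
        tags.append(tag)
    return " ".join(tags)

def is_licensed(children):
    """True when the flat NP's tag sequence fits the template."""
    return FLAT_NP.fullmatch(tag_sequence(children)) is not None

quantified = [("NOUN", "kul~+a"), ("DET+NOUN", "Al+nuSuws+I"),
              ("DET+ADJ", "Al+tijAriy~+ap+I")]
red_sea = [("DET+NOUN", "Al+baHor+i"), ("DET+NOUN", "Al+>aHomar+i")]

print(is_licensed(quantified))   # quantifier + noun + adjective fits the template
print(is_licensed(red_sea))      # DET+NOUN DET+NOUN is flagged as questionable
```

Sequences the template rejects are candidates for review rather than certain errors, in line with the slide's "questionable categories" wording.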

17 Flat NPs
Quantifier as prenominal modifier in flat NP
    (NP (NOUN kul~+a [every/all/each_one])
        (DET+NOUN Al+nuSuws+I [the texts])
        (DET+ADJ Al+tijAriy~+ap+I [the business]))
Quantifier as taking NP complement
    (NP (NOUN kul~+a [every/all/each_one])
        (NP (DET+NOUN Al+duwal+i [the countries])
            (DET+ADJ Al+Earabiy~+ap+I [the Arabic])))
Quantifiers take NP complement: 15%

18 Flat NPs - Summary
• Real-life linguistic complexity
  - Need guidelines for NP structure, quantifiers
  - Some automatic changes likely
  - Maybe a different POS tag for NOUNs with a different distribution?
• No guarantee of level of improvement, but:
  - Should be a priority

19 Test on Train
• ATB3 lower, but not so much
• Analysis of dependency errors
          All      <=40
PTB2.0    96.80    97.10
ATB3      94.31    95.34

20 Dependency Analysis
                  PTB2.0               ATB3
Dependency        % all     Fmeas      % all     Fmeas
NPB head mod      31.08%    99.19%     16.33%    95.83%
NP head NP        0.0%      N/A        10.13%    97.08%
• % all = % of all dependencies
• NPB = “base NP”, non-recursive NP
• More evidence that minimal NPs matter a lot

21 Dependency Analysis
                  PTB2.0               ATB3
Dependency        % all     Fmeas      % all     Fmeas
NP NPB PP         5.23%     94.78      5.74%     89.05
NP NP PP          0.04%     30.40      1.28%     65.08
• Why the difference in PP adjoining to NP, and not just NPB?

22 PP attachment in PTB
• Adjuncts at the same level
    Okay:      (NP (NP ….) (PP ….) (PP …))
    Not okay:  (NP (NP (NP …) (PP …)) (PP …))
• This is true for ATB also

23 PP attachment in PTB
    (NP (NP streets)
        (PP of
            (NP (NP the city)
                (PP of (NP Long Beach))
                (PP in (NP the state…)))))
    (NP (NP streets)
        (PP of
            (NP (NP (NP the city)
                    (PP of (NP Long Beach)))
                (PP in (NP the state…)))))
• First is okay, second is not
• PPs in PTB do not adjoin to recursive NPs
• PPs in ATB do, because of Al<DAfp (the idafa, or construct-state, construction)
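The PTB-style constraint on this slide lends itself to an automatic consistency check. The sketch below counts NP nodes of the shape (NP (NP (NP …) (PP …)) (PP …)), i.e. a PP adjoined to an NP that already contains an NP plus a PP adjunct. It assumes bare NP and PP labels without function tags, and the function names are illustrative. For ATB the same configuration is legitimate, so a hit there would be a statistic rather than an error.

```python
import re

def parse(text):
    """Parse one bracketed tree into (label, children); words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()

def pp_on_recursive_np(tree):
    """Count NP children that already hold NP + PP and sit beside another PP."""
    hits = 0
    stack = [tree]
    while stack:
        label, children = stack.pop()
        phrases = [c for c in children if not isinstance(c, str)]
        stack.extend(phrases)
        if label != "NP" or not any(c[0] == "PP" for c in phrases):
            continue                      # only NPs with a PP adjunct are of interest
        for child in phrases:
            inner = [g[0] for g in child[1] if not isinstance(g, str)]
            if child[0] == "NP" and "NP" in inner and "PP" in inner:
                hits += 1
    return hits

OKAY = ("(NP (NP streets) (PP of (NP (NP the city) "
        "(PP of (NP Long Beach)) (PP in (NP the state)))))")
NOT_OKAY = ("(NP (NP streets) (PP of (NP (NP (NP the city) "
            "(PP of (NP Long Beach))) (PP in (NP the state)))))")

print(pp_on_recursive_np(parse(OKAY)))
print(pp_on_recursive_np(parse(NOT_OKAY)))
```

A check of this kind could be one of the release-time consistency checks, run per treebank with the expected count set according to that treebank's guidelines.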

24 PP attachment in PTB and ATB
PTB: PP adjoining to recursive NP (bad structure)
    (NP (NP streets)
        (PP of
            (NP (NP (NP the city)
                    (PP of (NP Long Beach)))
                (PP in (NP the state…)))))
ATB: PP adjoining to recursive NP (good structure)
    (NP ($awAriE [streets])
        (NP (NP (NP madinyn+ap [the city])
                (NP luwnog byt$ [Long Beach]))
            (PP fiy [in] (NP wilAy+ap [the state] ..))))

25 Dependency Analysis
                  PTB2.0               ATB3
Dependency        % all     Fmeas      % all     Fmeas
NP NPB PP         5.23%     94.78      5.74%     89.05
NP NP PP          0.04%     30.40      1.28%     65.08
• Parser distinguishes NPB, which helps for PTB
• A wider range of attachment possibilities for ATB
• Challenge for the parser

26 Conclusion
• We need guidelines
  - We need to create the guidelines
• Interaction: Parsing and Treebank
  - Identify useful consistency checks
  - Run as part of each release
• Better understanding of problematic areas
  - What sort of changes are necessary?
  - Parsing: automatic transformations
  - Treebank: POS changes, etc.
• Proper time allocation?

