Semantic Role Labeling for Arabic using Kernel Methods Mona Diab Alessandro Moschitti Daniele Pighin.


1 Semantic Role Labeling for Arabic using Kernel Methods Mona Diab Alessandro Moschitti Daniele Pighin

2 What is SRL? Proposition John opened the door

3 What is SRL? Proposition [John] Agent [opened] Predicate [the door] Theme

4 What is SRL? Proposition [John] Agent [opened] Predicate [the door] Theme Subject Object

5 What is SRL? Proposition [John] Agent [opened] Predicate [the door] Theme Subject Object [The door] Theme [opened] Predicate

6 What is SRL? Proposition [John] Agent [opened] Predicate [the door] Theme Object Subject [The door] Theme [opened] Predicate

7 What is SRL? Proposition [John] Agent [opened] Predicate [the door] Theme FrameNet Agent Container_portal [The door] Theme [opened] Predicate

8 What is SRL? Proposition [John] Agent [opened] Predicate [the door] Theme PropBank ARG0 ARG1 [The door] Theme [opened] Predicate

9 Why SRL? Useful for information extraction. Useful for Question Answering. Useful for Machine Translation?

10 Our Goal Last Sunday India to official visit Rongji Zhu the-Chinese the-Ministers president started The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday

11 Our Goal Last Sunday India to official visit Rongji Zhu the-Chinese the-Ministers president started The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday ARGM-TMP

12 RoadMap Arabic Characteristics Our Approach Experiments & Results Conclusions & Future Directions

13 Morphology Rich, complex morphology –Templatic, concatenative, derivational, inflectional –Example: wbHsnAthm = w+b+Hsn+At+hm 'and by virtue(s) their' –Verbs are marked for tense, person, gender, aspect, mood, voice –Nominals are marked for case, number, gender, definiteness Orthography is underspecified for short vowels and consonant doubling (diacritics)

14 Syntax Pro-drop language –Akl AlbrtqAl '[he] ate the orange(s)' –hw Akl AlbrtqAl 'he ate the orange(s)' Relatively free word order –VSO, SVO, OVS, etc. –The canonical order is VSO; dialects are more SVO –In Arabic Treebank v3.2 we observe SVO (35%), VSO (35%), and pro-drop (30%) in roughly equal proportions Complex noun phrases expressing possession: 'idafa constructions –mlk AlArdn 'king_INDEF Jordan' = king of Jordan

15 Characteristics relevant for SRL Typical underspecification of short vowels masks morphological features such as case and agreement –Example: rjl Albyt Alkbyr man_masc the-house_masc the-big_masc "the big man of the house" or "the man of the big house"

16 Characteristics relevant for SRL Typical underspecification of short vowels masks morphological features such as case and agreement –Example: rjlu Albyti Alkbyri man_masc-Nom the-house_masc-Gen the-big_masc-Gen "the man of the big house"

17 Characteristics relevant for SRL Typical underspecification of short vowels masks morphological features such as case and agreement –Example: rjlu Albyti Alkbyru man_masc-Nom the-house_masc-Gen the-big_masc-Nom "the big man of the house"

18 Characteristics relevant for SRL Idafa constructions make indefinite nominals syntactically definite, hence allowing for agreement and therefore better scoping –Example: [rjlu Albyti] Alkbyru man_masc-Nom-Def the-house_masc-Gen the-big_masc-Nom-Def "the big man of the house"

19 Characteristics relevant for SRL Passive constructions are hard to detect due to underspecified short vowels marking the passivization inflection; the best automatic systems are at 68% accuracy –Example: qtl Emr bslAH qAtl… [He] pro-drop killed Amr by a deadly weapon… / Amr killed by a deadly weapon… / Amr was killed by a deadly weapon…

20 Characteristics relevant for SRL Passive constructions are hard to detect due to underspecified short vowels marking the passivization inflection. Hence: –Example: qatal Emra_ACC_ARG1 bslAHiK qAtliK… [He] pro-drop killed Amr_ACC_ARG1 by a deadly weapon… / Amr killed by a deadly weapon… / Amr was killed by a deadly weapon…

21 Characteristics relevant for SRL Passive constructions are hard to detect due to underspecified short vowels marking the passivization inflection. Hence: –Example: qatal Emru_NOM_ARG0 bslAHiK qAtliK… [He] pro-drop killed Amr by a deadly weapon… / Amr_NOM_ARG0 killed by a deadly weapon… / Amr was killed by a deadly weapon…

22 Characteristics relevant for SRL Passive constructions are hard to detect due to underspecified short vowels marking the passivization inflection. Hence: –Example: qutil Emru_NOM_ARG1 bslAHiK qAtliK… [He] pro-drop killed Amr by a deadly weapon… / Amr killed by a deadly weapon… / Amr_NOM_ARG1 was killed by a deadly weapon…

23 Characteristics relevant for SRL Passive constructions differ from English in that they cannot have an explicit non-instrument underlying subject; only ARG1 and ARG2 are allowed, not ARG0 –Example: qutil Emru bslAHiK qAtliK *qutl [Emru] ARG1 [bslmY] ARG0 *[Amr] ARG1 was killed [by SalmA] ARG0

24 Characteristics relevant for SRL Passive constructions differ from English in that they cannot have an explicit non-instrument underlying subject; only ARG1 and ARG2 are allowed, not ARG0 –Example: qutil [Emru] ARG1 [bslAHiK qAtliK] ARG2 [Amr] ARG1 was killed [by a deadly weapon] ARG2

25 Characteristics relevant for SRL Relatively free word order, combined with agreement patterns between subject and verb, could be helpful when explicit, yet confusing given the absence of case and passive markers and pro-drop –VSO = gender agreement only between V and S –SVO = gender and number agreement

26 Our Approach

27 Semantic Role Labeling Steps Given a sentence and an associated syntactic parse, an SRL system identifies the arguments of a given predicate. The arguments are identified in two steps: –Argument boundary detection –Argument role classification For the overall system, we apply a heuristic for argument-label conflict resolution (one label per argument)
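A minimal sketch of this two-step pipeline is given below. The span-based candidate representation and the classifier callables are illustrative assumptions; in the paper both steps are SVMs with tree or polynomial kernels, and the conflict-resolution heuristic shown here is only one possible choice.

```python
from typing import Callable, List, Tuple

# A candidate is a parse-tree constituent, represented here minimally as
# (label, (start, end)); the real system works on constituency-tree nodes.
Node = Tuple[str, Tuple[int, int]]

def label_arguments(
    candidates: List[Node],
    predicate_span: Tuple[int, int],
    is_argument: Callable[[Node, Tuple[int, int]], bool],   # stand-in for the boundary classifier
    classify_role: Callable[[Node, Tuple[int, int]], str],  # stand-in for the role classifier
) -> List[Tuple[Node, str]]:
    # Step 1: argument boundary detection over all candidate constituents.
    boundaries = [n for n in candidates if is_argument(n, predicate_span)]

    # Step 2: argument role classification for every detected boundary.
    labeled = [(n, classify_role(n, predicate_span)) for n in boundaries]

    # Heuristic conflict resolution: one label per argument -- drop any
    # candidate whose span overlaps an argument already kept.
    kept: List[Tuple[Node, str]] = []
    for node, role in labeled:
        _, (s, e) = node
        overlaps = any(not (e <= ks or s >= ke)
                       for (_, (ks, ke)), _role in kept)
        if not overlaps:
            kept.append((node, role))
    return kept
```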

28 The Sentence The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday

29 The Parse Tree

30 Boundary Identification

31 Role Classification

32 Our Approach Experiment with different kernels. Experiment with standard features (similar to English) and rich morphological features specific to Arabic

33 Different Kernels Polynomial kernels (degrees 1-6) with standard features. Tree kernels: K(t1, t2) = Σ_{n1 ∈ N_t1} Σ_{n2 ∈ N_t2} Δ(n1, n2), where N_t1 and N_t2 are the sets of nodes in t1 and t2, and Δ(·) evaluates the number of common substructures rooted in n1 and n2
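A sketch of how this kernel can be computed, using the standard Δ recursion over matching productions with a decay factor λ (the TreeNode class and the λ default are illustrative; the experiments use the SVM-Light-TK implementation rather than this code):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TreeNode:
    label: str
    children: List["TreeNode"] = field(default_factory=list)

    def production(self):
        # Grammar production rooted at this node, e.g. ("VP", ("V", "NP")).
        return (self.label, tuple(c.label for c in self.children))

    def is_preterminal(self):
        # A pre-terminal dominates exactly one leaf (the word).
        return len(self.children) == 1 and not self.children[0].children

def internal_nodes(t: TreeNode) -> List[TreeNode]:
    out, stack = [], [t]
    while stack:
        n = stack.pop()
        if n.children:          # leaves (words) do not root any substructure
            out.append(n)
            stack.extend(n.children)
    return out

def delta(n1: TreeNode, n2: TreeNode, lam: float = 1.0) -> float:
    # Number of common substructures rooted in n1 and n2.
    if n1.production() != n2.production():
        return 0.0
    if n1.is_preterminal():
        return lam
    prod = lam
    for c1, c2 in zip(n1.children, n2.children):
        prod *= 1.0 + delta(c1, c2, lam)
    return prod

def tree_kernel(t1: TreeNode, t2: TreeNode, lam: float = 1.0) -> float:
    # K(t1, t2) = sum of Delta(n1, n2) over all node pairs.
    return sum(delta(n1, n2, lam)
               for n1 in internal_nodes(t1)
               for n2 in internal_nodes(t2))
```

With λ = 1 this kernel simply counts the substructures shared by the two trees, which is the explicit feature space described a few slides below.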

34 Argument Structure Trees (AST) [figure: parse tree of "Paul delivers a talk in formal style", with the NP "a talk" marked as Arg. 1] An AST is defined as the minimal subtree encompassing the predicate and one of its arguments
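As an illustration of this definition, a small sketch that locates the root of such a minimal subtree as the lowest common ancestor of the predicate and argument nodes (the parent-pointer node class is an assumption for illustration; the actual AST is additionally pruned so that only material covered by the predicate and the argument remains):

```python
class ParseNode:
    # Minimal constituency node with a parent pointer, for illustration only.
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def ancestors(node):
    # Chain from the node itself up to the root.
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def ast_root(predicate_node, argument_node):
    # The AST is rooted at the lowest common ancestor of the predicate and
    # the argument; the subtree below it is then pruned to the two of them.
    argument_chain = {id(a) for a in ancestors(argument_node)}
    for a in ancestors(predicate_node):
        if id(a) in argument_chain:
            return a
    return None
```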

35 Tree Substructure Representations [figure: progressively smaller fragments of the subtree for "delivers a talk" (the full VP, versions with leaves and pre-terminals removed, down to the bare production VP → V NP), illustrating the substructures the tree kernel matches]

36 The overall set of AST substructures

37 Explicit feature space counts the number of common substructures

38 Standard Features Predicate: lemmatization of the predicate. Path: syntactic path linking the predicate and an argument, e.g. NN↑NP↑VP↓VBD. Partial Path: the Path feature limited to the branching of the argument. No-Direction Path: the Path feature without the traversal directions. Phrase type. First and last POS of the words in the argument. Verb subcategorization frame: the production expanding the predicate's parent node. Position of the argument relative to the predicate. Syntactic Frame: positions of the surrounding NPs relative to the predicate
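For instance, the Path feature can be read off from the chains of constituent labels connecting the argument and the predicate to the root. A sketch follows; the example label chains and the up/down arrow notation are assumptions about the slide's example, not taken from the paper's code.

```python
def path_feature(arg_chain, pred_chain):
    # arg_chain / pred_chain: constituent labels from the node up to the
    # root, node label first, e.g. ["NN", "NP", "VP", "S"] for the argument
    # and ["VBD", "VP", "S"] for the predicate (illustrative values).
    shared = 0
    while (shared < len(arg_chain) and shared < len(pred_chain)
           and arg_chain[len(arg_chain) - 1 - shared]
               == pred_chain[len(pred_chain) - 1 - shared]):
        shared += 1
    # Climb from the argument up to (and including) the lowest common ancestor...
    up = arg_chain[:len(arg_chain) - shared + 1]
    # ...then descend to the predicate.
    down = list(reversed(pred_chain[:len(pred_chain) - shared]))
    return "↑".join(up) + ("↓" + "↓".join(down) if down else "")

# path_feature(["NN", "NP", "VP", "S"], ["VBD", "VP", "S"]) -> "NN↑NP↑VP↓VBD"
```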

39 Extended Features for Arabic Definiteness, Number, Gender, Case, Mood, Person, Lemma (vocalized), English gloss, Unvocalized surface form, Vocalized surface form. We expanded the leaf nodes in the AST with these 10 attribute-value pairs, creating the EAST
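An illustrative sketch of how an AST pre-terminal could be expanded with these ten attribute-value pairs to obtain the EAST. Only the attribute names come from the slide; the bracketed-tree representation and the example values are assumptions.

```python
# The ten morphological/lexical attributes listed on the slide.
ARABIC_ATTRIBUTES = [
    "definiteness", "number", "gender", "case", "mood", "person",
    "lemma", "gloss", "unvocalized_form", "vocalized_form",
]

def extend_leaf(pos_tag, word, features):
    # Represent a tree node as a nested list [label, child1, child2, ...],
    # where plain strings are leaves.  Each attribute-value pair becomes an
    # extra pre-terminal child next to the surface word, so the tree kernel
    # can match on morphology as well as on syntax.
    attribute_children = [[attr, str(features.get(attr, "NONE"))]
                          for attr in ARABIC_ATTRIBUTES]
    return [pos_tag, word] + attribute_children

# e.g. extend_leaf("NN", "rjl", {"gender": "MASC", "case": "NOM", "gloss": "man"})
```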

40 Arabic AST Sample AST from our example ARG0

41 Arabic AST Sample AST from our example ARG0

42 Extended AST (EAST)

43 Experiments & Results

44 Experimental Set Up SemEval 2007 Task 18 data set, Pilot Arabic Propbank: 95 most frequent verbs in ATB3v2. Gold parses, unvowelized, Bies reduced POS tag set (25 tags). Num sentences: Dev (886), Test (902), Train (8402). 26 role types (5 numbered ARGs)

45 Experimental Set Up We experimented with only 350K examples. We use the SVM-Light-TK toolkit (Moschitti, 2004, 2006) with the SVM-light default parameters. Evaluation metrics of precision, recall, and F-measure are obtained using the CoNLL evaluator
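For reference, a minimal sketch of precision, recall, and F-measure over labeled argument spans. The reported scores come from the official CoNLL evaluator, not from this code, and the tuple layout used for a span here is an assumption.

```python
def precision_recall_f1(gold, predicted):
    # gold / predicted: sets of labeled argument spans, e.g. tuples of
    # (sentence_id, predicate, start, end, role).
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```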

46 Boundary Detection Results

47 Role Classification Results

48 Overall Results

49 Observations – BD AST and EAST don't differ much for boundary detection. AST+EAST+Poly(3) gives the best BD results. AST and EAST perform significantly better than Poly(1)

50 Observations – RC & SRL For classification, EAST is 2 absolute F-score points better than AST. AST is better than Poly(1), and EAST is better than both Poly(1) and AST, for both classification and the overall system. Poly(2) and Poly(3) are similar to EAST in classification. AST+EAST+the best Poly, Poly(3), yields the best classification results. The best results are for ARG0 and ARG1. ARG1 is harder than in English because of passive cases in Arabic

51 Conclusions Explicitly encoding the rich morphological features helps with SRL in Arabic. Tree kernels are indeed a feasible way of dealing with large feature spaces that are structural in nature. Combining kernels yields better results

52 Future Directions Experiment with richer POS tag sets. Experiment with automatic parses. Experiment with different syntactic formalisms. Integrate polynomial kernels with tree kernels. Experiment with better conflict resolution approaches

53 Thank You

54 The parse tree

