1
Semantic Role Labeling for Arabic using Kernel Methods
Mona Diab, Alessandro Moschitti, Daniele Pighin
2
What is SRL? Proposition: John opened the door
3
What is SRL? Proposition: [John]_Agent [opened]_Predicate [the door]_Theme
4
What is SRL? Proposition: [John]_Agent [opened]_Predicate [the door]_Theme (Agent = Subject, Theme = Object)
5
What is SRL? Proposition: [John]_Agent [opened]_Predicate [the door]_Theme (Agent = Subject, Theme = Object)
[The door]_Theme [opened]_Predicate
6
What is SRL? Proposition: [John]_Agent [opened]_Predicate [the door]_Theme
[The door]_Theme [opened]_Predicate (the Theme now surfaces as Subject)
7
What is SRL? Proposition: [John]_Agent [opened]_Predicate [the door]_Theme
FrameNet roles: Agent, Container_portal
[The door]_Theme [opened]_Predicate
8
What is SRL? Proposition: [John]_Agent [opened]_Predicate [the door]_Theme
PropBank roles: ARG0, ARG1
[The door]_Theme [opened]_Predicate
9
Why SRL?
Useful for information extraction
Useful for question answering
Useful for machine translation?
10
Our Goal
Gloss: Last Sunday India to official visit Rongji Zhu the-Chinese the-ministers president started
Translation: The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday
11
Our Goal
Gloss: Last Sunday India to official visit Rongji Zhu the-Chinese the-ministers president started
Translation: The Chinese Prime Minister Zhu Rongji started an official visit to India [last Sunday]_ARGM-TMP
12
Roadmap
Arabic Characteristics
Our Approach
Experiments & Results
Conclusions & Future Directions
13
Morphology
Rich, complex morphology: templatic, concatenative, derivational, inflectional
– Example: wbHsnAthm = w+b+Hsn+At+hm 'and+by+virtue(s)+their'
– Verbs are marked for tense, person, gender, aspect, mood, voice
– Nominals are marked for case, number, gender, definiteness
Orthography is underspecified for short vowels and consonant doubling (diacritics)
14
Syntax
Pro-drop language
– Akl AlbrtqAl '[he] ate the orange(s)'
– hw Akl AlbrtqAl 'he ate the orange(s)'
Relatively free word order: VSO, SVO, OVS, etc.
– The canonical order is VSO; the dialects are more SVO
– In Arabic Treebank v3.2 we observe an equal distribution of SVO (35%) and VSO (35%), with pro-drop (30%)
Complex noun phrases expressing possession: 'idafa constructions
– mlk AlArdn 'king_INDEF Jordan' = king of Jordan
15
Characteristics relevant for SRL
Typical underspecification of short vowels masks morphological features such as case and agreement
– Example: rjl Albyt Alkbyr
Man_masc the-house_masc the-big_masc
'the big man of the house' or 'the man of the big house'
16
Characteristics relevant for SRL
Typical underspecification of short vowels masks morphological features such as case and agreement
– Example: rjlu Albyti Alkbyri
Man_masc-Nom the-house_masc-Gen the-big_masc-Gen
'the man of the big house'
17
Characteristics relevant for SRL
Typical underspecification of short vowels masks morphological features such as case and agreement
– Example: rjlu Albyti Alkbyru
Man_masc-Nom the-house_masc-Gen the-big_masc-Nom
'the big man of the house'
18
Characteristics relevant for SRL
Idafa constructions make indefinite nominals syntactically definite, hence allowing for agreement and therefore better scoping
– Example: [rjlu Albyti] Alkbyru
Man_masc-Nom-Def the-house_masc-Gen the-big_masc-Nom-Def
'the big man of the house'
19
Characteristics relevant for SRL
Passive constructions are hard to detect because the passive inflection is marked by underspecified short vowels; the best automatic systems are at 68% accuracy
– Example: qtl Emr bslAH qAtl…
[He]_pro-drop killed Amr by a deadly weapon…
Amr killed by a deadly weapon…
Amr was killed by a deadly weapon…
20
Characteristics relevant for SRL
Passive constructions are hard to detect due to underspecified short vowels marking the passive inflection. Hence:
– Example: qatal Emra_ACC_ARG1 bslAHiK qAtliK…
[He]_pro-drop killed Amr_ACC_ARG1 by a deadly weapon…
Amr killed by a deadly weapon…
Amr was killed by a deadly weapon…
21
Characteristics relevant for SRL
Passive constructions are hard to detect due to underspecified short vowels marking the passive inflection. Hence:
– Example: qatal Emru_NOM_ARG0 bslAHiK qAtliK…
[He]_pro-drop killed Amr by a deadly weapon…
Amr_NOM_ARG0 killed by a deadly weapon…
Amr was killed by a deadly weapon…
22
Characteristics relevant for SRL
Passive constructions are hard to detect due to underspecified short vowels marking the passive inflection. Hence:
– Example: qutil Emru_NOM_ARG1 bslAHiK qAtliK…
[He]_pro-drop killed Amr by a deadly weapon…
Amr killed by a deadly weapon…
Amr_NOM_ARG1 was killed by a deadly weapon…
23
Characteristics relevant for SRL
Passive constructions differ from English in that they cannot have an explicit non-instrument underlying subject: only ARG1 and ARG2 are possible; ARG0 is not allowed
– Example: qutil Emru bslAHiK qAtliK
*qutl [Emru]_ARG1 [bslmY]_ARG0
*[Amr]_ARG1 was killed [by SalmA]_ARG0
24
Characteristics relevant for SRL
Passive constructions differ from English in that they cannot have an explicit non-instrument underlying subject: only ARG1 and ARG2 are possible; ARG0 is not allowed
– Example: qutil [Emru]_ARG1 [bslAHiK qAtliK]_ARG2
[Amr]_ARG1 was killed [by a deadly weapon]_ARG2
25
Characteristics relevant for SRL
Relatively free word order combined with subject-verb agreement patterns can be helpful when agreement is explicit, but is confusing given the absence of case and passive marking and given pro-drop
– VSO: gender agreement only between V and S
– SVO: gender and number agreement
26
Our Approach
27
Semantic Role Labeling Steps
Given a sentence and an associated syntactic parse, an SRL system identifies the arguments of a given predicate
The arguments are identified in two steps:
– Argument boundary detection
– Argument role classification
For the overall system we apply a heuristic for argument label conflict resolution: one label per argument (sketched below)
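A minimal sketch of this two-step pipeline; boundary_clf and role_clf are hypothetical classifier objects used for illustration, not the paper's actual implementation:

```python
# Minimal sketch of the two-step SRL pipeline; boundary_clf and role_clf
# are assumed, illustrative classifiers.
def label_arguments(parse_nodes, predicate, boundary_clf, role_clf):
    # Step 1: argument boundary detection over all parse-tree nodes
    candidates = [n for n in parse_nodes
                  if boundary_clf.is_argument(n, predicate)]
    # Step 2: argument role classification for each detected boundary
    labeled = []
    for node in candidates:
        scores = role_clf.score_roles(node, predicate)  # e.g. {"ARG0": 0.9, ...}
        # conflict-resolution heuristic: one label per argument,
        # keep only the highest-scoring role
        labeled.append((node, max(scores, key=scores.get)))
    return labeled
```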
28
The Sentence: The Chinese Prime Minister Zhu Rongji started an official visit to India last Sunday
29
The Parse Tree
30
Boundary Identification
31
Role Classification
32
Our Approach
Experiment with different kernels
Experiment with standard features (similar to English) and with rich morphological features specific to Arabic
33
Different Kernels
Polynomial kernels (degrees 1-6) with standard features
Tree kernels:
$K(t_1, t_2) = \sum_{n_1 \in N_{t_1}} \sum_{n_2 \in N_{t_2}} \Delta(n_1, n_2)$
where $N_{t_1}$ and $N_{t_2}$ are the sets of nodes in $t_1$ and $t_2$, and $\Delta(\cdot)$ evaluates the number of common substructures rooted in $n_1$ and $n_2$
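A minimal sketch of this kernel computation in the style of a convolution tree kernel; the Node class and the decay factor lam are illustrative assumptions, not the authors' code:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str                                    # non-terminal, POS tag, or word
    children: List["Node"] = field(default_factory=list)

def production(n):
    # the grammar rule expanding n, e.g. ("NP", ("D", "N"))
    return (n.label, tuple(c.label for c in n.children))

def interior_nodes(t):
    if t.children:
        yield t
        for c in t.children:
            yield from interior_nodes(c)

def delta(n1, n2, lam=0.4):
    """Weighted count of common substructures rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if all(not c.children for c in n1.children):  # pre-terminal, same word
        return lam
    prod = lam
    for c1, c2 in zip(n1.children, n2.children):  # same production => same arity
        prod *= 1.0 + delta(c1, c2, lam)
    return prod

def tree_kernel(t1, t2, lam=0.4):
    """K(t1, t2): sum of delta over all pairs of nodes from the two trees."""
    return sum(delta(n1, n2, lam)
               for n1 in interior_nodes(t1) for n2 in interior_nodes(t2))
```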
34
Argument Structure Trees (AST)
[Parse-tree figure: "Paul delivers a talk in formal style", with the Arg. 1 subtree highlighted]
Defined as the minimal subtree encompassing the predicate and one of its arguments
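A minimal sketch of locating an AST root, assuming parse-tree nodes carry a .parent pointer (an assumption for illustration; pruning the branches outside the predicate-argument pair is omitted):

```python
# Sketch: the root of the minimal subtree spanning predicate and argument
# is their lowest common ancestor. Nodes are assumed to have a .parent field.
def ancestors(node):
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain                        # node, ..., root

def ast_root(pred_node, arg_node):
    pred_ids = {id(n) for n in ancestors(pred_node)}
    for n in ancestors(arg_node):       # first shared ancestor = lowest one
        if id(n) in pred_ids:
            return n
    return None                         # nodes from different trees
```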
35
Tree Substructure Representations
[Figure: fragments of the "delivers a talk" AST, obtained by progressively removing lexical items and nodes, down to single productions such as VP -> V NP]
36
The overall set of AST substructures
37
The explicit feature space counts the number of common substructures between two trees
38
Standard Features
Predicate: lemma of the predicate
Path: syntactic path linking the predicate and an argument, e.g. NN↑NP↑VP↓VBD
Partial Path: the Path feature limited to the branching of the argument
No-Direction Path: the Path without the traversal directions
Phrase type
Last and first POS of the words in the argument
Verb subcategorization frame: the production expanding the predicate's parent node
Position of the argument relative to the predicate
Syntactic Frame: positions of the surrounding NPs relative to the predicate
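A minimal sketch of the Path feature, reusing the ancestors helper from the AST sketch above and assuming nodes expose .label and .parent; the ^ / _ markers for upward and downward steps are illustrative notation:

```python
def path_feature(arg_node, pred_node):
    arg_anc = ancestors(arg_node)                     # argument up to root
    pred_anc = ancestors(pred_node)                   # predicate up to root
    pred_ids = [id(p) for p in pred_anc]
    i = next(k for k, a in enumerate(arg_anc) if id(a) in pred_ids)
    j = pred_ids.index(id(arg_anc[i]))                # lowest common ancestor
    up = [a.label for a in arg_anc[:i + 1]]           # argument ... LCA
    down = [p.label for p in reversed(pred_anc[:j])]  # below LCA ... predicate
    return "^".join(up) + ("_" + "_".join(down) if down else "")
```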
39
Extended Features for Arabic
Definiteness, number, gender, case, mood, person, lemma (vocalized), English gloss, unvocalized surface form, vocalized surface form
Expanding the leaf nodes of the AST with these 10 attribute-value pairs creates the Extended AST (EAST), as sketched below
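A minimal sketch of the EAST leaf expansion, reusing the Node class and interior_nodes helper from the tree-kernel sketch; the attribute names follow the list above, but their exact spelling and the morphological lookup are assumptions:

```python
# Expand each AST leaf (token) with its 10 morphological attribute-value
# pairs, e.g. turning the leaf "rjl" into a node with children such as
# "gender:masc", "case:nom", ... (attribute spelling is illustrative).
FEATURES = ["definiteness", "number", "gender", "case", "mood",
            "person", "lemma", "gloss", "unvocalized", "vocalized"]

def extend_leaves(tree, morph_lookup):
    for node in list(interior_nodes(tree)):
        for leaf in [c for c in node.children if not c.children]:
            morph = morph_lookup(leaf.label)   # dict with the 10 features
            leaf.children = [Node(f"{f}:{morph[f]}") for f in FEATURES]
    return tree
```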
40
Arabic AST: a sample AST from our example (ARG0)
42
Extended AST (EAST)
[Figure: the sample AST with each leaf expanded into its morphological attribute-value pairs]
43
Experiments & Results
44
Experimental Set Up
SemEval 2007 Task 18 data set (Pilot Arabic PropBank)
95 most frequent verbs in ATB3v2
Gold parses, unvowelized, Bies reduced POS tag set (25 tags)
Number of sentences: Dev (886), Test (902), Train (8,402)
26 role types (5 numbered ARGs)
45
Experimental Set Up
We experimented with only 350K training examples
We use the SVM-Light-TK toolkit (Moschitti, 2004, 2006) with the SVM-Light default parameters
Precision, recall, and F-measure are computed with the CoNLL evaluator (see the sketch below)
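For reference, a minimal sketch of how the reported precision, recall, and F-measure relate to argument counts (the CoNLL evaluator's own bookkeeping is more involved):

```python
def prf(correct, predicted, gold):
    """Precision, recall, F1 from counts of correct / predicted / gold args."""
    p = correct / predicted if predicted else 0.0
    r = correct / gold if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```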
46
Boundary Detection Results
47
Role Classification Results
48
Overall Results
49
Observations - Boundary Detection (BD)
AST and EAST don't differ much for boundary detection
AST+EAST+Poly(3) gives the best BD results
AST and EAST perform significantly better than Poly(1)
50
Observations – RC & SRL For classification, EAST is 2 absolute f-score points better than AST AST is better than Poly(1) and EAST is better than Poly(1) and AST for both classification and overall system Poly 2 and 3 are similar to EAST in classification AST+EAST+best Poly, Poly(3), yields best classification results Best results yielded are for ARG0 and ARG1 ARG1 because of passive cases in Arabic is harder than in English
51
Conclusions
Explicitly encoding the rich morphological features helps with SRL in Arabic
Tree kernels are indeed a feasible way of dealing with large feature spaces that are structural in nature
Combining kernels yields better results
52
Future Directions
Experiment with richer POS tag sets
Experiment with automatic parses
Experiment with different syntactic formalisms
Integrate polynomial kernels with tree kernels
Experiment with better conflict-resolution approaches
53
Thank You
54
The parse tree