Two-Stage Constraint-Based Hindi Parser
LTRC, IIIT Hyderabad
Brief Recap
  Broad-coverage parser
  Dependency
  Paninian framework
    vibhakti-karaka correspondence
    karaka frames (basic + transformation)
  Source groups, demand groups
  Constraints
    Three basic constraints
    Constraints as integer programming equations
Parser
  Two-stage strategy
  Appropriate constraints formed
  Stage I (intra-clausal relations)
    Dependency relations marked
    Relations such as k1, k2, k3, etc. for each verb
  Stage II (inter-clausal relations and conjunct relations)
    Conjuncts, relative clauses, kriya mula, etc.
  In certain cases the split separates syntax from semantics (e.g. kriya mula); in others it reduces the complexity.
Steps in Parsing
  SENTENCE
  -> Morph, POS tagging, Chunking
  -> Identify Demand Groups
  -> Load Frames & Transform
  -> Find Candidates
  -> Apply Constraints & Solve
  -> Is Complex?
       NO: Final Parse
       YES: STAGE - II
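The control flow of the steps above can be sketched as follows. This is only an illustration of the Stage I / Stage II hand-over: `parse_sentence`, the `toy_*` stand-ins, and the `Parse` record are hypothetical names, and the "Is Complex" test used here (presence of Ora or jo) is an assumption, not the parser's actual criterion.

```python
from typing import Callable, Dict, List

Parse = Dict[str, object]  # hypothetical parse record, not the parser's real format


def parse_sentence(
    sentence: str,
    preprocess: Callable[[str], List[str]],
    stage_one: Callable[[List[str]], Parse],
    is_complex: Callable[[Parse], bool],
    stage_two: Callable[[Parse], Parse],
) -> Parse:
    """Run Stage I and hand over to Stage II only for complex sentences."""
    chunks = preprocess(sentence)        # morph, POS tagging, chunking
    intermediate = stage_one(chunks)     # demand groups, frames, candidates, solve
    if is_complex(intermediate):         # the "Is Complex" decision
        return stage_two(intermediate)   # YES branch
    return intermediate                  # NO branch: Stage I output is final


# Toy stand-ins (assumptions, for illustration only).
def toy_preprocess(sentence: str) -> List[str]:
    return sentence.split()


def toy_stage_one(chunks: List[str]) -> Parse:
    return {"chunks": chunks, "stage": 1}


def toy_is_complex(parse: Parse) -> bool:
    # Assumed test: a conjunct (Ora) or a relative pronoun (jo) is present.
    return any(word in ("Ora", "jo") for word in parse["chunks"])


def toy_stage_two(parse: Parse) -> Parse:
    return {**parse, "stage": 2}


def run(sentence: str) -> Parse:
    return parse_sentence(sentence, toy_preprocess, toy_stage_one,
                          toy_is_complex, toy_stage_two)
```

Passing the stages in as callables keeps the sketch independent of how each stage is really implemented.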
Stage I: Types being handled
  Simple sentences (finite verbs)
  Clausal arguments
  Non-finite verbs
    wA_huA, wA_hI, nA, kara, 0_rahe, etc.
  Copula
  Genitive
Stage II
  Handles:
    Conjuncts (subordinating and coordinating)
    Relative clauses
    Complex predicates
  Basic constraints similar to Stage I
  Some additional constraints
  New demand groups
  New candidates
Steps (Stage II)
  [Flowchart. Boxes: Output of STAGE - I; Identify New Demand Groups; Load Frames & Transform; Find Candidates; Apply Constraints & Solve; Repair; FINAL PARSE.]
Example – Relative Clause
  vaha puswaka jo    rAma ne   mohana ko   xI   hE prasixXa hE
  that book    which Ram  ERG. Mohana DAT. gave is famous   is
  ‘The book which Ram gave to Mohana is famous’
Output after Stage I
  [Dependency tree figure. Nodes: _ROOT_, xI, puswaka, mohana, rAma, jo, hE, prasixXa, vaha. Arc labels: k2, k4, k1, k1, k1s, main.]
Identify the demand group
  xiyA ‘give’: main verb of the relative clause
Identify the demand group, Load and Transform DF
  jo ‘which’ transformation (special)
  Transforms the demand frame of the main verb of the relative clause

  arc-label   necessity  vibhakti  lextype  src-pos  arc-dir  oprt
  nmod__relc  m          any       n        r|l      p        insert
Karaka Frame
  vaha puswaka jo    rAma ne   mohana ko   xI   prasixXa hE |
  that book    which Ram  ERG. Mohana DAT. gave famous   is
  ‘The book which Ram gave to Mohana is famous’
  (xI is marked as the main verb of the relative clause)
  Transformed frame for xe after applying the jo transformation; the row below is the new row inserted by the transformation:

  arc-label   necessity  vibhakti  lextype  src-pos  arc-dir  oprt
  nmod__relc  m          any       n        r|l      p        insert
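A minimal sketch of the jo transformation as a data operation, assuming a demand frame is simply a list of rows with the seven columns shown above. `FrameRow`, `apply_transformation`, and `BASE_FRAME_XE` are hypothetical names; the base rows for xe are placeholders invented for illustration (the source does not list them), and only the `nmod__relc` row is taken from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FrameRow:
    arc_label: str
    necessity: str
    vibhakti: str
    lextype: str
    src_pos: str
    arc_dir: str
    oprt: str = ""   # operation column; empty for rows of a basic frame


# The row carried by the jo transformation, copied from the slide.
JO_ROW = FrameRow("nmod__relc", "m", "any", "n", "r|l", "p", "insert")

# Placeholder basic frame for xe 'give' (values are assumptions).
BASE_FRAME_XE = [
    FrameRow("k1", "m", "ne", "n", "l", "c"),
    FrameRow("k2", "m", "0", "n", "l", "c"),
    FrameRow("k4", "m", "ko", "n", "l", "c"),
]


def apply_transformation(frame: List[FrameRow],
                         rows: List[FrameRow]) -> List[FrameRow]:
    """Return a new frame; only the 'insert' operation is sketched here."""
    result = list(frame)          # leave the basic frame untouched
    for row in rows:
        if row.oprt == "insert":
            result.append(row)    # new row added to the demand frame
        else:
            raise ValueError(f"unsupported operation: {row.oprt!r}")
    return result


TRANSFORMED_XE = apply_transformation(BASE_FRAME_XE, [JO_ROW])
```

After the call, the verb of the relative clause carries one extra mandatory demand, `nmod__relc`, in addition to its own karaka rows.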
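Possible candidates
  vaha puswaka jo rAma ne mohana ko xI hE prasixXa hE |
  [Figure: candidate arcs labelled nmod__relc drawn over the sentence.]
Output after Stage II
  [Dependency tree figure. Nodes: _ROOT_, xiyA hE, vaha, puswaka, mohana, rAma, jo, hE, prasixXa. Arc labels: k2, k4, k1, k1, k1s, nmod__relc, main.]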
Example II – Coordination
  rAma Ora siwA kala      Aye |
  Ram  and Sita yesterday came
  ‘Ram and Sita came yesterday’
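Output of Stage I
  [Dependency tree figure. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k1, k7t, dummy, main.]
For Stage II (Constraint Graph)
  [Graph figure. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k1, main, k7t, ccof.]
Candidate Arcs
  [Graph figure. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: main, k1, k1, ccof.]
Solution Graph
  [Graph figure. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k7t, main, k1, ccof.]
Parse tree (output after Stage II)
  [Dependency tree figure. Nodes: _ROOT_, Aye, kala, Ora, rAma, siwA. Arc labels: main, k7t, k1, ccof.]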
Finite Verb Coordination
  rAma Gara gayA Ora vaha so    gayA |
  Ram  home went and he   sleep went
  ‘Ram went home and slept’
  Output after Stage I
  [Dependency tree figure. Nodes: _ROOT_, rAma, Gara, gayA, Ora, vaha, so. Arc labels: k1, k2, k1, dummy, main.]
Karaka Frame - Ora Finite
  [Frame figure. Frame: Ora, v_fin; arc label ccof. Instance: Ora, so, gayA.]
Finite Verb Coordination (Parse Tree)
  Output after Stage II
  [Dependency tree figure. Nodes: _ROOT_, rAma, Gara, gayA, Ora, vaha, so. Arc labels: k1, k2, k1, main, ccof.]
Relative Clause Coordination
  rAma ne vaha puswaka KarIxI jo prasixXa hE Ora jo saswI hE
  ‘Ram purchased the book which is famous and which is cheap’
  Output after Stage I
  [Dependency tree figure. Nodes: _ROOT_, KarIxI, puswaka, rAma, jo, hE, prasixXa, Ora, jo, hE, saswI. Arc labels: k2, k1, k1, k1s, main, k1, k1s, main, dummy.]
Karaka Frame - Ora Relative Clause
  [Frame figure. Frame: Ora, n, v_rel; arc labels ccof, nmod__relc. Instance: Ora, puswaka, hE.]
Relative Clause Coordination (Parse Tree)
  Output after Stage II
  [Dependency tree figure. Nodes: _ROOT_, KarIxI, puswaka, rAma, jo, hE, prasixXa, Ora, jo, hE, saswI. Arc labels: k2, k1, k1, k1s, main, k1, k1s, ccof, nmod__relc.]
Non-Finite Verb Coordination
  rAma Kelakara      Ora KAnA KAkara       so    gayA
  Ram  having played and food having eaten sleep went
  Output after Stage I
  [Dependency tree figure. Nodes: _ROOT_, so, Ora, rAma, Kelakara, KAnA, KAkara. Arc labels: k1, vmod, main, dummy, k2, vmod.]
Karaka Frame - Ora Non-Finite
  [Frame figure. Frame: Ora, v_fin, v_nfin; arc label ccof. Instance: Ora, so, Kelakara, KAkara.]
Non-Finite Verb Coordination (Parse Tree)
  Output after Stage II
  [Dependency tree figure. Nodes: _ROOT_, so, Ora, rAma, Kelakara, KAnA, KAkara. Arc labels: k1, vmod, main, k2, ccof.]
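Nominal Coordination
  rAma Ora siwA kala      Aye |
  Ram  and Sita yesterday came
  ‘Ram and Sita came yesterday’
  Output after Stage I
  [Dependency tree figure. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k1, k7t, dummy, main.]
Karaka Frame - Ora Nominal
  [Frame figure. Frame: Ora, n, n; arc label ccof. Instance: Ora, rAma, siwA.]
Nominal Coordination (Parse Tree)
  Output after Stage II
  [Dependency tree figure. Nodes: _ROOT_, Aye, kala, Ora, rAma, siwA. Arc labels: main, k7t, k1, ccof.]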
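The four Ora frames above (finite, relative clause, non-finite, nominal) differ in the lexical type of the conjuncts that Ora takes through ccof. One way to picture frame selection is a lookup keyed on that type. This is a sketch under two assumptions that the slides do not state explicitly: that each frame is identified by the lextype of its ccof children, and that all conjuncts of one Ora share a lextype. `ORA_FRAMES` and `select_ora_frame` are hypothetical names.

```python
from typing import Dict, List, Optional

# Lextype of the ccof children -> name of the Ora frame (as titled on the slides).
ORA_FRAMES: Dict[str, str] = {
    "v_fin": "Finite",
    "v_rel": "Relative Clause",
    "v_nfin": "Non-Finite",
    "n": "Nominal",
}


def select_ora_frame(conjunct_lextypes: List[str]) -> Optional[str]:
    """Pick an Ora frame for the candidate conjuncts, or None if none fits."""
    kinds = set(conjunct_lextypes)
    # Assumption: a conjunct needs at least two children of the same lextype.
    if len(conjunct_lextypes) < 2 or len(kinds) != 1:
        return None
    return ORA_FRAMES.get(next(iter(kinds)))
```

For rAma Ora siwA, both candidate children are nouns, so `select_ora_frame(["n", "n"])` selects the nominal frame; a noun paired with a finite verb yields no frame under these assumptions.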
Example
  rAma Ora siwA kala Aye |
  [Dependency tree figure (Stage I output). Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k1, k7t, dummy, main.]
Steps (Stage II)
  [Flowchart. Boxes: Output of STAGE - I; Identify Nodes; Repair; Identify New Demand Groups; Load Frames & Transform; Find Candidates; Apply Constraints & Solve; FINAL PARSE.]
Constraint Graph Nodes (Stage II)
  Selected from the intermediate parse tree (Stage I)
  Set-I (demand nodes)
    1. Conjuncts
    2. Nearest verbal ancestor of ‘jo’ (usually just the parent)
    3. _ROOT_
    4. Children of _ROOT_ other than (1) and (2)
    5. Other nodes which are added due to nodes in Set-II
Constraint Graph Nodes (Stage II)
  Set-II (source nodes)
    1. Possible children and parents of conjuncts
    2. Possible heads of the relative clause
  Identification of nodes in Set-II will generally trigger the repair.
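Rules 1 to 4 for Set-I can be sketched as a small selection function over the Stage I tree. The tree encoding (a child-to-parent map plus a category map), the category names, and the function names are assumptions made for illustration. The toy tree is a hypothetical Stage I analysis of the relative clause example, not a transcription of the slide's figure. Rule 5 is omitted because it depends on Set-II.

```python
from typing import Dict, Optional, Set

ROOT = "_ROOT_"


def nearest_verbal_ancestor(node: str, parent: Dict[str, str],
                            cat: Dict[str, str]) -> Optional[str]:
    """Walk up from node and return the first ancestor whose category is verbal."""
    current = parent.get(node)
    while current is not None and current != ROOT:
        if cat.get(current, "").startswith("v"):
            return current
        current = parent.get(current)
    return None


def select_set_one(parent: Dict[str, str], cat: Dict[str, str],
                   relative_pronouns: Set[str]) -> Dict[str, Set[str]]:
    """Apply rules 1-4 for Set-I (demand nodes)."""
    conjuncts = {n for n, c in cat.items() if c == "conj"}          # rule 1
    jo_verbs: Set[str] = set()
    for jo in relative_pronouns:                                    # rule 2
        verb = nearest_verbal_ancestor(jo, parent, cat)
        if verb is not None:
            jo_verbs.add(verb)
    root_children = {n for n, p in parent.items() if p == ROOT}
    return {
        "conjuncts": conjuncts,
        "jo_verbs": jo_verbs,
        "root": {ROOT},                                             # rule 3
        "root_children": root_children - conjuncts - jo_verbs,      # rule 4
    }


# Hypothetical Stage I tree (child -> parent) and categories.
PARENT = {"hE": ROOT, "xI": ROOT, "puswaka": "hE", "prasixXa": "hE",
          "jo": "xI", "rAma": "xI", "mohana": "xI"}
CAT = {"hE": "v", "xI": "v", "puswaka": "n", "prasixXa": "adj",
       "jo": "n", "rAma": "n", "mohana": "n"}

SET_ONE = select_set_one(PARENT, CAT, {"jo"})
```

In this toy tree xI is both a child of _ROOT_ and the verbal ancestor of jo, so it is reported under rule 2 only, which mirrors the "other than (1) and (2)" wording of rule 4.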
Identify the demand group
  Ora, Aye
General Principles: Repair/Revision
  1. If a node becomes a potential child in Stage II, its arc to its existing parent is open to revision.
     Example: rAma Ora siwA kala Aye
       Node 4 becomes a potential child (of node 1)
       Its parent (node 2) is open to revision
General Principles: Repair/Revision after the Stage I parse
  2. Any node which becomes a potential parent must be re-examined.
     Example: rAma Ora siwA kala Aye
       Node 2 becomes a potential parent (of node 1)
       Its child (node 4) is open to revision
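The two principles can be read as a bookkeeping step over the Stage I tree and the Stage II candidate arcs. The sketch below is one conservative reading: principle 1 opens the existing arc of every potential child, and principle 2 only flags potential parents that already have children, without reopening all of their arcs. The Stage I attachments and the candidate list are assumed for illustration (the figure's exact arcs are not recoverable from the text), and `repair_targets` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str]  # (parent, child)


def repair_targets(stage_one_parent: Dict[str, str],
                   candidates: List[Arc]) -> Tuple[Set[Arc], Set[str]]:
    """Return (Stage I arcs open to revision, nodes to re-examine)."""
    open_arcs: Set[Arc] = set()
    reexamine: Set[str] = set()
    existing_parents = set(stage_one_parent.values())
    for new_parent, child in candidates:
        old_parent = stage_one_parent.get(child)
        if old_parent is not None:
            open_arcs.add((old_parent, child))   # principle 1
        if new_parent in existing_parents:
            reexamine.add(new_parent)            # principle 2
    return open_arcs, reexamine


# Assumed Stage I output (child -> parent) for: rAma Ora siwA kala Aye
STAGE_ONE = {"Aye": "_ROOT_", "Ora": "_ROOT_", "rAma": "_ROOT_",
             "siwA": "Aye", "kala": "Aye"}
# Assumed Stage II candidate arcs (parent, child).
CANDIDATES = [("Ora", "rAma"), ("Ora", "siwA"), ("Aye", "Ora"), ("_ROOT_", "Aye")]

OPEN_ARCS, REEXAMINE = repair_targets(STAGE_ONE, CANDIDATES)
```

Under these assumptions the arc from Aye to kala is never opened, since kala is not a potential child of any Stage II demand node.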
Algorithm
  1. Identify the nodes of the constraint graph
       From Set-I, and
       From Set-II
  2. Remove all outgoing edges from _ROOT_.
  3. Find possible candidates for the demand nodes present in Set-I from Set-II
       Parent candidate for finite verb
       Parent and children for conjuncts
       Children of _ROOT_
  4. Convert the resulting constraint graph into an integer programming (IP) problem.
  5. Solve the IP equations to get the possible solution parse.
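The last two steps can be illustrated on the coordination example with one 0/1 variable per candidate arc. The sketch assumes constraints of the usual form for this kind of parser: each demanded label of a demand node is filled an exact number of times, and each source node receives exactly one incoming arc. The candidate arcs, the demand counts (including two ccof children for Ora), and the function name `solve` are assumptions. The search is plain enumeration standing in for an IP solver, and it does not check for cycles.

```python
from itertools import product
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (parent, label, child)

# Assumed candidate arcs for: rAma Ora siwA kala Aye (kala stays under Aye).
CANDIDATE_ARCS: List[Arc] = [
    ("_ROOT_", "main", "Aye"),
    ("Aye", "k1", "rAma"),
    ("Aye", "k1", "siwA"),
    ("Aye", "k1", "Ora"),
    ("Ora", "ccof", "rAma"),
    ("Ora", "ccof", "siwA"),
]
# Assumed demands: (demand node, label) -> number of arcs required.
DEMANDS: Dict[Tuple[str, str], int] = {
    ("_ROOT_", "main"): 1,
    ("Aye", "k1"): 1,
    ("Ora", "ccof"): 2,
}
SOURCES = ["Aye", "Ora", "rAma", "siwA"]


def solve(arcs: List[Arc], demands: Dict[Tuple[str, str], int],
          sources: List[str]) -> List[List[Arc]]:
    """Enumerate all 0/1 assignments and keep those satisfying every equation."""
    solutions = []
    for bits in product((0, 1), repeat=len(arcs)):
        chosen = [arc for arc, bit in zip(arcs, bits) if bit]
        demands_met = all(
            sum(1 for p, l, _ in chosen if (p, l) == key) == count
            for key, count in demands.items()
        )
        one_parent_each = all(
            sum(1 for _, _, c in chosen if c == s) == 1 for s in sources
        )
        if demands_met and one_parent_each:
            solutions.append(chosen)
    return solutions


SOLUTIONS = solve(CANDIDATE_ARCS, DEMANDS, SOURCES)
```

With these inputs a single assignment survives: Aye under _ROOT_ (main), Ora under Aye (k1), and rAma and siwA under Ora (ccof). Enumeration is exponential in the number of arcs, so it is suitable only for a toy graph like this one.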
An example
  raama aura  sitaa  kala        aaye
  ‘Ram’ ‘and’ ‘Sita’ ‘yesterday’ ‘came’
  ‘Ram and Sita came yesterday’
  Output after Stage I
  [Dependency tree figure. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k1, k7t, dummy, main.]
Identify Nodes
  [Two panels, "Set-I nodes" and "Set-I and Set-II", each showing the Stage I tree. Nodes: _ROOT_, rAma, Ora, siwA, kala, Aye. Arc labels: k1, k7t, dummy, main.]
Constraint Graph
  New constraint graph
  Ora, Aye and _ROOT_ are the demand groups
  Note: ‘kala’ remains attached to its parent ‘aaye’ (it does not show up in Stage II)
  [Graph figure. Nodes: _ROOT_, rAma, Ora, siwA, Aye. Arc labels: ccof, k1, main.]
Example: Final Parse
  [Dependency tree figure. Nodes: _ROOT_, Aye, kala, Ora, rAma, siwA. Arc labels: main, k7t, k1, ccof.]
Types of complex sentences
  Relative clauses
    Initial
    Final
    Medial
  Conjuncts (coordination)
    Simple clause
    Relative clause
    Non-finite
    Nominal, adjectival, adverbial
Some other examples:
  rAma ne vaha puswaka KarIxI jo saswI hE Ora jo bAjZAra meM prasixXa hE |
  samIra Ora aBay ne vaha puswaka KarIxI jo saswI hE Ora jo bAjZAra meM prasixXa hE |
  rAma Ora mohana ke xoswa kI baccI Aye |
    Ambiguous: only baccI came, or both rAma and baccI came
    Resolved using the ‘gnp’ (gender, number, person) of the main verb: Aye vs. AI
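The gnp check for the last example can be sketched as a feature comparison between the verb and the subject of each reading. The feature values are assumptions made for illustration: Aye is treated as masculine plural, AI as feminine singular, and a mixed-gender coordinated subject as masculine plural. `GNP`, `VERB_GNP`, `READINGS`, and `compatible_readings` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class GNP(NamedTuple):
    gender: str
    number: str
    person: str


# Assumed agreement features of the two verb forms.
VERB_GNP: Dict[str, GNP] = {
    "Aye": GNP("m", "pl", "3"),
    "AI": GNP("f", "sg", "3"),
}

# Assumed subject features under each reading of
# "rAma Ora mohana ke xoswa kI baccI ...".
READINGS: Dict[str, GNP] = {
    "only baccI came": GNP("f", "sg", "3"),
    "both rAma and baccI came": GNP("m", "pl", "3"),
}


def compatible_readings(verb_form: str) -> List[str]:
    """Keep the readings whose subject agrees with the verb in gnp."""
    verb = VERB_GNP[verb_form]
    return [reading for reading, subject in READINGS.items() if subject == verb]
```

With Aye only the coordinated-subject reading agrees; with AI only the reading in which baccI alone is the subject agrees.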
THANKS!!