Korean Treebank & Propbank
Martha Palmer, Narae Han, Jinyoung Choi, Shijong Ryu
University of Pennsylvania
May 23, 2005

2 Outline
Status Report
–Korean Treebank
–Korean Propbank
Frames Files
–lemma
Split Argument

3 Korean Treebank - Done
Virginia Corpus
–54.5 thousand words (symbols tokenized)
–Language training in a military setting
Newswire Corpus
–131.8 thousand words (symbols tokenized)
–Korean Press Agency news articles from June 2, 1994, to March 20, 2000

4 Korean Propbank – Current Status
First subtask: 54.5K Virginia corpus
–9,590 predicate tokens double-annotated (100%)
Second subtask: 131.8K Newswire corpus
–3,800 predicate tokens annotated out of 23,700 (15%)
Frames files
–1,800 predicates out of 2,800 (64%)

5 Korean Frames files
XML structure similar to the English and Chinese Frames files, for compatibility
The lemma of a Korean Frames file is the root, not the stem
Stem = Root + Derivational suffix
The root has its own predicate-argument structure
The derivational suffix has a grammatical function
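
The decomposition Stem = Root + Derivational suffix can be sketched in a few lines. This is only an illustration: the function name `split_citation_form` and the suffix table are hypothetical, and the suffix inventory is limited to the five suffixes that appear in the examples on these slides. A root whose last syllable happens to look like one of these suffixes would be split incorrectly.

```python
# Hypothetical suffix table, taken only from the example forms on the slides.
DERIVATIONAL_SUFFIXES = {
    "hi": "passive",
    "i": "causative",
    "ha": "active",
    "toe": "passive",
    "pat": "recipient",
}


def split_citation_form(form):
    """Split a romanized citation form such as 'meok-hi-ta' into (root, suffix).

    The final syllable 'ta' is the citation ending and is dropped.
    The suffix is None when the stem is the bare root.
    """
    syllables = form.split("-")
    if syllables[-1] != "ta":
        raise ValueError(f"expected a citation form ending in -ta: {form!r}")
    stem = syllables[:-1]
    if len(stem) > 1 and stem[-1] in DERIVATIONAL_SUFFIXES:
        return "-".join(stem[:-1]), stem[-1]
    return "-".join(stem), None


for form in ["meok-ta", "meok-hi-ta", "meok-i-ta", "kong-keup-ha-ta"]:
    print(form, "->", split_citation_form(form))
```

Under this sketch the three forms of "eat" all map to the same lemma `meok`, which is the reason for keying the Frames files on the root.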

6 Frames files 1 – verb root
frameset meok.01 "eat":
–Roleset: ArgA: causer, Arg0: eater, Arg1: food
‘meok-ta’: active form
–Arg0: SBJ
–Arg1: OBJ
‘meok-hi-ta’: passive form
–Arg0: COMP
–Arg1: SBJ
‘meok-i-ta’: causative form
–ArgA: SBJ
–Arg0: COMP
–Arg1: OBJ
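
As a rough illustration of how one root-level frameset serves several derived forms, the entry above can be written as a plain Python dictionary. The dictionary layout and the helper `role_of` are assumptions made for this sketch; they are not the XML format of the actual Frames files.

```python
# Hypothetical in-memory version of the meok.01 frameset from the slide.
MEOK_01 = {
    "id": "meok.01",
    "gloss": "eat",
    "roles": {"ArgA": "causer", "Arg0": "eater", "Arg1": "food"},
    # For each derived form: argument label -> grammatical function tag.
    "forms": {
        "meok-ta": {"Arg0": "SBJ", "Arg1": "OBJ"},                    # active
        "meok-hi-ta": {"Arg0": "COMP", "Arg1": "SBJ"},                # passive
        "meok-i-ta": {"ArgA": "SBJ", "Arg0": "COMP", "Arg1": "OBJ"},  # causative
    },
}


def role_of(frameset, form, function):
    """Return (argument label, role description) for a function tag, or None."""
    for arg, func in frameset["forms"][form].items():
        if func == function:
            return arg, frameset["roles"][arg]
    return None


# The subject carries a different role in each form of the same root.
for form in MEOK_01["forms"]:
    print(form, "SBJ ->", role_of(MEOK_01, form, "SBJ"))
```

The roleset is stated once for the root, and only the mapping to grammatical functions changes with the derivational suffix.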

7 Frames files 2 – deverbal noun
frameset kong-keup.01 "supply":
–Roleset: Arg0: giver, Arg1: thing provided, Arg2: receiver
‘kong-keup-ha-ta’: active form
–Arg0: SBJ
–Arg1: OBJ
–Arg2: COMP
‘kong-keup-toe-ta’: passive form
–Arg0: S
–Arg1: SBJ
–Arg2: COMP
‘kong-keup-pat-ta’: recipient form
–Arg0: COMP
–Arg1: OBJ
–Arg2: SBJ
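
To see how each argument of kong-keup.01 moves across the three forms, the mappings can be pivoted by argument rather than by form. This is a sketch under assumptions: the dictionary layout and the helper `realizations` are hypothetical, and the tag `S` for Arg0 of the passive form is copied from the slide as it stands.

```python
# Hypothetical in-memory version of the kong-keup.01 mappings from the slide.
KONG_KEUP_01_FORMS = {
    "kong-keup-ha-ta": {"Arg0": "SBJ", "Arg1": "OBJ", "Arg2": "COMP"},   # active
    "kong-keup-toe-ta": {"Arg0": "S", "Arg1": "SBJ", "Arg2": "COMP"},    # passive
    "kong-keup-pat-ta": {"Arg0": "COMP", "Arg1": "OBJ", "Arg2": "SBJ"},  # recipient
}


def realizations(forms, arg):
    """Return {form: grammatical function} for one argument label."""
    return {form: mapping[arg] for form, mapping in forms.items() if arg in mapping}


for arg in ("Arg0", "Arg1", "Arg2"):
    print(arg, realizations(KONG_KEUP_01_FORMS, arg))
```

Read this way, the recipient form is the only one in which the receiver (Arg2) surfaces as the subject.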

8 Split Arguments
Possessor & Possessee
Floating Quantifier
Small Clause
Deverbal Noun structure

9 Possessor & Possessee 1
kho-kki-ri-ka kho-ka kil-ta. ‘Elephant’s trunk is long’
a-peo-ci-ka ton-i phil-yo-ha-ta. ‘Father needs money’
(S (NP-SBJ kho-kki-ri-ka)            elephant-nom
   (S (NP-SBJ kho-ka)                trunk-nom
      (ADJP kil-ta)))                long
(S (NP-SBJ a-peo-ci-ka)              father-nom
   (S (NP-SBJ ton-i)                 money-nom
      (ADJP phil-yo-ha-ta)))         need
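
The double-nominative trees above contain two NP-SBJ nodes at different depths, which is what makes the argument "split". A minimal sketch of a bracket reader that finds them is given below. The functions `parse`, `leaves`, and `find` are written for this illustration and are not part of any treebank toolkit; the input is the first tree from the slide with the glosses removed.

```python
def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def leaves(tree):
    """Yield the words under a node, left to right."""
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from leaves(child)
        else:
            yield child


def find(tree, label):
    """Yield the word string under every node carrying the given label."""
    if tree[0] == label:
        yield " ".join(leaves(tree))
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from find(child, label)


tree = parse("(S (NP-SBJ kho-kki-ri-ka) (S (NP-SBJ kho-ka) (ADJP kil-ta)))")
print(list(find(tree, "NP-SBJ")))
```

The outer NP-SBJ is the possessor and the inner one the possessee, so an annotator or script has to decide how the two pieces relate to a single argument of the predicate.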

10 Possessor & Possessee 2
kho-kki-ri-yi kho-ka kil-ta. ‘Elephant’s trunk is long’
*a-peo-ci-yi ton-i phil-yo-ha-ta. ‘*Father’s money needs’
(S (NP-SBJ (NP kho-kki-ri-yi)        elephant-poss
           (NP kho-ka))              trunk-nom
   (ADJP kil-ta))                    long
(S (NP-SBJ (NP a-peo-ci-yi)          father-poss
           (NP ton-i))               money-nom
   (ADJP (NP-COMP *pro*)
         phil-yo-ha-ta))             need

11 Floating Quantifier
hak-saeng-i se myeong-i o-ass-ta. ‘Three students came.’
(S (NP-SBJ hak-saeng-i)              student-nom
   (VP (NP-ADV se myeong-i)          three-nom
       (VP o-ass-ta)))               come-past
se myeong-yi hak-saeng-i o-ass-ta.
(S (NP-SBJ (NP se myeong-yi)         three-poss
           (NP hak-saeng-i))         student-nom
   (VP o-ass-ta))                    come-past

12 Small Clause 1
na-neun keu-reul pa-po-ro saeng-kak-ha-eoss-ta. ‘I thought of him as a fool’
na-neun keu-reul pan-cang-eu-ro ppop-ass-ta. ‘I elected him as the class president’
(S (NP-SBJ na-neun)                  I-nom
   (VP (NP-OBJ keu-reul)             him-acc
       (NP-COMP pa-po-ro)            fool-abl
       saeng-kak-haess-ta))          think-past
(S (NP-SBJ na-neun)                  I-nom
   (VP (NP-OBJ keu-reul)             him-acc
       (NP-COMP pan-cang-eu-ro)      class president-abl
       ppop-ass-ta))                 elect-past

13 Small Clause 2
na-neun keu-ka pa-po-ra-ko saeng-kak-ha-eoss-ta.
*na-neun keu-ka pan-cang-i-ra-ko ppop-ass-ta.
saeng-kak
–Arg0: thinker
–Arg1: thought
ppop-
–Arg0: voter
–Arg1: candidate
–Arg2: position

14 Deverbal Noun structure 1
na-neun eom-ma-e-ke-seo neuc-ke wa-to coh-ta-ko heo-rak-eul pat-ass-ta.
‘I had permission from mom that I can return home late’
(S (NP-SBJ na-neun)
   (VP (NP-COMP eom-ma-e-ke-seo)
       (VP (S (S-SBJ (NP-SBJ *pro*)
                     (VP (ADVP neuc-ke)
                         (VP wa-to)))
              (ADJP coh-ta-ko))
           (VP (NP-OBJ heo-rak-eul)
               pat-ass-ta))))

15 Deverbal Noun structure 2
na-neun eom-ma-e-ke-seo neuc-ke wa-to coh-ta-neun heo-rak-eul pat-ass-ta.
(S (NP-SBJ na-neun)
   (VP (NP-COMP eom-ma-e-ke-seo)
       (NP-OBJ (S (S-SBJ (NP-SBJ *pro*)
                         (VP (ADVP neuc-ke)
                             (VP wa-to)))
                  (ADJP coh-ta-neun))
               (NP heo-rak-eul))
       pat-ass-ta))
pat-
–Arg0: receiver
–Arg1: thing gotten
–Arg2: giver

16 Throughput
Creating Frames files
–Approximately 70 predicates per week
–Need 14 weeks to complete Frames files
Annotation
–Approximately 1,600 predicate tokens per week
–Need 14 weeks to complete annotation for the Newswire corpus
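
The week estimates can be checked by dividing the remaining work by the weekly rate, using the totals from the Current Status slide. This is a back-of-the-envelope sketch: the helper `weeks_remaining` is hypothetical, and it assumes a constant rate and a single annotation pass, which the slides do not state.

```python
def weeks_remaining(total, done, per_week):
    """Remaining items divided by the weekly rate (constant rate assumed)."""
    return (total - done) / per_week


# Frames files: 1,800 of 2,800 predicates done, about 70 per week.
frames_weeks = weeks_remaining(2800, 1800, 70)

# Newswire annotation: 3,800 of 23,700 predicate tokens done, about 1,600 per week.
annotation_weeks = weeks_remaining(23700, 3800, 1600)

print(f"Frames files: {frames_weeks:.1f} weeks")
print(f"Annotation:   {annotation_weeks:.1f} weeks")
```

The Frames files figure agrees with the 14 weeks on the slide. The plain division for annotation comes out somewhat lower than 14 weeks; the slide does not say what else its estimate includes.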

17 To be done in future
Adjudicate & publish Korean Propbank
Revise Korean treebank guideline
Write Korean propbank guideline

18 Thank You

