
Chinese Proposition Bank
Nianwen Xue, Chingyi Chia, Scott Cotton, Seth Kulick, Fu-Dong Chiou, Martha Palmer, Mitch Marcus



2 Outline
• Motivation
• Overview
• Guidelines and Frame Files
• Annotation procedure
• Project status

3 Machine Translation
他 /he 在 /at 这 /this 个 /CL 文件 /document 上 /on 签 /sign 了 /ASP 自己 /self 的 /DE 名字 /name
SYSTRAN: He has signed own name in this document
Correct: He signed his own name on this document.
他 /he 在 /at 这 /this 个 /CL 文件 /document 上 /on 签字 /sign
SYSTRAN: He signs in this document
Correct: He signed this document.
Problem: The prepositional phrase is NOT a semantic adjunct.

4 MT: Further examples
俄罗斯 /Russia 撤回 /withdraw 军队 /army.
SYSTRAN: Russia withdraws the army.
Correct: Russia withdrew the army.
俄罗斯 /Russia 军队 /army 撤回 /withdraw 莫斯科 /Moscow.
SYSTRAN: The Russian army withdraws Moscow
Correct: The Russian army withdrew to Moscow.
Problem: The post-verbal argument 莫斯科 /Moscow is the goal (Arg2), not the theme (Arg1)!

5 Where We Are
• Motivation
• Overview
• Guidelines and Frame Files
• Annotation procedure
• Project status

6 An Example
Frameset 1:
国会 /Congress 最近 /recently 通过 /pass 了 /ASP 银行法 /banking law
“Congress passed the banking law recently.”
银行法 /banking law 最近 /recently 通过 /pass 了 /ASP
“The banking law passed recently.”
Frameset 2:
火车 /train 正在 /now 通过 /pass 隧道 /tunnel
“The train is passing through the tunnel.”
火车 /train 正在 /now 通过 /pass
“The train is passing.”

7 Annotation Model
VERB
  Framesets: FS0, FS1, FS2, …, FSi
    Subcat frames (per frameset): F0, F1, F2, …, Fj
      Arguments (per subcat frame): Arg0, Arg1, Arg2, …, Argk
Example: 国会 /Congress 通过 /pass 了 /ASP 银行法 /banking law
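The nested model (a verb owns framesets, a frameset owns subcat frames, a frame realizes arguments) can be pictured as nested records. The Python sketch below is only an illustration: the class names, the frameset IDs `f1`/`f2`, and the role labels assigned to each frame are assumptions made for this example, not the project's actual frame-file format. It encodes frameset 1 of 通过 /pass from the previous slide and leaves frameset 2's frames empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubcatFrame:
    slots: List[str]   # labels in surface order, e.g. ["Arg0", "REL", "Arg1"]
    example: str = ""  # annotated example sentence for this frame


@dataclass
class Frameset:
    fs_id: str         # hypothetical ID, e.g. "f1"
    gloss: str
    frames: List[SubcatFrame] = field(default_factory=list)

    def roles(self) -> List[str]:
        """Numbered arguments (not ArgM adjuncts) used by any frame of this frameset."""
        return sorted({s for f in self.frames for s in f.slots
                       if s.startswith("Arg") and not s.startswith("ArgM")})


@dataclass
class Verb:
    lemma: str
    framesets: List[Frameset] = field(default_factory=list)


tongguo = Verb("通过", [
    Frameset("f1", "pass (a bill or law)", [
        SubcatFrame(["Arg0", "REL", "Arg1"], "国会 通过 了 银行法"),
        SubcatFrame(["Arg1", "REL"], "银行法 通过 了"),
    ]),
    Frameset("f2", "pass (through)"),  # frames omitted in this sketch
])
```

With this encoding, `tongguo.framesets[0].roles()` returns `['Arg0', 'Arg1']`.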

8 Where We Are
• Motivation
• Overview
• Guidelines and Frame Files
• Annotation procedure
• Project status

9 Annotation Approach
• Guidelines: specify how to create frame files and address general annotation issues, e.g., the annotation of semantic adjuncts.
• Frame files: specify how each verb is annotated.

10 Frame Files
• Description of the framesets and of the subcat frames belonging to each frameset
• Description of the set of roles associated with each frameset
• Mapping between syntactic entities and argument labels, e.g., for 法案 /bill 通过 /pass 了 /ASP: SUBJECT -> arg1, VERB -> REL
• An annotated example for each subcat frame
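A minimal sketch of how a frame-file mapping such as SUBJECT -> arg1, VERB -> REL could be applied, assuming the constituents of a sentence arrive already paired with a syntactic role. The `apply_mapping` function, the role name `ASPECT`, and the `(role, text)` input format are hypothetical.

```python
from typing import Dict, List, Tuple

# Mapping for one subcat frame of 通过 /pass, as given on the slide:
# SUBJECT -> arg1, VERB -> REL. The dict format itself is hypothetical.
MAPPING: Dict[str, str] = {"SUBJECT": "arg1", "VERB": "REL"}


def apply_mapping(constituents: List[Tuple[str, str]],
                  mapping: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn (syntactic_role, text) pairs into (label, text) pairs.

    Constituents whose syntactic role is not in the mapping are dropped.
    """
    return [(mapping[role], text) for role, text in constituents if role in mapping]


# 法案 /bill 通过 /pass 了 /ASP: the aspect marker has no mapped role.
sentence = [("SUBJECT", "法案"), ("VERB", "通过"), ("ASPECT", "了")]
labeled = apply_mapping(sentence, MAPPING)  # [("arg1", "法案"), ("REL", "通过")]
```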

11 Defining Framesets (1)
• Defining framesets involves characterizing the arguments of a verb in terms of (a) their syntactic realizations (subcat frames) and (b) their “semantic” properties.
• Two subcat frames are the same if they have the same type and number of arguments; otherwise they are different.
• One subcat frame subsumes another if the arguments of the latter are a subset of those of the former.
• Any two subcat frames that belong to the same frameset should either be identical, or one should subsume the other.
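The identity and subsumption tests above can be made concrete by treating a subcat frame as a multiset of argument types. In this Python sketch the argument-type names (`NP-SBJ`, `NP-OBJ`, `PP`) are assumed for illustration; `valid_frameset` checks the pairwise condition that any two frames in a frameset be identical or related by subsumption.

```python
from collections import Counter
from itertools import combinations
from typing import List


def frame(*arg_types: str) -> Counter:
    """A subcat frame as a multiset of argument types (type names are assumed)."""
    return Counter(arg_types)


def same(a: Counter, b: Counter) -> bool:
    """Same type and number of arguments."""
    return a == b


def subsumes(a: Counter, b: Counter) -> bool:
    """a subsumes b if b's arguments form a (multi)subset of a's arguments."""
    return all(a[t] >= n for t, n in b.items())


def valid_frameset(frames: List[Counter]) -> bool:
    """Every pair of frames must be identical or related by subsumption."""
    return all(same(x, y) or subsumes(x, y) or subsumes(y, x)
               for x, y in combinations(frames, 2))


transitive = frame("NP-SBJ", "NP-OBJ")  # 国会 通过 了 银行法
intransitive = frame("NP-SBJ")          # 银行法 通过 了
with_pp = frame("NP-SBJ", "PP")
```

Under these definitions the transitive and intransitive frames can share a frameset, while `NP-SBJ NP-OBJ` and `NP-SBJ PP` cannot, since neither contains the other.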

12 Defining Framesets (2)
• Syntactic realizations and semantic properties are expected to coincide most of the time: differences (similarities) in meaning are reflected in differences (similarities) in syntactic realization (cf. Levin 1993), e.g., 通过 /pass.

13 Defining Framesets (3)
• Framesets are NOT distinguished if a verb has different “senses” that are realized in the same subcat frame or set of subcat frames, e.g., 统一:
  Sense 1: standardize
    分词 /segmentation 标准 /standard 要 /should 统一 /standardize
    我们 /we 要 /will 统一 /standardize 分词 /segmentation 标准 /standard
  Sense 2: reunite
    韩国 /Korea 要 /should 统一 /reunite
    他们 /they 要 /should 统一 /reunite 韩国 /Korea

14 Don’t Forget the Adjuncts
• Adjuncts are more global, i.e., not specific to individual verbs or to a class of verbs.
• Adjuncts are tagged as ArgM plus a functional tag indicating their type, e.g., ArgM-TMP.
• The annotation of adjuncts is specified in the guidelines.

15 Functional Tags for Adjuncts
• ADV: adverbial (default tag)
• BNF: beneficiary
• CND: condition
• DIR: direction
• DGR: degree
• FRQ: frequency
• LOC: locative
• MNR: manner
• PRP: purpose or reason
• TMP: temporal
• TPC: topic

16 Functional Tags for Arguments and Phrasal Verbs
• PRD: predicate
• AS: 为, 是, 作, 做
• AT: 在, 于
• INTO: 成, 入, 进
• ONTO: 上
• TO: 到, 至
• TOWARDS: 向, 往
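The two tag inventories on slides 15 and 16 are small lookup tables. A sketch in Python, assuming that an adjunct whose type is unknown falls back to the default tag ADV and that particles are looked up one character at a time; `adjunct_label` and `particle_tag` are hypothetical helpers, not part of any released tool.

```python
from typing import Dict, Optional

# Adjunct tags from slide 15 (tag -> meaning).
ADJUNCT_TAGS: Dict[str, str] = {
    "ADV": "adverbial", "BNF": "beneficiary", "CND": "condition",
    "DIR": "direction", "DGR": "degree", "FRQ": "frequency",
    "LOC": "locative", "MNR": "manner", "PRP": "purpose or reason",
    "TMP": "temporal", "TPC": "topic",
}

# Argument / phrasal-verb tags from slide 16 (tag -> characters listed for it).
# PRD (predicate) lists no characters, so it is not in this table.
PARTICLE_TAGS: Dict[str, str] = {
    "AS": "为是作做", "AT": "在于", "INTO": "成入进",
    "ONTO": "上", "TO": "到至", "TOWARDS": "向往",
}


def adjunct_label(tag: Optional[str]) -> str:
    """Build an ArgM label, falling back to the default tag ADV (assumed policy)."""
    return "ArgM-" + (tag if tag in ADJUNCT_TAGS else "ADV")


def particle_tag(char: str) -> Optional[str]:
    """Return the slide-16 tag listed for a single character, or None."""
    if len(char) != 1:
        return None
    for tag, chars in PARTICLE_TAGS.items():
        if char in chars:
            return tag
    return None
```

For example, `adjunct_label("TMP")` gives `ArgM-TMP`, and `particle_tag("到")` gives `TO`.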

17 An Actual Example
商检 /commercial inspection 部门 /department 最近 /recently 将 /ba 检验 /inspection 时间 /time 由 /from 七 /seven 至 /to 十 /ten 天 /day 缩短 /shorten 到 /to 一 /one 至 /to 三 /three 天 /day.
“The commercial inspection department recently shortened the inspection time from 7~10 days to 1~3 days.”
REL: 缩短 /shorten
Arg0 (agent): 商检 /commercial inspection 部门 /department
Arg1 (theme): 检验 /inspection 时间 /time
Arg2 (range):
Arg3 (starting point): 由 /from 七 /seven 至 /to 十 /ten 天 /day
Arg4 (end point): 到 /to 一 /one 至 /to 三 /three 天 /day
ArgM-TMP: 最近 /recently
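One way to store an annotated proposition such as the one above is as labeled token spans over the segmented sentence. The half-open `[start, end)` indices, the `spans` dictionary, and the helpers below are assumptions of this sketch; the empty Arg2 (range) slot is simply omitted because it is not realized in the sentence.

```python
from typing import Dict, List, Tuple

# Segmented sentence from the slide (one string per word).
tokens: List[str] = [
    "商检", "部门", "最近", "将", "检验", "时间",
    "由", "七", "至", "十", "天",
    "缩短", "到", "一", "至", "三", "天",
]

# Label -> half-open token span [start, end). Indices are this sketch's own.
spans: Dict[str, Tuple[int, int]] = {
    "Arg0": (0, 2),      # 商检 部门
    "ArgM-TMP": (2, 3),  # 最近
    "Arg1": (4, 6),      # 检验 时间
    "Arg3": (6, 11),     # 由 七 至 十 天
    "REL": (11, 12),     # 缩短
    "Arg4": (12, 17),    # 到 一 至 三 天
}


def span_text(label: str) -> str:
    """Concatenate the words covered by a labeled span."""
    start, end = spans[label]
    return "".join(tokens[start:end])


def overlapping(labeled: Dict[str, Tuple[int, int]]) -> bool:
    """True if any two labeled spans share a token (checked on spans sorted by start)."""
    ordered = sorted(labeled.values())
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))
```

Here `span_text("Arg3")` gives `由七至十天`, and `overlapping(spans)` is `False` because no two labeled spans share a token.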

18 Where We Are
• Motivation
• Overview
• Guidelines and Frame Files
• Annotation procedure
• Project status

19 Annotation Procedure
• Automatic preprocessing
  Preliminary results: 30 predicates; 95% (verbs only), 79% (with nominalizations) (Xue and Kulick, HLT’03)
• Manual checking
  Double-blind annotation and adjudication
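In double-blind annotation two annotators label the same instances independently, and an adjudicator resolves only the cases where they differ. The sketch below computes a simple exact-match agreement and the list of constituents to send to adjudication. The metric and the format (a dictionary from constituent text to label, keyed by text only for simplicity; a real tool would key on tree positions) are assumptions, since the slides do not say how agreement is measured.

```python
from typing import Dict, List, Tuple

# Hypothetical format: constituent text -> argument label, for one proposition.
Annotation = Dict[str, str]


def compare(a: Annotation, b: Annotation) -> Tuple[float, List[str]]:
    """Exact-match agreement between two annotators, plus disputed constituents."""
    constituents = sorted(set(a) | set(b))
    if not constituents:
        return 1.0, []
    disputed = [c for c in constituents if a.get(c) != b.get(c)]
    return 1.0 - len(disputed) / len(constituents), disputed


# 国会 /Congress 最近 /recently 通过 /pass 了 /ASP 银行法 /banking law
annotator_a = {"国会": "Arg0", "最近": "ArgM-TMP", "银行法": "Arg1"}
annotator_b = {"国会": "Arg0", "最近": "ArgM-TMP", "银行法": "Arg2"}
agreement, to_adjudicate = compare(annotator_a, annotator_b)
```

Here annotator B labels 银行法 /banking law as Arg2, so agreement is 2/3 and only 银行法 is queued for adjudication.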

20 Extracting Subcat Frames
• Traversing a parse tree, picking up the constituents of interest (i.e., potential arguments), and generating a template representing the subcat frame.
• “Normalizing” special constructions such as ba- and bei-constructions and verb compounds.
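The extraction step can be sketched as a recursive walk that keeps the verb and the potential arguments in surface order, followed by a normalization pass. Everything here is a simplification assumed for illustration: the tuple-based tree, the set of argument labels, and the `NP-BA` label for an object fronted by 将/把 do not follow the Penn Chinese Treebank's actual bracketing, and only the ba-construction is normalized (not bei-constructions or verb compounds).

```python
from typing import List, Tuple, Union

# Toy tree: (label, word) for leaves, (label, [children]) for phrases.
Tree = Tuple[str, Union[str, list]]

ARG_LABELS = {"NP-SBJ", "NP-OBJ", "NP-BA", "PP"}  # assumed "constituents of interest"


def collect(tree: Tree) -> List[str]:
    """Walk the tree left to right, keeping the verb and potential arguments."""
    label, body = tree
    if label == "VV":
        return ["V"]
    if label in ARG_LABELS:
        return [label]  # do not descend into an argument
    if isinstance(body, str):
        return []  # other leaves (adverbs, aspect markers, the ba marker) are skipped
    slots: List[str] = []
    for child in body:
        slots.extend(collect(child))
    return slots


def normalize_ba(slots: List[str]) -> List[str]:
    """Move a ba-fronted object (labelled NP-BA here) to object position after V."""
    if "NP-BA" not in slots or "V" not in slots:
        return slots
    rest = [s for s in slots if s != "NP-BA"]
    v = rest.index("V")
    return rest[: v + 1] + ["NP-OBJ"] + rest[v + 1:]


def subcat_template(tree: Tree) -> str:
    return " ".join(normalize_ba(collect(tree)))


congress = ("IP", [("NP-SBJ", "国会"), ("ADVP", "最近"),
                   ("VP", [("VV", "通过"), ("AS", "了"), ("NP-OBJ", "银行法")])])
# Simplified version of the slide 17 ba-sentence (the 由 phrase is left out).
ba_clause = ("IP", [("NP-SBJ", "商检部门"), ("ADVP", "最近"),
                    ("VP", [("BA", "将"), ("NP-BA", "检验时间"),
                            ("VV", "缩短"), ("PP", "到一至三天")])])
```

After normalization the ba-clause yields `NP-SBJ V NP-OBJ PP`, so its template can be compared directly with ordinary verb-object clauses such as `NP-SBJ V NP-OBJ`.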

21 Using the Subcat Frames
• Tagging the arguments
  Input: subcat frames, mappings
  Output: argument labels
• Sorting the verbs by subcat frames
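Both uses of the subcat frames can be sketched under stated assumptions. The `MAPPINGS` table keyed by (verb, template), its label lists, and the positional alignment of labels to constituents are all hypothetical. Instances with no usable mapping return `None` so they can be left for manual annotation, and `sort_by_frame` groups the verbs that share a template.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical mappings: (verb, subcat-frame template) -> one label per slot.
MAPPINGS: Dict[Tuple[str, str], List[str]] = {
    ("通过", "NP-SBJ V NP-OBJ"): ["Arg0", "REL", "Arg1"],
    ("通过", "NP-SBJ V"): ["Arg1", "REL"],
}


def tag_arguments(verb: str, template: str,
                  constituents: List[str]) -> Optional[List[Tuple[str, str]]]:
    """Input: a subcat frame and its constituents; output: argument labels."""
    labels = MAPPINGS.get((verb, template))
    if labels is None or len(labels) != len(constituents):
        return None  # no usable mapping: leave this instance for manual annotation
    return list(zip(labels, constituents))


def sort_by_frame(instances: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group verb lemmas by the subcat-frame templates they occur in."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for verb, template in instances:
        groups[template].add(verb)
    return {template: sorted(verbs) for template, verbs in sorted(groups.items())}
```

For example, `tag_arguments("通过", "NP-SBJ V", ["银行法", "通过"])` yields `[("Arg1", "银行法"), ("REL", "通过")]`, matching the SUBJECT -> arg1 mapping on slide 10.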

22 Where We Are
• Motivation
• Overview
• Guidelines and Frame Files
• Annotation procedure
• Project status

23 Project Status
• Guidelines ready; no major revision expected.
• About 300 frame files created, at a rate of 40~50 verbs per week.
• Automatic tagger ready.
• Annotation interface ready.

24 What’s Coming
• Continued creation of frame files
• Double-blind hand correction
• Adjudication

