Chinese Proposition Bank Nianwen Xue, Chingyi Chia, Scott Cotton, Seth Kulick, Fu-Dong Chiou, Martha Palmer, Mitch Marcus.


Outline
- Motivation
- Overview
- Guidelines and Frame Files
- Annotation procedure
- Project status

Machine Translation
他/he 在/at 这/this 个/CL 文件/document 上/on 签/sign 了/ASP 自己/self 的/DE 名字/name
SYSTRAN: He has signed own name in this document
Correct: He signed his own name on this document.
他/he 在/at 这/this 个/CL 文件/document 上/on 签字/sign
SYSTRAN: He signs in this document
Correct: He signed this document.
Problem: The prepositional phrase is NOT a semantic adjunct.

MT: Further examples
俄罗斯/Russia 撤回/withdraw 军队/army.
SYSTRAN: Russia withdraws the army.
Correct: Russia withdrew the army.
俄罗斯/Russia 军队/army 撤回/withdraw 莫斯科/Moscow.
SYSTRAN: The Russian army withdraws Moscow
Correct: The Russian army withdrew to Moscow.
Problem: The argument is the goal (arg2), not the theme (arg1)!

Where We Are
- Motivation
- Overview
- Guidelines and Frame Files
- Annotation procedure
- Project status

An Example
Frameset1
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
“The Congress passed the banking law recently.”
银行法/banking law 最近/recently 通过/pass 了/ASP
“The banking law passed recently.”
Frameset2
火车/train 正在/now 通过/pass 隧道/tunnel
“The train is passing through the tunnel.”
火车/train 正在/now 通过/pass
“The train is passing.”

Annotation Model
VERB
Framesets: FS0, FS1, FS2, …, FSi
Frames: F0, F1, F2, …, Fj
Arguments: Arg0, Arg1, Arg2, …, Argk
Example: 国会/Congress 通过/pass 了/ASP 银行法/banking law
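
One way to read the annotation model is as a nested data structure: a verb owns framesets, each frameset owns subcat frames, and each frame maps syntactic positions to argument labels. The Python sketch below is illustrative only. The class names, the role descriptions, and the OBJECT -> arg1 mapping are assumptions made for the example and are not taken from the actual frame files.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One subcat frame: syntactic position -> argument label."""
    mapping: dict


@dataclass
class Frameset:
    """A frameset: a set of roles plus the subcat frames that realize them."""
    roles: dict  # label -> informal description (hypothetical wording)
    frames: list = field(default_factory=list)


@dataclass
class Verb:
    lemma: str
    framesets: list = field(default_factory=list)


# Hypothetical entry for 通过 "pass (a bill)"; role descriptions are illustrative.
tongguo = Verb(
    lemma="通过",
    framesets=[
        Frameset(
            roles={"arg0": "body that passes something", "arg1": "thing passed"},
            frames=[
                # Transitive use (assumed mapping for the object position).
                Frame({"SUBJECT": "arg0", "VERB": "REL", "OBJECT": "arg1"}),
                # Intransitive use, as in 法案 通过 了.
                Frame({"SUBJECT": "arg1", "VERB": "REL"}),
            ],
        )
    ],
)


def labels_used(verb):
    """Collect every argument label that appears in any frame of the verb."""
    return sorted(
        {lab for fs in verb.framesets for fr in fs.frames for lab in fr.mapping.values()}
    )


print(labels_used(tongguo))
```

A second frameset for the "pass through" sense would be appended to `framesets` in the same way.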

Where We Are
- Motivation
- Overview
- Guidelines and Frame Files
- Annotation procedure
- Project status

Annotation Approach
- Guidelines: specify how to create frame files and address some general annotation issues, e.g. the annotation of semantic adjuncts.
- Frame files: specify how each verb is annotated.

Frame Files
- Description of the framesets and the subcat frames belonging to each frameset
- Description of the set of roles associated with each frameset
- Mapping between the syntactic entities and the argument labels, e.g. 法案/bill 通过/pass 了/AS: SUBJECT -> arg1, VERB -> REL
- Annotated example for each subcat frame

Defining Framesets (1)
- Defining framesets involves characterizing the arguments of a verb in terms of (a) their syntactic realizations (subcat frames) and (b) their “semantic” properties.
- Two subcat frames are the same if they have the same type and number of arguments; otherwise they are different.
- One subcat frame subsumes another if the arguments of the latter are a subset of the arguments of the former.
- All subcat frames that belong to a frameset should either be identical to or subsume one another.
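
The identity and subsumption tests above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a subcat frame is represented as a tuple of the argument labels it realizes, argument order is ignored, and the function names are invented for the example.

```python
from collections import Counter


def same_frame(a, b):
    """Two frames are the same if they have the same type and number of arguments."""
    return Counter(a) == Counter(b)


def subsumes(a, b):
    """Frame a subsumes frame b if the arguments of b are a subset of those of a."""
    count_a, count_b = Counter(a), Counter(b)
    return all(count_a[arg] >= n for arg, n in count_b.items())


def is_coherent_frameset(frames):
    """Every pair of frames must be identical or stand in a subsumption relation."""
    return all(
        subsumes(a, b) or subsumes(b, a)
        for i, a in enumerate(frames)
        for b in frames[i + 1:]
    )


# Assumed encoding of the two 通过 "pass (a bill)" frames.
transitive = ("arg0", "arg1")
intransitive = ("arg1",)

print(subsumes(transitive, intransitive))
print(is_coherent_frameset([transitive, intransitive]))
print(is_coherent_frameset([("arg0", "arg1"), ("arg1", "arg2")]))
```

Identical frames subsume each other, so `is_coherent_frameset` covers both conditions of the last bullet with one test.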

Defining Framesets (2)
- Syntactic realizations and semantic properties are expected to coincide most of the time: difference (similarity) in meaning is reflected in difference (similarity) in syntactic realizations (cf. Levin 1993), e.g. 通过/pass.

Defining Framesets (3)
- Framesets are NOT distinguished if a verb has different “senses” that are realized in the same subcat frame or set of subcat frames, e.g. 统一:
Sense 1: standardize
分词/segmentation 标准/standard 要/should 统一/standardize
我们/we 要/will 统一/standardize 分词/segmentation 标准/standard
Sense 2: reunite
韩国/Korea 要/should 统一/reunite
他们/they 要/should 统一/reunite 韩国/Korea

Don’t Forget the Adjuncts
- Adjuncts are more global, i.e., not specific to individual verbs or a class of verbs.
- Adjuncts are tagged as ArgM plus a functional tag indicating the type.
- The annotation of adjuncts is specified in the guidelines.

Functional Tags for Adjuncts
ADV: adverbial, default tag
BNF: beneficiary
CND: condition
DIR: direction
DGR: degree
FRQ: frequency
LOC: locative
MNR: manner
PRP: purpose or reason
TMP: temporal
TPC: topic
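
For tooling that has to check adjunct labels, the tag inventory above can be kept as a small lookup table. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the `ArgM-TAG` spelling follows the label used elsewhere in these slides.

```python
# Functional tags for adjuncts, as listed on the slide.
ADJUNCT_TAGS = {
    "ADV": "adverbial (default tag)",
    "BNF": "beneficiary",
    "CND": "condition",
    "DIR": "direction",
    "DGR": "degree",
    "FRQ": "frequency",
    "LOC": "locative",
    "MNR": "manner",
    "PRP": "purpose or reason",
    "TMP": "temporal",
    "TPC": "topic",
}


def adjunct_label(tag="ADV"):
    """Build an adjunct label such as 'ArgM-TMP'; ADV is the default tag."""
    if tag not in ADJUNCT_TAGS:
        raise ValueError(f"unknown functional tag: {tag}")
    return f"ArgM-{tag}"


print(adjunct_label("TMP"))
print(adjunct_label())
```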

Functional Tags for Arguments and Phrasal Verbs
PRD: predicate
AS: 为, 是, 作, 做
AT: 在, 于
INTO: 成, 入, 进
ONTO: 上
TO: 到, 至
TOWARDS: 向, 往

An Actual Example
商检/commercial inspection 部门/department 最近/recently 将/ba 检验/inspection 时间/time 由/from 七/seven 至/to 十/ten 天/day 缩短/shorten 到/to 一/one 至/to 三/three 天/day.
“The commercial inspection department recently shortened the inspection time from 7~10 days to 1~3 days.”
REL: 缩短/shorten
Arg0 (agent): 商检/commercial inspection 部门/department
Arg1 (theme): 检验/inspection 时间/time
Arg2 (range):
Arg3 (starting point): 由/from 七/seven 至/to 十/ten 天/day
Arg4 (end point): 到/to 一/one 至/to 三/three 天/day
ArgM-TMP: 最近/recently
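
The annotation of this sentence can be held in a simple label-to-text mapping. The sketch below is a hypothetical in-memory representation, not the project's file format. It keeps Arg2 as an empty slot, since the slide lists the role without a filler.

```python
# Labels and fillers copied from the slide; None marks a role with no filler.
proposition = {
    "REL": "缩短",
    "Arg0": "商检 部门",
    "Arg1": "检验 时间",
    "Arg2": None,
    "Arg3": "由 七 至 十 天",
    "Arg4": "到 一 至 三 天",
    "ArgM-TMP": "最近",
}


def unfilled(prop):
    """Return the labels that are defined for the verb but absent in this sentence."""
    return [label for label, text in prop.items() if text is None]


def to_brackets(prop):
    """Render the filled labels as a flat bracketed string."""
    return " ".join(f"[{label} {text}]" for label, text in prop.items() if text is not None)


print(unfilled(proposition))
print(to_brackets(proposition))
```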

Where We Are
- Motivation
- Overview
- Guidelines and Frame Files
- Annotation procedure
- Project status

Annotation Procedure
- Automatic preprocessing. Preliminary results: 30 predicates, 95% (verbs only), 79% (with nominalizations) (Xue and Kulick, HLT’03)
- Manual checking: double-blind annotation and adjudication

Extracting Subcat Frames
- Traversing a parse tree, picking up constituents of interest, i.e., potential arguments, and generating a template representing the subcat frame.
- “Normalizing” special constructions such as ba- and bei-constructions, and verb compounds.
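
The traversal step can be sketched as follows. This is a simplified illustration, not the project's extractor. It assumes trees are nested tuples of the form `(label, child, ...)` with `(POS, word)` leaves, treats only constituents tagged `-SBJ` or `-OBJ` as potential arguments, and does not attempt the normalization of ba- and bei-constructions or verb compounds.

```python
def is_leaf(node):
    """A leaf is a (POS, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def extract_frame(tree, verb_tags=("VV",), arg_suffixes=("-SBJ", "-OBJ")):
    """Walk the tree left to right and build a subcat-frame template string."""
    template = []

    def walk(node):
        tag = node[0]
        if is_leaf(node):
            if tag in verb_tags:
                template.append(tag)
            return
        if tag.endswith(arg_suffixes):
            template.append(tag)  # potential argument; do not descend further
            return
        for child in node[1:]:
            walk(child)

    walk(tree)
    return " ".join(template)


# Hand-built trees for the two 通过 "pass (a bill)" examples (assumed bracketing).
transitive = (
    "IP",
    ("NP-SBJ", ("NN", "国会")),
    ("VP", ("ADVP", ("AD", "最近")),
     ("VP", ("VV", "通过"), ("AS", "了"), ("NP-OBJ", ("NN", "银行法")))),
)
intransitive = (
    "IP",
    ("NP-SBJ", ("NN", "银行法")),
    ("VP", ("ADVP", ("AD", "最近")), ("VP", ("VV", "通过"), ("AS", "了"))),
)

print(extract_frame(transitive))    # NP-SBJ VV NP-OBJ
print(extract_frame(intransitive))  # NP-SBJ VV
```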

Using the Subcat Frames
- Tagging the arguments
  Input: subcat frames, mappings
  Output: argument labels
- Sorting the verbs by subcat frames
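
Both uses can be illustrated with a small lookup-based sketch. Everything here is an assumption made for the example: the template strings, the `MAPPINGS` table, and the function names are hypothetical, and the sketch does not resolve templates that are compatible with more than one frameset.

```python
from collections import defaultdict

# Hypothetical mappings: (verb, subcat-frame template) -> one label per slot.
MAPPINGS = {
    ("通过", "NP-SBJ VV NP-OBJ"): ["arg0", "REL", "arg1"],
    ("通过", "NP-SBJ VV"): ["arg1", "REL"],
}


def tag_arguments(verb, template, constituents):
    """Pair each constituent with its label, or return None if no mapping applies."""
    labels = MAPPINGS.get((verb, template))
    if labels is None or len(labels) != len(constituents):
        return None  # left for manual annotation
    return list(zip(labels, constituents))


def group_by_frame(instances):
    """Sort verb instances into buckets keyed by subcat-frame template."""
    groups = defaultdict(list)
    for verb, template in instances:
        groups[template].append(verb)
    return dict(groups)


print(tag_arguments("通过", "NP-SBJ VV NP-OBJ", ["国会", "通过", "银行法"]))
print(group_by_frame([("通过", "NP-SBJ VV"), ("撤回", "NP-SBJ VV NP-OBJ"), ("签字", "NP-SBJ VV")]))
```

Returning `None` for an unmapped template keeps the automatic step conservative, so uncovered instances fall through to the manual checking stage.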

Where We Are
- Motivation
- Overview
- Guidelines and Frame Files
- Annotation procedure
- Project status

Project Status
- Guidelines ready. No major revision expected.
- About 300 frame files created, at a rate of 40~50 verbs per week.
- Automatic tagger ready.
- Annotation interface ready.

What’s Coming
- Continued creation of frame files
- Double-blind hand-correction
- Adjudication