1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic.

Presentation transcript:

1 Penn English and Chinese PropBanks Martha Palmer University of Pennsylvania with Olga Babko-Malaya, Nianwen Xue, and Ben Snyder April 14, 2005 Semantic Representation Meeting University of Maryland

2 Penn What is a PropBank?
- A PropBank is a corpus annotated with the predicate-argument structure of its verbs.
- English PropBank: 3/’04, LDC, www.cis.upenn.edu/~ace (Kingsbury and Palmer 2002; Palmer, Gildea, Kingsbury 2005)
  - Wall Street Journal, 1M words, 120K+ predicate instances
  - Brown, 14K predicate instances
- Chinese PropBank (Xue and Palmer 2003; Xue 2004)
  - Xinhua (250K words, almost done)
  - Sinorama (250K words, estimated 2007)
- Nominalized verbs for English = NomBank/NYU
- Chinese NomBank?

3 Penn Capturing “neutral” semantic roles
- Boyan broke [Arg1 the LCD-projector]. break (agent(Boyan), patient(LCD-projector))
- [Arg1 The windows] were broken by the hurricane.
- [Arg1 The vase] broke into pieces when it toppled over.

4 Penn Frames File example: give
< 4000 Frames for PropBank
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example: double object
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
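
As a rough illustration of the frame entry above, here is a minimal Python sketch. The `Roleset` class and the `describe` helper are hypothetical names made up for this note; they are not the PropBank frame file format or any released tool. Only the role inventory and the labeled example come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Roleset:
    """Hypothetical container: numbered-argument inventory for one verb."""
    lemma: str
    roles: dict  # maps "ArgN" -> short role description


# Role inventory for "give", copied from the slide.
GIVE = Roleset(
    lemma="give",
    roles={"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
)

# The double-object example as (label, text) pairs in surface order.
EXAMPLE = [
    ("Arg0", "The executives"),
    ("REL", "gave"),
    ("Arg2", "the chefs"),
    ("Arg1", "a standing ovation"),
]


def describe(roleset, labeled):
    """Attach the role description from the roleset to each labeled span."""
    rows = []
    for label, text in labeled:
        description = "predicate" if label == "REL" else roleset.roles[label]
        rows.append((label, description, text))
    return rows


for row in describe(GIVE, EXAMPLE):
    print(row)
```

Keeping the labels as plain "ArgN" strings mirrors the point of the slide: the numbers are verb-specific, and the description is looked up per verb rather than fixed globally.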

5 Penn Frames File example: give w/ Thematic Role Labels
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example: double object
  The executives gave the chefs a standing ovation.
  Arg0 (Agent): The executives
  REL: gave
  Arg2 (Recipient): the chefs
  Arg1 (Theme): a standing ovation
VerbNet – based on Levin classes

6 Penn PropBank Exercise
Ex.
- [He]-Arg1 Theme [will]-MOD [probably]-MOD be [extradited]-rel [to the U.S]-DIR [for trial under an extradition treaty President Virgilio Barco has revived]-PRP.
- He will probably be extradited to the U.S for trial under [an extradition treaty]-Arg1 Theme [President Virgilio Barco]-Arg0 Agent has [revived]-rel.

7 Penn A Chinese Treebank Sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
“The Congress passed the banking law recently.”
(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))

8 Penn The Same Sentence, PropBanked
通过 (f2) (pass)
  arg0: 国会 (congress)
  argM: 最近
  arg1: 银行法 (law)
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))
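
The layering of argument labels onto treebank nodes can be sketched in Python as below. This is a toy bracket parser written for this note, not the Chinese PropBank tooling. The `ANNOTATION` table simply restates which node carries which label on the slide, with "rel" used as an assumed name for the predicate slot that the slide marks as f2.

```python
import re

# The treebank bracketing from the slide, with the English glosses removed.
TREE = "(IP (NP-SBJ (NN 国会)) (VP (ADVP (ADV 最近)) (VP (VV 通过) (AS 了) (NP-OBJ (NN 银行法)))))"


def parse(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word at a leaf
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words under a node, left to right."""
    words = []
    for child in tree[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


def find(tree, label):
    """Return the first node with the given label (depth-first), or None."""
    if tree[0] == label:
        return tree
    for child in tree[1]:
        if isinstance(child, tuple):
            hit = find(child, label)
            if hit is not None:
                return hit
    return None


# Node label -> argument label, as drawn on the slide ("rel" is assumed).
ANNOTATION = {"NP-SBJ": "arg0", "ADVP": "argM", "VV": "rel", "NP-OBJ": "arg1"}


def propbank_view(tree):
    """Read off the text of each annotated node."""
    return {arg: "".join(leaves(find(tree, node))) for node, arg in ANNOTATION.items()}


print(propbank_view(parse(TREE)))
```

Looking nodes up by label only works here because each label occurs once in this sentence; a real annotation would point at specific tree nodes instead.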

9 Penn Annotation procedure
- PTB II: extract all sentences of a verb
- Create Frame File for that verb (Paul Kingsbury)
  - 3400+ lemmas, 4700 framesets, 120K predicates
- 1st pass: automatic tagging (Joseph Rosenzweig)
- 2nd pass: double-blind hand correction by verb
  - Inter-annotator agreement 84% (87% Arg#’s)
- 3rd pass: adjudication (Olga Babko-Malaya)
- 4th pass: train automatic semantic role labellers (Dan Gildea, Sameer Pradhan, Nianwen Xue, Szuting Yi, ….)
  - CoNLL-04 shared task, 2004, 2005, ….

10 Penn Propbank Kappa Statistics
Table columns: P(A), P(E), Kappa. Rows: Role identify, Role classify, combined (values missing from this transcript).
- Role identification: classifying tree nodes as argument vs. non-argument
- Role classification: classifying arguments as Arg1 vs. Arg2 vs. ArgM-LOC vs. etc.
Kappa = (P(A) - P(E)) / (1 - P(E))
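
The kappa formula on this slide can be written as a short Python function. The agreement values in the usage line are invented purely to show the arithmetic; they are not the figures from the slide's table, which did not survive in the transcript.

```python
def kappa(p_a, p_e):
    """Kappa = (P(A) - P(E)) / (1 - P(E)).

    p_a: observed agreement between annotators, in [0, 1]
    p_e: agreement expected by chance, in [0, 1)
    """
    if not 0.0 <= p_e < 1.0:
        raise ValueError("expected agreement must be in [0, 1)")
    return (p_a - p_e) / (1.0 - p_e)


# Illustrative numbers only (not from the slide).
print(round(kappa(0.9, 0.5), 3))
```

The parentheses matter: without them the expression would divide only P(E) by 1 before subtracting.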

11 Penn Throughput
- Framing: approximately verbs/week
- Annotation: approximately 70 instances/hour
- Solomonization: approximately 100 instances per hour
- 100K words (last summer)
  - ~4 months (hardly any new frame files)
  - 4-6 part-time annotators (100 hrs a week)
  - half-time programmer
  - half-time project manager
  - half-time adjudicator, frame file creator

12 Penn Applications
- IE: slot filling
- Question Answering:
  - What do lobsters like to eat?
  - Answer is NOT people!
- Machine Translation
  - Reconciling event descriptions across languages (see Parallel Prop II)

13 Penn Word Senses in PropBank
- Orders to ignore word sense not feasible for 700+ verbs
  - Mary left the room
  - Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left
Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary
How do these relate to traditional word senses in WordNet?
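
To make the two framesets concrete, here is a small Python sketch. The dictionary layout and the `compatible_framesets` filter are assumptions made for illustration; the filter is a toy check on which numbered arguments appear, not how PropBank annotators or any labeller actually choose a frameset.

```python
# Frameset definitions copied from the slide.
FRAMESETS = {
    "leave.01": {
        "gloss": "move away from",
        "roles": {"Arg0": "entity leaving", "Arg1": "place left"},
    },
    "leave.02": {
        "gloss": "give",
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"},
    },
}


def compatible_framesets(lemma, labels):
    """Toy filter: framesets of `lemma` whose role inventory covers `labels`."""
    wanted = set(labels)
    return sorted(
        name
        for name, entry in FRAMESETS.items()
        if name.startswith(lemma + ".") and wanted <= set(entry["roles"])
    )


# "Mary left her daughter-in-law her pearls in her will": three core arguments.
print(compatible_framesets("leave", ["Arg0", "Arg1", "Arg2"]))
# "Mary left the room": two core arguments fit both framesets.
print(compatible_framesets("leave", ["Arg0", "Arg1"]))
```

The second call returning both framesets is the point of the slide: the argument labels alone do not settle which sense is meant, so the frameset has to be tagged.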

14 Penn Overlap between Senseval2 Groups and Framesets – 95%
[Figure: WordNet senses of “develop” (WN1–WN14, WN19, WN20) grouped under Frameset1 and Frameset2]

15 Penn Sense Hierarchy (Palmer, et al, SNLU04 - NAACL04)
- PropBank Framesets – ITA >90%
  coarse grained distinctions
  20 Senseval2 verbs w/ > 1 Frameset
  Maxent WSD system, 73.5% baseline, 90% accuracy
- Sense Groups (Senseval-2) – ITA 82%
  Intermediate level (includes Levin classes) – 69%
- WordNet – ITA 71%
  fine grained distinctions, 60.2%
Tagging w/groups, ITA 89%,

16 Penn PropBank II – English/Chinese (100K)
We still need relations between events and entities:
- Event IDs with event coreference
- Selective sense tagging
  - Tagging nominalizations w/ WordNet sense
  - Grouped WN senses: selected verbs and nouns
- Nominal Coreference
  - not names
- Clausal Discourse connectives – selected subset
Level of representation that reconciles many surface differences between the languages

17 Penn Event IDs – Parallel Prop II (1)
- Aspectual verbs do not receive event IDs:
  今年/this year 中国/China 继续/continue 发挥/play 其/it 在/at 支持/support 外商/foreign business 投资/investment 企业/enterprise 方面/aspect 的/DE 主/main 渠道/channel 作用/role
  “This year, the Bank of China will continue to play the main role in supporting foreign-invested businesses.”

18 Penn Event IDs – Parallel Prop II (2)
- Nominalized verbs do:
  - He will probably be extradited to the US for trial.
    Done as part of sense-tagging (all 7 WN senses for “trial” are events).
  - 随着/with 中国/China 经济/economy 的/DE 不断/continued 发展/development… “With the continued development of China’s economy…”
The same events may be described by verbs in English and nouns in Chinese, or vice versa. Event IDs help to abstract away from the POS tag.

19 Penn Event reference – Parallel Prop II
- Pronouns (overt or covert) that refer to events:
  [This] is gonna be a word of mouth kind of thing.
  这些/these 成果/achievements 被/BEI 企业/enterprise 用/apply (e15) 到/to 生产/production 上/on 点石成金/spin gold from straw, *pro*-e15 大大/greatly 提高/improve 了/le 中国/China 镍/nickel 工业/industry 的/DE 生产/production 水平/level。
  “These achievements have been applied (e15) to production by enterprises to spin gold from straw, which-e15 greatly improved the production level of China’s nickel industry.”
- Prerequisites:
  - pronoun classification
  - free trace annotation

20 Penn Chinese PB II: Sense tagging
- Much lower polysemy than English
  - Avg of 3.5 (Chinese) vs (English) (Dang, Chia, Chiou, Palmer, COLING-02)
  - More than 2 Framesets: 62/4865 (250K) Chinese vs. 294/3635 (1M) English
- Mapping Grouped English senses to Chinese (English tagging - 93 verbs/168 nouns, instances)
  - Selected 12 polysemous English words (7 verbs/5 nouns)
  - For 9 (6 verbs/3 nouns), grouped English senses map to unique Chinese translation sets (synonyms)

21 Penn Mapping of Grouped Sense Tags to Chinese
raise – translations by group
  increase: 提高 / ti2gao1
  lift, elevate, orient upwards: 仰 / yang3
  collect, levy: 募集 / mu4ji2, 筹措 / chou2cuo4, 筹... / chou2…
  invoke, elicit, set off: 提 / ti4
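
One way to check that the sense groups of “raise” map to distinct Chinese translation sets is sketched below in Python. The dictionary restates the slide, leaving out the entry that is truncated there (筹...). The `invert` helper is a hypothetical name for this note, not part of any annotation tool.

```python
# Sense group -> Chinese translations for "raise", as listed on the slide
# (the truncated third entry of the "collect, levy" group is omitted).
RAISE_GROUPS = {
    "increase": {"提高"},
    "lift, elevate, orient upwards": {"仰"},
    "collect, levy": {"募集", "筹措"},
    "invoke, elicit, set off": {"提"},
}


def invert(groups):
    """Map each translation back to its sense group.

    Raises ValueError if one translation is shared by two groups, i.e. if
    the translation sets are not unique to a group.
    """
    inverse = {}
    for group, translations in groups.items():
        for translation in translations:
            if translation in inverse:
                raise ValueError(f"{translation} appears in more than one group")
            inverse[translation] = group
    return inverse


TRANSLATION_TO_GROUP = invert(RAISE_GROUPS)
print(TRANSLATION_TO_GROUP["筹措"])
```

If the inversion succeeds, the Chinese translation of an instance is enough to recover the English sense group for this verb.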

22 Penn Discourse connectives: The Penn Discourse TreeBank
- WSJ corpus (~1M words, ~2400 texts)
  Miltsakaki, Prasad, Joshi and Webber, LREC-04, NAACL-04 Frontiers
  Prasad, Miltsakaki, Joshi and Webber, ACL-04 Discourse Annotation
- Chinese: 10 explicit discourse connectives that include subordinating conjunctions, coordinating conjunctions, and discourse adverbials.
- Argument determination, sense disambiguation
[arg1 学校/school 不/not 教/teach 理财/finance management], [conn 结果/as a result] [arg2 报章/newspaper 上/on 的/DE 各/all 种/kind 专栏/column 就/then 成为/become 信息/information 的/DE 主要/main 来源/source]。
“The school does not teach finance management. As a result, the different kinds of columns become the main source of information.”

23 Penn Summary of English PropBanks (Olga Babko-Malaya, Ben Snyder)
Columns: Genre | Words | Frames Files | Frameset Tags | Released | Prop2
- Wall Street Journal* (Penn TreeBank II): 1000K; <; March, 04
- English Translation of Chinese TreeBank*: 100K; <1500; Dec, 04; Aug, 05
- Xinhua News (DOD funding): 250K; <; Dec, 04; Dec, 05 (100K)
- Sinorama (NSF-ITR funding): 150K; < 4000; July, 05
- Sinorama, English corpus (NSF-ITR funding): 250K; <2000; Dec, 06
*DOD funding

24 Penn Annotation of free traces
- Free traces: traces which are not linked to an antecedent in PropBank
  - Arbitrary: Legislation to lift the debt ceiling is ensnarled in the fight over [*]-ARB cutting capital-gains taxes
  - Event: The department proposed requiring (e4) stronger roofs for light trucks and minivans, [*]-e4 beginning with 1992 models
  - Imperative: All right, [*]-IMP shoot.
- 1K instances of free traces in a 100K corpus

25 Penn Classification of pronouns
- ‘referring’: [John Smith] arrived yesterday. [He] said that...
- ‘bound’: [Many companies] raised [their] payouts by more than 10%
- ‘event’: [This] is gonna be a word of mouth kind of thing.
- ‘generic’: I like [books]. [They] make me smile.

26 Penn Mapping of Grouped Sense Tags to Chinese
- Zhejiang|浙江 zhe4jiang1 will|将 jiang1 raise|提高 ti2gao1 the level|水平 shui3ping2 of|的 de opening up|开放 kai1fang4 to|对 dui4 the outside world|外 wai4. (浙江将提高对外开放的水平。)
- I|我 wo3 raised|仰 yang3 my|我的 wo3de head|头 tou2 in expectation|期望 qi1wang4. (我仰头望去。)
- …, raising|筹措 chou2cuo4 funds|资金 zi1jin1 of|的 de 15 billion|150亿 yi1bai3wu3shi2yi4 yuan|元 yuan2 (…筹措资金150亿元。)
- The meeting|会议 hui4yi4 passed|通过 tong1guo4 the “decision regarding motions”|议案 yi4an4 raised|提 ti4 by 32 NPC|人大 ren2da4 representatives|代表 dai4biao3 (会议通过了32名人大代表所提的议案。)