Enriching Word Alignment with Linguistic Tags Linguistic Data Consortium, IBM Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda.

Presentation transcript:

Enriching Word Alignment with Linguistic Tags Linguistic Data Consortium, IBM Xuansong Li, Niyu Ge, Stephen Grimes, Stephanie M. Strassel, Kazuaki Maeda {xuansong, sgrimes, strassel,

Outline
- Motivations
- Approaches and methodologies
- Linguistic tags
- Inter-annotator agreement
- Conclusions

Motivations
- To improve automatic word alignment quality
- To reduce the amount of data needed for statistical models
- Supervised models outperform traditional models
- Part of the DARPA GALE program: manually aligned and tagged data for Chinese-English word alignment (WA)

Unified Annotation Scheme
- Alignment framework: minimum translation units (minimum match approach, attachment approach)
- Tagging framework: linguistic tags

Minimum Match Approach
Minimum translation units: atomic
- One to One: 我 买 鲜 花 。 / I buy fresh flowers.
- Many to One: 快乐 / Happy
- Many to Many: 春节 / Chinese New Year
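The three link shapes on this slide can be sketched as a small data structure. This is a minimal illustration, not the LDC annotation format: the `Link` class and `link_type` function are hypothetical names, and treating each Chinese character and each English word as one token is an assumption taken from how the slide spaces its examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One alignment link (hypothetical structure, not the LDC file format)."""
    source: tuple  # Chinese side, one entry per character
    target: tuple  # English side, one entry per word


def link_type(link: Link) -> str:
    """Classify a link by how many tokens it joins on each side."""
    if not link.source or not link.target:
        raise ValueError("an alignment link needs tokens on both sides")

    def side(count: int) -> str:
        return "One" if count == 1 else "Many"

    return f"{side(len(link.source))} to {side(len(link.target))}"


# Examples taken from the slide.
links = [
    Link(("我",), ("I",)),
    Link(("快", "乐"), ("Happy",)),
    Link(("春", "节"), ("Chinese", "New", "Year")),
]

for link in links:
    print("".join(link.source), "/", " ".join(link.target), "->", link_type(link))
```

Unaligned words have no counterpart on the other side, so they are deliberately rejected here; the attachment approach on the next slide covers them.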

Attachment Approach (for unaligned words)
- Unattach sentence-level/discourse-level unaligned words (unaligned, unattached): 我们也没有想去伤害他 / We didn’t want to hurt him
- Attach phrase-level unaligned words (unaligned, attached): 他带了书 / He brought the books

Tagging Framework
Methodologies: using linguistic tags
Goal: tackle insertion/deletion problems
- Tag unaligned words
  - Tags for unattached words (2 types)
  - Tags for attached words (12 types)
- Tag aligned links
  - Specific-feature links: Chinese DE 的 (3)
  - Context-free links (2)
  - Context-dependent links (3)

Context-free Links
- Function links: 在 / at; 于 / on
- Semantic links: 学校 / school; 太行山 / Taihang Mountain; …屹立 / …standing tall

Context-dependent Links
- Grammatically inferred link: 把这项成果变成… / turn this success into…
- Contextually inferred link: 欢迎收看 CCTV / Welcome to CCTV

Specific Links: 的 (DE)
Link types: DE-clause, DE-modifier, DE-possessive
- 经历过战争的人 / those who have experienced wars
- 新技术的实质 / the essence of the new technology
- 将军的高度警惕 / great attention from the general

Word Tags (Aligned & Unaligned)
- Omni-func-preposition
- Tense/Passive
- Possessive
- Measure word
- Clause marker
- Rhetorical
- Sentence marker
- Co-reference
- Determiner
- TO-infinitive
- DE-modifier
- Local context
- Context-obligatory
- Non-context-obligatory

Examples: Word Tags
- Possessive: the head of the branch
- Measure-word: 一根 (one) 柱子 (pillar) [one pillar]
- Tense/Passive: 提交 (submit) 的报告 (report) [report submitted]
- Context-obligatory: 不 (not) 好 (easy) 掌握 (control), 凭 (by) 经验 (experience) [It is not easy to control, you do by experience]
- Non-context-obligatory: 他 (he) 都已经 (already) 签 (sign) 合同了 (contract) [He already signed a contract]

Inter-Annotator Agreement (1): Chinese-English Alignment
Columns: Data Source, Char Count, Precision, Recall, F-score
- NW: Recall 95.7%, F-score 96.5%
- NW: Recall 96.2%, F-score 95.7%
- NW: Recall 91.2%, F-score 90.8%
- NW: Recall 92.6%, F-score 91.2%
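The slides do not say how precision, recall and F-score were computed between annotators. One common reading, sketched below, treats one annotator's links as the reference and scores the other annotator's links by exact match. The `alignment_agreement` function, the index-pair representation of a link, and the exact-match criterion are all assumptions made for illustration.

```python
def alignment_agreement(reference: set, candidate: set) -> tuple:
    """Return (precision, recall, F-score) of candidate links against reference links.

    Assumption: a link counts as agreed only if both annotators produced
    exactly the same link. Each link is a pair of index tuples:
    (source positions, target positions).
    """
    common = reference & candidate
    precision = len(common) / len(candidate) if candidate else 0.0
    recall = len(common) / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical toy data: annotator A joins positions 2-3 into one link,
# annotator B splits them into two one-to-one links.
annotator_a = {((0,), (0,)), ((1,), (1,)), ((2, 3), (2, 3))}
annotator_b = {((0,), (0,)), ((1,), (1,)), ((2,), (2,)), ((3,), (3,))}

p, r, f = alignment_agreement(annotator_a, annotator_b)
print(f"precision={p:.3f} recall={r:.3f} F-score={f:.3f}")
```

Because the F-score is the harmonic mean of precision and recall, it always falls between the two.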

Inter-Annotator Agreement (2): Chinese-English Tagging
Columns: Data Source, Chi. Char, Eng. Word, Link Count, Same Tag, Agree (%)
Rows: NW, NW
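The column names "Link Count", "Same Tag" and "Agree" suggest that tagging agreement is the share of links carrying the same tag from both annotators. The sketch below follows that reading. It is an assumption rather than the documented procedure, and the `tag_agreement` function and the string keys used to identify links are hypothetical.

```python
def tag_agreement(tags_a: dict, tags_b: dict) -> tuple:
    """Return (link count, same-tag count, agreement rate).

    Assumption: only links that both annotators produced are compared.
    Each dict maps a link identifier to the tag that annotator assigned.
    """
    shared_links = tags_a.keys() & tags_b.keys()
    same_tag = sum(1 for link in shared_links if tags_a[link] == tags_b[link])
    rate = same_tag / len(shared_links) if shared_links else 0.0
    return len(shared_links), same_tag, rate


# Hypothetical toy data using tag names that appear on the slides.
annotator_a = {"在|at": "Function", "学校|school": "Semantic", "的|of": "DE-possessive"}
annotator_b = {"在|at": "Function", "学校|school": "Semantic", "的|of": "DE-modifier"}

link_count, same, agree = tag_agreement(annotator_a, annotator_b)
print(f"links={link_count} same tag={same} agreement={agree:.1%}")
```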

Conclusion
- Unified annotation scheme
- Manually aligned and tagged corpora at LDC
- Annotation guidelines available at:
- Annotation toolkit available soon
- Ongoing project: more data in the pipeline
- Acknowledgements to GALE of DARPA

Thank You!

Chinese-English Aligned and Tagged Corpora at LDC
Columns: Genre, File, Char, Segment
Rows: Newswire, Broadcast News, Broadcast Conversation, Weblog, Total

Annotation Rate
- First pass alignment: 10,000w/10h
- Second pass alignment: 10,000w/6h
- First pass tagging: 10,000w/7h
- Second pass tagging: 10,000w/5h
Basis: average skill, speed and difficulty level
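The rates above can be turned into throughput per hour and a total effort per batch. The arithmetic below assumes that "w" means words and "h" means hours, and that the four passes run one after another on the same 10,000-word batch. The slide states neither assumption explicitly.

```python
BATCH_WORDS = 10_000  # batch size used for every rate on the slide

# Hours per pass, copied from the slide.
hours_per_pass = {
    "First pass alignment": 10,
    "Second pass alignment": 6,
    "First pass tagging": 7,
    "Second pass tagging": 5,
}

# Throughput of each pass in words per hour.
words_per_hour = {name: BATCH_WORDS / hours for name, hours in hours_per_pass.items()}

# Total effort if all four passes are applied in sequence (assumption).
total_hours = sum(hours_per_pass.values())

for name, rate in words_per_hour.items():
    print(f"{name}: {rate:.0f} words/hour")
print(f"All passes: {total_hours} hours per {BATCH_WORDS:,} words")
```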