Word Translation Disambiguation Using Bilingual Bootstrapping. Paper written by Hang Li and Cong Li, Microsoft Research Asia. Presented by Sarah Hunter.



Introduction
● Word translation disambiguation: word sense disambiguation, dealing with the special case of translation
● Example: plant
● Flora sense, ex: "plant and animal life". Translations: French = "flore", Chinese = "zhiwu"
● Factory sense, ex: "Nissan car and truck plant". Translations: French = "usine", Chinese = "gongchang"

● Supervised learning? Expensive!
● Better: bootstrapping: start with a small amount of training data, and as new data is classified, use it as training data too
● Monolingual bootstrapping:
● Train with a few English sentences containing an ambiguous word, each labeled with the Chinese translation of that word
● Create a classifier, and classify new sentences
● Further train with these
● Bilingual bootstrapping:
● Similar to monolingual, but a classifier for each language, using classified data in BOTH languages
● How? We'll see!

Monolingual Bootstrapping
● One-sense-per-discourse heuristic: when an ambiguous word appears in the same text many times, it usually has the same sense

Step 1: Create the classifier
context = words surrounding the ambiguous word in a sentence
for each (ambiguous word) {
  for each (possible sense) {
    use the classified data to create a binary classifier
      with the classes "this sense" and "not this sense" *
    * each class contains (context, sense, probability) trios
  }
}
● * naïve Bayesian ensemble: a linear combination that, for each word in the context, calculates the probability of this sense given that word
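The per-sense binary classifier described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it is a single Laplace-smoothed naive Bayes model over bag-of-words contexts, and it omits the ensemble (the linear combination over per-word probabilities) mentioned on the slide. All names are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSenseClassifier:
    """Binary naive Bayes: P(this sense | context) vs P(not this sense | context)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.word_counts = {True: Counter(), False: Counter()}
        self.class_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, labeled_contexts):
        # labeled_contexts: iterable of (context_words, is_this_sense)
        for words, label in labeled_contexts:
            self.class_counts[label] += 1
            for w in words:
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def _log_posterior(self, words, label):
        # log P(label) + sum over context words of log P(word | label)
        prior = self.class_counts[label] / sum(self.class_counts.values())
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        lp = math.log(prior)
        for w in words:
            lp += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return lp

    def prob_sense(self, words):
        # Normalize the two log-posteriors into P(this sense | context).
        lp_pos = self._log_posterior(words, True)
        lp_neg = self._log_posterior(words, False)
        m = max(lp_pos, lp_neg)  # subtract max for numerical stability
        p, n = math.exp(lp_pos - m), math.exp(lp_neg - m)
        return p / (p + n)
```

One such classifier is built per (ambiguous word, sense) pair; the stored (context, sense, probability) trios supply its training data.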

Step 2: Classify new data
context = words surrounding the ambiguous word in a sentence
aWord = an ambiguous word
C = aWord's classified data
U = aWord's unclassified data
for each (aWord) {
  for each (context in U) {
    calculate the most probable word sense given this context
    if (probability is above a threshold)
      store this context and sense
  }
  C = C + (context, sense, probability) for the top b probabilities
  U = U - those contexts
}
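One iteration of this self-training loop can be sketched as follows. The interface is assumed, not taken from the paper: each sense is represented by a scoring function returning P(sense | context), and `threshold` and `b` correspond to the threshold test and top-b selection in the pseudocode above.

```python
def bootstrap_step(classifiers, unlabeled, threshold=0.9, b=5):
    """One bootstrapping iteration for a single ambiguous word (sketch).

    classifiers: dict mapping sense -> callable giving P(sense | context words)
    unlabeled:   list of context word lists (the set U)
    Returns (newly_labeled, remaining_unlabeled).
    """
    candidates = []
    for i, ctx in enumerate(unlabeled):
        # most probable sense for this context
        sense, p = max(((s, f(ctx)) for s, f in classifiers.items()),
                       key=lambda sp: sp[1])
        if p >= threshold:
            candidates.append((i, sense, p))
    # keep only the top-b most confident new labels (moved from U into C)
    candidates.sort(key=lambda c: c[2], reverse=True)
    kept = candidates[:b]
    kept_idx = {i for i, _, _ in kept}
    newly_labeled = [(unlabeled[i], s, p) for i, s, p in kept]
    remaining = [ctx for i, ctx in enumerate(unlabeled) if i not in kept_idx]
    return newly_labeled, remaining
```

The caller would append `newly_labeled` to C, retrain the classifiers, and repeat until U stops shrinking.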

Bilingual Bootstrapping
● Similar to monolingual bootstrapping
● Adds these extensions:
● Repeatedly constructs classifiers in both languages in parallel
● Boosts the performance of the classifiers by exchanging information between the languages

Initially: some classified data

After classifying some new data

The Classifier
● Appropriate Chinese classifications are transformed (translated) to English and included in the corresponding English classifier
aWord = an ambiguous word
E = aWord's classified data for English
Ce = classified data for Chinese words that can be translations of aWord, i.e., the links 1 and 2 in the diagram
● The classifier is the same as in monolingual bootstrapping, only with classified data = E + Ce
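The E + Ce construction can be sketched as a small helper. This is a simplification of the paper's transformation: here a hypothetical word-for-word `translate` function maps each Chinese context word to at most one English word, and untranslatable words are simply dropped.

```python
def bilingual_training_data(english_labeled, chinese_labeled, translate):
    """Build the English training set E + Ce (sketch; names hypothetical).

    english_labeled: list of (context_words, sense) pairs (the set E)
    chinese_labeled: list of (context_words, sense) pairs for Chinese words
        linked to the English ambiguous word (the set Ce)
    translate: maps a Chinese word to an English word, or None if unknown
    """
    transformed = []
    for ctx, sense in chinese_labeled:
        # translate the Chinese context word by word, dropping gaps
        en_ctx = [w for w in (translate(c) for c in ctx) if w is not None]
        if en_ctx:
            transformed.append((en_ctx, sense))
    return english_labeled + transformed
```

The combined list is then fed to the same naive-Bayes-style training as in the monolingual case.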

Monolingual vs. Bilingual Bootstrapping
● Bilingual can always perform better
● Why? The asymmetric relationship (many-to-many mapping) between the ambiguous words in the two languages
● Classes A and D are equivalent
● We can transform instances of D and use them to boost performance on classification into A
● Instances misclassified into D (the 'x's) should belong to C, which is not related to A, so they have little negative effect
● Monolingual can only use instances in A and B; when the number of misclassified instances increases, performance stops improving

Experiment: Experimental Settings
● Resolves ambiguities only for selected ambiguous words such as "line" and "interest"
● Bilingual uses only pre-classified data in English (this is OK; it will share these with the Chinese side)
● Consider two implementations of the monolingual classifier:
1. using a naïve Bayesian ensemble (MD-B)
2. using decision lists (MD-D)

Experiment
● Apply all three implementations to certain words (line, interest) using a benchmark data set (containing mostly Wall Street Journal data); parts served as training data, the rest as test data
● Collect words that could be translations from the HIT dictionary
● For each sense, use intuition to pick an English word that describes it (the seed word)
● View these seed words as classified "sentences"
● Unclassified data: from the web (news sites); the distribution of senses was roughly balanced

Results
● Used a baseline method (Major): always choose the most frequent sense
● BB consistently and significantly outperforms all other unsupervised methods
● BB performs well even against supervised methods, and has the additional plus of being unsupervised and therefore less expensive

Conclusion
● Bilingual bootstrapping is pretty good!
● It has the advantages of being unsupervised, without the usual performance loss
● Future work:
● theoretical analysis (ex: generalization error)
● extension to more complicated machine translation tasks