BILINGUAL CO-TRAINING FOR MONOLINGUAL HYPONYMY-RELATION ACQUISITION. Jong-Hoon Oh, Kiyotaka Uchimoto, Kentaro Torisawa. ACL 2009.

Outline  Goal  Motivation  Co-Training Concept  Task  Co-Training Algorithm  System Architecture  Experiment  Conclusion

Goal
 Use bilingual co-training to improve monolingual acquisition of semantic knowledge.
 A hyponym is a word whose meaning is included in that of a more general word, its hypernym (an is-a relationship); e.g., "tiger" is a hyponym of "animal".
 Builds on prior work that mined Wikipedia to acquire hyponymy relations (Sumida and Torisawa, 2008, "Hacking Wikipedia for Hyponymy Relation Acquisition").
 That approach requires a manually labeled data set, which is costly to prepare.

Motivation
 Developing high-level NLP applications requires accurate semantic knowledge.
 Hyponymy-relation acquisition can be cast as a binary classification task over candidate word pairs.
 Bilingual co-training can enlarge the semantic knowledge base in both languages at little extra cost.
 Learning settings vary from language to language, so the two classifiers make different errors; instances labeled reliably in one language can correct unreliable labels in the other.

Bilingual Co-Training Concept

Task
 Hyponymy-relation acquisition from Wikipedia.
 Sumida and Torisawa (2008) recognized relations between Japanese words such as 酵素 (enzyme) and 加水分解酵素 (hydrolase), where the hyponym contains the hypernym as a substring; the English pair (enzyme, hydrolase) shares no substring, so the same cue fails.
 Solution: borrow confidently labeled instances from Japanese and add them to the English training data.
 Continually swap the two languages' roles back and forth to increase the training data on both sides.

Should we use Machine Translation?
 No!
 Since we are dealing only with nouns, simple dictionary look-up suffices.
 Consistent results were achieved without machine translation.

Co-Training 1/5
 S and T are two different languages.
 CL = {yes, no} is the set of binary class labels.
 X = X_S ∪ X_T is the set of instances in languages S and T to be classified.
 A classifier c assigns a class label cl ∈ CL and a confidence value r ∈ R+ to each instance: c(x) = (x, cl, r).
 Support Vector Machines (SVMs) are used; the distance between a sample and the SVM's hyperplane serves as the confidence value r.
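
A minimal sketch of this confidence scoring, using scikit-learn's LinearSVC as a stand-in for the paper's TinySVM; the feature dictionaries and helper names are illustrative assumptions, not the authors' code.

from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def train_classifier(labeled_pairs):
    """labeled_pairs: list of (feature_dict, label), label in {'yes', 'no'}."""
    feats, labels = zip(*labeled_pairs)
    vec = DictVectorizer()
    clf = LinearSVC()
    clf.fit(vec.fit_transform(feats), labels)
    return vec, clf

def classify(vec, clf, feature_dict):
    """Return (cl, r): the label plus the magnitude of the signed decision
    value, i.e. the sample's (unnormalized) distance from the hyperplane."""
    margin = clf.decision_function(vec.transform([feature_dict]))[0]
    cl = clf.classes_[1] if margin > 0 else clf.classes_[0]
    return cl, abs(margin)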

Co-training 2/5
 Training data L is a subset of the Cartesian product of X and CL.
 A classifier c is trained from training data L: c = LEARN(L).
 L_S and L_T are manually prepared labeled data for S and T.
 The bilingual instance dictionary D_BI consists of translation pairs of instances in X_S and X_T: D_BI = {(s, t)} ⊆ X_S × X_T.
 Example: (s, t) = ((enzyme, hydrolase), (酵素, 加水分解酵素)).

Co-training 3/5
 c^0_S and c^0_T are learned from the manually labeled instances L_S and L_T.
 At iteration i, c^i_S and c^i_T are applied to classify instances in X_S and X_T.
 CR^i_S is the set of classification results of c^i_S on instances in X_S that are not in L^i_S and are registered in D_BI.
 Newly labeled instances are selected from CR^i_S and added to the other language's training set.
 TopN(CR^i_S) is the set of c^i_S(x) whose confidence r_S is among the top-N highest in CR^i_S.
 c^i_S acts as teacher and c^i_T as student.

Co-training 4/5
 The teacher assigns its label cl_S to x_T, the translation of x_S through D_BI, only if the teacher is sufficiently confident (r_S > θ) and either the student is not confident (r_T < θ) or the two agree (cl_S = cl_T); this avoids overriding a confident student that disagrees with the teacher.
 Then the roles are reversed and T teaches S.
 Standard co-training relies on two different feature views of the same instances; here the two views are the two languages.
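
A compact sketch of one teacher-to-student step (S teaching T) as described above. It assumes the train_classifier/classify helpers from the earlier sketch plus an illustrative features_of(x, lang) feature extractor and a dict-based D_BI; none of these names come from the paper.

THETA = 1.0   # confidence threshold (theta = 1 in the experiments)
TOP_N = 900   # TopN used in the experiments

def teach_step(teacher, student, unlabeled_S, d_bi, L_T):
    """Transfer the teacher's most confident labels to the student's training set."""
    vec_S, clf_S = teacher
    vec_T, clf_T = student
    results = []
    for x_S in unlabeled_S:
        if x_S not in d_bi:                      # only instances registered in D_BI
            continue
        cl_S, r_S = classify(vec_S, clf_S, features_of(x_S, 'S'))
        results.append((r_S, x_S, cl_S))
    results.sort(key=lambda t: t[0], reverse=True)   # TopN(CR_S): most confident first
    for r_S, x_S, cl_S in results[:TOP_N]:
        x_T = d_bi[x_S]
        cl_T, r_T = classify(vec_T, clf_T, features_of(x_T, 'T'))
        # Teach only if the teacher is confident and the student either
        # lacks confidence or already agrees with the teacher.
        if r_S > THETA and (r_T < THETA or cl_S == cl_T):
            L_T.append((features_of(x_T, 'T'), cl_S))
    return L_T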

Co-Training 5/5

System Architecture 1/4

System Architecture - Candidate Extraction 2/4
 Each English and Japanese article is treated as a hierarchy of title, sections, subsections, and list items; hyponymy-relation candidates are extracted from this layout.
 Example: (Tiger, Siberian Tiger) is extracted as a hyponymy-relation candidate.
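
A deliberately simplified sketch of this extraction idea, pairing the most recent heading (or the page title) with each list item beneath it; real wikitext parsing and the full article hierarchy are richer than this.

import re

def extract_candidates(wikitext, page_title):
    """Pair the current heading (or the page title) with list items under it."""
    candidates = []
    current_heading = page_title
    for line in wikitext.splitlines():
        m = re.match(r'^=+\s*(.*?)\s*=+\s*$', line)
        if m:
            current_heading = m.group(1)
        elif line.startswith('*'):
            item = line.lstrip('*').strip()
            if item:
                candidates.append((current_heading, item))   # (hyper, hypo) candidate
    return candidates

# extract_candidates("== Subspecies ==\n* Siberian Tiger", "Tiger")
# -> [('Subspecies', 'Siberian Tiger')]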

System Architecture - Hyponymy-Relation Classification 3/4
 hyper is a hypernym candidate.
 hypo is a hyponym candidate.
 (hyper, hypo) is the hyponymy-relation candidate, which the classifier labels as a correct relation or not.
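
To make the classification step concrete, here is an illustrative feature function for one (hyper, hypo) candidate; these particular features are assumptions for illustration, not the paper's feature set.

def candidate_features(hyper, hypo):
    """Illustrative features for a (hyper, hypo) candidate."""
    return {
        'hyper=' + hyper: 1,
        'hypo=' + hypo: 1,
        'hypo_contains_hyper': int(hyper in hypo),   # substring cue, cf. 酵素 / 加水分解酵素
    }

# classify(vec, clf, candidate_features('Tiger', 'Siberian Tiger'))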

System Architecture - Bilingual Instance Dictionary Construction 4/4
 Wikipedia articles in different languages are connected by cross-language links.
 Linked English and Japanese articles are extracted, and their titles are regarded as translation pairs.
 These title pairs are used to build the bilingual instance dictionary D_BI.
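
A sketch of this construction, assuming the title pairs have already been pulled from the cross-language links; mapping relation candidates term-by-term through the title dictionary is an illustrative simplification.

def build_instance_dictionary(title_pairs, candidates_en, candidates_ja):
    """title_pairs: iterable of (english_title, japanese_title) pairs taken
    from cross-language links. Returns D_BI mapping English relation
    candidates to Japanese ones when both terms translate."""
    en2ja = dict(title_pairs)
    candidates_ja = set(candidates_ja)
    d_bi = {}
    for hyper, hypo in candidates_en:
        if hyper in en2ja and hypo in en2ja:
            ja_pair = (en2ja[hyper], en2ja[hypo])
            if ja_pair in candidates_ja:   # both sides must be extracted candidates
                d_bi[(hyper, hypo)] = ja_pair
    return d_bi

# build_instance_dictionary([("enzyme", "酵素"), ("hydrolase", "加水分解酵素")],
#                           [("enzyme", "hydrolase")], [("酵素", "加水分解酵素")])
# -> {('enzyme', 'hydrolase'): ('酵素', '加水分解酵素')}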

Experiment 1/3
 Data: the May 2008 English Wikipedia and the June 2008 Japanese Wikipedia.
 24,000 randomly selected hyponymy-relation candidates were manually checked; about 8,000 relations were found in the manually checked data for each language.
 Classifier: TinySVM. 100 co-training iterations, threshold θ = 1, TopN = 900.

Experiment 2/3
 Three experiments show the effects of bilingual co-training, training-data size, and the bilingual instance dictionary.
 Compared systems: SYT = Sumida and Torisawa (2008); INIT = classifier trained on the initial training data alone; TRAN = classifier trained with training data translated from the other language through the bilingual instance dictionary; BICO = bilingual co-training.

Experiment 3/3
 Can performance always be improved through bilingual co-training with one strong and one weak classifier?
 A strong classifier is trained on 20,000 labeled instances in one language; weak classifiers for the other language are trained on 1,000, 5,000, 10,000, or 15,000 instances.

Conclusion
 BICO showed a 3.6%–10.9% improvement in F1.
 Bilingual co-training can reduce the cost of preparing new training data in other languages.
 A strong classifier in one language can improve even a weak classifier in another.