Unsupervised Constraint Driven Learning for Transliteration Discovery M. Chang, D. Goldwasser, D. Roth, and Y. Tu.


Unsupervised Constraint Driven Learning for Transliteration Discovery M. Chang, D. Goldwasser, D. Roth, and Y. Tu

What I am going to do today… Goal 1: present the transliteration work and get feedback! Goal 2: think about this work in terms of CCMs (tutorial…). I will try to present this work in a slightly different way; some of this is my personal comment and differs from our discussion yesterday, so please give us comments. The aim is to make this work more general (not only transliteration).

Wait a sec! What is CCM?

Constraint-Driven Learning. Why constraints? The goal: building a good system easily. We have prior knowledge at hand, so why not inject that knowledge directly? How useful are constraints? Useful for supervised learning [Yih and Roth 04] [many others]; useful for semi-supervised learning [Chang et al., ACL 2007]; sometimes more efficient than labeling data directly.

Unsupervised Constraint-Driven Learning. In this work we do not use any labeled instances, yet we achieve performance competitive with several supervised models. Compared to [Chang et al., ACL 2007]: in ACL 07 they use a small amount of labeled data (5-20 examples), because bad models cannot benefit from constraints. For some applications we have very good resources, so we do not need labeled instances at all!

In a nutshell: traditional semi-supervised learning. The model can drift from the correct one. [Diagram: Model → Prediction on Unlabeled Data → label the unlabeled data → feedback → learn from the (newly) labeled data; in the unsupervised setting the starting point is a Resource rather than labeled data.]

In a nutshell: CoDL. Use constraints to generate better training samples in unsupervised learning. [Diagram: Model → Prediction + Constraints on Unlabeled Data → more accurate labeling → feedback → better model.] CoDL improves a "simple" model using expressive constraints.

Outline: Constraint-Driven Learning (CoDL) · Transliteration Discovery · Algorithm · Experimental Results

Transliteration Generation (not our focus). Given a source word, what is the target transliteration? Bush → 布希, Sushi → 壽司. Issues: ambiguity, i.e., the same source word can have many different transliterations (think about Chinese). What we want: find the most widely used transliteration.

Transliteration Discovery (our focus). Problem setting: given two lists of words, map them! Advantages: a relatively easy problem; can find the most widely used transliteration. Assumptions: the source language is English; each source entity has a transliteration among the target candidates; the target candidates might not be named entities.

Outline: Constraint-Driven Learning (CoDL) · Transliteration Discovery · Algorithm · Experimental Results

Algorithm Outline: Prediction Model · How do we use existing resources to construct the model? · Constraints? · Learning Algorithm

The Prediction Model. How do we make a prediction? Given a source word, how do we predict the best target? Model 1: (Vs, Vt) → Yes or No. Issue: not many obvious constraints can be added, and it is not a structured prediction problem. Model 2: (Vs, Vt) → hidden variables → Yes or No. Predicting the hidden variables F is a structured prediction problem, so we can add constraints more easily.

The Prediction Model. Score for a pair: a CCM formulation with hidden variables and constraint-violation penalties (a slightly different scoring function; more on this point in the next few slides).

Prediction Model: Another View. The scoring function looks like weights times features! If there is a bad feature, the score goes to −∞. Our hidden variable (the feature vector) is the character mapping.
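A minimal sketch of what such a score could look like (my notation, not necessarily the paper's exact formula), where F is the set of character mappings for the pair (v_s, v_t), w holds the per-mapping weights, and d(F, C_k) counts violations of constraint C_k:

```latex
\mathrm{score}(v_s, v_t, F)
  \;=\; \sum_{(c_s,\, c_t) \in F} w_{(c_s,\, c_t)}
  \;-\; \sum_{k} \rho_k \, d(F, C_k)
```

With this view, a single strongly negative mapping weight (a "bad feature") drives the score toward −∞, which is how forbidden character mappings are ruled out.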

"Everything": example character mappings (a,a), (o,O), (w,_), …

Algorithm Outline: Prediction Model · How do we use existing resources to construct the model? · Constraints? · Learning Algorithm

Resource: Romanization Table. Hebrew, Russian: how can you type Hebrew or Russian on an English keyboard? "C" maps to a similar character, "C" or "S", in Hebrew or Russian. Such tables are very easy to get, but ambiguous. Special case: Chinese (Pin-Yin). 壽司 → shòu sī (low ambiguity); map the Pin-Yin to English (sushi). The Romanization table then becomes largely trivial mappings such as a → a.

Initialize the Table. Every character pair in the Romanization Table gets weight 0; everything else gets −1. (There could be better ways to do the initialization.) Note: without constraints, every pair (v_s, v_t) would get a score of zero.
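A minimal sketch of this initialization, assuming the table is given as a list of character pairs (not the authors' code):

```python
# Sketch: initialize character-mapping weights from a Romanization table.
# Pairs listed in the table start at weight 0; every other pair defaults to -1.
from collections import defaultdict

def init_weights(romanization_table):
    """romanization_table: iterable of (source_char, target_char) pairs."""
    weights = defaultdict(lambda: -1.0)   # unlisted pairs get -1
    for src, tgt in romanization_table:
        weights[(src, tgt)] = 0.0         # table pairs start at 0
    return weights

# Hypothetical example entries:
w = init_weights([("a", "a"), ("b", "b"), ("s", "sh")])
```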

Algorithm Outline: Prediction Model · How do we use existing resources to construct the model? · Constraints? · Learning Algorithm

Constraints. General constraints: Coverage (all characters need to be mapped at least once); No crossing (character mappings cannot cross each other). Language-specific constraints: general restricted mappings, initial restricted mappings, length restrictions.
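To make the two general constraints concrete, here is a small illustrative check (my own formulation; a character alignment F is treated as a set of (source index, target index) pairs):

```python
# Illustrative checks for the two general constraints on a candidate alignment F.

def no_crossing(F):
    """Character mappings may not cross: sorting by source index,
    the target indices must be non-decreasing."""
    ordered = sorted(F)
    return all(j1 <= j2 for (_, j1), (_, j2) in zip(ordered, ordered[1:]))

def coverage(F, src_len, tgt_len):
    """Every source and target character must be mapped at least once."""
    return (len({i for i, _ in F}) == src_len and
            len({j for _, j in F}) == tgt_len)

# Hypothetical alignment of a 4-character source to a 4-character target:
F = [(0, 0), (1, 1), (2, 2), (3, 3)]
assert no_crossing(F) and coverage(F, 4, 4)
```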

Constraints: Pin-Yin to English restricted mappings. Many other works use similar information as well!

Algorithm Outline: Prediction Model · How do we use existing resources to construct the model? · Constraints? · Learning Algorithm

High-Level Overview. Model ← Resource. While not converged: use the model + constraints to get labels (for both F and y); then update the model with the newly labeled F and y, without constraints (details on the next slide). Similar to ACL 07: the model is updated without constraints. Difference from ACL 07: we get feedback from the labels of both the hidden variables and the output.
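A rough sketch of this loop (function names such as `construct_from_resource` and `inference_with_constraints` are placeholders of mine, not the authors' code):

```python
def unsupervised_codl(resource, unlabeled_pairs, constraints, n_iterations=10):
    """Sketch of the unsupervised constraint-driven learning loop."""
    model = construct_from_resource(resource)      # e.g. Romanization-table init
    for _ in range(n_iterations):
        labeled = []
        for x in unlabeled_pairs:
            # Constrained inference gives both the character mapping F
            # and the yes/no label y for the pair x.
            F, y = inference_with_constraints(x, model, constraints)
            labeled.append((x, F, y))
        # Retrain on the newly labeled data, *without* the constraints.
        model = learn(labeled)
    return model
```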

Training: predict the hidden variables and the labels, then run the update algorithm.

Outline: Constraint-Driven Learning (CoDL) · Transliteration Discovery · Algorithm · Experimental Results

Experimental Setting. Evaluation: ACC, i.e., the top candidate is (one of) the right answers. Learning algorithm: linear SVM with C = 0.5. Datasets: English-Hebrew 300:300; English-Chinese 581:681; English-Russian 727:50648 (the target side includes all words).
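For reference, the per-iteration learner could look like the following (scikit-learn shown purely as an illustration; the slides only say "linear SVM with C = 0.5", not which implementation was used):

```python
# Illustration only: a linear SVM with C = 0.5 as the base learner.
from sklearn.svm import LinearSVC

def learn(feature_vectors, labels):
    clf = LinearSVC(C=0.5)
    clf.fit(feature_vectors, labels)   # pair/mapping feature vectors, yes/no labels
    return clf
```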

Results - Hebrew

Results - Russian

Analysis (a small Russian subset was used here): 1) Without constraints (on features), the Romanization Table is useless! 2) General constraints are more important! 3) Learning has a great impact here, but constraints are very important, too! 4) Better constraints lead to better final results.

Related Work (needs more work here): learning the score for edit distance; previous transliteration work; machine translation?

Conclusion. ML: an unsupervised constraint-driven algorithm; use hidden variables to find more constraints (e.g., co-reference); use constraints to find a "cleaner" feature representation. Transliteration: use of the Romanization Table as the starting point; we can get good results without training data; the right constraints (modeling) are the key. Future work: a better transliteration model with quicker inference; other applications for unsupervised CoDL.

Constraint-Driven Learning (CoDL):

    λ = learn(Tr)                       # any supervised learning algorithm, parametrized by λ
    For N iterations do
        T = ∅
        For each x in the unlabeled dataset
            y ← Inference(x, C, λ)      # any inference algorithm (with constraints)
            T = T ∪ {(x, y)}            # augment the training set (feedback)
        λ = γ·λ + (1 − γ)·learn(T)      # learn from the new training data; weight the
                                        # supervised and unsupervised models (Nigam 2000)

Unsupervised Constraint-Driven Learning:

    λ = Construct(Resource)             # construct the model from the resource
    For N iterations do
        T = ∅
        For each x in the unlabeled dataset
            y ← Inference(x, C, λ)      # any inference algorithm (with constraints)
            T = T ∪ {(x, y)}            # augment the training set (feedback)
        λ = γ·λ + (1 − γ)·learn(T)      # learn from the new training data; γ = 0 in this work