Presenter: Jinhua Du (杜金华), Xi'an University of Technology (西安理工大学). NLP&CC, Chongqing, Nov. 17-19, 2013. Discriminative Latent Variable Based Classifier for Translation Error Detection.



Outline
1. Introduction
2. DPLVM for Translation Error Detection
3. Experiments and Analysis
4. Conclusions and Future Work

1. Introduction

Problem
1. In the localization industry, humans are routinely involved in post-editing MT output;
2. MT errors increase the human cost of obtaining a reasonable translation;
3. Translation error detection (word-level confidence estimation) can improve the working efficiency of post-editors to some extent.

Research question: how to improve the accuracy of detecting translation errors?

1. Introduction

Related Work
- Ueffing and Ney (2003/2007) exhaustively explored various kinds of WPP features
- Blatz et al. (2004) combined a neural network and a naive Bayes classifier
- Specia et al. (2009/2011) worked on confidence estimation in the CAT field
- Xiong et al. used a MaxEnt-based classifier to predict translation errors

1. Introduction

Key Factors
- Classifiers: for the same feature set, different classifiers perform differently, so how to select/design a proper classifier is important.
- Features: for a given classifier, different features reflect different characteristics of the problem, so how to select/design the feature set is crucial.

1. Introduction

Our Work
- A discriminative latent variable classifier
- A feature set for translation error detection
- Comparison with SVM and MaxEnt

2. DPLVM Algorithm
- Conditions: a sequence of observations x = {x_1, x_2, …, x_m} and a sequence of labels y = {y_1, y_2, …, y_m}
- Assumption: a sequence of latent variables h = {h_1, h_2, …, h_m}
- Goal: to learn a mapping between x and y
- Definition:

  P(y|x, Θ) = Σ_h P(y|h, x, Θ) P(h|x, Θ)   (1)

Simplified Algorithm
- Assumption: the model is restricted to disjoint sets of latent variables associated with each class label. Each h_j is a member of a set H_{y_j} of possible latent variables for the class label y_j; sequences containing any h_j ∉ H_{y_j} have P(y|h, x, Θ) = 0 by definition.
- Equation (1) can then be re-written as:

  P(y|x, Θ) = Σ_{h ∈ H_{y_1} × … × H_{y_m}} P(h|x, Θ)   (2)

  where

  P(h|x, Θ) = exp(Θ · f(h, x)) / Σ_{h'} exp(Θ · f(h', x))   (3)
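The restricted marginalisation of Eqs. (1)-(3) can be sketched as a toy Python example. The latent states, feature function, and weights below are invented for illustration; they are not the paper's trained model.

```python
import itertools
import math

# Toy sketch of Eqs. (1)-(3): each label owns a disjoint set of latent
# states, and P(y|x) marginalises P(h|x) over the latent sequences
# consistent with y.
LATENT = {"c": ["c0", "c1"], "i": ["i0", "i1"]}
ALL_STATES = [s for states in LATENT.values() for s in states]

def score(h_seq, x_seq, weights):
    # Linear score Theta . f(h, x); here f counts (latent state, word) pairs.
    return sum(weights.get((h, x), 0.0) for h, x in zip(h_seq, x_seq))

def prob_label_seq(y_seq, x_seq, weights):
    # Numerator: latent sequences from H_{y_1} x ... x H_{y_m} (Eq. 2);
    # denominator: all latent sequences, the normaliser of Eq. (3).
    z = sum(math.exp(score(h, x_seq, weights))
            for h in itertools.product(ALL_STATES, repeat=len(x_seq)))
    num = sum(math.exp(score(h, x_seq, weights))
              for h in itertools.product(*[LATENT[y] for y in y_seq]))
    return num / z

weights = {("c0", "good"): 1.0, ("i0", "bad"): 1.5}
x = ["good", "bad"]
total = sum(prob_label_seq(y, x, weights)
            for y in itertools.product("ci", repeat=len(x)))
# total is 1.0: the label-sequence probabilities form a distribution
```

Because the label-specific latent sets are disjoint, the latent sequences consistent with different label sequences partition the full latent space, so the probabilities sum to one.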

Parameter Estimation
- Decoding for the test set: y* = argmax_y P(y|x, Θ*)
- Decoding algorithm: Sun and Tsujii (2009), a latent-dynamic inference (LDI) method based on A* search and dynamic programming
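As a minimal illustration of what decoding computes, the sketch below simply enumerates all label sequences and returns the highest-scoring one. The paper's LDI decoder uses A* search with dynamic programming instead; brute-force enumeration is only feasible for toy inputs, and the weight table here is hypothetical.

```python
import itertools

# Brute-force stand-in for the LDI decoder: enumerate every label
# sequence and keep the one with the highest linear score.
def decode(x_seq, weights, labels=("c", "i")):
    def seq_score(y_seq):
        return sum(weights.get((y, x), 0.0) for y, x in zip(y_seq, x_seq))
    return max(itertools.product(labels, repeat=len(x_seq)), key=seq_score)

best = decode(["good", "bad"], {("c", "good"): 1.0, ("i", "bad"): 1.0})
# best == ("c", "i")
```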

DPLVM in the Translation Error Detection Task
- Prerequisites: types of errors can be classified; each class has a specific label; the classification task can be regarded as a labelling task.
- Two classes of word label:
  - C (correct): good words get label c
  - I (incorrect): bad words get label i

Feature Set
- Word posterior probabilities (WPP): fixed-position-based WPP; flexible-position-based WPP; word-alignment-based WPP
- Lexical features: part of speech (POS); word entity
- Syntactic features: word links from the Link Grammar (LG) parser
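A hedged sketch of the per-word feature representation listed above. The concrete values (WPP scores, POS tag, link count) and the function name are placeholders invented here; in the paper they come from the SMT system's output, a POS tagger, an entity tool and the Link Grammar parser.

```python
# Assemble one word's features into a dict, one entry per feature type.
def word_features(word, pos_tag, wpp_fixed, wpp_flexible, wpp_align, n_links):
    return {
        "word": word,
        "pos": pos_tag,            # lexical feature
        "wpp_fixed": wpp_fixed,    # fixed-position-based WPP
        "wpp_flex": wpp_flexible,  # flexible-position-based WPP
        "wpp_align": wpp_align,    # word-alignment-based WPP
        "has_link": n_links > 0,   # syntactic feature from the LG parser
    }

feats = word_features("bank", "NN", 0.82, 0.79, 0.91, 2)
```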

Feature Representation

3. Experiments and Analysis

Experimental Settings – SMT system
- Language pair: Chinese-English
- Training set: NIST data set, 3.4m
- Devset: NIST MT 2006 current set
- Testset: NIST MT 2005 and 2008 sets
- SMT performance

Experimental Settings for the Error Detection Task

Data Set and Data Annotation
- Devset: translations of NIST MT-08
- Testset: translations of NIST MT-05
- Annotation: TER is used to determine the true labels for words; ratio of correct words (RCW): 37.99% for MT-08, 41.59% for MT-05

Evaluation Metrics
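The annotation and scoring steps above can be sketched as follows. This is a simplified stand-in: a hypothesis word is labelled "c" if it also occurs in the reference, else "i", whereas the paper uses a full TER alignment; exact word match is only an illustration. CER (classification error rate) is taken here as the fraction of words whose predicted label disagrees with the true label.

```python
# Derive true c/i labels from hypothesis and reference, then score a
# classifier's predictions against them.
def annotate(hyp_words, ref_words):
    ref_set = set(ref_words)
    return ["c" if w in ref_set else "i" for w in hyp_words]

def cer(pred_labels, true_labels):
    wrong = sum(p != t for p, t in zip(pred_labels, true_labels))
    return wrong / len(true_labels)

true_labels = annotate(["the", "cat", "sat"], ["the", "dog", "sat"])
# true_labels == ["c", "i", "c"]
error = cer(["c", "c", "c"], true_labels)
# error == 1/3: one of three labels is wrong
```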

Comparison (1): Classification Experiments Based on Individual Features

Comparison (2): Classification Experiment on Combined Features

Observations
- Named entities are prone to being wrongly classified
- Prepositions, conjunctions, auxiliary verbs and articles are more likely to be wrongly classified
- The proportion of notional (content) words that are wrongly classified is relatively small

4. Conclusions and Future Work

Conclusions
- Presented a new classifier, the DPLVM-based classifier, for translation error detection
- Introduced three different kinds of WPP features and three linguistic features
- Compared the MaxEnt classifier, the SVM classifier and our DPLVM classifier
- The proposed classifier performs best in terms of CER compared with the two other individual classifiers

4. Conclusions and Future Work

Future Work
- Introducing paraphrases to annotate the hypotheses
- Introducing new useful features to further improve the detection capability
- Performing experiments on more language pairs to verify our proposed method

Thanks for your attention!