Jen-Tzung Chien, Meng-Sung Wu Minimum Rank Error Language Modeling.

Outline
- Introduction
- Language model for information retrieval
- Minimum rank error model
- Experiments
- Conclusion

Introduction
- The language model is useful for investigating linguistic regularities in queries and documents for information retrieval.
- However, the accuracy of classifying queries into relevant documents does not reflect the ranks of the retrieved documents.
- MCE training is also used in IR. In the MCE procedure, the expected loss function is minimized with a probabilistic descent algorithm for optimal Bayes risk.

Introduction
- With MCE, the rate of misclassification is reduced, but the ranking result is still not consistent with the performance measure, i.e., average precision (AP).
- The minimum rank error (MRE) language model is trained by a gradient descent algorithm to obtain discriminative retrieval for training queries with minimum expected rank error loss.

Language model for information retrieval

Language model for information retrieval
- A document's terms are often too few to train a reliable maximum-likelihood (ML) model: many words are unseen in the document, leading to zero probabilities for many n-gram events.
- The smoothed language model is obtained by linear interpolation of the document model and a background model.
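A minimal sketch of this interpolation smoothing; the function name, toy data, and the weight `lam` are illustrative, not from the paper:

```python
from collections import Counter

def smoothed_prob(word, doc_tokens, bg_counts, bg_total, lam=0.5):
    """P(w|d) = lam * P_ml(w|doc) + (1 - lam) * P(w|background)."""
    p_ml = Counter(doc_tokens)[word] / len(doc_tokens)   # ML estimate from the document
    p_bg = bg_counts.get(word, 0) / bg_total             # background (collection) model
    return lam * p_ml + (1 - lam) * p_bg

doc = ["rank", "error", "training", "rank"]
bg = {"rank": 10, "error": 5, "model": 20, "training": 5}
# "model" is unseen in the document, yet gets nonzero probability:
print(smoothed_prob("model", doc, bg, sum(bg.values())))  # 0.25
```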

Minimum Classification Error Model
- MCE is a training method based on Bayes decision theory. It reduces misclassification by minimizing the expected loss with a three-step procedure.
- First, a misclassification measure is defined.
- Second, the misclassification measure is mapped to a classification error loss ranging between 0 and 1 by the sigmoid function.
- Third, the expected loss over the training data is minimized with the probabilistic descent algorithm.
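A sketch of the first two steps; the two-class discriminant-score form of the measure and the slope `gamma` are illustrative assumptions:

```python
import math

def misclassification_measure(g_correct, g_competitor):
    # Positive when a competing class outscores the correct class (an error).
    return g_competitor - g_correct

def error_loss(d, gamma=1.0):
    # Sigmoid squashes the measure into a smooth loss in (0, 1),
    # a differentiable surrogate for the 0/1 classification error.
    return 1.0 / (1.0 + math.exp(-gamma * d))

print(error_loss(misclassification_measure(2.0, 2.0)))  # 0.5 at the decision boundary
```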

Information Retrieval Measures
- The Receiver Operating Characteristic (ROC) is a measure that considers both the true positive rate and the false positive rate; the Area Under the ROC Curve (AUC) summarizes the ROC curve in a single value.
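AUC can be computed directly as the probability that a randomly chosen relevant document outscores a randomly chosen irrelevant one; a minimal sketch with illustrative scores:

```python
def auc(pos_scores, neg_scores):
    """AUC = P(score of a random relevant doc > score of a random irrelevant doc),
    counting ties as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8], [0.3, 0.1]))  # 1.0: perfect separation
```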

Minimum rank error model
- Average Precision (AP) versus rank error
- Minimum Rank Error (MRE) Model
- Implementation and Interpretation

Average Precision (AP) Versus Rank Error
- The information retrieval model could be estimated by optimizing AP directly, but minimizing the expected AP loss function is mathematically intractable.
- We therefore develop a rank error loss function in place of the classification error loss.
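A minimal AP computation illustrating the difficulty: AP changes in discrete jumps as documents swap ranks, so its expected loss has no useful gradient. The input format is illustrative:

```python
def average_precision(ranked_relevance):
    """AP over a ranked list of binary relevance judgments (1 = relevant)."""
    hits = 0
    precisions = []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)  # precision at each relevant position
    return sum(precisions) / hits if hits else 0.0

# An infinitesimal score change that swaps two ranks moves AP by a finite step.
print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 = 0.8333...
```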

Minimum Rank Error (MRE) Model

Minimum Rank Error (MRE) Model
- The rank error loss function is obtained by substituting the misranking measure into the sigmoid function.
- The expected rank error is computed over the entire training set, including all queries and their relevant documents.

Minimum Rank Error (MRE) Model
- The document model is iteratively updated by the gradient descent algorithm.
- Considering the logarithm of a bigram probability in the document model, the differentials are calculated as shown on the slide.
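Since the slide's differentials are not reproduced in the transcript, the sketch below shows only the shape of the descent update; the parameter keys, values, and gradients are hypothetical:

```python
def gradient_step(log_bigram, grads, eta=0.1):
    """Generic descent update theta <- theta - eta * dL/dtheta applied to
    log-bigram parameters; parameters without a gradient stay unchanged."""
    return {k: v - eta * grads.get(k, 0.0) for k, v in log_bigram.items()}

params = {("rank", "error"): -1.2, ("error", "rate"): -0.8}
grads = {("rank", "error"): 0.5}
print(gradient_step(params, grads))  # ("rank", "error") moves to -1.25
```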

Implementation and Interpretation
- The figure on the slide shows the procedure of MRE language model training for information retrieval.

Implementation and Interpretation
- MRE and MCE are both discriminative learning algorithms derived from the same Bayes decision theory, but they differ in two aspects.
- In performance metrics:
  – MRE minimizes the Bayes rank risk based on the rank error loss function
  – MCE minimizes the Bayes risk due to classification errors
- In use of training data:
  – The MRE model uses queries and their corresponding document lists as training samples
  – MCE considers all irrelevant documents in a rank list

Experiments

Conclusion
- Most classification systems are based on minimization of classification errors and thus do not reflect the ranking performance of retrieval systems.
- This paper focuses on the ranking problem and presents a new discriminative retrieval model.
- The experimental results also show that MRE retrieves more relevant documents at high ranks than MCE does.