Taking the Kitchen Sink Seriously: An Ensemble Approach to Word Sense Disambiguation, by Christopher Manning et al.


Overview
● 23 student WSD projects combined in a two-layer voting scheme (an ensemble of ensemble classifiers).
● Performed well on SENSEVAL-2: 4th place out of 21 supervised systems on the English Lexical Sample task.
● Offers valuable lessons for both WSD and ensemble methods in general.

System Overview
● 23 different "1st-order" classifiers.
  – Independently developed WSD systems.
  – Use a variety of algorithms (Naïve Bayes, n-gram, etc.).
● These 1st-order classifiers are combined into a variety of 2nd-order classifiers/voting mechanisms.
  – 2nd-order classifiers vary with respect to:
    ● The algorithm used to combine the 1st-order classifiers.
    ● The number of voters: each takes the top k 1st-order classifiers, where k is one of {1, 3, 5, 7, 9, 11, 13, 15}.
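As a rough structural sketch of this two-layer setup (the class names, the predict interface, and the majority-vote combiner below are illustrative assumptions, not the authors' code):

```python
from collections import Counter

class FirstOrderClassifier:
    """Stand-in for one of the 23 independently developed WSD systems."""
    def predict(self, instance):
        # Each real system returns a sense label for the instance.
        raise NotImplementedError

class SecondOrderClassifier:
    """Combines the votes of the top-k 1st-order classifiers for one word."""
    def __init__(self, ranked_first_orders, k):
        # ranked_first_orders is assumed sorted by held-out accuracy, best first.
        self.voters = ranked_first_orders[:k]

    def predict(self, instance):
        votes = [clf.predict(instance) for clf in self.voters]
        # Simplest combination scheme: unweighted majority vote.
        return Counter(votes).most_common(1)[0][0]
```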

Voting Algorithms
● Majority vote (each vote has weight 1).
● Weighted voting, with weights estimated by EM.
  – Chooses weights that maximize the likelihood of the 2nd-order training instances, where the probability of a sense (given the votes) is defined as the sum of the weighted votes for that sense.
● Maximum entropy, using features derived from the votes of the 1st-order classifiers.
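A minimal sketch of the EM-style weight estimation for the weighted-voting combiner, assuming the votes have already been reduced to a 0/1 "voted for the true sense" matrix (the function and variable names are hypothetical, and this is a reconstruction of the model described above, not the authors' code):

```python
import numpy as np

def em_vote_weights(correct, n_iter=50, eps=1e-12):
    """Estimate per-classifier vote weights by EM.

    correct: (n_instances, n_classifiers) NumPy 0/1 matrix; correct[i, j] = 1
    if classifier j voted for the true sense of training instance i.
    The model scores a sense by the total weight of the classifiers that
    voted for it, so the likelihood of instance i is sum_j w_j * correct[i, j].
    """
    n, m = correct.shape
    w = np.full(m, 1.0 / m)                  # start from uniform weights
    for _ in range(n_iter):
        lik = correct @ w + eps              # per-instance likelihood
        resp = (correct * w) / lik[:, None]  # E-step: responsibilities
        w = resp.mean(axis=0)                # M-step: updated mixture weights
    return w
```

At prediction time each candidate sense is then scored by the total weight of the classifiers voting for it, and the highest-scoring sense wins.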

Classifier Construction Process
● For each word:
  – Train each 1st-order classifier on ¾ of the training data.
  – Use the remaining ¼ of the data to rank the performance of the 1st-order classifiers.
  – For each 2nd-order classifier:
    ● Take the top k 1st-order classifiers for this word.
    ● Train the 2nd-order classifier on ¾ of the training data using this ensemble.
  – Rank the performance of the 2nd-order classifiers on the remaining ¼ of the training data.
  – Take the top 2nd-order classifier as the classifier for this word, and retrain it on all the training data.
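The per-word selection loop might look roughly like this (a sketch under assumed fit/score/refit interfaces, where fit returns the trained classifier; none of these names come from the paper):

```python
def build_word_classifier(train_data, first_order_factories, combiner_factories):
    """Select and train the final 2nd-order classifier for a single word."""
    split = int(0.75 * len(train_data))
    dev, held_out = train_data[:split], train_data[split:]

    # Train each 1st-order system on 3/4 of the data, rank on the remaining 1/4.
    first_orders = [make().fit(dev) for make in first_order_factories]
    first_orders.sort(key=lambda clf: clf.score(held_out), reverse=True)

    # Build 2nd-order classifiers over the top-k voters, for several k values
    # and combination algorithms (majority, EM-weighted, MaxEnt, ...).
    second_orders = [
        make_combiner(first_orders[:k]).fit(dev)
        for make_combiner in combiner_factories
        for k in (1, 3, 5, 7, 9, 11, 13, 15)
    ]

    # Keep the best 2nd-order classifier, then retrain it on all the data.
    best = max(second_orders, key=lambda clf: clf.score(held_out))
    return best.refit(train_data)
```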

Results
● 61.7% accuracy in the SENSEVAL-2 competition (4th place).
● After the competition, performance was improved:
  – Global performance (i.e., over all words) was used as a tie breaker when ranking both 1st- and 2nd-order classifiers.
  – Accuracy improved to 63.9% (which would have placed 2nd).

Results for 2nd-Order Classifiers
● Results are averaged over all words.
● Note MaxEnt's ability to resist dilution as the number of voters grows.

Evaluating the Effects of Combination
● We want different classifiers to make different mistakes.
● We can measure this differentiation ("error independence") as the average, over all pairs of 1st-order classifiers, of the fraction of errors that the pair shares: the smaller the shared fraction, the more independent the errors.
● As error independence and word difficulty grow, so does the advantage of combination.
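One way to compute such a pairwise shared-error measure (a sketch only; the paper's exact normalization is not given here, so intersection-over-union is an assumption, as are the names):

```python
from itertools import combinations

def avg_shared_error_fraction(predictions, gold):
    """Average, over all pairs of classifiers, of the fraction of errors they share.

    predictions: dict mapping classifier name -> list of predicted senses
    gold: list of true senses, aligned with each prediction list
    A lower value means the classifiers make more independent errors.
    """
    fractions = []
    for a, b in combinations(predictions, 2):
        errs_a = {i for i, p in enumerate(predictions[a]) if p != gold[i]}
        errs_b = {i for i, p in enumerate(predictions[b]) if p != gold[i]}
        union = errs_a | errs_b
        if union:
            # Shared errors as a fraction of all errors made by the pair.
            fractions.append(len(errs_a & errs_b) / len(union))
    return sum(fractions) / len(fractions) if fractions else 0.0
```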

Lessons for WSD
● Every word is a separate problem.
  – Every 1st- and 2nd-order classifier had some words on which it did best.
● Implementation details:
  – Large or small context windows work better than medium-sized windows.
  – This suggests that senses are determined both at a very local, collocational level and at a very general, topical level.
  – Smoothing is very important.

Lessons for Ensemble Methods
● Variety within the ensemble is desirable.
  – Qualitatively different approaches are better than minor perturbations of similar approaches.
  – We can measure (e.g., via error independence) the extent to which this ideal is achieved.
● Variety in combination algorithms also helps.
  – In particular, it can help with overfitting, because different algorithms start to overtrain at different points.