Comparing Word Relatedness Measures Based on Google n-grams Aminul ISLAM, Evangelos MILIOS, Vlado KEŠELJ Faculty of Computer Science Dalhousie University, Halifax, Canada COLING 2012

Introduction
● Word relatedness has a wide range of applications:
  – IR: image retrieval, query expansion, …
  – Paraphrase recognition
  – Malapropism detection and correction
  – Automatic creation of thesauri
  – Speech recognition
  – …

Introduction
● Methods can be categorized into three groups:
  – Corpus-based
    ● Supervised
    ● Unsupervised
  – Knowledge-based
    ● Relies on semantic resources such as WordNet
  – Hybrid

Introduction
● This paper focuses on unsupervised corpus-based measures
● Six measures are compared

Problem
● Unsupervised corpus-based measures usually rely on co-occurrence statistics, mostly word n-grams and their frequencies
  – Co-occurrence statistics are corpus-specific
  – Most corpora do not provide co-occurrence statistics, so they cannot be used on-line
  – Some measures use web search results, but those results vary from time to time

Motivation
● How can different measures be compared fairly?
● Observation:
  – All of the measures use co-occurrence statistics
  – A corpus that provides co-occurrence information, e.g. the Google n-grams, is probably a good common resource

Google n-grams
● A publicly available corpus with:
  – Co-occurrence statistics (uni-grams up to 5-grams)
  – A large volume of web text
● Digitized books: over 5.2 million books published since 1500
● Data format:
  – ngram TAB year TAB match_count TAB volume_count
  – e.g.: analysis is often described as
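The record layout above is easy to process. A minimal sketch (Python; the sample rows and their counts are made up for illustration) that sums match_count over years to obtain the total frequency C of an n-gram:

```python
from collections import defaultdict

def aggregate_counts(lines):
    """Sum match_count over all years for each n-gram.

    Each input line follows the Google n-grams record layout:
    ngram TAB year TAB match_count TAB volume_count
    """
    totals = defaultdict(int)
    for line in lines:
        ngram, _year, match_count, _volume_count = line.rstrip("\n").split("\t")
        totals[ngram] += int(match_count)
    return dict(totals)

# Hypothetical rows for the 5-gram "analysis is often described as"
sample = [
    "analysis is often described as\t1991\t10\t8",
    "analysis is often described as\t1992\t14\t10",
]
counts = aggregate_counts(sample)  # {"analysis is often described as": 24}
```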

Another Motivation
● To find an indirect mapping between the Google n-grams and web search results
  – Thus the measures might still be used on-line

What About WordNet?
● In 2006, Budanitsky and Hirst evaluated five knowledge-based measures using WordNet
  – Creating a resource like WordNet requires a lot of effort
  – Its word coverage is not sufficient for many NLP tasks
  – The resource is language-specific, while the Google n-grams cover more than 10 languages

Notations
● C(w1 … wn)
  – Frequency of the n-gram
● D(w1 … wn)
  – Number of web documents containing the n-gram (up to 5-grams)
● M(w1, w2)
  – C(w1 wi w2)

Notations
● μ(w1, w2)
  – ½ [ C(w1 wi w2) + C(w2 wi w1) ]
● N
  – Number of documents used in the Google n-grams
● |V|
  – Number of uni-grams in the Google n-grams
● Cmax
  – Maximum frequency in the Google n-grams

Assumptions
● Some measures use web search results and co-occurrence information not provided by the Google n-grams; however:
  – C(w1) ≥ D(w1)
  – C(w1 w2) ≥ D(w1 w2)
● This is because uni-grams and bi-grams may occur multiple times in a single document

Assumptions
● Considering the lower limits:
  – C(w1) ≈ D(w1)
  – C(w1 w2) ≈ D(w1 w2)

Measures
● Jaccard coefficient
● Simpson coefficient

Measures
● Dice coefficient
● Pointwise mutual information (PMI)

Measures
● Normalized Google Distance (NGD) variation

Measures
● Relatedness based on tri-grams (RT)
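The formula images for these slides did not survive the transcript. As a reference, here is a minimal Python sketch of the first five measures using their standard textbook definitions over n-gram counts (c1 = C(w1), c2 = C(w2), c12 = C(w1 w2), n = N); the function names are mine, and the paper's Google-n-gram adaptations, in particular the NGD variation and the tri-gram-based RT measure, differ in details not recoverable here:

```python
import math

# Counts from the corpus: C(w) uni-gram frequency, C(w1 w2) bi-gram
# frequency, n total number of documents. Per the assumptions above,
# document frequencies D are approximated by the frequencies C.

def jaccard(c1, c2, c12):
    # |A ∩ B| / |A ∪ B|
    return c12 / (c1 + c2 - c12)

def simpson(c1, c2, c12):
    # Overlap coefficient: normalize by the smaller set
    return c12 / min(c1, c2)

def dice(c1, c2, c12):
    return 2 * c12 / (c1 + c2)

def pmi(c1, c2, c12, n):
    # Log of observed co-occurrence over that expected under independence
    return math.log2((c12 * n) / (c1 * c2))

def ngd(c1, c2, c12, n):
    # Normalized Google Distance (Cilibrasi & Vitanyi); the paper uses
    # a variation with n-gram counts in place of page hits
    lc1, lc2, lc12 = math.log(c1), math.log(c2), math.log(c12)
    return (max(lc1, lc2) - lc12) / (math.log(n) - min(lc1, lc2))
```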

Evaluation
● Compare with human judgments
  – This is considered to be the upper limit
● Evaluate the measures with respect to a particular application
  – Evaluating the relatedness of words
● Text similarity

Comparison with Human Judgments
● Rubenstein and Goodenough's 65 word pairs
  – 51 people rated 65 pairs of English words on a scale of 0.0 to 4.0
● Miller and Charles' 28 noun pairs
  – The R&G set restricted to 30 pairs, judged by 38 humans
  – Most researchers use 28 pairs because 2 were omitted from an early version of WordNet
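Agreement with human judgments on these data sets is typically reported as a correlation coefficient. A self-contained sketch (Python; the rating lists are made-up stand-ins for R&G-style data) of the Pearson correlation between a measure's scores and mean human ratings:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores: mean human ratings (0.0-4.0 scale, as in R&G)
# against one relatedness measure's scores for the same word pairs
human = [3.92, 3.84, 0.45, 1.10]
measure = [0.95, 0.90, 0.12, 0.33]
r = pearson(human, measure)
```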

Result

Application-based Evaluation
● TOEFL's 80 synonym questions
  – Given a problem word, e.g. infinite, and four alternatives, limitless, relative, unusual, and structural, choose the most related word
● ESL's 50 synonym questions
  – The same task as the TOEFL synonym questions
  – Except that the questions come from English as a Second Language tests
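On the synonym questions, a relatedness measure answers by scoring each alternative against the problem word and picking the highest. A sketch (Python; the score table is a toy stand-in for any of the six measures):

```python
def answer_synonym_question(problem_word, alternatives, relatedness):
    """Pick the alternative most related to the problem word.

    relatedness: any word-pair scoring function, e.g. one of the six
    measures compared in the paper.
    """
    return max(alternatives, key=lambda w: relatedness(problem_word, w))

# Toy relatedness table standing in for a real measure's scores
scores = {("infinite", "limitless"): 0.80, ("infinite", "relative"): 0.20,
          ("infinite", "unusual"): 0.10, ("infinite", "structural"): 0.05}
toy_measure = lambda a, b: scores.get((a, b), 0.0)

best = answer_synonym_question(
    "infinite", ["limitless", "relative", "unusual", "structural"], toy_measure)
```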

Result

Text Similarity
● Find the similarity between two text items
● Plug each relatedness measure into a single text similarity measure, and evaluate the results of that text similarity measure on a standard data set
● 30 sentence pairs from one of the most widely used data sets were used
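As a rough illustration of lifting word relatedness to text similarity (a simplification; the actual text similarity measure of Islam and Inkpen (2008) used in this evaluation weights words and combines string- and corpus-based similarity), match each word to its best counterpart in the other text and average symmetrically:

```python
def text_similarity(words1, words2, relatedness):
    """Average best-match word relatedness, symmetrized over both texts.

    A simplified stand-in for the text similarity measure used in the
    evaluation; relatedness is any word-pair scoring function.
    """
    def one_way(a, b):
        return sum(max(relatedness(w, v) for v in b) for w in a) / len(a)
    return (one_way(words1, words2) + one_way(words2, words1)) / 2

# Toy relatedness: exact match only, standing in for a real measure
identical = lambda a, b: 1.0 if a == b else 0.0
sim = text_similarity(["a", "b"], ["a", "b"], identical)
```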

Result

● Pearson correlation coefficient with mean human similarity ratings:
  – Ho et al. (2010) used a WordNet-based measure and applied those scores in the method of Islam and Inkpen (2008), achieving
  – Tsatsaronis et al. (2010) achieved
  – Islam et al. (2012) achieved
● The improvement over Ho et al. (2010) is statistically significant at the 0.05 level

Conclusion
● Any measure that uses n-gram statistics can easily be applied to the Google n-gram corpus, and be fairly evaluated against existing work on standard data sets for different tasks
● An indirect mapping of co-occurrence statistics between the Google n-gram corpus and a web search engine was found, using some assumptions

Conclusion
● Measures based on n-grams are language-independent
  – They can be implemented for any other language that has a sufficiently large n-gram corpus