1 Measuring the Semantic Similarity of Texts Author : Courtney Corley and Rada Mihalcea Source : ACL-2005 Reporter : Yong-Xiang Chen.

Presentation transcript:

1 Measuring the Semantic Similarity of Texts Author : Courtney Corley and Rada Mihalcea Source : ACL-2005 Reporter : Yong-Xiang Chen

2 Outline
- Introduction
- Semantic Similarity of Words
- Semantic Similarity of Texts
- A Walk-Through Example
- Evaluation
- Conclusion

3 Introduction
- Measures of text similarity have been used for information retrieval (IR), text classification, word sense disambiguation (WSD), automatic evaluation of machine translation, and text summarization.
- The typical approach is to use a simple lexical matching method and produce a similarity score.
- But most lexical text similarity metrics fail on texts such as:
  - "I own a dog"
  - "I have an animal"
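The failure of lexical matching on the two example sentences can be illustrated with a small sketch. The slide does not say which lexical measure is meant, so the Jaccard overlap of word sets used here is an assumption, and `lexical_overlap` is a hypothetical helper name.

```python
def lexical_overlap(text_a: str, text_b: str) -> float:
    """Jaccard overlap of lowercase word sets (one possible lexical matching score)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    # Shared words divided by all distinct words in either text.
    return len(words_a & words_b) / len(words_a | words_b)


# Only the word "i" is shared, although the sentences are closely related in meaning.
score = lexical_overlap("I own a dog", "I have an animal")
print(f"lexical overlap: {score:.3f}")
```

The score stays low because "own"/"have" and "dog"/"animal" are counted as plain mismatches, which is the gap a knowledge-based measure is meant to close.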

4 Introduction (cont.)
- LSA measures the similarity between texts by also taking into account similar terms found in large text collections.
- In this paper, we explore a knowledge-based method for measuring the semantic similarity of texts.
- There are several methods for finding the semantic similarity of words.
- We combine these methods into a text-to-text semantic similarity method.

5 Semantic Similarity of Words
- The Leacock & Chodorow similarity (Leacock and Chodorow, 1998)
  - length: the length of the shortest path between two concepts
  - D: the maximum depth of the taxonomy
- The Wu and Palmer similarity (Wu and Palmer, 1994)
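The formulas themselves are not part of the transcript. As these two measures are usually stated in the literature, Leacock & Chodorow is -log(length / (2 * D)) and Wu & Palmer is 2 * depth(LCS) / (depth(c1) + depth(c2)), where LCS is the least common subsumer of the two concepts. The sketch below assumes the path length and depths have already been read off a taxonomy such as WordNet, and that path length is counted in nodes so that it is at least 1; the function names are hypothetical.

```python
import math


def lch_similarity(path_length: int, max_depth: int) -> float:
    """Leacock & Chodorow: -log(length / (2 * D))."""
    if path_length <= 0 or max_depth <= 0:
        raise ValueError("path_length and max_depth must be positive")
    return -math.log(path_length / (2.0 * max_depth))


def wup_similarity(depth_c1: int, depth_c2: int, depth_lcs: int) -> float:
    """Wu & Palmer: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    if depth_c1 + depth_c2 <= 0:
        raise ValueError("concept depths must be positive")
    return 2.0 * depth_lcs / (depth_c1 + depth_c2)


# Hypothetical values: a path of 2 nodes in a taxonomy of maximum depth 10,
# and two concepts at depths 5 and 7 whose LCS sits at depth 4.
print(lch_similarity(path_length=2, max_depth=10))
print(wup_similarity(depth_c1=5, depth_c2=7, depth_lcs=4))
```

Shorter paths give a higher Leacock & Chodorow score, and a deeper (more specific) common ancestor gives a higher Wu & Palmer score.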

6 Semantic Similarity of Words (cont.)
- A measure based on the information content (IC) of the LCS (least common subsumer)
  - P(c): the probability of encountering an instance of concept c in a large corpus
- Lin's metric (Lin, 1998)
- Jiang & Conrath (Jiang and Conrath, 1997)
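The three information-content measures can be sketched as follows, using their usual textbook forms: IC(c) = -log P(c); the first measure (usually attributed to Resnik) is IC(LCS); Lin is 2 * IC(LCS) / (IC(c1) + IC(c2)); and Jiang & Conrath is 1 / (IC(c1) + IC(c2) - 2 * IC(LCS)). The probabilities in the example are made up for illustration, and the handling of zero denominators is an assumption of this sketch.

```python
import math


def information_content(probability: float) -> float:
    """IC(c) = -log P(c); rarer concepts carry more information."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)


def resnik_similarity(ic_lcs: float) -> float:
    """IC of the least common subsumer (LCS)."""
    return ic_lcs


def lin_similarity(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(LCS) / (IC(c1) + IC(c2))."""
    denominator = ic_c1 + ic_c2
    # Assumption: return 0.0 when both concepts carry no information.
    return 2.0 * ic_lcs / denominator if denominator > 0 else 0.0


def jcn_similarity(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Jiang & Conrath: 1 / (IC(c1) + IC(c2) - 2 * IC(LCS))."""
    distance = ic_c1 + ic_c2 - 2.0 * ic_lcs
    # Assumption: identical concepts (distance 0) map to infinity.
    return math.inf if distance == 0 else 1.0 / distance


# Made-up corpus probabilities for two concepts and their LCS.
ic_dog = information_content(0.001)
ic_cat = information_content(0.002)
ic_lcs = information_content(0.01)
print(resnik_similarity(ic_lcs))
print(lin_similarity(ic_dog, ic_cat, ic_lcs))
print(jcn_similarity(ic_dog, ic_cat, ic_lcs))
```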

7 Language Models
- Language models are used to account for the distribution of words in language.
- We take into account the specificity of words:
  - specific words such as "collie" and "sheepdog" receive a higher weight
  - generic words such as "go" and "be" are given less importance
- Term frequency (TF) does not always constitute a good measure of word importance.
- The distribution of words across an entire collection can be a good indicator of the specificity of the words: inverse document frequency (IDF).
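A minimal sketch of how IDF separates specific from generic words is given below. The slide does not give a formula, so the common variant log(N / df) is assumed here, and the four-document collection is invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(documents: List[List[str]]) -> Dict[str, float]:
    """idf(w) = log(N / df(w)), where df(w) counts the documents containing w."""
    n_documents = len(documents)
    document_frequency: Counter = Counter()
    for tokens in documents:
        # Count each word at most once per document.
        document_frequency.update(set(tokens))
    return {
        word: math.log(n_documents / count)
        for word, count in document_frequency.items()
    }


# Invented toy collection: "the" is in every document, "collie" in only one.
toy_documents = [
    ["the", "collie", "runs"],
    ["the", "dog", "is", "here"],
    ["the", "cat", "is", "here"],
    ["the", "sheepdog", "is", "there"],
]
idf = inverse_document_frequency(toy_documents)
print(idf["collie"], idf["is"], idf["the"])
```

Words that appear in every document get a weight of zero, while words confined to a single document get the highest weight.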

8 Semantic Similarity of Texts
- A directional measure of semantic similarity indicates the semantic similarity of a text segment T_i with respect to a text segment T_j.
- Build sets of open-class words for each text: nouns, verbs, adjectives, and adverbs.
- Determine pairs of similar words across the sets corresponding to the same open class in the two texts.
- For nouns and verbs, we use a measure based on WordNet.
- For the other word classes, we apply lexical matching.

9 Semantic Similarity of Texts (cont.)
- maxSim: for each word in T_i, the highest semantic similarity to a word of the same class in T_j, according to one of the six word-similarity methods.
- If this similarity is greater than 0, the word is added to the set of similar words for the corresponding word class, WS_pos.
- The resulting score, computed with respect to T_i, is between 0 and 1.
- A bidirectional similarity combines the scores obtained in the two directions.
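The scoring procedure can be sketched as below. This is an illustrative reading, not the authors' implementation: it assumes the directional score is the IDF-weighted sum of maxSim values divided by the summed IDF of the source words, that the bidirectional score is the average of the two directions, and that words missing from the IDF table get weight 1.0. The word-similarity function is a toy lookup with made-up values.

```python
from typing import Callable, Dict, List

WordSets = Dict[str, List[str]]  # word class -> words of that class


def directional_similarity(
    source: WordSets,
    target: WordSets,
    word_similarity: Callable[[str, str], float],
    idf: Dict[str, float],
) -> float:
    """IDF-weighted average of maxSim over the source words."""
    weighted_sum = 0.0
    weight_total = 0.0
    for word_class, words in source.items():
        candidates = target.get(word_class, [])
        for word in words:
            weight = idf.get(word, 1.0)  # assumption: default weight 1.0
            max_sim = max((word_similarity(word, c) for c in candidates), default=0.0)
            weighted_sum += max_sim * weight
            weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else 0.0


def text_similarity(a: WordSets, b: WordSets, word_similarity, idf) -> float:
    """Bidirectional score: average of the two directional scores."""
    return 0.5 * (
        directional_similarity(a, b, word_similarity, idf)
        + directional_similarity(b, a, word_similarity, idf)
    )


# Toy similarity: exact match scores 1.0, otherwise a made-up lookup table.
TOY_TABLE = {frozenset(("dog", "animal")): 0.8}


def toy_similarity(w1: str, w2: str) -> float:
    return 1.0 if w1 == w2 else TOY_TABLE.get(frozenset((w1, w2)), 0.0)


text_1 = {"noun": ["dog", "bone"]}
text_2 = {"noun": ["animal"]}
print(directional_similarity(text_1, text_2, toy_similarity, {}))
print(directional_similarity(text_2, text_1, toy_similarity, {}))
print(text_similarity(text_1, text_2, toy_similarity, {}))
```

The two directional scores differ because "bone" has no counterpart in the second text, which is why the two directions are combined.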

10 A Walk-Through Example
- First, the text segments are tokenized and POS-tagged.
- The words are then inserted into their word-class sets.
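The grouping step might look like the sketch below. It assumes the tokens have already been tagged with Penn Treebank style tags by some external tagger; the tag-prefix mapping and the hand-tagged example are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed mapping from the first two characters of a Penn Treebank tag
# to the word classes used by the method.
TAG_PREFIX_TO_CLASS = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
    "CD": "cardinal",
}


def build_word_class_sets(tagged_tokens: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (word, tag) pairs by word class, dropping closed-class words."""
    word_sets: Dict[str, List[str]] = defaultdict(list)
    for word, tag in tagged_tokens:
        word_class = TAG_PREFIX_TO_CLASS.get(tag[:2])
        if word_class is not None:
            word_sets[word_class].append(word.lower())
    return dict(word_sets)


# Hand-tagged example sentence (the tags are assumptions for illustration).
tagged = [("I", "PRP"), ("own", "VBP"), ("a", "DT"), ("dog", "NN")]
word_sets = build_word_class_sets(tagged)
print(word_sets)
```

Pronouns and determiners fall outside the mapping and are discarded, so only the words that take part in the similarity computation remain.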

11 A Walk-Through Example (cont.)
- We seek a WordNet-based semantic similarity for nouns and verbs.
- We use only lexical matching for adjectives, adverbs, and cardinals.

12 A Walk-Through Example (cont.)
- We use
- The semantic similarity with respect to text 1 as
- With respect to text 2 as
- A bidirectional measure of similarity:

13 Evaluation
- Goal: test the effectiveness of the text semantic similarity metric.
- Task: automatically identify whether two text segments are paraphrases of each other.
- Corpora:
  - The Microsoft paraphrase corpus: 4,076 training pairs and 1,725 test pairs
  - PASCAL corpus: 580 development pairs and 800 test pairs
- Two settings:
  - An unsupervised setting: a threshold of 0.5
  - A supervised setting: the optimal threshold and the weights associated with the various similarity methods are determined through learning on training data
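The two settings can be contrasted with a short sketch. The unsupervised rule is assumed to label a pair as a paraphrase when its score exceeds 0.5. For the supervised setting, a plain grid search over candidate thresholds stands in for the learning procedure, which the slides do not describe, and it does not learn per-method weights. The scores and labels are invented.

```python
from typing import List, Sequence


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a pair as a paraphrase when its similarity exceeds the threshold."""
    return [score > threshold for score in scores]


def accuracy(predictions: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def best_threshold(scores, gold, candidates) -> float:
    """Grid search (a stand-in for learning): pick the most accurate threshold."""
    return max(candidates, key=lambda t: accuracy(classify(scores, t), gold))


# Invented similarity scores and gold paraphrase labels.
train_scores = [0.9, 0.7, 0.55, 0.3]
train_gold = [True, True, False, False]

unsupervised_accuracy = accuracy(classify(train_scores, 0.5), train_gold)
learned = best_threshold(train_scores, train_gold, [i / 10 for i in range(1, 10)])
supervised_accuracy = accuracy(classify(train_scores, learned), train_gold)
print(unsupervised_accuracy, learned, supervised_accuracy)
```

On this toy data the fixed threshold misclassifies the pair scored 0.55, while a threshold tuned on labeled data separates the two classes.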

14 Evaluation (cont.)
- Three baselines:
  - Randomly choosing a true or false value for each text pair
  - A lexical matching measure, which counts the number of matching words
  - A measure using tf*idf
- Paraphrase identification: "The dog is eating the bone" -> "The bone is being eaten by the dog"
- Entailment identification: "I can see a dog" -> "I can see an animal"
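Two of the baselines can be sketched as follows. The slides do not spell out the tf*idf baseline, so cosine similarity between tf*idf vectors is an assumption here, as are the default IDF weight of 1.0 and the fixed random seed.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence


def random_baseline(n_pairs: int, seed: int = 0) -> List[bool]:
    """Pick True or False with equal probability for each text pair."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n_pairs)]


def tfidf_cosine(tokens_a: Sequence[str], tokens_b: Sequence[str], idf: Dict[str, float]) -> float:
    """Cosine similarity between tf*idf vectors of two token lists."""
    # Assumption: words missing from the idf table get weight 1.0.
    vec_a = {w: tf * idf.get(w, 1.0) for w, tf in Counter(tokens_a).items()}
    vec_b = {w: tf * idf.get(w, 1.0) for w, tf in Counter(tokens_b).items()}
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


active = "the dog eats the bone".split()
passive = "the bone is eaten by the dog".split()
cosine = tfidf_cosine(active, passive, idf={})
print(random_baseline(5))
print(f"tf*idf cosine: {cosine:.3f}")
```

With uniform weights the cosine reduces to a term-frequency comparison; supplying corpus-derived IDF values would reduce the influence of frequent words such as "the".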

15 Evaluation (cont.)

16 Evaluation (cont.)

17 Conclusion
- The accuracy of text semantic similarity for paraphrase identification: 68.8% and 71.5%.
- For the entailment data set, the accuracy of 58.3% is better than that of the PASCAL entailment evaluation (Dagan et al., 2005).
- Our method relies on a bag-of-words approach.
- It improves significantly over the traditional methods.
- But it ignores many important relationships in sentence structure.