Asymmetric Word Similarity Behrad Assadian Trevor Martin Ben Azvine.



Introduction
An approach to understanding text documents.
Captures the semantics of textual information.
Builds a matrix of word similarity.
Applicable to a particular domain.
Uses a corpus of textual documents.
Resolves issues encountered by traditional methods.
Can be used to measure document similarity and to cluster documents.

The meaning of an unknown word can often be guessed from its context (Pantel, P. and Lin, D.):
A bottle of Tezguno is on the table.
Everyone likes Tezguno.
Tezguno makes you drunk.
We make Tezguno out of corn.
By the Distributional Hypothesis, it can be deduced that "Tezguno" is a type of alcoholic drink.

Asymmetric Word Similarity Matrix
Based on identifying the frequencies of n-grams of context words, e.g. c1-x-c2 is represented as x:([c1,c2]).
Consider:
The quick brown fox jumps over the lazy dog.
The quick brown cat jumps onto the active dog.
The slow brown fox jumps onto the quick brown cat.
The quick brown cat leaps over the quick brown fox.
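
The context counting on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `context_counts`, the lowercasing, and the choice to take triples only within a sentence (no padding at sentence boundaries) are all assumptions.

```python
from collections import Counter, defaultdict

SENTENCES = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown cat jumps onto the active dog.",
    "The slow brown fox jumps onto the quick brown cat.",
    "The quick brown cat leaps over the quick brown fox.",
]

def context_counts(sentences):
    """Count (left, right) neighbour pairs per word, i.e. x:([c1,c2]).

    Assumption: words are lowercased and triples never cross a sentence
    boundary, so the first and last word of a sentence get no context.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().rstrip(".").split()
        # Slide a window of three tokens: c1 - x - c2.
        for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
            counts[word][(left, right)] += 1
    return counts

contexts = context_counts(SENTENCES)
print(dict(contexts["brown"]))
```

Under these assumptions, "brown" occurs three times between "quick" and "cat", twice between "quick" and "fox", and once between "slow" and "fox".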

Convert frequencies to fuzzy sets
A fuzzy set represents the context of a word, e.g. for brown: {(quick,cat):1, (quick,fox):0.833, (slow,fox):0.50}
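
The slide does not state the conversion rule. One rule that reproduces the memberships shown for "brown" is the frequency-to-fuzzy-set transformation commonly used with mass assignments, where each membership is the sum of min(p_i, p_j) over all relative frequencies p_j. The sketch below assumes that rule; `to_fuzzy_set` is a hypothetical name, and the counts are those obtained by hand from the four example sentences.

```python
def to_fuzzy_set(counts):
    """Turn raw context counts into fuzzy memberships.

    Assumed rule (not given on the slide): mu_i = sum_j min(p_i, p_j),
    where p_i is the relative frequency of context i.
    """
    total = sum(counts.values())
    probs = {ctx: n / total for ctx, n in counts.items()}
    return {
        ctx: sum(min(p, q) for q in probs.values())
        for ctx, p in probs.items()
    }

# Context counts for "brown" in the four example sentences.
brown_counts = {("quick", "cat"): 3, ("quick", "fox"): 2, ("slow", "fox"): 1}
brown_fuzzy = to_fuzzy_set(brown_counts)

for ctx, mu in brown_fuzzy.items():
    print(ctx, round(mu, 3))
```

With relative frequencies 1/2, 1/3 and 1/6, this gives memberships 1, 5/6 (about 0.833) and 1/2, matching the fuzzy set on the slide.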

Mass assignment followed by semantic unification is carried out.
The result is given as a single probability value.
For two words w1 and w2, Pr(w1|w2) is the degree to which w1 could replace w2.
Performing every possible semantic unification gives the word similarity matrix.
Many elements will be zero.
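
The slides give no formulas for these two steps. The sketch below uses one common formulation from mass assignment theory, which may differ in detail from the authors' version: a normalised fuzzy set is split into nested focal sets whose masses are the gaps between successive membership levels, and point semantic unification is Pr(f|g) = sum over focal sets F, G of m_f(F) * m_g(G) * |F ∩ G| / |G|. The function names are hypothetical, and the two fuzzy sets are hand-derived from the four example sentences under the assumptions that contexts stay within a sentence and that the min-sum conversion is used.

```python
def mass_assignment(fuzzy):
    """Split a normalised fuzzy set (max membership 1) into nested focal sets.

    Each distinct membership level yields the set of elements at or above
    that level, with mass equal to the gap down to the next lower level.
    """
    levels = sorted({mu for mu in fuzzy.values() if mu > 0}, reverse=True)
    masses = {}
    for i, level in enumerate(levels):
        lower = levels[i + 1] if i + 1 < len(levels) else 0.0
        focal = frozenset(x for x, mu in fuzzy.items() if mu >= level)
        masses[focal] = level - lower
    return masses

def semantic_unification(f, g):
    """Pr(f|g): degree to which the word behind f could replace the one behind g."""
    m_f, m_g = mass_assignment(f), mass_assignment(g)
    return sum(
        mf * mg * len(F & G) / len(G)
        for F, mf in m_f.items()
        for G, mg in m_g.items()
    )

# Hand-derived context fuzzy sets for "fox" and "cat" (see assumptions above).
fox = {("brown", "jumps"): 1.0}
cat = {("brown", "jumps"): 1.0, ("brown", "leaps"): 1.0}

pr_fox_given_cat = semantic_unification(fox, cat)
pr_cat_given_fox = semantic_unification(cat, fox)
print(pr_fox_given_cat, pr_cat_given_fox)
```

Here "cat" covers every context in which "fox" was seen, while "fox" covers only half of the contexts of "cat", so the two directions give different values. That difference is what makes the matrix asymmetric.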

Document Clustering
Documents can be clustered using the AWS matrix.
Other known methods: the Vector Space Model.
Limitation: it relies on string matching, so the relation between words such as taxi and cab could be ignored.
Document similarity matrix: the distance between two documents can be identified.
Cluster files around a starting file.
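
The string-matching limitation can be illustrated with a small comparison. The slides do not say how document similarity is computed from the AWS matrix, so `soft_similarity` below is purely a hypothetical stand-in (average best replacement score per word), and the values in `word_sim` are made up for illustration.

```python
import math
from collections import Counter

def cosine(doc_a, doc_b):
    """Bag-of-words cosine similarity: only identical strings contribute."""
    a, b = Counter(doc_a), Counter(doc_b)
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def soft_similarity(doc_a, doc_b, word_sim):
    """Hypothetical measure: for each word in doc_a, take its best match in doc_b.

    word_sim[(w1, w2)] stands for Pr(w1|w2); identical words score 1.
    """
    if not doc_a:
        return 0.0
    total = 0.0
    for wa in doc_a:
        total += max(
            (1.0 if wa == wb else word_sim.get((wa, wb), 0.0) for wb in doc_b),
            default=0.0,
        )
    return total / len(doc_a)

# Made-up asymmetric similarities, for illustration only.
word_sim = {("taxi", "cab"): 0.9, ("cab", "taxi"): 0.7}

doc_a = ["the", "taxi", "arrived"]
doc_b = ["the", "cab", "arrived"]

print(cosine(doc_a, doc_b))
print(soft_similarity(doc_a, doc_b, word_sim))
print(soft_similarity(doc_b, doc_a, word_sim))
```

In the cosine score, "taxi" and "cab" contribute nothing because the strings differ. A measure that consults a word similarity matrix can credit the pair, and it inherits the matrix's asymmetry.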

Results
Film descriptions; reviews of movies.
Tested using WordNet and inspection.
Synonyms/antonyms identified.
Close hypernyms identified.
Exhaustive search: total antonyms/synonyms/hypernyms that exist but were not identified.
Hit rates of 67%, 28% and 30%.

Clustering results
Movie review corpus.
Clustered results can be compared.
A threshold value can be set.

Proposed a method for clustering documents using Asymmetric Word Similarity.
Results using WordNet prove encouraging.
Using context to determine semantics can be effective.
Further comparison with other common methods must be carried out.
Performance issues for large corpora must be addressed.