Expressing Implicit Semantic Relations without Supervision (ACL 2006)

2 Abstract
For a given input word pair X:Y with unspecified semantic relations:
– The output is a list of patterns, ranked according to how well each pattern P_i expresses the relations between X and Y.
– For example, for X = ostrich and Y = bird: "X is the largest Y" and "Y such as X".
An unsupervised learning algorithm:
– Mines large text corpora for patterns
– Sorts the patterns by pertinence

3 Introduction
Hearst (1992): "Y such as the X"
– X is a hyponym (type) of Y
– Useful for building a thesaurus
Berland and Charniak (1999): "Y's X" and "X of the Y"
– X is a meronym (part) of Y
– Useful for building a lexicon or ontology, like WordNet
This paper addresses the inverse of this problem:
– Given a word pair X:Y with some unspecified semantic relations,
– mine a large text corpus for lexico-syntactic patterns that express the implicit relations between X and Y.

4 Introduction
A corpus of web pages: 5 × 10^10 English words
– From co-occurrences of the pair ostrich:bird in this corpus:
  516 patterns of the form "X … Y"
  452 patterns of the form "Y … X"
Main challenges:
– To find a way of ranking the patterns
– To find a way to empirically evaluate the performance

5 Pertinence - 1/3
mason:stone vs. carpenter:wood
– a pair of word pairs with a high degree of relational similarity
Assumption:
– There is a measure of the relational similarity between pairs of words, sim_r(X_1:Y_1, X_2:Y_2).
– Let W = {X_1:Y_1, …, X_n:Y_n} be a set of word pairs.
– Let P = {P_1, …, P_m} be a set of patterns.
The pertinence of pattern P_i to a word pair X_j:Y_j is the expected relational similarity between a word pair X_k:Y_k, drawn from W according to p(X_k:Y_k | P_i), and the pair X_j:Y_j.

6 Pertinence - 2/3
Let f_{k,i} be the number of occurrences of the word pair X_k:Y_k with the pattern P_i.
pertinence(X_j:Y_j, P_i) = Σ_{k=1..n} p(X_k:Y_k | P_i) · sim_r(X_j:Y_j, X_k:Y_k)
– The two terms are the conditional probability p(X_k:Y_k | P_i) and the relational similarity sim_r(X_j:Y_j, X_k:Y_k).

7 Pertinence - 3/3
– Assume p(X_j:Y_j) = 1/n for all pairs in W.
– Taking p(X_j:Y_j) = 1/n acts as a form of Laplace smoothing when estimating the probabilities.
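For concreteness, the conditional probability used above can be written out as follows. This is a hedged reconstruction from slides 6-7 and step 10 of the algorithm (Bayes' theorem with the uniform prior over pairs); the add-one smoothing of the counts is an assumption, since the slide does not give the exact smoothing formula.

```latex
% Hedged reconstruction of p(X_k:Y_k | P_i); the add-one (Laplace) smoothing
% of the counts f_{k,i} is an assumption, not taken verbatim from the slides.
p(P_i \mid X_k\!:\!Y_k) \approx \frac{f_{k,i} + 1}{\sum_{i'=1}^{m}\bigl(f_{k,i'} + 1\bigr)},
\qquad
p(X_k\!:\!Y_k \mid P_i) =
  \frac{p(P_i \mid X_k\!:\!Y_k)\, p(X_k\!:\!Y_k)}
       {\sum_{j=1}^{n} p(P_i \mid X_j\!:\!Y_j)\, p(X_j\!:\!Y_j)},
\qquad p(X_k\!:\!Y_k) = \tfrac{1}{n}.
```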

8 The Algorithm
Goal:
– Input: a set of word pairs W = {X_1:Y_1, …, X_n:Y_n}
– Output: a ranked list of patterns for each input pair
1. Find phrases:
– Corpus: 5 × 10^10 English words
– List the phrases that begin with X_i and end with Y_i
– Also build a list for the opposite order
– One to three intervening words between X_i and Y_i
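A minimal sketch of this phrase-finding step, assuming the corpus is available as plain text; the function name find_phrases and the regex approach are illustrative rather than the paper's implementation, and the suffix-matching relaxation mentioned on the next slide is not implemented here.

```python
import re

def find_phrases(corpus_text, x, y, max_gap=3):
    """Collect phrases that begin with x and end with y with 1-3 intervening words.

    A sketch under stated assumptions, not the paper's implementation:
    exact token matches, plain-text corpus, case-insensitive search.
    """
    pattern = re.compile(
        r'\b' + re.escape(x) + r'((?:\s+\w+){1,' + str(max_gap) + r'})\s+' + re.escape(y) + r'\b',
        re.IGNORECASE,
    )
    return [x + m.group(1) + ' ' + y for m in pattern.finditer(corpus_text)]

# Phrases are collected in both orders: X ... Y and Y ... X.
text = "Some people say the ostrich is the largest bird in the world."
print(find_phrases(text, "ostrich", "bird"))   # ['ostrich is the largest bird']
print(find_phrases(text, "bird", "ostrich"))   # []
```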

9 The Algorithm
– The first and last words in the phrase do not need to match X_i and Y_i exactly (different suffixes are allowed).
2. Generate patterns:
– For example, the phrase "carpenter nails the wood" yields:
  X nails the Y
  X nails * Y
  X * the Y
  X * * Y
– X_i comes first and Y_i last, or vice versa.
– Do not allow duplicate patterns in a list.
– Pattern frequency plays the role of term frequency in IR.
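A small sketch of how a single phrase can be expanded into the wildcard patterns shown above; generate_patterns is a hypothetical helper, and deduplication across all phrases for a pair would still be needed, as the slide notes.

```python
from itertools import product

def generate_patterns(phrase):
    """Expand one phrase into wildcard patterns (sketch of step 2).

    Assumption: the phrase already begins with the X word and ends with the
    Y word; each intervening word may independently be kept or replaced by '*'.
    """
    middle = phrase.split()[1:-1]                        # intervening words only
    patterns = set()                                     # set removes duplicates
    for mask in product([True, False], repeat=len(middle)):
        body = [w if keep else '*' for w, keep in zip(middle, mask)]
        patterns.add(' '.join(['X'] + body + ['Y']))
    return sorted(patterns)

print(generate_patterns("carpenter nails the wood"))
# ['X * * Y', 'X * the Y', 'X nails * Y', 'X nails the Y']
```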

10 The Algorithm
3. Count pair frequency:
– The pair frequency of a pattern (analogous to document frequency in IR) is the number of lists that contain the given pattern.
4. Map pairs to rows:
– For each pair X_i:Y_i, create a row for X_i:Y_i and another row for Y_i:X_i.
5. Map patterns to columns:
– For each unique pattern of the form "X … Y" (from step 2), create a column, and another column with X and Y swapped, "Y … X".
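A sketch of how steps 4-6 might be wired together once the co-occurrence counts have been gathered; the input format (a dict keyed by pair and pattern strings) and the helper name build_matrix are assumptions made for illustration.

```python
from scipy.sparse import dok_matrix

def build_matrix(pair_pattern_counts, pairs, patterns):
    """Assemble the sparse pair-by-pattern matrix (sketch of steps 4-6).

    Assumption: pair_pattern_counts maps ("X:Y", "pattern") -> frequency and
    already contains both orders of each pair and of each pattern.
    """
    row_of = {p: i for i, p in enumerate(pairs)}        # step 4: pairs -> rows
    col_of = {q: j for j, q in enumerate(patterns)}     # step 5: patterns -> columns
    X = dok_matrix((len(pairs), len(patterns)))         # step 6: sparse matrix
    for (pair, pattern), f in pair_pattern_counts.items():
        X[row_of[pair], col_of[pattern]] = f            # pattern frequency x_ij
    return X.tocsr()
```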

11 The Algorithm
6. Build a sparse matrix:
– Build a matrix X whose value x_ij is the pattern frequency of the j-th pattern for the i-th word pair.
7. Calculate entropy:
– Weight each entry as log(x_ij) * H(P), where H(P) = -Σ_{x∈X} p(x) log_2 p(x).
8. Apply SVD (singular value decomposition):
– SVD is used to reduce noise and compensate for sparseness.
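A hedged sketch of the log/entropy weighting in step 7: the slide only gives log(x_ij) * H(P), so the column-wise entropy used below, and log(1 + x_ij) to avoid log 0, are assumptions in the spirit of standard LSA-style weighting rather than the paper's exact formula.

```python
import numpy as np

def log_entropy(X):
    """Apply a log/entropy transform to the pair-by-pattern matrix (sketch of step 7)."""
    X = np.asarray(X, dtype=float)           # for the real sparse matrix, use X.toarray() first
    col_sums = X.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # avoid division by zero for empty columns
    P = X / col_sums                         # empirical p(word pair | pattern) per column
    with np.errstate(divide='ignore', invalid='ignore'):
        H = -np.sum(np.where(P > 0, P * np.log2(P), 0.0), axis=0)   # H(P) for each column
    return np.log1p(X) * H                   # log(1 + x_ij) * H(P_j)
```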

12 The Algorithm
– X = U Σ V^T, where U and V are in column-orthonormal form and Σ is a diagonal matrix of singular values.
– If X is of rank r, then Σ is also of rank r.
– Let Σ_k (k < r) be the diagonal matrix formed from the top k singular values, and let U_k and V_k be the matrices produced by selecting the corresponding columns from U and V.
– k = 300
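A sketch of the rank-k truncation with k = 300 as on the slide, using a dense SVD for clarity; for the actual sparse matrix one would use a sparse solver such as scipy.sparse.linalg.svds instead.

```python
import numpy as np

def truncated_svd(X, k=300):
    """Compute the rank-k SVD truncation X ~ U_k Σ_k V_k^T (sketch of step 8)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]                  # top-k left singular vectors
    Sk = np.diag(s[:k])            # Σ_k: top-k singular values on the diagonal
    Vk = Vt[:k, :].T               # top-k right singular vectors
    return Uk, Sk, Vk
```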

13 The Algorithm
9. Calculate cosines:
– sim_r(X_1:Y_1, X_2:Y_2) is given by the cosine of the angle between their corresponding row vectors in the matrix U_k Σ_k V_k^T.
10. Calculate conditional probabilities:
– Compute p(X_k:Y_k | P_i) using Bayes' theorem and the raw frequency data.
11. Calculate pertinence:
– Combine the conditional probabilities and relational similarities as on slide 6: pertinence(X_j:Y_j, P_i) = Σ_{k=1..n} p(X_k:Y_k | P_i) · sim_r(X_j:Y_j, X_k:Y_k).
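Putting steps 9-11 together, a minimal sketch that computes cosine similarities between the row vectors of U_k Σ_k V_k^T and then takes the expectation against the conditional probabilities; the argument names and the array layout (cond_prob[i, k] holding p(X_k:Y_k | P_i)) are assumptions for illustration.

```python
import numpy as np

def pertinence_scores(Uk, Sk, Vk, cond_prob):
    """Rank patterns by pertinence (sketch of steps 9-11).

    Returns a matrix whose entry [i, j] approximates pertinence(X_j:Y_j, P_i).
    """
    R = Uk @ Sk @ Vk.T                                   # smoothed pair-by-pattern matrix
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Rn = R / norms
    sim = Rn @ Rn.T                                      # step 9: cosine sim_r between row vectors
    return cond_prob @ sim                               # step 11: expected relational similarity
```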

14 Experiments with Word Analogies
374 college-level SAT analogy questions
– Example question: ostrich:bird, with choices (a) lion:cat (b) goose:flock (c) ewe:sheep (d) cub:bear (e) primate:monkey
– Rows: 374 × 6 × 2 = 4488; pairs that do not co-occur in the corpus are dropped, leaving fewer rows.
– Columns: 1,706,845 patterns (3,413,690 columns); dropping all patterns with a frequency less than ten leaves 42,032 patterns (84,064 columns).
– Density is 0.91%.


17 Skip 15 SAT questions.
– f: pattern frequency
– F: maximum f
– n: pair frequency
– N: total number of word pairs

18 Experiments with Noun-Modifiers - 1/3
A set of 600 noun-modifier pairs; 5 general classes of labels with 30 subclasses
– flu virus: causality relation (the flu is caused by a virus)
– causality (storm cloud), temporality (daily exercise), spatial (desert storm), participant (student protest), and quality (expensive book)
Matrix:
– 1184 rows and 33,698 columns
– Density is 2.57%.

19 Experiments with Noun-Modifiers - 2/3
Leave-one-out cross-validation (see the sketch below):
– The testing set consists of a single noun-modifier pair and the training set consists of the 599 remaining noun-modifiers.
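A generic sketch of the leave-one-out protocol described on this slide; the slide does not say how each held-out pair is classified, so the single-nearest-neighbour rule by relational similarity used below is an assumption.

```python
import numpy as np

def loo_accuracy(sim, labels):
    """Leave-one-out evaluation over the 600 noun-modifier pairs (sketch).

    Assumptions: sim[i, j] is the relational similarity between pairs i and j,
    labels[i] is the class of pair i, and each held-out pair is labelled by
    its most similar pair among the remaining ones.
    """
    sim = np.asarray(sim, dtype=float)
    n = len(labels)
    correct = 0
    for i in range(n):                       # each pair is the test set once
        scores = sim[i].copy()
        scores[i] = -np.inf                  # exclude the held-out pair itself
        nearest = int(np.argmax(scores))     # nearest neighbour in the training set
        correct += int(labels[nearest] == labels[i])
    return correct / n
```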

20 Experiments with Noun-Modifiers - 3/3

21 Conclusion
– The extracted patterns express how word pairs are similar (how the words in a pair are related).
– The main contribution of this paper is the idea of pertinence.
– Although the performance on the SAT analogy questions (54.6%) is near the level of the average senior high school student (57%), there is room for improvement.