Kai-Wei Chang Joint work with Scott Wen-tau Yih, Chris Meek Microsoft Research

Build an intelligent system that can interact with humans using natural language. Research challenge: a meaning representation of text that supports useful inferential tasks. Semantic word representation is the foundation: language is compositional, and the word is the basic semantic unit.

A lot of popular methods for creating word vectors! Vector Space Model [Salton & McGill 83], Latent Semantic Analysis [Deerwester+ 90], Latent Dirichlet Allocation [Blei+ 01], Deep Neural Networks [Collobert & Weston 08]. These encode term co-occurrence information and measure semantic similarity well.

(Illustration: groups of related words, e.g., {sunny, rainy, windy, cloudy}; {car, wheel, cab}; {sad, joy, emotion, feeling}.)

Tomorrow will be rainy. Tomorrow will be sunny.

Can't we just use the existing linguistic resources? Knowledge in these resources is never complete, and they often lack the degree of a relation. Goal: create a continuous semantic representation that leverages existing rich linguistic resources, discovers new relations, and enables us to measure the degree of multiple relations (not just similarity).

Introduction; Background: Latent Semantic Analysis (LSA); Polarity Inducing LSA (PILSA); Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor, tensor decomposition & measuring degree of a relation; Experiments; Conclusions.


Data representation: encode single-relational data in a matrix, e.g., co-occurrence (from a general corpus) or synonyms (from a thesaurus). Factorization: apply SVD to the matrix to find latent components. Measuring degree of relation: cosine of latent vectors.

Input: synonyms from a thesaurus, e.g., Joyfulness: joy, gladden; Sad: sorrow, sadden. Each target word (group) is a row vector and each term is a column vector; similarity is the cosine score between vectors.

                        joy  gladden  sorrow  sadden  goodwill
Group 1: "joyfulness"    1      1        0       0       0
Group 2: "sad"           0      0        1       1       0
Group 3: "affection"     0      0        0       0       1
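A minimal sketch (not the talk's implementation) of this LSA pipeline on the toy matrix above, using numpy; the rank k=2 is an illustrative assumption.

```python
import numpy as np

# Toy synonym matrix from the table above: rows are target-word groups,
# columns are terms; a 1 means the term appears in that group.
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
W = np.array([
    [1, 1, 0, 0, 0],   # Group 1: "joyfulness"
    [0, 0, 1, 1, 0],   # Group 2: "sad"
    [0, 0, 0, 0, 1],   # Group 3: "affection"
], dtype=float)

# Factorization: rank-k SVD to find latent components.
k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
term_vecs = (np.diag(s[:k]) @ Vt[:k]).T      # one latent vector per term

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Measuring degree of relation: cosine of the latent vectors.
print(cosine(term_vecs[terms.index("joy")], term_vecs[terms.index("gladden")]))  # high
print(cosine(term_vecs[terms.index("joy")], term_vecs[terms.index("sorrow")]))   # low
```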

LSA cannot distinguish antonyms [Landauer 2002]. "Distinguishing synonyms and antonyms is still perceived as a difficult open problem." [Poon & Domingos 09]

Data representation: encode two opposite relations in a matrix using "polarity", e.g., synonyms & antonyms from a thesaurus. Factorization: apply SVD to the matrix to find latent components. Measuring degree of relation: cosine of latent vectors.

Inducing polarity: antonyms of a group receive negative entries. Joyfulness: joy, gladden; antonyms: sorrow, sadden. Sad: sorrow, sadden; antonyms: joy, gladden. Target word: row vector.

                        joy  gladden  sorrow  sadden  goodwill
Group 1: "joyfulness"    1      1       -1      -1       0
Group 2: "sad"          -1     -1        1       1       0
Group 3: "affection"     0      0        0       0       1

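A minimal sketch of the polarity trick on the same toy data: synonyms of a group get +1, antonyms get -1, and after the SVD a strongly negative cosine signals antonymy (the most negative candidate would be the closest opposite). The data and rank are illustrative assumptions, not the Encarta matrix.

```python
import numpy as np

terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
# Polarity-induced matrix: +1 for synonyms of the group, -1 for its antonyms.
W = np.array([
    [ 1,  1, -1, -1, 0],   # joyfulness: joy, gladden; antonyms: sorrow, sadden
    [-1, -1,  1,  1, 0],   # sad: sorrow, sadden; antonyms: joy, gladden
    [ 0,  0,  0,  0, 1],   # affection: goodwill
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
term_vecs = (np.diag(s[:k]) @ Vt[:k]).T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Synonyms keep a high positive cosine; antonyms now come out strongly negative.
print(cosine(term_vecs[terms.index("joy")], term_vecs[terms.index("gladden")]))  # ~ +1
print(cosine(term_vecs[terms.index("joy")], term_vecs[terms.index("sadden")]))   # ~ -1
```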

Limitation of the matrix representation: each entry captures a particular type of relation between two entities, or two opposite relations with the polarity trick. How do we encode other binary relations, such as Is-A (hyponym: an ostrich is a bird) or part-whole (an engine is a part of a car)? Encode multiple relations in a 3-way tensor (3-dimensional array)!

Data representation: encode multiple relations in a tensor, e.g., synonyms, antonyms, hyponyms (is-a), ... from a linguistic knowledge base. Factorization: apply tensor decomposition to the tensor to find latent components. Measuring degree of relation: cosine of latent vectors after projection.


Represent word relations using a tensor: each slice encodes a relation between terms and target words. Construct a tensor with two slices. (Illustration: a synonym layer and an antonym layer, each a matrix over terms {gladden, joy, sadden, feeling} and target words {joyfulness, gladden, sad, anger}.)

Can encode multiple relations in the tensor. (Illustration: an additional hyponym layer over the same terms and target words.)
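A minimal sketch of how such a tensor could be laid out in numpy, one slice per relation over shared term and target-word axes; the tiny vocabulary, the particular entries, and the assumed is-a direction are illustrative, not the Encarta/WordNet data.

```python
import numpy as np

terms = ["gladden", "joy", "sadden", "feeling"]
targets = ["joyfulness", "gladden", "sad", "anger"]
relations = ["synonym", "antonym", "hyponym"]

# 3-way tensor: W[i, j, k] = 1 if relation k holds between term i and target word j.
W = np.zeros((len(terms), len(targets), len(relations)))

def set_rel(term, target, relation):
    W[terms.index(term), targets.index(target), relations.index(relation)] = 1.0

# Synonym slice
set_rel("gladden", "joyfulness", "synonym")
set_rel("joy", "joyfulness", "synonym")
set_rel("gladden", "gladden", "synonym")
set_rel("sadden", "sad", "synonym")
# Antonym slice
set_rel("sadden", "joyfulness", "antonym")
set_rel("gladden", "sad", "antonym")
set_rel("joy", "sad", "antonym")
# Hyponym (is-a) slice; here the target word is taken to be a kind of the term.
set_rel("feeling", "joyfulness", "hyponym")
set_rel("feeling", "anger", "hyponym")

print(W.shape)      # (4, 4, 3): terms x target words x relations
print(W[:, :, 0])   # the synonym layer
```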


Derive a low-rank approximation to generalize the data and to discover unseen relations. Apply Tucker decomposition and reformulate the results so that each relation slice factors as W_k ≈ A R_k B^T, where the shared matrices A and B hold the latent representations of words (terms and target words) and R_k is the latent representation of relation k.
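A minimal HOSVD-style sketch of this Tucker step (one standard way to compute a Tucker decomposition, not necessarily the paper's exact procedure); the random stand-in tensor and the ranks are illustrative assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Higher-order SVD: one orthonormal factor matrix per mode plus a core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    # Core = T x_0 U0^T x_1 U1^T x_2 U2^T
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Small random 0/1 tensor standing in for the relation tensor (terms x targets x relations).
rng = np.random.default_rng(0)
W = (rng.random((6, 5, 3)) > 0.7).astype(float)
core, (A, B, C) = hosvd(W, ranks=(3, 3, 2))

# Each relation slice k is approximated as  A @ R_k @ B.T,
# where R_k = core x_2 C[k, :] is the latent matrix of relation k.
R = np.tensordot(core, C, axes=([2], [1]))    # shape: (3, 3, n_relations)
approx_slice0 = A @ R[:, :, 0] @ B.T
print(np.linalg.norm(W[:, :, 0] - approx_slice0))   # reconstruction error of slice 0
```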


Similarity: cosine of the latent vectors. Other relations (both symmetric and asymmetric): take the latent matrix of the pivot relation (synonym), take the latent matrix of the relation, and compute the cosine of the latent vectors after projection.

(Illustration: the synonym layer and the antonym layer of the tensor, over terms {gladden, joy, sadden, feeling} and target words {joyfulness, gladden, sad, anger}.)


(Illustration: the synonym layer alongside the hypernym layer, over the same terms and target words.)

(Illustration: the synonym layer, used as the pivot, and the slice of the specific relation.)

Degree of relation rel between w1 and w2: cos( R_syn v_w1 , R_rel v_w2 ), i.e., the cosine of the two latent word vectors after projecting w1 through the latent matrix of the pivot (synonym) slice and w2 through the latent matrix of the specific relation.
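A minimal sketch of this scoring rule, reusing the naming of the HOSVD sketch above (B holds one latent vector per target word, R[:, :, k] is the latent matrix of relation k); the synthetic stand-in matrices make the snippet self-contained and are not learned from real data.

```python
import numpy as np

def relation_score(w1, w2, rel, syn, B, R):
    """Degree to which relation `rel` holds between target words w1 and w2:
    cosine of the latent vectors after projecting w1 through the pivot
    (synonym) slice and w2 through the slice of the target relation."""
    u = R[:, :, syn] @ B[w1]      # w1 projected through the synonym layer
    v = R[:, :, rel] @ B[w2]      # w2 projected through the relation layer
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Tiny synthetic stand-ins with the same shapes as in the HOSVD sketch:
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))      # 5 target words, rank-3 latent space
R = rng.standard_normal((3, 3, 3))   # 3 relations, e.g., syn=0, ant=1, hyp=2

print(relation_score(0, 1, rel=1, syn=0, B=B, R=R))   # antonymy score for words 0 and 1
```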

Introduction; Background: Latent Semantic Analysis (LSA); Polarity Inducing LSA (PILSA); Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor, tensor decomposition & measuring degree of a relation; Experiments; Conclusions.

Encarta Thesaurus: records synonyms and antonyms of target words; vocabulary of 50k terms and 47k target words. WordNet: has synonym, antonym, hyponym, and hypernym relations; vocabulary of 149k terms and 117k target words. Goals: show that MRLSA generalizes LSA to model multiple relations, and improve performance by combining heterogeneous data.

Target      High-Score Words
inanimate   alive, living, bodily, in-the-flesh, incarnate
alleviate   exacerbate, make-worse, in-flame, amplify, stir-up
relish      detest, abhor, abominate, despise, loathe
* Words in blue are antonyms listed in the Encarta thesaurus.

Task: GRE closest-opposite questions. Example: which is the closest opposite of adulterate? (a) renounce (b) forbid (c) purify (d) criticize (e) correct. (Results chart: accuracy of each method on this task.)
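A minimal sketch of how a closest-opposite question could be answered with polarity-induced vectors: score each candidate by its cosine to the target and pick the most negative one. The tiny 2-d vectors are synthetic stand-ins, not the actual PILSA embeddings.

```python
import numpy as np

def closest_opposite(target, candidates, vecs):
    """Pick the candidate whose vector has the lowest (most negative) cosine
    with the target; with polarity-induced vectors, that is the best antonym."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return min(candidates, key=lambda c: cos(vecs[target], vecs[c]))

# Toy 2-d vectors standing in for learned embeddings (illustrative only).
vecs = {
    "adulterate": np.array([ 1.0,  0.2]),
    "renounce":   np.array([ 0.1,  1.0]),
    "forbid":     np.array([ 0.3,  0.9]),
    "purify":     np.array([-1.0, -0.1]),
    "criticize":  np.array([ 0.4, -0.2]),
    "correct":    np.array([-0.2,  0.8]),
}
print(closest_opposite("adulterate",
                       ["renounce", "forbid", "purify", "criticize", "correct"],
                       vecs))   # -> "purify"
```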

Target       High-Score Words
bird         ostrich, gamecock, nighthawk, amazon, parrot
automobile   minivan, wagon, taxi, minicab, gypsy cab
vegetable    buttercrunch, yellow turnip, romaine, chipotle, chilli

(Results chart: accuracy.)

Relational domain knowledge: the entity type. A relation can only hold between the right types of entities; words having an is-a relation have the same part of speech, and for the relation born-in the entity types are (person, location). Leverage type information to improve MRLSA. Idea #3: change the objective function.

Continuous semantic representation that leverages existing rich linguistic resources, discovers new relations, and enables us to measure the degree of multiple relations. Approaches: better data representation; matrix/tensor decomposition. Challenges & future work: capture more types of knowledge in the model; support more sophisticated inferential tasks.