
Representation Learning for Word, Sense, Phrase, Document and Knowledge
Natural Language Processing Lab, Tsinghua University
Yu Zhao, Xinxiong Chen, Yankai Lin, Yang Liu, Zhiyuan Liu, Maosong Sun

Contributors: Yu Zhao, Xinxiong Chen, Yankai Lin, Yang Liu

ML = Representation + Objective + Optimization

Good Representation is Essential for Good Machine Learning

Representation Learning: raw data → learned representations → machine learning systems. Yoshua Bengio. Deep Learning of Representations. AAAI 2013 Tutorial.

Roadmap: Unstructured Text → Word Representation → Sense Representation → Phrase Representation → Document Representation → Knowledge Representation → NLP Tasks (Tagging / Parsing / Understanding)


Typical Approaches for Word Representation
1-hot representation: the basis of the bag-of-words model
star [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …]
sun [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, …]
sim(star, sun) = 0
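To make the sim(star, sun) = 0 point concrete, here is a minimal sketch in Python; the vocabulary indices and size are made up for illustration.

```python
import numpy as np

vocab = {"sun": 7, "star": 8}  # hypothetical index assignments
N = 13                         # toy vocabulary size

def one_hot(word):
    v = np.zeros(N)
    v[vocab[word]] = 1.0
    return v

star, sun = one_hot("star"), one_hot("sun")
cos = star @ sun / (np.linalg.norm(star) * np.linalg.norm(sun))
print(cos)  # 0.0 -- distinct 1-hot vectors share no dimensions
```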

Typical Approaches for Word Representation
Count-based distributional representation. Issues: (1) it involves many design choices (which weighting scheme? which similarity measure?); (2) going from word to sentence representations is non-trivial, and no clear intuitions exist.

Distributed Word Representation Each word is represented as a dense and real-valued vector in a low-dimensional space

Typical Models of Distributed Representation Neural Language Model Yoshua Bengio. A neural probabilistic language model. JMLR 2003.

Typical Models of Distributed Representation word2vec Tomas Mikolov et al. Distributed representations of words and phrases and their compositionality. NIPS 2013.
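As an illustration of how such embeddings are trained in practice, here is a minimal skip-gram sketch using gensim (≥4.0); the toy corpus and hyperparameters are assumptions for demonstration, not the talk's setup.

```python
from gensim.models import Word2Vec

corpus = [
    ["the", "sun", "is", "a", "star"],
    ["the", "star", "shines", "like", "the", "sun"],
]
model = Word2Vec(sentences=corpus, vector_size=50, window=5,
                 min_count=1, sg=1)  # sg=1 selects skip-gram
vec = model.wv["star"]               # a dense, real-valued vector
print(model.wv.similarity("star", "sun"))  # nonzero, unlike 1-hot
```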

Word Relatedness

Semantic Space Encodes Implicit Relationships between Words: W("China") − W("Beijing") ≈ W("Japan") − W("Tokyo")
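With pretrained vectors loaded into gensim, this analogy can be queried directly via vector arithmetic; the path "vectors.bin" is a hypothetical placeholder for a word2vec-format file.

```python
from gensim.models import KeyedVectors

# "vectors.bin" is a hypothetical path to pretrained word2vec vectors
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
# W("China") ≈ W("Beijing") + W("Japan") − W("Tokyo")
print(kv.most_similar(positive=["Beijing", "Japan"],
                      negative=["Tokyo"], topn=1))
```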

Applications: Semantic Hierarchy Extraction Fu, Ruiji, et al. Learning semantic hierarchies via word embeddings. ACL 2014.

Applications: Cross-lingual Joint Representation Zou, Will Y., et al. Bilingual word embeddings for phrase-based machine translation. EMNLP 2013.

Applications: Visual-Text Joint Representation Richard Socher, et al. Zero-Shot Learning Through Cross-Modal Transfer. ICLR 2013.

Re-search, Re-invent: word2vec ≈ matrix factorization (MF). Neural language models and count-based distributional representations (SVD over co-occurrence statistics) converge on the same idea. Levy and Goldberg. Neural word embedding as implicit matrix factorization. NIPS 2014.
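The equivalence Levy and Goldberg prove can be stated compactly: at the optimum, skip-gram with negative sampling (SGNS) implicitly factorizes a word-context matrix of shifted pointwise mutual information.

```latex
% SGNS optimum (Levy & Goldberg, NIPS 2014): word and context
% vectors factorize a shifted PMI matrix.
\[
  \vec{w} \cdot \vec{c} \;=\; \mathrm{PMI}(w, c) - \log k,
  \qquad
  \mathrm{PMI}(w, c) \;=\; \log \frac{P(w, c)}{P(w)\,P(c)},
\]
% where k is the number of negative samples per positive pair.
```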

Roadmap (next: Sense Representation): Unstructured Text → Word → Sense → Phrase → Document → Knowledge → NLP Tasks

Word Sense Representation: a single surface form such as "Apple" covers multiple senses (the fruit vs. the company)

Multiple Prototype Methods J. Reisinger and R. Mooney. Multi-prototype vector-space models of word meaning. HLT-NAACL 2010. E Huang, et al. Improving word representations via global context and multiple word prototypes. ACL 2012.

Nonparametric Methods Neelakantan et al. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. EMNLP 2014.

Joint Modeling of WSD and WSR: e.g., in "Jobs founded Apple", choosing the right sense of each word (WSD) and learning sense vectors (WSR) reinforce each other. Chen Xinxiong, et al. A Unified Model for Word Sense Representation and Disambiguation. EMNLP 2014.

Joint Modeling of WSD and WSR

Joint Modeling of WSD and WSR: WSD results on two domain-specific datasets

Roadmap (next: Phrase Representation): Unstructured Text → Word → Sense → Phrase → Document → Knowledge → NLP Tasks

Phrase Representation
For high-frequency phrases, learn a phrase representation by treating the phrase as a pseudo word: Los Angeles → los_angeles
But many phrases are infrequent, and new phrases keep emerging
We therefore build a phrase representation from its component words, based on the semantic composition nature of language
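For the high-frequency case, a common way to build such pseudo words is collocation detection, e.g. with gensim's Phrases; this sketch is an assumption about tooling rather than the talk's pipeline, and the tiny thresholds are for the toy corpus only.

```python
from gensim.models.phrases import Phrases

sentences = [["los", "angeles", "is", "sunny"],
             ["i", "flew", "to", "los", "angeles"]]
# min_count/threshold are deliberately tiny for this toy corpus
bigrams = Phrases(sentences, min_count=1, threshold=1)
print(bigrams[["i", "love", "los", "angeles"]])
# expected: ['i', 'love', 'los_angeles'] -- one trainable pseudo word
```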

Semantic Composition for Phrase Representation: compose the vectors of "neural" and "network" into a vector for the phrase "neural network"

Semantic Composition for Phrase Representation: heuristic operations vs. the tensor-vector model. Zhao Yu, et al. Phrase Type Sensitive Tensor Indexing Model for Semantic Composition. AAAI 2015.
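For reference, the heuristic baselines are simple to state in numpy; the tensor-vector model of the talk additionally learns phrase-type-sensitive tensor parameters, which this sketch does not attempt to reproduce (vectors are stand-ins).

```python
import numpy as np

u = np.array([0.2, -0.1, 0.5])  # stand-in vector for "neural"
v = np.array([0.4, 0.3, -0.2])  # stand-in vector for "network"

p_add = u + v   # additive composition:       p = u + v
p_mul = u * v   # multiplicative composition: p = u * v (element-wise)
```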

Semantic Composition for Phrase Representation: model parameters

Evaluation with Phrase Similarity
Compare the system's ranking with human judgments via the Spearman correlation coefficient
Our model (Tensor Indexing Model, TIM) achieves the best correlation
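The evaluation protocol itself is a one-liner with scipy; the scores below are made-up numbers purely to show the mechanics.

```python
from scipy.stats import spearmanr

human = [9.1, 7.5, 3.2, 1.0]       # averaged human ratings per pair
system = [0.83, 0.70, 0.41, 0.05]  # model cosine similarities (made up)
rho, p_value = spearmanr(human, system)
print(rho)  # 1.0 here: the system ranks the pairs exactly like humans
```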

Visualization for Phrase Representation

Roadmap (next: Document Representation): Unstructured Text → Word → Sense → Phrase → Document → Knowledge → NLP Tasks

Documents as Symbols for Document Representation (DR)

Semantic Composition for DR: CNN

Semantic Composition for DR: RNN

Document Representation Models
Replicated Softmax: An Undirected Topic Model (NIPS 2010)
A Deep Architecture for Matching Short Texts (NIPS 2013)
Modeling Documents with a Deep Boltzmann Machine (UAI 2013)
A Convolutional Neural Network for Modelling Sentences (ACL 2014)
Distributed Representations of Sentences and Documents (ICML 2014)
Convolutional Neural Network Architectures for Matching Natural Language Sentences (NIPS 2014)

Topic Model: collapsed Gibbs sampling assigns each word in a document to a topic by approximate inference
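Concretely, collapsed Gibbs sampling resamples each word's topic from its conditional distribution given all other assignments (standard LDA notation, not slide-specific):

```latex
% Collapsed Gibbs sampling for LDA: resample the topic z_i of word i
% given all other assignments z_{-i}.
\[
  P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
  \frac{n^{-i}_{d,k} + \alpha}{\sum_{k'} \bigl(n^{-i}_{d,k'} + \alpha\bigr)}
  \cdot
  \frac{n^{-i}_{k,w_i} + \beta}{n^{-i}_{k,\cdot} + V\beta}
\]
% n_{d,k}: topic counts in document d; n_{k,w}: word-topic counts;
% V: vocabulary size; alpha, beta: Dirichlet hyperparameters.
```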

Topical Word Representation Liu Yang, et al. Topical Word Embeddings. AAAI 2015.

Context-Aware Word Similarity: measure word similarity in specific contexts. SCWS: 2,003 pairs of words with contexts

Text Classification Multi-class text classification on 20NewsGroup (20K docs)

Roadmap (next: Knowledge Representation): Unstructured Text → Word → Sense → Phrase → Document → Knowledge → NLP Tasks

Knowledge Bases and Knowledge Graphs
Knowledge is structured as a graph: each node is an entity, each edge a relation
A fact is a triple (head, relation, tail): head = subject entity, relation = relation type, tail = object entity
Typical knowledge bases: WordNet (linguistic KB), Freebase (world KB)

Research Issues
KGs are far from complete, so we need relation extraction: extraction from text = information extraction; extraction from the KG itself = knowledge graph completion
Issues: KGs are hard to manipulate directly
High dimensionality: 10^5–10^8 entities, 10^7–10^9 relations; sparse: few valid links; noisy and incomplete
Approach: encode KGs into low-dimensional vector spaces

Typical Models: Neural Tensor Network (NTN), an energy-based model

TransE: Modeling Relations as Translations For each (head, relation, tail), relation works as a translation from head to tail

TransE: Modeling Relations as Translations. For each (head, relation, tail), learn embeddings such that h + r ≈ t
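A minimal numpy sketch of TransE's scoring function and margin-based ranking loss, with random stand-in embeddings rather than trained ones:

```python
import numpy as np

dim = 50
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, dim))  # head, relation, tail embeddings
t_neg = rng.normal(size=dim)         # corrupted (negative) tail

def score(h, r, t):
    # plausibility of (h, r, t): smaller ||h + r - t|| = more plausible
    return np.linalg.norm(h + r - t)

margin = 1.0
loss = max(0.0, margin + score(h, r, t) - score(h, r, t_neg))
# training would minimize this loss over many triples via SGD
```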

Link Prediction Performance On Freebase15K:

The Issue of TransE: it has difficulty modeling many-to-many relations

Modeling Entities/Relations in Different Spaces: encode entities and relations in different spaces, and use a relation-specific matrix to project entities into the relation space. Lin Yankai, et al. Learning Entity and Relation Embeddings for Knowledge Graph Completion. AAAI 2015.

Modeling Entities/Relations in Different Spaces: for each (head, relation, tail), project h and t with the relation-specific matrix M_r and make h M_r + r ≈ t M_r
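In the notation of the AAAI 2015 paper (TransR), entities are first mapped into the relation-specific space and then translated:

```latex
% TransR (Lin et al., AAAI 2015): project entities with M_r,
% then apply the relation translation in that space.
\[
  \mathbf{h}_r = \mathbf{h} M_r, \qquad
  \mathbf{t}_r = \mathbf{t} M_r, \qquad
  f_r(h, t) = \lVert \mathbf{h}_r + \mathbf{r} - \mathbf{t}_r \rVert_2^2
\]
```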

Cluster-based TransR (CTransR)

Evaluation: Link Prediction. Which genre is the movie WALL-E? (WALL-E, _has_genre, ?)

Evaluation: Link Prediction. Which genre is the movie WALL-E? Top predictions for (WALL-E, _has_genre, ?): Animation, Computer animation, Comedy film, Adventure film, Science fiction, Fantasy, Stop motion, Satire, Drama

Evaluation Datasets

Performance

Performance (FB15K)

Performance on Triple Classification

Research Challenge: KG + Text for Representation Learning. Incorporate KG embeddings with text-based relation extraction

The Power of KG + Text for Representation Learning

Research Challenge: Relation Inference. Current models consider each relation independently, but there are complicated correlations among relations: e.g., predecessor, father, and grandfather are interdependent, since composing father with father yields grandfather
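As an illustration (a modeling intuition, not a result from the talk): if such a composition held in the data, a translation-based model would want the composed relation vectors to add up:

```latex
% Relation composition as a constraint on translation vectors
% (illustrative; relation names taken from the slide's example).
\[
  (a, \textit{father}, b) \wedge (b, \textit{father}, c)
  \;\Rightarrow\; (a, \textit{grandfather}, c)
  \quad\leadsto\quad
  \mathbf{r}_{\textit{father}} + \mathbf{r}_{\textit{father}}
  \approx \mathbf{r}_{\textit{grandfather}}
\]
```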

References
A. Bordes, J. Weston, R. Collobert, Y. Bengio. Learning Structured Embeddings of Knowledge Bases. AAAI 2011.
A. Bordes, X. Glorot, J. Weston, Y. Bengio. Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing. AISTATS 2012.
R. Jenatton, N. Le Roux, A. Bordes, G. Obozinski. A Latent Factor Model for Highly Multi-relational Data. NIPS 2012.
A. Bordes, X. Glorot, J. Weston, Y. Bengio. A Semantic Matching Energy Function for Learning with Multi-relational Data. Machine Learning Journal, 2013.
A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, O. Yakhnenko. Irreflexive and Hierarchical Relations as Translations. ICML Workshop on Structured Learning, 2013.

Roadmap recap: Unstructured Text → Word → Sense → Phrase → Document → Knowledge → NLP Tasks (Tagging / Parsing / Understanding)

Take Home Message
Distributed representation is a powerful tool to model the semantics of entries in a dense, low-dimensional space
Distributed representations can serve as pre-training for deep learning, as features for machine learning tasks (especially multi-task learning), and as a unified model to integrate heterogeneous information (text, images, …)
Distributed representations have been used to model words, senses, phrases, documents, knowledge, social networks, text/images, etc.
Open issues remain: incorporating prior human knowledge; representing complicated structures (trees, network paths)

Everything Can be Embedded (given context). (Almost) Everything Should be Embedded.

Publications
Xinxiong Chen, Zhiyuan Liu, Maosong Sun. A Unified Model for Word Sense Representation and Disambiguation. EMNLP 2014.
Yu Zhao, Zhiyuan Liu, Maosong Sun. Phrase Type Sensitive Tensor Indexing Model for Semantic Composition. AAAI 2015.
Yang Liu, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun. Topical Word Embeddings. AAAI 2015.
Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu. Learning Entity and Relation Embeddings for Knowledge Graph Completion. AAAI 2015.

Thank You! More Information: http://nlp.csai.tsinghua.edu.cn/~lzy Email: liuzy@tsinghua.edu.cn