A Short Texts Matching Method Using Shallow Features and Deep Features
Longbiao Kang, Baotian Hu, Xiangping Wu, Qingcai Chen, and Yan He
Intelligent Computing Research Center, School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
NLPCC 2014
Presented by 陳思澄, 2015/02/05

Outline: Introduction; Matching Short Texts Using Both Shallow and Deep Features (1. Shallow Matching Features, 2. Deep Matching Features, 3. Other Matching Features); Ranking and Combination of Features; Experiments; Conclusion

Introduction Many natural language processing (NLP) tasks, such as paraphrase identification and information retrieval, can be reduced to semantic matching problems. In this paper, we focus on semantic matching between short texts and design a model to generate deep features that describe the semantic relevance between short "text objects". Furthermore, we design a method to combine shallow features of short texts (i.e., LSI, VSM, and some other handcrafted features) with deep features of short texts (i.e., word embedding matching of short texts).

The task we study, matching a post with a suitable response, is a simplified version of modeling a complete dialogue session, as in a Turing test. For convenience of description, we use the following example: a post P, a positive response R+, and two negative responses R1- and R2-.

Semantic matching is a challenging problem, since it aims to find the semantic relevance between two "text objects". Wang et al. describe a retrieval-based model. Although the retrieval-based model performs reasonably well on the post-response matching task, it can only capture word-by-word matching and latent semantic matching features between posts and responses. For the retrieval-based model, it is difficult to recognize R1- as a negative response. However, it becomes easy to recognize R2- as a negative response once we add a feature describing whether the named entities in the post and the response are the same.

Distributed representations (word embeddings) of text induced by deep learning are well suited for this task, because they contain rich semantic information and can model the relevance between different words or phrases. However, they are not powerful enough to handle task-specific subtleties. For example, because the embeddings of "上海" (Shanghai) and "深圳" (Shenzhen) are very close, it is difficult for an embedding-based method to recognize R2- as a negative response to P. In this paper, we study shallow features and deep features for short text matching and combine them to improve performance.

Matching Short Texts Using Both Shallow and Deep Features In our method, we build three new matching models and learn both shallow and deep features between posts and responses. We then build a ranking model that evaluates the candidate responses using both kinds of features. For the shallow matching features, we first introduce a variant of the vector space model to measure post-response similarity by word-by-word matching. We then use an LSI (latent semantic indexing) model to capture latent semantic matching features that may not be well captured by word-by-word matching. For the deep matching features, we construct a matching model based on neural networks and word embeddings.

Shallow Matching Features Post-Response Similarity: To measure the similarity between a post and a response by word-by-word matching, we use a simple vector space model. For example, given a post and a response whose sparse vector space representations are x and y, the post-response similarity is the cosine similarity of x and y.
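A minimal sketch of this shallow feature is shown below, using scikit-learn's TF-IDF vectorizer as the vector space model; the toy sentences and the choice of library are assumptions for illustration, not details from the paper.

```python
# Sketch of the post-response similarity (shallow feature) via a VSM.
# The example texts are illustrative; in practice the vectorizer would be
# fit on the whole post/response corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([
    "how is the weather in shenzhen today",   # post x
    "it is sunny and warm in shenzhen",       # response y
])
sim_post_response = cosine_similarity(X[0], X[1])[0, 0]
print(sim_post_response)
```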

Post-Response Latent Semantic Similarity: LSI learns a mapping from the original sparse representation of a text to a low-dimensional, dense representation. We represent the post as a vector x and the response as a vector y. Using the LSI model, we map x and y to low-dimensional vector representations x_lsi and y_lsi and compare them. Post-Post Latent Semantic Similarity: We also consider the semantic similarity between two posts. The intuition is that a response y is suitable for a post x if its original post x' is similar to x. Here we also use the LSI model to measure post-post semantic similarity.
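The sketch below approximates the LSI mapping with truncated SVD over TF-IDF vectors; the toy corpus and the two-dimensional latent space are assumptions made only so the snippet runs, and the same similarity can be reused for the post-post feature.

```python
# Sketch of the latent semantic similarity: map sparse TF-IDF vectors to a
# low-dimensional dense space (LSI via TruncatedSVD) and compare with cosine.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "how is the weather in shenzhen today",   # post x
    "it is sunny and warm in shenzhen",       # response y
    "what movie should i watch tonight",      # another post, just to enrich the space
]
tfidf = TfidfVectorizer().fit_transform(corpus)
lsi = TruncatedSVD(n_components=2, random_state=0)   # real systems use far more dimensions
low_dim = lsi.fit_transform(tfidf)                   # dense, low-dimensional vectors
x_lsi, y_lsi = low_dim[0], low_dim[1]
sim_latent = cosine_similarity([x_lsi], [y_lsi])[0, 0]
```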

Deep Matching Feature: To learn deep matching features between posts and responses, we propose a matching model based on neural networks and word embeddings. With the learned word embeddings, we first map every word of the posts and responses to a unique vector. Given a post x or a response y, we thus convert each of them into a set of vectors, X and Y. By measuring the cosine similarity of each vector pair in X and Y, we obtain a correlation matrix M_cor.

The size of this correlation matrix varies with the lengths of the post and the response. To use neural networks and prevent the dimension of the feature space from becoming too large, we need to convert variable-length posts and responses to a fixed length. We sort all the words in post x and response y by their TF-IDF weights in descending order, then choose the top m words in x and the top n words in y. This yields a correlation matrix of fixed size m × n, as sketched below.
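The following sketch combines both steps: TF-IDF-based word selection and construction of the fixed-size correlation matrix. The `embeddings` and `tfidf_weight` lookups, as well as the values of m and n, are assumed inputs (e.g., from a trained word2vec model and corpus statistics), not values given in the paper.

```python
# Sketch of building the fixed-size correlation matrix M_cor from word embeddings.
import numpy as np

def top_k_by_tfidf(words, tfidf_weight, k):
    """Keep the k words with the highest TF-IDF weight, in descending order."""
    ranked = sorted(words, key=lambda w: tfidf_weight.get(w, 0.0), reverse=True)
    return ranked[:k]

def correlation_matrix(post_words, resp_words, embeddings, tfidf_weight, m=5, n=5):
    post_top = top_k_by_tfidf(post_words, tfidf_weight, m)
    resp_top = top_k_by_tfidf(resp_words, tfidf_weight, n)
    M = np.zeros((m, n))                      # shorter texts leave trailing zeros
    for i, wp in enumerate(post_top):
        for j, wr in enumerate(resp_top):
            u, v = embeddings[wp], embeddings[wr]
            M[i, j] = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return M                                  # later used as the network's input features
```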

We then build a neural network with a single hidden layer, which takes the values of the correlation matrix as its input feature vector and outputs a matching score. The whole model is shown in Figure 1.
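A minimal sketch of such a single-hidden-layer scorer follows. The layer size, the tanh activation, and the random initialization are assumptions; the paper only states that the network has one hidden layer and outputs a matching score.

```python
# Sketch of a single-hidden-layer network scoring a flattened correlation matrix.
import numpy as np

def matching_score(M_cor, W1, b1, w2, b2):
    x = M_cor.flatten()                 # feature vector from the m x n correlation matrix
    h = np.tanh(W1 @ x + b1)            # single hidden layer
    return float(w2 @ h + b2)           # scalar matching score

# Example parameter shapes for a 5x5 correlation matrix and 10 hidden units.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 25)), np.zeros(10)
w2, b2 = rng.normal(size=10), 0.0
```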

Other Matching Features: In addition to the matching features generated by the models above, we also use some handcrafted features, which describe the relevance between a post and a response in some special cases. Common_Entity(x, y): This feature indicates whether post x and response y contain the same entity words, such as first and last names, geographic locations, addresses, and company or organization names. The intuition is that a post and a response in a natural conversation usually share some key words. Common_Number(x, y): This feature indicates whether post x and response y contain the same numbers, such as dates or amounts of money.

Common_Word_Length(x, y): This feature is the length of the longest common substring between a post and a response. In social media such as micro-blogs, a response can be forwarded as a new post, so we may retrieve a response that is very similar to the given post but is not a suitable reply. With this feature, we can filter out such responses. Length_Ratio(x, y): This feature is the ratio of the length of post x to the length of response y.
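The sketch below illustrates three of these handcrafted features. Entity matching would normally rely on a named-entity recognizer and is omitted; the regular expression for numbers and the use of difflib for the longest common substring are implementation choices assumed here, not prescribed by the paper.

```python
# Sketch of the handcrafted matching features.
import re
from difflib import SequenceMatcher

def common_number(post, resp):
    """1.0 if the post and response share at least one number, else 0.0."""
    return 1.0 if set(re.findall(r"\d+", post)) & set(re.findall(r"\d+", resp)) else 0.0

def common_word_length(post, resp):
    """Length of the longest common substring between post and response."""
    match = SequenceMatcher(None, post, resp).find_longest_match(0, len(post), 0, len(resp))
    return match.size

def length_ratio(post, resp):
    """Ratio of post length to response length."""
    return len(post) / max(len(resp), 1)
```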

Ranking and Combination of Features: As shown in Figure 2, given a post, the ranking model produces a matching score for each candidate response. We then pick the response with the highest matching score from the candidate set as the suitable response for the post. The ranking function is a linear scoring function over the features described above, and its weights are trained with RankSVM.
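A sketch of this linear ranking step is given below. The `feature_fn` that assembles the shallow, deep, and handcrafted features into one vector, and the weight vector `w` learned by RankSVM, are assumed inputs; the RankSVM training itself is not shown.

```python
# Sketch of the linear ranking function over the combined feature vector.
import numpy as np

def rank_responses(post, candidates, feature_fn, w):
    """Return (score, response) pairs sorted by the linear score w . f(post, y)."""
    scored = [(float(np.dot(w, feature_fn(post, y))), y) for y in candidates]
    return sorted(scored, key=lambda t: t[0], reverse=True)

# Usage: best_response = rank_responses(post, candidates, feature_fn, w)[0][1]
```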

Experiment: In our experiments, we use the short-text conversation dataset built from real-world instances on Sina Weibo. The unlabeled post-response pairs form the training set, which contains 38,016 posts and 618,104 responses. The labeled post-response pairs form the test set, which contains 422 posts and 12,402 responses, with about 30 candidate responses per post. The vocabulary of this dataset contains 125,817 words.

To train word embeddings, we also build a Weibo dataset of 33,405,212 posts. We use the method introduced by Mikolov et al. for learning word embeddings; in our experiments, the embedding length is set to 100. After training, we obtain word embeddings for a vocabulary of 810,214 words.
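The paper only says it uses Mikolov et al.'s method with 100-dimensional vectors; as an illustrative stand-in, the embeddings could be trained with gensim's word2vec implementation as sketched below. `weibo_sentences` is a placeholder for the tokenized Weibo posts, and the window and min_count values are assumed defaults.

```python
# Sketch of training 100-dimensional word embeddings with gensim's word2vec.
from gensim.models import Word2Vec

model = Word2Vec(
    sentences=weibo_sentences,   # iterable of token lists (tokenized Weibo posts)
    vector_size=100,             # embedding length used in the paper
    window=5, min_count=5,       # assumed settings, not stated in the paper
    workers=4,
)
embeddings = model.wv            # word -> 100-dim vector lookup
```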

Experiments and Benchmark: Our model returns the response with the highest matching score, so the evaluation is based on top-1 precision, which measures whether the response returned by the model for each post is a suitable one.
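A sketch of this evaluation is shown below, assuming the test data is available as (post, candidate list) pairs with suitability labels; the data layout and the `score_fn` interface are illustrative assumptions.

```python
# Sketch of top-1 precision: is the highest-scoring candidate labeled suitable?
def precision_at_1(test_posts, score_fn):
    hits = 0
    for post, candidates in test_posts:        # candidates: list of (response, is_suitable)
        best = max(candidates, key=lambda c: score_fn(post, c[0]))
        hits += int(best[1])
    return hits / len(test_posts)
```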

Performance: The results of the experiments are shown in Table 2. Our baselines include the retrieval-based model and the DeepMatch model; for the DeepMatch model, we re-implement it and train it on the unlabeled training set. VSM denotes the matching feature generated by the TF-IDF-based vector space model, and the deep feature is generated by the embedding match model.

Conclusion: In this paper, we design an embedding match model to generate deep features for short text matching and study the effect of shallow and deep features on performance. Experiments show that the deep features capture rich semantic relevance between posts and responses that the shallow features cannot. By combining the shallow features with the deep feature generated by the embedding match model, we obtain state-of-the-art performance on the dataset released by Wang et al.