Syntax-based Deep Matching of Short Texts


Syntax-based Deep Matching of Short Texts Presented by Jiaxing Tan

Outline Introduction Direct Product of Graphs(PoG) Mining of Matching Patterns Deep Matching Experiment and Result

Outline Introduction Direct Product of Graphs(PoG) Mining of Matching Patterns Deep Matching Experiment and Result

Matching Question: How is the size of the code measured? Correct Answer: Lines of code (LOC), thousands of lines of code (KLOC), Millions of lines of code (MLOC) Student Answer: LOC, KLOC, MLOC

More… Judge the appropriateness of a response: -- “I have to work during the weekend!” -- “You should rest more.”

Contribution: Proposal of an algorithm for mining dependency-tree matching patterns at large scale. Proposal of an algorithm for learning a deep matching model that uses the mined matching patterns. Empirical validation of the efficacy and efficiency of the proposed method on large-scale real datasets.

Outline Introduction Direct Product of Graphs(PoG) Mining of Matching Patterns Deep Matching Experiment and Result

Direct Product of Graphs: Represent the matching of a pair of sentences (in general, short texts) with the direct product of their dependency trees. Treat sub-graphs of this product graph as matching patterns.

Dependency Tree: A dependency tree represents the grammatical relations between the words in a sentence. Dependency representations are designed to be easily understood and effectively used by people who want to extract textual relations.
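
As an illustration only (the slides do not say which parser the authors used), a dependency tree can be obtained with an off-the-shelf parser such as spaCy:

```python
# Minimal sketch: obtain a dependency tree with spaCy (an illustrative choice,
# not necessarily the parser used in the paper).
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a parser
doc = nlp("I have to work during the weekend!")

# Each token points to its head, giving the (head -> dependent) edges of the tree.
edges = [(tok.head.text, tok.text, tok.dep_) for tok in doc if tok.head != tok]
for head, dep, rel in edges:
    print(f"{head} --{rel}--> {dep}")
```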

Dependency Tree

Direct Product of Graphs: The direct product of graphs (PoG) GX = {VX, EX} and GY = {VY, EY} is a graph GX×Y whose vertex set is VX × VY, with an edge between (vX, vY) and (v′X, v′Y) if and only if (vX, v′X) ∈ EX and (vY, v′Y) ∈ EY. Given two sentences SX and SY, their interaction relation is represented by the direct product of their dependency trees, GX×Y.
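
A minimal sketch of this construction, with each dependency tree given as a vertex list and a list of (head, dependent) edges; the toy trees below are simplified stand-ins for the slide's example sentences:

```python
from itertools import product

def direct_product(vx, ex, vy, ey):
    """Direct (tensor) product of two directed graphs.

    vx, vy : vertex lists (here, the words of each sentence)
    ex, ey : edge lists of (head, dependent) pairs from the dependency trees
    """
    vertices = list(product(vx, vy))
    # ((ax, ay), (bx, by)) is a product edge iff (ax, bx) in EX and (ay, by) in EY
    edges = [((ax, ay), (bx, by))
             for (ax, bx) in ex
             for (ay, by) in ey]
    return vertices, edges

# Toy example (hypothetical simplified trees for the slide's sentence pair):
vx, ex = ["worked", "all", "night"], [("worked", "night"), ("night", "all")]
vy, ey = ["have", "rest", "good"], [("have", "rest"), ("rest", "good")]
V, E = direct_product(vx, ex, vy, ey)
print(len(V), "vertices,", len(E), "edges")  # 9 vertices, 4 edges
```

The same construction is available for graphs built with networkx via networkx.tensor_product.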

“Worked all night” and “Have a good rest”

Abstraction: Define two types of abstraction, Same Entity and Similar Word, to enhance the generalization ability of the matching pattern mining described next.

Abstraction- Same Entity: Replace the vertex (nei, nei) in GX×Y representing the same entity with a general vertex SAMEENTITY. (How is the weather in Paris?) (Haven't seen such a sunny day in Paris for a while!) The vertex (Paris, Paris) after the abstraction is treated as the same vertex as (Boston, Boston) after the same type of abstraction. The graph with this type of abstraction is treated as its own variant of GX×Y.
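
A sketch of this abstraction over the product-graph vertices; the named-entity test is left as a hypothetical is_named_entity helper:

```python
def abstract_same_entity(vertices, is_named_entity):
    """Replace every vertex (w, w) where w is a named entity with SAMEENTITY."""
    mapping = {}
    for (wx, wy) in vertices:
        if wx == wy and is_named_entity(wx):
            mapping[(wx, wy)] = "SAMEENTITY"
        else:
            mapping[(wx, wy)] = (wx, wy)
    return mapping

# After this step, (Paris, Paris) and (Boston, Boston) collapse to the same
# abstract vertex, so patterns mined on one pair transfer to the other.
```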

Abstraction- Similar Word: Cluster words based on their word2vec embeddings [Mikolov et al., 2013] using the K-means algorithm. For a vertex (wi, wj) in the product graph, if wi and wj belong to the same word cluster Ck, the vertex is replaced with a new vertex SIMWORDk. The graph with this type of abstraction is likewise treated as its own variant of GX×Y.
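
A sketch of the Similar Word abstraction, assuming pre-trained word vectors are available as a dict; the cluster count k is a free parameter not specified on this slide:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_word_clusters(word_vectors, k=500):
    """Cluster word2vec embeddings with K-means; return a word -> cluster id map."""
    words = list(word_vectors)
    X = np.stack([word_vectors[w] for w in words])
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return dict(zip(words, labels))

def abstract_similar_word(vertex, cluster_of):
    """Replace (wi, wj) with SIMWORDk when both words fall in cluster k."""
    wi, wj = vertex
    if wi in cluster_of and wj in cluster_of and cluster_of[wi] == cluster_of[wj]:
        return f"SIMWORD{cluster_of[wi]}"
    return vertex
```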

Sub-graphs of PoG as Matching Patterns: Denote by G¯X×Y the PoG for sentence pair (SX, SY) as well as its variants after the two types of abstraction. For a sentence pair (SX, SY), any sub-graph in the corresponding G¯X×Y describes part of the interaction between the two sentences and can therefore contribute to the matching between them. Example: (weather, sunny) ←→ SAMEENTITY describes the matching between two sentences in a conversation about the weather.

Outline Introduction Direct Product of Graphs(PoG) Mining of Matching Patterns Deep Matching Experiment and Result

Mining of Matching Patterns: The mining of sub-graphs in the PoG of two trees can be reduced to jointly selecting sub-trees on the two sides.

Mining without Abstraction: The mining starts with the simplest pattern (1, 1), standing for a one-word tree on both the X side and the Y side, and grows the mined trees recursively. In each growing step (LeftExtend() and RightExtend()), the size of the sub-trees is increased by one on either the X side or the Y side, followed by a filtering step that removes found pairs whose discriminative ability is below a threshold.

Mining without Abstraction: Limit the search for patterns of size (m, n + 1) to extensions of the surviving candidates of size (m, n). Sounds familiar? This is the same pruning idea as in Apriori-style frequent pattern mining.
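
A schematic sketch of this growth loop; the extension and scoring helpers below are hypothetical placeholders standing in for the paper's LeftExtend/RightExtend and the discriminative-ability filter:

```python
def mine_patterns(seed_pairs, left_extend, right_extend, discrim_score,
                  threshold, max_size):
    """Grow (X-subtree, Y-subtree) pattern pairs level by level.

    seed_pairs   : the (1, 1) patterns, i.e. single-word trees on both sides
    left_extend  : fn(pattern) -> candidate patterns with one more X-side node
    right_extend : fn(pattern) -> candidate patterns with one more Y-side node
    discrim_score: fn(pattern) -> discriminative ability on the training data
    """
    survivors = [p for p in seed_pairs if discrim_score(p) >= threshold]
    mined = list(survivors)
    for _ in range(max_size - 2):
        # Apriori-style pruning: only extend patterns that already passed the
        # filter, so a (m, n + 1) pattern is searched for only among
        # extensions of surviving (m, n) patterns.
        candidates = set()
        for p in survivors:
            candidates.update(left_extend(p))
            candidates.update(right_extend(p))
        survivors = [p for p in candidates if discrim_score(p) >= threshold]
        mined.extend(survivors)
    return mined
```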

Mining with Abstraction Replace each named entity with a vertex having the same ID

Advantage of Tree Pattern Mining: Provides better correspondence between the two sentences than word co-occurrence on the two sides. Allows us to find patterns representing deep and long-distance relationships within the two short texts to be matched.

Outline Introduction Direct Product of Graphs(PoG) Mining of Matching Patterns Deep Matching Experiment and Result

The Deep Matching Model

Work Process: Given a pair of short texts, first obtain their dependency trees, form their direct product, and then perform abstraction on it (where applicable). Convert the input text pair into a binary vector, where an element is one if the corresponding pattern applies to the input text pair and zero otherwise. The binary vector, which is about 10M-dimensional and sparse (typically 10~50 ones in our experiments), is then fed into the deep neural network for the final matching decision.
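
A sketch of this featurization step, assuming the mined patterns are indexed 0..D-1 and a hypothetical matches(pattern, product_graph) test:

```python
import numpy as np

def featurize(product_graph, patterns, matches):
    """Binary feature vector: x[i] = 1 iff pattern i occurs in the product graph."""
    x = np.zeros(len(patterns), dtype=np.float32)  # ~10M dimensions in the paper
    for i, p in enumerate(patterns):
        if matches(p, product_graph):
            x[i] = 1.0
    return x  # typically only 10-50 ones, so a sparse format is preferable in practice
```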

Architecture Learning: The input has about 10M raw features and is extremely sparse, so the first layer is not fully connected; each input node is connected to approximately K hidden nodes.

Overall Architecture: 1,000 units in the first hidden layer (sigmoid activation function), 400 in the second hidden layer, 30 in the third hidden layer, and one in the output layer.
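
A minimal PyTorch sketch of this 1000-400-30-1 stack. The slide only states sigmoid for the first hidden layer; sigmoid is assumed for the other hidden layers here, and the sparse input-to-hidden connectivity of the previous slide is replaced by a dense first layer for brevity:

```python
import torch.nn as nn

class DeepMatchTree(nn.Module):
    def __init__(self, num_patterns):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_patterns, 1000), nn.Sigmoid(),  # first hidden layer
            nn.Linear(1000, 400), nn.Sigmoid(),           # second hidden layer
            nn.Linear(400, 30), nn.Sigmoid(),             # third hidden layer
            nn.Linear(30, 1),                             # matching score s(x, y)
        )

    def forward(self, features):
        return self.net(features).squeeze(-1)
```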

Parameter Learning: Suppose we are given triples (x, y+, y−) from the oracle, where x (∈ X) matches y+ better than y− (both ∈ Y). We use the following pairwise loss as the objective:
L(W) = Σ_i e_W(x_i, y_i^+, y_i^−) + R(W),
where R(W) is the regularization term and e_W(x_i, y_i^+, y_i^−) is the error for triple (x_i, y_i^+, y_i^−), given by the large-margin form
e_W(x_i, y_i^+, y_i^−) = max(0, m + s(x_i, y_i^−) − s(x_i, y_i^+)),
with s(·, ·) the matching score of the model and m the margin. Training: generic back-propagation algorithm. Regularization: dropout [Hinton et al., 2012] and early stopping [Caruana et al., 2001].
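
A sketch of this large-margin pairwise objective in PyTorch (the margin value and the omission of the regularization term are assumptions of this sketch):

```python
import torch

def pairwise_hinge_loss(model, feats_pos, feats_neg, margin=1.0):
    """e_W(x, y+, y-) = max(0, margin + s(x, y-) - s(x, y+)), averaged over triples."""
    s_pos = model(feats_pos)  # scores for the (x, y+) feature vectors
    s_neg = model(feats_neg)  # scores for the (x, y-) feature vectors
    return torch.clamp(margin + s_neg - s_pos, min=0).mean()
```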

Outline Introduction Direct Product of Graphs(PoG) Mining of Matching Patterns Deep Matching Experiment and Result

Original-vs-Random: DataOriginal consists of 4.8 million (tweet, response) pairs. Randomly select ten responses as negative examples for each pair (contrastive sampling of negatives), yielding 45 million triples. Use 485,282 original (tweet, response) pairs not seen in training for testing. For each pair, add nine random responses and test each matching model's ability to pick the correct response.
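
A sketch of the contrastive sampling described on this slide (sampling negatives uniformly from the pool of responses is an assumption of this sketch):

```python
import random

def make_triples(pairs, num_negatives=10, seed=0):
    """pairs: list of (tweet, response). Returns (tweet, pos_response, neg_response) triples."""
    rng = random.Random(seed)
    responses = [r for _, r in pairs]
    triples = []
    for tweet, pos in pairs:
        for _ in range(num_negatives):
            neg = rng.choice(responses)
            while neg == pos:          # avoid sampling the true response as a negative
                neg = rng.choice(responses)
            triples.append((tweet, pos, neg))
    return triples
```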

Original-vs-Random: This model outperforms all the competitor models by large margins. The contribution of the deep architecture is shown by the gap between the deep architectures and shallow ones using the same mined patterns.

Retrieval-based Conversation: DataLabeled consists of 422 tweets with around 30 labeled responses each. Test how different matching models enhance the performance of the retrieval-based conversation model [Wang et al., 2013] in finding a suitable response for a given tweet.

Retrieval-based Conversation: Clearly DEEPMATCHtree greatly improves the performance of retrieving a suitable response from the pool, with significantly better accuracy than the competitor models.

Deep vs. Shallow Patterns: When trying to find a matched response for T1, the “deep” pattern {work → weekend ⊗ rest} plays a determining role in picking R1A over R1B, while shallower patterns such as {work ⊗ job} and {work ⊗ resume} favor R1B. On the Original-vs-Random data, P@1 decreases from 0.889 to 0.871 (1v1) and from 0.768 to 0.688 (1v9) after removing the deep features.

The effect of abstraction The abstraction step helps improve the generalization ability of the matching model, by improving P@1 on Original-vs-Random from 0.876 to 0.889 (1v1) and 0.694 to 0.708 (1v9).

Thanks