Ali Hakimi Parizi, Paul Cook

Presentation transcript:

Do Character-level Language Models Capture Knowledge of Multiword Expression Compositionality?
Ali Hakimi Parizi, Paul Cook
Faculty of Computer Science, University of New Brunswick
ahakimi@unb.ca, paul.cook@unb.ca

Multiword Expressions
Multiword expressions (MWEs) are word combinations that display some form of idiomaticity. Compositionality refers to the degree to which the meaning of an MWE can be predicted by combining the meanings of its components. For example:
- "End user" ("end" + "user"): compositional.
- "Game plan": semi-compositional; its meaning is related to "game" to some degree.
- "Couch potato": non-compositional; we cannot infer its meaning from its components.

Training Data and Evaluation Datasets
For all experiments, we train our models on raw-text Wikipedia corpora for either English or German. Preprocessing steps (a character-filtering sketch follows this transcript):
1. Use the WP2TXT toolbox to eliminate XML and HTML tags and hyperlinks.
2. Select a portion of the data randomly.
3. Remove all characters that are not in the language.
We evaluate our methods on two datasets:
- "ENC": 90 English noun compounds, annotated on a continuous [0, 5] scale.
- "GNC": 244 German noun compounds, annotated on a continuous [1, 7] scale.

Character-Level Language Models
The probability of the next character is determined from the previous characters. Unlike word-level language models, character-level models can estimate the probability of out-of-vocabulary words. (A minimal model sketch follows this transcript.)
[Figure: a character-level RNN unrolled over the input characters "h", "e", "l", "l", predicting the target characters "e", "l", "l", "o" through input, hidden, and output layers; adapted from http://karpathy.github.io/2015/05/21/rnn-effectiveness]

Compositionality Equations
To calculate compositionality, we use two different methods, Comp1 and Comp2 (a hedged reconstruction of the formulas, and a scoring sketch, follow this transcript). MWE, C1, and C2 are vector representations of a multiword expression, its first component, and its second component, respectively. Each vector is the hidden state of the network after reading the whole word. α is set to 0.7, and we use cosine similarity as our similarity measure.

Results
Pearson's r:

Dataset  Model                      Comp1             Comp2
ENC      RNN (1%)                   -0.01 (p=0.92)    -0.11 (p=0.28)
ENC      GRU (1%)                    0.25 (p=0.016)    0.20 (p=0.04)
ENC                                  0.23 (p=0.02)     0.18 (p=0.08)
ENC                                  0.27 (p=0.011)    0.27 (p=0.009)
ENC      Word embedding (English)    0.717             0.736
GNC      LSTM (5%)                   0.03 (p=0.618)    0.03 (p=0.590)
GNC      Word embedding (German)     0.371             0.349

Conclusions and Future Work
Character-level language models capture (some) knowledge about MWE compositionality. Word embedding models achieve higher correlations, but cannot represent out-of-vocabulary or low-frequency words.
Future work:
- Explore parameter settings of character-level language models, e.g., embedding size, batch size, learning rate, and dropout.
- Consider other neural network architectures, such as a bidirectional LSTM, and sub-word level language models.
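
Character-filtering sketch. The poster does not spell out the filter used in preprocessing step 3; below is a minimal, illustrative sketch for the English corpus with an assumed allowed character set (not the authors' exact filter):

```python
import re

# Assumed character set for English: letters, digits, common punctuation, whitespace.
# The poster does not specify the actual filter; this is illustrative only.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9.,;:!?'\"()\- \n]")

def clean_line(line: str) -> str:
    """Drop characters that are not in the (assumed English) character set."""
    return NOT_ALLOWED.sub("", line)

print(clean_line("Café naïve “couch potato”"))  # -> "Caf nave couch potato"
```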
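Character-level language model sketch. The poster gives no implementation details for its RNN, GRU, and LSTM models; the following is a minimal, illustrative character-level LSTM in PyTorch, with hypothetical names (CharLM, word_vector), showing the two properties the poster relies on: next-character probabilities conditioned on the previous characters, and a hidden-state vector that can be obtained for any word, including out-of-vocabulary ones.

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Minimal character-level LSTM language model (illustrative sketch only)."""

    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_ids, state=None):
        # char_ids: (batch, seq_len) indices of the characters seen so far
        hidden, state = self.lstm(self.embed(char_ids), state)
        logits = self.out(hidden)   # softmax over logits = P(next char | previous chars)
        return logits, state

def word_vector(model: CharLM, word: str, char2id: dict) -> torch.Tensor:
    """Hidden state after reading the whole word; defined even for OOV words."""
    ids = torch.tensor([[char2id[c] for c in word]])
    _, (h, _) = model.lstm(model.embed(ids))
    return h[-1, 0]                 # final hidden state, shape (hidden_dim,)

chars = "abcdefghijklmnopqrstuvwxyz "
char2id = {c: i for i, c in enumerate(chars)}
model = CharLM(len(chars))
print(word_vector(model, "couch potato", char2id).shape)  # torch.Size([128])
```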
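Reconstruction of the compositionality equations. The formulas for Comp1 and Comp2 do not survive in this transcript. A hedged reconstruction, based only on the definitions given above (vectors for the MWE and its two components, the weight α = 0.7, and cosine similarity as the similarity measure) and on formulations common in the compositionality literature, is shown below; the poster's exact equations may differ.

```latex
\[
\mathrm{Comp}_1 = \cos\!\left(\vec{v}_{\mathrm{MWE}},\ \alpha\,\vec{v}_{C_1} + (1-\alpha)\,\vec{v}_{C_2}\right)
\]
\[
\mathrm{Comp}_2 = \alpha\,\cos\!\left(\vec{v}_{\mathrm{MWE}},\ \vec{v}_{C_1}\right) + (1-\alpha)\,\cos\!\left(\vec{v}_{\mathrm{MWE}},\ \vec{v}_{C_2}\right)
\]
```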
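Scoring sketch. A small sketch of how these two scores could be computed from the hidden-state vectors, assuming numpy arrays and the reconstructed formulas above (again illustrative, not the authors' code):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def compositionality(mwe_vec, c1_vec, c2_vec, alpha: float = 0.7):
    """Comp1 and Comp2 as reconstructed above; alpha = 0.7 as on the poster."""
    comp1 = cosine(mwe_vec, alpha * c1_vec + (1 - alpha) * c2_vec)
    comp2 = alpha * cosine(mwe_vec, c1_vec) + (1 - alpha) * cosine(mwe_vec, c2_vec)
    return comp1, comp2

# Stand-in random vectors; on the poster these are hidden states obtained
# after reading, e.g., "couch potato", "couch", and "potato".
rng = np.random.default_rng(0)
mwe, c1, c2 = rng.normal(size=(3, 128))
print(compositionality(mwe, c1, c2))
```

The Pearson correlations in the results table would then be computed between these predicted scores and the human compositionality annotations in the ENC and GNC datasets.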