Ali Hakimi Parizi, Paul Cook

Presentation transcript:

Do Character-level Language Models Capture Knowledge of Multiword Expression Compositionality?
Ali Hakimi Parizi, Paul Cook
Faculty of Computer Science, University of New Brunswick
ahakimi@unb.ca, paul.cook@unb.ca

Multiword Expressions
Multiword expressions (MWEs) are word combinations that display some form of idiomaticity. Compositionality refers to the degree to which the meaning of an MWE can be predicted by combining the meanings of its components. For example:
- "End user" ("end" + "user"): compositional.
- "Game plan": semi-compositional; its meaning is related to "game" to some degree.
- "Couch potato": non-compositional; we cannot infer its meaning from its components.

Training Data and Evaluation Datasets
For all experiments, we train our models on raw-text Wikipedia corpora for either English or German. Preprocessing steps (a character-filtering sketch follows this transcript):
1. Use the WP2TXT toolbox to eliminate XML and HTML tags and hyperlinks.
2. Select a portion of the data randomly.
3. Remove all characters that are not in the language.
We evaluate our methods on two datasets:
- "ENC": 90 English noun compounds, annotated on a continuous [0, 5] scale.
- "GNC": 244 German noun compounds, annotated on a continuous [1, 7] scale.

Character-Level Language Models
The probability of the next character is determined from the previous characters. Unlike word-level language models, character-level models can estimate the probability of out-of-vocabulary words. (A minimal model sketch follows this transcript.)
[Figure: a character-level RNN unrolled over the input characters "h", "e", "l", "l", predicting the target characters "e", "l", "l", "o" through input, hidden, and output layers; adapted from http://karpathy.github.io/2015/05/21/rnn-effectiveness]

Compositionality Equations
To calculate compositionality, we use two different methods, Comp1 and Comp2 (a hedged reconstruction of the formulas, and a scoring sketch, follow this transcript). MWE, C1, and C2 are vector representations of a multiword expression, its first component, and its second component, respectively. Each vector is the hidden state of the network after reading the whole word. α is set to 0.7, and we use cosine similarity as our similarity measure.

Results
Pearson's r:

Dataset  Model                      Comp1             Comp2
ENC      RNN (1%)                   -0.01 (p=0.92)    -0.11 (p=0.28)
ENC      GRU (1%)                    0.25 (p=0.016)    0.20 (p=0.04)
ENC                                  0.23 (p=0.02)     0.18 (p=0.08)
ENC                                  0.27 (p=0.011)    0.27 (p=0.009)
ENC      Word embedding (English)    0.717             0.736
GNC      LSTM (5%)                   0.03 (p=0.618)    0.03 (p=0.590)
GNC      Word embedding (German)     0.371             0.349

Conclusions and Future Work
Character-level language models capture (some) knowledge about MWE compositionality. Word embedding models achieve higher correlations, but cannot represent out-of-vocabulary or low-frequency words.
Future work:
- Explore parameter settings of character-level language models, e.g., embedding size, batch size, learning rate, and dropout.
- Consider other neural network architectures, such as a bidirectional LSTM, and sub-word level language models.
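
Character-filtering sketch. The poster does not spell out the filter used in preprocessing step 3; below is a minimal, illustrative sketch for the English corpus with an assumed allowed character set (not the authors' exact filter):

```python
import re

# Assumed character set for English: letters, digits, common punctuation, whitespace.
# The poster does not specify the actual filter; this is illustrative only.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9.,;:!?'\"()\- \n]")

def clean_line(line: str) -> str:
    """Drop characters that are not in the (assumed English) character set."""
    return NOT_ALLOWED.sub("", line)

print(clean_line("Café naïve “couch potato”"))  # -> "Caf nave couch potato"
```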
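Character-level language model sketch. The poster gives no implementation details for its RNN, GRU, and LSTM models; the following is a minimal, illustrative character-level LSTM in PyTorch, with hypothetical names (CharLM, word_vector), showing the two properties the poster relies on: next-character probabilities conditioned on the previous characters, and a hidden-state vector that can be obtained for any word, including out-of-vocabulary ones.

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Minimal character-level LSTM language model (illustrative sketch only)."""

    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_ids, state=None):
        # char_ids: (batch, seq_len) indices of the characters seen so far
        hidden, state = self.lstm(self.embed(char_ids), state)
        logits = self.out(hidden)   # softmax over logits = P(next char | previous chars)
        return logits, state

def word_vector(model: CharLM, word: str, char2id: dict) -> torch.Tensor:
    """Hidden state after reading the whole word; defined even for OOV words."""
    ids = torch.tensor([[char2id[c] for c in word]])
    _, (h, _) = model.lstm(model.embed(ids))
    return h[-1, 0]                 # final hidden state, shape (hidden_dim,)

chars = "abcdefghijklmnopqrstuvwxyz "
char2id = {c: i for i, c in enumerate(chars)}
model = CharLM(len(chars))
print(word_vector(model, "couch potato", char2id).shape)  # torch.Size([128])
```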
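Reconstruction of the compositionality equations. The formulas for Comp1 and Comp2 do not survive in this transcript. A hedged reconstruction, based only on the definitions given above (vectors for the MWE and its two components, the weight α = 0.7, and cosine similarity as the similarity measure) and on formulations common in the compositionality literature, is shown below; the poster's exact equations may differ.

```latex
\[
\mathrm{Comp}_1 = \cos\!\left(\vec{v}_{\mathrm{MWE}},\ \alpha\,\vec{v}_{C_1} + (1-\alpha)\,\vec{v}_{C_2}\right)
\]
\[
\mathrm{Comp}_2 = \alpha\,\cos\!\left(\vec{v}_{\mathrm{MWE}},\ \vec{v}_{C_1}\right) + (1-\alpha)\,\cos\!\left(\vec{v}_{\mathrm{MWE}},\ \vec{v}_{C_2}\right)
\]
```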
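Scoring sketch. A small sketch of how these two scores could be computed from the hidden-state vectors, assuming numpy arrays and the reconstructed formulas above (again illustrative, not the authors' code):

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def compositionality(mwe_vec, c1_vec, c2_vec, alpha: float = 0.7):
    """Comp1 and Comp2 as reconstructed above; alpha = 0.7 as on the poster."""
    comp1 = cosine(mwe_vec, alpha * c1_vec + (1 - alpha) * c2_vec)
    comp2 = alpha * cosine(mwe_vec, c1_vec) + (1 - alpha) * cosine(mwe_vec, c2_vec)
    return comp1, comp2

# Stand-in random vectors; on the poster these are hidden states obtained
# after reading, e.g., "couch potato", "couch", and "potato".
rng = np.random.default_rng(0)
mwe, c1, c2 = rng.normal(size=(3, 128))
print(compositionality(mwe, c1, c2))
```

The Pearson correlations in the results table would then be computed between these predicted scores and the human compositionality annotations in the ENC and GNC datasets.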