Learning Word Representations from Scarce and Noisy Data with Embedding Sub-spaces
Authors: Ramon F. Astudillo, Silvio Amir, Wang Ling, Mario Silva, Isabel Trancoso
By: Aadil Hayat (13002)
Outline
1 Introduction
2 Theory
3 Results
1 Introduction
Unsupervised word embedding for scarce and noisy data
Abstract
A technique to adapt unsupervised word embeddings to specific applications when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters and then use the labeled data to tailor them for the intended task, but this approach is prone to overfitting when training is performed with scarce and noisy data. To overcome this issue, the supervised data are used here to find an embedding sub-space that fits the task complexity. All word representations are adapted through a projection into this task-specific sub-space. This approach was used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results.
2 Theory
Unsupervised Structured Skip-Gram
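The embeddings used here come from the structured skip-gram model (Ling et al., 2015), a variant of skip-gram that keeps a separate output matrix for each relative context position, so the learned representations become sensitive to word order. Below is a minimal negative-sampling sketch of that idea; the vocabulary size, dimensions, and hyper-parameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Minimal structured skip-gram sketch: unlike vanilla skip-gram,
# each relative context position p gets its own output matrix C[p].
V, e, win = 1000, 50, 2                       # vocab size, embed dim, window (assumed)
rng = np.random.default_rng(0)
E = rng.normal(0, 0.1, (V, e))                # input (center-word) embeddings
C = rng.normal(0, 0.1, (2 * win, V, e))       # one output matrix per position

def train_pair(center, context, pos, lr=0.025, k=5):
    """One negative-sampling update for (center, context) observed at
    relative position pos (0 .. 2*win - 1)."""
    v = E[center]
    # positive sample plus k random negatives
    targets = np.concatenate(([context], rng.integers(0, V, k)))
    labels = np.zeros(k + 1)
    labels[0] = 1.0
    out = C[pos, targets]                     # (k+1, e) position-specific rows
    p = 1.0 / (1.0 + np.exp(-(out @ v)))      # sigmoid scores
    g = p - labels                            # logistic-loss gradient
    E[center] -= lr * (g @ out)               # update center embedding
    C[pos, targets] -= lr * np.outer(g, v)    # update position-specific outputs
```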
Adapting Embeddings with Sub-space Projections
Word embeddings are a useful unsupervised technique for obtaining initial model parameters or features prior to supervised learning. These models can then be retrained using the available labeled data. Embeddings provide a compact, real-valued representation of each word in a vocabulary, yet the total number of parameters in the model can still be rather high. Very often only a small amount of supervised data is available, which can lead to severe overfitting. Even if regularization is used to reduce overfitting, only a reduced subset of words will actually be present in the labeled dataset, and words not seen during training will never get their embeddings updated. In the following slides, a simple solution to this problem is explained.
Embedding Sub-space
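The core idea: instead of updating the full pre-trained embedding matrix E of size e x v, the labeled data is used to learn only a small projection S of size s x e, with s much smaller than e. Every word representation is adapted as S·E, so the trainable embedding parameters drop from e·v to s·e, and because S is shared across the whole vocabulary, even words absent from the labeled set are adapted. A minimal sketch follows; the dimensions and the embedding file name are assumptions for illustration.

```python
import numpy as np

# Frozen pre-trained embeddings E (e x v) plus a small trainable
# projection S (s x e). Sizes and file name are illustrative.
e, v, s = 600, 100_000, 10
E = np.load("pretrained_embeddings.npy")   # shape (e, v), never updated
S = np.random.normal(0.0, 0.1, (s, e))     # the only adapted parameters

def adapted(word_idx):
    # Words absent from the labeled data are still adapted,
    # because the projection S is shared across the vocabulary.
    return S @ E[:, word_idx]              # (s,) sub-space representation
```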
Non-Linear Sub-space Embedding Model
The concept of an embedding sub-space can be applied to log-linear classifiers or to deep learning architectures that use embeddings. The NLSE can be interpreted as a simple feed-forward neural network with a single hidden layer that utilizes the embedding sub-space approach.
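A sketch of an NLSE-style forward pass: each word's frozen embedding is projected into the sub-space, passed through a non-linearity, pooled over the message, and fed to a softmax classifier. The sigmoid non-linearity and sum pooling below are assumptions of this sketch, not guaranteed details of the paper's model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    ez = np.exp(z)
    return ez / ez.sum()

def nlse_forward(word_ids, E, S, Y):
    """NLSE-style forward pass (sketch).
    E: (e, v) frozen embeddings, S: (s, e) sub-space projection,
    Y: (num_classes, s) output layer. Only S and Y are trained."""
    H = 1.0 / (1.0 + np.exp(-(S @ E[:, word_ids])))  # (s, n) hidden units
    return softmax(Y @ H.sum(axis=1))                # class probabilities
```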
3 Results
Twitter Sentiment Analysis
[Figure: Average F-measure on the SemEval test sets as a function of embedding sub-space size s; sub-space size 0 denotes the log-linear baseline.]
[Table: Comparison of the two baselines with the two variations.]
[Table: Performance of state-of-the-art systems for Twitter sentiment prediction.]
Thank You