Comparison with other Models Exploring Predictive Architectures

Similar presentations
Document Summarization using Conditional Random Fields Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, Zheng Chen IJCAI 2007 Hao-Chin Chang Department of Computer.

Fabio Massimo Zanzotto and Danilo Croce University of Rome “Tor Vergata” Roma, Italy Reading what Machines ‘Think’
Probabilistic inference in human semantic memory Mark Steyvers, Thomas L. Griffiths, and Simon Dennis Soft Computing Lab (오근현) TRENDS in Cognitive Sciences vol. 10,
Expectation Maximization Method Effective Image Retrieval Based on Hidden Concept Discovery in Image Database By Sanket Korgaonkar Masters Computer Science.
1 Ensembles of Nearest Neighbor Forecasts Dragomir Yankov, Eamonn Keogh Dept. of Computer Science & Eng. University of California Riverside Dennis DeCoste.
Distributed Representations of Sentences and Documents
Yang-de Chen Tutorial: word2vec Yang-de Chen
New Bulgarian University 9th International Summer School in Cognitive Science Simplicity as a Fundamental Cognitive Principle Nick Chater Institute for.
CS365 Course Project Billion Word Imputation Guide: Prof. Amitabha Mukherjee Group 20: Aayush Mudgal [12008] Shruti Bhargava [13671]
THE BIG PICTURE Basic Assumptions Linguistics is the empirical science that studies language (or linguistic behavior) Linguistics proposes theories (models)
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng Computer Science Department, Stanford University, Stanford, CA 94305, USA Improving Word.
Treatment Learning: Implementation and Application Ying Hu Electrical & Computer Engineering University of British Columbia.
ALIP: Automatic Linguistic Indexing of Pictures Jia Li The Pennsylvania State University.
Self Organization of a Massive Document Collection Advisor : Dr. Hsu Graduate : Sheng-Hsuan Wang Author : Teuvo Kohonen et al.
2014 EMNLP Xinxiong Chen, Zhiyuan Liu, Maosong Sun State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory for Information.
Constructing Knowledge Graph from Unstructured Text Image Source: Kundan Kumar Siddhant Manocha.
What is modularity good for? Michael S. C. Thomas, Neil A. Forrester, Fiona M. Richardson
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Xiangnan Kong,Philip S. Yu Multi-Label Feature Selection for Graph Classification Department of Computer Science University of Illinois at Chicago.
Efficient Estimation of Word Representations in Vector Space
Matwin Text classification: In Search of a Representation Stan Matwin School of Information Technology and Engineering University of Ottawa
Omer Levy Yoav Goldberg Ido Dagan Bar-Ilan University Israel
GENDER AND AGE RECOGNITION FOR VIDEO ANALYTICS SOLUTION PRESENTED BY: SUBHASH REDDY JOLAPURAM.
Competition II: Springleaf Sha Li (Team leader) Xiaoyan Chong, Minglu Ma, Yue Wang CAMCOS Fall 2015 San Jose State University.
Ganesh J, Soumyajit Ganguly, Manish Gupta, Vasudeva Varma, Vikram Pudi
Vector Semantics Dense Vectors.
RELATION EXTRACTION, SYMBOLIC SEMANTICS, DISTRIBUTIONAL SEMANTICS Heng Ji Oct 13, 2015 Acknowledgement: distributional semantics slides from.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Efficient Estimation of Word Representations in Vector Space By Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google Inc., Mountain View, CA. Published.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Graduate : Yu Cheng Chen Author: Michael.
Data Science Dimensionality Reduction WFH: Section 7.3 Rodney Nielsen Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall.
Medical Semantic Similarity with a Neural Language Model Dongfang Xu School of Information Using Skip-gram Model for word embedding.
Latent Semantic Analysis John Martin Small Bear Technologies, Inc.
Intrinsic Subspace Evaluation of Word Embedding Representations Yadollah Yaghoobzadeh and Hinrich Schütze Center for Information and Language Processing.
Distributed Representations for Natural Language Processing
Some PubMed search tips that you might not already know
Big data classification using neural network
Independent Components in Text
Effects of Reading on Word Learning
Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :
Deep learning David Kauchak CS158 – Fall 2016.
Syntax-based Deep Matching of Short Texts
A neurocomputational mechanism for parsing:
Neural Machine Translation by Jointly Learning to Align and Translate
Document Classification Method with Small Training Data
Can Computer Algorithms Guess Your Age and Gender?
A Deep Learning Technical Paper Recommender System
Vector-Space (Distributional) Lexical Semantics
Efficient Estimation of Word Representation in Vector Space
Word2Vec CS246 Junghoo “John” Cho.
Dynamic Routing Using Inter Capsule Routing Protocol Between Capsules
Distributed Representation of Words, Sentences and Paragraphs
Jun Xu Harbin Institute of Technology China
Deutero-Isaiah and Latent Semantic Analysis
Word Embeddings with Limited Memory
Learning Emoji Embeddings Using Emoji Co-Occurrence Network Graph
network of simple neuron-like computing elements
Word Embedding Word2Vec.
A neurocomputational mechanism for parsing:
Sadov M. A., NRU HSE, Moscow, Russia; Kutuzov A. B.
Socialized Word Embeddings
Natural Language to SQL(nl2sql)
MTBI Personality Predictor using ML
Vector Representation of Text
Ali Hakimi Parizi, Paul Cook
Word embeddings (continued)
Word representations David Kauchak CS158 – Fall 2016.
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Vector Representation of Text
Presentation transcript:

Comparing Predictive and Co-occurrence Based Models of Lexical Semantics Trained on Child-Directed Speech

Fatemeh Torabi Asr (1), Jon A. Willits (2), Michael N. Jones (3)
(1) Cognitive Science Program, Indiana University, Bloomington
(2) Department of Psychology, University of California, Riverside
(3) Department of Psychological and Brain Sciences, Indiana University, Bloomington

INTRODUCTION
Distributional Semantic Models have been successful at predicting many semantic behaviors. The aim of this paper is to compare two major classes of these models (co-occurrence-based and prediction error-driven models) in learning semantic categories from child-directed speech. Co-occurrence models have gained more attention in cognitive research [1], while research from computational linguistics on big datasets has found more success with prediction-based models [3]. We explore differences between these types of lexical semantic models (as representatives of Hebbian vs. reinforcement learning mechanisms, respectively) within a more cognitively relevant context: the acquisition of semantic categories (e.g., apple and orange as fruit vs. soap and shampoo as bathroom items) from linguistic data available to children.

METHOD
Training: English child-directed speech portion of CHILDES [2]; 36,170 types and 8,323,266 tokens; only the 10,000 most frequent words were used.
Evaluation: 30 categories of 1,244 high-frequency nouns; pairwise similarity judgments (e.g., shoe-sock in clothing); a similarity threshold is applied and performance is scored as balanced accuracy (see the illustrative sketch at the end of this transcript).

EXPERIMENT 1: Exploring Predictive Architectures
Word2Vec parameter setup:
- Architecture: continuous bag-of-words (CBOW) and skipgram with negative sampling [3]
- Context: 2 or 12 words from each side
- Hidden layer size: 30/50/100/200/300
[Figure: Overall classification accuracy of w2v models with different parameter settings]
[Figure: Per-category classification accuracy of w2v models with best parameter settings]

EXPERIMENT 2: Comparison with Other Models
- W2V: skipgram with negative sampling, hidden layer size 200, context 12.
- PCA: singular value decomposition of the PMI matrix, 30 principal components, context 12.
- NDL: Naïve Discriminative Learner, Rescorla-Wagner learning, context 12.
- RVA: sparse Random Vector Accumulator, environment size 8,000, context 12.

The four models, by learning type and degree of abstraction:

                        With abstraction    Without abstraction
Predictive              W2V                 NDL
Co-occurrence based     PCA                 RVA

[Figure: Overall accuracy of the four DSM models with their best parameter settings]
[Figure: Per-category accuracy of the four DSM models with their best parameter settings]

CONCLUSION
Parameter setting, and in particular the right amount of abstraction, seems to be the key to better overall categorization performance. Predictive learning is not necessarily superior to co-occurrence-based learning. Different models excel at different categories, providing evidence for complementary learning systems in category acquisition. Future research should look at the correspondence between models' performances and behavioral data.

REFERENCES
[1] Jones, M. N., Willits, J., & Dennis, S. (2015). Models of semantic memory. Oxford Handbook of Mathematical and Computational Psychology, 232-254.
[2] MacWhinney, B. (2000). The CHILDES project: The database (Vol. 2).
[3] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Efficient estimation of word representations in vector space. ICLR.

This research was funded by grant R305A140382 from the Institute of Education Sciences, US Department of Education.
The first author’s conference travel was supported by IU Provost’s Women in Science & Cognitive Science Society awards.
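
For readers who want to see the moving parts in one place, the Python sketch below illustrates the kind of pipeline described for the co-occurrence (PCA) model and the pairwise evaluation: a PMI matrix built with a symmetric 12-word window over the 10,000 most frequent words, reduced to 30 dimensions with SVD, and scored by thresholding pairwise cosine similarities with balanced accuracy. This is not the authors' code: corpus loading and CHILDES preprocessing are assumed to happen elsewhere, `sentences` and `noun_categories` are placeholder names, and the positive-PMI clipping and the fixed 0.5 threshold are assumptions (the poster only specifies PMI and a tuned similarity threshold).

```python
# Sketch of a PMI + SVD co-occurrence model and the pairwise categorization check.
# Assumptions are marked in comments; this is illustrative, not the poster's code.
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import balanced_accuracy_score


def build_pmi_vectors(sentences, vocab_size=10_000, window=12, dims=30):
    """Co-occurrence counts -> PMI -> 30-dimensional SVD (one reading of the poster's PCA model)."""
    freqs = Counter(w for sent in sentences for w in sent)
    vocab = [w for w, _ in freqs.most_common(vocab_size)]
    index = {w: i for i, w in enumerate(vocab)}

    # Dense matrix kept for clarity; for a real 10k vocabulary a sparse matrix is preferable.
    counts = np.zeros((len(vocab), len(vocab)), dtype=np.float32)
    for sent in sentences:
        ids = [index[w] for w in sent if w in index]
        for pos, wi in enumerate(ids):
            for wj in ids[max(0, pos - window): pos + window + 1]:
                if wj != wi:
                    counts[wi, wj] += 1

    total = counts.sum()
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts / total) / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0
    pmi = np.maximum(pmi, 0.0)  # positive PMI: an assumption, the poster only says "PMI"

    vectors = TruncatedSVD(n_components=dims).fit_transform(pmi)
    return vectors, index


def pairwise_balanced_accuracy(vectors, index, noun_categories, threshold=0.5):
    """Label noun pairs as same/different category by thresholding cosine similarity."""
    labels, preds = [], []
    nouns = [(n, c) for c, ns in noun_categories.items() for n in ns if n in index]
    for (n1, c1), (n2, c2) in combinations(nouns, 2):
        v1, v2 = vectors[index[n1]], vectors[index[n2]]
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
        labels.append(int(c1 == c2))
        preds.append(int(cos >= threshold))  # the poster tunes this threshold; 0.5 is a placeholder
    return balanced_accuracy_score(labels, preds)
```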
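
The predictive counterpart in Experiment 2 (skip-gram with negative sampling, 200 dimensions, 12-word context) can be approximated with an off-the-shelf toolkit; the poster does not name a specific implementation, so the gensim call below is an assumption. The resulting vectors can then be scored with the same pairwise check shown above.

```python
# Assumption: gensim >= 4.0 (where the dimensionality argument is `vector_size`).
# `sentences` is the same tokenized child-directed speech as in the sketch above.
from gensim.models import Word2Vec

w2v = Word2Vec(
    sentences=sentences,  # list of token lists
    vector_size=200,      # hidden layer size used in Experiment 2
    window=12,            # 12-word context on each side
    sg=1,                 # skip-gram architecture
    negative=5,           # negative sampling; the sample count is an assumption
    min_count=1,
)
# w2v.wv["apple"] returns the learned vector for a word; pairwise similarities can
# then be thresholded exactly as in pairwise_balanced_accuracy above.
```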