Advisor: Hsin-Hsi Chen   Reporter: Chi-Hsin Yu   Date: 2010.08.12
Papers: Word Representations: ... (ACL 2010); From Frequency ... (JAIR 2010); Representing Word ... (Psychological Review 2007)


Advisor: Hsin-Hsi Chen
Reporter: Chi-Hsin Yu
Date: 2010.08.12
Papers: Word Representations: ... (ACL 2010); From Frequency ... (JAIR 2010); Representing Word ... (Psychological Review 2007)

Outline
- Introduction
- Word representations
- Experimental comparisons (ACL 2010): chunking, named entity recognition
- Conclusions

Introduction
A word representation is a mathematical object associated with each word in the vocabulary, often a vector.
Examples:
- dog: animal, pet, four-legged, ...
- cat: animal, pet, four-legged, ...
- bird: animal, two-legged, flies, ...
Questions:
- How do we build this matrix?
- Are there representations other than a matrix?
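A minimal sketch (hypothetical words and binary features, illustration only) of such a word-by-feature matrix and how it supports a similarity query:

```python
import numpy as np

# Hypothetical vocabulary and binary features (illustration only).
vocab = ["dog", "cat", "bird"]
features = ["animal", "pet", "four-legged", "two-legged", "flies"]

# Rows are words, columns are features.
M = np.array([
    [1, 1, 1, 0, 0],  # dog
    [1, 1, 1, 0, 0],  # cat
    [1, 0, 0, 1, 1],  # bird
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(M[0], M[1]))  # dog vs. cat  -> 1.0 with these toy features
print(cosine(M[0], M[2]))  # dog vs. bird -> about 0.33
```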

Word Representations
Categorizing word representations by source:
- From humans: feature lists, semantic networks, ontologies (WordNet, SUMO, FrameNet, ...)
- From texts:
  - Frequency-based: distributional representations, Latent Semantic Indexing
  - Model-based: clustering (Brown clustering), Latent Dirichlet Allocation, embeddings (neural language model, hierarchical log-bilinear model)
  - Operation-based: random indexing (quantum informatics), holographic lexicon

Word Representations
Some important considerations:
- Dimensionality: distributional representations often exceed 5,000 dimensions; HLBL, random indexing, and LSI typically use fewer than 500.
- Format: vector or network.
- Encoded knowledge/relations/information: world knowledge (ontologies), word semantics, word similarity/distance/proximity.
The most important question in word representations: what is meaning?

Word Representations – Distributional Representations
Built from texts, frequency-based.
The matrix is row-by-column, token-by-event: rows are tokens (words) and columns are events.
An event can be, for example, occurrence in the same document, or co-occurrence within a 5-word window.
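A minimal sketch of window-based co-occurrence counting (the tokenization and window handling here are simplified assumptions, not the preprocessing used in the surveyed papers):

```python
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=5):
    """Count token-event pairs, where the event is another word
    appearing within `window` positions of the target token."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts

sentences = [["a", "door", "is", "a", "part", "of", "a", "house"]]
print(cooccurrence_counts(sentences)["door"])
# Counter({'a': 3, 'is': 1, 'part': 1, 'of': 1})
```

The resulting counts can then be assembled into the token-by-event matrix described above.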

Word Representations – Distributional Representations
The event can also be a pattern. For the sentence "A door is a part of a house", the token is the pair door:house and the event is the pattern is_a_part_of.
Procedures applied to the matrix [From Frequency to Meaning: Vector Space Models of Semantics, JAIR 2010]:
- Preprocessing of the texts (tokenization, annotation, ...)
- Normalization/weighting
- Smoothing of the matrix (using SVD): latent meaning, noise reduction, higher-order co-occurrence, sparsity reduction
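A sketch of one common instantiation of the weighting and smoothing steps, PPMI weighting followed by truncated SVD (the survey discusses several alternatives; this pairing is just an illustrative choice):

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information weighting of a
    token-by-event count matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts * total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0          # zero counts and empty rows/columns
    return np.maximum(pmi, 0.0)

def smooth_svd(weighted, k):
    """Keep the top-k singular dimensions (LSI-style smoothing)."""
    U, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k] * s[:k]               # k-dimensional token vectors

counts = np.random.poisson(1.0, size=(300, 800)).astype(float)
vectors = smooth_svd(ppmi(counts), k=100)
print(vectors.shape)                      # (300, 100)
```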

Word Representations – Brown Clustering
The Brown algorithm (Brown et al., 1992):
- is a hierarchical clustering algorithm that clusters words so as to maximize the mutual information of bigrams;
- is a class-based bigram language model;
- runs in time O(V·K²), where V is the size of the vocabulary and K is the number of clusters.
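In the ACL 2010 experiments, prefixes of the hierarchical bit strings produced by Brown clustering are used as word features. A minimal sketch, assuming a hypothetical clusters file with one "bitstring<TAB>word" entry per line (the path, file layout, and prefix lengths are illustrative):

```python
def load_brown_clusters(path):
    """Read 'bitstring<TAB>word' lines into a dict
    (real cluster files may also carry a frequency column)."""
    word2bits = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            bits, word = line.rstrip("\n").split("\t")[:2]
            word2bits[word] = bits
    return word2bits

def brown_prefix_features(word, word2bits, prefix_lengths=(4, 6, 10, 20)):
    """Prefixes of the hierarchical path give cluster features
    at several granularities."""
    bits = word2bits.get(word)
    if bits is None:
        return ["brown=UNK"]
    return [f"brown_{n}={bits[:n]}" for n in prefix_lengths]

# Example with made-up bit strings:
word2bits = {"monday": "0011011", "tuesday": "0011010"}
print(brown_prefix_features("monday", word2bits))
```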

Word Representations – Embeddings
- Collobert and Weston embedding (2008): a neural language model; discriminative and non-probabilistic.
- Hierarchical log-bilinear (HLBL) embedding (Mnih and Hinton, 2009): a neural language model; a distributed representation.
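Downstream, an induced embedding (whether C&W or HLBL) is simply a lookup table from word to dense vector. A minimal sketch of such a lookup, with a fallback for unknown words and an optional global scaling factor (the file format, the *UNKNOWN* convention, and the scaling scheme are assumptions, not the exact setup of the ACL 2010 paper):

```python
import numpy as np

def load_embeddings(path):
    """Read 'word v1 v2 ... vd' lines into a dict (format assumed)."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            table[parts[0]] = np.array(parts[1:], dtype=float)
    return table

def embedding_features(word, table, scale=1.0):
    """Dense per-token features; unknown words fall back to an
    *UNKNOWN* vector if present, else to a zero vector."""
    vec = table.get(word, table.get("*UNKNOWN*"))
    if vec is None:
        dim = len(next(iter(table.values())))
        vec = np.zeros(dim)
    return scale * vec
```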

Experimental Comparisons (ACL 2010)
Chunking:
- CoNLL-2000 shared task; linear CRF chunker (Sha and Pereira, 2003)
- Data from the Penn Treebank: 7,936 training sentences, 1,000 development sentences
Named entity recognition:
- CoNLL-2003 shared task; regularized averaged perceptron model (Ratinov and Roth, 2009)
- 204k words for training, 51k words for development, 46k words for testing
- Out-of-domain evaluation: MUC7 formal run (59k words)
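In both tasks the word representations enter as extra per-token features alongside the baseline features. A minimal illustrative sketch of such a feature function, reusing the brown_prefix_features and embedding_features helpers sketched above (the templates are toy ones, not the actual templates of Sha and Pereira or Ratinov and Roth):

```python
def token_features(tokens, i, word2bits, emb_table):
    """Toy baseline features for token i, augmented with
    word-representation features (illustrative templates only)."""
    w = tokens[i]
    feats = {
        "w=" + w: 1.0,
        "w-1=" + (tokens[i - 1] if i > 0 else "<S>"): 1.0,
        "w+1=" + (tokens[i + 1] if i + 1 < len(tokens) else "</S>"): 1.0,
    }
    # Brown-cluster prefix features (see the Brown clustering sketch).
    for f in brown_prefix_features(w, word2bits):
        feats[f] = 1.0
    # Embedding dimensions as real-valued features (see the embedding sketch).
    for d, v in enumerate(embedding_features(w, emb_table)):
        feats[f"emb_{d}"] = float(v)
    return feats
```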

Experimental Comparisons – Features
[Slide: tables of the feature sets used for chunking and for NER]

Experimental Comparisons – Results
[Three slides of result tables/figures for chunking and NER]

Conclusions
Word features can be learned in advance in an unsupervised, task-inspecific, and model-agnostic manner.
The disadvantage is that accuracy might not be as high as with a semi-supervised method that includes task-specific information and jointly learns the supervised and unsupervised tasks (Ando & Zhang, 2005, ASO; Suzuki & Isozaki, 2008; Suzuki et al., 2009).
Future work: inducing phrase representations.

Q&A