Memoryless Document Vector Dongxu Zhang Advised by Dong Wang 2016.1.18.

Slides:



Advertisements
Similar presentations
A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples Dell Zhang (BBK) and Wee Sun Lee (NUS)
Advertisements

International Conference on Program Comprehension (ICPC) 2008 A Traceability Technique for Specifications Aharon Abadi, Mordechai Nisenson and Yahalomit.
Information retrieval – LSI, pLSI and LDA
Modern Information Retrieval Chapter 1: Introduction
Multi-label Relational Neighbor Classification using Social Context Features Xi Wang and Gita Sukthankar Department of EECS University of Central Florida.
Farag Saad i-KNOW 2014 Graz- Austria,
INF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING Crista Lopes.
Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI) Jasminka Dobša Faculty of organization and informatics,
Person Name Disambiguation by Bootstrapping Presenter: Lijie Zhang Advisor: Weining Zhang.
Data Mining and Machine Learning Lab Document Clustering via Matrix Representation Xufei Wang, Jiliang Tang and Huan Liu Arizona State University.
An Introduction to Latent Semantic Analysis
Indexing by Latent Semantic Analysis Written by Deerwester, Dumais, Furnas, Landauer, and Harshman (1990) Reviewed by Cinthia Levy.
Distributional Clustering of Words for Text Classification L. Douglas Baker Andrew Kachites McCallum SIGIR’98.
Semantic (Language) Models: Robustness, Structure & Beyond Thomas Hofmann Department of Computer Science Brown University Chief Scientist.
Indexing by Latent Semantic Analysis Scot Deerwester, Susan Dumais,George Furnas,Thomas Landauer, and Richard Harshman Presented by: Ashraf Khalil.
OCFS: Optimal Orthogonal Centroid Feature Selection for Text Categorization Jun Yan, Ning Liu, Benyu Zhang, Shuicheng Yan, Zheng Chen, and Weiguo Fan et.
Adding Semantics to Information Retrieval By Kedar Bellare 20 th April 2003.
Probabilistic Latent Semantic Analysis
Distributed Representations of Sentences and Documents
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
Selective Sampling on Probabilistic Labels Peng Peng, Raymond Chi-Wing Wong CSE, HKUST 1.
 Fatemeh Lashkari UNB University May 7 th  Indexing  Semantic Search  Semantic Search Architecture  Index process  Index Maintenance.
1 A Discriminative Approach to Topic- Based Citation Recommendation Jie Tang and Jing Zhang Presented by Pei Li Knowledge Engineering Group, Dept. of Computer.
Introduction to Machine Learning for Information Retrieval Xiaolong Wang.
CONCLUSION & FUTURE WORK Normally, users perform triage tasks using multiple applications in concert: a search engine interface presents lists of potentially.
Knowledge management system based on Latent Semantic Analysis.
Which of the two appears simple to you? 1 2.
Qual Presentation Daniel Khashabi 1. Outline  My own line of research  Papers:  Fast Dropout training, ICML, 2013  Distributional Semantics Beyond.
Topic Modelling: Beyond Bag of Words By Hanna M. Wallach ICML 2006 Presented by Eric Wang, April 25 th 2008.
Transfer Learning Task. Problem Identification Dataset : A Year: 2000 Features: 48 Training Model ‘M’ Testing 98.6% Training Model ‘M’ Testing 97% Dataset.
Semantic Embedding Space for Zero ­ Shot Action Recognition Xun XuTimothy HospedalesShaogang GongAuthors: Computer Vision Group Queen Mary University of.
Deformable Part Model Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 11 st, 2013.
Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm Chen, Yi-wen( 陳憶文 ) Graduate Institute of Computer Science & Information Engineering.
MODEL ADAPTATION FOR PERSONALIZED OPINION ANALYSIS MOHAMMAD AL BONI KEIRA ZHOU.
CIKM Opinion Retrieval from Blogs Wei Zhang 1 Clement Yu 1 Weiyi Meng 2 1 Department of.
Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts Ramin Homayouni, Kevin Heinrich, Lai Wei, and Michael W. Berry University of Tennessee.
Dual Transfer Learning Mingsheng Long 1,2, Jianmin Wang 2, Guiguang Ding 2 Wei Cheng, Xiang Zhang, and Wei Wang 1 Department of Computer Science and Technology.
The 1st Global Tech Mining Conference, Atlanta, USA Analyzing Technology Evolution of Graphene Sensor Based on Patent Documents Fang Shu 1, Hu Zhengyin.
1 CS 430: Information Discovery Lecture 11 Latent Semantic Indexing.
Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq
Local Linear Matrix Factorization for Document Modeling Institute of Computing Technology, Chinese Academy of Sciences Lu Bai,
Web-Mining Agents Topic Analysis: pLSI and LDA
Multilingual Search Shibamouli Lahiri
The Development of a search engine & Comparison according to algorithms Sung-soo Kim The final report.
Intelligent Database Systems Lab Presenter: CHANG, SHIH-JIE Authors: Tao Liu, Zheng Chen, Benyu Zhang, Wei-ying Ma, Gongyi Wu 2004.ICDM. Improving Text.
Ganesh J, Soumyajit Ganguly, Manish Gupta, Vasudeva Varma, Vikram Pudi
UIC at TREC 2006: Blog Track Wei Zhang Clement Yu Department of Computer Science University of Illinois at Chicago.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Word Sense Disambiguation Algorithms in Hindi
Image Retrieval and Ranking using L.S.I and Cross View Learning Sumit Kumar Vivek Gupta
Study on Deep Learning in Speaker Recognition Lantian Li CSLT / RIIT Tsinghua University May 26, 2016.
Medical Semantic Similarity with a Neural Language Model Dongfang Xu School of Information Using Skip-gram Model for word embedding.
Latent Semantic Analysis John Martin Small Bear Technologies, Inc.
Korean version of GloVe Applying GloVe & word2vec model to Korean corpus speaker : 양희정 date :
Sentiment analysis algorithms and applications: A survey
Information Management for Digital Humanities and Diplomatics
Reading Notes Wang Ning Lab of Database and Information Systems
Wei Wei, PhD, Zhanglong Ji, PhD, Lucila Ohno-Machado, MD, PhD
A Deep Learning Technical Paper Recommender System
J. Zhu, A. Ahmed and E.P. Xing Carnegie Mellon University ICML 2009
convolutional neural networkS
Distributed Representation of Words, Sentences and Paragraphs
convolutional neural networkS
Word embeddings based mapping
Word embeddings based mapping
Classification Boundaries
Michal Rosen-Zvi University of California, Irvine
prerequisite chain learning and the introduction of LectureBank
Latent semantic space: Iterative scaling improves precision of inter-document similarity measurement Rie Kubota Ando. Latent semantic space: Iterative.
Latent Semantic Analysis
Presentation transcript:

Memoryless Document Vector Dongxu Zhang Advised by Dong Wang

Introduction What is “Memory”? Why do we want “Memoryless”? How to achieve that goal?

Introduction

Why “Memoryless”?

Introduction How to achieve? Word vector pooling [4] Shortcoming: Does not involve pooling in model training, which leads to mismatch between word vector learning and document vector producing. Memoryless Document Vector

Experiments Datasets Webkb, reuters 8, 20 newsgroup SST, IMDB Setup Topic classification tasks with 50 and 200 dimensions. Sentiment classification tasks with 100 and 400 dimensions. Logistic regression classifier.

Results

Conclusion Raise up the memory issue of conventional document representation methods and then propose a simple yet effective method for document representation to nail it. Reference [1] S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407. [2] Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, [3] Quoc Le and Tomas Mikolov Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 1188–1196. [4] Chao Xing, Dong Wang, Xuewei Zhang, and Chao Liu Document classification based on i-vector distributions. In APSIPA 2014.

Thank you.