KDD 2011 Summary of Text Mining sessions Hongbo Deng.

Similar presentations
Recommender System A Brief Survey.
Data Mining and the Web Susan Dumais Microsoft Research KDD97 Panel - Aug 17, 1997.
CWS: A Comparative Web Search System Jian-Tao Sun, Xuanhui Wang, § Dou Shen Hua-Jun Zeng, Zheng Chen Microsoft Research Asia University of Illinois at.
ACM SIGIR 2009 Workshop on Redundancy, Diversity, and Interdependent Document Relevance, July 23, 2009, Boston, MA 1 Modeling Diversity in Information.
Introduction to ReviewMiner Hongning Wang Department of Computer Science University of Illinois at Urbana-Champaign
Language Models Naama Kraus (Modified by Amit Gross) Slides are based on Introduction to Information Retrieval Book by Manning, Raghavan and Schütze.
Information retrieval – LSI, pLSI and LDA
1 Language Models for TR (Lecture for CS410-CXZ Text Info Systems) Feb. 25, 2011 ChengXiang Zhai Department of Computer Science University of Illinois,
Optimizing search engines using clickthrough data
One Theme in All Views: Modeling Consensus Topics in Multiple Contexts Jian Tang 1, Ming Zhang 1, Qiaozhu Mei 2 1 School of EECS, Peking University 2 School.
Developing and Evaluating a Query Recommendation Feature to Assist Users with Online Information Seeking & Retrieval With graduate students: Karl Gyllstrom,
1.Accuracy of Agree/Disagree relation classification. 2.Accuracy of user opinion prediction. 1.Task extraction performance on Bing web search log with.
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Fast Query Execution for Retrieval Models based on Path Constrained Random Walks Ni Lao, William W. Cohen Carnegie Mellon University
Latent Aspect Rating Analysis without Aspect Keyword Supervision Hongning Wang, Yue Lu, ChengXiang Zhai Department of.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
Scalable Text Mining with Sparse Generative Models
Best Practices Using Enterprise Search Technology Aurelien Dubot Consultant – Media and Entertainment, Fast Search & Transfer (FAST) British Computer Society.
Topic Modeling with Network Regularization Qiaozhu Mei, Deng Cai, Duo Zhang, ChengXiang Zhai University of Illinois at Urbana-Champaign.
Modeling Scientific Impact with Topical Influence Regression James Foulds Padhraic Smyth Department of Computer Science University of California, Irvine.
MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz Presented by: Qiaozhu Mei,
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Topical Crawlers for Building Digital Library Collections Presenter: Qiaozhu Mei.
Personal Information Management Vitor R. Carvalho : Personalized Information Retrieval Carnegie Mellon University February 8 th 2005.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Toward A Session-Based Search Engine Smitha Sriram, Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.
Search Engine Architecture
Less is More Probabilistic Models for Retrieving Fewer Relevant Documents Harr Chen, David R. Karger MIT CSAIL ACM SIGIR 2006 August 9, 2006.
Personalized Interaction With Semantic Information Portals Eric Schwarzkopf DFKI
Sharad Oberoi Carnegie Mellon University DesignWebs: Learning in Engineering Project Teams.
Latent Dirichlet Allocation D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3: , January Jonathan Huang
How Do We Find Information?. Key Questions  What are we looking for?  How do we find it?  Why is it difficult? “A prudent question is one-half of wisdom”
Topic Models Presented by Iulian Pruteanu Friday, July 28 th, 2006.
Topic Modeling using Latent Dirichlet Allocation
Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.
Latent Dirichlet Allocation
Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin.
Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining Qiaozhu Mei and ChengXiang Zhai Department of Computer Science.
Automatic Labeling of Multinomial Topic Models
A System for Automatic Personalized Tracking of Scientific Literature on the Web Tzachi Perlstein Yael Nir.
A code-centric cluster-based approach for searching online support forums for programmers Christopher Scaffidi, Christopher Chambers, Sheela Surisetty.
Relevance Feedback Hongning Wang
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo.
A Study of Poisson Query Generation Model for Information Retrieval
Collaborative Deep Learning for Recommender Systems
Sparse Coding: A Deep Learning using Unlabeled Data for High - Level Representation Dr.G.M.Nasira R. Vidya R. P. Jaia Priyankka.
Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.
Topic Modeling for Short Texts with Auxiliary Word Embeddings
Recommendation in Scholarly Big Data
The topic discovery models
Search User Behavior: Expanding The Web Search Frontier
Statistical Learning Methods for Natural Language Processing on the Internet 徐丹云.
J. Zhu, A. Ahmed and E.P. Xing Carnegie Mellon University ICML 2009
Unsupervised Extraction of Template Structure in Web Search Queries www 2012 – Session: search Qingxia Liu.
Matching Words with Pictures
Learning Literature Search Models from Citation Behavior
Michal Rosen-Zvi University of California, Irvine
GhostLink: Latent Network Inference for Influence-aware Recommendation
Presentation transcript:

3 Text Mining Sessions, 9 Papers
– Beyond Keyword Search: Discovering Relevant Scientific Literature. Khalid El-Arini (Carnegie Mellon University), Carlos Guestrin
– Collaborative Topic Modeling for Recommending Scientific Articles. Chong Wang (Princeton University), David M. Blei
– Partially Labeled Topic Models for Interpretable Text Mining. Daniel Ramage (Stanford University), Christopher D. Manning, Susan Dumais
– Refining Causality: Who Copied from Whom? Tristan Snowsill (University of Bristol), Nick Fyson, Tijl De Bie, Nello Cristianini
– Conditional Topical Coding: An Efficient Topic Model Conditioned on Rich Features. Jun Zhu (Carnegie Mellon University), Ni Lao, Ning Chen, Eric P. Xing
– Tracking Trends: Incorporating Term Volume into Temporal Topic Models. Liangjie Hong (Lehigh University), Dawei Yin, Jian Guo, Brian D. Davison
– Latent Topic Feedback for Information Retrieval. David Andrzejewski (Lawrence Livermore National Laboratory), David Buttler
– Localized Factor Models for Multi-Context Recommendation. Deepak Agarwal (Yahoo! Labs), Bee-Chung Chen, Bo Long
– Latent Aspect Rating Analysis without Aspect Keyword Supervision. Hongning Wang (University of Illinois at Urbana-Champaign), Yue Lu, ChengXiang Zhai
(Slide labels: Topic Model, Recommendation)
Topic models are also widely used in other sessions, e.g., user modeling, query log analysis, ad …

Collaborative Topic Modeling for Recommending Scientific Articles
Problem:
– Recommend scientific articles to users of an online community
Input:
– Users' libraries from CiteULike
– The content of the articles
Output:
– Articles relevant to each user's interests
Three traditional ways to find articles:
– Follow citations in other articles the user is interested in
– Keyword search
– Recommendation methods (CiteULike)
Several criteria:
– Recommending older articles is important
– Recommending new articles is also important
– Exploratory variables can be valuable in online scientific archives and communities
Approach: Collaborative Filtering + Topic Modeling

Collaborative Topic Modeling for Recommending Scientific Articles
Two types of data:
– Other users' libraries [collaborative filtering]: like latent factor models, the method uses information from other users' libraries. For a particular user, it can recommend articles from other users who liked similar articles. Latent factor models work well for recommending known articles, but cannot generalize to previously unseen articles.
– The content of the articles [topic modeling]: to generalize to unseen articles, the authors use topic modeling, which can recommend articles whose content is similar to other articles the user likes.

Collaborative Topic Modeling for Recommending Scientific Articles
Intuition: combine collaborative filtering and probabilistic topic modeling to recommend scientific articles.
The key property of CTR (collaborative topic regression) lies in how the item latent vector $v_j$ is generated: we assume $v_j$ is close to the topic proportions $\theta_j$, but it can diverge from them if it has to.
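
A minimal Python sketch of the item-vector assumption. In the CTR paper the item vector is drawn as $v_j = \theta_j + \epsilon_j$ with $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$, and a rating is predicted by the inner product $u_i^\top v_j$; for an article with no ratings yet, the expected item vector is simply $\theta_j$. The function names, the example vectors and the value of `lambda_v` are illustrative assumptions; the sketch only samples from this prior and does not fit the model.

```python
import math
import random


def sample_item_vector(theta, lambda_v, rng):
    """Draw v_j = theta_j + eps_j, with eps_j ~ N(0, (1/lambda_v) * I)."""
    sd = 1.0 / math.sqrt(lambda_v)  # larger lambda_v keeps v_j closer to theta_j
    return [t + rng.gauss(0.0, sd) for t in theta]


def predict_rating(u, v):
    """Predicted rating as the inner product u_i^T v_j."""
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
theta_j = [0.7, 0.2, 0.1]  # hypothetical topic proportions of article j
u_i = [1.0, 0.5, 0.0]      # hypothetical latent vector of user i

# In-matrix: once article j has ratings, v_j may drift away from theta_j.
v_j = sample_item_vector(theta_j, lambda_v=100.0, rng=rng)
print(predict_rating(u_i, v_j))

# Out-of-matrix: a brand-new article has no ratings, so fall back to theta_j.
print(predict_rating(u_i, theta_j))  # approximately 0.8
```

The offset $\epsilon_j$ is what lets ratings pull an article away from what its text alone suggests, while the topic proportions still give a usable vector for articles nobody has rated.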

Latent Topic Feedback for Information Retrieval
Problem: a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable
Intuition: augment keyword search with user feedback on latent topics
Key point: a new method for obtaining and exploiting user feedback at the latent-topic level

Latent Topic Feedback for Information Retrieval
Method:
– Learn latent topics from the corpus and construct meaningful representations of these topics
– At query time, decide which latent topics are potentially relevant and present the appropriate topic representations alongside keyword search results
– When a user selects a latent topic, the vocabulary terms most strongly associated with that topic are then used to augment the original query
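
The query-time steps can be sketched in Python as follows. The topic scoring rule (summing each topic's probability of the query terms) and the unweighted append of the top terms are simplifying assumptions for illustration, not necessarily the paper's selection or weighting scheme; all names and the example topics are hypothetical.

```python
def rank_topics(query_terms, topics):
    """Order topics by the total probability they assign to the query terms."""
    def score(name):
        return sum(topics[name].get(w, 0.0) for w in query_terms)
    return sorted(topics, key=lambda name: (-score(name), name))


def top_terms(topic_word, n):
    """The n terms with the highest probability under one topic."""
    ranked = sorted(topic_word.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


def augment_query(query_terms, topic_word, n=3):
    """Append the topic's n strongest terms that are not already in the query."""
    extra = [w for w in top_terms(topic_word, len(topic_word)) if w not in query_terms]
    return list(query_terms) + extra[:n]


topics = {  # hypothetical topic-word distributions learned offline
    "neuro": {"neuron": 0.30, "synapse": 0.25, "cortex": 0.20, "brain": 0.15, "cell": 0.10},
    "ml": {"learning": 0.40, "model": 0.30, "data": 0.20, "brain": 0.10},
}
query = ["brain", "imaging"]
print(rank_topics(query, topics))  # ['neuro', 'ml']
print(augment_query(query, topics["neuro"], n=3))
# ['brain', 'imaging', 'neuron', 'synapse', 'cortex']
```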

Beyond Keyword Search: Discovering Relevant Scientific Literature
Problem: as the number of publications has grown, it has become difficult for scientists to find relevant prior work for their particular research
Input: a set of papers as a query
Output: a set of highly relevant articles
Method:
– Model scientific influence between documents and optimize an objective function
– Select a set of papers A with maximum influence to/from the query set Q
– Incorporate trust and personalization: since scientists trust some authors more than others, results can be personalized to individual preferences
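
One way to make "select a set of papers A with maximum influence to Q" concrete is greedy maximization of a set function. The sketch assumes a hypothetical table `infl[a][q]` of pairwise influence probabilities and a probabilistic-coverage objective, in which a candidate that overlaps with papers already chosen gains little; this is an illustrative stand-in rather than the paper's exact objective, and trust/personalization weights are omitted.

```python
def coverage(selected, queries, infl):
    """Expected number of query papers influenced by at least one selected paper
    (treating influences as independent)."""
    total = 0.0
    for q in queries:
        miss = 1.0  # probability that no selected paper influences q
        for a in selected:
            miss *= 1.0 - infl[a].get(q, 0.0)
        total += 1.0 - miss
    return total


def greedy_select(candidates, queries, infl, k):
    """Add, up to k times, the candidate with the largest marginal gain in coverage."""
    selected = []
    for _ in range(k):
        base = coverage(selected, queries, infl)
        best, best_gain = None, 0.0
        for c in candidates:
            if c in selected:
                continue
            gain = coverage(selected + [c], queries, infl) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no remaining candidate adds anything
            break
        selected.append(best)
    return selected


infl = {  # hypothetical influence probabilities infl[candidate][query_paper]
    "a": {"q1": 0.9},
    "b": {"q1": 0.8},
    "c": {"q2": 0.5},
}
print(greedy_select(["a", "b", "c"], ["q1", "q2"], infl, k=2))  # ['a', 'c']
```

Because this coverage function is monotone and submodular, greedy selection is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value. In the example, `b` is passed over in the second round because it mostly duplicates the influence `a` already has on q1.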

Partially Labeled Topic Models for Interpretable Text Mining
Problem: make use of the unsupervised learning of topic models, with constraints that align some learned topics with human-provided labels
Input: a collection of documents with partial labels
Graphical model for PLDA:
– Observed: each document's words w and labels Λ
– Per-document label distribution
– Per-topic word distribution
– Per-document-label topic distribution
Output: θ, Φ, ψ
PLDA extends the generative story of LDA to incorporate labels, and that of Labeled LDA to incorporate per-label latent topics. Each latent topic is a multinomial distribution over the vocabulary $V$, concentrated on words that tend to co-occur with each other and with some label.
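
The generative story can be sketched in Python by sampling a single document. Assumptions of this sketch: each label owns a fixed list of topic ids, an extra label named `latent` (a hypothetical name) is attached to the document to supply background topics, one symmetric Dirichlet parameter is used throughout, and the per-topic word distributions are given rather than learned. What it illustrates is the constraint: a token can only be drawn from topics that belong to one of its document's labels.

```python
import random


def dirichlet(alpha, n, rng):
    """Symmetric Dirichlet(alpha) sample of length n via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(labels, label_topics, phi, length, rng, alpha=0.5):
    """Sample one document's words and their (label, topic) assignments."""
    label_dist = dirichlet(alpha, len(labels), rng)  # per-document label distribution
    topic_dist = {  # per-document, per-label distribution over that label's topics
        l: dirichlet(alpha, len(label_topics[l]), rng) for l in labels
    }
    words, assignments = [], []
    for _ in range(length):
        l = rng.choices(labels, weights=label_dist)[0]
        z = rng.choices(label_topics[l], weights=topic_dist[l])[0]
        vocab = list(phi[z])  # per-topic word distribution
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]
        words.append(w)
        assignments.append((l, z))
    return words, assignments


label_topics = {"sports": [0, 1], "latent": [2]}  # hypothetical label -> topic ids
phi = {
    0: {"goal": 0.6, "match": 0.4},
    1: {"coach": 0.5, "season": 0.5},
    2: {"the": 0.7, "today": 0.3},
}
words, assignments = generate_document(["sports", "latent"], label_topics, phi, 8, random.Random(0))
print(words)
```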

Latent Aspect Rating Analysis without Aspect Keyword Supervision
(Figure: Reviews + overall ratings → Aspect Segmentation → aspect segments with term counts, e.g., "location: amazing, walk, anywhere", "nice, accommodating, smile, friendliness, attentiveness", "room: nicely, appointed, comfortable" → Latent Rating Regression → term weights, aspect ratings, aspect weights. A "Gap ???" label marks the link between the two steps.)

Latent Aspect Rating Analysis without Aspect Keyword Supervision
– LARAM: jointly models aspects and aspect ratings/weights
– LRR (Wang et al., 2010): takes the segmented aspects from the previous step as input
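
In the LRR step, the overall rating is modeled as a weighted combination of latent aspect ratings, each of which is a weighted sum of the normalized term frequencies in that aspect's segment. A minimal Python sketch of that prediction, assuming the aspect segments are already given; the words come from the slide's example, but the term weights and aspect weights are hypothetical, and the paper's Gaussian noise and prior on the aspect weights are left out.

```python
def aspect_ratings(segments, term_weights):
    """Rating of each aspect: sum of term weight * normalized term frequency in its segment."""
    ratings = {}
    for aspect, counts in segments.items():
        total = sum(counts.values())
        if total == 0:  # empty segment: no evidence for this aspect
            ratings[aspect] = 0.0
            continue
        ratings[aspect] = sum(
            term_weights[aspect].get(w, 0.0) * c / total for w, c in counts.items()
        )
    return ratings


def overall_rating(ratings, aspect_weights):
    """Predicted overall rating: aspect-weighted sum of the aspect ratings."""
    return sum(aspect_weights[a] * r for a, r in ratings.items())


segments = {  # term counts per aspect segment of one review
    "location": {"amazing": 1, "walk": 1},
    "room": {"nicely": 1, "comfortable": 1},
}
term_weights = {  # hypothetical learned term weights
    "location": {"amazing": 5.0, "walk": 3.0},
    "room": {"nicely": 3.0, "comfortable": 2.0},
}
aspect_weights = {"location": 0.3, "room": 0.7}  # hypothetical per-review weights

ratings = aspect_ratings(segments, term_weights)
print(ratings)  # {'location': 4.0, 'room': 2.5}
print(overall_rating(ratings, aspect_weights))  # approximately 2.95
```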

Some Observations
– Text mining is very hot
– Topic modeling has been widely used in text analysis and many other applications, e.g., query understanding, advertisement …
  – Combine topic modeling with other models, e.g., collaborative filtering
  – Integrate more information into topic modeling, e.g., labeled and unlabeled information (partially labeled)
  – Move from two-step solutions to unified ones

Thanks!