Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,

Presentation transcript:

Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic. Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais Source: WSDM 2012 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang

Outline: Introduction; Reading Level & Topic Profiles; Characterizing the Web; Applications; Conclusion

Introduction: Web search and user interest, viewed along two dimensions: topic and reading level.

Introduction: Estimate probabilistic profiles that describe users, queries, or websites, and use them to analyze user behavior. The profiles cover two dimensions: topic and reading level.


Reading Level & Topic Profiles. Entities: website (s), user (u), query (q). Variables: reading level (R), topic (T), reading level and topic jointly (RT). Profile: a probability distribution over reading level and topic for an entity (RLT profile). Examples: the reading level and topic profile of a user is P(RT | u); the reading level and topic profile of a query is P(RT | q).

Predicting Reading Level and Topic for a URL. The reading difficulty of a document d is represented as a random variable R_d taking values in the range 1-12. Reading level classifier: based on a language model. Topic classifier: trained using the URLs in each Open Directory Project (ODP) category.

Building Reading Level and Topic Profiles. Profiles based on the entity itself: given a set of URLs associated with each entity, the joint distribution of reading level and topic is built by aggregating the distributions of the individual URLs computed by the URL-level classifiers. To prevent bias: choose 25 URLs to estimate each site-level or user-level profile; use the top URLs to build the profile for a query.
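
The aggregation step on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the slide says the per-URL distributions are "aggregated" without giving the rule, so uniform averaging is an assumption here, and the function name `build_profile` and the sample classifier outputs are hypothetical.

```python
from collections import defaultdict


def build_profile(url_distributions):
    """Average per-URL joint distributions P(RT | url) into an entity profile P(RT | e).

    Assumption: every URL contributes equally (uniform averaging).
    Each distribution maps a (reading_level, topic) pair to a probability.
    """
    if not url_distributions:
        raise ValueError("need at least one URL distribution")
    totals = defaultdict(float)
    for dist in url_distributions:
        for key, prob in dist.items():
            totals[key] += prob
    n = len(url_distributions)
    return {key: total / n for key, total in totals.items()}


# Hypothetical URL-level classifier outputs for two URLs visited by one user.
urls = [
    {(4, "Kids"): 0.75, (8, "Science"): 0.25},
    {(8, "Science"): 0.5, (12, "Science"): 0.5},
]
profile = build_profile(urls)
print(profile)
```

Because each input sums to 1 and the weights are equal, the resulting profile is again a probability distribution.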

Building Reading Level and Topic Profiles. Profiles based on entity relationships: circular dependency is avoided by using profiles based only on the entity itself. Relationships (diagram): a user issues a query; a query surfaces a website; a user visits a website.

Characterizing and Comparing Profiles. Characterizing an individual entity: E[R|e] is the expected reading level of a given entity e; H(R|e) is the reading level entropy of the entity e.
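
As a small worked example of the two statistics on this slide, the sketch below computes E[R|e] and H(R|e) from a reading level distribution P(R|e). The function names and the sample distribution are made up for illustration, and the logarithm base (base 2, giving bits) is an assumption, since the slide does not state one.

```python
import math


def expected_level(p_r):
    """E[R|e]: mean reading level under the distribution P(R|e)."""
    return sum(level * prob for level, prob in p_r.items())


def entropy(p_r):
    """H(R|e) in bits (base-2 log is an assumption); zero-probability levels are skipped."""
    return -sum(prob * math.log2(prob) for prob in p_r.values() if prob > 0)


# Hypothetical reading level profile of one entity (levels lie in 1-12).
p_r = {4: 0.5, 8: 0.25, 12: 0.25}
print(expected_level(p_r))  # 7.0
print(entropy(p_r))         # 1.5
```

A low entropy means the entity concentrates on a narrow band of reading levels; a high entropy means its content or interests are spread across many levels.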

Characterizing and Comparing Profiles. Characterizing a group of entities: build the profile of an entity group by aggregating the distributions of its individual members, taking the weighted centroid of the individual distributions. Example: the reading level profile of a group of users U. Characterizing the group profile can represent the diversity of the group in terms of its members.
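
The weighted centroid mentioned on this slide can be illustrated as follows. This is a sketch under assumptions: the slide does not say how members are weighted, so the weights here (for example, visit counts) are hypothetical, as are the function name `group_profile` and the sample users.

```python
def group_profile(member_profiles, weights=None):
    """Weighted centroid of member distributions P(R|u), giving a group profile P(R|U).

    Assumption: weights are non-negative and are normalized to sum to 1;
    with no weights given, every member counts equally.
    """
    if not member_profiles:
        raise ValueError("need at least one member profile")
    if weights is None:
        weights = [1.0] * len(member_profiles)
    if len(weights) != len(member_profiles):
        raise ValueError("one weight per member is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    centroid = {}
    for member, weight in zip(member_profiles, weights):
        for level, prob in member.items():
            centroid[level] = centroid.get(level, 0.0) + (weight / total) * prob
    return centroid


# Two hypothetical users; the first is weighted three times as heavily.
users = [{4: 1.0}, {8: 0.5, 12: 0.5}]
group = group_profile(users, weights=[3, 1])
print(group)  # {4: 0.75, 8: 0.125, 12: 0.125}
```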

Characterizing and Comparing Profiles. Comparing entities and groups: simplest metric of comparison.

Characterizing and Comparing Profiles. Comparing entities and groups: similarity between the full probability distributions of two entities, measured by Kullback-Leibler (KL) divergence or Jensen-Shannon (JS) divergence.
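
For reference, the two divergences named on this slide follow their standard definitions and can be written compactly. The sketch below is not taken from the paper: the function names and sample profiles are hypothetical, base-2 logarithms are an assumption, and no smoothing is applied, so KL is infinite whenever the second distribution gives zero probability to an outcome the first one allows.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; returns infinity if q is zero where p is positive."""
    total = 0.0
    for key, p_k in p.items():
        if p_k == 0:
            continue
        q_k = q.get(key, 0.0)
        if q_k == 0:
            return math.inf
        total += p_k * math.log2(p_k / q_k)
    return total


def js_divergence(p, q):
    """JS(p, q): symmetric, always finite, bounded by 1 bit with base-2 logs."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


# Hypothetical reading level profiles of a user and a website.
user = {4: 0.5, 8: 0.5}
site = {8: 0.5, 12: 0.5}
print(kl_divergence(user, site))  # inf
print(js_divergence(user, site))  # 0.5
```

The example shows why JS divergence is often the more convenient choice for sparse profiles: the two distributions only partly overlap, so KL is undefined in practice while JS still yields a usable score.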


Data Set. Session log data: contains the anonymized logs of URLs visited by users; web page visits from users who visited at least 25 pages; collected during 10 weeks (2010.8). Web content dataset: reading level and ODP topic predictions; 8 billion web documents from 2011.4.18.

Characterizing web content

Characterizing websites Topic-specific analysis

Characterizing web queries

Characterizing websites Joint analysis of reading level and topic

Characterizing web users: users' deviation from their own profiles; stretch reading; future work.


Application: Compare expert vs. non-expert URLs

Application: Predict expert websites. Result


Conclusion: The work provides novel characterizations of websites, users, and queries by combining distributions of reading level and topic. These characterizations can be used for a variety of search-related tasks, including predicting whether the content of a URL or site is targeted at domain experts or non-experts. Features derived from RLT profiles can be used to predict a user's preference for websites in search results.

Conclusion: The divergence metrics developed in this paper can be evaluated for their effectiveness as features for personalized re-ranking. The techniques developed for expert vs. novice site classification can be applied for both recommendation and ranking purposes.

~Thank you for listening~