Characterizing Web Content, User Interest, and Search Behavior by Reading Level and Topic. Date: 2012/11/15. Authors: Jin Young Kim, Kevyn Collins-Thompson, Paul Bennett, and Susan Dumais. Source: WSDM 2012. Advisor: Jia-ling Koh. Speaker: Chen-Yu Huang.
Outline: Introduction; Reading Level & Topic Profiles; Characterizing the Web; Applications; Conclusion
Introduction: characterizing web search and user interest along two dimensions, topic and reading level.
Introduction: estimate probabilistic profiles over topic and reading level that describe users, queries, and websites, and use these profiles to analyze user behavior.
Reading Level & Topic Profiles. Entities: website (s), user (u), query (q). Variables: reading level (R), topic (T), and joint reading level and topic (RT). Profile: a probability distribution over reading level and topic (RLT profile). Examples: the RLT profile of a user is P(RT | u); the RLT profile of a query is P(RT | q).
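To make the notation concrete, the relations below spell out how the joint RLT profile of an entity e relates to its separate reading level and topic profiles. These are standard probability identities rather than formulas taken from the slides; K, the number of topic categories, is our own notation.

```latex
\sum_{r=1}^{12} \sum_{t=1}^{K} P(R = r, T = t \mid e) = 1, \qquad
P(R = r \mid e) = \sum_{t=1}^{K} P(R = r, T = t \mid e), \qquad
P(T = t \mid e) = \sum_{r=1}^{12} P(R = r, T = t \mid e)
```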
Predicting Reading Level and Topic for a URL. The reading difficulty of a document d is represented as a random variable R_d taking values from 1 to 12. Reading level classifier: based on a language model. Topic classifier: trained using the URLs in each Open Directory Project (ODP) category.
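The slide only says the reading level classifier is "based on a language model". One common way to build such a classifier is to train a smoothed unigram language model per grade and convert document likelihoods into a posterior over grades 1 to 12. The sketch below does that under stated assumptions: Laplace smoothing, a uniform prior over grades, and toy training data. It is a generic illustration, not the authors' classifier, and the function names are hypothetical.

```python
import math
from collections import Counter

GRADES = range(1, 13)  # reading levels 1..12, as on the slide

def train_grade_models(docs_by_grade, alpha=1.0):
    """Build a Laplace-smoothed unigram language model for each grade.

    docs_by_grade: dict mapping a grade in 1..12 to a list of token lists
    (hypothetical training data). Returns a function log_prob(word, grade)."""
    counts = {g: Counter() for g in GRADES}
    for g, docs in docs_by_grade.items():
        for tokens in docs:
            counts[g].update(tokens)
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    totals = {g: sum(c.values()) for g, c in counts.items()}

    def log_prob(word, g):
        return math.log((counts[g][word] + alpha) / (totals[g] + alpha * vocab_size))

    return log_prob

def reading_level_distribution(tokens, log_prob):
    """Posterior P(R_d = r | d) over grades, assuming a uniform prior."""
    scores = {g: sum(log_prob(w, g) for w in tokens) for g in GRADES}
    top = max(scores.values())
    unnorm = {g: math.exp(s - top) for g, s in scores.items()}  # shift for numerical stability
    z = sum(unnorm.values())
    return {g: u / z for g, u in unnorm.items()}

# Usage with toy data (assumed, for illustration only).
training = {1: [["the", "cat", "sat"]], 12: [["epistemological", "framework", "analysis"]]}
lp = train_grade_models(training)
dist = reading_level_distribution(["cat", "sat"], lp)
```

The output is a full distribution rather than a single grade, which is what the profile-building steps that follow aggregate.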
Building Reading Level and Topic Profiles: profiles based on the entity itself. Given the set of URLs associated with each entity, the joint distribution of reading level and topic is built by aggregating the distributions of the individual URLs, as computed by the URL-level classifiers. To prevent bias, 25 URLs are chosen to estimate each site-level or user-level profile, and the top URLs are used to build the profile for a query.
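A minimal sketch of the aggregation step, assuming three things the slide does not state: the URL-level joint distribution is the product of the separate reading level and topic predictions (independence given the URL), the 25 URLs are drawn uniformly at random, and the chosen URLs are averaged with equal weight. For a query, passing URLs in rank order with `sample=False` keeps the top-ranked ones. All names are hypothetical.

```python
import random
from collections import defaultdict

def url_joint(p_r, p_t):
    """Joint P(R, T | d) for one URL, ASSUMING reading level and topic are
    independent given the URL. p_r: level -> prob, p_t: topic -> prob."""
    return {(r, t): pr * pt for r, pr in p_r.items() for t, pt in p_t.items()}

def entity_profile(url_joints, max_urls=25, sample=True, seed=0):
    """Average per-URL joint distributions into an entity's RLT profile.

    url_joints: URL -> {(r, t): prob}, in insertion order (rank order for a query).
    Uses at most max_urls URLs (25, following the slide): a seeded random
    sample when sample=True, otherwise the first max_urls (e.g. top results)."""
    urls = list(url_joints)
    if len(urls) > max_urls:
        if sample:
            urls = random.Random(seed).sample(urls, max_urls)
        else:
            urls = urls[:max_urls]
    if not urls:
        return {}
    profile = defaultdict(float)
    for u in urls:
        for key, p in url_joints[u].items():
            profile[key] += p / len(urls)  # uniform weight per URL
    return dict(profile)

# Usage with two toy URLs.
j1 = url_joint({3: 1.0}, {"Science": 1.0})
j2 = url_joint({5: 0.5, 6: 0.5}, {"Arts": 1.0})
prof = entity_profile({"a": j1, "b": j2})
```

Capping the number of URLs keeps every entity's profile estimated from a comparable amount of evidence; the seed makes the random sample reproducible.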
Building Reading Level and Topic Profiles: profiles based on entity relationships. Circular dependency is avoided by using profiles based only on the entity itself. Relationships: a query surfaces websites; a user visits websites.
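One way to read this slide is that a query's profile can be derived from the websites it surfaces, and a user's profile from the websites they visit, where the website profiles are the self-based ones so no profile is defined in terms of itself. The sketch below forms a weight-normalized mixture of related site profiles. The weighting (for example, surface or visit counts) is a hypothetical choice, not necessarily the paper's, and the domains are placeholders.

```python
from collections import defaultdict

def profile_from_relations(edge_weights, site_profiles):
    """Derive an RLT profile for a query or user from related websites.

    edge_weights: site -> nonnegative weight (hypothetical, e.g. how often a
    query surfaced the site or a user visited it).
    site_profiles: site -> self-based profile {(r, t): prob}.
    Returns the weight-normalized mixture of the related sites' profiles."""
    total = sum(w for s, w in edge_weights.items() if s in site_profiles)
    if total == 0:
        return {}
    mixed = defaultdict(float)
    for site, w in edge_weights.items():
        if site not in site_profiles:
            continue  # sites without a self-based profile are skipped
        for key, p in site_profiles[site].items():
            mixed[key] += (w / total) * p
    return dict(mixed)

# Usage: a query that surfaced a children's site three times and a journal once.
sites = {"kids.example": {(3, "Kids"): 1.0}, "journal.example": {(12, "Science"): 1.0}}
q_profile = profile_from_relations(
    {"kids.example": 3, "journal.example": 1, "unknown.example": 5}, sites)
```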
Characterizing and Comparing Profiles: characterizing an individual entity. E[R|e]: expected reading level of a given entity e. H(R|e): reading level entropy of entity e.
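Both statistics follow directly from the reading level distribution P(R | e), which can be obtained from a joint RLT profile by summing over topics. A minimal sketch; the choice of log base 2 (entropy in bits) is ours, since the slide does not specify a base.

```python
import math

def reading_level_marginal(rlt_profile):
    """Marginalize a joint profile {(r, t): prob} to {r: prob}."""
    out = {}
    for (r, _t), p in rlt_profile.items():
        out[r] = out.get(r, 0.0) + p
    return out

def expected_reading_level(p_r):
    """E[R | e] = sum_r r * P(R = r | e)."""
    return sum(r * p for r, p in p_r.items())

def reading_level_entropy(p_r):
    """H(R | e) = -sum_r P(R = r | e) * log2 P(R = r | e), in bits.
    Levels with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in p_r.values() if p > 0)

# An entity split evenly between grade 4 and grade 8 content.
p = {4: 0.5, 8: 0.5}
```

For this example E[R|e] is 6 and H(R|e) is 1 bit: a mid-level average that hides two distinct audiences, which is why the entropy is reported alongside the expectation.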
Characterizing and Comparing Profiles: characterizing a group of entities. The profile of an entity group is built by aggregating the distributions of its members as a weighted centroid of the individual distributions (e.g., the reading level profile of a set of users U). The group profile can then be characterized in a way that represents the diversity of its members.
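A minimal sketch of the weighted centroid, with uniform weights by default. To illustrate "diversity in terms of its members", it uses one possible measure that the slide does not name: the generalized Jensen-Shannon divergence, the entropy of the centroid minus the weighted mean entropy of the members. It is zero when all members share one distribution and grows as they differ. Function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a distribution {key: prob}."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def group_profile(member_profiles, weights=None):
    """Weighted centroid of the members' distributions (uniform by default)."""
    if weights is None:
        weights = [1.0] * len(member_profiles)
    total = sum(weights)
    centroid = {}
    for w, prof in zip(weights, member_profiles):
        for k, p in prof.items():
            centroid[k] = centroid.get(k, 0.0) + (w / total) * p
    return centroid

def group_diversity(member_profiles, weights=None):
    """Generalized Jensen-Shannon divergence: H(centroid) - sum_i w_i H(p_i)."""
    if weights is None:
        weights = [1.0] * len(member_profiles)
    total = sum(weights)
    centroid = group_profile(member_profiles, weights)
    mean_member_entropy = sum((w / total) * entropy(p)
                              for w, p in zip(weights, member_profiles))
    return entropy(centroid) - mean_member_entropy

# Two users, one reading only grade 3 content and one only grade 9.
a, b = {3: 1.0}, {9: 1.0}
```

The centroid of `a` and `b` alone looks like one broad reader; the diversity term separates that case (1 bit here) from a group of identical members (0).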
Characterizing and Comparing Profiles: comparing entities and groups. The simplest metric of comparison.
Characterizing and Comparing Profiles: comparing entities and groups. The full probability distributions of two entities are compared using the Kullback-Leibler (KL) divergence and the Jensen-Shannon (JS) divergence.
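A minimal sketch of both divergences over profiles stored as dictionaries, using log base 2 (our choice). KL divergence is asymmetric and becomes infinite when one profile puts mass where the other has none. JS divergence averages the KL divergence of each profile to their midpoint, so it is symmetric, always finite, and bounded by 1 bit.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; infinite if p has mass where q has none."""
    total = 0.0
    for k, pk in p.items():
        if pk == 0:
            continue
        qk = q.get(k, 0.0)
        if qk == 0:
            return math.inf
        total += pk * math.log2(pk / qk)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to m = (p + q) / 2."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Two profiles with disjoint support, e.g. grade 1 only vs grade 2 only.
p, q = {1: 1.0}, {2: 1.0}
```

Because RLT profiles often assign zero probability to some (level, topic) cells, JS divergence is usually the safer default for comparing entities; KL remains useful when one side is a smoothed reference distribution.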
Data Set. Session log data: anonymized logs of the URLs visited by users, restricted to page visits from users who visited at least 25 pages, collected over 10 weeks (2010.8). Web content dataset: reading level and ODP topic predictions for 8 billion web documents, from 2011.4.18.
Characterizing web content
Characterizing websites: topic-specific analysis
Characterizing web queries
Characterizing websites: joint analysis of reading level and topic
Characterizing web users: users’ deviation from their own profiles. Stretch reading is left for future work.
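One hypothetical way to quantify how far a user strays from their own profile: compare the expected reading level of each visited page with the mean of the user's reading level profile, both as a raw grade difference and as a z-score using the profile's standard deviation. The input format and the z-score normalization are our assumptions, not the paper's definition.

```python
import math

def level_stats(p_r):
    """Mean and standard deviation of a reading level distribution {r: prob}."""
    mean = sum(r * p for r, p in p_r.items())
    var = sum(p * (r - mean) ** 2 for r, p in p_r.items())
    return mean, math.sqrt(var)

def visit_deviations(user_p_r, visit_levels):
    """For each visited page's (hypothetical) expected reading level, return
    (grade difference from the user's mean, z-score or None if the user's
    profile has zero spread). Positive values mean harder than usual."""
    mean, sd = level_stats(user_p_r)
    return [(lvl - mean, (lvl - mean) / sd if sd > 0 else None)
            for lvl in visit_levels]

# A user split evenly between grades 4 and 6 (mean 5, standard deviation 1).
user = {4: 0.5, 6: 0.5}
devs = visit_deviations(user, [5, 7, 3])
```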
Application: comparing expert vs. non-expert URLs
Application: predicting expert websites. Results
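The slide reports results without describing the model, so the sketch below only illustrates the general idea of predicting expert sites from RLT-profile features. Both the feature set (expected reading level, reading level entropy, topic entropy) and the classifier (plain logistic regression trained by batch gradient descent) are our assumptions, chosen because they fit in a few self-contained lines; the toy training sites are invented.

```python
import math

def rlt_features(rlt_profile):
    """Hypothetical features from a site's {(r, t): prob} profile:
    [E[R|s], H(R|s), H(T|s)], entropies in bits."""
    p_r, p_t = {}, {}
    for (r, t), p in rlt_profile.items():
        p_r[r] = p_r.get(r, 0.0) + p
        p_t[t] = p_t.get(t, 0.0) + p

    def ent(d):
        return -sum(x * math.log2(x) for x in d.values() if x > 0)

    return [sum(r * p for r, p in p_r.items()), ent(p_r), ent(p_t)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.01, epochs=2000):
    """Batch gradient descent on mean log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict_expert(w, b, x):
    """Estimated probability that a site with features x targets experts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy labeled sites (invented): 1 = expert-oriented, 0 = non-expert.
experts = [{(11, "Science"): 1.0}, {(12, "Health"): 0.5, (11, "Health"): 0.5}]
novices = [{(4, "Health"): 1.0}, {(5, "Science"): 0.5, (3, "Science"): 0.5}]
X = [rlt_features(p) for p in experts + novices]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

In practice the features would be standardized and the model evaluated on held-out sites; the small learning rate here only keeps the unscaled toy example stable.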
Conclusion. The paper provides novel characterizations of websites, users, and queries by combining distributions of reading level and topic. These characterizations can be used for a variety of search-related tasks and for predicting whether the content of a URL or site is targeted at domain experts or non-experts. Features derived from RLT profiles are used to predict a user’s preference for websites in search results.
Conclusion. The divergence metrics developed in this paper can be evaluated for their effectiveness as features for personalized re-ranking. The techniques developed for expert vs. novice site classification can be applied to both recommendation and ranking.
Thank you for listening!