Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,

Characterizing Web Content, User Interest, and Search Behavior by Reading Level and Topic
Date: 2012/11/15
Authors: Jin Young Kim, Kevyn Collins-Thompson, Paul Bennett, and Susan Dumais
Source: WSDM 2012
Advisor: Jia-ling Koh
Speaker: Chen-Yu Huang

Outline
Introduction
Reading Level & Topic Profiles
Characterizing the Web
Applications
Conclusion

Introduction: web search and user interest, viewed by topic and reading level

Introduction: Estimate probabilistic profiles over topic and reading level to describe users, queries, or websites, and use them to analyze user behavior.

Reading Level & Topic Profiles
Entity: website (s), user (u), query (q)
Variables: reading level (R), topic (T), reading level and topic (RT)
Profile: a probability distribution over reading level and topic (RLT profile)
Examples: the reading level and topic profile of a user is P(RT | u); the reading level and topic profile of a query is P(RT | q)

Predicting Reading Level and Topic for URL
Represent the reading difficulty of a document as a random variable R_d taking values in a fixed range of levels.
Reading level classifier: based on a language model.
Topic classifier: trained using URLs in each Open Directory Project (ODP) category.

Building Reading Level and Topic Profiles
Profiles based on the entity itself
Given a set of URLs associated with each entity, the joint distribution of reading level and topic is built by aggregating the distributions of the individual URLs, as computed by the URL-level classifiers.
To prevent bias, choose 25 URLs to estimate the site-level or user-level profiles.
Use the top URLs as the basis of the profile for the query.
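As a rough illustration of the aggregation step above, the following Python sketch averages per-URL joint distributions over (reading level, topic) into one entity profile. The function name `build_profile`, the uniform random sampling, the equal weighting of URLs, and the toy level and topic labels are all assumptions made for illustration; the slides do not say how the URL-level distributions are combined.

```python
import random
from collections import defaultdict

def build_profile(url_dists, sample_size=25, seed=0):
    """Average per-URL P(R, T | url) into an entity-level P(R, T | e).

    url_dists: list of dicts mapping (reading_level, topic) -> probability.
    sample_size: cap on URLs per entity (25 in the slides); the uniform
    random sampling used to enforce the cap is an assumption.
    """
    if not url_dists:
        raise ValueError("need at least one URL distribution")
    rng = random.Random(seed)
    if len(url_dists) > sample_size:
        url_dists = rng.sample(url_dists, sample_size)
    profile = defaultdict(float)
    for dist in url_dists:
        for key, p in dist.items():
            # Equal weight per URL (assumed, not taken from the slides).
            profile[key] += p / len(url_dists)
    return dict(profile)

# Toy example: two URLs visited by one user (labels are made up).
urls = [
    {(4, "Sports"): 0.8, (8, "Science"): 0.2},
    {(4, "Sports"): 0.4, (8, "Science"): 0.6},
]
user_profile = build_profile(urls)
```

Because each URL-level distribution sums to one and the weights are equal, the resulting profile also sums to one.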

Building Reading Level and Topic Profiles
Profiles based on the entity relationships
Circular dependency: use profiles based only on the entity itself.
Diagram labels: entities Query, Website, User; relations Surface, Issue, Visit.

Characterizing and Comparing profiles
Characterizing an Individual Entity
E[R|e]: expectation of reading level for a given entity e
H(R|e): reading level entropy of the entity e
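The two summary statistics above can be sketched in Python as follows. This is a minimal sketch that assumes the reading level profile is a dict from numeric level to probability and that entropy is measured in bits; the function names and the toy profile are hypothetical.

```python
import math

def expected_level(level_dist):
    """E[R | e] for a dict mapping reading level -> P(R = level | e)."""
    return sum(level * p for level, p in level_dist.items())

def level_entropy(level_dist):
    """H(R | e) in bits; zero-probability levels contribute nothing."""
    return -sum(p * math.log2(p) for p in level_dist.values() if p > 0)

# Toy profile: half the mass at level 2, half at level 6.
profile = {2: 0.5, 6: 0.5}
e = expected_level(profile)
h = level_entropy(profile)
```

A low entropy indicates an entity concentrated on one reading level; a high entropy indicates content or interest spread across many levels.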

Characterizing a Group of Entities
Build the profile of an entity group by aggregating the distributions of its members, as the weighted centroid of the individual distributions.
Example: reading level profile of a group U.
The group profile can also characterize the diversity of the group's members.
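A weighted centroid of member distributions might look like the sketch below. The weighting scheme is not given in the slides, so the `weights` argument (defaulting to equal weights) and the function name `group_profile` are assumptions for illustration only.

```python
def group_profile(member_dists, weights=None):
    """Weighted centroid of member distributions.

    member_dists: list of dicts mapping outcome -> probability.
    weights: one non-negative weight per member; equal weights if omitted.
    """
    if not member_dists:
        raise ValueError("need at least one member distribution")
    if weights is None:
        weights = [1.0] * len(member_dists)
    if len(weights) != len(member_dists):
        raise ValueError("one weight per member is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    centroid = {}
    for dist, w in zip(member_dists, weights):
        for key, p in dist.items():
            centroid[key] = centroid.get(key, 0.0) + (w / total) * p
    return centroid

# Toy group: the first user counts three times as much as the second.
users = [{2: 1.0}, {6: 0.5, 10: 0.5}]
group = group_profile(users, weights=[3, 1])
```

Normalizing by the total weight keeps the centroid a valid probability distribution regardless of the scale of the weights.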

Comparing Entities and Groups Simplest metric of comparison

Comparing Entities and Groups
Similarity between the full probability distributions of two entities:
Kullback-Leibler (KL) divergence
Jensen-Shannon (JS) divergence
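For reference, the two divergences named above can be computed as in this sketch, which uses the standard textbook definitions with base-2 logarithms. The dict-based representation and the toy topic labels are assumptions; the slides do not show the exact formulation used.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 bit."""
    keys = set(p) | set(q)
    # m is the midpoint distribution, positive wherever p or q is positive.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy profiles with no overlap (labels are made up).
a = {"Sports": 1.0}
b = {"Science": 1.0}
d_ab = js_divergence(a, b)
d_aa = js_divergence(a, a)
```

KL divergence is asymmetric and undefined when the second distribution assigns zero probability to an outcome the first one supports, which is one reason the symmetric JS divergence is often easier to use for comparing profiles.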

Data Set
Session log data: contains the anonymized logs of URLs visited by users; web page visits from users who visited at least 25 pages, collected during 10 weeks (2010.8).
Web content dataset: reading level and ODP topic predictions; 8 billion web documents from

Characterizing web content

Characterizing websites
Topic-specific analysis

Characterizing web queries

Characterizing websites
Joint analysis of reading level and topic

Characterizing web users
Users' deviation from their own profiles: stretch reading. Future work.

Application: Compare expert vs. non-expert URLs

Application: Predict expert websites. Result

Conclusion
Provide novel characterizations of websites, users, and queries by combining distributions of reading level and topic.
These can be used for a variety of search-related tasks, such as predicting whether the content of a URL or site is targeted at domain experts or non-experts.
Use features derived from RLT profiles to predict a user's preference for websites in search results.

Conclusion
The divergence metrics developed in this paper can be evaluated for their effectiveness as features for personalized re-ranking.
The techniques developed for expert vs. novice site classification can be applied for both recommendation and ranking purposes.

~Thank you for listening~
