Published by Jarkko Saaristo. Modified over 5 years ago.
1
Characterizing Web Content, User Interest, and Search Behavior by Reading Level and Topic
Date: 2012/11/15
Authors: Jin Young Kim, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais
Source: WSDM 2012
Advisor: Jia-ling Koh
Speaker: Chen-Yu Huang
2
Outline
Introduction
Reading Level & Topic Profiles
Characterizing the web
Applications
Conclusion
3
Introduction
Web search and user interest, described by topic and reading level
4
Introduction
Estimate probabilistic profiles that describe users, queries, or websites, and use them to analyze user behavior
Profile dimensions: topic, reading level
6
Reading Level & Topic Profiles
Entities: website (s), user (u), query (q)
Dimensions: reading level (R), topic (T), reading level and topic (RT)
Profile: a probability distribution over reading level and topic (RLT profile)
Examples:
  reading level and topic profile of a user: P(RT | u)
  reading level and topic profile of a query: P(RT | q)
7
Predicting Reading Level and Topic for URL
Represent the reading difficulty of a document d as a random variable R_d taking values in a range of reading levels
Reading level classifier: based on a language model
Topic classifier: trained on the URLs in each Open Directory Project (ODP) category
8
Building Reading Level and Topic Profiles
Profiles based on the entity itself
Given a set of URLs associated with each entity, the joint distribution of reading level and topic is built by aggregating the distributions of the individual URLs, as computed by the URL-level classifiers
To limit bias:
  choose 25 URLs to estimate each site-level or user-level profile
  use the top URLs as the basis of the profile for a query
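The aggregation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each URL-level prediction is a dictionary mapping (reading level, topic) pairs to probabilities, and it assumes that "aggregating" means a uniform average. The function name `build_profile`, the random sampling, and the toy data are all hypothetical.

```python
import random

def build_profile(url_dists, sample_size=25, seed=0):
    """Average URL-level joint distributions P(R, T | url) into one entity profile.

    Assumption: uniform averaging; the slides only say "aggregating".
    """
    rng = random.Random(seed)
    # Cap the number of URLs per entity (the slides mention 25 URLs).
    if len(url_dists) > sample_size:
        url_dists = rng.sample(url_dists, sample_size)
    profile = {}
    for dist in url_dists:
        for key, p in dist.items():
            profile[key] = profile.get(key, 0.0) + p / len(url_dists)
    return profile

# Toy URL-level predictions keyed by (reading level, topic).
url_dists = [
    {(4, "Sports"): 0.7, (8, "Sports"): 0.3},
    {(8, "Sports"): 0.5, (8, "Science"): 0.5},
]
profile = build_profile(url_dists)
```

Because each input sums to 1 and the weights are uniform, the resulting profile is again a probability distribution.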
9
Building Reading Level and Topic Profiles
Profiles based on entity relationships
Relationships: a query surfaces a website; a user issues a query; a user visits a website
To avoid circular dependency, relationship-based profiles use profiles based only on the entity itself
10
Characterizing and Comparing profiles
Characterizing an individual entity
E[R|e]: expectation of reading level for a given entity e
H(R|e): entropy of the reading level distribution of entity e
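Both summary statistics follow directly from a reading level distribution. The sketch below assumes the profile is a dictionary from numeric reading level to probability and that entropy is measured in bits (log base 2); the slides do not state the base, and the function names are illustrative.

```python
import math

def expected_level(dist):
    """E[R | e]: probability-weighted mean reading level."""
    return sum(level * p for level, p in dist.items())

def entropy(dist):
    """H(R | e) in bits; zero-probability levels contribute nothing."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Toy entity whose content is split evenly between levels 4 and 8.
reading_profile = {4: 0.5, 8: 0.5}
e = expected_level(reading_profile)  # 6.0
h = entropy(reading_profile)         # 1.0 bit
```

A low entropy indicates an entity focused on one reading level; a high entropy indicates content or interest spread across many levels.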
11
Characterizing and Comparing profiles
Characterizing a group of entities
Build the profile of an entity group by aggregating the distributions of its individual members, i.e. as the weighted centroid of the individual distributions
Example: the reading level profile of a user group U
Characterizing the group profile can represent the diversity of the group in terms of its members
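A weighted centroid of member distributions can be sketched as below. The weighting scheme is an assumption: the slides do not say what the weights are, so the example uses hypothetical per-user weights (for instance, activity counts).

```python
def group_profile(member_dists, weights):
    """Weighted centroid of member distributions; weights are normalized to sum to 1."""
    total = sum(weights)
    centroid = {}
    for dist, w in zip(member_dists, weights):
        for key, p in dist.items():
            centroid[key] = centroid.get(key, 0.0) + (w / total) * p
    return centroid

# Two toy users; the first is given three times the weight of the second.
users = [{4: 1.0}, {8: 0.5, 12: 0.5}]
group = group_profile(users, weights=[3, 1])
```

Here the group profile puts 0.75 of its mass on level 4 and 0.125 each on levels 8 and 12.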
12
Characterizing and Comparing profiles
Comparing entities and groups
Simplest metric of comparison
13
Characterizing and Comparing profiles
Comparing entities and groups
Similarity between the full probability distributions of two entities:
  Kullback-Leibler (KL) divergence
  Jensen-Shannon (JS) divergence
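The two divergences can be computed with their standard textbook definitions. The sketch assumes profiles are dictionaries from category to probability and uses log base 2; any smoothing the paper may apply is not reproduced here, and the example profiles are made up.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits. Assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

def js_divergence(p, q):
    """JS divergence: average KL of p and q to their midpoint; symmetric and bounded by 1 bit."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy topic profiles for a user and a site.
user = {"Sports": 0.5, "Science": 0.5}
site = {"Sports": 1.0}
js = js_divergence(user, site)
```

KL divergence is asymmetric and undefined when the second distribution assigns zero probability to a category the first one uses, which is why `kl_divergence(user, site)` is not called above. JS divergence avoids both problems by comparing each distribution to their midpoint.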
15
Data Set
Session log data
  Contains the anonymized logs of URLs visited by users
  Web page visits from users who visited at least 25 pages
  During 10 weeks (2010.8)
Web content dataset
  Reading level and ODP topic predictions
  8 billion web documents from
16
Characterizing web content
17
Characterizing websites
Topic-specific analysis
18
Characterizing web queries
19
Characterizing websites
Joint analysis of reading level and topic
20
Characterizing web users
Users’ deviation from their own profiles
Stretch reading
Future work
22
Application
Compare expert vs. non-expert URLs
23
Application
Predict expert websites
Result
25
Conclusion
Provides novel characterizations of websites, users, and queries by combining distributions of reading level and topic.
These profiles can be used for a variety of search-related tasks, including predicting whether the content of a URL or site is targeted at domain experts or non-experts.
Features derived from RLT profiles can be used to predict a user’s preference for websites in search results.
26
Conclusion
The divergence metrics developed in this paper can be evaluated for their effectiveness as features for personalized re-ranking.
The techniques developed for expert vs. novice site classification can be applied for both recommendation and ranking purposes.
27
~Thank you for listening~