Published by Lisa Todd. Modified over 9 years ago.
1
Topics and Transitions: Investigation of User Search Behavior Xuehua Shen, Susan Dumais, Eric Horvitz
2
What’s next for the user?
3
Outline
– Problem
– Automatic Topic Tagging
– Predictive models
– Evaluation
– Experiments and analysis
– Conclusion and future directions
4
Problem
– Opportunity: Personalizing search
– Focus: What topics do users explore? How similar are users to each other, to special groups, and to the population at large?
– Data, data, data…
  – MSN search engine log
  – Query & clickthrough
  – 87,449,277 rows, 36,895,634 URLs
  – 5% sample from MSN logs, 05/29-06/29
– Create predictive models of the topics of queries and URLs visited
5
Automatic Topic Tagging
– ODP (Open Directory Project) manually categorizes URLs
– MSN extended the methods with heuristics to cover more URLs
– We developed a tool to automatically tag every URL in the log
– 15 top-level categories: Arts, Business, Computers, Games, Health, Home, Kids_and_Teens, News, Recreation, Reference, Science, Shopping, Society, Sports, Adult
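The slides do not describe how the tagging tool works internally. Purely as an illustration of directory-based tagging, the sketch below looks a URL up by its longest matching host/path prefix and falls back to "Undefined". The `DIRECTORY` table, the prefix-fallback rule, and the `tag_url` name are assumptions made for this example, not the ODP or MSN method; the sample categories are taken from the log snippet in this deck.

```python
from urllib.parse import urlsplit

# Hypothetical directory: host (or host + path prefix) -> top-level categories.
# A URL may carry more than one tag.
DIRECTORY = {
    "www.coolrunning.com": ["Regional", "Sports"],
    "www.framesdirect.com": ["Shopping"],
}

def tag_url(url, directory=DIRECTORY):
    """Return the categories of the longest directory prefix matching url."""
    parts = urlsplit(url)
    candidate = parts.netloc.lower() + parts.path.rstrip("/")
    while True:
        if candidate in directory:
            return directory[candidate]
        if "/" not in candidate:
            break  # only the host was left and it did not match
        # Drop the last path segment and try the shorter prefix.
        candidate = candidate.rsplit("/", 1)[0]
    return ["Undefined"]

tags = tag_url("http://www.coolrunning.com/engine/2/2_5/193.shtml")
```

A real tagger would need far broader coverage than an exact prefix table, which is presumably what the heuristics mentioned on the slide address.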
6
A Snippet
– Multiple tagging
– Avg: 1.38 tags per URL

ActionID  ClientID  ElapsedTime  Action  Value  TopCat
2         000005b8  210455       C       http://www.wwltv.com  Arts
3         000005b8  320149       C       http://www.interactclaims.com/shell  Undefined
2         000005b8  549148       Q       Birth certificate  NULL
8         00000857  2240996      Q       yaho  NULL
5         00000857  2240843      C       http://www.nextag.com  Home
4         000013d1  910382       C       http://tv.yahoo.com/news/ap/20040530/108595548000.html  Arts
5         000013d1  910392       C       french translator  NULL
2         000013d1  910351       C       http://tv.zap2it.com/tveditorial/tve_main/1,1002,271|88515|1|,00.htm  Arts
2         000013d1  910351       C       http://tv.zap2it.com/tveditorial/tve_main/1,1002,271|88515|1|,00.htm  Arts
6         000013d1  541972       C       http://www.nationalenquirer.com/stories/news.cfm?instanceid=6180  Society
12        000018de  2569530      C       http://www.framesdirect.com  Shopping
7         000018de  2568961      C       http://www.macrocap.com/Lower-Back-Pain  Undefined
10        000018de  2569174      C       http://www.coolrunning.com/engine/2/2_5/193.shtml  Regional
10        000018de  2569174      C       http://www.coolrunning.com/engine/2/2_5/193.shtml  Sports
7
Predictive Model: User Perspective
– Individual model: Use only an individual’s clickthrough to build a model for each user’s predictions
– Group model: Group similar users to build a model for each group’s predictions (e.g., group users with the same ‘max topic’ clickthrough)
– Population model: Use clickthrough data from all users to build one model for all users’ predictions
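The three levels can be sketched as topic distributions computed over different slices of the same click data. This is a minimal sketch, assuming clicks arrive as hypothetical `(user_id, topic)` pairs and that a user's group is simply their most-clicked topic (the ‘max topic’ example from the slide); the function names are illustrative.

```python
from collections import Counter, defaultdict

def normalize(counts):
    """Turn raw topic counts into a probability distribution."""
    total = sum(counts.values())
    return {topic: c / total for topic, c in counts.items()}

def build_models(clicks):
    """clicks: iterable of (user_id, topic). Returns the three model levels."""
    per_user = defaultdict(Counter)
    for user, topic in clicks:
        per_user[user][topic] += 1

    population = Counter()
    per_group = defaultdict(Counter)
    group_of = {}
    for user, counts in per_user.items():
        population.update(counts)
        max_topic = counts.most_common(1)[0][0]  # the user's 'max topic'
        group_of[user] = max_topic
        per_group[max_topic].update(counts)

    individual = {u: normalize(c) for u, c in per_user.items()}
    group = {g: normalize(c) for g, c in per_group.items()}
    return individual, group, normalize(population), group_of

clicks = [("a", "Arts"), ("a", "Arts"), ("a", "Sports"),
          ("b", "Arts"), ("b", "News"), ("b", "Arts"),
          ("c", "Sports")]
individual, group, population, group_of = build_models(clicks)
```

Here users `a` and `b` fall into the same "Arts" group, so the group model pools their clicks while the population model pools everyone's.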
8
Predictive Model: Considering Time Dependence
– Marginal model
  – Base probability for topics
– Markov model
  – Probability of moving from one topic to another
– Time-interval-specific Markov model
  – User search behavior has two different patterns
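To make the difference between the first two models concrete, here is a small sketch that estimates both from topic sequences by simple counting (maximum-likelihood estimates). The input format, one list of topics per user session, and the function names are assumptions for illustration.

```python
from collections import Counter, defaultdict

def marginal_model(sequences):
    """Base probability of each topic, ignoring order."""
    counts = Counter(topic for seq in sequences for topic in seq)
    total = sum(counts.values())
    return {topic: c / total for topic, c in counts.items()}

def markov_model(sequences):
    """First-order transition probabilities P(next topic | previous topic)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }

sequences = [["Arts", "Arts", "Sports"], ["Arts", "News"]]
marginal = marginal_model(sequences)
markov = markov_model(sequences)
```

The marginal model predicts the same distribution regardless of history, while the Markov model conditions on the most recent topic.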
9
Evaluation Metrics
– KL (Kullback-Leibler) divergence
– Likelihood
– Top K: Match the real top K topics and the predicted top K’ topics
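The slide names the metrics without giving formulas. The sketch below uses the standard definition of KL divergence and of log-likelihood; the `eps` floor for unseen topics and the exact scoring rule for the top-K match (fraction of the real top K found in the predicted top K’) are assumptions, since the deck does not specify them.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) = sum_t p(t) * log(p(t) / q(t)); q is floored at eps."""
    return sum(
        pv * math.log(pv / max(q.get(t, 0.0), eps))
        for t, pv in p.items() if pv > 0
    )

def log_likelihood(observed_topics, model, eps=1e-9):
    """Sum of log-probabilities the model assigns to the observed topics."""
    return sum(math.log(max(model.get(t, 0.0), eps)) for t in observed_topics)

def top_k_match(actual, predicted, k, k_prime):
    """Fraction of the real top-k topics that appear in the predicted top-k'."""
    def top(dist, n):
        ranked = sorted(dist.items(), key=lambda kv: -kv[1])
        return {t for t, _ in ranked[:n]}
    return len(top(actual, k) & top(predicted, k_prime)) / k

actual = {"Arts": 0.6, "Sports": 0.3, "News": 0.1}
predicted = {"Arts": 0.5, "News": 0.3, "Sports": 0.2}
divergence = kl_divergence(actual, predicted)
match = top_k_match(actual, predicted, k=1, k_prime=2)
```

Lower KL divergence and higher likelihood both indicate a better fit; the top-K match trades exactness for a more forgiving hit/miss view.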
10
Experiment
– 5 weeks of data (05/22-06/29)
– Build models based on different subsets of the total data
– Predict for a “holdout set”: the other weeks’ data
11
Results from Basic Experiment
– Marginal model: Individual model has the best performance
– Markov model: Consistently better than the corresponding marginal model
– Markov model: Individual model does not have the best performance. Why?
12
Results: Training Data Size
– With greater amounts of training data, Markov models improve (the same holds for marginal models)
– But: the individual Markov model still can’t beat the population Markov model
13
Results: Smoothing
– Smoothing with the population Markov model helps the individual Markov model
– But: the smoothed individual Markov model still can’t outperform the population model
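The slide does not say which smoothing scheme was used. One common choice, shown here only as an assumed example, is linear interpolation: each row of the individual transition table is mixed with the matching population row using a weight `lam`. The weight value and function names are hypothetical.

```python
def smooth_row(individual_row, population_row, lam):
    """Mix two next-topic distributions: lam * individual + (1 - lam) * population."""
    topics = set(individual_row) | set(population_row)
    return {
        t: lam * individual_row.get(t, 0.0) + (1 - lam) * population_row.get(t, 0.0)
        for t in topics
    }

def smooth_markov(individual, population, lam=0.5):
    """Smooth an individual's transition table with the population's.

    Assumes the population table covers every previous-topic state, since it
    is built from all users. States the individual never visited fall back
    to the population row unchanged.
    """
    smoothed = {}
    for prev, pop_row in population.items():
        ind_row = individual.get(prev)
        if ind_row is None:
            smoothed[prev] = dict(pop_row)
        else:
            smoothed[prev] = smooth_row(ind_row, pop_row, lam)
    return smoothed

individual = {"Arts": {"Arts": 1.0}}
population = {"Arts": {"Arts": 0.5, "Sports": 0.5}, "News": {"News": 1.0}}
smoothed = smooth_markov(individual, population, lam=0.5)
```

Because both inputs are proper distributions, every smoothed row still sums to 1, and transitions the individual never made no longer get zero probability.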
14
Results: Time Decay Effect
– As the training data gets older relative to the prediction period, prediction accuracy decreases
15
Results: Time-Interval-Specific Markov Model
– Markov models capture short-term access patterns better
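One way to read "time-interval-specific" is to keep separate transition tables depending on how much time passed between two consecutive actions. The sketch below assumes two buckets split by a single `threshold`; the threshold value, the time units, and the `(elapsed_time, topic)` event format are all hypothetical, as the slides give no such details.

```python
from collections import Counter, defaultdict

def interval_markov(events, threshold):
    """events: one user's (elapsed_time, topic) pairs, sorted by time.

    Returns {"short": table, "long": table}, where each table maps a
    previous topic to a distribution over next topics.
    """
    counts = {"short": defaultdict(Counter), "long": defaultdict(Counter)}
    for (t0, prev), (t1, nxt) in zip(events, events[1:]):
        bucket = "short" if t1 - t0 <= threshold else "long"
        counts[bucket][prev][nxt] += 1
    return {
        bucket: {
            prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for prev, row in table.items()
        }
        for bucket, table in counts.items()
    }

events = [(0, "Arts"), (10, "Arts"), (1000, "Sports"), (1005, "Sports")]
model = interval_markov(events, threshold=60)
```

In this toy data the short-gap transitions stay on the same topic while the long-gap transition switches topic, which is the kind of contrast such a model is meant to capture.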
16
Conclusion
– Use ODP categorization to tag URLs visited by users
– Construct marginal and Markov models using tagged URLs
– Explore performance of marginal and Markov models to predict transitions among topics
– Set of results relating topic transition behaviors of population, groups, and specific users
17
Directions
– Study of reliability and failure modes of the automated tagging process (use of expert human taggers)
– Combination of query and clickthrough topics
– Formulating and studying different groups of people
– Topic-centric evaluation
– Application of results in personalization of the search experience
  – Interpretation of topics associated with queries
  – Ranking of results
  – Designs for client UI
18
Acknowledgement Susan and Eric for great mentoring and discussion Johnson and Muru for development support Haoyong for MSN Search Engine development environment