Published by Maryann Fox. Modified over 9 years ago.
1
A Model of Information Foraging via Ant Colony Simulation
Matthew Kusner
2
Information Foraging Theory
Background
– People search for information in roughly the same way that animals search for food in their surroundings.
Information Scent
– Ex: “the text associated with Web links” (Fu & Pirolli, 2007)
– Background knowledge
– Recommendations
3
Ant Colony Simulation
Pheromone trails
– Laid by ants who've found food.
– Followed by other ants with probability p.
– Path Evaporation
Path Optimization
Simulation specifics
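The trail mechanics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the simulation used in the presentation: the function names, the parameter values (p_follow, evaporation, n_ants, n_steps) and the rule that an ant deposits pheromone in inverse proportion to food distance are all assumptions made for the example.

```python
import random


def choose_path(pheromone, p_follow, rng):
    """With probability p_follow, follow a trail chosen in proportion to
    its pheromone level; otherwise explore a path uniformly at random."""
    if sum(pheromone) > 0 and rng.random() < p_follow:
        return rng.choices(range(len(pheromone)), weights=pheromone, k=1)[0]
    return rng.randrange(len(pheromone))


def simulate(distances, n_ants=50, n_steps=200, p_follow=0.8,
             evaporation=0.05, seed=0):
    """Return (visits per food source, final pheromone levels)."""
    rng = random.Random(seed)
    pheromone = [0.0] * len(distances)
    visits = [0] * len(distances)
    for _ in range(n_steps):
        # Path evaporation: every trail loses a fixed fraction per step.
        pheromone = [(1.0 - evaporation) * level for level in pheromone]
        for _ in range(n_ants):
            i = choose_path(pheromone, p_follow, rng)
            visits[i] += 1
            # Assumed deposit rule: nearer food earns a stronger trail,
            # which is what drives path optimization in this sketch.
            pheromone[i] += 1.0 / distances[i]
    return visits, pheromone
```

Calling simulate([1.0, 2.0, 4.0]) returns one visit count per food source; under these assumptions the nearest source tends to collect most visits, because its trail is reinforced faster than it evaporates.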
4
AOL Data Set
21 million queries (March 1 – May 31, 2006)
650k users
19 million click-through events
Quantities:
– query
– time of query
– click URL
– user ID
– clicked link rank
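As an illustration of how the quantities above could be turned into per-website visit counts for one query, here is a small Python sketch. It assumes a tab-separated log with a header row naming the columns AnonID, Query, QueryTime, ItemRank and ClickURL; that layout, the function name website_visits and the choice to group clicks by host name are assumptions for the example, not details taken from the slides.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlparse


def website_visits(tsv_text, query):
    """Count click-throughs per website for a single query string."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        # Case-insensitive match on the query text (an assumption).
        if (row.get("Query") or "").strip().lower() != query.lower():
            continue
        url = (row.get("ClickURL") or "").strip()
        if not url:
            continue  # the query was issued but no link was clicked
        # Group clicks by host name; assumes URLs include a scheme.
        counts[urlparse(url).netloc] += 1
    return counts
```

The returned Counter maps each website to its number of clicks, which is the shape needed for a user-visit per website vector.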
5
Information Foraging → Ant Colony
user → ant
clicked link → food
information scent → pheromone path
website importance → food distance
where website importance is defined by:
– 1. Rank
– 2. Popularity of website
– 3. Combination of above methods
6
Distancing Methods
Ranking
Popularity
Combination
[based on data in Joachims et al., 2005]
7
Results
AOL user-visit per website vector
– [numWvisits_1, numWvisits_2, ..., numWvisits_n]
Simulation ant-visit per food vector
– [numAvisits_1, numAvisits_2, ..., numAvisits_n]
Pearson Correlation Score (PCS)
Bootstrapping → 95% Coverage Interval
– (AOL_data_i, simulation_data_i) selection with replacement
Permutation Test → p-value
– Shuffle AOL vector
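The three statistics named on this slide can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the presenter's code: the function names, the number of resamples, the percentile-style interval and the one-sided p-value are all choices made for the example. Here x stands for the AOL visit vector and y for the simulation visit vector.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def resample_interval(x, y, n_resamples=2000, level=0.95, seed=0):
    """Interval from (x_i, y_i) pairs selected with replacement."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(r):  # skip degenerate resamples
            scores.append(r)
    scores.sort()
    lo = scores[int((1 - level) / 2 * (len(scores) - 1))]
    hi = scores[int((1 + level) / 2 * (len(scores) - 1))]
    return lo, hi


def shuffle_p_value(x, y, n_shuffles=2000, seed=0):
    """One-sided p-value: share of shuffles of x scoring at least the observed PCS."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson(shuffled, y) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)
```

Resampling whole pairs keeps each website's two counts together, so the interval reflects sampling variability in the correlation; shuffling only one vector breaks the pairing and so approximates the scores expected when the two vectors are unrelated.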
8
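Query | Type of distancing | # of users / # of clicked links / # of distinct websites visited (digits fused in source) | Average PCS | Average 95% CI Start | Average 95% CI End | Significant p-val?
vacation | ranking | 1255919 | 0.8182 | 0.3203 | 0.9364 | Yes
vacation | popularity | 1255919 | 0.1296 | -0.1768 | 0.6624 |
vacation | combination | 1255919 | 0.1488 | -0.3819 | 0.3920 |
rhino | ranking | 39256 | 0.7631 | -0.4781 | 0.9854 |
rhino | popularity | 39256 | 0.3906 | -0.2484 | 0.9919 |
rhino | combination | 39256 | 0.2013 | -0.7389 | 0.9657 |
zebra | ranking | 536112 | -0.1825 | -0.5426 | 0.4706 |
zebra | popularity | 536112 | -0.0110 | -0.4667 | 0.5079 |
zebra | combination | 536112 | 0.1558 | -0.3655 | 0.6754 |
lion | ranking | 52399 | 0.6118 | -0.1797 | 0.9214 |
lion | popularity | 52399 | 0.0699 | -0.5776 | 0.7296 |
lion | combination | 52399 | 0.0304 | -0.6170 | 0.6609 |
football | ranking | 1945621 | 0.5358 | -0.0952 | 0.9301 |
football | popularity | 1945621 | 0.2693 | -0.1583 | 0.6722 |
football | combination | 1945621 | 0.4149 | -0.0223 | 0.7612 |
basketball | ranking | 2207416 | 0.7137 | -0.4225 | 0.9529 |
basketball | popularity | 2207416 | 0.2228 | -0.1755 | 0.6455 |
basketball | combination | 2207416 | 0.1415 | -0.3470 | 0.6661 |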
9
Results
Queries with significant p-values:
– “vacation” (ranking), “baseball” (ranking), “reebok” (ranking), “adidas” (ranking), “marbles” (ranking), “helicopter” (ranking), “car” (ranking), “potatoes” (ranking), “coffee” (ranking), “farming” (ranking), “rock” (popularity), “shirts” (ranking), “playstation” (ranking), “sega” (popularity), “tom cruise” (ranking), “mel gibson” (ranking), “burger king” (ranking), “chicago” (ranking), “los angeles” (ranking), and “paris” (ranking)
Distancing methods without 95% CI overlap:
– Ranking:
  “potatoes” - neither popularity nor combination
  “shirts” - not popularity
  “playstation” - not popularity
  “burger king” - not combination
10
Discussion
Disadvantages of popularity and combination methods
– “vacation” example
Possible reasons for 95% CI overlap
– Randomness
– Disregard of structure
Significance of queries with low p-values
– Search engine matching
Future directions
– Different Simulation
– Other similarity metrics
– Random beginnings
11
References
Fu, W., & Pirolli, P. (2007). SNIF-ACT: A cognitive model of user navigation on the World Wide Web. Human-Computer Interaction, 22(4), 355-412.
Joachims, T., Granka, L., Pang, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR).