Characterizing the Influence of Domain Expertise on Web Search Behavior
Ryen White, Susan Dumais, Jaime Teevan
Microsoft Research
{ryenw, sdumais, teevan}@microsoft.com

Example to start
- A cardiologist and a newly diagnosed patient get the same results for the query "heart disease"
- If we could estimate their level of expertise, we could tailor the search experience to each of them
  - The cardiologist could get technical articles
  - The patient could get tutorial information
- This paper is about characterizing and using such domain expertise to improve Web search

Background
- Domain expertise = knowledge of a subject area
- Domain expertise ≠ search expertise
  - Search expertise is knowledge of the search process
- Previous research has highlighted differences between domain experts and domain non-experts
  - Site selection and sequencing, task completion time, vocabulary and search expression, ...
  - These studies involve small numbers of subjects with controlled tasks
- We extend this work in breadth (more domains) and scale

Outline
- Studying Domain Expertise
  - Study overview
  - Log data
  - Automatically identifying domain experts
  - Differences between experts and non-experts
- Using Domain Expertise
  - Predicting domain expertise based on search interaction
  - Improving the search experience via expertise information
- Conclusions

Studying Domain Expertise

Study
- Log-based study of Web search behavior
- Contrast strategies of experts and non-experts
- Large-scale analysis with greater diversity in vocabulary, Web sites, and tasks than lab-based studies
- Four domains were studied
  - Medical, Legal, Financial, Computer Science
  - Large professional groups who use the Web; of general interest
- We focus on Medical in this talk for time

Data Sources
- Logs with querying and browsing behavior of many users
  - Three months, May 2007 through July 2007
  - > 10 billion URL visits from > 500K users
- Extracted browse trails and search sessions
  - Browse trails = sequence of URLs per tab/browser instance
  - Search sessions = sub-trails starting with a search engine query and ending with a 30-minute interaction timeout (see the sketch below)
- Search sessions let us compare domain experts and non-experts in and out of their domain of interest
- First we need to differentiate experts from non-experts...
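To make the session definition concrete, here is a minimal Python sketch of the sessionization step, assuming a per-user, time-ordered list of (timestamp, url) visits; the `is_serp` check and the input format are illustrative assumptions, not the authors' actual pipeline.

```python
from datetime import timedelta

TIMEOUT = timedelta(minutes=30)

def is_serp(url):
    # Assumption: a visit counts as a search engine query if it hits a
    # results page; a real pipeline would parse engine-specific patterns.
    return any(h in url for h in ("bing.com/search", "google.com/search"))

def search_sessions(visits):
    """Split one user's time-ordered (timestamp, url) visits into search
    sessions: each starts at a search engine query and ends after 30
    minutes of inactivity, per the slide's definition."""
    sessions, current, last_ts = [], None, None
    for ts, url in visits:
        if current is not None and ts - last_ts > TIMEOUT:
            sessions.append(current)   # timeout closes the open session
            current = None
        if current is None:
            if is_serp(url):           # sessions must begin with a query
                current = [(ts, url)]
        else:
            current.append((ts, url))
        last_ts = ts
    if current is not None:
        sessions.append(current)
    return sessions
```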

Identifying Domain Experts
Two steps in identifying domain experts from logs:
- Step 1: Identify users with topical interest
  - Ensures that behavior relates to users interested in the domain; helps control for topic differences
- Step 2: Separate experts from non-experts
  - From the user group in Step 1, separate experts based on whether they visit specialist Web sites
- A simple, broadly applicable method that lets us extend lab studies to real-world settings

Topical Interest
- Classified browse trails using the Open Directory Project (ODP)
  - Automatically assigned labels to URLs based on ODP, with URL back-off as required (sketched below)
  - Filtered outliers and computed the % of pages in each domain
- Domain mappings:
  - Medical = Health/Medicine
  - Financial = Business/Financial_Services
  - Legal = Society/Law/Legal_Information
  - Computer Science = Computers/Computer_Science

Domain           | # users | # sessions | # in-domain sessions
Medical          | 45,214  | 1,918,722  | 94,036
Financial        | 194,409 | 6,489,674  | 279,471
Legal            | 25,141  | 1,010,868  | 36,418
Computer Science | 2,427   | 113,037    | 3,706
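A rough sketch of ODP labeling with URL back-off, assuming a lookup table `odp` from URL prefixes to category labels; the table contents and the exact back-off order in the paper are assumptions for illustration.

```python
from urllib.parse import urlparse

def odp_label(url, odp):
    """Return the ODP category for a URL, backing off along the path.

    `odp` maps URL prefixes such as "nih.gov/health" to labels such as
    "Health/Medicine" (illustrative; a real table comes from ODP)."""
    p = urlparse(url)
    parts = [s for s in p.path.split("/") if s]
    while True:
        key = "/".join([p.netloc] + parts)
        if key in odp:
            return odp[key]
        if not parts:
            return None       # no labeled prefix; leave unclassified
        parts.pop()           # back off one path component

def domain_fraction(trail, odp, prefix="Health/Medicine"):
    """Fraction of a user's labeled pages that fall in one domain."""
    labels = [odp_label(u, odp) for u in trail]
    labeled = [l for l in labels if l]
    return sum(l.startswith(prefix) for l in labeled) / max(len(labeled), 1)
```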

Dividing Experts & Non-Experts
- Surveys, interviews, etc. are not viable at scale
- Divided experts/non-experts using observable behavior
  - Filtered users by whether they visited specialist sites (sketched below)
  - Sites identified through discussion with domain experts
  - Most sites require a subscription; we assume visitors have above-average domain knowledge

Domain    | Expert URL filters                                    | Expert        | Non-expert
Medical   | ncbi.nlm.nih.gov/pubmed, pubmedcentral.nih.gov        | 7,971 (17.6%) | 37,243 (82.4%)
Financial | bloomberg.com, edgar-online.com, hoovers.com, sec.gov | 8,850 (4.6%)  | 185,559 (95.4%)
Legal     | lexis.com, westlaw.com                                | 2,501 (9.9%)  | 22,640 (90.1%)
CS        | acm.org/dl, portal.acm.org                            | 949 (39.1%)   | 1,478 (60.9%)
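The split then reduces to a substring filter over each user's visited URLs; the site lists below come from the table above, while the surrounding code is a sketch.

```python
EXPERT_SITES = {
    "medical":   ("ncbi.nlm.nih.gov/pubmed", "pubmedcentral.nih.gov"),
    "financial": ("bloomberg.com", "edgar-online.com", "hoovers.com", "sec.gov"),
    "legal":     ("lexis.com", "westlaw.com"),
    "cs":        ("acm.org/dl", "portal.acm.org"),
}

def is_domain_expert(visited_urls, domain):
    """Apply the slide's heuristic: a user counts as a domain expert if
    any visited URL matches one of the domain's specialist sites."""
    return any(f in url for url in visited_urls for f in EXPERT_SITES[domain])
```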

Differences between Domain Experts and Non-Experts

Domain Expertise Differences
- The behavior of experts and non-experts differs in many ways
- Some differences are obvious:
  - Queries (experts use more technical vocabulary and longer queries)
  - Source selection (experts use more technical sources)
    - URL-based analysis
    - Content-based analysis (judges rated page technicality)
  - Search success (experts more successful, based on clickthrough rate)
- Some are less obvious, e.g., session features:
  - Branchiness of the sessions
  - Number of unique domains
  - Session length (queries, URLs, and time)

Branchiness & Unique Domains
- Session branchiness = 1 + (# revisits to previous pages in the session that are followed by a visit to a new page)
- Expert sessions are more branchy and more diverse than non-experts'
- Experts may have developed strategies to explore the space more broadly
[Table: mean and SD of session branchiness and # unique domains for experts vs. non-experts; values not legible in this transcript]
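A sketch of these two session features over a session's ordered URL trail; the branchiness count follows one reading of the slide's definition (each revisit-then-new-page transition counts as one branch).

```python
from urllib.parse import urlparse

def branchiness(trail):
    """1 + the number of times the user revisits an earlier page in the
    session and then branches off to a previously unseen page."""
    seen, branches, after_revisit = set(), 0, False
    for url in trail:
        if url in seen:
            after_revisit = True          # back at a previous page
        else:
            if after_revisit:
                branches += 1             # revisit followed by a new page
            after_revisit = False
            seen.add(url)
    return 1 + branches

def unique_domains(trail):
    """Number of distinct hosts visited in the session."""
    return len({urlparse(u).netloc for u in trail})
```

For example, branchiness(["a", "b", "a", "c"]) is 2, while a strictly linear trail scores 1.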

Session Length
- Length measured in URLs, queries, and time
- Greater investment in tasks by experts than non-experts
- Search targets may be more important to experts, making them more likely to spend time and effort
[Table: mean and SD of page views (incl. result pages), query iterations, and time in seconds for experts vs. non-experts; values not legible in this transcript]

Other Considerations
- Expert/non-expert differences hold across all four domains
- Out-of-domain search sessions are similar:
  - Similarities in other features (e.g., queries)
  - Observed differences are attributable to the domain
[Table: out-of-domain mean and SD of branchiness, unique domains, page views (incl. result pages), query iterations, and time in seconds for experts vs. non-experts; values not legible in this transcript]

Using Domain Expertise

Predicting Domain Expertise
- Based on interaction behavior, we can estimate a user's level of domain expertise
  - Rather than requiring offline tests
- The search experience can then be tailored based on the estimate
  - Just as with the cardiologist and the patient at the start
- Three prediction challenges:
  - In-session: after observing ≥ 1 action(s) in a session
  - Post-session: after observing a single complete session
  - User: after observing ≥ 1 sessions from the same user

Within-Session Prediction
- Predicting domain expertise as the session proceeds
- Used a maximum-margin averaged perceptron (see the sketch below)
  - Trained using features of queries, pages visited, or both
  - Five-fold cross-validation and ten experimental runs
- e.g., for CS, accuracy of our best-performing predictor:

Action type | 1     | 2      | 3      | 4      | 5      | Full session
All         | .616* | .625*  | .639** | .651** | .660** | .718**
Queries     | .616* | .635** | .651** | .668** | .683** | .710**
Pages       | –     | .608*  | .617*  | .634** | .661** | –

*, ** = significant difference from the marginal baseline (always predict non-expert; accuracy .566)
- We can predict after just a few actions; queries work best (less noisy)
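The slide's classifier is a maximum-margin averaged perceptron over query and page features; below is a plain averaged-perceptron sketch (without the explicit margin term) as a rough stand-in, with the feature construction left abstract.

```python
import numpy as np

def train_averaged_perceptron(X, y, epochs=10):
    """Averaged perceptron for binary expert (+1) / non-expert (-1)
    prediction. X is an (n, d) matrix of features built from the
    session's queries and/or page visits so far; y holds +/-1 labels."""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for _ in range(epochs):
        for i in np.random.permutation(n):
            if y[i] * X[i].dot(w) <= 0:   # mistake-driven update
                w += y[i] * X[i]
            w_sum += w                    # accumulate for averaging
    return w_sum / (epochs * n)

def predict(w, x):
    """Classify one feature vector with the averaged weights."""
    return 1 if x.dot(w) > 0 else -1
```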

Improving Search Experience
- A search engine or client-side application could bias results toward Web sites suited to the user's expertise level
  - But this reinforces current behavior rather than encouraging learning
- Alternatively, help domain non-experts become experts over time
  - Provide non-expert definitions for related expert terms (sketched below)
    - e.g., a search for [cancer] includes a definition of [malignancy]
  - Help non-experts identify reliable expert sites, or use the broader range of information that experts do
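One way the definition idea could look in practice, with a toy glossary; both the glossary and the function are illustrative assumptions, not the authors' implementation.

```python
LAY_DEFINITIONS = {
    # Toy glossary (assumed); a real system might mine a medical
    # dictionary or expert/non-expert query rewrites from logs.
    "malignancy": "a cancerous tumor or growth",
    "myocardial infarction": "a heart attack",
}

def annotate_for_nonexpert(snippet):
    """Append lay definitions for any expert terms found in a result
    snippet, one way to realize the slide's suggestion."""
    found = [f"{t}: {d}" for t, d in LAY_DEFINITIONS.items()
             if t in snippet.lower()]
    return snippet if not found else snippet + " [" + "; ".join(found) + "]"
```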

Conclusions
- Large-scale, log-based study of the Web search behavior of domain experts and non-experts
- Showed that experts and non-experts search differently within their domain of expertise, and similarly otherwise
  - Differences/similarities are visible across all four domains
  - Extends previous lab studies in breadth and scale
- Developed models to predict domain expertise
  - Can do this accurately per user, post-session, and in-session
- Domain expertise information can be used to tailor the search experience and help non-experts become experts

Common Tasks
- Observed differences may be related to task differences rather than expertise differences
- To address this concern, we developed two methods to identify comparable tasks (sketched below):
  - Identified search sessions that began with the same query
  - Identified sessions that ended with the same URL
- Extracted features of queries, pages, and sessions
- Between-group differences held true
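A sketch of the first matching method, grouping sessions on their normalized initial query; the session record format is an assumption.

```python
from collections import defaultdict

def group_by_first_query(sessions):
    """Group sessions that begin with the same normalized query so that
    expert and non-expert behavior is compared on a common task.

    Each session is assumed to carry its ordered query list under the
    key "queries"."""
    groups = defaultdict(list)
    for s in sessions:
        groups[s["queries"][0].strip().lower()].append(s)
    # keep only queries that start more than one session
    return {q: g for q, g in groups.items() if len(g) > 1}
```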