Mining Anchor Text for Query Refinement

1 Mining Anchor Text for Query Refinement
Reiner Kraft and Jason Zien, IBM Almaden Research Center
Mark Strohmaier

2 Problem Motivation
23% of search queries are single-term
Expanding the query can lead to more accurate searches
Previous studies indicate that anchor text is statistically similar to search queries
Can this similarity be exploited to improve search queries?

3 What is anchor text?
<a href="this is the website"> This is the anchor text </a>
Destination pages can have multiple links pointing to them
Collections of anchor text can give a view of the destination page
Naïve approach:
Find links whose anchor text is similar to the query
Return the links' destination pages to the user
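
The naïve approach above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the names `AnchorCollector` and `naive_search` are hypothetical, and "similar to the query" is assumed here to mean that every query term appears in the anchor text.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def naive_search(pages, query):
    """Return destinations whose anchor text contains every query term."""
    anchors_by_dest = defaultdict(list)
    for html in pages:
        parser = AnchorCollector()
        parser.feed(html)
        for href, text in parser.links:
            anchors_by_dest[href].append(text)

    terms = query.lower().split()
    matches = []
    for dest, texts in anchors_by_dest.items():
        for text in texts:
            words = text.lower().split()
            if all(term in words for term in terms):
                matches.append(dest)
                break
    return sorted(matches)
```

Grouping anchor texts by destination mirrors the point that a collection of anchor texts gives a view of the destination page; the sketch ignores ranking entirely, which is one of the weaknesses the next slide discusses.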

4 Problems with naïve approach
High term frequency is not directly related to page quality
Repeated terms may lead to unnatural queries
IDF is not necessarily relevant
Anchor text may appear multiple times

5 Methods of Query Refinement
Weighting the number of occurrences
Weight based on the type of anchor text
Number of terms in the anchor text
Fewer terms is better
Number of characters in the anchor text
More concise queries are better

6 Benefits of the Anchor Text
There is much less anchor text than document text
Pages can have many incoming links
Refined anchor text can capture a degree of site popularity

7 Mining Anchor Text
Initial web crawl covered 33 million links on the IBM intranet
Additionally, roughly 350,000 queries were analyzed
Both categories showed a similar relationship between length and number of occurrences

8 Pre-processing Summaries
Query refinement is sensitive to the number of terms
Too few may not lead to much improvement
Too many may lead to overspecialization
Best results were for MAXCOUNT = 3
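
A minimal sketch of how such a limit might be applied, assuming MAXCOUNT is an upper bound on the number of terms in a candidate refinement (the slide does not define it more precisely). The function name `candidate_refinements` is hypothetical.

```python
MAXCOUNT = 3  # the value the slide reports as giving the best results


def candidate_refinements(query, anchor_texts, max_count=MAXCOUNT):
    """Keep anchor texts that extend the query without overspecializing.

    Assumption: a useful refinement contains all query terms, adds at
    least one new term, and has at most `max_count` terms in total.
    """
    query_terms = set(query.lower().split())
    kept = []
    for text in anchor_texts:
        terms = text.lower().split()
        if not query_terms.issubset(terms):
            continue  # unrelated to the query
        if len(terms) <= len(query_terms):
            continue  # too few terms: no improvement over the query
        if len(terms) > max_count:
            continue  # too many terms: risk of overspecialization
        kept.append(" ".join(terms))
    return kept
```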

9 Studies Performed
Three different approaches were compared
Anchor: Ranked Anchor Text refinement
Doc.SW: This ranked pages based on the most frequently occurring 2- and 3-term phrases
DOC: Similar to Doc.SW, but not counting stop words
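
The two document-based baselines can be illustrated with a small phrase counter. This is a sketch under assumptions: the stop-word list is a tiny illustrative one, "not counting stop words" is read as dropping stop words before phrases are formed, and `top_phrases` is a hypothetical name.

```python
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on"}


def top_phrases(documents, skip_stop_words=False, sizes=(2, 3), k=5):
    """Return the k most frequent 2- and 3-term phrases across documents.

    skip_stop_words=False keeps stop words in the text (Doc.SW-like);
    skip_stop_words=True drops them first (DOC-like).
    """
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        if skip_stop_words:
            words = [w for w in words if w not in STOP_WORDS]
        for n in sizes:
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)
```

Calling `top_phrases(docs)` and `top_phrases(docs, skip_stop_words=True)` on the same collection shows how much of the frequent-phrase list is taken up by stop-word fragments.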

10 Ranking Anchor Texts
The results are ranked based on:
WCOUNT score
Number of terms in the anchor summary
Number of characters in the anchor summary
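
One way to combine these three criteria is a lexicographic sort. The slides do not say how the criteria are combined, so the ordering below (WCOUNT first, then fewer terms, then fewer characters) is an assumption, as are the names `Candidate` and `rank_candidates`; the WCOUNT values are taken as already computed.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str       # the anchor summary
    wcount: float   # weighted occurrence count, assumed precomputed


def rank_candidates(candidates):
    """Sort by WCOUNT (descending), then term count, then character count."""
    return sorted(
        candidates,
        key=lambda c: (-c.wcount, len(c.text.split()), len(c.text)),
    )
```

Using term and character counts only as tie-breakers keeps the weighted count dominant while still preferring shorter, more concise refinements among equally frequent candidates.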

11 Comparison of Methods
Second comparison tested 22 different queries
QUERYLOG processes and dynamically updates user queries based on previous ones, in a similar manner to ANCHOR

12 Conclusions
Using anchor text leads to better results than performing similar methods on document collections
A similar approach can be used to refine user search queries as well

13 Future Directions
Broadening search queries
Lexical analysis, rather than straight textual
Pre- and post-anchor text

