1
Web queries classification
Nguyen Viet Bang
WING group meeting, June 9th, 2006
2
What does “classification of queries by their goals” mean?
A taxonomy by [Rose and Levinson]:
– Navigational: locate a specific website. Example: “Stanford University”
– Informational: find out about a topic. Example: “European history”
– Resources: find a resource. Example: “download Beatles lyrics”
Note: there are further sub-categories. There is also a similar taxonomy by [Broder].
3
What’s this research about? An outline, following [Rose and Levinson]:
(i) Determine a framework to classify queries according to goals.
(ii) Given queries, find a way to associate the goals determined in (i) with the queries.
(iii) With the queries classified in (ii), exploit that information to enhance current search engines.
4
Outline: Problem (i)
(i) Determine a classification framework according to the goals of users’ queries (a taxonomy by [Rose and Levinson]).
(ii) Given queries, find a way to associate the goals determined in (i) with the queries.
(iii) With the queries classified in (ii), exploit that information to enhance current search engines.
5
Outline: Problem (ii). Associate the goals with the queries
(i) Determine a classification framework according to the goals of users’ queries.
(ii) Given queries, find a way to associate the goals determined in (i) with the queries.
(iii) With the queries classified in (ii), exploit that information to enhance current search engines.
6
Outline: Problem (ii). Associate the goals with the queries
(1) Manually ask users (present a user interface)
(2) Automated classification
2.1. Use extra information (other than the queries)
– Clickthrough data (user click history) [Lee, Liu and Cho]
– Link information (anchor text distribution) [Lee, Liu and Cho]
– Many other features: distribution of queries, PageRank, mutual information
2.2. Machine learning
2.3. How about looking at the queries only?
7
An example: click distribution
Intuition: for “navigational” queries, users tend to click on a single result.
Algorithm:
– Sort the results of a search in descending order of the number of clicks (this yields a distribution)
– Calculate a descriptive statistic of the distribution (e.g., the mean)
– If the mean value > some threshold, classify the query as “navigational”
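The click-distribution steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the method of [Lee, Liu and Cho]: the function names, the input format (one click count per search result), and the threshold of 0.8 are all assumptions. The slide gives the mean as an example statistic without saying how it is computed, so the sketch swaps in a different statistic, the share of clicks on the most-clicked result, which grows as clicks concentrate on one result and therefore fits the "greater than threshold" rule.

```python
from collections.abc import Sequence


def click_distribution(clicks: Sequence[int]) -> list[float]:
    """Sort per-result click counts in descending order and normalize them."""
    total = sum(clicks)
    if total == 0:
        return []  # no clicks recorded, so there is no distribution
    return [count / total for count in sorted(clicks, reverse=True)]


def top_click_share(distribution: Sequence[float]) -> float:
    """Assumed statistic: fraction of all clicks on the most-clicked result."""
    return distribution[0] if distribution else 0.0


def is_navigational(clicks: Sequence[int], threshold: float = 0.8) -> bool:
    """Classify as navigational when the statistic exceeds the assumed threshold."""
    return top_click_share(click_distribution(clicks)) > threshold
```

For instance, `is_navigational([95, 3, 2])` is true because 95% of the clicks fall on one result, while `is_navigational([30, 25, 25, 20])` is false because the clicks are spread out.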
8
Automated classification (cont’d)
Combination of features: yields higher accuracy [Lee, Liu and Cho]
Machine learning:
– Unsupervised (clustering)
– Supervised (training data may be lacking)
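One simple way to combine features is a weighted average of per-feature scores. The sketch below is only an illustration under stated assumptions: the slide does not say how [Lee, Liu and Cho] combine their features, and the feature names, the [0, 1] score range, the weights, and the threshold are all hypothetical.

```python
def combined_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-feature scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(weights[name] * score for name, score in scores.items())
    return weighted_sum / total_weight


def is_navigational_combined(
    scores: dict[str, float],
    weights: dict[str, float],
    threshold: float = 0.5,
) -> bool:
    """Classify as navigational when the combined score exceeds the threshold."""
    return combined_score(scores, weights) > threshold
```

A call such as `is_navigational_combined({"click": 0.9, "anchor": 0.5}, {"click": 1.0, "anchor": 1.0})` averages the two hypothetical feature scores before applying the threshold.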
9
Problem (iii): retrieve results after classification
Different strategies are needed for each category [Kang and Kim]
Information to analyze:
– Content information (the webpage itself)
– Link information (topology of links in the web)
– URL information (e.g., to decide whether a webpage is a “root”, i.e., a site entry page)
More techniques: boolean combination (“and” or “or”)
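As an illustration of URL information, the sketch below checks whether a URL looks like a “root” (site entry) page. The rule used here (an empty or "/" path with no query string) is an assumption made for this example, not the definition given by [Kang and Kim], and `is_root_url` is a hypothetical helper name. It relies only on `urllib.parse.urlsplit` from the Python standard library.

```python
from urllib.parse import urlsplit


def is_root_url(url: str) -> bool:
    """Return True if the URL points at a site's entry page (assumed rule)."""
    parts = urlsplit(url)
    # Assumed rule: a root page has no path beyond "/" and no query string.
    return parts.path in ("", "/") and not parts.query
```

Under this rule, `http://www.stanford.edu/` counts as a root page, while `http://www.stanford.edu/admissions/index.html` does not.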
10
Our challenge
Try to achieve accurate classification by looking at features of the queries only:
– POS (part-of-speech) tags
– Relationships between queries
– Features of the URLs returned by search engines (Meurlin?)
Enhance search retrieval