Published by Roxanne Heath. Modified over 8 years ago.
1 Clustering Web Queries
  John S. Whissell, Charles L.A. Clarke, Azin Ashkan
  CIKM ’09
  Speaker: Hsin-Lan, Wang
  Date: 2010/08/31
2 Outline
  Introduction
  Experimental Setup
  Similarity to Manual Labelings
  Classification Quality Metric
  Split Discoveries
  Clickthrough Analysis Based on Detected Query Categories
  General Web Query Clustering
  Concluding Discussion
3 Introduction
  Clustering methods suffer from notable problems, including the evaluation of results, which usually relies on:
    ground truth labelings
    objective functions
  Goal: evaluate the quality of clustering results in a way that
    does not require comparison to ground truth
    does not use a specific clustering algorithm’s objective function
4 Introduction
  Clustering Web Queries:
    navigational/informational queries
    commercial/non-commercial queries
5 Experimental Setup
  Data Set
  Weighting Methods
  Clustering Algorithms
6 Data Set
  Microsoft adCenter
    Includes a record of queries entered, ads displayed, and ads clicked.
    Personally identifying information was removed.
  Commercially oriented: 1700 queries were selected, each with an ad click frequency above 10.
7 Data Set
  For each query, two types of features are available:
    search engine result page (SERP) features
    query-specific features
8 Weighting Methods
9 Clustering Algorithms
  K-means clustering using Lloyd’s method (kmeans)
  Normalized-Cut Spectral clustering (spect)
  UPGMA clustering (upgma)
  Single Link clustering (slink)
  Complete Link clustering (clink)
  Document clustering algorithms from Zhao and Karypis: e1, i1, i2, g1, g1p, and h1 objective functions
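As an illustration of the first algorithm in this list, the following is a minimal sketch of k-means using Lloyd's method: alternate between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members. The function names, the toy 2-D points, and the fixed seed are assumptions made for this example; the slides do not give the implementation, features, or parameters that the authors used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_lloyd(points, k, max_iter=100, seed=0):
    """Lloyd's method: repeat assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the method has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans_lloyd(toy_points, k=2)
print(labels)
```

The cluster ids themselves are arbitrary; only the grouping of points matters.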
10 Similarity to Manual Labelings
11 Similarity to Manual Labelings
12 Similarity to Manual Labelings
13 Classification Quality Metric
  Train a classifier to recognize the clusters in a clustering.
  Classification accuracy (acc_c): estimated using cross-fold validation.
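The idea on this slide can be sketched as follows: treat the cluster ids as class labels, train a classifier on part of the data, and measure how often it recovers the cluster of held-out points. This sketch swaps in a simple nearest-centroid classifier so that it needs only the standard library; it is not the classifier used in the talk, which elsewhere mentions a linear SVM. The function names, fold scheme, and toy data are assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(train_x, train_y):
    """Stand-in 'training': one mean vector per cluster label."""
    by_label = {}
    for point, label in zip(train_x, train_y):
        by_label.setdefault(label, []).append(point)
    return {
        label: tuple(sum(c) / len(pts) for c in zip(*pts))
        for label, pts in by_label.items()
    }


def predict(centroids, point):
    """Predict the label whose centroid is closest to the point."""
    return min(centroids, key=lambda lab: squared_distance(point, centroids[lab]))


def cross_validated_accuracy(points, cluster_labels, n_folds=3):
    """Fraction of held-out points whose cluster id is predicted correctly."""
    n = len(points)
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        centroids = fit_centroids([points[i] for i in train],
                                  [cluster_labels[i] for i in train])
        correct += sum(predict(centroids, points[i]) == cluster_labels[i]
                       for i in test)
    return correct / n


# Hypothetical toy data: the same points under two candidate clusterings.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
coherent_labels = [0, 0, 0, 1, 1, 1]   # matches the visible structure
scrambled_labels = [0, 1, 0, 1, 0, 1]  # ignores the visible structure

acc_coherent = cross_validated_accuracy(toy_points, coherent_labels)
acc_scrambled = cross_validated_accuracy(toy_points, scrambled_labels)
print(acc_coherent, acc_scrambled)
```

A clustering whose clusters a classifier can learn scores higher than one whose labels are unrelated to the features, which is the intuition behind using classification accuracy as a quality measure.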
14 Classification Quality Metric Illustrate a correlation between N a using a linear SVM and internal similarity.
15 Classification Quality Metric
16 Split Discoveries
17 Split Discoveries
18 Clickthrough Analysis Based on Detected Query Categories
  Clustering+SVM
  Clickthrough rate: percentage of queries in that set that had an ad click
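The clickthrough rate defined on this slide can be computed per detected category as in the short sketch below. The record layout (a category name paired with a flag saying whether the query had an ad click) and the category names are hypothetical; the slides do not describe the data format or report these numbers.

```python
from collections import defaultdict


def clickthrough_rate_by_category(records):
    """Percentage of queries in each category that had an ad click.

    records: iterable of (category, had_click) pairs, one per query.
    """
    totals = defaultdict(int)
    clicked = defaultdict(int)
    for category, had_click in records:
        totals[category] += 1
        if had_click:
            clicked[category] += 1
    return {cat: 100.0 * clicked[cat] / totals[cat] for cat in totals}


# Hypothetical toy records: (detected category, query had an ad click).
toy_records = [
    ("category_a", True), ("category_a", False),
    ("category_a", True), ("category_a", False),
    ("category_b", True), ("category_b", True),
    ("category_b", True), ("category_b", False),
]
rates = clickthrough_rate_by_category(toy_records)
print(rates)
```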
19 General Web Query Clustering
20 Concluding Discussion
  Cluster objects using multiple representations and algorithms.
  Classification accuracy is used to measure the quality of a clustering.
  Future work: extend the metric to select the number of clusters.