Clustering Web Queries. John S. Whissell, Charles L.A. Clarke, Azin Ashkan. CIKM ’09. Speaker: Hsin-Lan, Wang. Date: 2010/08/31.


1 Clustering Web Queries. John S. Whissell, Charles L.A. Clarke, Azin Ashkan. CIKM ’09. Speaker: Hsin-Lan, Wang. Date: 2010/08/31

2 Outline: Introduction; Experimental Setup; Similarity to Manual Labelings; Classification Quality Metric; Split Discoveries; Clickthrough Analysis Based on Detected Query Categories; General Web Query Clustering; Concluding Discussion

3 Introduction: Clustering methods suffer from notable problems, including the evaluation of results, which usually relies on ground truth labelings or on objective functions. Goal: evaluate the quality of clustering results in a way that does not require comparison to ground truth and does not use a specific clustering algorithm's objective function.

4 Introduction: Clustering Web Queries: navigational/informational queries; commercial/non-commercial queries

5 Experimental Setup: Data Set; Weighting Methods; Clustering Algorithms

6 Data Set: Source: Microsoft adCenter. Includes a record of queries entered, ads displayed, and ads clicked. Personally identifying information was removed. Commercially oriented: 1700 queries were selected whose ad click frequency was above 10.
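The selection rule on this slide (keep queries whose ad click frequency is above 10) can be sketched as follows. This is only an illustration: the log layout as `(query, ad_clicked)` pairs, the function name `select_commercial_queries`, and the reading of "frequency" as a raw click count are assumptions, not details given in the slides.

```python
from collections import Counter

def select_commercial_queries(click_log, min_clicks=10):
    """Return queries whose ad click count is strictly above min_clicks.

    click_log is a hypothetical list of (query, ad_clicked) pairs;
    the real adCenter record format is not described in the slides.
    """
    clicks = Counter()
    for query, ad_clicked in click_log:
        if ad_clicked:
            clicks[query] += 1
    return sorted(q for q, n in clicks.items() if n > min_clicks)

# Toy log, invented for illustration only.
toy_log = (
    [("cheap flights", True)] * 12
    + [("weather", True)] * 3
    + [("weather", False)] * 20
)
selected = select_commercial_queries(toy_log)
```

Here only "cheap flights" passes the default threshold, since "weather" has just 3 ad clicks.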

7 Data Set: For each query, two types of features are available: the search engine result page (SERP) and query-specific features.

8 Weighting Methods

9 Clustering Algorithms: K-means clustering using Lloyd's method (kmeans); Normalized-Cut Spectral clustering (spect); UPGMA clustering (upgma); Single Link clustering (slink); Complete Link clustering (clink); document clustering algorithms from Zhao and Karypis with the e1, i1, i2, g1, g1p, and h1 objective functions
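As a reminder of what the first entry in this list does, here is a minimal sketch of k-means with Lloyd's iteration (assign each point to its nearest centroid, then recompute centroids). It is not the implementation used in the paper: the first-k-points initialization, the squared Euclidean distance, and the toy 2-D points are assumptions chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def lloyd_kmeans(points, k, iterations=100):
    """Lloyd's method: alternate assignment and centroid update steps."""
    # Assumption: deterministic initialization from the first k points.
    centroids = list(points[:k])
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return assignment, centroids

# Toy data, invented for illustration: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centroids = lloyd_kmeans(points, k=2)
```

On this toy input the first three points end up in one cluster and the last three in the other. Real query feature vectors would be high-dimensional and weighted, which this sketch does not attempt to model.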

10 Similarity to Manual Labelings

11 Similarity to Manual Labelings

12 Similarity to Manual Labelings

13 Classification Quality Metric: Train a classifier to recognize the clusters in a clustering. Classification accuracy (acc_c) is measured using cross-fold validation.
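The idea on this slide can be sketched as: treat cluster ids as class labels, train a classifier on part of the data, and report cross-validated accuracy. The sketch below swaps in a simple nearest-centroid classifier purely to stay dependency-free; the slides mention a linear SVM, and the fold scheme (index modulo the fold count), function names, and toy data here are all assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_centroids(vectors, labels):
    """Stand-in classifier: one mean vector per cluster label."""
    groups = {}
    for v, y in zip(vectors, labels):
        groups.setdefault(y, []).append(v)
    return {
        y: tuple(sum(c) / len(vs) for c in zip(*vs))
        for y, vs in groups.items()
    }

def predict(model, v):
    """Predict the label whose centroid is nearest to v."""
    return min(model, key=lambda y: squared_distance(v, model[y]))

def crossfold_accuracy(vectors, cluster_labels, folds=3):
    """Fraction of held-out objects whose cluster label is predicted correctly."""
    n = len(vectors)
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        model = train_centroids(
            [vectors[i] for i in train_idx],
            [cluster_labels[i] for i in train_idx],
        )
        correct += sum(
            predict(model, vectors[i]) == cluster_labels[i] for i in test_idx
        )
    return correct / n

# Toy data, invented for illustration.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_clustering = [0, 0, 0, 1, 1, 1]   # matches the visible structure
poor_clustering = [0, 1, 0, 1, 0, 1]   # ignores the visible structure

acc_good = crossfold_accuracy(vectors, good_clustering)
acc_poor = crossfold_accuracy(vectors, poor_clustering)
```

A clustering that follows real structure in the features is easy for the classifier to relearn, so `acc_good` comes out higher than `acc_poor`. The metric needs neither ground truth labels nor the clustering algorithm's own objective function.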

14 Classification Quality Metric: Illustrates a correlation between N_a, obtained using a linear SVM, and internal similarity.

15 Classification Quality Metric

16 Split Discoveries

17 Split Discoveries

18 Clickthrough Analysis Based on Detected Query Categories: Clustering+SVM. Clickthrough rate: the percentage of queries in a given set that had an ad click.
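The clickthrough rate defined on this slide is a simple ratio, sketched below per detected category. The category names, the queries, and the set of clicked queries are invented for illustration and do not come from the paper.

```python
def clickthrough_rate(queries, clicked):
    """Percentage of queries in the set that had at least one ad click."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if q in clicked)
    return 100.0 * hits / len(queries)

# Hypothetical detected categories and click data.
categories = {
    "commercial": ["buy shoes", "cheap flights", "car insurance", "laptop deals"],
    "non-commercial": ["weather", "python tutorial", "moon phases", "news"],
}
clicked = {"buy shoes", "cheap flights", "laptop deals", "news"}

rates = {name: clickthrough_rate(qs, clicked) for name, qs in categories.items()}
```

With this toy data, 3 of the 4 commercial queries and 1 of the 4 non-commercial queries have an ad click, so the rates are 75% and 25% respectively.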

19 General Web Query Clustering

20 Concluding Discussion: Cluster objects using multiple representations and algorithms. Classification accuracy is used to measure the quality of a clustering. Future work: extend the metric to select the number of clusters.

