Published by Lewis Chase. Modified over 9 years ago.
1
EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH
Wang, H., Liang, Y., Fu, L., Xue, G., Yu, Y. (SIGIR ’09)
Advisor: Koh Jia-Ling
Nonhlanhla Shongwe
2010-09-28
2
Preview
- Introduction
- AdSearch
  - Bid phrase clustering
  - Index structure for efficient ad search
  - Query processing
- Experimental evaluation
- Conclusion
3
Introduction
- The Web has become an important venue for advertising, e.g. Google, Yahoo
- There are mainly two kinds of advertising channels:
  - Contextual advertising
  - Sponsored advertising
- Ranking: derived from relevance to the user query or the page content
4
Introduction (cont’d)
- Ads are characterized by bid phrases: keywords the advertisers choose for their ads
- Syntactic approaches suffer from low recall
- Example:
  - Query: “job training”
  - Ad: career college
  - The ad has no syntactic match with the query, so it is not proposed
5
Introduction (cont’d)
- The problem is even worse because of:
  - The short length of ads
  - The sparsity of the bid phrases
- The paper proposes an efficient ad search solution
- It tackles these issues with query expansion
6
AdSearch Overview
7
AdSearch (cont’d)
Bid phrase clustering
- Bipartite Graph Construction for Bid Phrase and Ads
- Agglomerative Iterative Clustering
8
Bipartite Graph Construction for Bid Phrase and Ads
1. B = {A, B, C} (bid phrases)
2. A = {Ad0, Ad1, Ad2, Ad3, Ad4} (ads)
3. G = {v_ba, v_bb, v_bc} (bid-phrase vertices)
4. G = {v_a0, v_a1, v_a2, v_a3, v_a4} (ad vertices)
Corpus data C:
  A = Ad0, Ad3
  B = Ad1, Ad2, Ad3
  C = Ad2, Ad3, Ad4
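The construction above can be sketched in Python. This is a minimal illustration built on the slide's corpus data; the names `corpus` and `build_bipartite_graph` are assumptions made for the example and do not come from the paper.

```python
from collections import defaultdict

# Corpus data from the slide: bid phrase -> ads that bid on it.
corpus = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def build_bipartite_graph(corpus):
    """Return both adjacency maps of the bid-phrase/ad bipartite graph."""
    phrase_to_ads = {phrase: set(ads) for phrase, ads in corpus.items()}
    ad_to_phrases = defaultdict(set)
    for phrase, ads in corpus.items():
        for ad in ads:
            # One edge per (bid phrase, ad) pair in the corpus.
            ad_to_phrases[ad].add(phrase)
    return phrase_to_ads, dict(ad_to_phrases)

phrase_to_ads, ad_to_phrases = build_bipartite_graph(corpus)
for ad in sorted(ad_to_phrases):
    print(ad, sorted(ad_to_phrases[ad]))
```

Keeping both directions of the adjacency makes it possible to compare bid phrases by their ads and ads by their bid phrases, which is what the clustering step needs.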
9
Agglomerative Iterative Clustering
Jaccard similarity
Corpus data C:
  A = Ad0, Ad3
  B = Ad1, Ad2, Ad3
  C = Ad2, Ad3, Ad4
(A, B) = 1/4
(B, C) = 2/4
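The Jaccard similarity of two sets is the size of their intersection divided by the size of their union. A short Python sketch reproduces the two values on the slide; the function name `jaccard` is an assumption for illustration.

```python
def jaccard(x, y):
    """Jaccard similarity: |x ∩ y| / |x ∪ y| (0.0 for two empty sets)."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

# Ad sets of the three bid phrases, taken from the slide's corpus data.
A = {"Ad0", "Ad3"}
B = {"Ad1", "Ad2", "Ad3"}
C = {"Ad2", "Ad3", "Ad4"}

print(jaccard(A, B))  # 0.25 (one shared ad out of four distinct ads)
print(jaccard(B, C))  # 0.5  (two shared ads out of four distinct ads)
```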
10
Agglomerative Iterative Clustering (cont’d)
Corpus data C:
  A = Ad0, Ad3
  B = Ad1, Ad2, Ad3
  C = Ad2, Ad3, Ad4
Bid-phrases: A, B, C
Ads: Ad0, Ad1, Ad2, Ad3, Ad4
11
Corpus data C:
  A = Ad0, Ad3
  B = Ad1, Ad2, Ad3
  C = Ad2, Ad3, Ad4
Bipartite graph
Bid-phrases: A, B, C
  (A, B) = 0.25
  (A, C) = 0.25
  (B, C) = 0.5
  Merge B with C, then A
Ads: Ad0, Ad1, Ad2, Ad3, Ad4
  Ad0 = A
  Ad1 = B
  Ad2 = B, C
  Ad3 = B, A, C
  Ad4 = C
  (Ad0, Ad1) = 0
  (Ad0, Ad2) = 0
  (Ad0, Ad3) = 0.33
  (Ad0, Ad4) = 0
  (Ad1, Ad2) = 0.5
  (Ad1, Ad3) = 0.33
  (Ad1, Ad4) = 0
  (Ad2, Ad3) = 0.67
  (Ad2, Ad4) = 0.5
  (Ad3, Ad4) = 0.33
  Merge order: (Ad2, Ad3), (Ad2, Ad4), (Ad1, Ad2), (Ad0, Ad3)
[Figure: bid-phrase groups A | B, C; ad groups Ad0 | Ad1, Ad4 | Ad2, Ad3]
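One agglomerative step on the example above can be sketched as follows. This is a simplified illustration, not the paper's full iterative algorithm: it only finds the most similar pair on each side of the graph and merges the bid-phrase pair. The helper names (`most_similar_pair`, `merge`) and the `"B+C"` label for a merged cluster are assumptions.

```python
from itertools import combinations

def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def most_similar_pair(neighbors):
    """Return (similarity, (u, v)) for the most similar pair of nodes."""
    return max(
        (jaccard(neighbors[u], neighbors[v]), (u, v))
        for u, v in combinations(sorted(neighbors), 2)
    )

def merge(neighbors, u, v):
    """Replace nodes u and v by one cluster that owns the union of their neighbors."""
    merged = dict(neighbors)
    merged[u + "+" + v] = merged.pop(u) | merged.pop(v)
    return merged

# Both sides of the bipartite graph, from the slide's corpus data.
phrase_to_ads = {
    "A": {"Ad0", "Ad3"},
    "B": {"Ad1", "Ad2", "Ad3"},
    "C": {"Ad2", "Ad3", "Ad4"},
}
ad_to_phrases = {
    "Ad0": {"A"}, "Ad1": {"B"}, "Ad2": {"B", "C"},
    "Ad3": {"A", "B", "C"}, "Ad4": {"C"},
}

phrase_sim, phrase_pair = most_similar_pair(phrase_to_ads)  # pair ("B", "C")
ad_sim, ad_pair = most_similar_pair(ad_to_phrases)          # pair ("Ad2", "Ad3")

# Merge B with C first; A is the only cluster left to join afterwards.
phrase_clusters = merge(phrase_to_ads, *phrase_pair)
print(phrase_pair, ad_pair, sorted(phrase_clusters))
```

The two pairs picked here match the first merges listed on the slide: B with C on the bid-phrase side, and Ad2 with Ad3 on the ad side.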
12
AdSearch (cont’d)
Index structure for efficient ad search
- Mapping clusters of Bid Phrases to Index Terms
- Block-based Index Structure
- Dictionaries
13
Mapping clusters of Bid Phrases to Index Terms
[Figure: clusters over bid phrases B, A, C, D, E]
14
Block-based Index Structure
Query = B
Traditional index: 3 inverted lists
- Index = bid phrase
- List = ads
Block-based index: 1 inverted list
- Index = 3 bid phrases
- List = ads and bid phrases
15
Block-based Index Structure (cont’d)
Advantages over the traditional method:
- Similar bid phrases and their corresponding ads are placed together
- Fewer merge operations are needed, and they can sometimes be avoided entirely
- Expanding phrase B with phrases A and C is not efficient in the traditional method
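The contrast between the two layouts can be sketched in Python. This is an illustrative in-memory model only, with assumed names (`inverted`, `build_block`, `search_block`); the actual on-disk layout used by AdSearch is not described in the slides.

```python
# Traditional inverted index: one posting list per bid phrase (slide corpus data).
inverted = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def build_block(inverted, phrases):
    """Merge the posting lists of clustered phrases once, at index time.

    Each block entry is (ad, [bid phrases of that ad within the cluster]).
    """
    block = {}
    for phrase in phrases:
        for ad in inverted[phrase]:
            block.setdefault(ad, []).append(phrase)
    return sorted(block.items())

def search_block(block, wanted):
    """Scan one block and keep the ads that carry any wanted bid phrase."""
    return [ad for ad, phrases in block if wanted & set(phrases)]

block = build_block(inverted, ["A", "B", "C"])

# Query B alone, then B expanded with A and C: both are a single scan,
# with no query-time merge of three separate posting lists.
print(search_block(block, {"B"}))
print(search_block(block, {"A", "B", "C"}))
```

The merge cost is paid once when the block is built, which is why the expanded query needs no merge at query time in this model.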
16
Dictionaries
Dictionary D
- Records the mapping from each bid phrase to its corresponding artificial words
- Used to locate the block that corresponds to a bid phrase

Bid phrase | Artificial words (path)
A          | 6:0
B          | 6_5:1
C          | 6_5:2
17
Dictionaries (cont’d)
Dictionary C (counter dictionary)
- Records the number of distinct ads per cluster
Corpus data C:
  A = Ad0, Ad3
  B = Ad1, Ad2, Ad3
  C = Ad2, Ad3, Ad4

Cluster path | Distinct ads               | Number of distinct ads
6            | |Ad0, Ad3| = 2             | (6, 2)
6_5          | |Ad1, Ad2, Ad3, Ad4| = 4   | (6_5, 4)
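Both dictionaries can be modelled as plain Python mappings. The sketch below makes two assumptions that the slides do not state: the number after the colon in an artificial word is read as the phrase's position, and the counter groups bid phrases by their exact cluster path, which reproduces the counts on the slide. All function names are hypothetical.

```python
# Dictionary D from the slide: bid phrase -> artificial word "path:position".
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}

# Corpus data from the slide: bid phrase -> ads.
corpus = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def parse_artificial_word(word):
    """Split '6_5:1' into the cluster path '6_5' and the integer 1."""
    path, _, position = word.partition(":")
    return path, int(position)

def build_counter_dictionary(dictionary_d, corpus):
    """Count distinct ads per cluster path (assumed grouping: exact path)."""
    ads_by_cluster = {}
    for phrase, word in dictionary_d.items():
        path, _ = parse_artificial_word(word)
        ads_by_cluster.setdefault(path, set()).update(corpus[phrase])
    return {path: len(ads) for path, ads in ads_by_cluster.items()}

counter_c = build_counter_dictionary(dictionary_d, corpus)
print(counter_c)  # {'6': 2, '6_5': 4}
```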
18
AdSearch (cont’d)
Query processing
- Finding Related Bid phrases with Corresponding Ads
- Ranking Top-k Relevant Ads
19
Finding Related Bid phrases with Corresponding Ads
The process to find related bid phrases:
- Input: user queries
- Look up dictionary D to get the corresponding artificial words
- Find the minimum clusters that contain enough ads
Query: ABD
e.g. for the top 2 ads, M = 1.5 * 2 = 3

Bid phrase | Artificial words (path)
A          | 6:0
B          | 6_5:1
C          | 6_5:2

Cluster path | Distinct ads
6            | |Ad0, Ad3| = 2
6_5          | |Ad1, Ad2, Ad3, Ad4| = 4
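The lookup steps above can be sketched as follows. Several points are assumptions rather than facts from the slides: that a path such as `6_5` names cluster 5 nested inside cluster 6, that the search starts at the most specific cluster and moves to ancestors, and that the factor 1.5 in M = 1.5 * k is a tunable parameter. The function names are hypothetical.

```python
# Dictionaries from the slide.
dictionary_d = {"A": "6:0", "B": "6_5:1", "C": "6_5:2"}
counter_c = {"6": 2, "6_5": 4}

def candidate_clusters(path):
    """Yield a path and its ancestors, most specific first: '6_5' -> '6_5', '6'."""
    parts = path.split("_")
    for end in range(len(parts), 0, -1):
        yield "_".join(parts[:end])

def smallest_sufficient_cluster(phrase, k, dictionary_d, counter_c, factor=1.5):
    """Return the first cluster on the phrase's path holding at least M = factor * k ads.

    Returns None for an unknown phrase, and the last cluster tried when no
    cluster on the path is large enough.
    """
    if phrase not in dictionary_d:
        return None
    needed = factor * k
    path = dictionary_d[phrase].partition(":")[0]
    chosen = None
    for cluster in candidate_clusters(path):
        chosen = cluster
        if counter_c.get(cluster, 0) >= needed:
            break
    return chosen

# Top 2 ads: M = 1.5 * 2 = 3, so cluster 6_5 (4 distinct ads) suffices for B.
print(smallest_sufficient_cluster("B", 2, dictionary_d, counter_c))
```

A phrase missing from dictionary D, such as D in this example, simply yields no cluster in this sketch.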
20
Finding Related Bid phrases with Corresponding Ads
The process to find related bid phrases:
- Among the returned clusters, those containing at least one bid phrase are stored in one group
- Perform a multi-way merge operation to get the final results

Ad          | Ad1 | Ad2  | Ad3     | Ad4
Bid phrases | A   | B, C | A, B, C | C
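A multi-way merge of sorted posting lists can be sketched with Python's `heapq.merge`. The posting lists below come from the corpus data used on the earlier slides; the function name `multiway_merge` and the in-memory list layout are assumptions for illustration.

```python
import heapq
from itertools import groupby

# Sorted posting lists per bid phrase (corpus data from the earlier slides).
postings = {
    "A": ["Ad0", "Ad3"],
    "B": ["Ad1", "Ad2", "Ad3"],
    "C": ["Ad2", "Ad3", "Ad4"],
}

def multiway_merge(postings):
    """Merge all posting lists in one pass, grouping bid phrases by ad."""
    # Tag every ad with its bid phrase; each stream stays sorted by ad.
    streams = [[(ad, phrase) for ad in ads] for phrase, ads in postings.items()]
    merged = heapq.merge(*streams)
    return [
        (ad, [phrase for _, phrase in group])
        for ad, group in groupby(merged, key=lambda pair: pair[0])
    ]

for ad, phrases in multiway_merge(postings):
    print(ad, phrases)
```

`heapq.merge` keeps only one pending element per input list in its heap, so the lists are consumed lazily in a single pass.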
21
Ranking Top-k Relevant Ads
- A procedure to expand the user query with related bid phrases and get a list of ads
- To get the top k, use a scoring function with these terms:
  - Q: query
  - B(x): set of related bid phrases
  - Similarity between x and y
  - tfidf(y, ad): term frequency and inverse document frequency
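The scoring formula itself is not preserved in this text, so the sketch below assumes one plausible combination of the listed terms: for each query phrase x and each related phrase y in B(x), add similarity(x, y) times tfidf(y, ad). The binary `tfidf` stand-in, the `related` table, and all function names are hypothetical; the similarity values 0.25 and 0.5 are the Jaccard values from the clustering example.

```python
import heapq

# Ads and their bid phrases, from the corpus data on the earlier slides.
ad_phrases = {
    "Ad0": {"A"}, "Ad1": {"B"}, "Ad2": {"B", "C"},
    "Ad3": {"A", "B", "C"}, "Ad4": {"C"},
}

# Similarity of query phrase B to each related phrase (a phrase matches itself fully).
similarity = {("B", "A"): 0.25, ("B", "B"): 1.0, ("B", "C"): 0.5}

# B(x): related bid phrases per query phrase (assumed expansion for B).
related = {"B": ["A", "B", "C"]}

def tfidf(phrase, ad):
    """Hypothetical stand-in for tf-idf: 1.0 if the ad carries the phrase, else 0.0."""
    return 1.0 if phrase in ad_phrases[ad] else 0.0

def score(query, ad):
    """Assumed form: sum over x in Q and y in B(x) of similarity(x, y) * tfidf(y, ad)."""
    return sum(similarity[(x, y)] * tfidf(y, ad) for x in query for y in related[x])

def top_k(query, k):
    """Return the k highest-scoring ads for the query."""
    return heapq.nlargest(k, ad_phrases, key=lambda ad: score(query, ad))

print(top_k(["B"], 2))  # ['Ad3', 'Ad2']
```

With these toy weights, Ad3 scores 1.75 and Ad2 scores 1.5, so an ad that matches the expansion phrases as well as the query phrase outranks one that matches the query phrase alone.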
22
Experimental evaluation Both Chinese and English
23
Experimental evaluation (cont’d)

Name | Description
CQS1 (Chinese) or EQS1 (English) | 100 randomly sampled bid phrases, each associated with a few distinct ads
CQS2 (Chinese) or EQS2 (English) | 100 selected pairs of bid phrases; each pair could return ads associated with both of its bid phrases
CQS3 (Chinese) or EQS3 (English) | Constructed similarly, with queries composed of 3 to 4 bid phrases
CQF (Chinese Frequent Query set) and EQF (English Frequent Query set) | Built from 100 popular bid phrases
24
Experimental evaluation (cont’d)
Evaluation of the clustering step
25
Experimental evaluation (cont’d)
Efficiency evaluation
- AdSearch was implemented with fixed and unfixed block sizes
- The block size is defined as the fraction of distinct ads in the block relative to all ads
- AdSearch(0.001): the value in parentheses determines the number of distinct ads in each block
- For example, for the Chinese data set: 524,868 * 0.001 ≈ 525 distinct ads per block
- Inv: query expansion performed on top of the traditional inverted index
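The block-size arithmetic can be checked with a few lines of Python. Rounding up is an assumption made here to match the slide's figure of 525; the slides do not say how the fraction is rounded, and the function name is hypothetical.

```python
import math

def ads_per_block(total_distinct_ads, fraction):
    """Number of distinct ads per block for a given block-size fraction (rounded up)."""
    return math.ceil(total_distinct_ads * fraction)

# Chinese data set from the slide: 524,868 distinct ads, block size 0.001.
print(ads_per_block(524_868, 0.001))  # 525
```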
26
Experimental evaluation (cont’d)
Effectiveness evaluation
- 50 queries were randomly selected
- 10 people were invited to evaluate the ads returned by AdSearch and Baidu
27
Experimental evaluation (cont’d)
Effectiveness evaluation
28
Conclusion
Introduced the AdSearch system, which consists of:
- Bid phrase clustering
  - Constructs a bipartite graph over the bid phrases and ads
  - Uses agglomerative iterative clustering to cluster similar ads
- Index structure for efficient ad search
  - Uses a block-based index structure to index all ads and bid phrases
  - Uses the dictionary to record mappings between bid phrases and ads
- Query processing
  - Explained how ads were retrieved and ranked to get the top-k results
29
THANK YOU
30
Introduction (cont’d)
[Figure: for Q = “job training”, the sets All Docs, Relevant Docs (R), Relevant Ads, and Relevant Ads in the Ads set (Ra)]