Autumn 2011 — Web Information Retrieval (Web IR)
Handout #14: Ranking Based on Click-Through Data
Ali Mohammad Zareh Bidoki, ECE Department, Yazd University, alizareh@yaduni.ac.ir
A3CRank: an adaptive ranking method based on connectivity, content, and click-through data
The goal of the algorithm
The goal is to merge the results of the BM25, PageRank, and TF-IDF algorithms, based on click-through data, using the OWA operator.
[Diagram: the result lists of PageRank, HITS, BM25, and TF-IDF feed into a results-aggregation step that is informed by user click-through data; the aggregated list is shown to the user.]
Stages of the Algorithm
1. Compute the goodness factor (gf) of each algorithm from click-through data gathered from user behavior.
2. Use each algorithm's goodness factor to compute the weight of each page it returns.
3. Compute the OWA operator (weight vector).
4. Compute the final weight of each result using the OWA vector.
Stage 1: Computing the goodness factor
The goodness factor (gf) measures, on average, how well each algorithm's results suit users.
γ depends on the page's rank and the click order:
- γ is the quality of the clicked pages for the t-th query (query number t).
- T is the set of clicked pages, and c_j denotes the order in which the page at rank j (the j-th page) was clicked.
Algorithms whose top-ranked pages are clicked obtain a higher γ.
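The slide's formula for γ is not reproduced here, so the following is only a minimal sketch under an assumed form: γ rewards early clicks on highly ranked pages, e.g. γ_t = Σ_{j∈T} 1/(c_j · j), and gf is the average γ over the training queries. Both the formula and the function names are assumptions, not the paper's exact definitions.

```python
def gamma(clicks):
    """Quality of the clicked pages for one query (assumed form).

    `clicks` maps a clicked page's rank j (1-based) to its click
    order c_j.  Assumed: gamma = sum over clicked pages of
    1 / (c_j * j), so early clicks on top-ranked pages score highest.
    """
    return sum(1.0 / (order * rank) for rank, order in clicks.items())

def goodness_factor(per_query_clicks):
    """Goodness factor of one algorithm: average gamma over the
    training queries (assumed aggregation)."""
    return sum(gamma(c) for c in per_query_clicks) / len(per_query_clicks)
```

Under this sketch, an algorithm whose rank-1 result is clicked first scores higher than one whose rank-3 result is clicked first, matching the slide's claim that top clicked pages yield higher γ.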
Stage 2: Computing the weight of every page returned by each algorithm
The weight of page d for algorithm i depends on the algorithm's gf and on the page's rank.
n is the number of results returned by algorithm i for each query, and R(d) is the rank (position) of d in those results. R_i is the set of pages returned by algorithm i for the query.
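The slide's weight formula is likewise an image that is not reproduced; a minimal sketch, assuming the weight decays linearly with rank from the full goodness factor down to gf/n:

```python
def page_weight(gf_i, rank, n):
    """Weight of page d for algorithm i (assumed linear form).

    Assumed: v_i(d) = gf_i * (n - R(d) + 1) / n, so the top-ranked
    page gets the algorithm's full goodness factor, the weight decays
    linearly to gf_i / n at rank n, and pages not in R_i get 0.
    """
    return gf_i * (n - rank + 1) / n
```

The key property the sketch preserves is the slide's stated dependence: a better algorithm (higher gf) and a better rank (lower R(d)) both increase the page's weight.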
Stage 3: Finding the OWA vector
b_j is the j-th largest element of the input vector A.
- The OWA weights have nothing to do with the magnitudes of the aggregated values a_1 … a_n; they depend only on their ranking and on the orness degree.
We use the optimistic exponential OWA operator to find the weight vector:
Stage 4: Computing the final weight of each result using the OWA vector
The results are presented to the user in descending order of their final weights.
[Diagram: for each document d, its per-algorithm weights v_1d, v_2d, v_3d from Algorithms 1–3 are combined by OWA aggregation into the final weight w_d.]
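The four stages can be tied together in a short sketch. The function names and the α value are assumptions for illustration; the OWA weights follow the optimistic exponential form discussed for Stage 3:

```python
def owa(values, alpha):
    """OWA aggregation: the j-th weight multiplies the j-th largest value."""
    b = sorted(values, reverse=True)
    n = len(b)
    w = [alpha * (1 - alpha) ** j for j in range(n - 1)] + [(1 - alpha) ** (n - 1)]
    return sum(wi * bi for wi, bi in zip(w, b))

def rank_results(doc_weights, alpha=0.6):
    """doc_weights maps each document d to [v_1d, ..., v_md], its
    weights from the m ranking algorithms; documents are returned in
    descending order of their OWA-aggregated final weight w_d."""
    return sorted(doc_weights,
                  key=lambda d: owa(doc_weights[d], alpha),
                  reverse=True)
```

With an optimistic operator, a document scored highly by even one algorithm can outrank a document scored moderately by all of them.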
Experimental Results
We used the University of California at Berkeley's web site, with five million web pages, to evaluate our algorithm.
About 130 queries in two categories, computer science and biology, were used.
We asked 10 computer science students to enter these queries into the system and click on the results, to gather a click-through data collection.
The evaluation had two stages: training and testing.
- About 60 queries were used in the training phase and the rest in the testing phase.
Comparison of the A3CRank algorithm with other algorithms on Precision and NDCG
Mean Average Precision (MAP)
The goodness factor of each algorithm: gf_BM25 = 0.53, gf_TF-IDF = 0.38, gf_PageRank = 0.09
Combination of Features Using Gradient Descent
Document Features
Coarse-grained features (ranking algorithms): BM25, PageRank, HITS, HostRank, …
Fine-grained features: TF, IDF, in-degree, out-degree, …
The goal of the algorithm
The goal is to find the optimal combination function g(·) using an appropriate weight vector.
Assume we have m features for each query–document pair; the goal is to find W, the weight vector over the features F = [f1, f2, f3, …, fm].
Using pairs of documents and their features
Considering pairs of documents for each query q, there are three conditions for each pair (di, dj):
- di is more relevant than dj (di > dj)
- dj is more relevant than di (dj > di)
- they are equally relevant
The goal is to find the weight vector (combination function) that orders the documents correctly.
- For example, if di > dj we want a W such that W·Fi > W·Fj, i.e. W·(Fi − Fj) > 0.
We can extract such preference data from click-through data (we do not have explicit relevance judgments).
The proposed algorithm (1)
Considering two documents i and j, and their features for query q, we can write the following equation:
The proposed algorithm (2)
The vector Bij is extracted from the relation between the features, and sij from the relation between the documents. It is therefore reasonable to look for a weight vector that maps Bij to sij, since both express relations between the same pair of objects.
We seek the weight vector that minimizes the instantaneous error e, where:
The proposed algorithm (3)
Applying gradient descent to this problem yields the following iterative update equation, where β denotes the learning rate and W·Bij is the current estimate of the aggregated value:
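The update equations themselves are not reproduced on the slides. A minimal LMS-style sketch, assuming Bij = Fi − Fj, a preference label sij ∈ {−1, 0, +1}, instantaneous error e = sij − W·Bij, and the update W ← W + β·e·Bij:

```python
def lms_update(W, F_i, F_j, s_ij, beta=0.01):
    """One pairwise gradient-descent step (assumed LMS form).

    B_ij = F_i - F_j is the feature-difference vector, s_ij the
    preference label (+1 if d_i is preferred, -1 if d_j, 0 if equal),
    and e = s_ij - W.B_ij the instantaneous error being minimized.
    """
    B = [a - b for a, b in zip(F_i, F_j)]
    e = s_ij - sum(w * b for w, b in zip(W, B))
    return [w + beta * e * b for w, b in zip(W, B)]
```

Each step nudges W so that W·Bij moves toward sij, i.e. so that the preferred document's combined score exceeds the other's.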
Learning Process
This process is repeated for each pair of documents of each query. In addition to the relations between documents, we also use the relations between their features to find the combination function.
In each iteration we select a new pair from a new query, not from the same query; when we reach the end of a query's results we jump back to its beginning. The weights are thus learned from all queries in parallel and are not biased by the length of a query's result list, i.e. its number of relevant results.
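The round-robin schedule described above can be sketched as follows. The data layout and function name are assumptions; the per-pair update is the same LMS-style step discussed for the previous slide:

```python
from itertools import cycle

def train(query_pairs, dim, beta=0.01, steps=1000):
    """query_pairs: one list per query, each a list of (F_i, F_j, s_ij)
    preference triples.  Each step consumes one pair, and consecutive
    steps take pairs from different queries (round-robin), wrapping to
    the start of a query's pairs when exhausted, so that queries with
    long result lists do not dominate the learned weights."""
    W = [0.0] * dim
    iters = [cycle(pairs) for pairs in query_pairs if pairs]
    for step in range(steps):
        F_i, F_j, s = next(iters[step % len(iters)])  # next query's pair
        B = [a - b for a, b in zip(F_i, F_j)]
        e = s - sum(w * b for w, b in zip(W, B))
        W = [w + beta * e * b for w, b in zip(W, B)]
    return W
```

`itertools.cycle` gives the "jump back to the beginning" behavior directly: each query's pair stream repeats forever, and the outer round-robin interleaves the streams.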
Experimental Results
We used the LETOR dataset, released by Microsoft Research Asia, for evaluation.
This dataset contains two document-retrieval collections: TREC03 and TREC04.
Documents are labeled {relevant, irrelevant}; a label is provided for each query–document pair.
The TREC dataset contains 1,053,110 web pages and 11,164,829 hyperlinks.
There are 50 queries for TREC03 and 75 queries for TREC04.
In LETOR, 44 state-of-the-art features are extracted for the TREC dataset, including classical features (BM25, TF-IDF), connectivity-based features (PageRank, HostRank), query-dependent features (HITS), and simple features (TF, IDF).
The proposed algorithm is called LGD.
Comparison of the LGD algorithm with the RankSVM and RankBoost algorithms on NDCG (TREC03 and TREC04)
MAP (Mean Average Precision)

          BM25    RankSVM   RankBoost   LGD
TREC03    0.125   0.256     0.212       0.235
TREC04    0.281   0.350     0.383       0.374
Paired t-test results of LGD over RankSVM and RankBoost
How can we apply users' behavior to ranking? Directly to the web graph, by rewarding or punishing its links.
[Diagram: a web graph whose edges carry positive and negative reward/punishment scores derived from user behavior.]
Advised PageRank vs. PageRank
Experimental Results
Kendall's tau correlation between quality and popularity.
APageRank is 2.5 times faster than PageRank.