1 Efficient Search Ranking in Social Networks. ACM CIKM 2007. Monique V. Vieira, Bruno M. Fonseca, Rodrigo Damazio, Paulo B. Golgher, Davi de Castro Reis, Berthier Ribeiro-Neto. Date: 2008/10/02 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling

2 Outline Introduction The ranking function Algorithm for computing the ranking Seeds-based Ranking Algorithm for computing seeds-based ranking Results Conclusion

3 Introduction Social networks let users interact and share social experiences through the exchange of multimedia objects (text, audio, video) associated with the people themselves and their actions. Examples: MySpace, with over 100 million registered users; Orkut, with over 40 million; Facebook; and Friendster.

4 Introduction (cont.) In a social network with a large number of users, a common operation is to search for people. The authors randomly selected 750 queries from the Orkut logs, all of which specify just user names, and counted the average number of answers per query. Exact matches: 48 answers per query on average. Partial matches: 6,034 answers per query on average.

5 Introduction (cont.) The idea is a ranking function that takes as input distances in a friendship graph (nodes: users; edges: friendship relations), based on a set of pre-selected landmark nodes (called seeds). Our results show that effective ranking can be attained, while keeping query processing time small enough to be practical.

6 The ranking function Figure 1: Friendship graph: John is at distance 1 from 'Maria A', at distance 2 from 'Maria B', and at distance 3 from 'Maria C'. We argue that any search ranking function should favor first 'Maria A' and second 'Maria B'.

7 The ranking function (cont.) This reasoning leads us to propose a simple ranking function based on friendship distance. Let $d(\text{John}, \text{Maria})$ be the shortest path between John and Maria; then the rank of the user 'Maria', with regard to the query 'Maria' posed by John, decreases with that distance, e.g. $r_{\text{John}}(\text{Maria}) = \dfrac{1}{d(\text{John}, \text{Maria})}$.
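A minimal sketch of this distance-based rank, assuming the friendship graph is an adjacency dict mapping each user to a list of friends; the inverse-distance form above is a reconstruction, and the function names are mine:

```python
from collections import deque

def shortest_distance(graph, source, target, max_depth=None):
    """Plain BFS over the friendship graph; returns hop count or None."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        user, dist = frontier.popleft()
        if max_depth is not None and dist >= max_depth:
            continue  # do not expand past the hop limit
        for friend in graph.get(user, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, dist + 1))
    return None  # unreachable

def rank(graph, querier, candidate):
    """Rank inversely proportional to friendship distance."""
    d = shortest_distance(graph, querier, candidate)
    return 0.0 if not d else 1.0 / d
```

On Figure 1's graph this gives 'Maria A' a rank of 1, 'Maria B' 1/2, and 'Maria C' 1/3.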

8 Algorithms for computing the ranking * Pre-computing all distances between any pair of users and indexing them does not work: with 40 million users, the number of distances to be stored is on the order of $(4 \times 10^7)^2 \approx 1.6 \times 10^{15}$, which is too large. Even restricting to nearby users does not help much: if each user has 100 friends on average, then the average number of friends at distance 3 or smaller of a given user is about $100^3 = 10^6$, resulting in a still very large index, on the order of $4 \times 10^{13}$ entries. Pre-computing all friendship pairs does not scale; the number of friendship distances that needs to be stored becomes overwhelming.

9 Algorithms for computing the ranking * On-the-fly Ranking: calculate all the distances on the fly, at scoring time; simply run a breadth-first search for Maria starting from John. With an avg. of 100 friends per user, we need to expand more than $100^3 = 10^6$ users if we limit ourselves to people at most 3 hops from John. A superior alternative is to run a bidirectional breadth-first search, looking for intersections in the lists of users reached at each new level.
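A sketch of the bidirectional variant: both searches advance one full level per step, always growing the smaller frontier, and stop at the first level where the frontiers meet. Names and the level cap are mine:

```python
def bidirectional_distance(graph, source, target, max_levels=6):
    """Level-synchronous bidirectional BFS; returns hop count or None."""
    if source == target:
        return 0
    dist_f, dist_b = {source: 0}, {target: 0}
    front, back = {source}, {target}
    for _ in range(max_levels):
        if not front or not back:
            break
        if len(front) > len(back):            # expand the smaller side
            front, back = back, front
            dist_f, dist_b = dist_b, dist_f
        next_front, best = set(), None
        for user in front:
            for friend in graph.get(user, ()):
                if friend in dist_b:          # the two searches meet
                    d = dist_f[user] + 1 + dist_b[friend]
                    best = d if best is None else min(best, d)
                elif friend not in dist_f:
                    dist_f[friend] = dist_f[user] + 1
                    next_front.add(friend)
        if best is not None:
            return best
        front = next_front
    return None
```

Meeting in the middle expands on the order of $100^2 = 10^4$ users for a distance-3 query instead of $100^3$.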

10 Algorithms for computing the ranking * Co-friends Ranking: a mix of the first two; index each user's list of friends and, at scoring time, intersect both lists. Advantage: fast, little space. Disadvantage: only captures friend or friend-of-friend relations. To reach friendship distances up to 3, one can pre-compute a co-friends list and intersect it with the list of friends of the user submitting the query, but the space and time requirements become so high that this is unfeasible to run cost-effectively in a huge live Web service.
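A sketch of both variants, assuming `friends` maps each user to a set of direct friends (the index) and `cofriends` is the hypothetical precomputed distance-2 list:

```python
def friend_list_distance(friends, querier, candidate):
    """Distance up to 2 from the friend-list index alone."""
    f_q = friends.get(querier, set())
    if candidate in f_q:
        return 1
    if f_q & friends.get(candidate, set()):   # common friend
        return 2
    return None  # beyond what intersecting friend lists can see

def co_friends_distance(friends, cofriends, querier, candidate):
    """Extend to distance 3 with the precomputed co-friends list."""
    d = friend_list_distance(friends, querier, candidate)
    if d is not None:
        return d
    if friends.get(querier, set()) & cofriends.get(candidate, set()):
        return 3  # a friend of the querier is within 2 hops of the candidate
    return None
```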

11 Seeds-based Ranking Key purpose: to produce results of high precision that can nevertheless be computed efficiently. It is based on estimates (approximations of shortest paths) and has a more complex composition.

12 Seeds-based Ranking *Seed Distances: Approximating Shortest Paths. For each seed, run a breadth-first search reaching out to all nodes in the network, and annotate each node with its distance to the seed the breadth-first search started from. A vector of distances to seeds is thus associated with each node of the graph. These vectors of seed distances are computed offline.
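A sketch of this offline step, assuming the whole graph fits in memory as an adjacency dict with every user as a key (slide 19 discusses distributing it):

```python
from collections import deque

def seed_distance_vectors(graph, seeds):
    """One BFS per seed; vectors[u][k] is u's distance to the k-th seed."""
    vectors = {user: [None] * len(seeds) for user in graph}
    for k, seed in enumerate(seeds):
        dist = {seed: 0}
        queue = deque([seed])
        while queue:
            user = queue.popleft()
            vectors[user][k] = dist[user]     # annotate with distance to seed k
            for friend in graph.get(user, ()):
                if friend not in dist:
                    dist[friend] = dist[user] + 1
                    queue.append(friend)
    return vectors
```

Running it on Figure 2's graph would reproduce vectors like D_John = [2, 1, 1].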

13 Seeds-based Ranking *Seed Distances: Approximating Shortest Paths. Figure 2: Friendship graph with three pre-selected seeds: S1, S2, S3. Seed distance vectors for the users: D_John = [2, 1, 1], D_Maria A = [1, 1, 2], D_Maria B = [4, 3, 1], D_Maria C = [1, 2, 4]. The higher the number of seeds, the better the approximation tends to be for a higher number of nodes.

14 Seeds-based Ranking *Seed-based Ranking. Each seed whose distances to both John and Maria are small is evidence that the two users are close in the friendship graph; summing the contributions of all such seeds, normalized, gives a rank of the form $r_{\text{John}}(\text{Maria}) = \dfrac{1}{N} \sum_{i} \dfrac{n_i}{i}$.

15 Seeds-based Ranking *Seed-based Ranking (cont.) $n_i$: the number of seeds whose sum of distances to both John and Maria is exactly $i$. $N$: the number of seeds with a finite distance to Maria, used as a normalization factor. $r_{\text{John}}(\text{Maria})$: the rank of the answer 'Maria' with regard to the query posed by John.

16 Seed distance vectors for the users: D_John = [2, 1, 1], D_Maria A = [1, 1, 2], D_Maria B = [4, 3, 1], D_Maria C = [1, 2, 4]. Resulting order: 'Maria A' > 'Maria B' > 'Maria C'.
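As a consistency check, the reconstructed rank from slide 14 (an assumed form, not necessarily the paper's exact formula) reproduces this ordering. The combined distances $d(\text{John}, s) + d(\text{Maria}, s)$ over seeds $(S_1, S_2, S_3)$ give:

$$
\begin{aligned}
\text{Maria A}&: (3, 2, 3) \Rightarrow r = \tfrac{1}{3}\left(\tfrac{1}{2} + \tfrac{2}{3}\right) \approx 0.39 \\
\text{Maria B}&: (6, 4, 2) \Rightarrow r = \tfrac{1}{3}\left(\tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{6}\right) \approx 0.31 \\
\text{Maria C}&: (3, 3, 5) \Rightarrow r = \tfrac{1}{3}\left(\tfrac{2}{3} + \tfrac{1}{5}\right) \approx 0.29
\end{aligned}
$$

so 'Maria A' > 'Maria B' > 'Maria C'.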

17 Algorithm for computing seeds-based ranks *Sparse Seed Distance Vectors

18 If a seed distance is higher than 3, we consider it to be infinite. The vectors then become sparser: a large number of seed distances is set to infinite and need not be stored in the seed distance vectors. This allows fast computation and reduces query processing time, while still producing high-precision rankings. D_John = [2, 1, 1], D_Maria A = [1, 1, 2], D_Maria B = [∞, ∞, 1], D_Maria C = [1, 2, ∞].
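A sketch of the sparse representation and a rank over it; the per-seed contribution $1/(d_q + d_u)$ and the normalization follow the reconstruction on slides 14-15, so treat the exact weighting as an assumption:

```python
INF_CUTOFF = 3  # seed distances above 3 are treated as infinite and dropped

def sparsify(vector):
    """Keep only finite seed distances, keyed by seed index."""
    return {k: d for k, d in enumerate(vector)
            if d is not None and d <= INF_CUTOFF}

def seeds_based_rank(sparse_q, sparse_u):
    """Seeds close to both users contribute 1/(combined distance),
    normalized by the answer's number of finite seed distances."""
    n_finite = len(sparse_u)
    if n_finite == 0:
        return 0.0
    score = 0.0
    for k, dq in sparse_q.items():
        du = sparse_u.get(k)
        if du is not None:
            combined = dq + du
            score += 1.0 / combined if combined else 1.0
    return score / n_finite
```

Only the seeds present in both sparse dicts are touched, which is what makes scoring fast.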

19 Algorithm for computing seeds-based ranks *Map-reduce Computation of Seed Distances
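The transcript names map-reduce without details, so this is only an illustrative sketch of the idea: each round's map step has every user propose a tentative distance to each friend, and the reduce step keeps the minimum per (user, seed) pair; three rounds suffice for the distance-3 cutoff. All names are mine:

```python
def mapreduce_seed_distances(graph, seeds, max_rounds=3):
    """Iterative min-distance relaxation phrased as map/reduce rounds."""
    distances = {user: {} for user in graph}  # user -> {seed_index: distance}
    for k, seed in enumerate(seeds):
        distances[seed][k] = 0
    for _ in range(max_rounds):
        # Map: propose distance + 1 to every friend.
        emitted = []
        for user, vec in distances.items():
            for friend in graph.get(user, ()):
                for k, d in vec.items():
                    emitted.append((friend, k, d + 1))
        # Reduce: keep the minimum distance per (user, seed).
        changed = False
        for user, k, d in emitted:
            vec = distances.setdefault(user, {})
            if k not in vec or d < vec[k]:
                vec[k] = d
                changed = True
        if not changed:
            break
    return distances
```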

20 Algorithm for computing seeds-based ranks *Computing seeds-based ranks. First, match user names to the query and retrieve those that partially match the query terms. Then, generate the seeds-based ranks for the retrieved users.
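A sketch of this two-step flow, reusing the hypothetical `sparsify` and `seeds_based_rank` helpers from the slide 18 sketch; the naive substring match merely stands in for the real name index:

```python
def search(name_index, vectors, querier, query):
    """Step 1: partial name match. Step 2: order by seeds-based rank."""
    terms = query.lower().split()
    matches = [user for user, name in name_index.items()
               if all(t in name.lower() for t in terms)]
    sparse_q = sparsify(vectors[querier])
    return sorted(matches,
                  key=lambda u: seeds_based_rank(sparse_q, sparsify(vectors[u])),
                  reverse=True)
```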

21 Results * Precision Results. Metric: compare-rankings Precision (crP). The ideal answer set is the set of documents most likely to be relevant to the users. Taking the result of the On-the-fly Ranking algorithm as the ideal answer set and computing the metric for the Seeds-based Ranking algorithm, we get a precision value of 90%.

22 Definition of the ideal answer set. Let $T$ be the set of top 10 users returned by the On-the-fly Ranking algorithm, $a_{10}$ the 10th answer in that result set, and $d_{10}$ the distance of $a_{10}$ to the user who posed the query. Let $A$ be the set of all answers, $a_i$ the $i$-th answer in the result set, and, for any $a_i \in A$, $d_i$ the distance between $a_i$ and the user who posed the query. The ideal answer set is defined as the subset $I = \{a_i \in A \mid d_i \le d_{10}\}$.

23 Since the set I includes no relevance judgements, we call our metric compare-rankings Precision (also cr-precision). crP@10 refers to the compare-rankings precision at the 10th position of the ranking.
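One natural reading of slides 22-23 as code; `distance` is a hypothetical callback returning an answer's friendship distance to the querier:

```python
def ideal_answer_set(all_answers, distance, onfly_top10):
    """Every answer at distance no greater than the 10th On-the-fly answer."""
    d10 = distance(onfly_top10[-1])
    return {a for a in all_answers if distance(a) <= d10}

def cr_precision_at_10(ideal, ranking):
    """crP@10: fraction of the evaluated top 10 that lies in the ideal set."""
    return sum(1 for a in ranking[:10] if a in ideal) / 10.0
```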

24 Seeds-based Ranking algorithm: with 100,000 seeds, the avg. cr-precision starts at 71.48%; with 2,000,000 seeds, it reaches the 90% range. Co-friends Ranking algorithm: a fairly good avg. cr-precision of 87.02%. Summary: high precision can be attained using ranking functions that require far fewer computational resources, and the Seeds-based Ranking is highly competitive in terms of precision.

25 In addition to the standard metric, we also present our results based on generalized precision. In the context of generalized precision, the relevance decision is not binary; instead, weights are assigned to the answers as follows.

26 Definition: generalized compare-rankings precision (gerP). Let $O$ be the On-the-fly Ranking and $R$ the alternative ranking we want to evaluate, $o_i$ and $r_i$ the $i$-th answers in the respective rankings, and $d(o_i)$, $d(r_i)$ the friendship distances from $o_i$ and $r_i$ to the user who posed the query.

27 Seeds-based Ranking algorithm: with 100,000 seeds, the avg. generalized precision starts at 60%; with 2,000,000 seeds, it reaches the 83-85% range. Co-friends Ranking algorithm: avg. generalized precision in the 80-82% range. Summary: high precision is attained by ranking algorithms that require fewer computational resources, and the Seeds-based Ranking algorithm produces the best results when 2,000,000 seeds or more are used.

28 Results * Performance Results. These numbers reflect only the overhead generated by the ranking of the results. Summary: for the Co-friends Ranking and the On-the-fly Ranking algorithms, average query execution times are more than an order of magnitude higher than those of the Seeds-based Ranking algorithm using 100,000 seeds.

29 While there is no need for an extremely fast procedure, since it is carried out offline, seed-distance computation still needs to be fast enough to be usable in a live Web system. Distances for 2,000,000 seeds can be computed in roughly 12 minutes (725 seconds).

30 Results * Space Results. On-the-fly Ranking space requirements: the friendship graph takes roughly 16GB of space; with N = 128 machines in the cluster, the space required is actually 128 * 16GB = 2,048GB. Co-friends Ranking space requirements: a user id is a 4-byte integer; for the friend-of-friends lists, with 40 million users and an average of 90 friends each, the space requirement is roughly 40M * 90 * 90 * 4 bytes ≈ 1,296GB.

31 Results * Analysis of the Results

32 Conclusion Social networks are a new and important trend on the Web. We showed how to add this new signal in a distributed architecture, and how the strategy we developed for it outperforms naive approaches, both in terms of precision and performance. We moved from a text-based ranking, almost meaningless to users, to a ranking where one can easily recognize the people in the search results.