Can Change this on the Master Slide Monday, August 20, 2007Can change this on the Master Slide0 A Distributed Ranking Algorithm for the iTrust Information.

Slides:

Advertisements

Similar presentations

CWS: A Comparative Web Search System Jian-Tao Sun, Xuanhui Wang, § Dou Shen Hua-Jun Zeng, Zheng Chen Microsoft Research Asia University of Illinois at.

Advertisements

Search in Source Code Based on Identifying Popular Fragments Eduard Kuric and Mária Bieliková Faculty of Informatics and Information.

Chapter 5: Introduction to Information Retrieval

Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.

Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.

A Machine Learning Approach for Improved BM25 Retrieval

University of Cincinnati1 Towards A Content-Based Aggregation Network By Shagun Kakkar May 29, 2002.

Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.

Experience with an Object Reputation System for Peer-to-Peer File Sharing NSDI’06(3th USENIX Symposium on Networked Systems Design & Implementation) Kevin.

Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.

Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.

Retrieval Evaluation: Precision and Recall. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity.

Retrieval Evaluation. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.

Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.

1 MARG-DARSHAK: A Scrapbook on Web Search engines allow the users to enter keywords relating to a topic and retrieve information about internet sites (URLs)

Text-Based Content Search and Retrieval in ad hoc P2P Communities Francisco Matias Cuenca-Acuna Thu D. Nguyen

Federated Search of Text Search Engines in Uncooperative Environments Luo Si Language Technology Institute School of Computer Science Carnegie Mellon University.

IR Techniques For P2P Networks1 Information Retrieval Techniques For Peer-To-Peer Networks Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos.

A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA

MPI Informatik 1/17 Oberseminar AG5 Result merging in a Peer-to-Peer Web Search Engine Supervisors: Speaker : Sergey Chernov Prof. Gerhard Weikum Christian.

Presented by Wei Dai The iTrust Local Reputation System for Mobile Ad-Hoc Networks.

Finding Similar Questions in Large Question and Answer Archives Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee Retrieval Models for Question and Answer Archives.

1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.

1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)

Parallel and Distributed IR. 2 Papers on Parallel and Distributed IR Introduction Paper A: Inverted file partitioning schemes in Multiple Disk Systems.

Which of the two appears simple to you? 1 2.

Query Routing in Peer-to-Peer Web Search Engine Speaker: Pavel Serdyukov Supervisors: Gerhard Weikum Christian Zimmer Matthias Bender International Max.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

Presented by: Apeksha Khabia Guided by: Dr. M. B. Chandak

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.

Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

A Prediction-based Fair Replication Algorithm in Structured P2P Systems Xianshu Zhu, Dafang Zhang, Wenjia Li, Kun Huang Presented by: Xianshu Zhu College.

Querying Structured Text in an XML Database By Xuemei Luo.

25/03/2003CSCI 6405 Zheyuan Yu1 Finding Unexpected Information Taken from the paper : “Discovering Unexpected Information from your Competitor’s Web Sites”

The Effect of Collection Organization and Query Locality on IR Performance 2003/07/28 Park,

CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.

Detecting Dominant Locations from Search Queries Lee Wang, Chuang Wang, Xing Xie, Josh Forman, Yansheng Lu, Wei-Ying Ma, Ying Li SIGIR 2005.

Introduction to Digital Libraries hussein suleman uct cs honours 2003.

Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented.

LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.

Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.

Language Model in Turkish IR Melih Kandemir F. Melih Özbekoğlu Can Şardan Ömer S. Uğurlu.

1 The Threshold Join Algorithm for Top-k Queries in Distributed Sensor Networks D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras.

Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.

Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq

Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.

What Does the User Really Want ? Relevance, Precision and Recall.

Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.

Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.

26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.

Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.

PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR’12 Advisor: Dr.Jia-ling, Koh Speaker:

The Development of a search engine & Comparison according to algorithms Sung-soo Kim The final report.

Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:

Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.

Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,

CS791 - Technologies of Google Spring A Webbased Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.

BotTracer: Bot User Detection Using Clustering Method in RecDroid

Efficient Multi-User Indexing for Secure Keyword Search

Collection Fusion in Carrot2

Proposal for Term Project

An Empirical Study of Learning to Rank for Entity Search

Martin Rajman, Martin Vesely

Applying Key Phrase Extraction to aid Invalidity Search

Murat Açar - Zeynep Çipiloğlu Yıldız

Trustworthy Distributed Search and Retrieval over the Internet

INF 141: Information Retrieval

Ranking using Multiple Document Types in Desktop Search

VECTOR SPACE MODEL Its Applications and implementations

Presentation transcript:

Can Change this on the Master Slide Monday, August 20, 2007Can change this on the Master Slide0 A Distributed Ranking Algorithm for the iTrust Information Search and Retrieval System Presented by Boyang Peng Research conducted in collaboration with Y. T. Chuang, I. Michel Lombera, L. E. Moser, and P. M. Melliar-Smith Supported in part by NSF Grant CNS

Overview 1) Introduction 2) Overview of iTrust 3) Distributed Ranking System 4) Trustworthiness 5) Evaluation 6) Conclusion and Future Work WEBIST 2013iTrustBoyang Peng 1

Introduction WEBIST 2013iTrustBoyang Peng 2 What is iTrust?

Introduction WEBIST 2013iTrustBoyang Peng 3 vs

Purpose WEBIST 2013iTrustBoyang Peng 4

WEBIST 2013iTrustBoyang Peng Source of Information 1. Distribution of metadata

WEBIST 2013iTrustBoyang Peng Source of Information Requester of Information 2. Distribution of Requests 3. Request encounters metadata

WEBIST 2013iTrustBoyang Peng Source of Information Requester of Information 4. Request matched

Distributed Ranking System Why ranking is needed in iTrust?  Centralized search engines have this functionality  Filters out trivial and not-relevant files  Increases both the fidelity and the quality of the results How is the ranking is done?  What metrics the ranking algorithm will use and what the ranking formula is.  What information the ranking algorithm needs and how to retrieve that information WEBIST 2013iTrustBoyang Peng 8

Distributed Ranking System Indexing performed at the source nodes  Generate a term-frequency table for an uploaded document Ranking performed at the requesting node  Ensure fidelity of results WEBIST 2013iTrustBoyang Peng

WEBIST 2013iTrustBoyang Peng Source of Information Requester of Information Request FreqTable(d) 5. Retrieve term-frequency table

Ranking Algorithm where norm(d) is the normalization factor for document d, computed as: number_of_common_terms for a document d is |s∩c|, where s is the set of all terms in the freqTable(d) and c is the set of common terms number_of_uncommon_terms for a document d is |freqTable(d)| - number_of_common_terms WEBIST 2013iTrustBoyang Peng 11

Ranking Algorithm where tf(t,d) is the term-frequency factor for term t in document d, computed as: freq(t,d) is the frequency of occurrence of term t in freqTable(d) avg(freq(d))) is the average frequency of terms contained in the freqTable(d) WEBIST 2013iTrustBoyang Peng 12

Ranking Algorithm where idf(t) is the inverse document frequency factor for term t, computed as: numDocs is the total number of documents being ranked docFreq(t) is the number of documents that contain the term t WEBIST 2013iTrustBoyang Peng 13

Trustworthiness Potential scammers  Falsifying Information Distribute a term-frequency table containing every single word in the language Set a limit on the size of term-frequency table of a document  Exaggerating Information A malicious node can exaggerate the information about a document to achieve a higher ranking WEBIST 2013iTrustBoyang Peng 14

WEBIST 2013iTrustBoyang Peng 15 The percent time that a document is ranked last as a function of the number of keywords in the query. The size of the term-frequency table for all documents is 200

WEBIST 2013iTrustBoyang Peng 16 The mean score of 1000 rankings of a document as a function of the number of keywords in the query. The lines, Document x 5 and Document x 10, correspond to the frequencies in the term-frequency table of a document multiplied by a factor of 5 and 10, respectively

Evaluation Because iTrust is a distributed and probabilistic system, for reproducibility of results, we evaluate the effectiveness of the ranking system by simulation, separate from the iTrust system implementation As the number of keywords in the query increases, the accuracy of the results increases WEBIST 2013iTrustBoyang Peng 17

Document Similarity WEBIST 2013iTrustBoyang Peng 18 The mean percent time of 1000 rankings that a set of documents (Document Set 1 at the left and Document Set 2 at the right) are ranked the top four, as a function of the number of keywords in the query

Ranking Stability WEBIST 2013iTrustBoyang Peng 19

Ranking Stability WEBIST 2013iTrustBoyang Peng 20

How big should the term-frequency tables be? WEBIST 2013iTrustBoyang Peng 21

Conclusion We have presented a Distributed Ranking System for iTrust  Effective in ranking documents relative to their relevance to queries that the user has input  Exhibits stability in ranking documents  Counters scamming by malicious nodes WEBIST 2013iTrustBoyang Peng 22

Related Works Danzig, P. B., Ahn, J., Noll, J., & Obraczka, K. (1991, September). Distributed indexing: a scalable mechanism for distributed information retrieval. In Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval (pp ). ACM. Kalogeraki, V., Gunopulos, D., and Zeinalipour-Yazti, D. (2002). A local search mechanism for peer-to-peer networks. In Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 300–307. Callan, J. P., Lu, Z., & Croft, W. B. (1995, July). Searching distributed collections with inference networks. In Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval (pp ). ACM. Xu, J., & Croft, W. B. (1999, August). Cluster-based language models for distributed retrieval. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval (pp ). ACM. WEBIST 2013iTrustBoyang Peng 23

Future Work Ranking that also takes into account the reputation of the source node or the document or both Evaluating the distributed ranking algorithm on a larger and more varied set of documents Additional schemes to prevent malicious nodes from gaining an unfair advantage WEBIST 2013iTrustBoyang Peng 24

Question? Comments? Our iTrust Website:  Contact information:  Boyang Peng:  Personal Website: Our project is supported by NSF: CNS WEBIST 2013iTrustBoyang Peng 25