Efficient Search - Overview
Improving Search in Peer-to-Peer Systems
Presented by Jon Hess, cs294-4, Fall 2003

Efficient Search - Overview
Goals
– Present alternative search methods for systems with loose integrity constraints
  – Probabilistically optimize over the loose constraints
  – Must not sacrifice user satisfaction
– Measure the hypothesized methods
– Analyze the measurements

Efficient Search - Overview
Current peer-to-peer systems
– Offer efficient search, but
  – Restrict data placement
  – Restrict search breadth
– Provide data consistency
– Provide data availability

Efficient Search - Overview
Current file-sharing systems do not require
– Top-notch data availability or consistency
– Search by identifier
So they do not need
– Restrictive data and pointer placement
So they may
– Make probabilistic optimizations

Efficient Search - Overview
Problem Overview
– System should
  – Accept a query
  – Respond with a set of pointers to files
– System is composed of an overlay of peers
  – Upon receiving a query, a peer
    – Processes the query
    – Responds with matches
    – Forwards the query to its neighbors

Efficient Search - Overview
Cost Measurement
– Processing cost
  – Deciding which neighbors to route to
  – Checking local metadata for query matches
– Bandwidth cost
  – Forwarding messages
  – Responding to queries
– Calculate over a set of queries
– Measure the aggregate network cost

Efficient Search - Overview
Quality Measurement
– Size of result set
  – Fraction of a result goal Z that is met, capped at 1
    – Min(Results / Z, 1.0)
– Latency from the query to the Zth result, or to the final result if that comes first
  – Min(time(Z), time(final))
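The two quality measures above can be written down directly. The following is a minimal sketch, assuming results are represented only by their arrival times; the function names `satisfaction` and `time_to_satisfaction` are illustrative and do not come from the slides.

```python
from typing import Optional, Sequence


def satisfaction(num_results: int, z: int) -> float:
    """Fraction of the result goal Z that was met, capped at 1.0."""
    return min(num_results / z, 1.0)


def time_to_satisfaction(arrival_times: Sequence[float], z: int) -> Optional[float]:
    """Arrival time of the Z-th result, or of the final result if fewer than Z arrive.

    Returns None when no result arrived at all.
    """
    if not arrival_times:
        return None
    ordered = sorted(arrival_times)
    # Index of the Z-th result, or of the last one if the goal was never met.
    return ordered[min(z, len(ordered)) - 1]
```

A query that returns 30 results against a goal of Z = 10 scores the same satisfaction as one that returns exactly 10, which is why the later slides treat results beyond Z as wasted cost.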

Efficient Search - Methods
Current Technique
– Gnutella: breadth-first flood
  – Efficiencies
    – Maximizes the number of results
    – Minimizes latency
  – Inefficiencies
    – The same node may process the same query twice
    – Many more than Z results may be returned
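A breadth-first flood with a TTL can be sketched as follows. This is a simplified model, not the Gnutella protocol itself: the overlay is assumed to be an adjacency dictionary, every forward (including one that reaches a node that has already seen the query) is counted as one message, and the name `flood` is hypothetical.

```python
from collections import deque


def flood(graph, source, ttl):
    """Return (set of nodes that processed the query, total messages sent)."""
    processed = {source}
    messages = 0
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == ttl:
            continue  # TTL exhausted: this node does not forward
        for neighbor in graph[node]:
            messages += 1  # each forward costs bandwidth, even a duplicate
            if neighbor not in processed:
                processed.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return processed, messages


overlay = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
reached, sent = flood(overlay, "a", 2)
```

On this four-node overlay, raising the TTL from 1 to 2 reaches one more node but more than triples the message count, which is the kind of waste the proposed methods try to avoid.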

Efficient Search - Methods
Current Technique
– Freenet: serial depth-first search
  – Efficiencies
    – Achieves exactly Z matches
  – Inefficiencies
    – Maximum latency
    – No parallel searching
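For contrast, a serial depth-first search that stops at exactly Z matches might look like the sketch below. It is an illustration under assumptions, not Freenet's routing: the overlay is an adjacency dictionary, `matches` maps each node to its local hits, and `serial_dfs` and `max_depth` are hypothetical names.

```python
def serial_dfs(graph, matches, source, z, max_depth):
    """Visit one node at a time and stop as soon as z matches are collected.

    Returns (matches found, number of nodes visited). Because nodes are
    visited one after another, latency grows with the number visited.
    """
    found, visited = [], set()

    def visit(node, depth):
        if node in visited or len(found) >= z:
            return
        visited.add(node)
        for item in matches.get(node, []):
            if len(found) < z:
                found.append(item)  # never collect more than z
        if depth == max_depth:
            return
        for neighbor in graph[node]:
            visit(neighbor, depth + 1)
            if len(found) >= z:
                return  # goal met: no further hops

    visit(source, 0)
    return found, len(visited)
```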

Efficient Search - Methods
Suggestions
– Iterative deepening
  – Try not to locate more than Z matches
– Directed breadth first flood
  – Don’t traverse wasteful links
– Nearest neighbor (neighbor index)
  – Amortize processing cost across memory and neighbors
  – Does not need to make final hop[s] to search furthest neighbors

Efficient Search - Methods
Iterative Deepening
– Successively send deeper-reaching queries until Z is satisfied
– Policy: a set of depths {d_0, …, d_n} and a time t
– Send the query to depth d_i
  – Nodes at depth d_i freeze the query after processing it
  – If Z is satisfied within time t, stop
  – If not, send a re-query message to the nodes at d_i instructing them to forward the query to depth d_{i+1}
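The policy loop can be simulated offline on a known overlay. The sketch below makes several assumptions: hop distances are precomputed with a BFS, the waiting time t is not modeled, freezing is represented only by remembering the previous depth, and the names `hop_distances` and `iterative_deepening` are illustrative.

```python
from collections import deque


def hop_distances(graph, source):
    """Shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def iterative_deepening(graph, matches, source, z, depths):
    """Return (results, depth reached, number of nodes that processed the query)."""
    dist = hop_distances(graph, source)
    results, processed, previous = [], 0, -1
    for depth in depths:
        # Only nodes beyond the previously frozen frontier process the query;
        # nodes at or inside that frontier already handled it.
        for node, d in dist.items():
            if previous < d <= depth:
                processed += 1
                results.extend(matches.get(node, []))
        if len(results) >= z:
            return results, depth, processed  # satisfied: deeper nodes never see it
        previous = depth  # frontier holds the frozen query until the re-query
    return results, previous, processed
```

With a policy such as [0, 2, 3], a query whose goal is met at depth 2 never reaches the nodes at depth 3, which is where the savings over a full flood come from.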

Efficient Search - Methods
Iterative Deepening – {0, 2, 4, 5}, 3
Figure legend: Holding Frozen Query; Already Processed Query; Unaware of Query

Efficient Search - Methods
Directed Breadth First Search
– Source only sends queries to good neighbors
– Good neighbors might have
  – Produced results in the past
  – Low latency
  – Lowest hop count for results
    – They have good neighbors
  – Highest traffic
    – They’re stable
  – Shortest message queue
– Routed as normal BFS after the first hop
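Choosing the "good" neighbors for the first hop amounts to ranking them by one of the listed heuristics. The sketch below is an assumption-laden illustration: the `NeighborStats` fields, the heuristic names, and `pick_neighbors` are all hypothetical, and a real client would have to gather these statistics itself.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_returned: int  # results this neighbor produced for past queries
    avg_hops: float        # average hop count of those results
    queue_length: int      # current length of the neighbor's message queue


# Each heuristic maps stats to a sort key; smaller keys rank first.
HEURISTICS = {
    "most_results": lambda s: -s.results_returned,
    "fewest_hops": lambda s: s.avg_hops,
    "shortest_queue": lambda s: s.queue_length,
}


def pick_neighbors(stats, heuristic, k):
    """Return the names of the k best neighbors under the chosen heuristic."""
    key = HEURISTICS[heuristic]
    return sorted(stats, key=lambda name: key(stats[name]))[:k]


stats = {
    "n1": NeighborStats(results_returned=40, avg_hops=3.0, queue_length=9),
    "n2": NeighborStats(results_returned=5, avg_hops=1.5, queue_length=2),
    "n3": NeighborStats(results_returned=12, avg_hops=2.0, queue_length=0),
}
first_hop = pick_neighbors(stats, "most_results", 1)
```

Only the source applies the ranking; from the selected neighbors onward the query is flooded as in the ordinary BFS.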

Efficient Search - Methods
Neighbor Index
– Policy: a set of query depths {d_0, …, d_n} and an index depth r
– Each node holds an index of the metadata of its neighbors within r hops
– Nodes propagate their local metadata upon joining
– Only nodes at depths d_i process queries
  – Other nodes simply forward until d_n
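The index each node keeps can be sketched as a map from file names to the peers within r hops that hold them. This is a minimal sketch assuming a static overlay given as an adjacency dictionary; `within_radius`, `build_index`, and the `files` mapping are hypothetical names, and index maintenance on join or leave is not modeled.

```python
from collections import deque


def within_radius(graph, node, r):
    """Set of nodes at most r hops from node, including node itself."""
    seen = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if seen[current] == r:
            continue  # do not expand past the index depth
        for neighbor in graph[current]:
            if neighbor not in seen:
                seen[neighbor] = seen[current] + 1
                queue.append(neighbor)
    return set(seen)


def build_index(graph, files, node, r):
    """Map each file name to the peers within r hops of node that hold it."""
    index = {}
    for peer in within_radius(graph, node, r):
        for name in files.get(peer, []):
            index.setdefault(name, set()).add(peer)
    return index
```

A node that processes a query consults this index instead of its local files alone, so it answers on behalf of every peer within r hops and those peers need not process the query themselves.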

Efficient Search - Methods
Neighbor Index – {0, 3, 5}, 1
Figure legend: Processor for Query; Forwarder of Query; Unaware of Query

Efficient Search - Experiment
Data Collection
– Inject a custom client into the Gnutella network for 1 month
– Collect statistics
  – Query relationships
  – Average files per user
  – Average length of messages
  – Many other statistics
– Calculate the expected cost of the proposed methods from the harvested data

Efficient Search - Experiment
Iterative Deepening & Neighbor Index
– For each query q and depth d
  – Send q with TTL d
– See how many matches the nodes at each depth possess
Directed Breadth First Flood
– Queries are sent to each neighbor one at a time, each with a different query ID

Efficient Search - Results
Iterative deepening
– Study the policies
  – P_1 = {1, 2, 3, 4, 5, 6, 7}
  – P_2 = {2, 3, 4, 5, 6, 7}
  – …
  – P_6 = {6, 7}
  – P_7 = {7}
– For freeze times W = {1, 2, 4, 6, 150}

Efficient Search - Results
– As freeze time decreases, or policy number increases, Z is overshot.
– As freeze time decreases, or policy number increases, Z is achieved faster.
– If all nodes used this policy, would average satisfaction time decrease?

Efficient Search - Results
– Shortest average time to satisfaction obviously wins.
– Surprisingly, the greatest-number-of-results heuristic does very well.
– Greatest number of results and shortest time to satisfaction do the best on the other benchmarks but cost the most, since results require bandwidth.
– This cost is still less than that of BFS.

Efficient Search - Results
– Query radius has an exponential relationship to bandwidth cost.
– Only a 5 MB index is needed for peers 5 hops away.
– If all nodes used this policy, would average satisfaction time decrease?

Efficient Search - Questions Questions?