Peer to Peer Information Retrieval


Peer to Peer Information Retrieval: Going beyond Napster

What is P2P IR?
- No index on a central server
- Content is distributed across all users of the system
- Content is more than text
  - Binary files
  - Associated metadata

An example of a P2P system

Why go P2P?
- Spiraling costs of maintaining indexes
  - Look at Google’s server farm
- New content forces new thinking on IR
  - Large binary files are hard to index
- Freedom of speech
  - Society is striving to communicate data that is being legislated against

First P2P Systems
- Central hash of distributed content
- Only the central hash was used for queries
- Disadvantages:
  - Scalability
  - Known location of content
  - Single point of failure
- Advantages:
  - Quick searching
  - Deterministic search results
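A central index of this kind can be sketched in a few lines. This is a minimal illustration, not the protocol of any particular system: the class name `CentralIndex`, its methods, and the peer addresses are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: maps a file name to the peers that hold it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filenames):
        # A peer announces the files it shares when it connects.
        for name in filenames:
            self._holders[name.lower()].add(peer)

    def unregister(self, peer):
        # Remove a departing peer from every entry.
        for peers in self._holders.values():
            peers.discard(peer)

    def search(self, term):
        # Substring match over the index; only the server is consulted.
        term = term.lower()
        return {
            name: sorted(peers)
            for name, peers in sorted(self._holders.items())
            if term in name and peers
        }


index = CentralIndex()
index.register("peer-a:6699", ["Song One.mp3", "notes.txt"])
index.register("peer-b:6699", ["song one.mp3"])
hits = index.search("song")
```

Every query is answered from one data structure, which is why results are fast and deterministic, and also why the server is a single point of failure that knows where all content lives.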

Bumps that caused change
- Legal
  - Centralized services were easy targets
  - Owners of the index could not claim they had no knowledge of the content
- Growth
  - Cost of maintaining the service grew
  - Hardware requirements exploded

Decentralized P2P
- Content is spread between users with no explicit intent
- Centralized server is replaced by a self-maintaining network
- Every user is also a server
- There is no index of content
- How do we search?

Searching Decentralized P2P Systems
- Many methods, none perfected yet
- Broadcast search
  - Advantages
    - Every node takes part in the query
  - Disadvantages
    - As the system grows, network bandwidth and query time grow rapidly, since each query reaches every node
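Broadcast search can be sketched as a flood over the overlay graph. The code below is an illustrative simulation under simplifying assumptions: the function name `broadcast_search`, the four-peer graph, and the file names are invented, and a peer here forwards the query to all of its neighbours, including the one it came from.

```python
from collections import deque


def broadcast_search(graph, shared, start, term):
    """Flood a query from `start` to every reachable peer.

    graph:  peer -> list of neighbouring peers (hypothetical overlay)
    shared: peer -> list of file names that peer shares
    Returns (hits, messages), where messages counts every forwarded query.
    """
    seen = {start}
    queue = deque([start])
    hits = []
    messages = 0
    while queue:
        peer = queue.popleft()
        hits.extend((peer, name) for name in shared.get(peer, []) if term in name)
        for neighbour in graph.get(peer, []):
            messages += 1  # the query is sent even if the neighbour has seen it
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return hits, messages


graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
shared = {"c": ["lecture.pdf"], "d": ["lecture-notes.txt", "song.mp3"]}
hits, messages = broadcast_search(graph, shared, "a", "lecture")
```

In this toy overlay a single query costs 10 messages to reach 4 peers; the message count tracks the number of links in the network, not the number of useful answers.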

Intelligent P2P Crawls
- Ways to improve decentralized P2P queries
- Intelligently place data (Freenet)
  - By knowing the algorithm that distributes data, querying can be done more intelligently
- Clustering (Fireworks model)
  - Clients with similar properties are logically grouped
  - Queries that don’t apply to a group will not be sent to that entire group of clients
- Both change the paradigm of what kind of data is shared and the means of sharing
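The idea that a known placement algorithm makes querying cheaper can be illustrated with a deterministic placement rule. This sketch is a stand-in chosen for brevity and is not Freenet's actual routing algorithm: it assumes every peer knows the full peer list, and the helper names (`key_of`, `responsible_peer`, `publish`, `lookup`) are hypothetical.

```python
import hashlib


def key_of(name):
    # Map any string onto a 32-bit integer key.
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:4], "big")


def responsible_peer(peers, name):
    # The peer whose own key is numerically closest to the content key
    # (ties broken by peer name so the choice is deterministic).
    target = key_of(name)
    return min(peers, key=lambda p: (abs(key_of(p) - target), p))


peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
stores = {p: set() for p in peers}


def publish(name):
    # Content is stored where the placement rule says, not where it originated.
    stores[responsible_peer(peers, name)].add(name)


def lookup(name):
    # The same rule tells the searcher which single peer to ask.
    peer = responsible_peer(peers, name)
    return peer if name in stores[peer] else None


publish("report.pdf")
holder = lookup("report.pdf")
```

Because publisher and searcher apply the same rule, a lookup contacts one peer instead of flooding the network. The trade-off is that peers store content they did not choose.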

Other improvements
- Today, most networks still rely on brute-force search
- CRC/MD5 hashing
  - A checksum of each file is computed
  - Instead of searching metadata, search for the file hash
  - Files that are identical, but mislabeled, are still returned
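Searching by content hash instead of by label might look like the following sketch. It uses Python's standard `hashlib.md5`; the peers, labels, and file contents are made up for illustration, and MD5 serves only as a fingerprint here, not as a security measure.

```python
import hashlib


def file_digest(data):
    # MD5 is used here only as a content fingerprint, not for security.
    return hashlib.md5(data).hexdigest()


def build_hash_index(files):
    """files: iterable of (peer, label, content bytes) tuples."""
    index = {}
    for peer, label, data in files:
        index.setdefault(file_digest(data), []).append((peer, label))
    return index


files = [
    ("peer-a", "lecture01.pdf", b"same bytes"),
    ("peer-b", "untitled(3).pdf", b"same bytes"),
    ("peer-c", "lecture01.pdf", b"different bytes"),
]
index = build_hash_index(files)
matches = index[file_digest(b"same bytes")]
```

The mislabeled copy on `peer-b` is found because its bytes match, while the identically named file on `peer-c` is excluded because its content differs.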

Query time limiting
- To save on inter-system bandwidth, searches terminate after X hops
- Client ends query after 100 results
- Searches time out after X seconds
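The three limits can be combined in one search loop. This is a sketch under assumptions: the function `limited_search`, its default values for hops and seconds, and the chain-shaped overlay are illustrative choices; only the limit of 100 results comes from the slide.

```python
import time
from collections import deque


def limited_search(graph, shared, start, term,
                   max_hops=3, max_results=100, timeout_s=5.0):
    """Breadth-first query that stops on a hop, result, or time limit."""
    deadline = time.monotonic() + timeout_s
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        if time.monotonic() >= deadline:
            break  # time limit reached
        peer, hops = queue.popleft()
        for name in shared.get(peer, []):
            if term in name:
                hits.append((peer, name))
                if len(hits) >= max_results:
                    return hits  # result limit reached
        if hops == max_hops:
            continue  # hop limit reached: do not forward any further
        for neighbour in graph.get(peer, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return hits


# Hypothetical chain of peers: a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
shared = {"b": ["x1"], "c": ["x2"], "d": ["x3"]}
two_hops = limited_search(graph, shared, "a", "x", max_hops=2)
```

With a limit of two hops the query never reaches peer `d`, so its file is not returned even though it matches. Each limit trades completeness of results for lower bandwidth and latency.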

Distributed IR
- Traditional IR with the advantages of distributed systems
- A central server still stores the index
- Multiple brokers allow access to the data repository
- Multiple gatherers crawl data near to them
- Advantages are seen in the data acquisition end
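The division of labour between gatherers, the index, and brokers can be sketched as three small classes. All names here (`SharedIndex`, `Gatherer`, `Broker`) and the sample documents are hypothetical, and the single in-memory index stands in for the central server.

```python
from collections import defaultdict


class SharedIndex:
    """Stand-in for the index held on the central server."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)


class Gatherer:
    """Crawls only the documents of one nearby site and forwards them to the index."""

    def __init__(self, site_docs):
        self.site_docs = site_docs

    def crawl_into(self, index):
        for doc_id, text in self.site_docs.items():
            index.add(doc_id, text)


class Broker:
    """One of several query front ends over the same index."""

    def __init__(self, index):
        self.index = index

    def query(self, term):
        return sorted(self.index.postings.get(term.lower(), set()))


index = SharedIndex()
Gatherer({"eu/1": "peer networks"}).crawl_into(index)
Gatherer({"us/1": "peer review"}).crawl_into(index)
brokers = [Broker(index), Broker(index)]
result = brokers[0].query("peer")
```

The gatherers spread the crawling work, which is where the slide places the benefit, while every broker still answers from the same central index.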

Examples

Future Directions
- Next steps will be a drastic rethinking of content placement, à la Freenet
  - Donate X amount of bandwidth, Y amount of HD space
  - Share Z directories of content
- Actual content files are distributed to the network intelligently
  - Most requested files are blanketed
  - Unique files are still accessible

Future directions for Traditional IR
- Large central repositories such as Google will fade
- Internet will be fragmented into clusters of interest
- Similar interest groups will have decentralized search facilities
- An index of these groups will replace the Googles of today