Peer to Peer Information Retrieval

1 Peer to Peer Information Retrieval
Going beyond Napster

2 What is P2P IR?
- No index on a central server
- Content is distributed across all users of the system
- Content is more than text
  - Binary files
  - Associated metadata

3 An example of a P2P system

4 Why go P2P
- Spiraling costs of maintaining indexes
  - Look at Google’s server farm
- New content forces new thinking on IR
  - Large binary files are hard to index
- Freedom of speech
  - Society is striving to communicate data which is being legislated against

5 First P2P Systems
- Central hash of distributed content
- Only the central hash was used for queries
- Disadvantages
  - Scalability
  - Known location of content
  - Single point of failure
- Advantages
  - Quick searching
  - Deterministic search results
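
A minimal sketch of the central-index idea on this slide, assuming a Napster-style design in which the server holds only metadata and peers hold the files. The class name `CentralIndex`, its methods, and the keyword-splitting rule are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central index: the server stores who has what, not the files."""

    def __init__(self):
        # keyword -> set of (peer, filename) pairs
        self._postings = defaultdict(set)

    def register(self, peer, filename):
        """A peer announces a file it shares; only metadata reaches the server."""
        for word in filename.lower().replace(".", " ").split():
            self._postings[word].add((peer, filename))

    def unregister_peer(self, peer):
        """Drop every entry for a peer that has disconnected."""
        for word in list(self._postings):
            self._postings[word] = {e for e in self._postings[word] if e[0] != peer}
            if not self._postings[word]:
                del self._postings[word]

    def search(self, query):
        """Return entries matching every query word, in a deterministic order."""
        words = query.lower().split()
        if not words:
            return []
        hits = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            hits &= self._postings.get(word, set())
        return sorted(hits)


index = CentralIndex()
index.register("alice", "blue moon.mp3")
index.register("bob", "blue sky.mp3")
print(index.search("blue"))       # [('alice', 'blue moon.mp3'), ('bob', 'blue sky.mp3')]
print(index.search("blue moon"))  # [('alice', 'blue moon.mp3')]
```

Every query is a lookup in one place, which is why searching is quick and repeatable. The same property means the whole service depends on a single machine that knows where all content lives.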

6

7 Bumps that caused change
- Legal
  - Centralized services were easy targets
  - Owners of index could not claim they had no knowledge of content
- Growth
  - Cost of maintaining service grew
  - Hardware requirements exploded

8 Decentralized P2P
- Content is spread between users with no explicit intent
- Centralized server is replaced by a self-maintaining network
- Every user is also a server
- There is no index of content
- How do we search?

9 Searching Decentralized P2P Systems
- Many methods, none perfected yet
- Broadcast search
  - Advantages
    - Every node takes part in the query
  - Disadvantages
    - As the system grows, network bandwidth use and query time grow exponentially
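
Broadcast search can be sketched as a flood over the overlay network. The sketch below is a simplification under stated assumptions: the `neighbors` and `files` dictionaries are hypothetical stand-ins for real peers, matching is a plain substring test, and a peer forwards the query to all of its neighbors, including ones that have already seen it.

```python
from collections import deque


def broadcast_search(neighbors, files, start, term):
    """Flood a query from `start` to every reachable peer.

    neighbors: peer -> list of directly connected peers (hypothetical overlay)
    files:     peer -> list of filenames shared by that peer
    Returns (hits, messages), where messages counts every query forwarded.
    """
    seen = {start}
    queue = deque([start])
    hits = []
    messages = 0
    while queue:
        peer = queue.popleft()
        hits.extend((peer, name) for name in files.get(peer, []) if term in name)
        for other in neighbors.get(peer, []):
            messages += 1  # the query is sent even to peers that already saw it
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return hits, messages


neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
files = {"a": [], "b": ["jazz.mp3"], "c": ["notes.txt"], "d": ["jazz_live.mp3"]}
hits, messages = broadcast_search(neighbors, files, "a", "jazz")
print(hits)      # [('b', 'jazz.mp3'), ('d', 'jazz_live.mp3')]
print(messages)  # 10
```

Four peers already cost ten messages for one query, because every link carries the query in both directions. That message count, not the matching itself, is what makes flooding expensive as the network grows.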

10 Intelligent P2P Crawls
- Ways to improve decentralized P2P query
- Intelligently place data (FreeNet)
  - By knowing the algorithm that distributes data, querying can be done more intelligently
- Clustering (Fireworks model)
  - Clients with similar properties are logically grouped
  - Queries that don’t apply to a group will not be sent to that entire group of clients
- Both change the paradigm of what kind of data is shared and the means of sharing
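
The two ideas on this slide can be illustrated with a toy sketch. It is not FreeNet's routing algorithm or the Fireworks model itself. It assumes the simplest possible stand-ins: placement by hashing a key onto a fixed node list, and clusters keyed by a topic label. All function names here are hypothetical.

```python
import hashlib


def home_node(key, nodes):
    """Placement: map a content key to one node, the same way on every peer."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def join_clusters(clusters, peer, topics):
    """Clustering: add a peer to the group for each topic it shares."""
    for topic in topics:
        clusters.setdefault(topic, set()).add(peer)


def recipients(clusters, topic):
    """Only peers in the matching cluster receive the query."""
    return sorted(clusters.get(topic, set()))


nodes = ["n0", "n1", "n2", "n3"]
# Any peer can compute where a key lives without asking the rest of the network.
print(home_node("blue moon.mp3", nodes))

clusters = {}
join_clusters(clusters, "alice", ["jazz", "blues"])
join_clusters(clusters, "bob", ["jazz"])
join_clusters(clusters, "carol", ["film"])
print(recipients(clusters, "jazz"))  # ['alice', 'bob']; carol is never contacted
```

In both cases the query goes to a small, predictable part of the network instead of to everyone. The modulo placement shown here would reshuffle most keys whenever a node joins or leaves, so a real system would need a more stable scheme.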

11 Other improvements
- Today, most networks still rely on brute-force search
- CRC/MD5 hashing
  - A checksum of each file is computed
  - Instead of searching metadata, search for the file hash
  - Files that are identical, but mislabeled, are still returned
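
A short sketch of searching by checksum instead of by name, using MD5 from Python's standard `hashlib`. The `shared` list, the peer names, and the byte strings standing in for file contents are made up for illustration.

```python
import hashlib


def file_digest(data):
    """MD5 digest of a file's bytes, used as an identifier rather than for security."""
    return hashlib.md5(data).hexdigest()


def build_hash_index(shared):
    """shared: list of (peer, filename, content bytes). Returns digest -> [(peer, filename)]."""
    index = {}
    for peer, filename, data in shared:
        index.setdefault(file_digest(data), []).append((peer, filename))
    return index


shared = [
    ("alice", "blue_moon.mp3", b"\x00\x01audio-bytes"),
    ("bob", "track07.mp3", b"\x00\x01audio-bytes"),      # same content, different label
    ("carol", "blue_moon.mp3", b"\x00\x02other-bytes"),  # same label, different content
]
index = build_hash_index(shared)
wanted = file_digest(b"\x00\x01audio-bytes")
print(index[wanted])  # [('alice', 'blue_moon.mp3'), ('bob', 'track07.mp3')]
```

Bob's mislabeled copy is found and Carol's same-named but different file is not. CRC and MD5 are not collision resistant, so a stronger hash such as SHA-256 would be the safer choice where deliberate tampering is a concern.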

12 Query time limiting
- To save inter-system bandwidth, searches terminate after X hops
- The client ends the query after 100 results
- Searches time out after X seconds
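
The three limits on this slide can be combined in one flooded search. This is a sketch under assumptions: the function name, the parameter names `max_hops`, `max_results` and `timeout_s`, and their default values are illustrative, and the network is simulated in memory rather than over real connections.

```python
import time
from collections import deque


def limited_search(neighbors, files, start, term, max_hops=3, max_results=100, timeout_s=5.0):
    """Flooded search that stops on a hop limit, a result limit, or a timeout."""
    deadline = time.monotonic() + timeout_s
    seen = {start}
    queue = deque([(start, 0)])  # (peer, hops travelled so far)
    hits = []
    while queue:
        if time.monotonic() > deadline:  # searches time out after X seconds
            break
        peer, hops = queue.popleft()
        for name in files.get(peer, []):
            if term in name:
                hits.append((peer, name))
                if len(hits) >= max_results:  # the client ends the query
                    return hits
        if hops == max_hops:  # the query is not forwarded past X hops
            continue
        for other in neighbors.get(peer, []):
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return hits


# A chain of peers: p0 - p1 - p2 - p3 - p4
neighbors = {f"p{i}": [f"p{j}" for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
files = {f"p{i}": [f"song{i}.mp3"] for i in range(5)}
print(limited_search(neighbors, files, "p0", "song", max_hops=2))
# [('p0', 'song0.mp3'), ('p1', 'song1.mp3'), ('p2', 'song2.mp3')]
print(limited_search(neighbors, files, "p0", "song", max_results=1))
# [('p0', 'song0.mp3')]
```

Each limit trades completeness for cost: with `max_hops=2` the matching files on p3 and p4 exist but are never found.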

13 Distributed IR
- Traditional IR with the advantages of distributed systems
- A central server still stores the index
- Multiple brokers allow access to the data repository
- Multiple gatherers crawl data near them
- Advantages are seen in the data acquisition end
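
One way to picture the gatherer and broker roles is the sketch below. It assumes a very simple word-to-document index. The names `gather`, `merge` and `Broker`, and the sample documents, are hypothetical and not taken from any particular system.

```python
from collections import defaultdict


def gather(site_docs):
    """A gatherer indexes only the documents near it: word -> set of doc ids."""
    partial = defaultdict(set)
    for doc_id, text in site_docs.items():
        for word in text.lower().split():
            partial[word].add(doc_id)
    return partial


def merge(partials):
    """The central server combines the gatherers' partial indexes into one."""
    index = defaultdict(set)
    for partial in partials:
        for word, doc_ids in partial.items():
            index[word] |= doc_ids
    return index


class Broker:
    """One of several front ends that answer queries from the shared index."""

    def __init__(self, index):
        self.index = index

    def query(self, word):
        return sorted(self.index.get(word.lower(), set()))


site_a = {"a/1": "peer to peer retrieval", "a/2": "central index"}
site_b = {"b/1": "distributed retrieval systems"}
index = merge([gather(site_a), gather(site_b)])
brokers = [Broker(index), Broker(index)]
print(brokers[0].query("retrieval"))  # ['a/1', 'b/1']
print(brokers[1].query("index"))      # ['a/2']
```

The crawling work is split across gatherers, which is where the slide places the benefit. The index itself is still a single shared structure.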

14 Examples

15 Future Directions
- Next steps will be a drastic rethinking of content placement, à la FreeNet
  - Donate X amount of bandwidth, Y amount of HD space
  - Share Z directories of content
- Actual content files are distributed to the network intelligently
  - Most requested files are blanketed
  - Unique files are still accessible

16 Future directions for Traditional IR
- Large central repositories such as Google will fade
- The Internet will be fragmented into clusters of interest
- Similar interest groups will have decentralized search facilities
- An index of these groups will replace the Googles of today

