Peer to Peer Information Retrieval

Going beyond Napster

What is P2P IR?
  No index on a central server
  Content is distributed across all users of the system
  Content is more than text
    Binary files
    Associated metadata

An example of a P2P system

Why go P2P
  Spiraling costs of maintaining indexes
    Look at Google’s server farm
  New content forces new thinking on IR
    Large binary files are hard to index
  Freedom of speech
    Society is striving to communicate data whose distribution is being legislated against

First P2P Systems
  Central hash of distributed content
  Only the central hash was used for queries
  Disadvantages:
    Scalability
    Known location of content
    Single point of failure
  Advantages:
    Quick searching
    Deterministic search results
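The central-hash design above can be sketched roughly as follows. This is a minimal illustration, assuming the "central hash" is a lookup table from file names to the peers that hold them; the class name `CentralIndex`, its methods, and the peer addresses are hypothetical and not taken from any real system.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central server: maps file names to the peers holding them."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filenames):
        # A peer announces the files it shares; the content itself stays on the peer.
        for name in filenames:
            self._holders[name.lower()].add(peer)

    def unregister(self, peer):
        # Called when a peer leaves; its entries disappear from search results.
        for holders in self._holders.values():
            holders.discard(peer)

    def search(self, term):
        # Every query is answered from the one central table, so results are
        # deterministic, but the table is also the single point of failure.
        term = term.lower()
        return {
            name: sorted(peers)
            for name, peers in sorted(self._holders.items())
            if term in name and peers
        }


index = CentralIndex()
index.register("alice:6699", ["Blue Train.mp3", "So What.mp3"])
index.register("bob:6699", ["So What.mp3"])
results = index.search("so what")
```

Here `results` lists both peers for the shared file. Because the server knows exactly which peer holds what, it also illustrates the "known location of content" disadvantage.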

Bumps that caused change
  Legal
    Centralized services were easy targets
    Owners of index could not claim they had no knowledge of content
  Growth
    Cost of maintaining service grew
    Hardware requirements exploded

Decentralized P2P
  Content spread between users with no explicit intent
  Centralized server is replaced by a self-maintaining network
  Every user is also a server
  There is no index of content
  How do we search?

Searching Decentralized P2P Systems
  Many methods, none perfected yet
  Broadcast search
    Advantages
      Every node takes part in the query
    Disadvantages
      As the system grows, network bandwidth and query time grow exponentially
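Broadcast search can be sketched as a flood over the peer graph. The example below is a simplified model, not any particular network's protocol: the function `broadcast_search`, the adjacency dictionary, and the file lists are all assumptions made for illustration, and each peer forwards the query to all of its neighbors, including the one it came from.

```python
from collections import deque


def broadcast_search(neighbors, files, start, term):
    """Flood a query to every reachable peer; return (hits, messages_sent)."""
    seen = {start}
    queue = deque([start])
    hits = []
    messages = 0
    while queue:
        peer = queue.popleft()
        # Every peer that receives the query checks its own shared files.
        hits.extend((peer, f) for f in files.get(peer, []) if term in f)
        for nxt in neighbors.get(peer, []):
            messages += 1        # every forwarded copy costs bandwidth
            if nxt not in seen:  # duplicate copies are dropped on arrival
                seen.add(nxt)
                queue.append(nxt)
    return hits, messages


neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
files = {"a": [], "b": ["jazz.mp3"], "c": ["rock.mp3"], "d": ["jazz_live.mp3"]}
hits, messages = broadcast_search(neighbors, files, "a", "jazz")
```

In this four-peer toy network the query finds both matching files but costs ten messages, one per directed link, which hints at why bandwidth becomes the bottleneck as the network grows.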

Intelligent P2P Crawls
  Ways to improve decentralized P2P query
  Intelligently place data (FreeNet)
    By knowing the algorithm that distributes data, querying can be done more intelligently
  Clustering (Fireworks model)
    Clients with similar properties are logically grouped
    Queries that don’t apply to a group are not sent to any client in that group
  Both change the paradigm of what kind of data is shared and the means of sharing
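The clustering idea can be illustrated with a small routing sketch. This is only loosely inspired by the grouping described above and is not the Fireworks model itself: the topic labels, the `route_query` function, and the assumption that each query carries a topic are all hypothetical.

```python
def route_query(clusters, query_topic, term):
    """Send the query only to clusters whose topic matches.

    Returns (hits, peers_contacted).
    """
    hits = []
    contacted = 0
    for topic, peers in clusters.items():
        if topic != query_topic:
            continue  # the whole group is skipped, saving one message per peer
        for peer, shared in peers.items():
            contacted += 1
            hits.extend((peer, f) for f in shared if term in f)
    return hits, contacted


clusters = {
    "music": {"p1": ["jazz.mp3"], "p2": ["rock.mp3", "jazz_live.mp3"]},
    "video": {"p3": ["jazz_documentary.avi"], "p4": ["news.avi"]},
}
hits, contacted = route_query(clusters, "music", "jazz")
```

Only the two peers in the "music" group are contacted. The trade-off is visible too: the matching file in the "video" group is never found, so the quality of the grouping decides what a query can see.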

Other improvements
  Today, most networks still rely on brute-force search
  CRC/MD5 hashing
    A checksum of each file is computed
    Instead of searching metadata, search for file hash
    Files that are identical, but mislabeled, are still returned
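Searching by file hash instead of metadata might look like the sketch below. It uses Python's standard `hashlib.md5`; the helper names `md5_of` and `build_hash_index`, the peer names, and the in-memory byte strings standing in for files are assumptions for illustration. MD5 is used because the slide names it; it is not collision-resistant against deliberate attacks.

```python
import hashlib
from collections import defaultdict


def md5_of(data: bytes) -> str:
    """Checksum of a file's content, independent of its name."""
    return hashlib.md5(data).hexdigest()


def build_hash_index(shared):
    """shared: {peer: {filename: bytes}} -> {md5: [(peer, filename), ...]}"""
    index = defaultdict(list)
    for peer, files in shared.items():
        for name, data in files.items():
            index[md5_of(data)].append((peer, name))
    return index


shared = {
    "p1": {"song.mp3": b"\x00\x01same-bytes"},
    "p2": {"mislabeled.mp3": b"\x00\x01same-bytes"},
    "p3": {"song.mp3": b"different-bytes"},
}
index = build_hash_index(shared)
wanted = md5_of(b"\x00\x01same-bytes")
matches = index[wanted]
```

The lookup returns the copies on `p1` and `p2` even though one is mislabeled, and it skips `p3`, whose file has the same name but different content.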

Query time limiting
  To save on inter-system bandwidth, searches terminate after X hops
  Client ends query after 100 results
  Searches time out after X seconds
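The three limits above (hop count, result cap, timeout) can be combined in one bounded search. The following is a sketch under assumed names and defaults: `limited_search`, `max_hops`, `max_results`, and `timeout_s` are hypothetical parameters, and the network is modeled as a plain adjacency dictionary.

```python
import time
from collections import deque


def limited_search(neighbors, files, start, term,
                   max_hops=2, max_results=100, timeout_s=5.0):
    """Breadth-first query that stops on hop limit, result cap, or timeout."""
    deadline = time.monotonic() + timeout_s
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue and time.monotonic() < deadline:
        peer, hops = queue.popleft()
        for f in files.get(peer, []):
            if term in f:
                hits.append((peer, f))
                if len(hits) >= max_results:
                    return hits  # the client has enough results
        if hops == max_hops:
            continue  # hop budget exhausted: do not forward any further
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits


# A chain of peers: a - b - c - d
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
files = {"b": ["jazz1.mp3"], "c": ["jazz2.mp3"], "d": ["jazz3.mp3"]}
two_hops = limited_search(neighbors, files, "a", "jazz", max_hops=2)
one_result = limited_search(neighbors, files, "a", "jazz", max_hops=3, max_results=1)
```

With a two-hop limit the file on peer `d` is never reached, which shows the cost of these limits: they save bandwidth but can hide content that does exist in the network.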

Distributed IR
  Traditional IR with the advantages of distributed systems
  A central server still stores the index
  Multiple brokers allow access to the data repository
  Multiple gatherers crawl data near them
  Advantages are seen on the data acquisition end
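The gatherer/broker split can be sketched as follows. This is an illustrative toy, assuming the central index is a simple inverted index from words to document IDs; the functions `gather` and `merge`, the `Broker` class, and the sample documents are hypothetical.

```python
from collections import defaultdict


def gather(site_docs):
    """A gatherer builds a partial inverted index for documents near it."""
    partial = defaultdict(set)
    for doc_id, text in site_docs.items():
        for word in text.lower().split():
            partial[word].add(doc_id)
    return partial


def merge(partials):
    """The central server merges the gatherers' partial indexes into one."""
    central = defaultdict(set)
    for partial in partials:
        for word, doc_ids in partial.items():
            central[word] |= doc_ids
    return central


class Broker:
    """One of several front ends answering queries from the central index."""

    def __init__(self, central):
        self.central = central

    def query(self, *words):
        sets = [self.central.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*sets)) if sets else []


site_a = {"a/1": "peer to peer search", "a/2": "central index"}
site_b = {"b/1": "distributed search index"}
central = merge([gather(site_a), gather(site_b)])
brokers = [Broker(central), Broker(central)]
answer = brokers[0].query("search", "index")
```

The crawling work is split across gatherers, which is where the slide places the advantage, while every broker still depends on the same central index.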

Examples

Future Directions
  Next steps will be drastic rethinking of content placement à la FreeNet
  Donate X amount of bandwidth, Y amount of HD space
  Share Z directories of content
  Actual content files are distributed to the network intelligently
    Most requested files are blanketed
    Unique files are still accessible

Future directions for Traditional IR
  Large central repositories such as Google will fade
  Internet will be fragmented into clusters of interest
  Similar interest groups will have decentralized search facilities
  An index of these groups will replace the Googles of today
