Exploiting Similarity for Multi-Source Downloads Using File Handprints
Internet
- Many files are available on the Internet
- Many people download files from the Internet
- Resources are limited, so downloads take a long time
  - Client bandwidth
  - Server capacity
  - Router congestion
Solutions
- Many files on the Internet are duplicates
- By using all available sources, a client can download files in less time
- Per-file sharing (BitTorrent)
- Per-chunk sharing (CFS and Shark): O(N) lookups, where N is the number of chunks
- Goal: O(1) lookups and O(1) inserted mappings per file
How is it done?
- Exploit similarity
- Look up similar files in O(1)
- Low overhead for locating sources
Similarity
- MP3 files with different headers
- Movies in different languages
- Damaged files (only a few bytes in error)
- Compressed archives with different additional files
Parallelism: optimistic metric
- Different chunks are downloaded at the same time
- The client selects different sources for different chunks
- Each source sends one chunk at a time
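The optimistic metric above can be illustrated with a small sketch. It assumes every chunk takes one time unit from any source and uses a simple greedy assignment (rarest chunks first, least-loaded source first). The function name `optimistic_rounds` and the greedy heuristic are illustrative assumptions, not the method from the talk, and the greedy result is not guaranteed to be optimal.

```python
from collections import defaultdict


def optimistic_rounds(chunk_sources: dict[int, list[str]]) -> int:
    """Estimate download time (in chunk-time units) when different chunks
    are fetched from different sources in parallel.

    chunk_sources maps a chunk index to the sources that hold it.
    Each source sends one chunk at a time, so the download finishes when
    the busiest source has sent all chunks assigned to it.
    Every chunk must have at least one source.
    """
    load = defaultdict(int)  # number of chunks assigned to each source
    # Assign the chunks with the fewest sources first (greedy heuristic).
    for chunk in sorted(chunk_sources, key=lambda c: len(chunk_sources[c])):
        best = min(chunk_sources[chunk], key=lambda s: load[s])
        load[best] += 1
    return max(load.values(), default=0)


if __name__ == "__main__":
    sources = {0: ["A", "B"], 1: ["A"], 2: ["A", "B"], 3: ["B"]}
    rounds = optimistic_rounds(sources)
    print("time units:", rounds, "speedup:", len(sources) / rounds)
```

With a single source the same four chunks would need four time units, so the ratio of chunk count to rounds gives a rough parallelism gain.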
Parallelism: conservative metric
- One chunk is downloaded at a time
- Each chunk is downloaded from different sources
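One possible reading of the conservative metric is sketched below: chunks are fetched strictly one after another, and each chunk is split evenly across all sources that hold it. Equal per-source bandwidth and the names `conservative_time` and `conservative_speedup` are assumptions made for illustration only.

```python
from fractions import Fraction


def conservative_time(source_counts: list[int]) -> Fraction:
    """Total time in chunk-time units when chunks are downloaded one at a
    time and chunk i is striped across its source_counts[i] sources.

    Assumes every source has the same bandwidth, so a chunk with n
    sources takes 1/n of a time unit.
    """
    return sum((Fraction(1, n) for n in source_counts), Fraction(0))


def conservative_speedup(source_counts: list[int]) -> Fraction:
    """Speedup relative to a single source (one time unit per chunk)."""
    return len(source_counts) / conservative_time(source_counts)


if __name__ == "__main__":
    counts = [1, 2, 2, 1]  # number of sources available for each chunk
    print("time units:", conservative_time(counts))
    print("speedup:", conservative_speedup(counts))
```

Under this reading the gain is limited by the chunks that have only one source, which is why the metric is the more pessimistic of the two.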
Handprinting
- Two files, A and B
- Na: number of chunks in A
- Nb: number of chunks in B
- m: number of chunks in common
- k: number of selected hashes
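A minimal sketch of the idea, under stated assumptions: files are split into fixed-size chunks (real systems may use content-defined boundaries), each chunk is hashed with SHA-1, and the handprint is taken to be the k smallest chunk hashes so that both files select hashes by the same deterministic rule. The function names and the 16 KB default chunk size are hypothetical.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 16 * 1024) -> set[str]:
    """Return the set of SHA-1 hashes of fixed-size chunks of data."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def handprint(hashes: set[str], k: int) -> set[str]:
    """Select k hashes deterministically: the k smallest in sorted order."""
    return set(sorted(hashes)[:k])


def handprints_overlap(hp_a: set[str], hp_b: set[str]) -> bool:
    """Two files are flagged as possibly similar if their handprints share a hash."""
    return not hp_a.isdisjoint(hp_b)


if __name__ == "__main__":
    file_a = b"".join(bytes([i]) * 1024 for i in range(8))
    file_b = file_a[:4096] + b"".join(bytes([i]) * 1024 for i in range(100, 104))
    hp_a = handprint(chunk_hashes(file_a, 1024), k=8)
    hp_b = handprint(chunk_hashes(file_b, 1024), k=8)
    print("possibly similar:", handprints_overlap(hp_a, hp_b))
```

Because only k hashes per file are published, the number of mappings per file stays constant regardless of file size.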
Handprinting
- How many chunk hashes (k) should be selected?
- Two files, A and B
- Na: number of chunks in A
- Nb: number of chunks in B
- m: number of chunks in common
- k: number of selected hashes
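The choice of k can be explored with a back-of-the-envelope estimate. The sketch below is an assumption, not a formula taken from the slides: it measures similarity as s = m / (Na + Nb - m), treats each of the k selected hashes as independently shared with probability s, and estimates the detection probability as 1 - (1 - s)^k. The function names are hypothetical.

```python
import math


def similarity(na: int, nb: int, m: int) -> float:
    """Fraction of distinct chunks of A and B that the two files share."""
    return m / (na + nb - m)


def detection_probability(s: float, k: int) -> float:
    """Estimated chance that handprints of size k share at least one hash."""
    return 1.0 - (1.0 - s) ** k


def min_handprint_size(s: float, p: float) -> int:
    """Smallest k whose estimated detection probability is at least p.

    Requires 0 < s < 1 and 0 < p < 1.
    """
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - s))


if __name__ == "__main__":
    s = similarity(na=100, nb=100, m=20)
    k = min_handprint_size(s, p=0.95)
    print(f"s={s:.3f} k={k} p={detection_probability(s, k):.3f}")
```

The estimate shows the trade-off: a larger k detects lower similarity more reliably, at the cost of more mappings inserted per file.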
Implementation
Evaluation
Q & A Thank You