Published by Steven Wiggins. Modified over 8 years ago.
Compression & Networking
Background
Network links are getting faster and faster, but many clients are still connected by fairly slow links (mobile?). People wish to send more and more data, and battery life is a key problem. How can we make this transparent to the user?
[Figure: a sender transmits data to a receiver, exploiting knowledge about the data already at the receiver]
Two standard techniques
Caching: “avoid sending the same object again”. It works on the basis of “atomic” objects, so it only helps if objects are unchanged. What about objects that are slightly changed?
Compression: “remove redundancy in transmitted data”. It avoids repeated substrings in the transmitted data and can be extended to the history of past transmissions (at some overhead). What if the sender has never seen the data at the receiver?
Types of Techniques
Common knowledge between sender and receiver: unstructured files, handled by delta compression.
“Partial” knowledge: unstructured files, handled by file synchronization; record-based data, handled by set reconciliation.
Formalization
Delta compression [diff, zdelta, REBL, …]: compress a file f deploying a known file f'; compress a group of files; speed up web access by sending the differences between the requested page and the ones available in cache.
File synchronization [rsync, zsync]: a client updates its old file f_old with the file f_new available on a server. Applications: mirroring, shared crawling, content distribution networks.
Set reconciliation: a client updates its structured file f_old with the file f_new available on a server. Applications: updating contacts or appointments, intersecting ILs in a P2P search engine.
Z-delta compression (one-to-one)
Problem: we have two files, f_known (known to both parties) and f_new, and the goal is to compute a file f_d of minimum size such that f_new can be derived from f_known and f_d.
Assume that block moves and copies are allowed, and find an optimal covering set of f_new based on f_known. The LZ77 scheme provides an efficient, optimal solution: treat f_known as “previously encoded text”, compress the concatenation f_known·f_new, and start emitting output only from f_new.
zdelta is one of the best implementations. It is used, e.g., in version control, backups, and transmission.
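The LZ77 idea above can be sketched as follows. This is not zdelta's actual encoder: it is a minimal greedy parse, assuming a hypothetical MIN_MATCH of 4 bytes and a token format of our own (bytes objects are literals, (position, length) pairs are copies from the concatenation f_known·f_new). Copies may point into f_known or into the part of f_new already emitted.

```python
MIN_MATCH = 4  # hypothetical shortest copy worth emitting


def delta_encode(known: bytes, new: bytes) -> list:
    """Greedy LZ77-style parse of `new`, copying from known + already-encoded new."""
    buf = known + new            # f_known plays the role of "previously encoded text"
    start = len(known)
    index: dict[bytes, list[int]] = {}

    def register(p: int) -> None:
        # remember where each MIN_MATCH-byte substring starts
        if p + MIN_MATCH <= len(buf):
            index.setdefault(buf[p:p + MIN_MATCH], []).append(p)

    for p in range(start):
        register(p)

    out: list = []
    lit = bytearray()
    i = start                    # output starts only from f_new
    while i < len(buf):
        best_len, best_pos = 0, -1
        for p in index.get(buf[i:i + MIN_MATCH], []):   # every candidate has p < i
            length = 0
            while i + length < len(buf) and buf[p + length] == buf[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, p
        if best_len >= MIN_MATCH:
            if lit:
                out.append(bytes(lit))
                lit = bytearray()
            out.append((best_pos, best_len))              # copy token
            for p in range(i, i + best_len):
                register(p)
            i += best_len
        else:
            lit.append(buf[i])                            # literal byte
            register(i)
            i += 1
    if lit:
        out.append(bytes(lit))
    return out


def delta_decode(known: bytes, delta: list) -> bytes:
    buf = bytearray(known)
    for tok in delta:
        if isinstance(tok, bytes):
            buf += tok
        else:
            pos, length = tok
            for j in range(length):      # byte by byte, so overlapping copies work
                buf.append(buf[pos + j])
    return bytes(buf[len(known):])
```

The receiver, which already holds f_known, calls delta_decode(f_known, f_d) to rebuild f_new. A real encoder would bound the candidate lists and entropy-code the tokens.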
Efficient Web Access
Use zdelta to reduce traffic: the old version is available at both proxies (one in the client cache, and one at the proxy). This is restricted to pages already visited (30% cache hits) or to URL-prefix matches.
Dual-proxy architecture: a pair of proxies (client cache + proxy) located on each side of the slow link uses a proprietary protocol to increase communication performance.
[Figure: client with a small cache and proxy on either side of the slow link; the proxy fetches the web page over the fast link and returns a delta encoding against the shared cache reference]
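The dual-proxy exchange can be simulated in a few lines. This is only a sketch of the idea, not the proprietary protocol: Python's difflib stands in for zdelta, the origin server is a plain dict, and the class and method names (ClientProxy, ServerProxy, get, handle) are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(ref: str, new: str) -> list:
    """Encode `new` as copies from `ref` plus added text (difflib stands in for zdelta)."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, ref, new, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:                      # 'replace' or 'insert'
            ops.append(("add", new[j1:j2]))
        # 'delete': the reference characters are simply not copied
    return ops


def apply_delta(ref: str, ops: list) -> str:
    return "".join(ref[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)


class ServerProxy:
    """Fast-link side: keeps the same reference copies as the client cache."""

    def __init__(self, origin: dict):
        self.origin, self.copies = origin, {}

    def handle(self, url: str, client_has_ref: bool):
        page = self.origin[url]            # fetched over the fast link
        ref = self.copies.get(url)
        self.copies[url] = page            # both sides now reference this version
        if client_has_ref and ref is not None:
            return "delta", make_delta(ref, page)
        return "full", page


class ClientProxy:
    """Slow-link side: small cache of previously visited pages."""

    def __init__(self, server: ServerProxy):
        self.server, self.cache = server, {}

    def get(self, url: str) -> str:
        kind, body = self.server.handle(url, url in self.cache)
        page = apply_delta(self.cache[url], body) if kind == "delta" else body
        self.cache[url] = page
        return page
```

Only the first visit to a URL crosses the slow link in full; later visits send just the changed text, which matches the slide's restriction to pages already visited.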
Cluster-based delta compression
Problem: we wish to compress a group of files F. This is useful on a dynamic collection of web pages, backups, …
Apply pairwise zdelta: find a good reference for each f ∈ F.
Reduction to the Min Branching problem on directed graphs: build a (complete?) weighted graph G_F, with nodes = files and weights = zdelta size; insert a dummy node connected to all files, with weights = gzip-coding size; compute the directed spanning tree of minimum total cost, rooted at the dummy node and covering all of G's nodes.
Method    Space    Time
uncompr   30Mb     ---
tgz       20%      linear
THIS      8%       quadratic
[Figure: example weighted graph over files 1, 2, 3]
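The min-branching step can be sketched with the Chu-Liu/Edmonds algorithm, a standard method for minimum-cost arborescences; the slide does not say which algorithm was used. The sketch assumes edges are given as a dict mapping (reference, target) to cost, where cost would be the zdelta size (or the gzip size for edges leaving the dummy node). The function names and the weights in the usage example are hypothetical.

```python
def find_cycle(parent: dict):
    """Return the nodes of a cycle in the parent map, or None if there is none."""
    for start in parent:
        seen, v = [], start
        while v in parent and v not in seen:
            seen.append(v)
            v = parent[v]
        if v in seen:
            return seen[seen.index(v):]
    return None


def min_branching(root, edges: dict) -> dict:
    """Chu-Liu/Edmonds: returns {child: parent} for a min-cost arborescence at `root`."""
    best = {}                                  # cheapest incoming edge per node
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best or w < edges[(best[v], v)]):
            best[v] = u
    cycle = find_cycle(best)
    if cycle is None:
        return best
    in_cycle = set(cycle)
    c = object()                               # fresh node standing for the cycle
    sub_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in in_cycle and v in in_cycle:
            continue
        if v in in_cycle:                      # entering the cycle: reduced cost
            key, w = (u, c), w - edges[(best[v], v)]
        elif u in in_cycle:
            key = (c, v)
        else:
            key = (u, v)
        if key not in sub_edges or w < sub_edges[key]:
            sub_edges[key], origin[key] = w, (u, v)
    parent = {}
    for v, u in min_branching(root, sub_edges).items():   # expand the contracted node
        ou, ov = origin[(u, v)]
        parent[ov] = ou
    for v in cycle:                            # cycle nodes not entered keep their best edge
        parent.setdefault(v, best[v])
    return parent
```

With made-up costs, where A and B are near-duplicates and C resembles B:

```python
edges = {("dummy", "A"): 100, ("dummy", "B"): 100, ("dummy", "C"): 100,
         ("A", "B"): 10, ("B", "A"): 10, ("B", "C"): 20,
         ("A", "C"): 50, ("C", "A"): 50, ("C", "B"): 60}
tree = min_branching("dummy", edges)   # one file is gzipped, the others are zdelta-coded
```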
Improvement (group of files)
Problem: constructing G is very costly, requiring n² edge calculations (zdelta executions). We wish to exploit some pruning approach.
Collection analysis: cluster the files that appear similar, and are thus good candidates for zdelta compression. Build a sparse weighted graph G'_F containing only edges between pairs of files in the same cluster.
Assign weights: estimate appropriate edge weights for G'_F, thus saving zdelta executions. Nonetheless, the time is still n².
Method    Space    Time
uncompr   260Mb    ---
tgz       12%      2 mins
THIS      8%       16 mins
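One way to realize the collection analysis is to compare cheap min-hash signatures instead of running zdelta on every pair. The slide does not name a similarity measure, so this is an assumption: the sketch uses character 4-gram shingles, a hypothetical family of hash functions built from zlib.crc32 with a seed prefix, a hypothetical similarity threshold of 0.5, and a made-up weight estimate of (1 - similarity) × target size. The pairwise loop is still n², but each comparison is a cheap signature comparison rather than a zdelta run.

```python
import zlib
from itertools import combinations


def shingles(text: str, k: int = 4) -> set:
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash(sh: set, num_hashes: int = 64) -> list:
    # one hash function per seed (hypothetical choice of hash family)
    return [min(zlib.crc32(f"{seed}:{s}".encode()) for s in sh) for seed in range(num_hashes)]


def est_similarity(a: list, b: list) -> float:
    """Fraction of agreeing min-hashes, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sparse_graph(files: dict, threshold: float = 0.5) -> dict:
    """Keep only edges between similar files, with estimated (not measured) weights."""
    sigs = {name: minhash(shingles(text)) for name, text in files.items()}
    edges = {}
    for a, b in combinations(files, 2):
        s = est_similarity(sigs[a], sigs[b])
        if s >= threshold:
            edges[(a, b)] = (1 - s) * len(files[b])   # hypothetical estimate of zdelta size
            edges[(b, a)] = (1 - s) * len(files[a])
    return edges
```

The resulting sparse dict can be fed to any min-branching routine once dummy-node edges are added.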
File Synchronization
File synch: The problem
The client requests to update an old file; the server has the new file but does not know the old file. Goal: update f_old without sending the entire f_new.
rsync: file synch tool, distributed with Linux.
[Figure: the client sends a request to the server, which replies with an update that turns f_old into f_new]
Delta compression is a sort of local synch, since the server knows both files.
The rsync algorithm
[Figure: the client sends a few hashes of f_old to the server; the server replies with an encoded file from which the client rebuilds f_new]
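The exchange can be sketched as follows, using the rsync-style weak checksum (two 16-bit sums, i.e. a 4-byte rolling hash) together with MD5 truncated to 2 bytes, as listed among rsync's optimizations. This is a simplified sketch, not rsync's code: only full blocks of f_old are hashed, the token format (int = block index, bytes = literal) is our own, and literals are not gzipped. With a 2-byte strong hash a false match is possible, and this sketch does not detect it.

```python
import hashlib

M = 1 << 16


def weak(block: bytes) -> tuple:
    """rsync-style weak checksum (a, b), each 16 bits."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()[:2]     # MD5 truncated to 2 bytes


def client_signatures(old: bytes, B: int) -> dict:
    """Client side: hashes of the full B-byte blocks of f_old."""
    sigs: dict = {}
    for idx in range(len(old) // B):
        blk = old[idx * B:(idx + 1) * B]
        a, b = weak(blk)
        sigs.setdefault((a << 16) | b, []).append((idx, strong(blk)))
    return sigs


def server_encode(new: bytes, sigs: dict, B: int) -> list:
    """Server side: slide a window over f_new, emit block indices or literal bytes."""
    out, lit = [], bytearray()
    i, n = 0, len(new)
    a, b = weak(new[:B]) if n >= B else (0, 0)
    while i + B <= n:
        match = None
        for idx, s in sigs.get((a << 16) | b, []):
            if s == strong(new[i:i + B]):
                match = idx
                break
        if match is not None:
            if lit:
                out.append(bytes(lit))
                lit = bytearray()
            out.append(match)
            i += B
            if i + B <= n:
                a, b = weak(new[i:i + B])
        else:
            lit.append(new[i])
            if i + B < n:                       # roll the window one byte right
                a = (a - new[i] + new[i + B]) % M
                b = (b - B * new[i] + a) % M
            i += 1
    lit += new[i:]
    if lit:
        out.append(bytes(lit))
    return out


def client_decode(old: bytes, tokens: list, B: int) -> bytes:
    return b"".join(old[t * B:(t + 1) * B] if isinstance(t, int) else t for t in tokens)
```

The rolling update is what keeps the server scan linear: moving the window by one byte costs O(1) instead of rehashing B bytes.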
The rsync algorithm (contd)
Simple, widely used, single roundtrip.
Optimizations: 4-byte rolling hash + 2-byte MD5, gzip for literals.
The choice of block size is problematic (default: max{700, √n} bytes).
Not good in theory: the granularity of changes may disrupt the use of blocks.
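Why the block size is problematic can be seen with a deliberately crude cost model, which is our assumption and not part of the slide: the client sends 4 + 2 = 6 bytes of hashes per block of f_old, and each of d scattered changes spoils about one block, which then travels as B literal bytes. Minimizing 6·n/B + d·B gives B* = √(6n/d), so the best block size depends on d, which the default max{700, √n} cannot know.

```python
import math

HASH_BYTES = 4 + 2      # 4-byte rolling hash + 2-byte truncated MD5 per block


def default_block_size(n: int) -> int:
    """The default quoted on the slide: max{700, sqrt(n)} bytes."""
    return max(700, math.isqrt(n))


def estimated_traffic(n: int, block: int, d: int) -> int:
    # simplified model (assumption): all block hashes, plus one spoiled block per change
    return HASH_BYTES * -(-n // block) + d * block


def model_optimal_block(n: int, d: int) -> int:
    # minimizes HASH_BYTES * n / B + d * B under the model above
    return max(1, math.isqrt(HASH_BYTES * n // d))
```

For example, estimated_traffic(10**8, 10_000, 10) evaluates to 160000 bytes under this model; when d grows into the thousands, the same default block size makes the literal term dominate.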
A new framework: zsync
The server sends hashes (unlike rsync, where the client does), and the client checks them.
The server deploys the common f_ref to compress the new f_tar (rsync compresses just f_tar).
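A minimal sketch of this flow, under several assumptions: the server hashes aligned B-byte blocks of f_tar with full MD5, the client looks each hash up among all B-byte windows of its old file (hashed directly rather than with a rolling hash, for brevity), and zlib's preset dictionary (zdict) stands in for zdelta when compressing f_tar against f_ref. Deflate only uses the last 32 KB of the dictionary, so this sketch only suits small files. Function names are hypothetical.

```python
import hashlib
import zlib


def server_block_hashes(new: bytes, B: int) -> list:
    """Server: hashes of the aligned B-byte blocks of f_tar."""
    return [hashlib.md5(new[i:i + B]).digest() for i in range(0, len(new) - B + 1, B)]


def client_match(old: bytes, hashes: list, B: int):
    """Client: find which server blocks occur anywhere in f_old; build f_ref from them."""
    where = {}
    for j in range(len(old) - B + 1):
        where.setdefault(hashlib.md5(old[j:j + B]).digest(), j)
    matched = [k for k, h in enumerate(hashes) if h in where]
    f_ref = b"".join(old[where[hashes[k]]:where[hashes[k]] + B] for k in matched)
    return matched, f_ref          # `matched` is sent back to the server


def server_compress(new: bytes, matched: list, B: int) -> bytes:
    """Server: rebuild the same f_ref from f_tar and compress f_tar against it."""
    f_ref = b"".join(new[k * B:(k + 1) * B] for k in matched)
    comp = zlib.compressobj(zdict=f_ref) if f_ref else zlib.compressobj()
    return comp.compress(new) + comp.flush()


def client_decompress(payload: bytes, f_ref: bytes) -> bytes:
    dec = zlib.decompressobj(zdict=f_ref) if f_ref else zlib.decompressobj()
    return dec.decompress(payload) + dec.flush()
```

Both sides hold the same f_ref (the common blocks, in f_tar's order), so the server can exploit everything shared while the client only ever checks hashes.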
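Small differences (e.g. agenda)
The server sends hashes to the client level by level, and the client checks equality with the hashes of its own substrings. With n/k blocks of k elements each there are log(n/k) levels. If there are d differences, then on each level at most d hashes do not match, and the search must continue below them.
Communication complexity is O(d · log(n/k) · log n) bits [one upward path per differing k-block].
[Figure: hash tree over the n/k blocks of k elements; matching subtrees are pruned]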
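The level-by-level protocol can be sketched as a top-down walk over a binary tree of block ranges. To keep it short, the sketch assumes that both files have the same length and that changes do not shift content, so the client compares aligned ranges instead of arbitrary substrings; the 4-byte truncated MD5 and the function names are also assumptions. Each round sends one hash per node in the frontier, and only mismatching nodes are expanded, so roughly 2d hashes are sent per level.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.md5(data).digest()[:4]      # hypothetical short hash


def sync_levels(new: bytes, old: bytes, k: int):
    """Return (differing block indices, hashes sent, rounds) for aligned files."""
    n_blocks = max(1, -(-len(new) // k))
    frontier = [(0, n_blocks)]                 # block ranges [lo, hi) at the current level
    hashes_sent = rounds = 0
    diff = []
    while frontier:
        rounds += 1
        hashes_sent += len(frontier)           # server -> client: one hash per node
        mismatched = [(lo, hi) for lo, hi in frontier
                      if h(new[lo * k:hi * k]) != h(old[lo * k:hi * k])]   # client check
        frontier = []
        for lo, hi in mismatched:
            if hi - lo == 1:
                diff.append(lo)                # a differing k-block
            else:
                mid = (lo + hi) // 2
                frontier += [(lo, mid), (mid, hi)]
    return diff, hashes_sent, rounds
```

For a 16384-byte file with k = 16 (1024 blocks, so 10 levels below the root) and two changed bytes in different halves, the walk takes 11 rounds and sends 1 + 2 + 9·4 = 39 hashes, instead of 1024 block hashes.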