1 Compressing Query Results for Mobile Clients
Zhiyuan Chen and Praveen Seshadri
Cornell University
2 Motivation (1)
(Figure: the database server sends query results to the mobile client over a slow network!)
– Compress results on the server to save network bandwidth.
– Concern: compress as much as possible.
– At 9.6 to 170 kbps, transferring 1 MB takes roughly 1 to 10 minutes.
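A quick check of the transfer-time arithmetic, as a minimal Python sketch. It assumes 1 MB = 1,000,000 bytes and ignores protocol overhead; `transfer_seconds` is a hypothetical helper. At 170 kbps the transfer takes about 47 seconds and at 9.6 kbps about 14 minutes, so the slide's 1 to 10 minutes is an order-of-magnitude figure.

```python
def transfer_seconds(size_bytes: float, bandwidth_bps: float) -> float:
    """Time to send size_bytes over a link of bandwidth_bps bits per second."""
    return size_bytes * 8 / bandwidth_bps


ONE_MB = 1_000_000  # assumption: decimal megabyte, no protocol overhead

for kbps in (9.6, 170):
    secs = transfer_seconds(ONE_MB, kbps * 1000)
    print(f"{kbps:>5} kbps: {secs:7.1f} s ({secs / 60:.1f} min)")
```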
3 Motivation (2)
(Figure: the database server sends results to a PDA, which says "I am a PDA, I can’t store that much for later use!")
– PDAs often work offline, so they need to store results for later use.
– Storage is severely constrained.
– Usually only a small portion of the results will be accessed, and access may be random.
Store the result compressed and decompress on demand: both space and decompression cost matter!
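Storing the result compressed while still decompressing on demand needs a small decompression unit. A minimal sketch, assuming newline-free text tuples and an arbitrary page size: each page of tuples is compressed independently with Python's `zlib` (DEFLATE, an LZ77-based Ziv-Lempel variant), so reading one tuple decompresses only its page. The helper names and the sample rows are hypothetical.

```python
import zlib


def compress_pages(tuples: list[str], tuples_per_page: int) -> list[bytes]:
    """Split tuples into fixed-size pages and compress each page on its own."""
    pages = []
    for start in range(0, len(tuples), tuples_per_page):
        # Assumes no tuple contains "\n", which is used as the separator.
        page_text = "\n".join(tuples[start:start + tuples_per_page])
        pages.append(zlib.compress(page_text.encode("utf-8")))
    return pages


def fetch_tuple(pages: list[bytes], tuples_per_page: int, i: int) -> str:
    """Decompress only the page that holds tuple i and return that tuple."""
    page_no, offset = divmod(i, tuples_per_page)
    page_text = zlib.decompress(pages[page_no]).decode("utf-8")
    return page_text.split("\n")[offset]


rows = [f"990103,T{i},{40 + i % 3}" for i in range(1000)]  # hypothetical result tuples
pages = compress_pages(rows, tuples_per_page=64)
print(len(pages), fetch_tuple(pages, 64, 500))
```

A smaller page means less work per random access but usually a worse compression ratio, which is the kind of trade-off the cost model has to weigh.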
4 Why not just GZip or WinZip?
– A small decompression unit on the PDA reduces decompression cost.
– Using information about the query result gives more compression:
  – Choose a combination of compression methods based on semantic and statistical information about the result.
  – Because different attributes have different characteristics, no single method wins everywhere.
  – The choice is made by cost-based optimization.
5 An Example Query
SELECT Year, Month, Day, Ticker, Low, High FROM Quotes WHERE Year BETWEEN 1998 AND 1999 ORDER BY Year, Month, Day

Year | Day | Month | Ticker | High | Low
99 | 01 | 03 | T | 42 | 40 1/16
99 | 01 | 03 | IBM | 103 | 100 3/8
99 | 01 | 03 | DIS | 32 1/2 | 30 1/16

Semantic compression: grouping, dictionary, and 4 bits for each digit.
Universal compression: Ziv-Lempel on each field.
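Two of the semantic methods in the example can be sketched in a few lines of Python. This is an illustration, not the authors' encoder: `pack_digits` stores a decimal digit string at 4 bits per digit, and `dictionary_encode` replaces repeated values (such as tickers) with small integer codes plus a dictionary. Both helper names are hypothetical, and the decoder must be told the digit count because odd-length strings are padded.

```python
DIGITS = "0123456789"


def pack_digits(digits: str) -> bytes:
    """Store a decimal digit string at 4 bits per digit, two digits per byte."""
    if not digits or any(c not in DIGITS for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    padded = digits + "0" * (len(digits) % 2)  # pad odd lengths with one 0 nibble
    return bytes(
        (int(padded[i]) << 4) | int(padded[i + 1]) for i in range(0, len(padded), 2)
    )


def unpack_digits(data: bytes, n_digits: int) -> str:
    """Inverse of pack_digits; n_digits drops the padding nibble, if any."""
    nibbles = []
    for byte in data:
        nibbles.append(str(byte >> 4))
        nibbles.append(str(byte & 0x0F))
    return "".join(nibbles[:n_digits])


def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Assign each distinct value a code in order of first appearance."""
    codes: dict[str, int] = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    dictionary = list(codes)  # insertion order, so list index == code
    return dictionary, encoded


packed = pack_digits("990103")  # 6 digits fit in 3 bytes
assert unpack_digits(packed, 6) == "990103"
dictionary, codes = dictionary_encode(["T", "IBM", "DIS", "IBM"])
print(len(packed), dictionary, codes)
```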
6 Outline
– Related work
– An algebraic framework, to represent the valid combinations of compression methods.
– Compression optimization, to choose the best combination.
– Experiments: the chosen combinations beat universal compression tools such as WinZip.
7 Related Work
– Compression community: different compression methods, such as run-length encoding, Ziv-Lempel, and differential encoding.
– Database community: compressing tables or indices.
  – Specific compression methods: Iyer & Wilhite (tuple-level Ziv-Lempel), Goldstein et al. (per-page offset encoding), Ng & Ravishankar (tuple differential coding), Theodore Johnson (compressing bitmap indices).
  – Impact on query processing: Roth & Horn, Graefe & Shapiro, Johnson.
8 Compression Framework
– Compressed results are modeled as a compressed table.
– A compression method is modeled as a compression operator: input table -> output table.
– A combination of compression methods is a compression plan.
– Likewise, decompression operators and decompression plans.
9 Compressed Table
(Figure: grouping an uncompressed table with columns Customer ID and Order # produces Block 1 = Customer_1 with orders 10003, 10000 and Block 2 = Customer_2 with orders 10011, 10001.)
– Compressed data blocks: a compressed block is a value compressed from some cells of the uncompressed table. It is also the unit of decompression. Field and tuple boundaries may blur.
– Compression schema: extra information that enables decompression. It records what was compressed into each block, the compression method, and the relational schema.
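The compressed table from the grouping example can be modeled as a list of blocks plus a compression schema. A minimal Python sketch; the class and field names are hypothetical, and the blocks are copied from the figure. Decompressing block 1 (Customer_2) touches only that block.

```python
from dataclasses import dataclass


@dataclass
class CompressionSchema:
    """Extra information that enables decompression (hypothetical layout)."""
    method: str                         # compression method used for the blocks
    key_field: str                      # field whose shared value is stored once per block
    value_field: str                    # field whose values are kept inside each block
    relational_schema: tuple[str, ...]  # column order of the uncompressed table


@dataclass
class CompressedTable:
    schema: CompressionSchema
    blocks: list[tuple[str, list[int]]]  # one (key, values) pair per compressed block

    def decompress_block(self, i: int) -> list[tuple]:
        """Rebuild only the rows held in block i, the unit of decompression."""
        if self.schema.method != "grouping":
            raise NotImplementedError(self.schema.method)
        key, values = self.blocks[i]
        rows = []
        for v in values:
            cells = {self.schema.key_field: key, self.schema.value_field: v}
            rows.append(tuple(cells[c] for c in self.schema.relational_schema))
        return rows


table = CompressedTable(
    CompressionSchema("grouping", "Customer ID", "Order #", ("Customer ID", "Order #")),
    [("Customer_1", [10003, 10000]), ("Customer_2", [10011, 10001])],
)
print(table.decompress_block(1))
```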
10 Compression Operator
Defines how a compression method is applied to which part of the input table:
– the fields the operator is applied on;
– which input blocks are compressed together;
– the compression method and the information used in compression.
Example (grouping): applied on Customer ID; rows with the same Customer ID are compressed together into one block (e.g. Customer_1 becomes Block 1); method = grouping.
11 Compression Plan
A sequence of compression operators applied to the original result table, each taking the output of the previous operator as its input.
(Figure: the Customer ID / Order # table is first grouped, giving Customer_1: 10003, 10000 and Customer_2: 10011, 10001; offset encoding then stores each order number as its offset from 10000, giving Customer_1: 3, 0 and Customer_2: 11, 1.)
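A compression plan can be sketched as function composition: each operator maps a table to a table, and the plan feeds each operator's output to the next. This Python sketch reproduces the figure's grouping-then-offset-encoding plan. The row order of the uncompressed table is an assumption (the slide shows only the grouped blocks), as is taking the minimum order number as the base; the base would have to be kept in the compression schema so the plan can be reversed. All names are hypothetical.

```python
from typing import Callable

Table = list[tuple]
Operator = Callable[[Table], Table]


def grouping(rows: Table) -> Table:
    """Store each distinct value of column 0 once, with the list of its column-1 values."""
    groups: dict = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return list(groups.items())


def offset_encoding(base: int) -> Operator:
    """Build an operator that replaces each grouped value by its offset from base."""
    def op(rows: Table) -> Table:
        return [(key, [v - base for v in values]) for key, values in rows]
    return op


def run_plan(plan: list[Operator], table: Table) -> Table:
    """Apply the operators in order, each consuming the previous operator's output."""
    for operator in plan:
        table = operator(table)
    return table


rows = [("Customer_1", 10003), ("Customer_2", 10011), ("Customer_1", 10000), ("Customer_2", 10001)]
base = min(order for _, order in rows)  # 10000; kept in the schema for decompression
print(run_plan([grouping, offset_encoding(base)], rows))
```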
12 Optimization: Cost Model
– Formula: Cost = w1 × compression cost + w2 × decompression cost - w3 × saving on network transfer - w4 × saving on client-side storage.
– Adjust the weights according to the goal of compression.
– Compression cost: depends on CPU speed, the compression plan, and the result size.
– Decompression cost: depends on client processor speed, access pattern, etc. Provided by the clients!
– Network transfer saving: determined by the compressed result size.
– Client-side storage saving: applies only if results are decompressed on demand.
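The weighted cost formula can be evaluated directly and used to rank plans. A minimal Python sketch, assuming every term has already been estimated in one common unit; the plan names and numbers are hypothetical. With weights (1, 1, 1, 0), which ignore storage, the whole-table plan wins; with weights (0, 1, 0, 1), which target client storage with decompression on demand, the per-field semantic plan wins.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    """Estimated cost terms for one compression plan (all in one common unit)."""
    name: str
    compression_cost: float
    decompression_cost: float  # estimated from client speed and access pattern
    network_saving: float      # follows from the compressed result size
    storage_saving: float      # counts only if results stay compressed on the client


def plan_cost(p: PlanEstimate, w1: float, w2: float, w3: float, w4: float) -> float:
    """Cost = w1*compression + w2*decompression - w3*network saving - w4*storage saving."""
    return (w1 * p.compression_cost + w2 * p.decompression_cost
            - w3 * p.network_saving - w4 * p.storage_saving)


def best_plan(plans: list[PlanEstimate],
              weights: tuple[float, float, float, float]) -> PlanEstimate:
    """Pick the plan with the lowest weighted cost."""
    return min(plans, key=lambda p: plan_cost(p, *weights))


plans = [
    PlanEstimate("whole-table LZ", 2.0, 3.0, 10.0, 4.0),
    PlanEstimate("semantic per field", 3.0, 0.5, 8.0, 6.0),
]
print(best_plan(plans, (1, 1, 1, 0)).name)  # goal: network transfer
print(best_plan(plans, (0, 1, 0, 1)).name)  # goal: client storage
```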
13 Searching
The naïve algorithm has an exponential search space. Heuristics:
– Consider semantic compression first.
  – For each field, use the naïve algorithm to find the best plan that compresses this field only.
  – Combine these plans.
– Then consider universal compression methods.
  – Add Ziv-Lempel on a field only if doing so reduces the overall cost.
Not complete, but the search space is polynomial: O(#fields × #valid plans per field).
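The two-phase heuristic can be sketched in Python against a cost oracle. Simplifications, all assumptions: each field's semantic plan is a single method (or none), and the toy cost oracle, its field names, and its numbers are hypothetical and additive across fields. The oracle is called about #fields × #options + 2 × #fields times, instead of once per combined plan as in exhaustive search.

```python
from typing import Callable, Optional

# A plan maps each field to (semantic method or None, whether Ziv-Lempel is added).
Plan = dict[str, tuple[Optional[str], bool]]


def heuristic_search(fields: list[str], semantic_options: dict[str, list[str]],
                     cost: Callable[[Plan], float]) -> Plan:
    uncompressed: Plan = {f: (None, False) for f in fields}
    # Phase 1: per field, pick the best semantic method compressing that field only.
    plan: Plan = {}
    for f in fields:
        candidates = [None] + semantic_options.get(f, [])
        best = min(candidates, key=lambda m: cost({**uncompressed, f: (m, False)}))
        plan[f] = (best, False)
    # Phase 2: add Ziv-Lempel on a field only if it lowers the overall cost.
    for f in fields:
        trial = {**plan, f: (plan[f][0], True)}
        if cost(trial) < cost(plan):
            plan = trial
    return plan


FIELD_COST = {("Date", None): 6, ("Date", "grouping"): 1,
              ("Ticker", None): 5, ("Ticker", "dictionary"): 2,
              ("Price", None): 8, ("Price", "4-bit digits"): 4}
LZ_DELTA = {"Date": 1, "Ticker": -1, "Price": -2}  # hypothetical effect of adding Ziv-Lempel


def toy_cost(plan: Plan) -> float:
    return sum(FIELD_COST[(f, m)] + (LZ_DELTA[f] if lz else 0)
               for f, (m, lz) in plan.items())


print(heuristic_search(["Date", "Ticker", "Price"],
                       {"Date": ["grouping"], "Ticker": ["dictionary"],
                        "Price": ["4-bit digits"]},
                       toy_cost))
```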
14 Experiments
– Data: TPC-D.
– Queries: adapted from the TPC-D queries by deleting aggregates.
– Experiment 1: saving network bandwidth. Measure: overall end-to-end time = compression time + transfer time + decompression time.
– Experiment 2: saving PDA clients’ storage. Measure: space and random access time.
– Compare: common tools vs. the chosen combinations.
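Experiment 1's measure is a sum of three terms. A minimal Python sketch with hypothetical numbers (a 1,000,000-byte result, a 28.8 kbps link, and a 4:1 plan that takes 2 s to compress and 1 s to decompress); compression pays off whenever the transfer time it saves exceeds the CPU time it adds.

```python
def end_to_end_seconds(compress_s: float, sent_bytes: float,
                       bandwidth_bps: float, decompress_s: float) -> float:
    """Compression time + transfer time + decompression time."""
    transfer_s = sent_bytes * 8 / bandwidth_bps
    return compress_s + transfer_s + decompress_s


RESULT_BYTES = 1_000_000  # hypothetical result size
LINK_BPS = 28_800         # hypothetical link speed

plain = end_to_end_seconds(0.0, RESULT_BYTES, LINK_BPS, 0.0)
compressed = end_to_end_seconds(2.0, RESULT_BYTES / 4, LINK_BPS, 1.0)
print(f"uncompressed {plain:.1f} s, compressed {compressed:.1f} s")
```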
15
Compression plans:
– Semantic compression, which allows decompressing individual attribute values (S).
– WinZip on the whole table (W).
– WinZip applied on each field (PW).
– Semantic compression + WinZip applied on each field (SPW).
Compression ratio (uncompressed/compressed size): S 3.1, W 3.4, PW 4.4, SPW 6.5.
(Figure: results over modem, Internet, and wireless connections.)
16 Storage usage and time to randomly access 1000 tuples
Compression plans:
– Semantic compression (S), which allows decompressing an individual tuple.
– Windows CE’s default compressor (D).
– Ziv-Lempel applied on each page (Z).
Result size: 3.7 MB. 2 MB data storage, 2 MB program storage. Program size: 50 KB (+1 KB for semantic compression).
17 Summary
– A combination of compression methods chosen using semantic and statistical information about the result.
– The choice is made by cost-based optimization.
– A framework to model combinations of compression methods.
– Future work:
  – Apply the methodology to compressing data tables.
  – Joint optimization of result compression and query processing.