Published by Lindsay Wilcox. Modified over 9 years ago.
Page 1
Semantic Data Compression Techniques for NASA and Mobile Computing Databases
Principal Investigators: G. Ozsoyoglu, Z.M. Ozsoyoglu
Case Western Reserve University
Nov 7, 2002
Page 2
Semantic Data Compression: Relevance and Impact
Relevance: Table data occurs frequently in computer networks, distributed mobile networks, and telecommunication networks such as the Earth Science Enterprise, Space Science Enterprise, Mars Network, and Space-Based Internets of NASA. Compression and querying of stream data are directly applicable to NASA projects.
Impact: Databases will be compressed on a "query-need" basis. Query engines will be aware of the compression employed and will query the compressed data efficiently.
Page 3
Current State of the Art
Syntactic compression: A large number of techniques exist; they compress byte strings.
Semantic compression (new): Employ data semantics in approximating data. Answer queries with a guaranteed upper bound on the error of approximation.
* Representative tuples and outliers (row-wise relationships)
* Classification and regression trees (column-wise relationships)
* Employ attribute domain information
Page 4
Project Goals
Semantic-based relational database compression:
* High data compression ratios
* Efficient query processing techniques
* User-specified query error bounds
* Suitable for real-time computing (when needed)
* Suitable for time-constrained query processing
Page 5
Details #1: Lossy compression

Relation R
Rid   Age   Salary
r1    20    50K
r2    70    65K
r3    30    40K
r4    40    90K
r5    50    120K
r6    50    145K

Representative Relation P1 with error tolerances t_Age = 10 and t_Salary = 15K
Pid   Age   Salary
p1    20    50K
p2    50    105K

Compressed Relation Rc
Rid   P1.Pid   P1.Signature   P1.Outlier   P2.Pid   P2.Signature   P2.Outlier
r1    p1       YY                          p3       YY
r2    p1       NY             70           p4       NN             65K
r3    p1       YY                          p4       YY
r4    p2       YY                          p5       YY
r5    p2       YY                          p6       YY
r6    p2       YN             145K         p6       YN

Tuple p1 represents the rectangle around the point (20, 50K).
[Figure: rectangle around (20, 50K); Age axis marks 10 and 30, Salary axis marks 20K and 70K]
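
The signature and outlier columns of Rc can be reproduced with a short sketch. This is only an illustration under stated assumptions, not the investigators' algorithm: the slide does not say how a tuple is assigned to a representative, so the code assumes each tuple goes to the representative that matches the most attributes within tolerance (first one wins on a tie). The names `signature` and `compress` are hypothetical, and salaries are written in thousands.

```python
# Relation R and representative relation P1 from the slide (Salary in K).
R = {
    "r1": (20, 50), "r2": (70, 65), "r3": (30, 40),
    "r4": (40, 90), "r5": (50, 120), "r6": (50, 145),
}
P1 = {"p1": (20, 50), "p2": (50, 105)}
TOL = (10, 15)  # t_Age = 10, t_Salary = 15K


def signature(row, rep, tol):
    """One boolean per attribute: True if the value is within tolerance."""
    return tuple(abs(v - r) <= t for v, r, t in zip(row, rep, tol))


def compress(relation, reps, tol):
    """Map each Rid to (Pid, signature string, outlier values).

    Assumption: a tuple is assigned to the representative that covers the
    most attributes; max() keeps the first representative on a tie.
    """
    compressed = {}
    for rid, row in relation.items():
        pid = max(reps, key=lambda p: sum(signature(row, reps[p], tol)))
        sig = signature(row, reps[pid], tol)
        # Attributes outside tolerance are stored exactly as outliers.
        outliers = tuple(v for v, ok in zip(row, sig) if not ok)
        compressed[rid] = (pid, "".join("Y" if ok else "N" for ok in sig), outliers)
    return compressed


RC = compress(R, P1, TOL)
for rid, entry in RC.items():
    print(rid, *entry)
```

With the slide's data this assignment rule yields the same P1 columns as the compressed relation: r2 keeps Age 70 as an outlier and r6 keeps Salary 145K.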
Page 6
Details #2: Multi-level lossy compression

Rid   P1.Pid   P1.Signature   P1.Outlier   P2.Pid   P2.Signature   P2.Outlier
r1    p1       YY                          p3       YY
r2    p1       NY             70           p4       NN             65K
r3    p1       YY                          p4       YY
r4    p2       YY                          p5       YY
r5    p2       YY                          p6       YY
r6    p2       YN             145K         p6       YN

Representative Relation P2
Pid   Age   Salary
p3    20    50K
p4    30    40K

Representative Relation P3
Pid   Age   Salary
p5    40    90K
p6    50    120K

Error tolerances: t_Age = 0, t_Salary = 0
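
One way to read the multi-level table is that each level refines the approximation of the previous one. The sketch below reconstructs a tuple at a chosen level. It rests on assumptions the slides do not state explicitly: that the second-level representatives are p3 to p6 as referenced in the compressed relation, and that an outlier value stored at an earlier level is reused at later levels (which would explain why r6 lists no second-level outlier). The name `reconstruct` and the dictionary layout are hypothetical; salaries are in thousands.

```python
ATTRS = ("Age", "Salary")

# Representative tuples for both levels (Salary in K).
REPS = {
    "p1": (20, 50), "p2": (50, 105),            # level 1: t_Age=10, t_Salary=15K
    "p3": (20, 50), "p4": (30, 40),             # level 2: tolerances 0
    "p5": (40, 90), "p6": (50, 120),
}

# Compressed relation: per Rid, one (Pid, outliers) entry per level.
RC = {
    "r1": [("p1", {}), ("p3", {})],
    "r2": [("p1", {"Age": 70}), ("p4", {"Salary": 65})],
    "r3": [("p1", {}), ("p4", {})],
    "r4": [("p2", {}), ("p5", {})],
    "r5": [("p2", {}), ("p6", {})],
    "r6": [("p2", {"Salary": 145}), ("p6", {})],
}


def reconstruct(rid, level):
    """Approximate tuple `rid` using levels 1..level (level is 1-based)."""
    exact = {}
    for _, outliers in RC[rid][:level]:
        exact.update(outliers)          # outliers are stored exactly
    pid, _ = RC[rid][level - 1]
    rep = dict(zip(ATTRS, REPS[pid]))
    # Use the exact outlier when one is known, else the representative value.
    return tuple(exact.get(a, rep[a]) for a in ATTRS)


print(reconstruct("r3", 1))  # approximation within (10, 15K) of (30, 40K)
print(reconstruct("r3", 2))  # tolerance 0: the original tuple
```

Under these assumptions, level 1 returns values within the P1 tolerances and level 2 returns the original tuples of R.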
Page 7
Details #3
* Monotonically decreasing error bounds:
  Error tolerances t_Age = 10 and t_Salary = 15K
  Error tolerances t_Age = 0 and t_Salary = 0
* Guaranteed query error bounds
* User-specified guaranteed error bounds in queries:
  SELECT … FROM … WHERE … ERROR BOUND Age = and Salary =
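
Because the tolerances decrease from level to level, a query engine could answer a query from the coarsest level that still satisfies the requested error bound. The following is a minimal sketch of that selection step only; the function `pick_level`, the `LEVELS` list, and the rule that an attribute missing from the bound is unconstrained are assumptions for illustration, not part of the slides. Salaries are in thousands.

```python
# Compression levels in order of decreasing tolerance (Salary in K).
LEVELS = [
    ("P1", {"Age": 10, "Salary": 15}),
    ("P2", {"Age": 0, "Salary": 0}),
]


def pick_level(error_bound):
    """Return the first (coarsest) level whose tolerances fit the bound.

    `error_bound` maps attribute names to the maximum error the user
    accepts; attributes not listed are treated as unconstrained.
    """
    for name, tol in LEVELS:
        if all(t <= error_bound.get(attr, float("inf")) for attr, t in tol.items()):
            return name
    raise ValueError("no compression level satisfies the requested bound")


print(pick_level({"Age": 10, "Salary": 20}))  # coarse level is enough
print(pick_level({"Age": 5, "Salary": 20}))   # needs the tolerance-0 level
```

A level with tolerance 0 on every attribute satisfies any non-negative bound, so the lookup always succeeds when such a level exists.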
Page 8
Details #4
Compromise between query processing efficiency and guaranteed error bounds:
* One extreme: main-memory-only query processing; large error bounds; small query processing times.
* Other extreme: disk-based query processing; small error bounds; large query processing times.