NASA Space Communications Symposium
Semantic Data Compression Techniques for Mobile Computing and Stream Data
Principal Investigators: G. Ozsoyoglu, Z.M. Ozsoyoglu
Task Number: NAG3-2578
Case Western Reserve University
September 18, 2002
Semantic Data Compression: Project Overview
Start date: 8/1/2001. End date: 3/31/2003.
Querying Compressed Tables:
    Designing compression-aware query languages
    Compromise between query expressive power and compression efficiency
Querying Compressed Data Streams:
    Real-time, one-pass-only stream querying and compression efficiency
Semantic Data Compression: Enterprise Relevance and Impact
Enterprise Relevance: Table and stream data occur frequently in computer networks, distributed mobile networks, and telecommunication networks, such as those of NASA's Earth Science Enterprise, Space Science Enterprise, Mars Network, and Space-Based Internets. Compression and querying of stream data are directly applicable to NASA projects.
Impact: Databases will be compressed on a “query-need” basis. Query engines will be aware of the compression employed and will query the compressed data efficiently.
Milestones - Technical Accomplishments and Schedules

Milestone   Due Date   Description                                      Tech Accomplishments   Schedule Status / Schedule Deviation
1           10/2001    Survey table compression techniques.             Report generated.      Completed
2           10/2002    Compression-aware query processing algorithms    In progress.           On schedule
A large number of compression techniques exist.
Syntactic compression: Compress byte strings.
Semantic compression: Employ data semantics in approximating data; answer queries with a guaranteed upper bound on the error of approximation.
    Representative tuples and outliers (row-wise relationships)
    Classification and regression trees (column-wise relationships)
    Employ attribute domain information.
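The "representative tuples and outliers" idea with a guaranteed error bound can be illustrated with a minimal sketch. This is not the project's algorithm: the function names `compress_with_bound` and `approximate`, the greedy assignment strategy, and the sample readings are all assumptions made for illustration. Each value is replaced by a pointer to a representative that lies within `eps` of it; a value with no nearby representative is stored exactly, which is how an outlier is handled here.

```python
from typing import List, Tuple


def compress_with_bound(values: List[float], eps: float) -> Tuple[List[float], List[int]]:
    """Greedy sketch: map each value to a representative within eps of it.

    A value that is farther than eps from every existing representative
    becomes a new representative (an outlier is therefore stored exactly).
    """
    representatives: List[float] = []
    encoded: List[int] = []  # one index into `representatives` per input value
    for v in values:
        for i, r in enumerate(representatives):
            if abs(v - r) <= eps:
                encoded.append(i)
                break
        else:
            representatives.append(v)
            encoded.append(len(representatives) - 1)
    return representatives, encoded


def approximate(representatives: List[float], encoded: List[int]) -> List[float]:
    """Reconstruct the approximate column from the compressed form."""
    return [representatives[i] for i in encoded]


# Hypothetical sensor readings; 35.0 plays the role of an outlier.
readings = [20.1, 20.3, 19.9, 35.0, 20.2, 20.0]
eps = 0.5
representatives, encoded = compress_with_bound(readings, eps)
approx = approximate(representatives, encoded)

# Every reconstructed value is within eps of the original, so an
# average computed on the approximation is also within eps of the truth.
max_error = max(abs(a - b) for a, b in zip(approx, readings))
avg_error = abs(sum(approx) / len(approx) - sum(readings) / len(readings))
```

Because the per-value error never exceeds `eps`, aggregate queries such as an average inherit the same bound, which is the kind of guarantee the slide refers to.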
Given a compressed database DB and a query Q, evaluate Q:
    On DB without decompressing DB; decompress the output. Best for existing query engines; low compression ratio.
    By first decompressing selected relations/columns. Cost: Rewriting tables before Q evaluation.
    By decompressing tuple components (selectively) during query evaluation. Cost: On a per-query basis. Requires query engine changes and fast random decompression.
Algebraic Laws: Commutativity
    Op(DeCmp(T)) =? DeCmp(Op(T))
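The commutativity question Op(DeCmp(T)) =? DeCmp(Op(T)) can be checked on a small case. The sketch below assumes a dictionary-encoded column as the compression scheme and an equality selection as Op; both choices, and all function names, are illustrative assumptions rather than the scheme studied in the project.

```python
from typing import List, Tuple

# A compressed column: (sorted dictionary of distinct values, one code per row).
Compressed = Tuple[List[str], List[int]]


def compress(column: List[str]) -> Compressed:
    """Dictionary-encode a column (assumed compression scheme)."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def decompress(table: Compressed) -> List[str]:
    """DeCmp: expand codes back into values."""
    dictionary, codes = table
    return [dictionary[code] for code in codes]


def select_plain(column: List[str], value: str) -> List[str]:
    """Op on uncompressed data: keep rows equal to `value`."""
    return [v for v in column if v == value]


def select_compressed(table: Compressed, value: str) -> Compressed:
    """Op on compressed data: compare integer codes, never expand rows."""
    dictionary, codes = table
    if value not in dictionary:
        return dictionary, []
    target = dictionary.index(value)
    return dictionary, [code for code in codes if code == target]


column = ["mars", "earth", "mars", "moon", "earth", "mars"]
T = compress(column)

lhs = select_plain(decompress(T), "mars")        # Op(DeCmp(T))
rhs = decompress(select_compressed(T, "mars"))   # DeCmp(Op(T))
commutes = lhs == rhs
```

For this lossless scheme and this operator the two sides agree, so the selection can run on the compressed form and only the output needs decompressing. Whether the law holds for lossy semantic compression or for other operators is exactly the open question the slide raises.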
Semantic Data Compression Funding Issues This is a research initiation project with a two-year funding of $35,869. There are no funding issues.