Published by Shavonne Rogers. Modified over 9 years ago.
1
An Overview of Different Compression Algorithms
Their Application to Compressing Inverted Files
2
Alternative Compression Algorithms
Arithmetic coding
Huffman coding
  Character-based
  Word-based
Dictionary-based coding: the Ziv-Lempel family of coding methods
3
Pros and Cons of Different Algorithms
                      Arithmetic   Character Huffman   Word Huffman   Ziv-Lempel
Compression ratio     very good    poor                very good      good
Compression speed     slow         fast                fast           very fast
Decompression speed   slow         fast                very fast      very fast
Memory space          low          low                 high           moderate
Pattern matching      no           yes                 yes            yes
Random access         no           yes                 yes            no
4
Choosing a Compression Algorithm for Inverted Files
Factors to consider:
  Compression ratio
  Speed
  Random access
In modern IR systems, word-based Huffman coding is commonly used.
There is a lot of research on the Ziv-Lempel family of coding methods to see whether they can be applied to index compression.
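To make the idea of word-based Huffman coding concrete, here is a minimal sketch in which whole words, rather than characters, are the symbols of the code. The function names (build_word_huffman, encode, decode) and the whitespace tokenisation are assumptions made for illustration; a real IR system would also have to code separators and store the vocabulary.

```python
import heapq
from collections import Counter
from itertools import count


def build_word_huffman(text):
    """Return a {word: bitstring} prefix code built from word frequencies."""
    freqs = Counter(text.split())  # assumption: words are whitespace-separated
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct word still needs a one-bit code.
        return {next(iter(freqs)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text, code):
    """Concatenate the codewords of the words in text."""
    return "".join(code[w] for w in text.split())


def decode(bits, code):
    """Walk the bitstring, emitting a word whenever a codeword completes."""
    inverse = {c: w for w, c in code.items()}
    words, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            words.append(inverse[current])
            current = ""
    return " ".join(words)
```

Because the code is prefix-free, decoding can start at any codeword boundary, which is one way to see why the slides credit Huffman coding with random access.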
5
An Improved Sliding-Window Ziv-Lempel Algorithm
Conventional LZ-family compression algorithms use a sliding-window approach based on the longest matching length (m-length).
An improved sliding-window LZ algorithm was proposed by Bender and Wolf.
Instead of the m-length, the improved algorithm is based on the offset of the length (o-length) and the differential of the length ( -length).
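The conventional longest-match step can be sketched as follows. This is only the baseline sliding-window search, not the Bender and Wolf improvement; the function name longest_match and the window and max_len defaults are assumptions chosen for illustration, and the search is a naive scan rather than a linear-time one.

```python
def longest_match(data, pos, window=4096, max_len=18):
    """Return (offset, length) of the longest match for data[pos:]
    that starts inside the window before pos, or (0, 0) if none."""
    best_off, best_len = 0, 0
    start = max(0, pos - window)
    for cand in range(start, pos):
        length = 0
        # The match may run past pos (an overlapping match), which is
        # allowed in sliding-window LZ schemes.
        while (length < max_len and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_off, best_len = pos - cand, length
    return best_off, best_len


# Example: at position 3 of b"abcabcabcx" the text repeats "abcabc".
print(longest_match(b"abcabcabcx", 3))  # (3, 6)
```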
6
Benefits of the Improved Algorithm
Better compression ratio in the experiment.
Compression and searching are still linear: O(n).
It does not really provide an LZ algorithm that supports random access.
7
Another Modified LZ Algorithm
Proposed by Williams.
Uses literal and copy items: at each step, it transmits the original data if the item is a literal item, or a pointer if it is a copy item.
Aimed at faster compression speed and a smaller memory footprint.
Best suited to embedded systems where real-time compression is required.
Inappropriate for index compression.
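The literal/copy distinction can be illustrated with a small sketch. It is not Williams's actual algorithm or bit format: the item tuples, the dictionary of 3-byte prefixes, and the min_len, max_len, and window values are all assumptions made here to keep the example short.

```python
def compress(data, min_len=3, max_len=18, window=4095):
    """Turn bytes into a list of ('lit', byte) and ('copy', offset, length) items."""
    last_seen = {}  # 3-byte prefix -> most recent position where it started
    items, pos = [], 0
    while pos < len(data):
        key = data[pos:pos + min_len]
        cand = None
        if len(key) == min_len:
            cand = last_seen.get(key)
            last_seen[key] = pos
        length = 0
        if cand is not None and pos - cand <= window:
            # Extend the match as far as the data and max_len allow.
            while (length < max_len and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
        if length >= min_len:
            items.append(("copy", pos - cand, length))  # pointer into the past
            pos += length
        else:
            items.append(("lit", data[pos]))  # transmit the original byte
            pos += 1
    return items


def decompress(items):
    """Rebuild the bytes from literal and copy items."""
    out = bytearray()
    for item in items:
        if item[0] == "lit":
            out.append(item[1])
        else:
            _, offset, length = item
            for _ in range(length):  # byte by byte so overlapping copies work
                out.append(out[-offset])
    return bytes(out)
```

A single dictionary lookup per position replaces an exhaustive window search, trading some compression ratio for speed and a small memory footprint, which matches the goals stated on the slide.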
8
Conclusion
To date, the best practical compression algorithm for indices is still word-based Huffman coding.
There are theoretical studies of the Ziv-Lempel family of coding methods.
None of them is practically applicable to our problem, but they can be used in other areas.
9
References
An Improved Data Compression Algorithm Based on Ziv-Lempel Data Compression Algorithm, Paul Edward Bender and Jack Keil Wolf
An Extremely Fast Ziv-Lempel Data Compression Algorithm, Ross N. Williams
Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto