An Overview of Different Compression Algorithms and Their Application to Compressing Inverted Files

1 An Overview of Different Compression Algorithms and Their Application to Compressing Inverted Files

2 Alternative Compression Algorithms: arithmetic coding; Huffman coding (character-based or word-based); dictionary-based coding (the Ziv-Lempel family).

3 Pros and Cons of Different Algorithms

                      Arithmetic   Character Huffman   Word Huffman   Ziv-Lempel
Compression ratio     very good    poor                very good      good
Compression speed     slow         fast                fast           very fast
Decompression speed   slow         fast                very fast      very fast
Memory space          low          low                 high           moderate
Pattern matching      no           yes                 yes            yes
Random access         no           yes                 yes            no

4 Choosing a Compression Algorithm for Inverted Files. Factors that need to be considered: compression ratio, speed, and random access. In modern IR systems, word-based Huffman coding is commonly used. There is a lot of research on the Ziv-Lempel family of coding to see whether it can be applied to index compression.
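To make the word-based Huffman idea concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the scheme used by any particular IR system: words are split on whitespace, separators are not encoded, and codes are kept as bit strings rather than packed bits. The function names (`build_word_huffman`, `encode`, `decode`) are made up for this example.

```python
import heapq
from collections import Counter
from itertools import count


def build_word_huffman(text):
    """Return a dict mapping each whitespace-separated word to a bit string."""
    freqs = Counter(text.split())
    if len(freqs) <= 1:
        # Zero or one distinct word: give the single word a one-bit code.
        return {word: "0" for word in freqs}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), {word: ""}) for word, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {word: "0" + code for word, code in left.items()}
        merged.update({word: "1" + code for word, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every word in the text."""
    return "".join(codes[word] for word in text.split())


def decode(bits, codes):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {code: word for word, code in codes.items()}
    words, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            words.append(inverse[current])
            current = ""
    return " ".join(words)
```

Because every codeword ends on a word boundary, a decoder that knows the bit position of a codeword can start there, which is one way to see why the table lists random access as possible for Huffman coding.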

5 An Improved Sliding-window Ziv-Lempel Algorithm. Conventional LZ-family compression algorithms use a sliding-window approach based on the longest matching length (m-length). An improved sliding-window LZ algorithm was proposed by Bender and Wolf. Instead of the m-length, the improved algorithm is based on the offset of the length (o-length) and the differential of the length (Δ-length).
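For reference, the conventional longest-match approach can be sketched as follows. This shows only the baseline sliding-window scheme, not the Bender and Wolf improvement. The window size, the maximum match length, and the function names are assumptions chosen for illustration, and the brute-force search is used for clarity rather than speed.

```python
def lz77_compress(data, window=4096, max_len=18):
    """Greedy sliding-window parse into (offset, length, next_byte) triples."""
    out, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        for j in range(start, i):
            length = 0
            # Leave one byte unmatched so every triple carries a next byte.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Rebuild the original bytes from (offset, length, next_byte) triples."""
    out = bytearray()
    for offset, length, next_byte in triples:
        for _ in range(length):
            out.append(out[-offset])  # byte-by-byte copy handles overlapping matches
        out.append(next_byte)
    return bytes(out)
```

Each triple refers back into text that the decoder has already produced, so decoding has to start from the beginning of the stream. That dependency is the reason random access is hard for this family.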

6 Benefits of the Improved Algorithm. It achieved a better compression ratio in the experiment. Compression and searching are still linear: O(n). However, it does not provide an LZ algorithm that supports random access.

7 Another Modified LZ Algorithm. Proposed by Williams. It uses literal and copy items: at each step it transmits the original data for a literal item, or a pointer for a copy item. It is aimed at faster compression speed and a smaller memory footprint. It is better suited to embedded systems where real-time compression is required, and is inappropriate for index compression.
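The literal/copy item idea can be sketched roughly as below. This is not Williams's algorithm itself: the constants `MIN_MATCH`, `MAX_MATCH`, and `MAX_OFFSET` are illustrative assumptions, a Python dict stands in for a small fixed-size hash table, and items are kept as tuples instead of a packed byte format. What it does show is the trade-off named on the slide: only one earlier position is checked per step, which keeps compression fast and memory small at some cost in compression ratio.

```python
MIN_MATCH = 3      # assumed: shortest run worth emitting as a copy item
MAX_MATCH = 18     # assumed: longest copy length
MAX_OFFSET = 4095  # assumed: how far back a copy item may point


def compress_items(data):
    """Return a list of ("lit", byte) and ("copy", offset, length) items."""
    table = {}  # 3-byte prefix -> most recent position where it started
    items, i = [], 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        candidate = None
        if len(key) == MIN_MATCH:
            candidate = table.get(key)
            table[key] = i
        length = 0
        if candidate is not None and i - candidate <= MAX_OFFSET:
            # Extend the single candidate match as far as it goes.
            while (length < MAX_MATCH and i + length < len(data)
                   and data[candidate + length] == data[i + length]):
                length += 1
        if length >= MIN_MATCH:
            items.append(("copy", i - candidate, length))
            i += length
        else:
            items.append(("lit", data[i]))
            i += 1
    return items


def decompress_items(items):
    """Replay literal and copy items to rebuild the original bytes."""
    out = bytearray()
    for item in items:
        if item[0] == "lit":
            out.append(item[1])
        else:
            _, offset, length = item
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)
```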

8 Conclusion. To date, the best practical compression algorithm for indices is still word-based Huffman coding. There are theoretical studies of the Ziv-Lempel family of coding, but none of them is practically applicable to our problem. They can, however, be used in other areas.

9 References
An Improved Data Compression Algorithm Based on Ziv-Lempel Data Compression Algorithm, Paul Edward Bender and Jack Keil Wolf.
An Extremely Fast Ziv-Lempel Data Compression Algorithm, Ross N. Williams.
Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
