
Thesis Presented By Mohammad Abul Kalam Azad C011054 Shabbir Ahmad C011051 Francis Palma Tony C013038 Supervised by S. M. Kamruzzaman Assistant.




1 Thesis Presented By Mohammad Abul Kalam Azad (C011054), Shabbir Ahmad (C011051), Francis Palma Tony (C013038). Supervised by S. M. Kamruzzaman, Assistant Professor, Department of Computer Science and Engineering, International Islamic University Chittagong

2 An Efficient Technique for Text Compression

3 Summary of Presentation
☼ Introduction. ☼ Methodology. ☼ How Many Words. ☼ Word Lookup Table. ☼ Word Storing Architecture. ☼ Memory Allocation Method. ☼ Memory Space Requirement. ☼ Data Management Algorithm. ☼ Data Compression. ☼ Experimental Result. ☼ Conclusions.

4 Introduction ☼ Data Storage. ☼ Information Management Using Symbol.
☼ Secret Data Communication. ☼ Storage Requirement Reduction Problem. ☼ Reduction. ☼ Compression.

5 Methodology The data compression will be done in two phases:
☼ Reduction Using Lookup Table. ☼ Compression Using Deflate Algorithm.

6 How Many Words in English
Webster’s Third New International Dictionary has 470,000 entries. The Oxford English Dictionary, Second Edition, reports a similar number. Including: abbreviations, phrases, taboo words, dialects, family words. Excluding: names of entirely scientific terms.

7 Word Lookup Table A word lookup table is a tabular data file that stores the text of each word as an attribute of an address; given an address, the table returns the text to display.

8 Word Lookup Table A lookup table is defined by its addressing capability, expressed in bits. 2^19 = 524,288, which covers the 470,000 entries. That is, we need a 19-bit lookup table.
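
The address width can be checked with a short calculation. This is a minimal sketch: the entry count of 470,000 comes from the dictionary figure quoted earlier, and the helper name `address_bits` is hypothetical.

```python
def address_bits(entries: int) -> int:
    """Smallest number of bits that gives every entry a distinct address."""
    # Addresses run from 0 to entries - 1, so the width is set by the largest one.
    return max(1, (entries - 1).bit_length())


bits = address_bits(470_000)   # dictionary size quoted on the slides
capacity = 2 ** bits           # how many addresses that width can express
print(bits, capacity)
```

The spare capacity above 470,000 is what leaves room for reserved addresses such as case and punctuation handling.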

9 Word Storing Architecture
Index Address | Stored Word in Details | Ending Signal
19-bit address | 7-character-long word | Termination signal
19 bit | 7*7 = 49 bit | 7 bit
Total: 75 bit
Figure 1. The architecture of the stored word in lookup table
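
The 75-bit layout can be sketched as bit packing. The field widths come from Figure 1; everything else is an assumption made for illustration: 7-bit ASCII characters, NUL padding for words shorter than 7 characters, the termination value `0x7F`, and the function names. The slides do not say how words longer than 7 characters are stored, so this sketch simply rejects them.

```python
ADDRESS_BITS = 19          # index address field (from Figure 1)
CHAR_BITS = 7              # one 7-bit character
WORD_CHARS = 7             # characters per stored word (from Figure 1)
END_BITS = 7               # termination signal field (from Figure 1)
RECORD_BITS = ADDRESS_BITS + CHAR_BITS * WORD_CHARS + END_BITS

END_SIGNAL = 0x7F          # assumed value; the slides do not give one


def pack_record(address: int, word: str) -> int:
    """Pack address, padded word and end signal into one integer."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 19 bits")
    if len(word) > WORD_CHARS or not word.isascii():
        raise ValueError("sketch only handles ASCII words of up to 7 characters")
    record = address
    for ch in word.ljust(WORD_CHARS, "\0"):   # NUL padding is an assumption
        record = (record << CHAR_BITS) | ord(ch)
    return (record << END_BITS) | END_SIGNAL


def unpack_record(record: int):
    """Inverse of pack_record: returns (address, word, end_signal)."""
    end = record & (2 ** END_BITS - 1)
    record >>= END_BITS
    chars = []
    for _ in range(WORD_CHARS):
        chars.append(chr(record & (2 ** CHAR_BITS - 1)))
        record >>= CHAR_BITS
    word = "".join(reversed(chars)).rstrip("\0")
    return record, word, end


packed = pack_record(5, "and")
print(RECORD_BITS, unpack_record(packed))
```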

10 Word Storing Architecture
Table 1. Words in word lookup table
Columns: Index address (19-bit address) | Word stored in details | Ending signal
Words stored: a, .., and, Computer, ..., Intelligent

11 Word Storing Architecture
Table 2. Special situation handling addresses
Index address | Word stored in details | Ending signal
01--10 | Termination of address |
01--01 | Single Uppercase |
 | Multiple Uppercase |
01--11 | Multiple Uppercase Ter. |
01--00 | Title Case |
 | Single tOGGLE cASE |
 | Multiple tOGGLE cASE |
 | Multiple tOG. cAS. Ter. |

12 Word Storing Architecture
Table 3. Entry of different punctuation signs
Index address | Word stored in details | Ending signal
01--00 | . |
01--01 | .+Backspace |
11--10 | , |
11--11 | ,+Backspace |

13 Example
Example 1
Gen.: He is a very good boy . (176 bits)
LUT: 19 bits per entry (133 bits)
Example 2
Gen.: Sometimes I need some helping too . (248 bits)
LUT: 19 bits per entry (133 bits)

14 Example
Example 3
Gen.: Although computers may have basic similarities , (376 bits)
LUT: 19 bits per entry (133 bits)
Example 4
Gen.: Several systematic tabular methods For machine reduction exits . (512 bits)
LUT: 19 bits per entry (171 bits)
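
The bit counts in these examples can be reproduced with a small calculation. This is a sketch under stated assumptions: plain text costs 8 bits per character, every word and punctuation sign is found in the lookup table and costs one 19-bit address, and case-handling addresses are not counted (which is what the slide totals imply). The function names and the tokenizing regular expression are illustrative only.

```python
import re

PLAIN_BITS_PER_CHAR = 8    # assumed 8-bit character storage
LUT_BITS_PER_TOKEN = 19    # one lookup-table address per word or punctuation sign


def plain_bits(text: str) -> int:
    """Storage for the text as ordinary characters, spaces included."""
    return PLAIN_BITS_PER_CHAR * len(text)


def lut_bits(text: str) -> int:
    """Storage when every word and punctuation sign becomes one address."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return LUT_BITS_PER_TOKEN * len(tokens)


for sentence in ("He is a very good boy.",
                 "Although computers may have basic similarities,"):
    print(sentence, plain_bits(sentence), lut_bits(sentence))
```

The saving grows with average word length, since a 12-letter word and a 1-letter word cost the same 19 bits.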

15 Memory Space Requirement
Space = 2^19 * 75 bits = 524,288 * 75 bits = 39,321,600 bits = 4,915,200 Bytes = 4800 Kilo Bytes = 4.6875 Mega Bytes
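
The same arithmetic as a short script, using the 19-bit address space and 75-bit record from the earlier slides and assuming 1 Kilo Byte = 1024 Bytes (which is what the 4800 Kilo Bytes figure implies):

```python
entries = 2 ** 19               # every address of a 19-bit lookup table
record_bits = 75                # 19-bit address + 49-bit word + 7-bit end signal

total_bits = entries * record_bits
total_bytes = total_bits // 8   # exact: the bit total is divisible by 8
kilo_bytes = total_bytes / 1024
mega_bytes = kilo_bytes / 1024

print(total_bits, total_bytes, kilo_bytes, mega_bytes)
```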

16 Memory Allocation Method
Table 4. The Hash table Index address Starting Character a and ..... buy ... yolk zoo

17 Memory Allocation Method
Figure 2: How the word lookup table will be stored
Sample entries: 11--01 and | 11--10 Andiron | …… | 00--01 sun | 00--10 Sun-bath
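
One way to read the hash table is as an index from a starting character to the first address of that letter's block in a sorted lookup table. The sketch below is an assumption about that design, not the thesis implementation: the function names, the use of list positions as addresses, and the binary search inside each block are all illustrative.

```python
from bisect import bisect_left


def build_tables(words):
    """Return (sorted lookup table, hash table of first address per letter)."""
    lut = sorted(set(words))
    hash_table = {}
    for address, word in enumerate(lut):
        # Keep only the first address seen for each starting character.
        hash_table.setdefault(word[0], address)
    return lut, hash_table


def find_address(word, lut, hash_table):
    """Jump to the letter's block via the hash table, then search within it."""
    start = hash_table.get(word[:1])
    if start is None:
        return None
    pos = bisect_left(lut, word, lo=start)
    if pos < len(lut) and lut[pos] == word:
        return pos
    return None


lut, hash_table = build_tables(["zoo", "and", "a", "buy", "yolk", "sun"])
print(hash_table, find_address("buy", lut, hash_table))
```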

18 Data Management Algorithm
Algorithm UnRedToRed( )
1. Read file.
2. Read characters to form a word until empty.
3. Find its appropriate address from the Hash table.
4. Find the word in the Lookup Table.
5. If found then
6.     Check case
7.     If case = lower then
8.         Fetch address
9.     else
10.        Do the case management
11.        Fetch address
12.    Print the address
13. else
14.    Give termination symbol.
15.    Start ASCII storage (word)
16. Go to step 1.
End.
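
A minimal Python sketch of the reduction direction, under stated assumptions. The toy `LUT`, the function name `reduce_text`, and the output format (tuples tagged `"addr"`, `"case"`, `"raw"`) are hypothetical stand-ins for the 19-bit addresses, the special-situation addresses, and the termination-symbol-plus-ASCII fallback. Toggle case is not handled here and falls back to raw storage.

```python
import re

# Toy lookup table; a real one would hold the full dictionary.
LUT = {"he": 0, "is": 1, "a": 2, "very": 3, "good": 4, "boy": 5, ".": 6}


def reduce_text(text, lut=LUT):
    """Replace each word or punctuation sign with a lookup-table address."""
    codes = []
    for token in re.findall(r"\w+|[^\w\s]", text):
        address = lut.get(token.lower())
        if address is None:
            # Not in the table: stands in for termination symbol + ASCII storage.
            codes.append(("raw", token))
        elif token == token.lower():
            codes.append(("addr", address))
        elif token.istitle():
            # Case management: emit a case marker before the address.
            codes.append(("case", "title"))
            codes.append(("addr", address))
        elif token.isupper():
            codes.append(("case", "upper"))
            codes.append(("addr", address))
        else:
            # Mixed case is outside this sketch; store the word as-is.
            codes.append(("raw", token))
    return codes


print(reduce_text("He is a very good boy."))
```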

19 Data Management Algorithm
Algorithm RedToUnRed ( )
1. Read file.
2. Fetch address.
3. Check address status.
4. If word then
5.     Print the word.
6. If situation handling then
7.     Do according to it.
8. Go to step 2.
End.
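
A matching sketch of the restoring direction, again under assumptions. The input is a list of tagged tuples (`"addr"` for a word address, `"case"` for a situation-handling marker, `"raw"` for a word stored as plain characters); this format, the toy `WORDS` list, and the spacing rule (a space before every token except punctuation) are hypothetical and only illustrate the control flow of the algorithm.

```python
# Toy lookup table, indexed by address.
WORDS = ["he", "is", "a", "very", "good", "boy", "."]


def restore_text(codes, words=WORDS):
    """Turn a sequence of tagged codes back into text."""
    parts = []
    pending_case = None
    for kind, value in codes:
        if kind == "case":
            # Situation handling: remember it and apply it to the next word.
            pending_case = value
            continue
        if kind == "addr":
            token = words[value]
            if pending_case == "title":
                token = token.capitalize()
            elif pending_case == "upper":
                token = token.upper()
        elif kind == "raw":
            token = value
        else:
            raise ValueError(f"unknown code kind: {kind!r}")
        pending_case = None
        if parts and token.isalnum():
            parts.append(" ")   # assumed spacing rule: no space before punctuation
        parts.append(token)
    return "".join(parts)


codes = [("case", "title"), ("addr", 0), ("addr", 1), ("addr", 2),
         ("addr", 3), ("addr", 4), ("addr", 5), ("addr", 6)]
print(restore_text(codes))
```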

20 Data Compression Methods
☼ Lossy Data Compression. ☼ Lossless Data Compression.

21 Lossless Data Compression
☼ Run Length Encoding (RLE). ☼ Huffman Coding. ☼ Lempel-Ziv 77 Encoding (LZ77). ☼ Deflate Algorithm.

22 Run Length Encoding
Format: <Esc> Specific Character <Frequency>
Before: XXXXXXXXXXXXXXX that’s all, folks!
After: <Esc> X <15> that’s all, folks!
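
A minimal run-length encoder in the `<Esc> character <frequency>` form shown above. The details are assumptions for illustration: the escape is the ASCII ESC character, the frequency is stored as a single character code (so runs are capped at 255), only runs of 4 or more are encoded, and input containing the escape character is rejected rather than escaped.

```python
from itertools import groupby

ESC = "\x1b"      # assumed escape marker
MIN_RUN = 4       # shorter runs are cheaper to leave as literals
MAX_RUN = 255     # the count is stored as one character code


def rle_encode(text: str) -> str:
    if ESC in text:
        raise ValueError("sketch does not handle input containing the escape")
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        while n >= MIN_RUN:
            chunk = min(n, MAX_RUN)
            out.append(ESC + ch + chr(chunk))   # <Esc> character <frequency>
            n -= chunk
        out.append(ch * n)                      # leftover short run as literals
    return "".join(out)


def rle_decode(data: str) -> str:
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            out.append(data[i + 1] * ord(data[i + 2]))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)


original = "X" * 15 + " that's all, folks!"
encoded = rle_encode(original)
print(len(original), len(encoded))
```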

23 Huffman Coding
Let a sequence of characters with their frequencies be: A B C D E F G
Finally we get a new binary code for each character: F : B : 01 C : A : 101 G : D : 1110 E : 1111
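
A compact Huffman code builder using a priority queue. The frequencies in the usage example are hypothetical, since the slide's own values did not survive in the transcript, so the resulting codes are not expected to match the slide. The function name is also illustrative.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Map each symbol to a prefix-free bit string built from its frequency."""
    tiebreak = count()   # unique counter so heap entries never compare nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Hypothetical frequencies for the seven characters on the slide.
example = {"A": 10, "B": 20, "C": 12, "D": 5, "E": 4, "F": 25, "G": 9}
codes = huffman_codes(example)
print(codes)
```

Frequent characters end up with shorter codes, and no code is a prefix of another, so the bit stream can be decoded without separators.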

24 Lempel-Ziv 77 Encoding
Input: the_rain_in_Spain_falls_mainly_in_the_plain
the_rain_
the_rain_<3,3>
So, in binary, the pointer <3,3> would look like this:
the_rain_<3,3>Sp
the_rain_<3,3>Sp<9,4>
the_rain_<3,3>Sp<9,4>falls_m<11,3>
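
A greedy LZ77 sketch that emits `<distance,length>` pointers in the notation used above. The assumptions are labeled in the code: a minimum match length of 3, the nearest match winning ties, and a brute-force search rather than the hash chains a production encoder would use. The function names are illustrative.

```python
def lz77_encode(text, window=32768, min_len=3):
    """Return a list of literal characters and (distance, length) pointers."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Scan candidates from nearest to farthest; only a strictly longer
        # match replaces the current best, so the nearest match wins ties.
        for j in range(i - 1, max(0, i - window) - 1, -1):
            k = 0
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:          # assumed minimum worthwhile match
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz77_decode(tokens):
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            for _ in range(length):
                out.append(out[-dist])   # copy one character at a time
        else:
            out.append(token)
    return "".join(out)


def render(tokens):
    """Show pointers as <distance,length>, literals as themselves."""
    return "".join(f"<{t[0]},{t[1]}>" if isinstance(t, tuple) else t
                   for t in tokens)


sample = "the_rain_in_Spain_falls_mainly_in_the_plain"
tokens = lz77_encode(sample)
print(render(tokens))
```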

25 Deflate Algorithm
Combination of Huffman coding and Lempel-Ziv 77 encoding
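
Deflate is available in Python's standard `zlib` module, which makes a quick round trip easy to try. The sample text below is made up for illustration and is not the thesis test data, so the sizes it prints say nothing about the reported results.

```python
import zlib

# Made-up, repetitive sample text; real text compresses less dramatically.
text = ("the rain in Spain falls mainly in the plain. " * 20).encode("ascii")

packed = zlib.compress(text, 9)      # 9 = highest compression level
restored = zlib.decompress(packed)

print(len(text), len(packed))
```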

26 Experimental Result

27 Comparison with other methods
☼ In general, compression rates range from 12% up to 50%. ☼ The proposed method achieves a 53% reduction.

28 Comparison with other zip software
Compression Type | Size
Normal | 78.53 KB
According to Proposed Method + Deflate Algorithm | 14.38 KB
Gzip | 29.61 KB
Winzip | 31.27 KB
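
The sizes in this comparison can be turned into percentage reductions relative to the uncompressed file. The sizes are the ones reported on the slide; the helper name `reduction_percent` is hypothetical.

```python
ORIGINAL_KB = 78.53   # "Normal" size from the slide

compressed_kb = {
    "Proposed Method + Deflate": 14.38,
    "Gzip": 29.61,
    "Winzip": 31.27,
}


def reduction_percent(original: float, compressed: float) -> float:
    """Share of the original size that compression removed, in percent."""
    return (1 - compressed / original) * 100


reductions = {name: reduction_percent(ORIGINAL_KB, size)
              for name, size in compressed_kb.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}% smaller")
```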

29 Comparison with other zip software

30 Conclusion ☼ Text Data Storage Reduced to 53%.
☼ After compression: 75% - 80%. ☼ Faster. ☼ Portable.

31 Questions

32 Thank You

