Angela McCarthy CP5080, SP1 2010. Received: 14 August 2008. Revised: 13 November 2008. Written by Sherif Sakr of the University of New South Wales, Australia.


1 Angela McCarthy CP5080, SP1 2010

2
• Received: 14 August 2008; revised: 13 November 2008
• Written by Sherif Sakr of the University of New South Wales, Australia
• eXtensible Markup Language (XML) is a standard for data representation over the World Wide Web
• XML documents can be very large, so compression techniques have been introduced to deal with the resulting size issues
• The paper provides a survey of XML compression techniques

3
• The author examines XML compression techniques and conducts an experimental study
  ◦ Surveys each of the different compression techniques and compares the advantages and disadvantages of each
• XML data transmitted online can be very large
  ◦ XML usage is growing, so there is a demand for efficient XML compression tools

4
• Contributions made:
  ◦ A comprehensive survey of XML compression techniques
  ◦ A rich XML corpus, collected and constructed for the study
    ▪ Contains a wide variety of XML data sources, data natures and document sizes
  ◦ Detailed results examining the performance and characteristics of the compressors
  ◦ The work is repeatable
    ▪ The study's web page provides access to the test files, the examined XML compressors and the detailed results of the study

5
• The survey goes through each classification of compressors in turn
• General Text Compressors
  ◦ Treat XML as plain text and use traditional text compression techniques
• XML Conscious Compressors
  ◦ Take advantage of their awareness of the XML format
  ◦ Use the document structure to achieve better compression ratios
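
To make the distinction concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not code from the paper or from XMill: the "general text" path feeds the whole document to zlib, while the "XML conscious" path first separates the tag structure from the data values and groups values by element path (in the spirit of XMill's containers) before compressing each group. Attributes, mixed content and whitespace are ignored, and the function names are hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def general_text_compress(xml_text: str) -> bytes:
    """General text compressor: the XML is treated as an opaque string."""
    return zlib.compress(xml_text.encode("utf-8"), 9)


def split_containers(xml_text: str) -> dict[str, list[str]]:
    """Toy XMill-style split (assumption, not XMill's real format).

    Produces one '#structure' stream of tag tokens plus one container of text
    values per element path. Attributes, tail text and whitespace-only text
    are ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    containers: dict[str, list[str]] = defaultdict(list)
    structure: list[str] = []

    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        structure.append(elem.tag)  # start-of-element token
        if elem.text and elem.text.strip():
            containers[path].append(elem.text.strip())
        for child in elem:
            walk(child, path)
        structure.append("/")  # end-of-element token

    walk(root, "")
    containers["#structure"] = structure
    return dict(containers)


def xml_conscious_compress(xml_text: str) -> dict[str, bytes]:
    """Compress each container on its own, so similar values sit next to each other."""
    return {
        name: zlib.compress("\n".join(values).encode("utf-8"), 9)
        for name, values in split_containers(xml_text).items()
    }


sample = (
    "<books><book><title>A</title><year>2008</year></book>"
    "<book><title>B</title><year>2009</year></book></books>"
)
print(split_containers(sample)["/books/book/year"])  # ['2008', '2009']
print(len(general_text_compress(sample)),
      sum(len(b) for b in xml_conscious_compress(sample).values()))
```

On a tiny sample like this the container version can come out larger, because every separate zlib stream carries its own header; grouping similar values only pays off on larger documents with many repeated values.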

6
• Non-Queriable (Archival) XML Compressors
  ◦ Queries cannot be processed over the compressed format
  ◦ Focus: achieving the highest compression ratio
• Queriable XML Compressors
  ◦ Queries can be processed over the compressed format
  ◦ Compression ratio is generally worse than that of archival XML compressors
  ◦ Focus: avoiding full document decompression during query execution
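
A minimal Python sketch of the queriable idea, assuming simple per-container dictionary encoding (a swapped-in illustration, not the scheme of any particular compressor in the survey): each distinct value gets an integer code, and an equality query encodes its constant once and compares codes, so no stored value is decompressed. The integer list stands in for the compressed representation; a real system would pack the codes into a few bits each. All function names are hypothetical.

```python
def build_dictionary(values: list[str]) -> dict[str, int]:
    """Give each distinct value an integer code, in order of first appearance."""
    codes: dict[str, int] = {}
    for value in values:
        codes.setdefault(value, len(codes))
    return codes


def encode(values: list[str], codes: dict[str, int]) -> list[int]:
    """Replace every value by its code (stand-in for the compressed container)."""
    return [codes[value] for value in values]


def decode(encoded: list[int], codes: dict[str, int]) -> list[str]:
    """Full decompression, needed only when values must be returned as text."""
    reverse = {code: value for value, code in codes.items()}
    return [reverse[code] for code in encoded]


def equality_query(encoded: list[int], codes: dict[str, int], constant: str) -> list[int]:
    """Positions whose value equals `constant`, found by comparing codes only."""
    target = codes.get(constant)
    if target is None:
        return []  # the constant never occurs in this container
    return [i for i, code in enumerate(encoded) if code == target]


cities = ["Sydney", "Perth", "Sydney", "Hobart"]
codes = build_dictionary(cities)
encoded = encode(cities, codes)
print(encoded)                                   # [0, 1, 0, 2]
print(equality_query(encoded, codes, "Sydney"))  # [0, 2]
```

The trade-off on the slide shows up even here: an archival compressor could squeeze the container harder (for example with a context model over the whole stream), but then every query would need a full decompression first.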

7

8

9
• Large variety of data sets (see previous slide)
  ◦ From 0.5 MB to 1.3 GB
  ◦ Four categories
    ▪ Structural documents
    ▪ Textual documents
    ▪ Regular documents
    ▪ Irregular documents
• Testing environments
  ◦ To check the consistency of the results, two different environments were used: one high-end and one low-end

10
• Performance metrics measured and compared
  ◦ Compression Ratio
    ▪ Ratio between the compressed and uncompressed sizes
    ▪ Compression Ratio = (Compressed Size) / (Uncompressed Size)
  ◦ Compression Time
    ▪ Elapsed time during the compression process
  ◦ Decompression Time
    ▪ Elapsed time during the decompression process
• The lower the metric value, the better the compressor
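
The three metrics can be sketched in Python as follows. This is an assumption-laden illustration, not the study's benchmarking harness: gzip and bzip2 from the Python standard library stand in for the tools on the slides (PPM and the XML conscious compressors are not available there), the test document is made up, and each timing is a single run rather than an average.

```python
import bz2
import gzip
import time
from typing import Callable


def measure(data: bytes,
            compress: Callable[[bytes], bytes],
            decompress: Callable[[bytes], bytes]) -> dict[str, float]:
    """Return compression ratio, compression time and decompression time (seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    ct = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = decompress(packed)
    dct = time.perf_counter() - start

    if unpacked != data:
        raise ValueError("round trip failed: compressor is not lossless")
    # Compression ratio = compressed size / uncompressed size, so lower is better
    return {"CR": len(packed) / len(data), "CT": ct, "DCT": dct}


# Standard-library stand-ins for the compressors named on the slides
COMPRESSORS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}

# Made-up, highly repetitive XML document used only for illustration
doc = ("<log>" + "<entry><level>INFO</level><msg>ok</msg></entry>" * 2000
       + "</log>").encode("utf-8")

for name, (comp, decomp) in COMPRESSORS.items():
    m = measure(doc, comp, decomp)
    print(f"{name}: CR={m['CR']:.4f}  CT={m['CT'] * 1000:.2f} ms  "
          f"DCT={m['DCT'] * 1000:.2f} ms")
```

Checking the round trip inside `measure` keeps a broken or lossy codec from producing a misleadingly good ratio.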

11
• 11 XML compressors evaluated
  ◦ Three general-purpose text compressors
    ▪ gzip, bzip2, PPM
  ◦ Eight XML conscious compressors
    ▪ XMillGzip, XMillBzip, XMillPPM, XMLPPM, SCMPPM, XWRT, AXECHOP
  ◦ Compressors evaluated under their default settings
  ◦ Additional experiments run with parameters tuned for the highest level of compression
  ◦ In total, 16 compressor variants

12
• Ideally, the study would provide a global ranking of XML compression tools
• The results show there is no clear winner
  ◦ The ranking depends on the weight given to each metric
• Three ranking functions
  ◦ WF1 = (1/3 × CR) + (1/3 × CT) + (1/3 × DCT)
  ◦ WF2 = (1/2 × CR) + (1/4 × CT) + (1/4 × DCT)
  ◦ WF3 = (3/5 × CR) + (1/5 × CT) + (1/5 × DCT)
• CR represents the compression ratio metric, CT the compression time metric and DCT the decompression time metric
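
The ranking functions can be sketched in Python. Two assumptions are made here and are not taken from the paper: each metric is normalised by dividing by its maximum across compressors (so seconds and ratios become comparable), and the compressors "A" and "B" with their measurements are hypothetical. Because lower metric values are better, the compressor with the lowest weighted score wins.

```python
Metrics = tuple[float, float, float]  # (CR, CT, DCT)

WEIGHTS: dict[str, Metrics] = {
    "WF1": (1 / 3, 1 / 3, 1 / 3),
    "WF2": (1 / 2, 1 / 4, 1 / 4),
    "WF3": (3 / 5, 1 / 5, 1 / 5),
}


def normalise(results: dict[str, Metrics]) -> dict[str, Metrics]:
    """Divide each metric by its maximum across compressors (assumed normalisation)."""
    maxima = [max(r[i] for r in results.values()) for i in range(3)]
    return {name: tuple(v / m for v, m in zip(r, maxima))
            for name, r in results.items()}


def score(metrics: Metrics, weights: Metrics) -> float:
    """Weighted sum of the normalised metrics; lower is better."""
    return sum(w * v for w, v in zip(weights, metrics))


def best(results: dict[str, Metrics], wf: str) -> str:
    """Name of the compressor with the lowest score under ranking function `wf`."""
    norm = normalise(results)
    return min(norm, key=lambda name: score(norm[name], WEIGHTS[wf]))


# Hypothetical measurements: (CR, CT in seconds, DCT in seconds)
results = {"A": (0.10, 4.0, 2.0), "B": (0.30, 2.0, 1.0)}
for wf in WEIGHTS:
    print(wf, best(results, wf))
```

With these made-up numbers the faster compressor B wins under WF1, while A (better ratio, slower) wins under WF2 and WF3, which illustrates why the slide concludes there is no single winner.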

13

14

15

16
• The paper surveyed state-of-the-art XML compression techniques
• It reported the behaviour of various XML compressors using a large corpus of XML documents
• The paper could be valuable for
  ◦ Developers of new XML compression tools
  ◦ Users making an effective decision on the compressor most suitable for their requirements
• Fig. 7 shows that none of the XML conscious compressors achieved an outstanding compression ratio

17

18
• The author plans to continue maintaining and updating the study's web page with further evaluations
• The aim is to enable visitors to perform online experiments using the set of available compressors and their own XML documents

19
• Large number of references
  ◦ Due to the many different compression techniques covered
• Large amount of data
• Thorough research methods
  ◦ Large amount of data tested
  ◦ Tested on different systems
  ◦ Tested using different techniques
• Abbreviations/acronyms are given
  ◦ Designed for a specific audience
• The paper seems to be a reference tool
  ◦ For users to read when deciding which compression tool to use

20 Thanks for listening!

