Published by Mitchell Underwood. Modified over 9 years ago.
1
Angela McCarthy CP5080, SP1 2010
2
Received: 14 August 2008; Revised: 13 November 2008
Written by Sherif Sakr of University of New South Wales, Australia
eXtensible Markup Language (XML) is a standard for data representation over the World Wide Web
XML documents are large, so compression was introduced to deal with the resulting issues
Paper provides a survey of XML compression techniques
3
Author examines XML compression techniques and launches a study
◦ Surveys each of the different compression techniques and compares the advantages and disadvantages of each
Data transmitted online is rather large
◦ XML usage is growing, thus a demand for efficient XML compression tools exists
4
Contributions made:
◦ Comprehensive survey of XML compression techniques
◦ A rich XML corpus collected and constructed
   Contains wide variety of XML data sources, natures and document sizes
◦ Detailed results examining performance and characteristics
◦ Work repeatable
   Webpage of study provides access to test files, examined XML compressors and detailed results of study
5
Each section of the paper covers one of the classifications of compressors
General Text Compressors
◦ Treat XML as plain text, use traditional text compression techniques
XML Conscious Compressors
◦ Take advantage of awareness of the XML format
◦ Use document structure to achieve better compression rates
6
Non-Queriable (Archival) XML Compressors
◦ No queries can be processed over compressed format
◦ Focus is to achieve highest compression ratio
Queriable XML Compressors
◦ Queries can be processed over compressed format
◦ Compression ratio actually worse than archival XML compressors
◦ Focus is to avoid full document decompression during query execution
9
Large variety of data sets (see previous)
◦ From 0.5MB to 1.3GB
◦ Four Categories
   Structural Documents
   Textual Documents
   Regular Documents
   Irregular Documents
Testing Environments
◦ To ensure consistency, two different environments were used, high vs. low
10
Performance Metrics measured and compared
◦ Compression Ratio
   Ratio between sizes of compressed and uncompressed documents
   Compression Ratio = (Compressed Size) / (Uncompressed Size)
◦ Compression Time
   Elapsed time during compression process
◦ Decompression Time
   Elapsed time during decompression process
The lower the metric value, the better the compressor
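The three metrics on this slide can be illustrated with a minimal Python sketch. It is not the paper's test harness: the `measure` helper, the `sample` document and the use of Python's standard-library `gzip` and `bz2` modules are assumptions made for illustration only.

```python
import bz2
import gzip
import time


def measure(compress, decompress, data: bytes) -> dict:
    """Return the three slide metrics for one compressor on one document."""
    start = time.perf_counter()
    packed = compress(data)
    compression_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompression_time = time.perf_counter() - start

    # A lossless compressor must give back the original bytes.
    if restored != data:
        raise ValueError("round trip did not reproduce the input")

    return {
        # Compressed size / uncompressed size, as defined on the slide.
        "compression_ratio": len(packed) / len(data),
        "compression_time": compression_time,
        "decompression_time": decompression_time,
    }


# Hypothetical, highly regular XML document used only as sample input.
sample = (
    "<catalog>"
    + "<item><name>widget</name><price>9.99</price></item>" * 2000
    + "</catalog>"
).encode("utf-8")

results = {
    "gzip": measure(gzip.compress, gzip.decompress, sample),
    "bzip2": measure(bz2.compress, bz2.decompress, sample),
}

for name, metrics in results.items():
    print(name, metrics)
```

Timings from a single run like this are noisy; a real comparison would repeat each measurement and use documents from the actual corpus.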
11
11 XML Compressors Evaluated
◦ Three general purpose text compressors
   Gzip, bzip2, PPM
◦ Eight XML conscious compressors
   XMillGzip, XMillBzip, XMillPPM, XMLPPM, SCMPPM, XWRT, AXECHOP
◦ Compressors evaluated under default settings
◦ Additional experiments run with parameters tuned for highest level of compression
◦ In total, 16 variant compressors
12
Ideally want to provide a global ranking of XML compression tools
Results show there is no clear winner
◦ Dependent upon the weight of each metric
Three ranking functions
◦ WF1 = (1/3 * CR) + (1/3 * CT) + (1/3 * DCT)
◦ WF2 = (1/2 * CR) + (1/4 * CT) + (1/4 * DCT)
◦ WF3 = (3/5 * CR) + (1/5 * CT) + (1/5 * DCT)
CR represents the compression ratio metric, CT represents the compression time metric and DCT represents the decompression time metric
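The three ranking functions can be sketched in Python as follows. The weights are taken from the slide; everything else is an assumption for illustration: the names `WEIGHTS`, `weighted_score` and `rank`, the two made-up compressors, and the premise that CR, CT and DCT have already been scaled to comparable ranges (the slide does not say how the metrics are normalised). Lower scores rank first, since lower metric values are better.

```python
# Weights (CR, CT, DCT) for each ranking function, as given on the slide.
WEIGHTS = {
    "WF1": (1 / 3, 1 / 3, 1 / 3),
    "WF2": (1 / 2, 1 / 4, 1 / 4),
    "WF3": (3 / 5, 1 / 5, 1 / 5),
}


def weighted_score(cr: float, ct: float, dct: float, weights: tuple) -> float:
    """Weighted sum of the three metrics; lower is better."""
    w_cr, w_ct, w_dct = weights
    return w_cr * cr + w_ct * ct + w_dct * dct


def rank(metrics: dict, function_name: str) -> list:
    """Order compressor names from best (lowest score) to worst."""
    weights = WEIGHTS[function_name]
    return sorted(
        metrics,
        key=lambda name: weighted_score(*metrics[name], weights),
    )


# Hypothetical, pre-normalised (CR, CT, DCT) values; not results from the paper.
example = {
    "compressor_a": (0.10, 0.50, 0.50),  # small output, slow
    "compressor_b": (0.40, 0.30, 0.30),  # larger output, fast
}

for function_name in WEIGHTS:
    print(function_name, rank(example, function_name))
```

With these made-up values, WF1 places compressor_b first, while WF2 and WF3 place compressor_a first. This is the sense in which the ranking depends on the weight given to each metric.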
16
Paper surveyed state-of-the-art XML compression techniques
Reported the behaviour of various XML compressors using a large corpus of XML documents
Paper could be valuable for
◦ Developers of new XML compression tools
◦ Users making an effective decision on the most suitable compressor for their requirements
Fig. 7 shows that none of the XML conscious compressors has achieved an outstanding compression ratio
18
Planning to continue maintaining and updating webpage of study with further evaluations
Enable visitors to perform online experiments using set of available compressors and own XML documents
19
Large number of references
◦ Due to different compression techniques used
Large amount of data
Thorough in research methods
◦ Large amount of data tested
◦ Tested on different systems
◦ Tested using different techniques
Abbreviations/Acronyms given
◦ Designed for specific audience
Paper seems to be a reference tool
◦ User to read to help decide on which compression tool to use
20
Thanks for listening!