1
Benchmarking XML Processors for Applications in Grid Web Services
Michael R. Head*, Madhusudhan Govindaraju*, Robert van Engelen**, Wei Zhang**
*Grid Computing Research Laboratory, Binghamton University (SUNY)
**Florida State University
2006-11-16
2
Outline
● Motivation
● XML Performance Obstacles
● Benchmark Suite
● Results for a Variety of XML Processors
● Recommendations and Conclusions
● Future Work
3
XML Defined
● Text based (usually UTF-8 encoded)
● Tree structured
● Language independent
● Generalized data format
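As an illustration of the text-based, tree-structured nature of XML, the following minimal Java sketch (not part of the presentation) parses a small hypothetical document with the standard DOM API and walks its element tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlTreeDemo {
    public static void main(String[] args) throws Exception {
        // A tiny, hypothetical XML document: plain text, tree structured.
        String xml = "<dataset><point x=\"1.0\" y=\"2.0\"/><point x=\"3.0\" y=\"4.0\"/></dataset>";

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Walk the element tree: the root element first, then its children.
        Node root = doc.getDocumentElement();
        System.out.println("root = " + root.getNodeName());
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println("  child = " + child.getNodeName());
            }
        }
    }
}
```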
4
Motivation from SOAP
● Generalized RPC mechanism
● Broad industrial support
● Web Services on the Grid
  – OGSA: Open Grid Services Architecture
  – WSRF: Web Services Resource Framework
● At bottom, SOAP depends on XML
5
XML Exclusive of SOAP
● General structured data format
● Becoming standard for many scientific datasets
  – HapMap – mapping genes
  – Protein sequencing
  – NASA astronomical data
  – Many more instances
6
Benchmark Motivation
● Grid applications place a wide range of requirements on the communication substrate and data formats.
● Simple and straightforward implementations can have a severe performance impact.
7
XML Performance Limitations
● Compared to "legacy" binary formats:
  – Text-based
    ● Lacks any "header blocks" (e.g. TCP headers), so every character must be scanned to tokenize
    ● Numeric types take more space and conversion time
  – Lacks indexing
    ● Unable to quickly skip over fixed-length records
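A rough, hypothetical sketch of the "numeric types take more space" point: it encodes an array of doubles once as decimal text and once as raw 8-byte IEEE 754 values and compares the byte counts. The numbers it prints are illustrative only; the measurements on the following slides are the authoritative results.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DoubleSizeDemo {
    public static void main(String[] args) {
        double[] values = new double[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = Math.random() * 1e6;   // arbitrary sample data
        }

        // Text encoding: roughly what an XML payload carries per value
        // (plus element tags, which only widen the gap).
        StringBuilder text = new StringBuilder();
        for (double v : values) {
            text.append(v).append(' ');
        }
        int textBytes = text.toString().getBytes(StandardCharsets.UTF_8).length;

        // Raw binary encoding: exactly 8 bytes per IEEE 754 double.
        ByteBuffer binary = ByteBuffer.allocate(values.length * Double.BYTES);
        for (double v : values) {
            binary.putDouble(v);
        }

        System.out.printf("text: %d bytes, binary: %d bytes, ratio: %.1fx%n",
                textBytes, binary.capacity(), (double) textBytes / binary.capacity());
    }
}
```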
8
Array size: SOAP vs. Binary
● 5 times difference in size
9
CPU Usage when Parsing Doubles
● 90% of CPU time is being spent in floating point conversions
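The chart itself is not reproduced here. The following hypothetical micro-benchmark sketch shows the kind of cost the slide refers to: converting decimal strings back into doubles, which an XML deserializer must do for every numeric value. (A rigorous measurement would add JVM warm-up and repeated runs.)

```java
public class ParseDoubleCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        String[] tokens = new String[n];
        for (int i = 0; i < n; i++) {
            tokens[i] = Double.toString(Math.random() * 1e6);
        }

        // Time only the text-to-double conversion step, which is what an
        // XML deserializer must perform for every numeric element it reads.
        long start = System.nanoTime();
        double sum = 0.0;
        for (String t : tokens) {
            sum += Double.parseDouble(t);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("parsed " + n + " doubles in " + elapsedMs
                + " ms (checksum " + sum + ")");
    }
}
```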
10
Parsing Optimizations in Use
● Look-aside buffers / string caching [gSOAP, XPP]
● Trie data structure with schema-specific parser
● One-pass table-driven recursive descent parser [TDX]
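A minimal sketch of the look-aside buffer / string caching idea, assuming a simple HashMap-based canonicalizer. It illustrates the general technique only; it is not the gSOAP or XPP implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Element and attribute names repeat constantly in XML, so a parser can
 * return a previously allocated String instead of building a new one for
 * every occurrence. This is only an illustration of the technique.
 */
public class NameCache {
    private final Map<String, String> cache = new HashMap<>();

    /** Returns a canonical String instance for the name in buf[start..start+len). */
    public String canonicalize(char[] buf, int start, int len) {
        String key = new String(buf, start, len);   // a real parser would avoid even this copy
        String cached = cache.get(key);
        if (cached == null) {
            cache.put(key, key);
            cached = key;
        }
        return cached;
    }

    public static void main(String[] args) {
        NameCache cache = new NameCache();
        char[] a = "envelope".toCharArray();
        char[] b = "envelope".toCharArray();
        String s1 = cache.canonicalize(a, 0, a.length);
        String s2 = cache.canonicalize(b, 0, b.length);
        System.out.println("same instance: " + (s1 == s2));   // true: cache hit
    }
}
```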
11
Benchmark Suite
1) A chosen set of XML documents
  – Low-level probes
  – Application-based benchmarks
2) A driver application for each XML processor
  – Runs the parser on the input, but does not act on the data
    ● Eliminates application-level performance differences
    ● One for each interface style (SAX/DOM)
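A minimal sketch of what such a driver might look like for the SAX interface style in Java: it runs the parser over an input document and counts events but does not act on the data, so only parsing cost is exercised. The class name and timing code are illustrative, not the suite's actual driver.

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative SAX driver: parse the document, discard the data. */
public class SaxDriver extends DefaultHandler {
    private long elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;   // count events, but do not act on the content
    }

    public static void main(String[] args) throws Exception {
        File input = new File(args[0]);          // e.g. one of the benchmark documents
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        SaxDriver handler = new SaxDriver();
        long start = System.nanoTime();
        parser.parse(input, handler);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(handler.elements + " elements in " + elapsedMs + " ms");
    }
}
```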
12
Benchmark Probes
● Overhead test
  – Minimal XML document (header plus one self-closing element)
● Buffering
  – Repeated use of xsi:type attributes
● Namespace management
  – Gratuitous use of xmlns attributes
● SOAP payloads
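A hypothetical sketch of how the overhead and namespace probes described above could be generated; the documents shipped with the suite may differ in detail, and the buffering (xsi:type) probe would be built the same way.

```java
public class ProbeGenerator {
    /** Overhead probe: XML header plus one self-closing element. */
    static String overheadProbe() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<probe/>\n";
    }

    /** Namespace probe: many elements, each declaring its own xmlns. */
    static String namespaceProbe(int elements) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root>\n");
        for (int i = 0; i < elements; i++) {
            sb.append("  <item xmlns:ns").append(i)
              .append("=\"urn:example:ns").append(i).append("\"/>\n");
        }
        return sb.append("</root>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(overheadProbe());
        System.out.print(namespaceProbe(3));
    }
}
```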
13
Application Benchmarks
● Ptolemy workflow documents (which Kepler uses)
● Genetic data files
  – (Large) files from the International HapMap Project
● Molecular data
● Mesh interface objects, event streams (WSMG)
● WS-Security documents
● Eager for more
14
Results – Latency Overhead
15
C Parsers: SOAP Payloads
16
C Parsers: Application-level Tests
17
Java Parsers: SOAP Payloads
18
Java Parsers: Application-level Tests
19
TDX Performance
● SOAP payload of an array of strings
20
Recommendations
● When handling disparate XML formats with different parsers, consider a pluggable XML handling mechanism
● Schema-specific parsing techniques (TDX, for example) are very promising when schemas are known in advance
● When designing for multi-core architectures, using TDX may be far faster than attempting to parallelize the other existing processors
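A minimal Java sketch of the pluggable-mechanism recommendation, assuming a single illustrative XmlProcessor interface with interchangeable DOM and SAX implementations; a schema-specific backend such as TDX would plug in the same way. All names here are hypothetical.

```java
import java.io.InputStream;

/**
 * Sketch of a pluggable XML handling mechanism: the application codes
 * against one small interface and the concrete parser is chosen per
 * document format or schema.
 */
interface XmlProcessor {
    /** Parse the stream and hand results to the application. */
    void process(InputStream in) throws Exception;
}

class DomProcessor implements XmlProcessor {
    public void process(InputStream in) throws Exception {
        javax.xml.parsers.DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(in);   // build a tree, then walk it
    }
}

class SaxProcessor implements XmlProcessor {
    public void process(InputStream in) throws Exception {
        javax.xml.parsers.SAXParserFactory.newInstance().newSAXParser()
                .parse(in, new org.xml.sax.helpers.DefaultHandler());   // streaming, no tree
    }
}

public class PluggableXml {
    public static void main(String[] args) throws Exception {
        // Pick the implementation that benchmarks best for the format at hand,
        // e.g. a schema-specific parser when the schema is known in advance.
        XmlProcessor processor = Boolean.getBoolean("use.dom")
                ? new DomProcessor() : new SaxProcessor();
        processor.process(new java.io.FileInputStream(args[0]));
    }
}
```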
21
Community Relations
● Publicly available benchmark suite
● Encourage vendors, users, and developers to contribute additional XML parsers and sample files as necessary
  – head@acm.org
  – mgovinda@cs.binghamton.edu
  – engelen@cs.fsu.edu
  – http://grid.cs.binghamton.edu/projects/xmlbench
22
Future Work
● Various techniques to parallelize XML processing
● Add new XML parser tests to the suite
  – Add more tests for existing parsers
● Include more sample files
● Update web site with current performance snapshots
23
Questions