Published by Sheena Fields. Modified over 9 years ago.
1
TDX: a High-Performance Table-Driven XML Parser
Wei Zhang, Robert van Engelen
Department of Computer Science, Florida State University
2
2 Outline
- Motivation
- Introduction
- Recent Work
- Table-Driven XML Parsing (TDX)
- TDX Construction Toolkit
- Results and Preliminary Conclusion
3
3 Motivation
- Enhance performance for XML-based Web Services
- Provide flexibility
- Offer high-level modularity
4
4 Roadmap
- Motivation
- Introduction
- Recent Work
- Table-Driven XML Parsing (TDX)
- TDX Construction Toolkit
- Experiment Results and Preliminary Conclusion
5
5 Introduction
Validating XML parsing
- Three stages
  - Well-formedness checking
  - Validation
  - Data conversion
- Keeping the stages separate introduces overhead and requires frequent access to the schema
[Figure: well-formedness, validation, and data conversion as separate stages in front of the XML application]
6
6 Introduction (cont'd)
Schema-specific XML parsing (SSP)
- Merges well-formedness checking and validation
- No need for frequent access to the schema
- Data conversion remains a separate stage in implemented SSP parsers
[Figure: well-formedness and validation merged, data conversion separate]
8
8 Recent Work
Chiu: “A compiler-based approach to schema-specific XML parsing”
- Merges parsing and validation by constructing a PDA
- No namespace support
- Conversion from NFA to DFA may result in exponentially growing space requirements
9
9 Recent Work (cont'd)
van Engelen: “Constructing finite automata for high-performance web services”
- Integrates parsing and validation into one stage, with parsing actions encoded by a DFA
- Cannot process cyclic XML schemas
10
10 Recent Work (cont'd)
van Engelen: “The gSOAP toolkit for web services and peer-to-peer computing networks”
- Namespace support
- Merges parsing and validation
- Implements a recursive-descent parser
- Disadvantages of recursive descent: code size and function-call overhead
12
12 Table-Driven XML Parsing (TDX)
- An LL(1) grammar can be derived from the schema
- XML documents can be parsed and validated using the LL(1) grammar
  - Well-formedness (parsing) is verified through grammar rules
  - Validation is accomplished using semantic actions
- Application-specific events can also be encoded as semantic actions
13
13 Illustrating Example
LL(1) Grammar:
s → ‘ ’ t ‘ ’
t → t1 t2
t1 → ‘ ’ DATA //imp_s(s.val) ‘ ’
t2 → ‘ ’ DATA //imp_s(s.val) ‘ ’
14
14 Illustrating Example (cont'd)
(a) An XML instance with the character data “XML Tech” and “Bob”
(b) Predictive parsing: s expands to t, and t to t1 t2; the DATA under t1 triggers imp_s(“XML Tech”) and the DATA under t2 triggers imp_s(“Bob”)
15
15 Roadmap
- Recent Work
- Table-Driven XML Parsing (TDX)
  - Illustrating example
  - Architecture
  - Token generation
  - Mapping schema to LL(1)
  - Parsing table
  - Parsing engine
  - Scanner/tokenizer
- TDX Construction Toolkit
- Experiment Results and Preliminary Conclusion
16
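16 TDX - Architecture
[Figure: the Scanner/Tokenizer (DFA) delivers tokens and CDATA to the Parsing Engine (TDX). The engine consults the LL(1) parsing table and the LL(1) grammar productions and actions, and either raises events for the application modules or reports “Error: invalid”.]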
18
18 Token Generation
- Tokens are defined by
  - Element name (opening and closing)
  - Attribute name
  - Some data types, such as enumeration
- Namespace binding
  - Identical tag names under different namespaces are represented as different tokens
  - Normalized tokens
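The namespace rule above can be sketched in a few lines of Python. This is an illustration only, not the TDX implementation: `make_token_table` and `normalize` are hypothetical helpers, and the two namespace URIs are borrowed from the scanner example later in the deck. Tokens are keyed by (namespace URI, local name, open/close), so the prefix used in the instance does not matter.

```python
def make_token_table(qualified_names):
    """Assign one integer token per (namespace URI, local name, open/close)."""
    table = {}
    for ns_uri, local in qualified_names:
        for kind in ("open", "close"):
            table[(ns_uri, local, kind)] = len(table)
    return table


def normalize(tag, bindings, table):
    """Map a tag such as 'x:title' or '/title' to its token.

    bindings maps prefixes to namespace URIs; '' is the default namespace.
    """
    kind = "close" if tag.startswith("/") else "open"
    prefix, _, local = tag.lstrip("/").rpartition(":")
    return table[(bindings[prefix], local, kind)]


# Hypothetical vocabulary: the same local name in two namespaces.
TABLE = make_token_table([
    ("http://www.x.org", "title"),
    ("http://www.y.org", "title"),
])
BINDINGS = {"": "http://www.x.org",
            "x": "http://www.x.org",
            "y": "http://www.y.org"}
```

With these definitions, `x:title` and `y:title` receive different tokens, while `title` (default namespace) and `x:title` normalize to the same one.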
20
20 Mapping Schema to LL(1) Grammar
- Structural constraints are mapped to rules
- Validation constraints are mapped to semantic actions
- Note that many types of validation constraints are mapped to rules, such as occurrence and enumeration
21
21 Mapping Example (1)
state → “OFF” | “ON”
value → DATA //imp_i(char *s)
22
22 Mapping Example (2)
<element name="id" type="id_type" minOccurs="0"/>
<element name="value" type="value_type" minOccurs="2" maxOccurs="unbounded"/>
example → c1 c2
c1 → ‘<id>’ id_type ‘</id>’ | ε
c2 → c’2 c’2 c’’2
c’2 → ‘<value>’ value_type ‘</value>’
c’’2 → c’2 c’’2 | ε
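As a rough sketch of how occurrence constraints can become grammar rules, the hypothetical helper below expands a `minOccurs`/`maxOccurs` pair into productions. The function name and the generated nonterminal names (`_more`, `_optK`) are assumptions made for illustration; the slides use their own naming (c1, c’2, c’’2), and the actual wsdl2TDX mapping may differ.

```python
def occurrence_rules(nt, item, min_occurs, max_occurs):
    """Return productions (lhs, rhs) so that `nt` derives `item` repeated
    between min_occurs and max_occurs times. max_occurs=None means unbounded.
    An empty rhs stands for the empty (epsilon) production.
    """
    rules = []
    if max_occurs is None:
        # Mandatory copies, then a right-recursive tail for the rest.
        more = nt + "_more"
        rules.append((nt, [item] * min_occurs + [more]))
        rules.append((more, [item, more]))
        rules.append((more, []))
    else:
        # Mandatory copies, then a chain of optional copies.
        tails = [f"{nt}_opt{k}" for k in range(1, max_occurs - min_occurs + 1)]
        rules.append((nt, [item] * min_occurs + tails[:1]))
        for k, tail in enumerate(tails):
            rules.append((tail, [item] + tails[k + 1:k + 2]))
            rules.append((tail, []))
    return rules
```

For the schema fragment above, `occurrence_rules("c1", "id", 0, 1)` yields an optional `id`, and `occurrence_rules("c2", "value", 2, None)` yields two mandatory `value` items followed by a repeating tail.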
24
24 LL(1) Parsing Table
- Constructed from the LL(1) grammar
- Indexed by nonterminals and terminals
- Each entry contains either the index of a grammar production or an error entry
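The table construction can be sketched with the textbook FIRST/FOLLOW algorithm. This is a generic illustration, not code from TDX: `build_ll1_table` is a hypothetical name, the grammar encodes the occurrence example with made-up symbol names, and a missing key plays the role of the error entry.

```python
EPS = "ε"  # marker for "derives the empty string"


def first_of_seq(seq, first, nonterminals):
    """FIRST set of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in nonterminals else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol was nullable (or seq is empty)
    return out


def build_ll1_table(productions, start, end="$"):
    """Map (nonterminal, lookahead) to a production index."""
    nonterminals = {lhs for lhs, _ in productions}
    first = {nt: set() for nt in nonterminals}
    follow = {nt: set() for nt in nonterminals}
    follow[start].add(end)
    changed = True
    while changed:  # iterate FIRST and FOLLOW to a fixed point
        changed = False
        for lhs, rhs in productions:
            f = first_of_seq(rhs, first, nonterminals)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                tail = first_of_seq(rhs[i + 1:], first, nonterminals)
                add = tail - {EPS}
                if EPS in tail:
                    add |= follow[lhs]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True
    table = {}
    for idx, (lhs, rhs) in enumerate(productions):
        f = first_of_seq(rhs, first, nonterminals)
        lookaheads = (f - {EPS}) | (follow[lhs] if EPS in f else set())
        for tok in lookaheads:
            if (lhs, tok) in table:
                raise ValueError(f"grammar is not LL(1) at {(lhs, tok)}")
            table[(lhs, tok)] = idx
    return table


# Hypothetical encoding of: optional id, then two or more value elements.
GRAMMAR = [
    ("example", ["c1", "c2"]),             # 0
    ("c1", ["<id>", "DATA", "</id>"]),     # 1
    ("c1", []),                            # 2 (epsilon)
    ("c2", ["v", "v", "more"]),            # 3
    ("v", ["<value>", "DATA", "</value>"]),  # 4
    ("more", ["v", "more"]),               # 5
    ("more", []),                          # 6 (epsilon)
]
TABLE = build_ll1_table(GRAMMAR, "example")
```

Here `TABLE[("c1", "<value>")]` selects the empty production for `c1`, which is how an omitted optional element is accepted without backtracking.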
26
26 Parsing Engine
- Schema independent
- Maintains
  - Parsing table
  - Production table
  - Action table
  - Stack
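A minimal sketch of such an engine, applied to the grammar of the illustrating example, might look as follows. The token names and the element names behind them (book, title, author) are assumptions, since the tag names did not survive in the slide text, and the tables are written by hand here rather than generated from a schema. Only the tables are schema specific; `parse` itself is not.

```python
# Hypothetical tokens; element names are assumed for illustration.
BOOK_O, BOOK_C, TITLE_O, TITLE_C, AUTHOR_O, AUTHOR_C, DATA, EOF = range(8)

# Production table. Strings are nonterminals, ints are tokens,
# and ("act", name) entries are semantic actions.
PRODUCTIONS = [
    ("s", [BOOK_O, "t", BOOK_C]),
    ("t", ["t1", "t2"]),
    ("t1", [TITLE_O, DATA, ("act", "imp_s"), TITLE_C]),
    ("t2", [AUTHOR_O, DATA, ("act", "imp_s"), AUTHOR_C]),
]

# Parsing table: (nonterminal, lookahead) -> production index.
# A missing key is the error entry.
PARSE_TABLE = {
    ("s", BOOK_O): 0,
    ("t", TITLE_O): 1,
    ("t1", TITLE_O): 2,
    ("t2", AUTHOR_O): 3,
}


def parse(tokens, actions, start="s"):
    """Return True if the (token, value) sequence is valid.

    actions is the action table: name -> callable taking the value
    of the most recently matched token.
    """
    tokens = list(tokens) + [(EOF, None)]
    stack = [EOF, start]
    pos = 0
    last_value = None
    while stack:
        top = stack.pop()
        tok, val = tokens[pos]
        if isinstance(top, tuple):      # semantic action
            actions[top[1]](last_value)
        elif isinstance(top, str):      # nonterminal: expand via table
            idx = PARSE_TABLE.get((top, tok))
            if idx is None:
                return False            # error entry: invalid document
            stack.extend(reversed(PRODUCTIONS[idx][1]))
        else:                           # terminal: must match lookahead
            if top != tok:
                return False
            last_value = val
            pos += 1
    return True
```

Feeding the token stream for the instance with “XML Tech” and “Bob” calls the `imp_s` action once per DATA token, in document order, and a truncated stream is rejected as soon as the expected closing token is missing.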
28
28 Scanner/Tokenizer
- Constructed from the schema
- The schema provides the DFA state information
  - Element name
  - Has attribute?
  - Attribute name
- The root element needs special care
  - Schema information
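To give a feel for what the scanner hands to the parsing engine, here is a small Python stand-in. It is not the flex-generated DFA the slides describe: it uses the standard `re` module, the `scan` function and its `known_elements` argument are hypothetical, and it ignores attributes, self-closing tags, comments, and entities. The set of known element names plays the role of the schema information.

```python
import re

# Either a start/end tag (attributes skipped) or a run of character data.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*>|([^<]+)")


def scan(xml_text, known_elements):
    """Yield ('open', name), ('close', name) and ('DATA', text) tokens.

    Element names that the schema does not declare are rejected here,
    before the parsing engine ever sees them.
    """
    for match in TOKEN_RE.finditer(xml_text):
        slash, name, data = match.groups()
        if name is not None:
            if name not in known_elements:
                raise ValueError(f"unknown element: {name}")
            yield ("close" if slash else "open", name)
        elif data.strip():
            yield ("DATA", data.strip())
```

Because unknown names fail in the scanner, the parsing table only needs columns for tokens that the schema can actually produce.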
29
29 Scanner/Tokenizer example
<book xmlns:x="http://www.x.org" xmlns:y="http://www.y.org" targetnamespace="http://www.x.org">
Character data in the instance: XML Bible, Bob, professor (scanned as DATA)
31
31 TDX Construction Toolkit
[Figure: wsdl2TDX reads Service.wsdl and generates Service_flex.l, Service_TDX.h, and Service_TDX.c; flex turns Service_flex.l into tab.yy.c]
33
33 Experiment Setup
- Compare with
  - DFA-based parser
  - gSOAP 2.7
  - eXpat 1.2
  - Xerces 2.7.0
- Memory-resident XML message
- Elapsed real time measured using gettimeofday()
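The measurement method (wall-clock time over a message already held in memory) can be sketched as below. This is not the authors' benchmark harness: it uses Python's `time.perf_counter` in place of `gettimeofday()`, the expat binding from the Python standard library rather than eXpat 1.2, and a hypothetical `time_parse` helper, so the numbers it produces say nothing about the results in the slides.

```python
import time
import xml.parsers.expat


def time_parse(message: bytes, repeats: int = 1000) -> float:
    """Average elapsed wall-clock seconds per parse of an in-memory message."""
    start = time.perf_counter()
    for _ in range(repeats):
        parser = xml.parsers.expat.ParserCreate()  # fresh parser per run
        parser.Parse(message, True)                # True: final chunk
    return (time.perf_counter() - start) / repeats
```

Keeping the message in memory and averaging over many runs removes I/O time from the measurement, so the figure reflects parsing cost alone.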
34
34 Parsing Performance(1)
35
35 Parsing Performance (2)
36
36 Conclusion
- Enhanced parsing speed
- Flexible framework
  - Encoding value-based validation and application-specific events as semantic rules
  - Combining structural, syntactic and semantic constraints in one pass
- High level of modularity