Xerces2: The Sequel With No Equal. Andy Clark, ApacheCon US - Las Vegas, Nevada, 20 November 2002.
Introduction. Speaker: worked for IBM; currently unemployed. Parser: first developed in IBM’s Tokyo research lab, then maintained and expanded in California and donated to Apache; work continues in Toronto.
Agenda. Xerces1 overview: design and problems. Xerces2 overview: challenges and design. Q & A.
Xerces1 Overview: Design and Problems
Design. XML4J/Xerces1 was designed for performance. Parser implementation: a parsing pipeline; custom reader implementations; and a StringPool, which defers transcoding of byte buffers until needed and serves as a symbol table for common document strings.
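To make the symbol-table idea concrete, here is a minimal Java sketch. It assumes a plain HashMap-backed table, and the names SymbolTable and addSymbol are illustrative, not the Xerces1 StringPool API. Each distinct name is stored once, so repeated element names resolve to the same String instance. Components can then compare them with == instead of equals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal symbol table: each distinct name is stored once,
// so later lookups return the identical String instance.
class SymbolTable {
    private final Map<String, String> symbols = new HashMap<>();

    // Returns the canonical instance for the given name, adding it on first use.
    String addSymbol(String name) {
        String existing = symbols.get(name);
        if (existing != null) {
            return existing;
        }
        symbols.put(name, name);
        return name;
    }

    // Overload for names scanned directly out of a character buffer.
    String addSymbol(char[] buffer, int offset, int length) {
        return addSymbol(new String(buffer, offset, length));
    }

    int size() {
        return symbols.size();
    }
}

class SymbolTableDemo {
    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        char[] doc = "<item><item/></item>".toCharArray();
        String first = table.addSymbol(doc, 1, 4);  // "item" from the first start tag
        String second = table.addSymbol(doc, 7, 4); // "item" from the nested empty element
        // Both lookups yield the same object, so components may compare with ==.
        System.out.println(first == second); // true
        System.out.println(table.size());    // 1
    }
}
```

Interning costs one hash lookup per scanned name. It pays off because documents repeat the same element and attribute names many times, and every later comparison becomes a reference check.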
Pipeline Configuration. XML -> Scanner -> Validator -> Parser -> XML API. The pipeline was intended to be generic.
Pipeline Configuration Problems. XML -> Scanner -> Validator -> Parser -> XML API. The components have hard-coded dependencies on the implementation, and their interfaces are inconsistent.
Custom Readers. The scanner reads through an entity handler from a reader stack. Each encoding has its own reader (UTF-8 Reader, UCS Reader, EBCDIC Reader, Generic Reader), and each reader implements the scanning methods itself: scanName, scanAttValue, scanContent, …
Custom Readers Problems. Duplicated code allows more bugs to appear. Because the code is not shared, the bugs differ depending on the encoding. The design is also more complicated.
Deferred Transcoding. A Reader fills data buffers from the XML input. A String Producer hands buffer regions to the StringPool, and parser components work with the int handles it returns. StringPool methods: addString(String):int, toString(int):String, addString(StringProducer,int,int):int.
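The handle-based API listed above can be sketched as follows. This is a hypothetical simplification. Only the three method shapes come from the slide; the StringProducer.produce callback, the Pending class, and the caching policy are assumptions. A buffer region registered with addString(StringProducer,int,int) is decoded only when someone asks for it with toString(int).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical handle-based pool modeled on the slide's method list.
// Strings are identified by int handles; a raw buffer region can be
// registered without decoding it and is decoded on the first toString call.
class StringPool {
    // Hypothetical callback: decodes a region of its own buffer on request.
    interface StringProducer {
        String produce(int offset, int length);
    }

    // A buffer region that has not been transcoded yet.
    private static final class Pending {
        final StringProducer producer;
        final int offset;
        final int length;

        Pending(StringProducer producer, int offset, int length) {
            this.producer = producer;
            this.offset = offset;
            this.length = length;
        }
    }

    private final List<Object> entries = new ArrayList<>(); // String or Pending

    int addString(String s) {
        entries.add(s);
        return entries.size() - 1;
    }

    int addString(StringProducer producer, int offset, int length) {
        entries.add(new Pending(producer, offset, length));
        return entries.size() - 1;
    }

    String toString(int handle) {
        Object entry = entries.get(handle);
        if (entry instanceof Pending) {
            Pending p = (Pending) entry;
            String s = p.producer.produce(p.offset, p.length); // transcode now
            entries.set(handle, s); // keep the result for later queries
            return s;
        }
        return (String) entry;
    }
}

class StringPoolDemo {
    public static void main(String[] args) {
        byte[] raw = "<doc>hello</doc>".getBytes(StandardCharsets.UTF_8);
        int[] decodes = {0};
        StringPool.StringProducer producer = (offset, length) -> {
            decodes[0]++;
            return new String(raw, offset, length, StandardCharsets.UTF_8);
        };
        StringPool pool = new StringPool();
        int handle = pool.addString(producer, 5, 5); // "hello", still undecoded
        System.out.println(decodes[0]);            // 0
        System.out.println(pool.toString(handle)); // hello
        System.out.println(pool.toString(handle)); // hello, served from the pool
        System.out.println(decodes[0]);            // 1
    }
}
```

The sketch also shows the cost of the design. The decoded text exists only inside the pool, so any component that needs the actual string must hold a reference to the pool and call it.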
Deferred Transcoding Problems. All components need a reference to the StringPool. Strings are not immediately available to methods; a component must call the StringPool to query a String. Memory management is complicated, because the callee is responsible for freeing resources. The design also uses more memory.
Xerces2 Overview: Challenges and Design
Challenges. Requirements: a simple design and implementation; ease of maintenance; more modularity and configurability; support for current and future features. Design decisions: always transcode bytes into Unicode characters, which removes the StringPool and the dependencies on it; a clean architecture.
Xerces Native Interface (XNI). A “streaming” information set, similar to SAX, with no loss of document information*. XNI supports parser configuration and layering, as well as future extensions such as a native pull parser or a tree model. * XNI does not preserve all document information, but it communicates more information to the application than DOM or SAX.
org.apache.xerces.xni: XMLDocumentHandler, XMLDocumentFragmentHandler, XMLDTDHandler, XMLDTDContentModelHandler, XMLLocator, XMLResourceIdentifier, NamespaceContext, XMLAttributes, Augmentations, QName, XMLString, XNIException (extends java.lang.RuntimeException). org.apache.xerces.xni.parser: XMLParserConfiguration, XMLPullParserConfiguration, XMLComponentManager, XMLComponent, XMLDocumentScanner, XMLDTDScanner, XMLDocumentSource, XMLDocumentFilter, XMLDTDSource, XMLDTDFilter, XMLDTDContentModelSource, XMLDTDContentModelFilter, XMLErrorHandler, XMLEntityResolver, XMLInputSource, XMLConfigurationException, XMLParseException.
Parsing Pipeline. Handlers communicate information between parser components: XML -> Scanner -> Validator -> Parser -> XML API.
Handler Overview. The document scanner passes document events through the validator to the parser via XMLDocumentHandler. The DTD scanner passes DTD events the same way via XMLDTDHandler and XMLDTDContentModelHandler. The parser exposes the XML API.
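Here is a minimal sketch of handler-based chaining. It assumes a heavily simplified three-method handler interface; the real XMLDocumentHandler has many more callbacks and parameters. DocumentScanner, Validator, and EventRecorder are hypothetical stand-ins for the scanner, validator, and parser stages. The scanner understands only start tags, end tags, and text, and the validator only checks element names against a declared set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, heavily simplified stand-in for XNI's XMLDocumentHandler.
interface DocumentHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Source stage: turns markup into events. Handles only <x>, </x> and text.
class DocumentScanner {
    private static final Pattern TOKEN = Pattern.compile("<(/?)(\\w+)>|([^<]+)");
    private DocumentHandler next;

    void setDocumentHandler(DocumentHandler next) { this.next = next; }

    void scan(String markup) {
        Matcher m = TOKEN.matcher(markup);
        while (m.find()) {
            if (m.group(3) != null) {
                next.characters(m.group(3));
            } else if (m.group(1).isEmpty()) {
                next.startElement(m.group(2));
            } else {
                next.endElement(m.group(2));
            }
        }
    }
}

// Filter stage: checks each element name, then passes the event downstream.
class Validator implements DocumentHandler {
    private final Set<String> declared;
    private DocumentHandler next;

    Validator(Set<String> declared) { this.declared = declared; }

    void setDocumentHandler(DocumentHandler next) { this.next = next; }

    public void startElement(String name) {
        if (!declared.contains(name)) {
            throw new IllegalStateException("element not declared: " + name);
        }
        next.startElement(name);
    }

    public void characters(String text) { next.characters(text); }

    public void endElement(String name) { next.endElement(name); }
}

// Sink stage: stands in for the parser that would build an API from the events.
class EventRecorder implements DocumentHandler {
    final List<String> events = new ArrayList<>();

    public void startElement(String name) { events.add("start " + name); }

    public void characters(String text) { events.add("text " + text); }

    public void endElement(String name) { events.add("end " + name); }
}

class PipelineDemo {
    public static void main(String[] args) {
        DocumentScanner scanner = new DocumentScanner();
        Validator validator = new Validator(Set.of("doc", "p"));
        EventRecorder parser = new EventRecorder();
        scanner.setDocumentHandler(validator); // scanner -> validator
        validator.setDocumentHandler(parser);  // validator -> parser
        scanner.scan("<doc><p>hi</p></doc>");
        System.out.println(parser.events); // [start doc, start p, text hi, end p, end doc]
    }
}
```

Each stage knows only the handler interface of the next one. A stage can therefore be inserted or removed by changing the setDocumentHandler wiring alone.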
Parser Layout: Components and Manager. The component manager holds the regular components: Symbol Table, Grammar Pool, Datatype Factory. Configurable components: Scanner, Validator, Entity Manager, Error Reporter.
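The sketch below shows, hypothetically, how a configurable component might pull its settings from the component manager. The interfaces echo the names XMLComponentManager and XMLComponent but are simplified. The feature and property IDs ("validation", "symbol-table") are made up for illustration. Before each parse the configuration resets every component, and each component then queries the features and shared objects it needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified manager: answers feature and property queries.
interface ComponentManager {
    boolean getFeature(String featureId);
    Object getProperty(String propertyId);
}

// Hypothetical simplified component: re-reads its settings on reset.
interface Component {
    void reset(ComponentManager manager);
}

class SimpleConfiguration implements ComponentManager {
    private final Map<String, Boolean> features = new HashMap<>();
    private final Map<String, Object> properties = new HashMap<>();
    private final List<Component> components = new ArrayList<>();

    void setFeature(String id, boolean state) { features.put(id, state); }

    void setProperty(String id, Object value) { properties.put(id, value); }

    void addComponent(Component c) { components.add(c); }

    public boolean getFeature(String id) { return features.getOrDefault(id, false); }

    public Object getProperty(String id) { return properties.get(id); }

    // Called before each parse so every component sees the current settings.
    void resetAll() {
        for (Component c : components) {
            c.reset(this);
        }
    }
}

// A configurable component that uses one feature and one shared "regular" component.
class ValidatorComponent implements Component {
    boolean validating;
    Object symbolTable;

    public void reset(ComponentManager manager) {
        validating = manager.getFeature("validation");       // hypothetical ID
        symbolTable = manager.getProperty("symbol-table");   // hypothetical ID
    }
}

class ManagerDemo {
    public static void main(String[] args) {
        SimpleConfiguration config = new SimpleConfiguration();
        Object sharedSymbols = new Object(); // stands in for a shared symbol table
        config.setProperty("symbol-table", sharedSymbols);
        config.setFeature("validation", true);
        ValidatorComponent validator = new ValidatorComponent();
        config.addComponent(validator);
        config.resetAll();
        System.out.println(validator.validating);                   // true
        System.out.println(validator.symbolTable == sharedSymbols); // true
    }
}
```

Because components look shared objects up through the manager instead of creating them, every component in one configuration ends up using the same symbol table instance.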
Reader Management. The entity scanner implements the scanning methods (scanName, scanAttValue, scanContent, …) once. The entity manager maintains the reader stack, whose per-encoding readers (UTF-8 Reader, UCS Reader, EBCDIC Reader, Generic Reader) only convert bytes to characters.
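The reader-management idea can be sketched with the JDK's own decoders. An InputStreamReader for the document's charset does the transcoding, and a single scanning routine works on the resulting Unicode characters whatever the original encoding. EntityScannerSketch and its simplified scanName are assumptions for illustration. The method accepts letters, digits, and a few punctuation characters, not the full XML Name production.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decoding is delegated to a java.io.Reader chosen by
// encoding, and one scanning routine serves every encoding.
class EntityScannerSketch {
    private final PushbackReader in;

    EntityScannerSketch(byte[] bytes, Charset encoding) {
        Reader decoder = new InputStreamReader(new ByteArrayInputStream(bytes), encoding);
        this.in = new PushbackReader(decoder, 1);
    }

    // Simplified name scan: letters, digits, '-', '_', '.', ':'.
    // Stops at the first other character and pushes it back for the next scan.
    String scanName() throws IOException {
        StringBuilder name = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetterOrDigit(ch) || ch == '-' || ch == '_' || ch == '.' || ch == ':') {
                name.append(ch);
            } else {
                in.unread(c);
                break;
            }
        }
        return name.toString();
    }
}

class ReaderDemo {
    public static void main(String[] args) throws IOException {
        String text = "r\u00e9sum\u00e9 rest";
        Charset[] encodings = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.ISO_8859_1};
        for (Charset cs : encodings) {
            // Same bytes-to-chars step, same scanName, different encodings.
            EntityScannerSketch scanner = new EntityScannerSketch(text.getBytes(cs), cs);
            System.out.println(cs + ": " + scanner.scanName().length() + " chars");
        }
    }
}
```

A bug in scanName now shows up identically for every encoding, which avoids the per-encoding divergence of the Xerces1 custom readers.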
Parser Configuration: Before. The SAX Parser and DOM Parser build on the Document Parser, which itself contains the XML -> Scanner -> Validator pipeline. * The parser pipeline is part of the document parser base class. * Re-configuring the parser while still taking advantage of the API generator code required duplication.
Parser Configuration: After. The SAX Parser and DOM Parser build on the Document Parser, which delegates to a Parser Configuration containing the XML -> Scanner -> Validator pipeline. * The parser pipeline and settings are specified in a separate parser configuration object. * This allows the framework to be re-used without rewriting existing code.
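Here is a hypothetical sketch of the before/after difference. The abstract DocumentParser no longer builds a pipeline; it only forwards input to whatever ParserConfiguration it was given. All names are illustrative. To keep the example short, the "scanner" just splits a whitespace-separated list of element names. The same ListParser subclass works with both a non-validating and a validating configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical: a configuration decides which components sit in the pipeline.
interface ParserConfiguration {
    void parse(String source, Consumer<String> documentHandler);
}

// Scanner only. The toy scanner emits each whitespace-separated element name.
class NonValidatingConfiguration implements ParserConfiguration {
    public void parse(String source, Consumer<String> handler) {
        for (String name : source.trim().split("\\s+")) {
            handler.accept(name);
        }
    }
}

// Scanner followed by a validator that only lets declared names through.
class ValidatingConfiguration implements ParserConfiguration {
    private final Set<String> declared;

    ValidatingConfiguration(Set<String> declared) { this.declared = declared; }

    public void parse(String source, Consumer<String> handler) {
        new NonValidatingConfiguration().parse(source, name -> {
            if (!declared.contains(name)) {
                throw new IllegalStateException("undeclared: " + name);
            }
            handler.accept(name);
        });
    }
}

// The document parser base class no longer owns the pipeline.
abstract class DocumentParser {
    private final ParserConfiguration configuration;

    DocumentParser(ParserConfiguration configuration) { this.configuration = configuration; }

    void parse(String source) { configuration.parse(source, this::element); }

    protected abstract void element(String name);
}

// One "API generator": collects element names into a list.
class ListParser extends DocumentParser {
    final List<String> names = new ArrayList<>();

    ListParser(ParserConfiguration configuration) { super(configuration); }

    protected void element(String name) { names.add(name); }
}

class ConfigurationDemo {
    public static void main(String[] args) {
        ListParser fast = new ListParser(new NonValidatingConfiguration());
        fast.parse("doc para para");
        System.out.println(fast.names); // [doc, para, para]

        ListParser strict = new ListParser(new ValidatingConfiguration(Set.of("doc", "para")));
        strict.parse("doc para");
        System.out.println(strict.names); // [doc, para]
    }
}
```

Swapping configurations changes the pipeline without touching the parser subclass. In the old design, where the pipeline lived in the base class, the same change required duplication.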
API Generators. Different APIs can be generated from the same document parser: XNI, SAX Parser, DOM Parser, JavaBean Parser, …
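Generating several APIs from one event stream can be sketched as follows. The sketch assumes a simplified three-callback handler as a stand-in for XNI. StreamingGenerator reports events SAX-style, while TreeGenerator builds a small DOM-like tree from the same events. The class names and the toy markup tokenizer are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simplified handler shared by every API generator.
interface DocumentHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Hypothetical front end: one scan, forwarded to whichever generator is attached.
class DocumentEventSource {
    private static final Pattern TOKEN = Pattern.compile("<(/?)(\\w+)>|([^<]+)");

    static void parse(String markup, DocumentHandler handler) {
        Matcher m = TOKEN.matcher(markup);
        while (m.find()) {
            if (m.group(3) != null) {
                handler.characters(m.group(3));
            } else if (m.group(1).isEmpty()) {
                handler.startElement(m.group(2));
            } else {
                handler.endElement(m.group(2));
            }
        }
    }
}

// Generator 1: streaming, SAX-style view of the events.
class StreamingGenerator implements DocumentHandler {
    final List<String> lines = new ArrayList<>();
    public void startElement(String name) { lines.add("start " + name); }
    public void characters(String text) { lines.add("text " + text); }
    public void endElement(String name) { lines.add("end " + name); }
}

// Generator 2: a DOM-style tree built from the same events.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    final StringBuilder text = new StringBuilder();
    Node(String name) { this.name = name; }
}

class TreeGenerator implements DocumentHandler {
    private final Deque<Node> open = new ArrayDeque<>(); // assumes well-formed input
    Node root;

    public void startElement(String name) {
        Node node = new Node(name);
        if (open.isEmpty()) {
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    public void characters(String text) {
        if (!open.isEmpty()) {
            open.peek().text.append(text);
        }
    }

    public void endElement(String name) {
        open.pop();
    }
}

class GeneratorDemo {
    public static void main(String[] args) {
        String doc = "<a><b>x</b><c></c></a>";
        StreamingGenerator sax = new StreamingGenerator();
        TreeGenerator dom = new TreeGenerator();
        DocumentEventSource.parse(doc, sax);
        DocumentEventSource.parse(doc, dom);
        System.out.println(sax.lines);
        System.out.println(dom.root.name + " has " + dom.root.children.size() + " children"); // a has 2 children
    }
}
```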
Sample Parser Configuration #1. HTML parser, available as the NekoHTML download. The SAX Parser and DOM Parser build on the Document Parser, using an HTML Parser Configuration: HTML Scanner -> HTML Tag Balancer.
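A tag balancer sits after the HTML scanner and repairs the event stream so that downstream components see well-formed nesting. The class below is a hypothetical, greatly simplified sketch, not NekoHTML's implementation. It drops end tags that match nothing, closes elements left open inside an ended element, and closes everything still open at the end of the document. A real HTML balancer also needs HTML-specific rules (for example, which elements are closed implicitly), which this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, greatly simplified tag balancer that emits balanced markup.
class TagBalancer {
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements
    private final List<String> out = new ArrayList<>();    // balanced output tokens

    void startElement(String name) {
        open.push(name);
        out.add("<" + name + ">");
    }

    void endElement(String name) {
        if (!open.contains(name)) {
            return; // stray end tag with no matching start: drop it
        }
        // Close any elements left open inside the one being ended.
        String top;
        do {
            top = open.pop();
            out.add("</" + top + ">");
        } while (!top.equals(name));
    }

    void characters(String text) {
        out.add(text);
    }

    // Closes whatever is still open and returns the balanced token list.
    List<String> endDocument() {
        while (!open.isEmpty()) {
            out.add("</" + open.pop() + ">");
        }
        return out;
    }
}

class BalancerDemo {
    public static void main(String[] args) {
        // Input events for "<html><body><b>bold</i></body>"
        TagBalancer balancer = new TagBalancer();
        balancer.startElement("html");
        balancer.startElement("body");
        balancer.startElement("b");
        balancer.characters("bold");
        balancer.endElement("i");    // stray, dropped
        balancer.endElement("body"); // implicitly closes <b>
        System.out.println(String.join("", balancer.endDocument()));
        // <html><body><b>bold</b></body></html>
    }
}
```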
Sample Parser Configuration #2. Non-validating parser (for performance), available with the Xerces download. The SAX Parser and DOM Parser build on the Document Parser, using a Non-Validating Parser Configuration: XML -> Scanner / Namespace Binder.
Sample Parser Configuration #3. XInclude processing (not yet implemented). The SAX Parser and DOM Parser build on the Document Parser, using an XInclude Parser Configuration: XML -> Scanner -> XInclude -> Validator.
Sample Parser Configuration #4. Database result set converted to XML (not yet implemented). The SAX Parser and DOM Parser build on the Document Parser, using a Database Parser Configuration: DB -> Database Query -> Validator.
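Because the rest of the pipeline only sees document events, a database source could take the scanner's place. This is a hypothetical sketch of that not-yet-implemented idea. Rows are passed as ordered maps rather than a java.sql.ResultSet, column names are assumed to be valid XML names, and the element names (the table name, "row") are made up. A JDBC version could read column names through ResultSetMetaData.getColumnName(int) instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified handler, standing in for XNI's document handler.
interface DocumentHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Hypothetical "database scanner": walks the rows and emits the same
// document events a markup scanner would.
class ResultSetSource {
    static void emit(String table, List<Map<String, String>> rows, DocumentHandler handler) {
        handler.startElement(table);
        for (Map<String, String> row : rows) {
            handler.startElement("row");
            for (Map.Entry<String, String> column : row.entrySet()) {
                handler.startElement(column.getKey());
                handler.characters(column.getValue());
                handler.endElement(column.getKey());
            }
            handler.endElement("row");
        }
        handler.endElement(table);
    }
}

// Downstream component: serializes the events back to XML text.
class XmlWriter implements DocumentHandler {
    final StringBuilder out = new StringBuilder();

    public void startElement(String name) { out.append('<').append(name).append('>'); }

    public void characters(String text) {
        // Escape the characters that would otherwise break the markup.
        out.append(text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
    }

    public void endElement(String name) { out.append("</").append(name).append('>'); }
}

class DatabaseDemo {
    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>(); // keeps column order
        row.put("id", "1");
        row.put("name", "A&B");
        XmlWriter writer = new XmlWriter();
        ResultSetSource.emit("users", List.of(row), writer);
        System.out.println(writer.out); // <users><row><id>1</id><name>A&amp;B</name></row></users>
    }
}
```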
That’s All, Folks! Questions and Answers. Any questions? Links