Challenges in handling XML: performance and memory usage
Sami Poikonen
Republica Oy
Republica Oy is Finland’s leading provider of products and services based on XML standards.
- Founded: 1996
- Employees: 70+ (11/2001)
- Offices: Helsinki, Jyväskylä
TOC
1. DOM
2. SAX
3. DOM or SAX or something else
4. Transformations
5. Conclusions
Parsing XML: DOM
- Document Object Model
- standard API for accessing and creating XML data
- tree-based
- programming-language independent
- developed by W3C
- whole document is read into memory
- read and write
DomNode book
|
|-->DomNode title
|   |-->DomNode text
|
|-->DomNode author
|   |-->DomNode name

Tuntematon sotilas
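The tree above can be reproduced with a small sketch using Python's standard `xml.dom.minidom`. The markup in `BOOK_XML` is an assumption reconstructed from the node tree (the slide does not show the source document), and the author name written into the tree is an illustrative value that is not in the slides. The sketch shows the two DOM properties listed earlier: the whole document sits in memory, and the tree can be both read and modified.

```python
from xml.dom import minidom

# Assumed sample document: the slide shows only the node tree, so this
# markup is a reconstruction for illustration.
BOOK_XML = (
    "<book>"
    "<title>Tuntematon sotilas</title>"
    "<author><name/></author>"
    "</book>"
)


def describe(node, depth=0):
    """Return one indented line per element or text node in the tree."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "element " + node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + "text " + repr(node.data))
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


# The whole document is parsed into memory before any node is available.
doc = minidom.parseString(BOOK_XML)
tree_lines = describe(doc.documentElement)

# Read: any node can be reached directly, in any order.
title = doc.getElementsByTagName("title")[0].firstChild.data

# Write: the in-memory tree can be modified and serialized again.
# The author name is an illustrative value, not taken from the slides.
name = doc.getElementsByTagName("name")[0]
name.appendChild(doc.createTextNode("Väinö Linna"))
updated_xml = doc.documentElement.toxml()

if __name__ == "__main__":
    print("\n".join(tree_lines))
    print(title)
    print(updated_xml)
```

Because every node stays in memory until the document is released, memory use grows with document size, which is the cost the later slides weigh against this convenience.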
Parsing XML: SAX
- Simple API for XML
- API for accessing XML data
- event-based
- programming-language independent
- not defined by W3C
- application has to store fragments into memory
- read only
Roses are red,
Violets are blue.
Sugar is sweet,
and I love you.

Start element: poem
Start element: line
End element: line
Start element: line
End element: line
Start element: line
End element: line
Start element: line
End element: line
End element: poem
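The event sequence above can be generated with a minimal sketch using Python's standard `xml.sax`. The `<poem>` and `<line>` markup is an assumption reconstructed from the event names, since the slide shows only the poem text. The handler class name `EventLogger` is likewise just illustrative.

```python
import xml.sax

# Assumed markup: the slide shows only the poem text and the resulting
# events, so the <poem>/<line> tags are reconstructed from the event names.
POEM_XML = (
    "<poem>"
    "<line>Roses are red,</line>"
    "<line>Violets are blue.</line>"
    "<line>Sugar is sweet,</line>"
    "<line>and I love you.</line>"
    "</poem>"
)


class EventLogger(xml.sax.ContentHandler):
    """Record start/end element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("Start element: " + name)

    def endElement(self, name):
        self.events.append("End element: " + name)


handler = EventLogger()
# The parser pushes events to the handler as it reads; no tree is built.
xml.sax.parseString(POEM_XML.encode("utf-8"), handler)

if __name__ == "__main__":
    print("\n".join(handler.events))
```

The parser keeps nothing after an event has been delivered. Anything the application needs later (here, the list of events) has to be stored by the handler itself.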
DOM or SAX or something else?
DOM:
- read and write
- need to move back and forth in data
- document is human created
SAX:
- read only
- huge data or streams
- data is machine generated
Best of both worlds? Adaptive parsing!
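The slides do not define adaptive parsing, so the following is not a sketch of it. It illustrates one readily available middle ground, Python's standard `xml.dom.pulldom`, which streams events like SAX and builds a DOM subtree only for the elements the application asks for. The order data and the function name `big_orders` are hypothetical.

```python
from xml.dom import pulldom

# Hypothetical machine-generated input, invented for this example.
ORDERS_XML = (
    "<orders>"
    "<order id='1'><item>pen</item><qty>2</qty></order>"
    "<order id='2'><item>ink</item><qty>1</qty></order>"
    "<order id='3'><item>pad</item><qty>5</qty></order>"
    "</orders>"
)


def big_orders(xml_text, min_qty):
    """Stream through the document, building a DOM subtree per <order> only."""
    matches = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "order":
            # Pull the rest of this one element into a small DOM subtree.
            events.expandNode(node)
            qty = int(node.getElementsByTagName("qty")[0].firstChild.data)
            if qty >= min_qty:
                item = node.getElementsByTagName("item")[0].firstChild.data
                matches.append((node.getAttribute("id"), item, qty))
    return matches


result = big_orders(ORDERS_XML, 2)

if __name__ == "__main__":
    print(result)
```

Inside each expanded `<order>` the code can move back and forth as with DOM, while the document as a whole is still consumed as a stream.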
Transformations
- XSLT: XSL Transformations
- XSLT processors are built to use DOM
- XSLT to Java conversion: still uses DOM
- SAX-based custom-made application for transformations
- Adaptive parsing with data binding?
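A SAX-based custom transformation might look like the sketch below. It uses Python's standard `XMLFilterBase` and `XMLGenerator` to rewrite events as they stream through, so no tree of the input is ever built. The renaming rule (`line` to `verse`), the sample input, and the names `RenameElements` and `transform` are assumptions made for illustration. It is not XSLT and not the author's application.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

# Assumed sample input, invented for this example.
POEM_XML = b"<poem><line>Roses are red,</line><line>Violets are blue.</line></poem>"


class RenameElements(XMLFilterBase):
    """SAX filter that renames elements while events stream through."""

    def __init__(self, parent, mapping):
        super().__init__(parent)
        self._mapping = mapping

    def startElement(self, name, attrs):
        super().startElement(self._mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self._mapping.get(name, name))


def transform(xml_bytes, mapping):
    """Rename elements without building a tree of the whole input."""
    out = io.StringIO()
    filt = RenameElements(xml.sax.make_parser(), mapping)
    # XMLGenerator turns the (modified) events back into XML text.
    filt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    filt.parse(io.BytesIO(xml_bytes))
    return out.getvalue()


output = transform(POEM_XML, {"line": "verse"})

if __name__ == "__main__":
    print(output)
```

The trade-off is that each transformation rule has to be hand-coded as event handling, whereas an XSLT stylesheet can address any part of the document declaratively.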
Conclusions
- When building XML applications, you have to think about how you will handle large chunks of data
- Choosing between SAX and DOM is not always trivial
- There are also smarter ways to parse XML
- Adaptive parsing with data binding adds much-needed performance to transformations
- It is easy to reach the limits of XSLT processing capabilities
- In some cases, problems with handling XML streams and large files have led people to assume that handling them is almost impossible
Contact Information
Republica Oy
http://
Survontie, Jyväskylä
Sami Poikonen, Vice President, Solutions
p