Beginning XML 4th Edition. Chapter 12: Simple API for XML (SAX)

Slides:



Advertisements
Similar presentations
J0 1 Marco Ronchetti - Web architectures – Laurea Specialistica in Informatica – Università di Trento Java XML parsing.
Advertisements

Technische universität dortmund Service Computing Service Computing Prof. Dr. Ramin Yahyapour IT & Medien Centrum 22. Oktober 2009.
What is XML? a meta language that allows you to create and format your own document markups a method for putting structured data into a text file; these.
XML 6.3 DTD 6. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:  Elements.
XML Parsers By Chongbing Liu. XML Parsers  What is a XML parser?  DOM and SAX parser API  Xerces-J parsers overview  Work with XML parsers (example)
1 SAX and more… CS , Spring 2008/9. 2 SAX Parser SAX = Simple API for XML XML is read sequentially When a parsing event happens, the parser invokes.
SAX A parser for XML Documents. XML Parsers What is an XML parser? –Software that reads and parses XML –Passes data to the invoking application –The application.
Document Type Definitions
XML Robert Grimm New York University. The Whirlwind So Far  HTTP  Persistent connections  (Style sheets)  Fast servers  Event driven architectures.
1 The Simple API for XML (SAX) Part I ©Copyright These slides are based on material from the upcoming book, “XML and Bioinformatics” (Springer-
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
Xerces The Apache XML Project Yvonne Yao. Introduction Set of libraries that provides functionalities to parse XML documents Set of libraries that provides.
21-Jun-15 SAX (Abbreviated). 2 XML Parsers SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files DOM is a W3C standard.
Tutorial 9 Working with XHTML. XP Objectives Describe the history and theory of XHTML Understand the rules for creating valid XHTML documents Apply a.
26-Jun-15 SAX. SAX and DOM SAX and DOM are standards for XML parsers--program APIs to read and interpret XML files DOM is a W3C standard SAX is an ad-hoc.
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
Tutorial 11 Creating XML Document
Apache DOM Parser©zwzOctober 24, 2002 Wenzhong Zhao Department of Computer Science The University of Kentucky.
Document Type Definitions. XML and DTDs A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
17 Apr 2002 XML Programming: SAX Andy Clark. SAX Design Premise Generic method of creating XML parser, parsing documents, and receiving document information.
17 Apr 2002 XML Programming: JAXP Andy Clark. Java API for XML Processing Standard Java API for loading, creating, accessing, and transforming XML documents.
SDPL 2003Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser.
XML for E-commerce II Helena Ahonen-Myka. XML processing model n XML processor is used to read XML documents and provide access to their content and structure.
Introduction to XML cs3505. References –I got most of this presentation from this site –O’reilly tutorials.
1 XML at a neighborhood university near you Innovation 2005 September 16, 2005 Kwok-Bun Yue University of Houston-Clear Lake.
SAX Parsing Presented by Clifford Lemoine CSC 436 Compiler Design.
Advanced Java Session 9 New York University School of Continuing and Professional Studies.
SDPL 2002Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How can applications manipulate structured documents? –An overview of document parser.
SDPL 20113: XML APIs and SAX1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
XML Parsers Overview  Types of parsers  Using XML parsers  SAX  DOM  DOM versus SAX  Products  Conclusion.
SAX. What is SAX SAX 1.0 was released on May 11, SAX is a common, event-based API for parsing XML documents Primarily a Java API but there implementations.
Softsmith Infotech XML. Softsmith Infotech XML EXtensible Markup Language XML is a markup language much like HTML Designed to carry data, not to display.
XML Processing in Java. Required tools Sun JDK 1.4, e.g.: JAXP (part of Java Web Services Developer Pack, already in Sun.
Sheet 1XML Technology in E-Commerce 2001Lecture 3 XML Technology in E-Commerce Lecture 3 DOM and SAX.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.
School of Computing and Information Systems CS 371 Web Application Programming XML and JSON Encoding Data.
1 Introduction to XML XML stands for Extensible Markup Language. Because it is extensible, XML has been used to create a wide variety of different markup.
C# and Windows Programming XML Processing. 2 Contents Markup XML DTDs XML Parsers DOM.
XML Study-Session: Part III
Apache DOM Parser©zwzOctober 24, 2002 Wenzhong Zhao Department of Computer Science The University of Kentucky.
XML and SAX (A quick overview) ● What is XML? ● What are SAX and DOM? ● Using SAX.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
When we create.rtf document apart from saving the actual info the tool saves additional info like start of a paragraph, bold, size of the font.. Etc. This.
What is XML? eXtensible Markup Language eXtensible Markup Language A subset of SGML (Standard Generalized Markup Language) A subset of SGML (Standard Generalized.
1 Introduction JAXP. Objectives  XML Parser  Parsing and Parsers  JAXP interfaces  Workshops 2.
Games: XML Presented by: Idham bin Mat Desa Mohd Sharizal bin Hamzah Mohd Radzuan bin Mohd Shaari Shukor bin Nordin.
SDPL 20063: XML Processor Interfaces1 3. XML Processor APIs n How can (Java) applications manipulate structured (XML) documents? –An overview of XML processor.
Simple API for XML (SAX) Aug’10 – Dec ’10. Introduction to SAX Simple API for XML or SAX was developed as a standardized way to parse an XML document.
Tutorial 9 Working with XHTML. New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition 2 Objectives Describe the history and theory of XHTML.
7-Mar-16 Simple API XML.  SAX and DOM are standards for XML parsers-- program APIs to read and interpret XML files  DOM is a W3C standard  SAX is an.
XML DOM  XML Document Object Model provides a robust international standard for XML Documents.  DOM Level 1 is a Dec 11, 1998 W3C recommendation.  XML.
Introduction to XML Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
SDPL 2001Notes 3: XML Processor Interfaces1 3. XML Processor APIs n How applications can manipulate structured documents? –An overview of document parser.
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
Tutorial 9 Working with XHTML. XP Objectives Describe the history and theory of XHTML Understand the rules for creating valid XHTML documents Apply a.
XML & JSON. Background XML and JSON are to standard, textual data formats for representing arbitrary data – XML stands for “eXtensible Markup Language”
1 Introduction SAX. Objectives 2  Simple API for XML  Parsing an XML Document  Parsing Contents  Parsing Attributes  Processing Instructions  Skipped.
Java API for XML Processing
Simple API for XML SAX. Agenda l Introduction to SAX l Installation and setup l Steps for SAX parsing l Defining a content handler l Examples Printing.
Week-9 (Lecture-1) XML DTD (Data Type Document): An XML document with correct syntax is called "Well Formed". An XML document validated against a DTD is.
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Unit 4 Representing Web Data: XML
XML Technologies and Related Applications
Chapter 7 Representing Web Data: XML
XML Parsers Overview Types of parsers Using XML parsers SAX DOM
Java API for XML Processing
A parser for XML Documents
SAX2 29-Jul-19.
Presentation transcript:

Beginning XML 4th Edition

Chapter 12: Simple API for XML (SAX)

Chapter 12 Objectives What is SAX? Where to download SAX and how to set it up How and when to use the primary SAX interfaces

What is SAX? Parser analyzes XML documents can be more efficient than DOM

A Simple Analogy Like watching a train go by Note the Start of the train End of the train Start of each car End of each car Color Length Engineer Baggage Passengers

A Brief History of SAX Not a formal specification It is popular because it works! Since 1997 Courtesy of David Megginson

Where to Get SAX Xerces2-J AEIfred2 Crimson Oracle XP

Setting Up SAX SAX libraries Parser JDK

Receiving SAX Events public class MyClass implements ContentHandler public class MyClass extends DefaultHandler

ContentHandler Interface startDocumentprocessingInstruction endDocumentstartPrefixMapping startElementendPrefixMapping characterssetDocumentLocator ignorableWhitespaceskippedEntity

Try It Out The Start of Something Big

Handling Element Events public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException ParameterValue urihttp://example.com localNamemyElement qNamemyPrefix:myElement

Attributes MethodDescription getLengthDetermine the number of attributes available in the Attributes interface. getIndexRetrieve the index of a specific attribute in the list. The getIndex function allows you to look up the index by using the attribute’s qualified name or by using both the local name and namespace URI. getLocalNameRetrieve a specific attribute’s local name by sending the index in the list. getQNameRetrieve a specific attribute’s qualified name by sending the index in the list. getURIRetrieve a specific attribute’s namespace URI by sending the index in the list. getTypeRetrieve a specific attribute’s type by sending the index in the list, by using the attribute’s qualified name, or by using both the local name and namespace URI. If there is no Document Type Definition (DTD), this function will always return CDATA. getValueRetrieve a specific attribute’s value by sending the index in the list, by using the attribute’s qualified name, or by using both the local name and namespace URI.

Try It Out Element and Attribute Events

Handling Character Content public void characters(char[] ch, int start, int len) throws SAXException

Try It Out Adding Colorful Characters

When to Ignore ignorableWhitespace public void ignorableWhitespace(char[ ] ch, int start, int len) throws SAXException

Skipped Entities public void skippedEntity(String name) throws SAXException

Handling Special Commands with Processing Instructions public void processingInstruction(String target, String data) throws SAXException

Namespace Prefixes public void startPrefixMapping(String prefix, String uri) throws SAXException public void endPrefixMapping(String prefix) throws SAXException xmlns:example=“

Try It Out Pulling the Brakes

Providing the Location of the Error MethodDescription getLineNumberRetrieves the line number for the current event. getColumnNumberRetrieves the column number for the current event (the SAX specification assumes that the column number is based on right-to-left reading modes). getSystemIdRetrieves the system identifier of the document for the current event. Because XML documents may be composed of multiple external entities, this may change throughout the parsing process. getPublicIdRetrieves the public identifier of the document for the current event. Because XML documents may be composed of multiple external entities, this may change throughout the parsing process.

Try It Out Which Stop is This?

ErrorHandler Interface EventDescription warningAllows the parser to notify the application of a warning it has encountered in the parsing process. Though the XML Recommendation provides many possible warning conditions, very few SAX parsers actually produce warnings. errorAllows the parser to notify the application that it has encountered an error. Even though the parser has encountered an error, parsing can continue. Validation errors should be reported through this event. fatalErrorAllows the parser to notify the application that it has encountered a fatal error and cannot continue parsing. Well- formedness errors should be reported through this event.

Try It Out To Err Is Human, To Handle Errors Is Divine

DTDHandler Interface EventDescription notationDeclAllows the parser to notify the application that it has read a notation declaration. unparsedEntityDeclAllows the parser to notify the application that it has read an unparsed entity declaration.

EntityResolver Interface EventDescription resolveEntityAllows the application to handle the resolution of entity lookups for the parser. reader.setEntityResolver(this); <!ENTITY train PUBLIC "-//TRAINS//freight cars xml 1.0//EN" "

Features and Properties public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException

Working with Properties public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException

Extension Interfaces: DeclHandler EventDescription attributeDeclAllows the parser to notify the application that it has read an attribute declaration elementDeclAllows the parser to notify the application that it has read an element declaration. externalEntityDeclAllows the parser to notify the application that it has read an external entity declaration. internalEntityDeclAllows the parser to notify the application that it has read an internal entity declaration.

Extention Interfaces: LexicalHandler EventDescription commentAllows the parser to notify the document that it has read a comment. The entire comment will be passed back to the application in one event call; it will not be buffered as it may be in the characters and ignorableWhitespace events. startCDATAAllows the parser to notify the document that it has encountered a CDATA section start marker. The character data within the CDATA section will always be passed to the application through the characters event. endCDATAAllows the parser to notify the document that it has encountered a CDATA section end marker. startDTDAllows the parser to notify the document that it has begun reading a DTD. endDTDAllows the parser to notify the document that it has finished reading a DTD. startEntityAllows the parser to notify the document that it has started reading or expanding an entity. endEntityAllows the parser to notify the document that it has finished reading or expanding an entity.

Good SAX and Bad SAX Good Simple Doesn’t load whole document Small footprint Quick Filtering Bad No control over order Intricate state keeping required DOM is better at analyzing

Consumers, Producers, and Filters Sometimes it’s easier to filter than to write an application to consume and produce events.

Other Languages LanguageAvailable Interfaces C++Xerces-C++, the counterpart to the Xerces-J toolkit you are using from Apache, defines a set of C and C++ bindings available at Microsoft Core XML Services (formerly MSXML), provides C++ and COM interfaces (including ActiveX wrappers) available at Arabica toolkit provides C++ bindings that make more extensive use of C++ language features, available at Perl SAX bindings for Perl can be found at PythonPython 2.0 includes support for SAX processing in its markup toolkit as part of the default distribution available at PascalSAX for Pascal bindings can be found at Visual BasicMSXML, the Microsoft XML toolkit, provides Visual Basic interfaces available at dc73e357f7bb.asp..NETThe System.Xml classes distributed with.NET provide psuedo-SAX implementations usable in various.NET languages. The interfaces are "pull-based" and are not exact correlations. To use SAX interfaces in.NET visit the SAX for.NET project at CurlCurl is a web content management system with its own SAX bindings, available at