XML Basics. Chao-Hsien Chu, Ph.D., School of Information Sciences and Technology, The Pennsylvania State University. XML: Extensible Markup Language (a meta language).
Information in the Information Age: representation, processing, storage, retrieval, search, sharing, management, and interchange.
The Needs for Information Interchange: power, flexibility, simplicity, fault tolerance, scalability, interoperability, an open standard, extensibility, a character-based format, and human readability. Is there such a creation?
Is There Such a Creation? [Table: HTML, SGML, and XML compared against the criteria power, flexibility, simplicity, fault tolerance, scalability, interoperability, open standard, extensible, character-based, and human-readable.]
Weaknesses of HTML: HTML isn't extensible, so you can't define custom tags. HTML is display-centric. HTML usually isn't directly reusable. HTML provides only one view of the data. HTML has little or no semantic structure. HTML keeps getting bigger and slower. HTML isn't fault tolerant. XML will complement, rather than replace, HTML.
The Buzz Words Around XML: SVG – Scalable Vector Graphics; OFX – Open Financial Exchange; SGML – Standard Generalized Markup Language; DTD – Document Type Definition; DSSSL – Document Style Semantics and Specification Language; CSS – Cascading Style Sheets; XSL – Extensible Stylesheet Language; DOM – Document Object Model; and more.
Basics of XML: What? Why? Who? Where? When? How?
What is XML? XML stands for Extensible Markup Language. Markup is the code, embedded in the document, that stores the information required for electronic processing. XML is extensible because it predefines no tags; instead, it lets users create the tags their application needs. XML is a meta language because it can be used to define other markup languages.
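To make the idea of user-defined tags concrete, here is a minimal sketch in Python (the slides themselves contain no code). The `catalog`, `book`, `title`, and `price` tags are a hypothetical vocabulary invented for illustration; XML predefines none of them, yet a generic parser such as the standard library's `xml.etree.ElementTree` reads them without knowing in advance what they mean.

```python
import xml.etree.ElementTree as ET

# A hypothetical, user-defined vocabulary: XML predefines none of these tags.
CATALOG = """<catalog>
  <book id="b1">
    <title>XML Basics</title>
    <price currency="USD">29.95</price>
  </book>
  <book id="b2">
    <title>Markup Languages</title>
    <price currency="USD">19.50</price>
  </book>
</catalog>"""


def book_prices(xml_text):
    """Return a dict mapping each book title to its price as a float."""
    root = ET.fromstring(xml_text)
    prices = {}
    for book in root.iter("book"):
        # findtext returns the text of the first matching child element.
        prices[book.findtext("title")] = float(book.findtext("price"))
    return prices


if __name__ == "__main__":
    print(book_prices(CATALOG))
```

The application, not XML itself, decides that `price` holds a number; that division of labor is what lets XML act as a meta language for many vocabularies.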
Family of Markup Languages: GML – Generalized Markup Language; SGML – Standard Generalized Markup Language; HTML – HyperText Markup Language; XML – Extensible Markup Language; XHTML – Extensible HyperText Markup Language; CML – Chemical Markup Language; MathML – Mathematical Markup Language; SVG – Scalable Vector Graphics; SMIL – Synchronized Multimedia Integration Language; HDML – Handheld Device Markup Language; WML – Wireless Markup Language; OEB – Open eBook Publication Structure Specification.
Genealogy of Markup Languages: GML (IBM, 1969) led to SGML (ISO 8879, 1986), from which both HTML (CERN, 1993) and XML (W3C, 1998) were derived; XHTML, SVG, SMIL, HDML, and OEB came later in this line.
[Diagram: genealogy of markup languages, showing the relationships among SGML, XML, HTML, and XSL.]
Advantages of XML: It is a common language for system-to-system communication. It enables loose coupling yet tight integration. Converting a relational database (RDB) record into an XML message is relatively easy to implement. It is platform independent and scalable. XML Signature provides message and party authentication.
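The claim that an RDB record converts easily into an XML message can be sketched as follows, assuming Python's built-in `sqlite3` module as the database and a hypothetical `customer` table. The helper `row_to_xml` is illustrative, not a standard API; it maps each column to a child element and assumes the column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def row_to_xml(cursor, row, record_tag="record"):
    """Convert one database row into an XML message string (hypothetical helper).

    Each column becomes a child element named after the column; a NULL
    value becomes an empty element.
    """
    record = ET.Element(record_tag)
    # cursor.description holds one tuple per column; item 0 is the column name.
    for column, value in zip(cursor.description, row):
        child = ET.SubElement(record, column[0])
        if value is not None:
            child.text = str(value)
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    # In-memory database with a hypothetical "customer" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
    conn.execute("INSERT INTO customer VALUES (1, 'Ada & Co.', 'State College')")
    cur = conn.execute("SELECT id, name, city FROM customer")
    for row in cur.fetchall():
        print(row_to_xml(cur, row, record_tag="customer"))
    conn.close()
```

The serializer escapes the `&` in the name automatically, so the resulting message stays well-formed without extra work by the caller.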
Traditional vs. Nontraditional Documents: in a traditional document, information, structure, and format are combined; in a nontraditional document, the format is kept separate from the information and its structure.
Ways of Displaying XML: the information lives in the XML document and its structure in the DTD, while the format can be supplied by XSL, DHTML + CSS, DSSSL, or CGI + script.
Write Once, Publish Many: a single XML document can be processed into many outputs, such as print, CD-ROM, the Web, and WAP devices.
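A rough sketch of write once, publish many: one XML source is turned into two different outputs. The slides name stylesheet technologies such as XSL for this job; because Python's standard library does not include an XSL processor, two plain Python functions (`to_html` and `to_text`, both hypothetical) stand in for two stylesheets here, and the memo content is made up.

```python
import html
import xml.etree.ElementTree as ET

# One hypothetical source document, written once.
SOURCE = """<memo>
  <subject>Course update</subject>
  <item>Read the XML basics slides</item>
  <item>Install an XML parser</item>
</memo>"""


def to_html(xml_text):
    """'Stylesheet' 1: publish the memo as an HTML fragment for the Web."""
    memo = ET.fromstring(xml_text)
    parts = ["<h1>%s</h1>" % html.escape(memo.findtext("subject")), "<ul>"]
    for item in memo.findall("item"):
        parts.append("<li>%s</li>" % html.escape(item.text))
    parts.append("</ul>")
    return "".join(parts)


def to_text(xml_text):
    """'Stylesheet' 2: publish the same memo as plain text, e.g. for print."""
    memo = ET.fromstring(xml_text)
    lines = [memo.findtext("subject").upper()]
    lines += ["- " + item.text for item in memo.findall("item")]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(SOURCE))
    print(to_text(SOURCE))
```

Adding another output channel means adding another transformation; the source document itself never changes.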
XML for Information Interchange: XML serves as the common exchange format among applications such as CAD packages, word processors, statistical processing software, and spreadsheet packages.
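As an illustration of XML as the go-between, this sketch assumes a spreadsheet that exports CSV and a statistical tool that imports XML. The `dataset` and `observation` element names and both helper functions are hypothetical; the point is that each side only has to agree on the XML format, not on the other's internal file format.

```python
import csv
import io
import statistics
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Spreadsheet side: export CSV rows as an XML document string.

    Assumes the header row holds column names that are valid XML names.
    """
    root = ET.Element("dataset")
    for row in csv.DictReader(io.StringIO(csv_text)):
        observation = ET.SubElement(root, "observation")
        for name, value in row.items():
            ET.SubElement(observation, name).text = value
    return ET.tostring(root, encoding="unicode")


def mean_of(xml_text, field):
    """Statistics side: import the XML and average one numeric field."""
    root = ET.fromstring(xml_text)
    return statistics.mean(
        float(obs.findtext(field)) for obs in root.iter("observation")
    )


if __name__ == "__main__":
    sheet = "region,sales\nNorth,120\nSouth,80\n"
    exchanged = csv_to_xml(sheet)
    print(exchanged)
    print(mean_of(exchanged, "sales"))
```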
Demand for Platform-Independent Technology: with the Internet as the platform, XML carries the data, Java performs the processing, and XHTML handles the presentation.
Selected XML Applications. Middle-Tier Servers: Personalized Frequent-Flyer Website; Building an Online Auction Website; Anatomy of an Information Server. E-Commerce: Electronic Data Interchange (EDI); Collaboration in an E-Commerce Supply Web.
Selected XML Applications (continued). Portals: Enterprise Information Portals (EIP). Syndication: Information and Content Exchange (ICE). Publishing: PC World Online. Content Management: Enterprise Data Management.
Selected XML Applications (continued). Content Acquisition: Integrating Legacy Data. Schema: Building a Schema for a Product Catalog. Stylesheet: A Stylesheet-Driven Tutorial Generator. Navigation and Application Integration: Application Integration Using Topic Maps.
Components of XML Systems: the XML document (contents), the XML DTD (rules), the XML parser (processor), and the XML application. The parser checks that the document is well-formed (syntax) and validates it against the DTD (structure).
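The split between parser and application can be sketched with Python's event-based `xml.sax` module: the parser checks syntax and reports events, while a `ContentHandler` subclass (the hypothetical `TitleCollector` below) plays the role of the XML application. This sketch covers only the well-formedness side, because the standard-library SAX parser is non-validating and does not check a DTD.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """The 'XML application': reacts to events reported by the parser."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    """Run the parser (processor) and hand its events to the application.

    Raises xml.sax.SAXParseException if the document is not well-formed.
    """
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    doc = "<lib><book><title>A</title></book><book><title>B &amp; C</title></book></lib>"
    print(collect_titles(doc))
```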
Well-Formed Documents: Well-formed XML documents are those that are syntactically correct. Here are some general guidelines: There must be exactly one root element. Every element must have both a start tag and an end tag, unless it is written as an empty-element tag. Tags are case sensitive. Tags must not overlap; elements must nest properly inside each other. Attribute values must be enclosed in quotes. An empty-element tag must end with “/>”. In text, the characters < and & must always be written as entity references (&lt; and &amp;), and a double quote inside a double-quoted attribute value must be written as &quot;.
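These rules can be checked mechanically, since a conforming XML parser must reject any document that is not well-formed. This minimal sketch uses Python's `xml.etree.ElementTree`, which raises `ParseError` on such input; `is_well_formed` is a hypothetical helper and the sample strings are made up, roughly one per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Made-up samples, each illustrating one rule.
EXAMPLES = {
    "single root, proper nesting": "<a><b/></a>",
    "two root elements": "<a/><b/>",
    "missing end tag": "<a><b></a>",
    "case mismatch": "<Note></note>",
    "overlapping tags": "<a><b></a></b>",
    "unquoted attribute": "<a x=1/>",
    "raw ampersand in text": "<a>R&D</a>",
    "escaped ampersand": "<a>R&amp;D</a>",
}

if __name__ == "__main__":
    for label, text in EXAMPLES.items():
        print(f"{label:30} {is_well_formed(text)}")
```

Only the first and last samples are well-formed; every other one is rejected by the parser.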
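(Optional) How a Parser Interprets XML: Validating an XML Document Against a Document Type Definition (DTD). Is the document well-formed? If not, issue a warning or stop processing. If it is, does it have a DTD? If not, continue with further processing. If it does, is the document valid against the DTD? If not, issue a warning or stop processing; if it is, continue with further processing.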
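The decision flow can be sketched in Python with `xml.dom.minidom`, which exposes a document's `<!DOCTYPE>` declaration as `doctype` (or `None` when there is none). One loud caveat: the standard library has no validating parser, so the `CONTENT_MODEL` dictionary below is a hypothetical, greatly simplified stand-in for a DTD. `interpret` only checks that the root element matches the DOCTYPE name and that each element contains only allowed child elements; order, counts, and attributes are ignored.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# HYPOTHETICAL stand-in for a DTD: element name -> allowed child element names.
CONTENT_MODEL = {
    "note": {"to", "body"},
    "to": set(),
    "body": set(),
}

# What the flowchart says to do after each verdict.
ACTIONS = {
    "not well-formed": "issue warning / stop processing",
    "no DTD": "further processing",
    "invalid": "issue warning / stop processing",
    "valid": "further processing",
}


def _children_ok(element):
    """Recursively check elements against the toy content model."""
    allowed = CONTENT_MODEL.get(element.tagName)
    if allowed is None:
        return False  # element type not declared in the content model
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            if child.tagName not in allowed or not _children_ok(child):
                return False
    return True


def interpret(xml_text):
    """Return a verdict following: well-formed? -> DTD? -> valid?"""
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        return "not well-formed"
    if doc.doctype is None:
        return "no DTD"
    root = doc.documentElement
    # A real DTD validity constraint: the root element must match the DOCTYPE name.
    if root.tagName != doc.doctype.name or not _children_ok(root):
        return "invalid"
    return "valid"


if __name__ == "__main__":
    samples = [
        "<note><to>Ann</note>",
        "<note><to>Ann</to></note>",
        "<!DOCTYPE note><note><to>Ann</to><body>Hi</body></note>",
        "<!DOCTYPE note><note><from>Bob</from></note>",
    ]
    for text in samples:
        verdict = interpret(text)
        print(verdict, "->", ACTIONS[verdict])
```

A production system would hand the DTD to a validating parser instead of a dictionary; the sketch only mirrors the order of the checks.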
Popular Parsers for XML: MSXML – Microsoft Internet Explorer; Gecko – Netscape; IBM XML Parser for Java; Data Channel XJ Parser; Sun XML Parser for Java.
Thank you! Any questions?