XML Technologies and Applications Rajshekhar Sunderraman Department of Computer Science Georgia State University Atlanta, GA 30302

Slides:



Advertisements
Similar presentations
CSCI N241: Fundamentals of Web Design Copyright ©2004 Department of Computer & Information Science Introducing XHTML: Module B: HTML to XHTML.
Advertisements

XML: Extensible Markup Language
1 DTD (Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs)W3Schools on DTDs.
XML: Extensible Markup Language. Slide Chapter Outline Introduction Structured, Semi structured, and Unstructured Data. XML Hierarchical (Tree)
Agenda from now on Done: SQL, views, transactions, conceptual modeling, E/R, relational algebra. Starting: XML To do: the database engine: –Storage –Query.
CS 898N – Advanced World Wide Web Technologies Lecture 21: XML Chin-Chih Chang
1 Lecture 10 XML Wednesday, October 18, XML Outline XML (4.6, 4.7) –Syntax –Semistructured data –DTDs.
Extensible Markup Language XML MIS 520 – Database Theory Fall 2001 (Day) Lecture 14.
Semantic Web 06 th March, 2002 Robert Kaminski, Thomas Panas.
Semi-structured Data. Facts about the Web Growing fast Popular Semi-structured data –Data is presented for ‘human’-processing –Data is often ‘self-describing’
XML(EXtensible Markup Language). XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe.
Tutorial 11 Creating XML Document
XML Technologies and Applications Rajshekhar Sunderraman Department of Computer Science Georgia State University Atlanta, GA 30302
Managing XML and Semistructured Data Lecture 2: XML Prof. Dan Suciu Spring 2001.
Introducing XHTML: Module B: HTML to XHTML. Goals Understand how XHTML evolved as a language for Web delivery Understand the importance of DTDs Understand.
Introduction to XML This material is based heavily on the tutorial by the same name at
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
4/20/2017.
Copyright © 2003 Pearson Education, Inc. Slide 2-1 Created by Cheryl M. Hughes, Harvard University Extension School — Cambridge, MA The Web Wizard’s Guide.
ECA 228 Internet/Intranet Design I Intro to XML. ECA 228 Internet/Intranet Design I HTML markup language very loose standards browsers adjust for non-standard.
XML – Data Model, DTD and Schema
IS432: Semi-Structured Data Dr. Azeddine Chikh. 1. Semi Structured Data Object Exchange Model.
XML introduction to Ahmed I. Deeb Dr. Anwar Mousa  presenter  instructor University Of Palestine-2009.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Document Type Definition.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 XML Taken from Chapter 7.
XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.
Lecture 6 of Advanced Databases XML Schema, Querying & Transformation Instructor: Mr.Ahmed Al Astal.
Chapter 10: XML.
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
XP 1 CREATING AN XML DOCUMENT. XP 2 INTRODUCING XML XML stands for Extensible Markup Language. A markup language specifies the structure and content of.
CISC 3140 (CIS 20.2) Design & Implementation of Software Application II Instructor : M. Meyer Address: Course Page:
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
XMLI Structure of XML Data Structure of XML Data XML Document Schema XML Document Schema XPATH XPATH.
What is XML?  XML stands for EXtensible Markup Language  XML is a markup language much like HTML  XML was designed to carry data, not to display data.
Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
Of 33 lecture 3: xml and xml schema. of 33 XML, RDF, RDF Schema overview XML – simple introduction and XML Schema RDF – basics, language RDF Schema –
XML 2nd EDITION Tutorial 1 Creating An Xml Document.
XML Documents Chao-Hsien Chu, Ph.D. School of Information Sciences and Technology The Pennsylvania State University Elements Attributes Comments PI Document.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 27 XML: Extensible Markup Language.
Introduction to XML This presentation covers introductory features of XML. What XML is and what it is not? What does it do? Put different related technologies.
XP 1 Creating an XML Document Developing an XML Document for the Jazz Warehouse XML Tutorial.
1 Introduction to XML XML stands for Extensible Markup Language. Because it is extensible, XML has been used to create a wide variety of different markup.
Jeff Ullman: Introduction to XML 1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
An Introduction to XML Sandeep Bhattaram
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
XML Design Goals 1.XML must be easily usable over the Internet 2.XML must support a wide variety of applications 3.XML must be compatible with SGML 4.It.
1 XML eXtensible Markup Language. 2 XML vs. HTML HTML is a HyperText Markup language HTML is a HyperText Markup language Designed for a specific application,
Lecture 20 XML. 2 Objectives What semistructured data is. Concepts of the Object Exchange Model (OEM), a model for semistructured data. Basics of Lore,
1 Tutorial 11 Creating an XML Document Developing a Document for a Cooking Web Site.
C# and Windows Programming XML Processing. 2 Contents Markup XML DTDs XML Parsers DOM.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
1 Indexing The syntax for creating a index is: CREATE [UNIQUE] INDEX index_name ON table_name (column1, column2,... column_n) [ COMPUTE STATISTICS ]; Why.
+ 1 XML eXtensible Markup Language. + 2 XML Lecture Adapted from the work of Dr. Praveen Madiraju of Marquette University.
C Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Introduction to XML Standards.
Introduction to XML Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
XML – Basic Concepts (modified version from Dr. Praveen Madiraju) 2015, Fall Pusan National University Ki-Joune Li.
XML Extensible Markup Language
PART 1 XML Basics. Slide 2 Why XML Here? You need to understand the basics of XML to do much with Android All of they layout and configuration files are.
XML Introduction to XML Extensible Markup Language.
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
1 XML eXtensible Markup Language. 2 Introduction and Motivation Dr. Praveen Madiraju Modified from Dr.Sagiv’s slides.
Extensible Markup Language (XML) Pat Morin COMP 2405.
XML Technologies and Applications
Lecture 9: XML Monday, October 17, 2005.
eXtensible Markup Language
Lecture 11: XML and Semistructured Data
Presentation transcript:

XML Technologies and Applications Rajshekhar Sunderraman Department of Computer Science Georgia State University Atlanta, GA I: Introduction and XML Basics December 2005

Outline  Introduction  XML Basics  XML Structural Constraint Specification  Document Type Definitions (DTDs)  XML Schema  XML/Database Mappings  XML Parsing APIs  Simple API for XML (SAX)  Document Object Model (DOM)  XML Querying and Transformation  XPath  XQuery  XSLT  XML Applications

Introduction  XML: A W3C standard to complement HTML  Two facets of XML: document-centric and data-centric  Motivation  HTML describes presentation  XML describes content  User defined tags to markup “content”  Text based format.  Ideal as “Data Interchange” format.  Key technology for “distributed” applications.  All major database products have been retrofitted with facilities to store and construct XML documents.  XML is closely related to object-oriented and so-called semi- structured data.

Semistructured Data An HTML document (student list) to be displayed on the Web Name: John Doe Id: s Address: Number: 123 Street: Main Name: Joe Public Id: s … … … … HTML does not distinguish between attributes and values

Semistructured Data (cont’d.)  To make the previous student list suitable for machine consumption on the Web, it should have the following characteristics  Be object-like schemaless  Be schemaless (not guaranteed to conform exactly to any schema, but different objects have some commonality among themselves. self-describing  Be self-describing (some schema-like information, like attribute names, is part of data itself)  Data with these characteristics are referred to as semistructured.

Semi-structured Data Model  Set of label-value pairs. {name: "Alan", tel: , }  The values themselves may be structures {name: {first: “Alan”, last: “Black”}, tel: , } nametel nametel lastfirst “Alan”“Black” Graph Model: Nodes represent objects connected by labeled edges to values

Semi-structured Data Model  Duplicate labels allowed {name: "Alan", tel: , tel: "}  The syntax is easily generalized to describe sets of objects {person: {name: “Alan”,tel: , person: {name: “Sara”,tel: , person: {name: “Fred”,tel: , }  All objects within a set need not have the same structure {person:{name: “Alan”,tel: , person:{name: {first: “Sara”,last: “Black”}, person:{name: “Fred”, tel: , height: 168} }

Semi-structured Data Model  Relational Data is easily represented {r1: {row: {a: a1, b: b1, c: c1}, {row: {a: a2, b: b2, c: c2}}, r2: {row: {c: c2, d: d2}, row: {c: c3, d: d3}, row: {c: c4, d: d4}} }  Object-oriented data is also naturally represented (each node has a unique object id, either explicitly mentioned or system generated) {person: &o1{name: “Mary”, age: 45, child: &o2, child: &o3}, person: &o2{name: “John”, age: 17, relatives: {mother: &o1, sister: &o3}}, person: &o3{name: “Jane”, country: “Canada”, mother: &o1} }

Semi-structured Data Model  Formal syntax for semi-structured data model ::== | oid | oid ::== atomicvalue | ::== { label:,..., label: }  An oid value is said to be DEFINED if it appears before a value; otherwise it is said to be USED  An ssd-expression is CONSISTENT if  An oid is defined at most once and  If an oid is used, it must also be defined.  A flexible and powerful data model that is capable of representing data that does not have to follow the strict rules of databases.

What is Self-describing Data? Non-self-describing (relational, object-oriented): Data part: (#12345, [“Students”, {[“John”, s , [123,”Main St”]], [“Joe”, s , [321, “Pine St”]] } ] ) Schema part: PersonList PersonList[ ListName: String, Contents: [ Name: String, Id: String, Address: [Number: Integer, Street: String] ] ]

What is Self-Describing Data? (contd.)  Self-describing:  Attribute names embedded in the data itself, but are distinguished from values  Doesn’t need schema to figure out what is what (but schema might be useful nonetheless) (#12345, [ListName: “Students”, Contents: { [ Name: “John Doe”, Id: “s ”, Address: [Number: 123, Street: “Main St.”] ], [Name: “Joe Public”, Id: “s ”, Address: [Number: 321, Street: “Pine St.”] ] } ] )

XML – The De Facto Standard for Semi-structured Data XM L  XML: eXtensible Markup Language  Suitable for semi-structured data and has become a standard  Used to describe content rather than presentation  Differs from HTML in following ways  New tags may be defined at will by the author of the document (extensible) …  No semantics behind tags. For instance, HTML’s … means: render contents as a table; in XML: doesn’t mean anything special.  Structures may be nested arbitrarily  XML document may contain an optional schema that describes its structure  Intolerant to bugs; Browsers will render buggy HTML pages but XML processors will reject ill-formed XML documents.

XML Syntax XML Elements element: piece of text bounded by user-defined matching tags: Alan 42 Note:  Element includes the start and end tag  No quotation marks around strings; XML treats all data as text. This is referred to as PCDATA (Parsed Character Data).  Empty elements: can be abbreviated to

XML Syntax - Continued Collections are expressed using repeated structures. Ex. The collection of all persons on the 4th floor: People on the 4th floor Alan 42 Patsy 36 Ryan 58

XML Syntax - Continued XML Attributes Attributes define some properties of elements Expressed as a name-value pairs trompette six trous rue Croix-Bosset Sevres France As with tags, user may define any number of attributes Attribute values must be enclosed within quotation marks.

XML Syntax - Continued Attributes vs Elements  A given attribute can occur only once within a tag; Its value is always a string  On the other hand tags defining elements/sub-elements can repeat any number of times and their values may be string data or sub-elements  Same data may be encoded using attributes or elements or a combination of the two or 42

XML Syntax - Continued XML References  Use id attribute to define a reference (similar to oids)  Use idref attribute (in an empty element) to refer to a previously defined reference. -- defines an id or a reference NE Nevada CCN Carson City -- refers to object called s2; -- this is an empty element

XML Syntax - Continued Mixing Elements and Text  XML allows us to mix PCDATA and sub-elements within an element. This is my best friend Alan 42 I am not sure of the following  This seems un-natural from a database perspective, but from a document perspective, this is quite natural!

XML Syntax - Continued Order The semi-structured data model is based on unordered collections, whereas XML is ordered. The following two pieces of semi-structured data are equivalent: person: {fname: "John", lname: "Smith:} person: {lname: "Smith", fname: "John"} but the following two XML data are not: John Smith Smith> John To make matters worse, attributes are NOT ordered in XML; Following two are equivalent:

XML Syntax - Continued Other XML Constructs  Comments:  Processing Instruction (PI): Such instructions are passed on to applications that process XML files.  CDATA (Character Data): used to write escape blocks containing text that otherwise would be considered markup: this is not an element ]]>  Entities: &lt stands for <

Well-Formed XML Documents  An XML document is well-formed if  Tags are syntactically correct  Every tag has an end tag  Tags are properly nested  There is a root tag  A start tag does not have two occurrences of the same attribute  An XML document must be well-formed before it can be processed.  A well-formed XML document will parse into a node-labeled tree

… … … … … … Elements are nested Root element contains all others Element (or tag) names Terminology elements Root Root element Empty element attributes

More Terminology John is a nice fellow 21 Main St. … … … Opening tag Closing tag: What is open must be closed Nested element, Person child of Person Address Parent of Address, number Ancestor of number “standalone” text, not very useful as data, non-uniform Address Child of Address, Person Descendant of Person Person Content of Person

XML Data Model person name tel Bart Simpson 02– –  Document Object Model (DOM) – DOM Tree  Leaves are either empty or contain PCDATA  Unlike ssd tree model, nodes are labeled with tags.