XML: An Introduction


1 XML: An Introduction. The eXtensible Markup Language (XML) was developed by the World Wide Web Consortium (W3C), starting in 1996, to address limitations of HTML; XML 1.0 became a W3C Recommendation in 1998. XML is a language similar to HTML, but more extensible. It supports user-defined tags that allow both data and metadata (i.e. data about data) to be stored in a single document. At the same time, presentation aspects remain decoupled from data representation.

2 A Brief History. HTML and XML are like children of the same parent, the Standard Generalized Markup Language (SGML). SGML became an ISO standard in 1986. It originated at IBM, which wanted a means of publishing document content in different ways. The result of the standards process was a rich document markup language that allows authors to separate logical content from its presentation. SGML markup is, in effect, a series of commands understood by another program.

3 Why Another Markup Language? The question to be asked is: what is wrong with SGML or HTML? SGML is very large, powerful, and complex. It has been used in industry for commercial purposes for over a decade, but it is too complex to program for in a Web environment.

4 Why Another Markup Language? (contd) HTML can be thought of as a small application of SGML used on the Web. HTML defines a very simple class of report-style documents, with section headings, paragraphs, lists, tables, illustrations, and so on. It was the first computer language that could be understood and used by the masses; it gave the Web to the common person. HTML is static, however: only a limited set of things can be done with it.

5 Advantages of XML over HTML
Extensibility. HTML: fixed set of tags. XML: extensible set of tags.
Presentation/Content. HTML: tags for presentation only. XML: tags describe data content.
Views. HTML: single presentation of each document. XML: multiple views of the same document (provided by XSL).
Document/Data orientation. HTML: document orientation only. XML: support for documents plus extensive infrastructure for exchange and validation of structured data.
Search/query. HTML: search only. XML: search plus field-sensitive queries and later update.

6 XML XML allows users to: Bring multiple files together to form compound documents Identify where illustrations are to be incorporated into text files, and the format used to encode each illustration Provide processing control information to supporting programs, such as document validators and browsers Add editorial comments to a file

7 XML Components XML is based on the concept of documents composed of a series of entities Each entity can contain one or more logical elements Each of these elements can have certain attributes (properties) that describe the way in which it is to be processed

8 XML – Few Important Points Tag names are case sensitive Every opening tag must have a corresponding closing tag A nested tag pair cannot overlap another tag Attribute values must appear within quotes Every document must have a root element
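The rules on this slide can be checked mechanically, because a conforming parser rejects any document that breaks them. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample documents and the helper name is_well_formed are made up for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule on the slide.
samples = {
    "well formed": '<note lang="en"><to>Ann</to></note>',
    "case mismatch": "<Note><to>Ann</to></note>",
    "missing closing tag": "<note><to>Ann</note>",
    "overlapping tags": "<note><b><i>hi</b></i></note>",
    "unquoted attribute": "<note lang=en><to>Ann</to></note>",
    "two root elements": "<note/><note/>",
}

# Map each sample name to whether the parser accepted it.
results = {name: is_well_formed(doc) for name, doc in samples.items()}
```

Only the first sample should be accepted; each of the others violates exactly one of the listed rules.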

9 XML Editor. XML documents are plain text documents, so any simple text editor can be used as an XML editor. For example, Windows users can use Notepad or WordPad. Dedicated editors include Microsoft XML Notepad and Java-based XML editors.

10 XML Document
[tag name missing] - the root element of the document.
[tag name missing] - a question and its associated answers.
question - a question.
a - the first possible answer to a question.
b - the second possible answer to a question.
c - the third possible answer to a question.

11 XML Document (contd) In 1994, a man had an accident while robbing a pizza restaurant in Akron, Ohio, that resulted in his arrest. What happened to him?

12 XML Document (contd) he slipped on a patch of grease on the floor and knocked himself out. he backed into a police car while attempting to drive off. he choked on a breadstick that he had grabbed as he was running out.

13 Viewing XML Document. A style sheet is the best way to view an XML document. A style sheet is a series of formatting descriptions that determine how elements are displayed on a web page. In plain English, a style sheet controls how a web page's content looks in a web browser.

14 A CSS for the Example XML Document. CSS rule for the eq tag:
eq {
  display: block;
  width: 750px;
  padding: 10px;
  margin-bottom: 10px;
  border: 4px double black;
  background-color: silver;
}

15 Style Sheets (contd). In the absence of a style sheet, Internet Explorer or any other browser just displays the XML code. To attach the style sheet to the document, add the following line of code just after the XML declaration for the document
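The line itself did not survive in the transcript. The standard mechanism for this is the xml-stylesheet processing instruction, so the sketch below assumes that is what the slide showed. The file name quiz.css, the root element name quiz, and the helper attach_stylesheet are hypothetical; the element names eq, question, a, b, and c are taken from the earlier slides.

```python
import xml.etree.ElementTree as ET


def attach_stylesheet(xml_text: str, href: str) -> str:
    """Insert an xml-stylesheet processing instruction after the XML declaration."""
    pi = f'<?xml-stylesheet type="text/css" href="{href}"?>'
    if xml_text.startswith("<?xml "):
        # Place the instruction right after the closing "?>" of the declaration.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    # No declaration present: the instruction goes first.
    return pi + "\n" + xml_text


# Hypothetical quiz document; "quiz" is an assumed root element name.
QUIZ = (
    '<?xml version="1.0"?>\n'
    "<quiz><eq><question>What happened to him?</question>"
    "<a>he slipped on a patch of grease</a>"
    "<b>he backed into a police car</b>"
    "<c>he choked on a breadstick</c></eq></quiz>"
)

styled = attach_stylesheet(QUIZ, "quiz.css")
root = ET.fromstring(styled)  # the document is still well formed
```

The processing instruction sits in the prolog, so it does not change the element tree that an application sees.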

16 Is XML a Database? XML and its surrounding technologies constitute a "database" in the looser sense of the term, i.e. a database management system (DBMS). XML provides many of the things found in databases: storage (XML documents), schemas (DTDs, XML schema languages), query languages (XQuery, XPath, XQL, XML-QL, QUILT, etc.), programming interfaces (SAX, DOM, JDOM), and so on.

17 Is XML a Database? (contd) But it lacks many of the things found in real databases: efficient storage, indexes, security, transactions and data integrity, multi-user access, triggers, and queries across multiple documents. XML documents can serve as a database in environments with small amounts of data, few users, and modest performance requirements. They fail in environments with many users, strict data integrity requirements, and the need for good performance.

18 XML And Databases. XML’s proliferation raises a question: how is data transferred in XML documents to be read, stored, and queried? In other words, how do DBMSs handle XML documents? There are two ways to look at XML documents: as data-centric documents or as document-centric documents.

19 Data Centric Documents Data-Centric documents use XML as a data transport Such documents usually are found in business-to-business applications Examples: Buyer-supplier trading automation, Sales orders, Flight Schedules, Scientific data Data-centric documents have a regular structure Data originates both in the database (in which case we want to expose it as XML) and outside the database (in which case we want to store it in a database)

20 Example - Data Centric Document 1234 John Doe 1998-02-11 5678 Joy Black 1998-03-09

21 Data Centric Documents. Managing data-centric documents requires data extraction as well as data formatting services. Data extraction: receive XML documents from a network and extract structured data from them, to be stored in a DBMS. To support data extraction, a mapping must be defined between XML documents and the DBMS data model. The extracted data is stored in a table and follows a predefined schema (which is why it is called a structured representation). The original structure of the XML document is not maintained in this case.

22 Example: Data Extraction
XML document (values only): 7369 Paul Smith, 7000 Steve Adam
Table:
Number First Name Last Name
------ ---------- ---------
7000   Steve      Adam
7369   Paul       Smith
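The extraction step can be sketched in a few lines of Python with the standard sqlite3 and xml.etree.ElementTree modules. The slide shows only the values, so the element names (clients, client, number, firstName, lastName), the MAPPING dictionary, and the function extract_clients are assumptions made for illustration, and SQLite stands in for whatever DBMS is actually used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical markup: the slide shows only the values, not the tag names.
CLIENTS_XML = """<clients>
  <client><number>7369</number><firstName>Paul</firstName><lastName>Smith</lastName></client>
  <client><number>7000</number><firstName>Steve</firstName><lastName>Adam</lastName></client>
</clients>"""

# The mapping between XML element names and table columns.
MAPPING = {"number": "Number", "firstName": "FirstName", "lastName": "LastName"}


def extract_clients(xml_text, conn):
    """Copy every <client> element into the Clients table, following MAPPING."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Clients "
        "(Number INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )
    columns = ", ".join(MAPPING.values())
    placeholders = ", ".join("?" for _ in MAPPING)
    sql = f"INSERT INTO Clients ({columns}) VALUES ({placeholders})"
    for client in ET.fromstring(xml_text).iter("client"):
        conn.execute(sql, [client.findtext(tag) for tag in MAPPING])
    conn.commit()


conn = sqlite3.connect(":memory:")
extract_clients(CLIENTS_XML, conn)
rows = conn.execute(
    "SELECT Number, FirstName, LastName FROM Clients ORDER BY Number"
).fetchall()
```

After the call, the rows live in an ordinary table and the nesting of the original document is gone, which matches the slide's remark that the document structure is not maintained.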

23 Data Centric Documents. Data formatting: XML encoding software takes the result of a query expressed in a DBMS query language and encodes the resulting data in an XML document to be transferred over the network. Data formatting is essentially the reverse of data extraction: after a set of tuples is selected from the database with a database query, data formatting services transform it into an XML document.

24 Data Formatting - Data centric document
Table Clients:
Number FirstName LastName
------ --------- --------
7369   Paul      Smith
7000   Steve     Adam
Query: SELECT FirstName, LastName FROM Clients WHERE Number = '7369'
Resulting XML document (values only): Paul Smith
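A minimal sketch of the formatting direction, again with sqlite3 and xml.etree.ElementTree. The wrapper element names clients and client and the function rows_to_xml are assumptions; the column names and the query follow the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor, root_tag="clients", row_tag="client"):
    """Encode the rows of an executed query as XML, one child element per column."""
    root = ET.Element(root_tag)
    names = [col[0] for col in cursor.description]
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(names, row):
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (Number TEXT, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Clients VALUES (?, ?, ?)",
    [("7369", "Paul", "Smith"), ("7000", "Steve", "Adam")],
)

# The query from the slide, with the client number passed as a parameter.
cursor = conn.execute(
    "SELECT FirstName, LastName FROM Clients WHERE Number = ?", ("7369",)
)
xml_text = rows_to_xml(cursor)
```

Taking the element names from the result-set column names keeps the sketch generic; a real formatting service would normally apply an explicit mapping instead.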

25 Document Centric Documents In this view, XML documents are application-relevant objects, i.e. new data objects to be stored and managed by a DBMS The meaning of the XML document depends on the document as a whole. Structure is more irregular, and data are heterogeneous Examples: books, email, advertisements Unlike data-centric documents, they usually do not originate in the database

26 Document Centric Documents Document centric documents are application-relevant objects The meaning of the XML document depends on the document as a whole. Structure is more irregular, and data are heterogeneous Unlike data-centric documents, they usually do not originate in the database

27 Example - Document Centric Documents. The Turkey Wrench from Full Fabrication Labs, Inc. is like a monkey wrench, but not as big. The turkey wrench, which comes in both right- and left-handed versions (skyhook optional), is made of the finest stainless steel. The Ready-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials. You can: Order your own turkey wrench

28 Document Centric Documents. This type of document requires a DBMS enhanced with new data types for representing XML data, and with new capabilities for querying and managing the documents. Two representations have been devised: the unstructured representation and the hybrid representation.

29 Document Centric Documents (Unstructured). In the unstructured representation, the document is stored either as a single data field inside the DBMS, managed by the DBMS, or as a single data field outside the DBMS but linked to it, in which case the operating system manages it. For unstructured XML documents, DBMSs extend query languages with XML-based selection conditions.

30 Example - Unstructured
XML document with Id 10 (values only): 7369 Paul Smith, 7000 Steve Adam
Table:
Id XML Document
-- ------------
10 7369 Paul Smith 7000 Steve Adam (the whole document in a single field)
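The unstructured representation can be approximated by keeping the whole document in one text column. SQLite has no built-in XML selection conditions, so in this sketch the condition is evaluated in application code with ElementTree; a DBMS with XML extensions would evaluate it inside the query instead. The table name Documents, the element names, and the function ids_matching are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical markup around the values shown on the slide.
DOC = (
    "<clients>"
    "<client><number>7369</number><firstName>Paul</firstName><lastName>Smith</lastName></client>"
    "<client><number>7000</number><firstName>Steve</firstName><lastName>Adam</lastName></client>"
    "</clients>"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (Id INTEGER PRIMARY KEY, XmlDocument TEXT)")
conn.execute("INSERT INTO Documents VALUES (?, ?)", (10, DOC))


def ids_matching(conn, path):
    """Return the ids of stored documents that contain a node matching `path`."""
    hits = []
    query = "SELECT Id, XmlDocument FROM Documents ORDER BY Id"
    for doc_id, text in conn.execute(query):
        if ET.fromstring(text).find(path) is not None:
            hits.append(doc_id)
    return hits


matches = ids_matching(conn, "./client[lastName='Smith']")
misses = ids_matching(conn, "./client[lastName='Jones']")
```

Every document has to be parsed on every query here, which illustrates why the unstructured representation needs dedicated indexing support to perform well.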

31 Document Centric Documents --Hybrid Hybrid Representation: Combination of Structured and unstructured type. Useful while mixing types, such as structural information about a book, but unstructured information consisting of the contents or chapters of the book.

32 Example -- Hybrid

33 Commercial Support In Databases: Oracle 8i. Oracle 8i has an extended architecture with tools to manage XML documents. It supports structured, unstructured, and hybrid representations of XML documents. The XML-SQL utility supports data extraction and data formatting for data-centric documents. Document-centric data is stored using CLOBs (character large objects).

34 Commercial Support In Databases: IBM DB2. The XML Extender provides features to store and manage XML documents. It handles structured, unstructured, and hybrid types. Data-centric documents are stored in a set of relational tables containing data extracted from XML documents. The Extender supports storage and access methods to compose an XML document from existing data or to decompose an XML document into data.

35 Commercial Support In Databases (contd). IBM DB2: document-centric documents are stored as XMLClob, XMLVarChar, or XMLFile. Microsoft SQL Server: for data-centric documents, the OpenXML function extracts data from an XML document so that it can be stored in a relational database. Extending the SELECT-FROM-WHERE statement with the FOR XML clause provides XML formatting of query results. SQL Server also permits construction of XDR schemas: schemas that generate views of the database in XML format, which can be queried with XPath.

36 Data-centric and Document-centric In practice, the distinction between data-centric and document-centric documents is not always clear. For example, a data-centric document, such as an invoice, might contain irregularly structured data, such as a part description. An otherwise document-centric document, such as a user's manual, might contain regularly structured data (often metadata), such as an author's name and a revision date.

37 Document Schema, Database Schema. A schema is a set of rules that defines the structure of a document or database. A database schema describes the overall structure of the database. A document schema describes the exact elements and attributes available within a given markup language, along with the association between attributes and elements and the relationships between elements. The schema allows XML documents to be validated.

38 Document schemas. There are two different approaches to creating schemas for XML documents: Document Type Definition (DTD) and XML Schema Definition (XSD). A DTD describes vital information about the structure of an XML document, i.e. it lists element types, attributes, and their relationships to each other. It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together.

39 Limitations of DTD: non-XML syntax; no data-type facility; a closed data model that leaves little flexibility to extend markup languages.

40 XML Schema. XSDs not only define XML structures but also provide data type capabilities to XML. They are coded in XML tags. They support integrity constraints such as primary and foreign keys. They represent an open-ended data model that allows custom markup languages to be extended and complex relationships between elements to be established.

41 Mapping Document Schemas to Database Schema. Two mappings are commonly used: table-based mapping and object-relational mapping. The data transfer software is built on top of this mapping. It either uses an XML query language (such as XPath, XQuery, or a proprietary language) or simply transfers data according to the mapping (the XML equivalent of SELECT * FROM Table).

42 Table Based mapping Used by many of the middleware products that transfer data between an XML document and a relational database It models XML documents as a single table or set of tables. That is, the structure of an XML document must be as follows:............

43 Table based mapping. Advantages: it is simple, because it matches the structure of tables and result sets in relational databases; it is mainly useful for transferring data between databases. Disadvantages: it applies to only a limited set of XML documents; it does not exploit the ability of XML to represent hierarchies of data; it does not preserve physical structure, i.e. the DTD.

44 Object-relational mapping. The object-relational mapping is used by all XML-enabled relational databases and some middleware products. It models the data in the XML document as a tree of objects that are specific to the data in the document. Object-relational mapping is done in two steps: first, a document schema (DTD) is mapped to an object schema; second, the object schema is mapped to a database schema.

45 Object-relational mapping (contd). In this model, element types with attributes are generally modeled as classes. The model is then mapped to relational databases using traditional object-relational mapping techniques, i.e. classes are mapped to tables, scalar properties are mapped to columns, and object-valued properties are mapped to primary key / foreign key pairs.

46 Object-relational mapping - contd For example, consider the following XML document : 1234 ABC Industries 29.10.00 123 12 10.95 456 600 3.99

47 Object-relational mapping (contd). Which maps to the following objects:
Object SalesOrder {
  number = 1234;
  customer = "ABC Industries";
  orderdate = 29.10.00;
  items = {ptrs to Item objects};
}
Object Item {
  number = 1;
  part = "123";
  quantity = 12;
  price = 10.95;
}
Object Item {
  number = 2;
  part = "456";
  quantity = 600;
  price = 3.99;
}

48 Object-relational mapping (contd). And then to rows in the following tables:

SalesOrders
-----------
Number Customer             Date
------ -------------------- --------
1234   ABC Industries       29.10.00
...

Items
-----
SONumber Item Part Quantity Price
-------- ---- ---- -------- -----
1234     1    123  12       10.95
1234     2    456  600      3.99
...
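The two steps of the object-relational mapping (document to objects, objects to rows) can be sketched as follows. The slides show only the values of the sales order, so the element and attribute names in ORDER_XML are assumptions, as are the Python class and function names; SQLite stands in for the relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Assumed markup: the slide lists only the values of this sales order.
ORDER_XML = """<SalesOrder>
  <Number>1234</Number>
  <Customer>ABC Industries</Customer>
  <Date>29.10.00</Date>
  <Item Number="1"><Part>123</Part><Quantity>12</Quantity><Price>10.95</Price></Item>
  <Item Number="2"><Part>456</Part><Quantity>600</Quantity><Price>3.99</Price></Item>
</SalesOrder>"""


@dataclass
class Item:
    number: int
    part: str
    quantity: int
    price: float


@dataclass
class SalesOrder:
    number: int
    customer: str
    date: str
    items: list = field(default_factory=list)


def to_objects(xml_text):
    """Step 1: map the XML document to a tree of objects."""
    root = ET.fromstring(xml_text)
    order = SalesOrder(
        int(root.findtext("Number")), root.findtext("Customer"), root.findtext("Date")
    )
    for el in root.findall("Item"):
        order.items.append(
            Item(
                int(el.get("Number")),
                el.findtext("Part"),
                int(el.findtext("Quantity")),
                float(el.findtext("Price")),
            )
        )
    return order


def store(order, conn):
    """Step 2: map the objects to rows; SONumber is the foreign key to the order."""
    conn.execute(
        "CREATE TABLE SalesOrders (Number INTEGER PRIMARY KEY, Customer TEXT, Date TEXT)"
    )
    conn.execute(
        "CREATE TABLE Items (SONumber INTEGER REFERENCES SalesOrders(Number), "
        "Item INTEGER, Part TEXT, Quantity INTEGER, Price REAL)"
    )
    conn.execute(
        "INSERT INTO SalesOrders VALUES (?, ?, ?)",
        (order.number, order.customer, order.date),
    )
    conn.executemany(
        "INSERT INTO Items VALUES (?, ?, ?, ?, ?)",
        [(order.number, i.number, i.part, i.quantity, i.price) for i in order.items],
    )


conn = sqlite3.connect(":memory:")
order = to_objects(ORDER_XML)
store(order, conn)
item_rows = conn.execute(
    "SELECT SONumber, Item, Part, Quantity, Price FROM Items ORDER BY Item"
).fetchall()
```

The list of Item objects held by the SalesOrder object becomes the SONumber foreign key in the Items table, which is the primary key / foreign key pairing described on slide 45.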

49 Query Languages. Options today: use XSLT, or integrate a limited number of transformations into the mappings. Long term: implementation of query languages that return XML. Almost all XML query languages (including XQuery 1.0) are read-only, so different means are needed to insert, update, and delete data. In the long term, XQuery will add these capabilities.

50 Template-Based Query Languages Most of these languages rely on SELECT statements embedded in templates The following flights have available seats: SELECT Airline, FltNumber, Depart, Arrive FROM Flights $Airline $FltNumber

51 Template-based Query languages - contd The result of processing such a template might be: The following flights have available seats: ACME 123 Dec 12, 1998 13:43 Dec 13, 1998 01:21...
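The markup of the template was lost in the transcript, so the following only sketches the general idea of a template-based query: a SELECT statement is run, and each result row is substituted into $-placeholders named after the columns. The wrapper element names (Flights, Introduction, Row) and the function process_template are assumptions; the query, the placeholder style, and the sample values follow the slides.

```python
import sqlite3
from string import Template
from xml.sax.saxutils import escape

QUERY = "SELECT Airline, FltNumber, Depart, Arrive FROM Flights"

# Hypothetical row template; placeholders are named after the result columns.
ROW = Template(
    "<Row><Airline>$Airline</Airline><FltNumber>$FltNumber</FltNumber>"
    "<Depart>$Depart</Depart><Arrive>$Arrive</Arrive></Row>"
)


def process_template(conn):
    """Run QUERY and fill one copy of ROW per result row."""
    cursor = conn.execute(QUERY)
    names = [col[0] for col in cursor.description]
    rows = "".join(
        ROW.substitute({n: escape(str(v)) for n, v in zip(names, row)})
        for row in cursor
    )
    return (
        "<Flights><Introduction>The following flights have available seats:"
        "</Introduction>" + rows + "</Flights>"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber TEXT, Depart TEXT, Arrive TEXT)"
)
conn.execute(
    "INSERT INTO Flights VALUES (?, ?, ?, ?)",
    ("ACME", "123", "Dec 12, 1998 13:43", "Dec 13, 1998 01:21"),
)
flights_xml = process_template(conn)
```

Values are passed through escape so that characters such as & or < in the data cannot break the generated XML.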

52 SQL Based Query Languages SQL-based query languages use modified SELECT statements, the results of which are transformed to XML The simplest of these uses nested SELECT statements, which are transformed directly to nested XML according to the object-relational mapping

53 XML Query Languages. Template-based query languages and SQL-based query languages can only be used with relational databases. XML query languages can be used over any XML document. To use them with relational databases, the data in the database must be modeled as XML, thereby allowing queries over virtual XML documents. There are different XML query languages, such as XQuery and XPath.

54 XQuery -- An Introduction. XQuery is a functional language in which a query is represented as an expression. An XQuery expression leverages the capabilities of XML by allowing both specification of what is being selected and designation of the output format. Several types of expressions are used in XQuery, such as path expressions, element constructors, FLWR expressions (now called FLWOR), and conditional expressions.

55 XQuery Either a table-based mapping or an object-relational mapping can be used If a table-based mapping is used, each table is treated as a separate document and joins between tables (documents) are specified in the query itself, as in SQL If an object-relational mapping is used, hierarchies of tables are treated as a single document and joins are specified in the mapping

56 XPath - An Introduction. XPath is a set of syntax rules for addressing parts of an XML document. XPath uses path expressions to identify nodes in an XML document. These path expressions look very much like the paths used when working with a computer file system.
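A few path expressions in practice, as a minimal sketch. Python's xml.etree.ElementTree supports only a subset of XPath, which is enough for simple paths and predicates like these. The sample document reuses the sales order values from slide 46 with assumed element names.

```python
import xml.etree.ElementTree as ET

# Assumed markup around the sales order values from the earlier slides.
ORDER_XML = """<SalesOrder>
  <Number>1234</Number>
  <Item Number="1"><Part>123</Part><Quantity>12</Quantity><Price>10.95</Price></Item>
  <Item Number="2"><Part>456</Part><Quantity>600</Quantity><Price>3.99</Price></Item>
</SalesOrder>"""

root = ET.fromstring(ORDER_XML)

# A child path, much like a directory path: every Part below an Item.
parts = [el.text for el in root.findall("./Item/Part")]

# A predicate on an attribute selects one specific Item.
second_quantity = root.findtext("./Item[@Number='2']/Quantity")

# ".//" searches at any depth below the current node.
prices = [float(el.text) for el in root.iterfind(".//Price")]
```

Paths are written relative to the root element here, which is why they start with "./" rather than with the root element's name.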

57 XPath. An object-relational mapping must be used to query across more than one table, because XPath does not support joins across documents. If the table-based mapping is used, it is possible to query only one table at a time.

58 Native XML Databases (NXD). A native XML database defines a logical model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. The model must include elements, attributes, PCDATA, and document order. Examples of such models are the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX.

59 NXDs (contd). An NXD has an XML document as its fundamental unit of logical storage. No particular underlying physical storage model is required. An NXD does not necessarily store the XML in true native form (i.e., text).

60 NXDs in Brief. An NXD is specialized for storing XML data and stores all components of the XML model intact. An NXD may not actually be a standalone database. It does not represent a new low-level database model, and it is not intended to replace existing databases. It is simply a tool intended to assist the developer by providing robust storage and manipulation of XML documents.

61 NXD Features XML Storage: NXDs store XML documents as a unit and will create a model that is closely aligned with XML or one of XML’s technologies like the Infoset or DOM. Includes arbitrary levels of nesting and complexity. This model is automatically mapped by the NXD into the underlying storage mechanism.

62 NXD Features (contd) Collections: NXDs manage collections of documents, allowing you to query and manipulate those documents as a set. Any XML document can be stored in the collection, regardless of the schema – “Schema-Independent” functionality. In the future, it is likely that W3C XML Schema will emerge as the schema language of choice for NXDs.

63 NXD Features (contd). Queries: XPath is the current NXD query language of choice. In order to function as a database query language, XPath is extended slightly to allow queries across collections of documents. XPath has several shortcomings, including the lack of grouping, sorting, cross-document joins, and support for data types. These issues can be resolved by XSLT and XQuery.

64 NXDs (contd). Native XML databases are databases designed especially to store XML documents. Like other databases, they support features such as transactions, security, multi-user access, and query languages. They are mainly useful for storing document-centric documents. NXDs support XML query languages that execute complex queries which are not possible in SQL. For example, in NXDs data can be retrieved based on structural information, which is not possible in SQL.

65 NXDs (contd). An NXD offers XML-specific capabilities, such as XML query languages, and will be faster at retrieving whole documents. NXDs can store semi-structured data, i.e. documents that do not have DTDs. NXDs can store and understand any XML document without prior configuration. An example is a Web search engine, where no single DTD or set of DTDs applies to all documents.

66 Application Areas of NXD. Any application area that uses XML can use an NXD. In general, NXDs excel at storing document-oriented data (e.g. XHTML or DocBook). If the data is represented as XML and is “kind of fuzzy”, an NXD will probably be a good solution. An NXD might not be the best tool for something like an accounting system, where the data is very well-defined and rigid.

67 Application Areas (contd): corporate information portals, catalog data, manufacturing parts databases, medical information storage, document management systems, B2B transaction logs, personalization databases.

68 XML Programming Interfaces. Programming interfaces give developers a consistent interface for working with XML documents. Four of the most popular and useful ones are: Document Object Model (DOM), Simple API for XML (SAX), JDOM, and Java API for XML Parsing (JAXP).

69 XML Parsers. XML parsers are programs that read XML syntax and extract the required information from it. There are two kinds of XML parsers. Non-validating: e.g. LARK, XP, and HEX. Validating: e.g. IBM's XML Parser for Java (which includes DOM and SAX), Oracle XML parser, XMLbooster, and DXP.

70 Relationship Between XML Documents, Parsers and Applications [Diagram labels: XML document, DTD (optional), XML parser, application]

71 Document object Model DOM was created by the W3C, and is an Official Recommendation of the consortium. It is defined as a set of interfaces to the parsed version of an XML document. DOM provides a rich set of functions that you can use to interpret and manipulate an XML document

72 [Diagram labels: XML document, XML parser (parse), DOM, application (get info)]

73 DOM issues It requires a significant amount of memory. The DOM creates objects that represent everything in the original document, including elements, text, attributes, and white space. A DOM parser causes significant delays for large documents.
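A minimal sketch of DOM-style access, using Python's xml.dom.minidom as a stand-in for a DOM implementation (the sample document is made up). It also makes the memory point concrete: even the whitespace between elements becomes a node in the tree.

```python
from xml.dom import Node, minidom

# Hypothetical document with indentation between the elements.
DOC = "<order>\n  <part>123</part>\n  <part>456</part>\n</order>"

dom = minidom.parseString(DOC)  # the whole document is parsed into memory first
root = dom.documentElement

# Whitespace between elements is kept as text nodes of its own.
whitespace_nodes = [
    node
    for node in root.childNodes
    if node.nodeType == Node.TEXT_NODE and not node.data.strip()
]

# Random access and in-place modification are what the tree is good for.
parts = [el.firstChild.data for el in root.getElementsByTagName("part")]
root.setAttribute("status", "checked")
```

The root element has five child nodes here, although the document contains only two part elements; the other three are whitespace text nodes.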

74 The Simple API for XML (SAX). To get around the DOM issues, the XML-DEV participants (led by David Megginson) created the SAX interface. A SAX parser is event based. It does not build an in-memory tree of the document; it simply delivers events to your application. It starts delivering events as soon as the parse begins, so the application can start generating results right away.
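The event-based style can be sketched with Python's xml.sax module. The handler class PartCollector and the sample document are hypothetical; the callback names startElement, characters, and endElement are those of the standard ContentHandler interface.

```python
import xml.sax


class PartCollector(xml.sax.ContentHandler):
    """Collect the text of every <part> element while events stream in."""

    def __init__(self):
        super().__init__()
        self.events = []      # log of (kind, element name) pairs
        self.parts = []       # text content of each <part>
        self._buffer = None   # a list while inside a <part>, otherwise None

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "part":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "part":
            self.parts.append("".join(self._buffer))
            self._buffer = None


handler = PartCollector()
xml.sax.parseString(b"<order><part>123</part><part>456</part></order>", handler)
```

Because the parser keeps no tree, the handler has to remember by itself that it is currently inside a part element; this is the practical meaning of the remark on slide 76 that SAX events are stateless.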

75 [Diagram labels: XML document, XML parser (parse), event handlers, information, application]

76 SAX issues. SAX events are stateless. SAX events are not permanent. SAX is not controlled by a centrally managed organization.

77 JDOM. Frustrated by the difficulty of doing certain tasks with the DOM and SAX models, Jason Hunter and Brett McLaughlin created the JDOM package. JDOM is a Java-based, open source project that attempts to follow the 80/20 rule. JDOM works with SAX and DOM parsers. The main feature of JDOM is that it greatly reduces the amount of code that has to be written.

78 The Java API for XML Parsing (JAXP) There are still several things that DOM, SAX, and JDOM don’t address. So, Sun has released JAXP, the Java API for XML Parsing. JAXP provides common interfaces for processing XML documents using DOM, SAX, and XSLT.

79 Which interface is right for you? Will your application be written in Java? How will your application be deployed? Once you parse the XML document, will you need to access that data many times? Do you need just a few things from the XML source? Are you working on a machine with very little memory?

80 Applications Of XML There are several applications of XML which are: Wireless Markup Language (WML): It is an XML application which is designed specifically to support wireless communication networks. MathML: It is an XML application which supports mathematical and scientific markups for the use on the web. Scalable Vector Graphics (SVG): It is an application of XML which is used for describing two-dimensional graphics in XML.

81 Applications Of XML (contd.) Resource Description Framework (RDF): a framework for metadata that assures interoperability between applications. Synchronized Multimedia Integration Language (SMIL): SMIL makes it possible to integrate independent multimedia objects into a synchronized multimedia presentation. Web services: the supporting technologies, such as SOAP, UDDI, and WSDL, are all XML based. Other applications include VoiceML, VectorML, and MusicML.

82 Conclusion. Even though current DBMSs support XML, several problems remain to be investigated:
- Development of clustering algorithms for persistent XML documents
- Extension of support for XML query languages in commercial databases
- Development of access control models to provide more secure content-based access to XML documents
- Development of ad hoc indexing structures for more efficient document access
- Flexible extraction and formatting mechanisms for data-centric architectures
- Architectural support for document-centric document management

