Worzyk FH Anhalt Telemedizin WS 09/10 XML - 1 XML Extensible Markup Language
XML
  Metalanguage
    – A language that describes languages
    – These languages describe formats for data exchange
Example
  Hans Meyer, Lohmannstrasse, Köthen
  Dr. Else Müller, Bernburger Strasse, Köthen
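The markup of this example did not survive in the slide text. As a minimal sketch, the two records could be marked up and read back as follows. The element names Person, Name, Strasse and Ort are assumptions made for illustration; only the root name Personen appears in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the two persons of the example; the tag names
# Person, Name, Strasse and Ort are assumed, not taken from the slides.
PERSONEN_XML = """<Personen>
  <Person>
    <Name>Hans Meyer</Name>
    <Strasse>Lohmannstrasse</Strasse>
    <Ort>Köthen</Ort>
  </Person>
  <Person>
    <Name>Dr. Else Müller</Name>
    <Strasse>Bernburger Strasse</Strasse>
    <Ort>Köthen</Ort>
  </Person>
</Personen>"""


def read_persons(xml_text):
    """Return one (name, street, city) tuple per Person element."""
    root = ET.fromstring(xml_text)
    return [
        (p.findtext("Name"), p.findtext("Strasse"), p.findtext("Ort"))
        for p in root.findall("Person")
    ]


persons = read_persons(PERSONEN_XML)
```

Each record becomes one element whose children carry the individual fields, so a program can pick out the city or street without guessing where one value ends and the next begins.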
Structure of XML documents
  Prolog
    – Declaration of the document type
    – DTD (Document Type Definition)
  Elements
Document Type Definition (DTD)
  It describes the grammar of an XML document
  It describes the permitted elements and attributes
    – their data types and ranges of values
    – their nesting
  An XML document that conforms to a DTD is called valid
Example DTD
  <!DOCTYPE Personen [ ]>
  Hans Meyer, Lohmannstrasse, Köthen
Structure of XML documents
  The DTD describes the characteristics of the elements
  Elements begin with a start tag and end with a closing tag
  XML tags are case-sensitive
  Elements can contain other elements
  #PCDATA (parsed character data): the element contains character strings whose characters belong to the defined character set
Names of Elements
  Names can contain letters, digits, and other characters
  Names must not start with a digit or a punctuation character
  Names must not start with the letters xml (or XML, Xml, ...)
  Names cannot contain spaces
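The naming rules can be sketched as a small check. This is a simplified, ASCII-only approximation: the XML specification also permits many non-ASCII letters, and the colon is left out here because it is used for namespaces. The function name is hypothetical.

```python
import re

# Simplified ASCII-only approximation of the naming rules listed above.
# First character: letter or underscore. Further characters: letters,
# digits, underscore, dot, hyphen. No spaces anywhere.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name):
    """Check a candidate element name against the rules listed above."""
    if not _NAME_RE.fullmatch(name):
        return False  # bad first character, a space, or an empty name
    # Names starting with "xml" in any capitalisation are not allowed.
    return not name.lower().startswith("xml")
```

For instance, `is_valid_element_name("Person")` is accepted, while `"2nd"`, `"first name"` and `"XmlData"` are rejected.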
Sequence of Elements
  Child elements are separated by commas in the declaration and enclosed in parentheses.
  Example: <!DOCTYPE Personen [ ]>
Selection list
  Selection of exactly one element: the alternatives are separated by |
  Example: <!DOCTYPE Personen [
Multiple occurrence
  * The element can appear zero or more times
  + The element can appear one or more times
  ? The element can appear zero times or once
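Sequence (comma), selection (|) and the occurrence indicators *, + and ? behave much like regular-expression operators. The following sketch translates a content model into a Python regular expression and tests a list of child element names against it. It only handles element content (no #PCDATA), and the element names and both function names are assumptions chosen for illustration.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD content model such as "(a, b?, (c | d)+)" to a regex."""
    # 1. Every element name becomes a non-capturing group "name;".
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    # 2. A sequence (comma) is plain concatenation in a regex.
    pattern = pattern.replace(",", "")
    # 3. Whitespace is insignificant; |, *, + and ? keep their meaning.
    pattern = re.sub(r"\s+", "", pattern)
    return re.compile(pattern)


def children_match(model, child_names):
    """True if the list of child element names satisfies the content model."""
    joined = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(joined) is not None


# Hypothetical content model: one Name, an optional Strasse, any number
# of Telefon elements, then at least one Email or Fax.
MODEL = "(Name, Strasse?, Telefon*, (Email | Fax)+)"
```

With this model, `children_match(MODEL, ["Name", "Email"])` holds, whereas `["Strasse", "Email"]` fails because the required Name is missing.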
Attributes
  Types of attributes: CDATA, (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION, xml:
  Default value: value, #REQUIRED, #IMPLIED, #FIXED value
Comments
  Comments are enclosed in <!-- and -->
Well-formed XML file
  The file starts with the XML declaration, which identifies it as XML
  There is at least one data element
  There is exactly one root element, which contains all other data elements
  All required attributes are defined
  All elements have the right content
  The elements must be nested properly
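Syntactic well-formedness (a single root element, proper nesting, matching case-sensitive tags) can be tested with any XML parser. A minimal sketch with Python's standard library follows; the function name is hypothetical. Note that ElementTree checks syntax only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# A few documents to try; the element names are arbitrary.
GOOD = "<a><b>text</b></a>"
BAD_NESTING = "<a><b>text</a></b>"   # elements overlap
TWO_ROOTS = "<a/><b/>"               # more than one root element
BAD_CASE = "<A></a>"                 # tags are case-sensitive
```

The error message returned for a rejected document includes the line and column at which the parser gave up.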
Valid XML file
  The file is well-formed
  A DTD is assigned to the file
  The content of the file conforms to the assigned DTD
Parser
  A parser checks whether an XML document is valid:
    // Microsoft XMLDOM through ActiveX (JScript, Internet Explorer only)
    var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
    xmlDoc.async = false;            // boolean, not the string "false": load synchronously
    xmlDoc.validateOnParse = true;   // boolean: validate against the DTD while parsing
    xmlDoc.load("Patienten5.xml");
    document.write(" Error Code: ");
    document.write(xmlDoc.parseError.errorCode);
    document.write(" Error Reason: ");
    document.write(xmlDoc.parseError.reason);
    document.write(" Error Line: ");
    document.write(xmlDoc.parseError.line);
DTD - Disadvantages
  Few data types
  The specification is not written in XML syntax
    – The specification itself cannot be validated with an XML parser
XML Schema
  An XML Schema:
    – defines elements that can appear in a document
    – defines attributes that can appear in a document
    – defines which elements are child elements
    – defines the order of child elements
    – defines the number of child elements
    – defines whether an element is empty or can include text
    – defines data types for elements and attributes
    – defines default and fixed values for elements and attributes
XML Schema Advantages over DTD
  XML Schemas are extensible to future additions
  XML Schemas are richer and more useful than DTDs
  XML Schemas are written in XML
  XML Schemas support data types
    – xs:date, xs:dateTime, xs:string
  XML Schemas support namespaces
    – xmlns:xs="
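Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below parses a small, hypothetical schema for a patient record and lists the declared data types. All element names in the schema are assumptions; the namespace URI is the standard W3C XML Schema namespace.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # W3C XML Schema namespace

# Hypothetical schema; the element names patient, name, birthday and
# admitted are assumed for illustration.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="patient">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="birthday" type="xs:date"/>
        <xs:element name="admitted" type="xs:dateTime"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_types(schema_text):
    """Map each element declaration that has a type to that data type."""
    root = ET.fromstring(schema_text)
    return {
        el.get("name"): el.get("type")
        for el in root.iter("{%s}element" % XS)
        if el.get("type") is not None
    }


types = declared_types(SCHEMA)
```

A DTD could not be processed this way, since its declarations use a separate syntax rather than XML elements.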
Dublin Core
  Standard of the Dublin Core Metadata Initiative
  A conference in Dublin, Ohio, in 1995 defined a set of descriptive attributes to categorize documents on the Internet
  15 core elements are recommended in the Dublin Core Metadata Element Set, Version 1.1 (ISO 15836)
How to create an XML structure
  Create a tree structure of the data
  Convert that structure to a DTD
  Add data elements
  Test
Example Quarterly billing
  One file consists of exactly one physician and at least one patient
  A physician is either a general practitioner or a dentist
  A general practitioner has an address and a profession
  A dentist has an address
  A patient has an address and zero or more diagnoses
  An address consists of name, city, and street
  A name has a salutation, Mr. or Ms.
Example Quarterly billing (tree diagram)
  Billing
    Physician
      General Practitioner | Dentist
        General Practitioner: Address, Profession ?
        Dentist: Address
    Patient +
      Address
      Diagnosis *
  Address
    Name
      Mr | Ms
    City
    Street
Example - DTD
  <!DOCTYPE Billing [ ]>
Example - Data
  Dr. Erpel, Entenhausen, Am Krankenhaus 1, Geriatrics
  Daniel, Entenhausen, Bahnhofstrasse 3a, Bettflucht
  Daisy, Entenhausen, Am Stadtpark, Sonnenbrand, Migräne
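The DTD and the markup of the billing example did not survive in the slide text. The sketch below shows one possible reconstruction from the stated rules and the data. The element names, the optional Profession (marked ? in the tree diagram) and the salutation of Dr. Erpel are assumptions. ElementTree does not validate against a DTD, so the hypothetical function `check_billing` tests the main rules by hand.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction; none of the declarations below is from the slides.
BILLING_XML = """<!DOCTYPE Billing [
  <!ELEMENT Billing (Physician, Patient+)>
  <!ELEMENT Physician (GeneralPractitioner | Dentist)>
  <!ELEMENT GeneralPractitioner (Address, Profession?)>
  <!ELEMENT Dentist (Address)>
  <!ELEMENT Patient (Address, Diagnosis*)>
  <!ELEMENT Address (Name, City, Street)>
  <!ELEMENT Name (Mr | Ms)>
  <!ELEMENT Mr (#PCDATA)>
  <!ELEMENT Ms (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ELEMENT Profession (#PCDATA)>
  <!ELEMENT Diagnosis (#PCDATA)>
]>
<Billing>
  <Physician>
    <GeneralPractitioner>
      <Address><Name><Mr>Dr. Erpel</Mr></Name>
        <City>Entenhausen</City><Street>Am Krankenhaus 1</Street></Address>
      <Profession>Geriatrics</Profession>
    </GeneralPractitioner>
  </Physician>
  <Patient>
    <Address><Name><Mr>Daniel</Mr></Name>
      <City>Entenhausen</City><Street>Bahnhofstrasse 3a</Street></Address>
    <Diagnosis>Bettflucht</Diagnosis>
  </Patient>
  <Patient>
    <Address><Name><Ms>Daisy</Ms></Name>
      <City>Entenhausen</City><Street>Am Stadtpark</Street></Address>
    <Diagnosis>Sonnenbrand</Diagnosis>
    <Diagnosis>Migräne</Diagnosis>
  </Patient>
</Billing>"""


def check_billing(root):
    """Check the top-level billing rules by hand (no DTD validation here)."""
    physicians = root.findall("Physician")
    patients = root.findall("Patient")
    # Exactly one physician and at least one patient.
    ok = root.tag == "Billing" and len(physicians) == 1 and len(patients) >= 1
    for physician in physicians:
        # A physician is either a general practitioner or a dentist.
        kinds = [child.tag for child in physician]
        ok = ok and kinds in (["GeneralPractitioner"], ["Dentist"])
    for patient in patients:
        # A patient has one address followed by zero or more diagnoses.
        tags = [child.tag for child in patient]
        ok = ok and tags[:1] == ["Address"] and all(t == "Diagnosis" for t in tags[1:])
    return ok


billing_root = ET.fromstring(BILLING_XML)
```

A validating parser would perform these checks automatically from the DTD; the hand-written function only illustrates what the declarations express.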
Queries on XML files
  XPath
  XQuery
XPath
  The XPath language is used to address parts of an XML document.
  It was designed for use in both XSLT and XPointer.
  XPath models an XML document as a tree of nodes.
Example
  Everyday Italian, Giada De Laurentiis
  Harry Potter, J K. Rowling
  XQuery Kick Start, James McGovern, Per Bothner, Kurt Cagle, James Linn, Vaidyanathan Nagarajan
  Learning XML, Erik T. Ray
Queries with XPath
  Select all titles:
    /bookstore/book/title
  Select the title of the first book:
    /bookstore/book[1]/title
  Select the text of all prices:
    /bookstore/book/price/text()
  Select the titles of the books with a price greater than 35:
    /bookstore/book[price>35]/title
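These queries can be tried with Python's ElementTree, which supports a subset of XPath. This is a sketch under assumptions: the slide text shows no prices, so the prices below are illustrative values, and the element names follow the paths used in the queries. ElementTree paths are relative to the root element, and its predicates have no numeric comparison, so the last query filters in Python.

```python
import xml.etree.ElementTree as ET

# Bookstore sample; the prices are assumed values.
BOOKSTORE = """<bookstore>
  <book><title>Everyday Italian</title>
    <author>Giada De Laurentiis</author><price>30.00</price></book>
  <book><title>Harry Potter</title>
    <author>J K. Rowling</author><price>29.99</price></book>
  <book><title>XQuery Kick Start</title>
    <author>James McGovern</author><author>Per Bothner</author>
    <author>Kurt Cagle</author><author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author><price>49.99</price></book>
  <book><title>Learning XML</title>
    <author>Erik T. Ray</author><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# /bookstore/book/title, written relative to the root element
all_titles = [t.text for t in root.findall("book/title")]

# /bookstore/book[1]/title (positions are 1-based)
first_title = root.find("book[1]/title").text

# /bookstore/book/price/text()
all_prices = [p.text for p in root.findall("book/price")]

# /bookstore/book[price>35]/title, with the comparison done in Python
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35
]
```

A full XPath processor evaluates all four expressions directly, including the numeric predicate.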
XQuery
  Query language for XML data
  Uses XPath expressions
  Analogous to SQL for relational data
XQuery Example
  TCP/IP Illustrated, Stevens W., Addison-Wesley
  Advanced Programming in the Unix environment, Stevens W., Addison-Wesley
  Data on the Web, Abiteboul Serge, Buneman Peter, Suciu Dan, Morgan Kaufmann Publishers
  The Technology and Content for Digital TV, Gerbarg Darcy, CITI, Kluwer Academic Publishers
XQuery Example
  Query:
    doc("books.xml")/bib/book[price<50]
  Result:
    Data on the Web, Abiteboul Serge, Buneman Peter, Suciu Dan, Morgan Kaufmann Publishers, 39.95
FLWOR
  For, Let, Where, Order by, Return
    for $x in doc("books.xml")/bib/book
    where $x/price>50
    order by $x/title
    return $x/title
  Results:
    Advanced Programming in the Unix environment
    TCP/IP Illustrated
    The Technology and Content for Digital TV
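The parts of a FLWOR expression map closely onto a Python comprehension: for becomes the iteration, where the filter, order by the sort, and return the selected value. The sketch below is not XQuery, only a Python analogy on a stand-in for books.xml. Only the price 39.95 appears in the slides; the other three prices are assumed, chosen to be above 50 so that the results match.

```python
import xml.etree.ElementTree as ET

# Stand-in for books.xml; authors and publishers are left out.
# Only 39.95 is taken from the slides, the other prices are assumed.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Advanced Programming in the Unix environment</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
  <book><title>The Technology and Content for Digital TV</title><price>129.95</price></book>
</bib>"""


def price(book):
    """Numeric value of the book's price element."""
    return float(book.findtext("price"))


# for $x in doc("books.xml")/bib/book
books = ET.fromstring(BIB).findall("book")

# doc("books.xml")/bib/book[price<50]
cheap = [b.findtext("title") for b in books if price(b) < 50]

# where $x/price>50  order by $x/title  return $x/title
flwor = sorted(b.findtext("title") for b in books if price(b) > 50)
```

The sort here is a plain string comparison, which is enough for this small list of titles.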
XML Documents in Databases
  XML documents can be
    – focused on data
    – focused on text
    – semi-structured
Alternatives to store XML Documents
  Storage as a whole
  Storage within the XML structure
  Transformation to database structures
Storage of XML documents as a whole
  The original is stored in a file system or as a CLOB in a database
  Full-text index
  Structure index
Example
  <hotel url= id=h0001 erstellt-am=03/02/2003 Autor=Hans Müller> Hotel Hübner Warnemünde Seestraße 0381 / / Aus Richtung Rostock kommend...
Full-text index
  <hotel url= id=h0001 erstellt-am=03/02/2003 Autor=Hans Müller> Hotel Hübner Warnemünde Seestraße 0381 / / Aus Richtung Rostock kommend...
Full-text and structure index
  <hotel url= id=h0001 erstellt-am=03/02/2003 Autor=Hans Müller> Hotel Hübner Warnemünde Seestraße 0381 / / Aus Richtung Rostock kommend...
Queries
  Full-text index
    hotel AND warnemünde
    (hotel OR pension) AND (rostock OR warnemünde)
  Full-text and structure index
    hotel.adresse.ort CONTAINS (warnemünde) AND hotel.freizeitmoeglichkeit CONTAINS (swimming pool)
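The difference between the two kinds of index can be sketched with two dictionaries: a full-text index maps a word to the documents containing it, while a structure index also records the element path in which the word occurs. This is a minimal sketch, not the index of any particular product. The element names name, strasse and anreise and the second document are assumptions; hotel.adresse.ort is the path used in the query above.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two small documents; the pension is invented for contrast.
DOCS = {
    "h0001": "<hotel><name>Hotel Hübner</name>"
             "<adresse><ort>Warnemünde</ort><strasse>Seestraße</strasse></adresse>"
             "<anreise>Aus Richtung Rostock kommend</anreise></hotel>",
    "p0001": "<pension><name>Pension Seeblick</name>"
             "<adresse><ort>Rostock</ort></adresse></pension>",
}


def walk(element, path=""):
    """Yield (dotted path, text) for every element that carries text."""
    path = element.tag if not path else path + "." + element.tag
    if element.text and element.text.strip():
        yield path, element.text
    for child in element:
        yield from walk(child, path)


fulltext = defaultdict(set)   # word -> document ids
structure = defaultdict(set)  # (path, word) -> document ids
for doc_id, xml_text in DOCS.items():
    for path, text in walk(ET.fromstring(xml_text)):
        for word in re.findall(r"\w+", text.lower()):
            fulltext[word].add(doc_id)
            structure[(path, word)].add(doc_id)

# hotel AND warnemünde
q1 = fulltext["hotel"] & fulltext["warnemünde"]
# (hotel OR pension) AND (rostock OR warnemünde)
q2 = (fulltext["hotel"] | fulltext["pension"]) & (
    fulltext["rostock"] | fulltext["warnemünde"]
)
# hotel.adresse.ort CONTAINS (warnemünde)
q3 = structure[("hotel.adresse.ort", "warnemünde")]
# hotel.adresse.ort CONTAINS (rostock): the word occurs in the hotel
# document, but only in the travel directions, not in the city element
q4 = structure[("hotel.adresse.ort", "rostock")]
```

The last query shows what the structure index adds: the full-text index alone cannot tell a hotel located in Rostock from one that merely mentions Rostock.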
Characteristics
  Full-text index
Generic storage
  Storage within the XML structure
  All information in the XML document is stored
    – Simple generic storage
    – Document Object Model
Example
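One way to picture simple generic storage is a single table that holds every element as a row, with a reference to its parent. The sketch below uses SQLite; the table layout, the function name and the sample document are assumptions, and attributes are left out for brevity.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_generic(conn, element, parent_id=None, position=0):
    """Store one element and, recursively, its children as rows of one table."""
    text = (element.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, name, value) VALUES (?, ?, ?, ?)",
        (parent_id, position, element.tag, text),
    )
    node_id = cur.lastrowid
    for pos, child in enumerate(element):
        store_generic(conn, child, node_id, pos)
    return node_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "position INTEGER, name TEXT, value TEXT)"
)
# Hypothetical sample document.
store_generic(conn, ET.fromstring(
    "<hotel><name>Hotel Hübner</name>"
    "<adresse><ort>Warnemünde</ort></adresse></hotel>"
))

# The document structure is now data: a self-join finds the <ort>
# element that sits below <adresse>.
ort = conn.execute(
    "SELECT c.value FROM node c JOIN node p ON c.parent_id = p.id "
    "WHERE c.name = 'ort' AND p.name = 'adresse'"
).fetchone()[0]
node_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

The same table can hold any XML document without a DTD or schema, at the cost of one join per step along a path.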
Document Object Model
  The tree structure is transformed into a class hierarchy
  Storage in object-relational or object-oriented databases
Queries
  XPath
  XQuery
  XQL
    – Query language of Software AG
  SQL
Characteristics
  Generic storage
Transformation to database structures
  A DTD or schema must be available
  Automatic or user-driven procedures
  Transformation to relational, object-relational, or object-oriented databases
Transformation
Example
Queries
  SQL with
    – Joins
    – Aggregate functions
    – Query optimization
    – Updates
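Once a document has been mapped to tables, ordinary SQL applies. The sketch below shreds a small hotel document into two tables by hand and runs a join with aggregate functions. The room element, its attributes, the second hotel and the table layout are all assumptions made for illustration; an automatic procedure would derive the tables from the DTD or schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; rooms and the second hotel are invented.
HOTELS = """<hotels>
  <hotel id="h0001"><name>Hotel Hübner</name><ort>Warnemünde</ort>
    <room type="single" price="80"/><room type="double" price="120"/></hotel>
  <hotel id="h0002"><name>Pension Seeblick</name><ort>Rostock</ort>
    <room type="double" price="70"/></hotel>
</hotels>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (id TEXT PRIMARY KEY, name TEXT, ort TEXT)")
conn.execute(
    "CREATE TABLE room (hotel_id TEXT REFERENCES hotel(id), type TEXT, price REAL)"
)

# One row per <hotel>, one row per nested <room>.
for h in ET.fromstring(HOTELS).findall("hotel"):
    conn.execute(
        "INSERT INTO hotel VALUES (?, ?, ?)",
        (h.get("id"), h.findtext("name"), h.findtext("ort")),
    )
    for r in h.findall("room"):
        conn.execute(
            "INSERT INTO room VALUES (?, ?, ?)",
            (h.get("id"), r.get("type"), float(r.get("price"))),
        )

# Join plus aggregate functions: number of rooms and lowest price per hotel.
rows = conn.execute(
    "SELECT h.name, COUNT(*), MIN(r.price) FROM hotel h "
    "JOIN room r ON r.hotel_id = h.id GROUP BY h.id ORDER BY h.id"
).fetchall()
```

The nested one-to-many relationship between hotel and room turns into a foreign key, which is what makes the join possible.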
Characteristics
  Database structures