1 Semistructured Data and XML

2 Data Files on the Web
HTML documents: often generated by applications; consumed by humans only; easy access across platforms and across organizations; only layout, no semantic information.
No application interoperability: HTML is not understood by applications.
Database technology: client-server, vendor-specific data files.

3 XML Data Exchange Format
A standard from the W3C (World Wide Web Consortium, http://www.w3.org).
The mission of the W3C: "... developing common protocols that promote its evolution and ensure its interoperability ...".
Basic ideas: XML = data; XML is generated by applications; XML is consumed by applications; easy access across platforms and organizations.

4 Paradigm Shift on the Web
For web search engines: from documents (HTML) to data (XML); from document management to document understanding (e.g., question answering); from information retrieval to data management.
For database systems: from the relational (structured) model to semistructured data; from data processing to data/query translation; from storage to transport.

5 The Semistructured Data Model
[Figure: Object Exchange Model (OEM) graph of a bibliography database. The root Bib (&o1) leads to the objects &o12, &o24, and &o29 via arcs labeled paper and book. Further arc labels: references, author, title, year, http, publisher, page, firstname, lastname, first, last. Atomic values shown include "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122, and 133. Legend: complex object, atomic object, each with an object ID.]

6 The Semistructured Data Model
Data is self-describing: the data description is integrated with the data itself rather than kept in a separate schema.
The database is a collection of nodes and arcs (a directed graph).
Leaf nodes represent attribute data of some atomic type (atomic objects, such as numbers or strings).
Interior nodes represent complex objects, entities, or elements.
Complex objects consist of components (child nodes), connected by arcs to this node.

7 The Semistructured Data Model
Arc labels indicate the relationship between the two corresponding nodes.
The root node is the only interior node without in-arcs; it represents the entire database.
All database objects are descendants of the root node.
The graph need not be a tree, but it is usually acyclic.
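The graph properties listed on the two slides above can be checked mechanically. The following is a minimal sketch, assuming a small made-up graph stored as adjacency lists; the object IDs and arcs are hypothetical and are not taken from the figure.

```python
# Hypothetical OEM-style graph: object ID -> list of (arc label, target ID).
# Complex objects have outgoing arcs; atomic objects only carry a value.
complex_objects = {
    "&root": [("paper", "&paper"), ("book", "&book")],
    "&paper": [("author", "&name"), ("year", "&year")],
    "&book": [("author", "&name")],  # shared child: a graph, not a tree
}
atomic_objects = {"&name": "Abiteboul", "&year": 1997}


def roots(graph):
    """Return the complex objects that have no incoming arcs."""
    targets = {target for arcs in graph.values() for _, target in arcs}
    return [oid for oid in graph if oid not in targets]


def descendants(graph, start):
    """Return every object reachable from start by following arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


root_ids = roots(complex_objects)
reachable = descendants(complex_objects, root_ids[0])
```

Here `root_ids` holds the single node without in-arcs, and `reachable` contains every other object, atomic ones included, which matches the claim that all database objects descend from the root.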

8 XML
[Figure: XML Programmer]

9 XML – The Extensible Markup Language
Language: a way of communicating information. Part of the Semantic Web.
Markup: notes or meta-data that describe your data or language.
Extensible: limitless ability to define new languages or data sets.
Sophisticated query languages for XML are available: XPath, XQuery.

10 XML: An Example
Richard Feynman, The Character of Physical Law, 1980
R.K. Narayan, Waiting for the Mahatma, 1981
R.K. Narayan, The English Teacher, 1980
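The markup of this example did not survive in the transcript, so the following is only a sketch of how such data might be written and read back. The tag names (bibliography, book, author, title, year) are assumptions; only the values come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag names are assumed, values are from the slide.
DOC = """
<bibliography>
  <book>
    <author>Richard Feynman</author>
    <title>The Character of Physical Law</title>
    <year>1980</year>
  </book>
  <book>
    <author>R.K. Narayan</author>
    <title>Waiting for the Mahatma</title>
    <year>1981</year>
  </book>
  <book>
    <author>R.K. Narayan</author>
    <title>The English Teacher</title>
    <year>1980</year>
  </book>
</bibliography>
"""

root = ET.fromstring(DOC)

# Turn each book element into an (author, title, year) tuple.
books = [
    (b.findtext("author"), b.findtext("title"), int(b.findtext("year")))
    for b in root.findall("book")
]

# Because the data describes itself, a consumer can filter by field name.
titles_1980 = [title for _, title, year in books if year == 1980]
```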

11 XML – What's the Point?
You can include your data and a description of what the data represents. This is useful for defining your own language or protocol.
Example: Chemical Markup Language 234.5 …
XML design goals: XML should be compatible with SGML; it should be easy to write XML processors; the design should be formal and precise.

12 XML – Structure
XML looks like HTML.
XML is a hierarchy of user-defined tags called elements, with attributes and data.
Data is described by elements; elements are described by attributes.
[Figure: an annotated element labeling the open tag, element name, attribute, attribute value, data, and closing tag.]

13 XML – Elements
XML is case and space sensitive.
Element opening and closing tag names must be identical.
Opening tags: "<" + element name + ">"
Closing tags: "</" + element name + ">"

14 XML – Attributes
Attributes provide additional information for element tags.
There can be zero or more attributes in every element; each one has the form attribute_name='attribute_value'.
By convention there is no space between the name and the =' (XML itself tolerates whitespace around the equals sign).
Attribute values must be surrounded by " or ' characters.
Multiple attributes are separated by white space (one or more spaces or tabs).
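A short sketch of these attribute rules, using Python's standard xml.etree.ElementTree parser. The element name, attribute names, and values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with two attributes, one per quote style,
# separated by several spaces.
element = ET.fromstring("<book id='b1'   lang=\"en\">Some data</book>")
attrs = dict(element.attrib)

# An unquoted attribute value is not well-formed, so the parser rejects it.
try:
    ET.fromstring("<book id=b1>Some data</book>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
```

Both quote styles yield the same attribute dictionary, while the unquoted variant fails to parse.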

15 XML – Data and Comments
XML data is any information between an opening and closing tag.
XML data must not contain the '<' or '>' characters.
Comments: <!-- comment -->
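The rules for tags and data can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper is_well_formed and the note element are hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Tag names are case sensitive, so <note> cannot be closed by </Note>.
case_mismatch = is_well_formed("<note>hello</Note>")

# A raw '<' inside the data is rejected ...
raw_less_than = is_well_formed("<note>1 < 2</note>")

# ... but the entity reference &lt; stands in for it.
escaped = ET.fromstring("<note>1 &lt; 2</note>").text

# Comments are skipped by the default parser and do not become data.
with_comment = ET.fromstring("<note><!-- ignored -->hello</note>")
```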

16 XML – Nesting & Hierarchy
XML tags can be nested in a hierarchy (think tree).
XML documents can have only one root tag.
Between an opening and closing tag you can insert:
1. Data
2. More elements
3. A combination of data and elements
Some Text More
XML Examples and Exercises

17 Graphical Data Model for XML
Some Text More
Node Type: Element_Node, Name: Element, Value: Root
Node Type: Element_Node, Name: Element, Value: tag1
Node Type: Text_Node, Name: Text, Value: More
Node Type: Element_Node, Name: Element, Value: tag2
Node Type: Text_Node, Name: Text, Value: Some Text
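A tree of element and text nodes like the one above can be produced by walking a parsed document. This is a sketch only: the exact nesting of the slide's example is not recoverable from the transcript, so the document string below is a guess that uses the same node names and values.

```python
import xml.etree.ElementTree as ET

# Assumed nesting; the original markup was lost in the transcript.
DOC = "<Root><tag1>Some Text<tag2>More</tag2></tag1></Root>"


def walk(element):
    """Yield (node type, value) pairs in document order."""
    yield ("Element_Node", element.tag)
    if element.text and element.text.strip():
        yield ("Text_Node", element.text.strip())
    for child in element:
        yield from walk(child)
        # Text that follows a child element is stored in its 'tail'.
        if child.tail and child.tail.strip():
            yield ("Text_Node", child.tail.strip())


nodes = list(walk(ET.fromstring(DOC)))
```

ElementTree keeps text in the text and tail fields of elements rather than in separate node objects, so the walker turns them into explicit text nodes.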

18 XML vs. Semistructured Data
Both are described best by a graph.
Both are schema-less and self-describing (XML without a DTD or XML Schema).
XML is ordered; semistructured data is not.
XML can mix text and elements: Making Java easier to type and easier to type Phil Wadler
XML has lots of other stuff: attributes, entities, processing instructions, comments.

19 XML vs. Relational Databases
                Relational                 XML
Structure       Tables                     Hierarchical graph, tree
Schema          Fixed before adding data   Flexible, self-describing
Queries         Simple, nice               Less so
Ordering        None                       Implied
Implementation  Native                     Add-on
Source: Jennifer Widom

20 DTD – Document Type Definition
A DTD is a schema for XML data.
XML protocols and languages can be standardized with DTD files.
A DTD says which elements and attributes are required or optional, and defines the formal structure of the language.
More advanced version: XML Schema (not on exam).

21 XML Issues
Database issues:
How are we going to model XML? (graphs)
How are we going to query XML? (XPath, XQuery)
How are we going to store XML? (in a relational database? object-oriented? native?)
How are we going to process XML efficiently? (many interesting research questions!)

22 XML Schema
The successor of DTDs for specifying a schema for XML documents. A W3C standard.
Includes and extends the functionality of DTDs. In particular, XML Schemas support data types, which makes it easier to validate the correctness of data and to work with data from a database.
XML Schemas are written in XML: you don't have to learn a new language, and you can use your XML parser to parse your schema files.
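Because a schema is itself an XML document, an ordinary parser can read it. The sketch below assumes a small hypothetical schema for a book element (the names book, title, author, year, and bookID are illustrative) and lists its declarations with Python's standard xml.etree.ElementTree. It only parses the schema; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema vocabulary, in ElementTree's {uri}name form.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, written for this sketch.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="year" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="bookID" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)

# Element declarations and their data types (None for the complex element).
declared = {e.get("name"): e.get("type") for e in schema.iter(XS + "element")}

# Attributes that are marked as mandatory.
required_attrs = [
    a.get("name")
    for a in schema.iter(XS + "attribute")
    if a.get("use") == "required"
]
```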

23 Example XML Schema …

24 Simple Elements
Simple elements contain only text.
They can have one of the built-in datatypes: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time.
Example

25 Simple Elements
Restrictions allow you to further constrain the content of simple elements.

26 Attributes
Attributes can be specified using the xs:attribute element.
Attribute elements are nested within the xs:complexType element of the element with which they are associated.
By default, attributes are optional. To make an attribute mandatory, use use="required".
Attributes can have the same built-in datatypes as simple elements.

27 Complex Elements
Complex elements can contain other elements and can have attributes.
Nested elements need to occur in the order specified.
The number of repetitions of an element is controlled by the attributes minOccurs and maxOccurs. The default is one repetition.
A complex element with an attribute:

28 Complex Elements
A complex element containing a sequence of nested (simple) elements:
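The slide's own example was lost in the transcript. As a stand-in, here is a sketch of a hypothetical xs:sequence together with a deliberately simplified checker for the ordering and repetition rules (minOccurs, maxOccurs, default one). The helper names are made up; the checker compares child names only and is not a schema validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical sequence of nested simple elements.
SEQUENCE = ET.fromstring("""
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
  <xs:element name="year" type="xs:integer" minOccurs="0"/>
</xs:sequence>
""")


def occurrence_bounds(decl):
    """Return (min, max) repetitions; both default to 1, None = unbounded."""
    lowest = int(decl.get("minOccurs", "1"))
    raw_max = decl.get("maxOccurs", "1")
    highest = None if raw_max == "unbounded" else int(raw_max)
    return lowest, highest


def matches_sequence(parent, sequence):
    """Check order and repetition counts of parent's children (names only)."""
    children = [child.tag for child in parent]
    pos = 0
    for decl in sequence.findall(XS + "element"):
        lowest, highest = occurrence_bounds(decl)
        count = 0
        while (pos < len(children) and children[pos] == decl.get("name")
               and (highest is None or count < highest)):
            pos += 1
            count += 1
        if count < lowest:
            return False
    return pos == len(children)  # no unexpected trailing children


ok = matches_sequence(
    ET.fromstring("<book><title>T</title><author>A</author><author>B</author></book>"),
    SEQUENCE)
wrong_order = matches_sequence(
    ET.fromstring("<book><author>A</author><title>T</title></book>"), SEQUENCE)
two_titles = matches_sequence(
    ET.fromstring("<book><title>T</title><title>U</title><author>A</author></book>"),
    SEQUENCE)
```

The first document passes (year is optional, author may repeat); the other two fail because of element order and because title may occur only once.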

29 Complex Elements
If you name the complex element, other elements can reference and include it:

30 Example XML Schema …

31 XML-Path = XPath, XML-Query = XQuery

32 Query Languages for XML
XPath is a simple query language based on describing similar paths in XML documents.
XQuery extends XPath in a style similar to SQL, introducing iterations, subqueries, etc. (not on exam)
XPath and XQuery expressions are applied to an XML document and return a sequence of qualifying items.
Items can be primitive values or nodes (elements, attributes, documents).

33 XPath
A path expression returns the sequence of all qualifying items that are reachable from the input item following the specified path.
A path expression is a sequence consisting of tags or attributes and special characters such as slashes ("/").
Absolute path expressions are applied to some XML document and return all elements that are reachable from the document's root element following the specified path.
Relative path expressions are applied to an arbitrary node.

34 Path Expressions
Examples:
DB = [Figure: the OEM bibliography graph rooted at Bib (&o1), with arc labels paper, book, references, author, title, year, http, publisher, page, firstname, lastname, first, last, and additional object IDs &o43 to &o52, &o70, &o71.]
Bib/paper = {&o12, &o29}
Bib/book/publisher = {&o51}
Bib/paper/author/lastname = {&o71, &206}
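Evaluating such a path over a labeled graph amounts to following arcs label by label. The sketch below assumes a fragment of the graph stored as adjacency lists. The arcs out of &o1 and &o24 follow from the results listed above; the intermediate author nodes &a1 and &a2 are invented, since the figure's full structure is not recoverable from the transcript.

```python
# Graph fragment: object ID -> list of (arc label, target ID).
GRAPH = {
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o24": [("publisher", "&o51")],
    "&o12": [("author", "&a1")],   # &a1 is a hypothetical object ID
    "&o29": [("author", "&a2")],   # &a2 is a hypothetical object ID
    "&a1": [("lastname", "&o71")],
    "&a2": [("lastname", "&206")],
}


def evaluate(graph, root_label, root_id, path):
    """Return the set of objects reached from the root along path."""
    labels = path.split("/")
    if labels[0] != root_label:
        return set()
    current = {root_id}
    for label in labels[1:]:
        # Follow every arc with the requested label from every current node.
        current = {
            target
            for node in current
            for arc_label, target in graph.get(node, [])
            if arc_label == label
        }
    return current


papers = evaluate(GRAPH, "Bib", "&o1", "Bib/paper")
publishers = evaluate(GRAPH, "Bib", "&o1", "Bib/book/publisher")
lastnames = evaluate(GRAPH, "Bib", "&o1", "Bib/paper/author/lastname")
```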

35 XPath Example Document
Foundations… Abiteboul Hull Vianu Addison Wesley 1995
XML and Databases Ullmann
XML Query Optimization Ng

36 Path Expressions
Examples:
bib/paper returns: XML and Databases Ullmann; XML Query Optimization Ng
bib/book/publisher returns: Addison Wesley
Given an XML document, the value of a path expression p is a set of objects (elements or attribute values).
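These two paths can be tried with Python's standard xml.etree.ElementTree, which supports a limited subset of XPath. This is a sketch under assumptions: the document markup is reconstructed from the element names used in the path expressions (the year tag is a guess), and because ElementTree evaluates paths relative to an element, bib/paper becomes "paper" relative to the bib root.

```python
import xml.etree.ElementTree as ET

# Reconstructed example document; tag layout is assumed.
DOC = """
<bib>
  <book bookID="b100">
    <title>Foundations…</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>
"""

bib = ET.fromstring(DOC)

# bib/paper: all paper children of the root element.
papers = bib.findall("paper")
paper_titles = [p.findtext("title") for p in papers]

# bib/book/publisher: publisher children of book children.
publishers = [e.text for e in bib.findall("book/publisher")]
```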

37 Attributes
If we want to return not the qualifying elements but the value of one of their attributes, we end the path expression with @attribute.
Foundations… Abiteboul Hull Vianu Addison Wesley 1995
The XPath expression /bib/book/@bookID returns the sequence "b100" ...
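Python's standard xml.etree.ElementTree does not accept a path that ends in @attribute, so the sketch below emulates /bib/book/@bookID by selecting the book elements and reading the attribute from each. The document markup is an assumed, shortened reconstruction of the example.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened reconstruction of the example document.
DOC = """
<bib>
  <book bookID="b100"><title>Foundations…</title></book>
  <paper><title>XML and Databases</title></paper>
</bib>
"""

bib = ET.fromstring(DOC)

# Emulates /bib/book/@bookID: select the elements, then read the attribute.
book_ids = [book.get("bookID") for book in bib.findall("book")]

# ElementTree does support attributes inside predicates:
# this selects the book elements that carry a bookID attribute.
books_with_id = bib.findall("book[@bookID]")
```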

38 Wildcards
We can use wildcards instead of actual tags and attributes: * means any tag, and @* means any attribute. // looks for any descendants.
Examples:
/bib/*/author returns the sequence Abiteboul, Hull, Vianu, Ullmann, Ng
/bib//author returns the same in this case
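Both wildcard forms are part of the XPath subset in Python's standard xml.etree.ElementTree. In this sketch the paths are written relative to the bib root, and the document markup is an assumed reconstruction of the example.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the example document.
DOC = """
<bib>
  <book bookID="b100">
    <title>Foundations…</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>
"""

bib = ET.fromstring(DOC)

# /bib/*/author: author children of any child of the root.
via_star = [a.text for a in bib.findall("*/author")]

# /bib//author: author elements at any depth below the root.
via_descendants = [a.text for a in bib.findall(".//author")]
```

The two results coincide here because every author sits exactly two levels below the root; with deeper nesting only the descendant form would find all of them.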

39 Conditions and Other Constructs
/bib/paper[2]/author[1]: choose the second paper, first author: Ng
/bib/paper[author = "Ng"]: find all papers such that there exists an author Ng: XML Query Optimization Ng
/bib/(paper|book)/title: find the titles of each element that is a paper or a book: Foundations…, XML and Databases, XML Query Optimization
XPath examples
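Position and child-value predicates are available in Python's standard xml.etree.ElementTree; the union (paper|book) is not, so the sketch below emulates it with a plain filter over the root's children. The document markup is an assumed reconstruction of the example, and paths are relative to the bib root.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the example document.
DOC = """
<bib>
  <book bookID="b100">
    <title>Foundations…</title>
    <author>Abiteboul</author>
  </book>
  <paper><title>XML and Databases</title><author>Ullmann</author></paper>
  <paper><title>XML Query Optimization</title><author>Ng</author></paper>
</bib>
"""

bib = ET.fromstring(DOC)

# /bib/paper[2]/author[1]: positions are 1-based.
second_paper_first_author = [
    a.text for a in bib.findall("paper[2]/author[1]")
]

# /bib/paper[author = "Ng"]: papers with an author child whose text is Ng.
papers_by_ng = [p.findtext("title") for p in bib.findall("paper[author='Ng']")]

# /bib/(paper|book)/title: emulated, since ElementTree has no union operator.
titles = [
    child.findtext("title")
    for child in bib
    if child.tag in ("paper", "book")
]
```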

40 Exercise
Evaluate:
1. /bib/*/title
2. /bib//title
3. /bib/*[publisher = "McGraw"]
Foundations… Abiteboul Hull Vianu Addison Wesley 1995
XML and Databases Ullmann
XML Query Optimization Ng
