Introduction to Databases: Relational and XML Models and Languages
Instructors: Bertram Ludaescher, Kai Lin

1 Introduction to Databases: Relational and XML Models and Languages
Instructors: Bertram Ludaescher, Kai Lin

2 Introduction to Databases, B. Ludaescher & K. Lin
Overview (Part 2)
09:15-10:20 Relational Databases (1h05’)
10:20-10:30 BREAK (10’)
10:30-11:50 Relational Databases (1h20’)
11:50-13:15 LUNCH (1h25’)
13:15-13:45 Demo & Hands-on (30’)
13:45-15:10 XML: Basics (1h25’)
15:10-15:30 BREAK (20’)
15:30-16:30 XML: Querying (1h)
16:30-17:00 Demo & Hands-on (30’)

3 XML and Related Standards
An introduction to XML, DTDs, XML Schema, and the DOM
Includes material by Shawn Bowers (SDSC) and Michael Gertz (UC Davis)

5 A Neuroscientist’s Information Integration Problem
Question: What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
Sources to integrate: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE)
“Complex Multiple-Worlds” Mediation
Biomedical Informatics Research Network, http://nbirn.net

6 A Home Buyer’s Information Integration Problem
Question: What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population?
Sources to integrate: Realtor, Demographics, School Rankings, Crime Stats
“Multiple-Worlds” Mediation

7 An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
Information integration (e.g., addall.com) over amazon.com, A1books.com, half.com, barnes&noble.com
“One-World” Mediation: Mediator (virtual DB) vs. Data warehouse

8 Information Integration Challenges
System aspects: “Grid” Middleware
– distributed data & computing
– Web Services, WSDL/SOAP, OGSA, …
– sources = functions, files, data sets …
Syntax & Structure: (XML-Based) Data Mediators
– wrapping, restructuring
– (XML) queries and views
– sources = (XML) databases
Semantics: Model-Based/Semantic Mediators
– conceptual models and declarative views
– Knowledge Representation: ontologies, description logics (RDF(S), OWL, ...)
– sources = knowledge bases (DB+CMs+ICs)
The four S’s: System integration, Syntax, Structure, Semantics
→ reconciling S4 heterogeneities
→ “gluing” together resources
→ bridging information and knowledge gaps computationally

9 Information Integration Challenges: S4 Heterogeneities
System Integration
– platforms, devices, data & service distribution, APIs, protocols, …
→ Grid middleware technologies
+ e.g. single sign-on, platform independence, transparent use of remote resources, …
Syntax & Structure
– heterogeneous data formats (one for each tool...)
– heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …)
– heterogeneous schemas (one for each DB...)
→ Database mediation technologies
+ XML-based data exchange, integrated views, transparent query rewriting, …
Semantics
– fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …
→ Knowledge representation & semantic mediation technologies
+ “smart” data discovery & integration
+ e.g. ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyway!

10 Structural / XML-Based Mediation

11 Information Integration from a DB Perspective
Information Integration Problem
– Given: data sources S1, ..., Sk (DBMS, web sites, ...) and user questions Q1, ..., Qn that can be answered using the Si
– Find: the answers to Q1, ..., Qn
The Database Perspective: source = “database”
→ Si has a schema (relational, XML, OO, ...)
→ Si can be queried
→ define a virtual (or materialized) integrated/global view G over S1, ..., Sk using database query languages (SQL, XQuery, ...)
→ questions become queries Qi against G(S1, ..., Sk)

12 Standard (XML-Based) Mediator Architecture
MEDIATOR: integrated global (XML) view G, with integrated view definition G(..) ← S1(..) … Sk(..)
Sources S1, S2, …, Sk, each behind a wrapper exposing an (XML) view; web services serve as wrapper APIs
1. USER/Client poses query Q(G(S1, ..., Sk)) to the mediator
3. Mediator sends subqueries Q1, Q2, Q3 to the wrappers
4. Wrappers return {answers(Q1)}, {answers(Q2)}, {answers(Q3)}
6. Mediator returns {answers(Q)} to the client
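To make the mediator flow concrete, here is a minimal sketch in which the integrated view G is a virtual, GAV-style union of wrapper views and the user query Q (the El Cheapo question from slide 7) is evaluated against G. The wrapper functions, the source names S1/S2, and their sample rows are hypothetical; real wrappers would call remote sources or web services.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical wrappers (sources and sample rows invented for illustration):
# each maps its source into the global schema and answers a simple subquery,
# here a title keyword.
def wrapper_s1(title: str) -> List[Row]:
    rows = [{"title": "Tractatus Logico-Philosophicus", "price": 12.0, "shipping": 4.0}]
    return [dict(r, source="S1") for r in rows if title in r["title"]]

def wrapper_s2(title: str) -> List[Row]:
    rows = [{"title": "Tractatus Logico-Philosophicus", "price": 9.5, "shipping": 7.0}]
    return [dict(r, source="S2") for r in rows if title in r["title"]]

WRAPPERS: List[Callable[[str], List[Row]]] = [wrapper_s1, wrapper_s2]

def global_view(title: str) -> List[Row]:
    """Virtual integrated view G, defined (GAV-style) as the union of the wrapper views."""
    answers: List[Row] = []
    for wrapper in WRAPPERS:
        answers.extend(wrapper(title))  # send subquery Qi, collect {answers(Qi)}
    return answers

def cheapest(title: str) -> Row:
    """User query Q against G: the cheapest offer, including shipping."""
    return min(global_view(title), key=lambda r: r["price"] + r["shipping"])

print(cheapest("Tractatus")["source"])  # S1, since 12.0 + 4.0 < 9.5 + 7.0
```

Nothing is materialized here: every call to `global_view` re-queries the wrappers, which is the "virtual DB" side of the mediator vs. data warehouse contrast.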

13 Query Planning for Mediators
Given:
– User query Q: answer(…) ← … G ...
– … & { G ← … S … } global-as-view (GAV)
– … & { S ← … G … } local-as-view (LAV)
– … & { ic(…) ← … S … G … } integrity constraints (ICs)
Find:
– an equivalent (or minimally containing, maximally contained) query plan Q’: answer(…) ← … S …
Results:
– a variety of results/algorithms, depending on the classes of queries, views, and ICs: P, NP, …, undecidable
– many variants still open

14 Background
Markup
– Annotations (tags) for carrying information about a document’s content, e.g. a writer’s handwritten notes for typesetting, an editor’s corrections in a manuscript
– A Markup Language defines a syntax and grammar for tags

15 Background (cont’d)
SGML
– Standard Generalized Markup Language
– Standardized in 1986 (ISO)
– A language for defining markup languages
– And for marking up content
– Syntax + Document Type Definition (DTD)
– Tools aimed at document management

16 Background (cont’d)
HTML
– A markup language
– A particular SGML document type (called an “application”)
– Tools for browsing and authoring

17 Background (cont’d)
Limitations
– SGML: complex, many options and shortcuts; must know the DTD to parse correctly; cost of SGML technology is high
– HTML: not extensible (can’t define new tags); tags for presenting data, not describing it; doesn’t capture much document structure or content meaning

18 Enter XML
XML (Extensible Markup Language)
– Standardized by W3C in 1998
– For data interchange over the Web
– A simpler SGML: actually, a subset of SGML; DTDs are optional; fewer features and options
– Widely available tools for parsing, authoring, browsing, etc.

19 Uses for XML
Why XML?
– Capture the logical structure of documents: presentation independent
– Data interchange: XML is implementation independent
– Storage format. Maier’s Maxim: any successful interchange format becomes a storage format
– Metadata: searching, filtering, organizing
– Data packaging, movement, and processing: client-side processing, server-to-server communication, non-browser-based clients, simplified server processing, etc.

21 (Some of) The Many Standards of XML
– Documents: XML Document, XML DTD
– Query: XQuery, XQL, XML-QL
– Programming: Document Object Model (DOM), an API to XML documents
– Transformation: XSLT for rearranging and restructuring XML documents
– Transport: XML-RPC, SOAP, XML-Protocol for message and object serialization and remote procedure calls
– Metadata: RDF, using XML to define resource metadata
– Schema and Types: XML Schema and XML data types
– Linking: XLink for simple and complex hyperlinks between XML documents
– Addressing: XPath and XPointer for addressing XML subdocuments

22 The Running Example
Lego Product Catalogs
– catalogs have a publishing date, an identifier, a title, etc.
– catalogs are made up of products, either a kit or an accessory
  each has an item #, price, name, picture, etc.
  kits can have an age level, # of pieces, set type (duplo, basic), a theme (star wars), a system (space)

23 An Example XML Catalog Document
Text content of the example: 2000, X-Wing Fighter, 7, 12, 263, Star Wars, “Take to the skies with Luke as he battles the forces of evil!” …

24 An Example XML Document
– the document consists of a prolog and a body
– elements have start- and end-tags
– elements can also contain content
– elements are nested: “boxes within boxes”
Text content of the example: 2000, X-Wing Fighter, 7, 12, 263, Star Wars, “Take to the skies with Luke as he battles the forces of evil!” …

25 Well-Formed Documents
Well-formed XML documents:
– A single root element
– Start and end tags required (unlike HTML)
– Empty-element tags may be used as shorthand for an element with no content
– Elements must be properly nested
– More rules: naming elements, document has at least one element, etc.
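A quick way to see these rules in action is to hand candidate documents to a non-validating XML parser, which rejects anything that is not well-formed. This sketch uses Python's standard `xml.etree.ElementTree`; the sample documents are made up, with element names borrowed from the running Lego example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Single root, every start tag closed, <image/> is an empty-element tag.
print(is_well_formed("<kit><name>X-Wing Fighter</name><image/></kit>"))  # True
# Improper nesting: </name> appears before </b>.
print(is_well_formed("<kit><name><b>X-Wing</name></b></kit>"))           # False
# Two root elements.
print(is_well_formed("<kit/><kit/>"))                                     # False
```

Well-formedness needs no DTD or schema: the parser checks only the syntax rules listed above.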

26 XML Attributes
Elements can contain attributes: a start tag consists of the element name followed by attribute name = attribute value pairs.
Attributes are always assigned in element start tags (or empty-element tags), their values are always enclosed in matching double or single quotes, and attribute names must be unique within an element.
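The following sketch, again using Python's standard `xml.etree.ElementTree`, shows attributes arriving as a name-to-value mapping, with both quote styles accepted, and a repeated attribute name being rejected. The attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are both legal around attribute values.
kit = ET.fromstring('<kit price="19.99" avail=\'yes\'><name>X-Wing Fighter</name></kit>')
print(kit.tag, kit.attrib)  # kit {'price': '19.99', 'avail': 'yes'}

# A repeated attribute name makes the document not well-formed.
try:
    ET.fromstring('<kit price="1" price="2"/>')
except ET.ParseError as err:
    print("rejected:", err)
```

Note that every attribute value comes back as a string. Typing such as "price is a decimal" has to come from a schema or from the application.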

27 Attributes vs. Content
In general, it is up to the document designer.
In SGML, content was usually for data you see and attributes for metadata.
… how I do it:
– Attribute: “atomic” content, applying to the whole element
– Content (subelement): otherwise

28 Document Type Definition
Why DTDs?
– To standardize tags and structure for interchange and creation
– To make documents machine-processable
What is a DTD?
– A grammar for describing XML documents (tags, attributes, nesting, etc.)
– An XML document that is well-formed and conforms to a DTD is said to be valid

29 An Example DTD: Elements
<!ELEMENT kit (name, ages, pieces, theme?, series?, desc)>
Other declarations: an element content model for LegoCatalog; a character data content model (#PCDATA) for pubDate
Content model operators:
* zero or more
+ one or more
? optional
| choice
, strict sequence
() grouping
Also: EMPTY, ANY, and mixed content models
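An element content model is essentially a regular expression over the sequence of child element names. The sketch below hand-translates the `kit` model into a Python regular expression and checks an element's children against it. It only illustrates the idea. It is not a DTD validator: it ignores text, attributes, and the other declarations, and `matches_model` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of <!ELEMENT kit (name, ages, pieces, theme?, series?, desc)>:
# "," becomes concatenation and "?" becomes an optional group.
KIT_MODEL = re.compile(r"<name><ages><pieces>(<theme>)?(<series>)?<desc>")

def matches_model(elem: ET.Element, model: re.Pattern) -> bool:
    """Encode the children as '<tag>' tokens and require a full match."""
    children = "".join(f"<{child.tag}>" for child in elem)
    return model.fullmatch(children) is not None

ok = ET.fromstring("<kit><name/><ages/><pieces/><theme/><desc/></kit>")
bad = ET.fromstring("<kit><name/><pieces/><ages/><desc/></kit>")  # wrong order
print(matches_model(ok, KIT_MODEL), matches_model(bad, KIT_MODEL))  # True False
```

The other operators map the same way: `*` and `+` become the regex quantifiers of the same name, and `|` becomes alternation.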

30 An Example DTD: Attributes
<!ATTLIST kit
  price      CDATA    #REQUIRED
  shipWeight CDATA    #REQUIRED
  avail      (yes|no) #IMPLIED
  image      CDATA    "na.jpg"
  unitId     ID       #IMPLIED >
<!ATTLIST accessory
  forKits     IDREFS #IMPLIED
  orderStatus CDATA  #FIXED "special" >
Each attribute has the form: attr-name type default-decl
Types:
– CDATA = character data
– ID = unique identifier
– IDREF = reference to an ID
– IDREFS = list of references
– enumeration = list of possible values
Default declarations:
– #REQUIRED = must appear
– #IMPLIED = may optionally appear
– #FIXED + default = the attribute always has the given value: if it is missing, the parser assumes that value; if it is present, it must equal that value
– default only = if the attribute is missing, the default is assumed; otherwise any (type-correct) value is allowed
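The default declarations can be read as a small checking-and-filling procedure. Below is a minimal sketch of that procedure for the two ATTLISTs above. The tuple encoding of the declarations and the `apply_attlist` function are hypothetical. ID uniqueness, IDREF resolution, and undeclared attributes are deliberately not checked.

```python
from typing import Dict, Optional, Tuple

# (allowed values or None, default declaration, default value); "" marks a plain default.
Decl = Tuple[Optional[Tuple[str, ...]], str, Optional[str]]

KIT_ATTRS: Dict[str, Decl] = {
    "price": (None, "#REQUIRED", None),
    "shipWeight": (None, "#REQUIRED", None),
    "avail": (("yes", "no"), "#IMPLIED", None),
    "image": (None, "", "na.jpg"),
    "unitId": (None, "#IMPLIED", None),
}
ACCESSORY_ATTRS: Dict[str, Decl] = {
    "forKits": (None, "#IMPLIED", None),
    "orderStatus": (None, "#FIXED", "special"),
}

def apply_attlist(attrs: Dict[str, str], decls: Dict[str, Decl]) -> Dict[str, str]:
    """Check an element's attributes against its ATTLIST and fill in defaults."""
    result = dict(attrs)
    for name, (allowed, decl, default) in decls.items():
        value = result.get(name)
        if value is None:
            if decl == "#REQUIRED":
                raise ValueError(f"missing required attribute {name!r}")
            if default is not None:  # plain default or #FIXED value is supplied
                result[name] = default
            continue
        if decl == "#FIXED" and value != default:
            raise ValueError(f"{name!r} must be {default!r}")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name!r} must be one of {allowed}")
    return result

print(apply_attlist({"price": "19.99", "shipWeight": "0.5"}, KIT_ATTRS))
# {'price': '19.99', 'shipWeight': '0.5', 'image': 'na.jpg'}
print(apply_attlist({}, ACCESSORY_ATTRS))
# {'orderStatus': 'special'}
```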

31 Limitations of DTDs
DTDs are not optimal
– Not written in XML syntax: can’t parse them with an XML parser; need different tools to create them (but at least you can sort of read/understand them; try XML Schema ;-)
– Limited support for defining data types
– Limited modeling capabilities: hard to express some structures; little support for reusing structure (only via parameter entities)

32 Enter XML Schema
XML Schema
– W3C Recommendation (2001)
– Divided into 2 parts: structures, datatypes
– Main features:
  schemas are themselves well-formed XML documents
  a schema can span multiple documents
  can define new data types and constraints
  inheritance among content model types
– Improves data interchange: offers more precision for computer-to-computer transfer

33 Example XML Schema
<element name="accessory" type="Product" minOccurs="0" maxOccurs="unbounded"/> …
Many ways to describe new data types (not just regular expressions)
ComplexType = Content Model

34 XML Schema: User-Defined Type/Class Hierarchy
From: Time to Leave the Trees: From Syntactic to Conceptual Querying of XML, B. Ludäscher, I. Altintas, A. Gupta, Intl. Workshop on XML Data Management (XMLDM), Prague, Czech Republic, March 2002, LNCS 2490, Springer

35 XML Schema Declarations (“home-style” syntax)
Complex Type Declarations

36 XML Schema (“home-style”) Complex Types
Simple Type Declarations

37 Programming with XML
The DOM (Document Object Model)
– Maintained by the W3C
– Language and platform independent
– An object model for XML (actually, an API): core, views, events, style, persistence, etc.
Processing flow: XML → Parser, which generates DOM objects → Application, which accesses, creates & manipulates them → output

38 DOM Example
Node structure: Document node → NodeList → Element node (document root) → NodeList → Element node (kit) → NamedNodeMap → Attr node (pieces="263"); kit → NodeList → Text (character data) node “Take to the skies...”
Navigation:
d.load(…)
ln = d.documentElement
lnl = ln.childNodes
kn = lnl.item(0)
knm = kn.attributes
ka = knm.item(0)
knl = kn.childNodes
kt = knl.item(0)
NOTE: I left off the desc element and just placed its content under kit.
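The same navigation can be tried with Python's standard `xml.dom.minidom`, which implements the DOM interfaces named on the slide (`documentElement`, `childNodes`, `attributes`, `item(i)`). In this sketch `parseString` stands in for the slide's `d.load(…)`, the one-kit document is hypothetical, and the text node gets its own name, `kt`, rather than overwriting `knl`.

```python
from xml.dom import minidom

# Trimmed catalog; as on the slide, desc is left out and its text sits under kit.
# No whitespace between tags, so childNodes holds only the nodes we expect.
SOURCE = '<LegoCatalog><kit pieces="263">Take to the skies...</kit></LegoCatalog>'

d = minidom.parseString(SOURCE)  # Document node
ln = d.documentElement           # Element node: LegoCatalog
lnl = ln.childNodes              # NodeList of LegoCatalog's children
kn = lnl.item(0)                 # Element node: kit
knm = kn.attributes              # NamedNodeMap of kit's attributes
ka = knm.item(0)                 # Attr node: pieces="263"
knl = kn.childNodes              # NodeList of kit's children
kt = knl.item(0)                 # Text node

print(ln.tagName, kn.tagName)  # LegoCatalog kit
print(ka.name, ka.value)       # pieces 263
print(kt.data)                 # Take to the skies...
```

With pretty-printed input, whitespace-only text nodes would appear in `childNodes`, so `item(0)` would no longer be the `kit` element.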

39 XML Query Languages
XPath
– /order//books/book[cover_style="paperback"][price<80]
XQuery
– the W3C XML query language
XSLT
– XML transformations (XML=>HTML, XML=>XML)
...
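Python's standard `xml.etree.ElementTree` supports only a subset of XPath. It covers the path steps and the `[child='text']` equality test in the expression above, but not the numeric comparison `[price<80]`. This sketch therefore evaluates that last predicate in Python. The order document is hypothetical, shaped to fit the expression.

```python
import xml.etree.ElementTree as ET

ORDER = """<order><shop><books>
  <book><title>A</title><cover_style>paperback</cover_style><price>25</price></book>
  <book><title>B</title><cover_style>hardcover</cover_style><price>60</price></book>
  <book><title>C</title><cover_style>paperback</cover_style><price>95</price></book>
</books></shop></order>"""

root = ET.fromstring(ORDER)  # root is the <order> element, so ".//" plays the role of "/order//"

# Path steps and the equality predicate are handled by ElementTree ...
paperbacks = root.findall(".//books/book[cover_style='paperback']")
# ... the numeric predicate [price<80] is applied in Python.
cheap = [b for b in paperbacks if float(b.findtext("price")) < 80]

print([b.findtext("title") for b in cheap])  # ['A']
```

A full XPath 1.0 engine, such as the `xpath()` method on lxml elements, can evaluate the whole expression in one call.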

40 XPath

41 Example

42 XSLT Processing Model
XML source tree → Transformation (XSLT stylesheet) → result tree (XML, HTML, csv, text, …)

43 XSLT Elements
<xsl:stylesheet>
– root element of an XSLT stylesheet “program”
<xsl:template match="…">...template...</xsl:template>
– declares a rule: (pattern => template)
<xsl:apply-templates select="…"/>
– apply templates to selected children (default = all)
– optional mode attribute

44 XSLT Processing Model
XSL stylesheet: a collection of template rules
Template rule: (pattern → template)
Main steps:
– match pattern against source tree
– instantiate template (replace current node “.” by the template in the result tree)
– select further nodes for processing
Control can be a mix of
– recursive processing (“push”: ...)
– program-driven (“pull”: ...)

45 Template Rule: Example
The rule consists of a pattern and a template:
(i) match pattern: process product elements
(ii) instantiate template: replace each product element with two HTML tables
(iii) select the grandchildren (“sales/domestic”, “sales/foreign”) for further processing
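To illustrate the push-style processing model, here is a minimal sketch of an XSLT-like engine in Python. Patterns are reduced to plain tag names (real XSLT patterns are XPath patterns with priorities). Unmatched nodes fall back to a stand-in for XSLT's built-in rule, which copies text and processes all children. The `product` rule mirrors the example: two tables, filled by processing the selected grandchildren. The row layout in `figure_rule` and the input document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Rule = Callable[[ET.Element], List[str]]

def apply_templates(node: ET.Element, rules: Dict[str, Rule]) -> List[str]:
    """Instantiate the matching rule's template, else use the built-in default."""
    rule = rules.get(node.tag)
    if rule is not None:
        return rule(node)
    out = [node.text or ""]  # built-in default: copy text, recurse into children
    for child in node:
        out.extend(apply_templates(child, rules))
        out.append(child.tail or "")
    return out

def product_rule(node: ET.Element) -> List[str]:
    # Template: one HTML table per selected group of grandchildren.
    out: List[str] = []
    for path in ("sales/domestic", "sales/foreign"):
        out.append("<table>")
        for selected in node.findall(path):              # select further nodes ...
            out.extend(apply_templates(selected, RULES))  # ... and push them on
        out.append("</table>")
    return out

def figure_rule(node: ET.Element) -> List[str]:
    # Hypothetical template for <domestic>/<foreign>: one table row each.
    return [f"<tr><td>{node.tag}</td><td>{node.text}</td></tr>"]

RULES: Dict[str, Rule] = {"product": product_rule, "domestic": figure_rule, "foreign": figure_rule}

doc = ET.fromstring(
    "<products><product><sales><domestic>10</domestic>"
    "<foreign>5</foreign></sales></product></products>"
)
print("".join(apply_templates(doc, RULES)))
# <table><tr><td>domestic</td><td>10</td></tr></table><table><tr><td>foreign</td><td>5</td></tr></table>
```

`products` has no rule, so the default processes its children until `product` matches. This is the "push" style. A "pull" stylesheet would instead navigate explicitly from a single template.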

46 XSLT Example

47 XSLT Example (cont’d)

48 XSLT Example (cont’d)

49 Demonstrations
XML Queries and Transformations

50 A Commercial Tool: XML Spy

51 XQuery

52 Example

53 XQuery Example

54 An XQuery Implementation: Galax
http://www.galaxquery.org/

55 Example: Relational Data => XML
Relation R:
A   B   C
a1  b1  c1
a2  b2  c2
a3  b3  c3
As XML:
<R>
  <tuple> <A>a1</A> <B>b1</B> <C>c1</C> </tuple>
  <tuple> <A>a2</A> <B>b2</B> <C>c2</C> </tuple>
  …
</R>
As a tree: root R with three tuple children, each having children A, B, C that hold (a1, b1, c1), (a2, b2, c2), (a3, b3, c3).
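The encoding is mechanical: one element for the relation, one `tuple` child per row, and one child per column holding the value. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `relation_to_xml` is hypothetical, and values are stored with `str()`, so type information is lost.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(name, columns, rows):
    """Encode a relation as XML: one <tuple> per row, one child element per column."""
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")
        for col, value in zip(columns, row):
            ET.SubElement(tup, col).text = str(value)
    return root

R = relation_to_xml("R", ["A", "B", "C"],
                    [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")])
print(ET.tostring(R, encoding="unicode"))
# <R><tuple><A>a1</A><B>b1</B><C>c1</C></tuple><tuple><A>a2</A><B>b2</B><C>c2</C></tuple><tuple><A>a3</A><B>b3</B><C>c3</C></tuple></R>
```

Column names become element names here, so they must be valid XML names. An alternative design puts each value in an attribute of `tuple`, in line with the "atomic content" guideline for attributes.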

56 XQuery References
– XQuery: An XML Query Language, Don Chamberlin, IBM Systems Journal, 41(4), 2002. http://www.research.ibm.com/journal/sj/414/chamberlin.pdf
– Galax XQuery implementation, http://www.galaxquery.org/

