Introduction to Databases: Relational and XML Models and Languages
Instructors: Bertram Ludaescher, Kai Lin


Introduction to Databases, B. Ludaescher & K. Lin

Overview (Part 2)
- 09:15-10:20 Relational Databases (1h05’)
- 10:20-10:30 BREAK (10’)
- 10:30-11:50 Relational Databases (1h20’)
- 11:50-13:15 LUNCH (1h25’)
- 13:15-13:45 Demo & Hands-on (30’)
- 13:45-15:10 XML: Basics (1h25’)
- 15:10-15:30 BREAK (20’)
- 15:30-16:30 XML: Querying (1h)
- 16:30-17:00 Demo & Hands-on (30’)

XML and Related Standards
An introduction to XML, DTDs, XML Schema, and the DOM. Includes material by Shawn Bowers (SDSC) and Michael Gertz (UC Davis).


A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
Sources to integrate: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE).
“Complex Multiple-Worlds” mediation (Biomedical Informatics Research Network)

A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with a below-average crime rate and a diverse population?
Sources to integrate: Realtor, Demographics, School Rankings, Crime Stats.
“Multiple-Worlds” mediation

An Online Shopper’s Information Integration Problem
El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?”
Sources to integrate: addall.com, amazon.com, A1books.com, half.com, barnes&noble.com.
“One-World” mediation: a mediator (virtual DB), as opposed to a data warehouse.

Information Integration Challenges
- System aspects: “Grid” middleware
  - distributed data & computing
  - Web Services, WSDL/SOAP, OGSA, …
  - sources = functions, files, data sets, …
- Syntax & Structure: (XML-based) data mediators
  - wrapping, restructuring
  - (XML) queries and views
  - sources = (XML) databases
- Semantics: model-based/semantic mediators
  - conceptual models and declarative views
  - knowledge representation: ontologies, description logics (RDF(S), OWL, ...)
  - sources = knowledge bases (DB + CMs + ICs)
The S4 layers are System integration, Syntax, Structure, and Semantics. The goals: reconciling the S4 heterogeneities, “gluing” together resources, and bridging information and knowledge gaps computationally.

Information Integration Challenges: S4 Heterogeneities
- Systems integration: platforms, devices, data & service distribution, APIs, protocols, …
  → Grid middleware technologies, e.g. single sign-on, platform independence, transparent use of remote resources, …
- Syntax & structure: heterogeneous data formats (one for each tool...), heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs, flat files, …), heterogeneous schemas (one for each DB...)
  → database mediation technologies, e.g. XML-based data exchange, integrated views, transparent query rewriting, …
- Semantics: fuzzy metadata, terminology, “hidden” semantics, implicit assumptions, …
  → knowledge representation & semantic mediation technologies, e.g. “smart” data discovery & integration: ask about X (‘mafic’); find data about Y (‘diorite’); be happy anyways!

Structural / XML-Based Mediation

Information Integration from a DB Perspective
Information integration problem:
- Given: data sources S1, ..., Sk (DBMSs, web sites, ...) and user questions Q1, ..., Qn that can be answered using the Si
- Find: the answers to Q1, ..., Qn
The database perspective: source = “database”
- Si has a schema (relational, XML, OO, ...)
- Si can be queried
- define a virtual (or materialized) integrated/global view G over S1, ..., Sk using database query languages (SQL, XQuery, ...)
- questions become queries Qi against G(S1, ..., Sk)

Standard (XML-Based) Mediator Architecture
[Figure: the user/client poses a query Q(G(S1, ..., Sk)) to the MEDIATOR, which holds the integrated global (XML) view G, defined as G(..) ← S1(..) … Sk(..). Each source S1, S2, ..., Sk sits behind a wrapper that exposes an (XML) view; web services can serve as wrapper APIs. Numbered steps shown: 1. query Q; 3. subqueries Q1, Q2, Q3 sent to the wrappers; 4. {answers(Q1)}, {answers(Q2)}, {answers(Q3)} returned; 6. {answers(Q)} returned to the user.]
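A minimal Python sketch of this architecture, applied to the El Cheapo question. It assumes two hypothetical in-memory bookstore sources (S1_ROWS, S2_DICT) with different native formats; the wrapper functions, tag names, and prices are illustrative, not from the slides. Each wrapper answers a subquery with an XML fragment. The mediator combines these fragments into the integrated view G and then answers the user query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sources with different native formats (illustrative data only).
S1_ROWS = [("Tractatus", 12.50, 4.00), ("Ulysses", 9.00, 3.00)]  # (title, price, shipping)
S2_DICT = {"Tractatus": {"cost": 10.00, "ship": 7.00}}

def wrapper_s1(title):
    """Wrapper for S1: answers the subquery 'offers for title' as an XML view."""
    view = ET.Element("offers")
    for t, price, ship in S1_ROWS:
        if t == title:
            ET.SubElement(view, "offer", source="S1", title=t,
                          total=f"{price + ship:.2f}")
    return view

def wrapper_s2(title):
    """Wrapper for S2: the same XML view over a differently shaped source."""
    view = ET.Element("offers")
    rec = S2_DICT.get(title)
    if rec is not None:
        ET.SubElement(view, "offer", source="S2", title=title,
                      total=f"{rec['cost'] + rec['ship']:.2f}")
    return view

def mediate(title, wrappers):
    """Answer Q = 'cheapest offer (incl. shipping) for title' over G(S1, S2)."""
    g = ET.Element("offers")            # integrated view G, built per query
    for wrapper in wrappers:
        g.extend(wrapper(title))        # subquery Qi -> {answers(Qi)}
    offers = list(g.iter("offer"))
    if not offers:
        return None
    return min(offers, key=lambda o: float(o.get("total")))

best = mediate("Tractatus", [wrapper_s1, wrapper_s2])
print(best.get("source"), best.get("total"))  # S1 16.50
```

Here the subquery is only a title filter pushed down to each wrapper. A real mediator would rewrite Q into source-specific queries, as the next slide on query planning describes.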

Query Planning for Mediators
Given:
- user query Q: answer(…) ← … G …
- … & { G ← … S … } global-as-view (GAV)
- … & { S ← … G … } local-as-view (LAV)
- … & { ic(…) ← … S … G … } integrity constraints (ICs)
Find:
- an equivalent (or minimally containing, maximally contained) query plan Q’: answer(…) ← … S …
Results:
- a variety of results/algorithms, depending on the classes of queries, views, and ICs: P, NP, …, undecidable
- many variants still open

Background: Markup
- Annotations (tags) for carrying information about a document’s content, e.g. a writer’s handwritten notes for typesetting, or an editor’s corrections in a manuscript
- A markup language defines a syntax and grammar for tags

Background (cont’d): SGML
- Standard Generalized Markup Language
- Standardized in 1986 (ISO)
- A language for defining markup languages
- ... and for marking up content
- Syntax + Document Type Definition (DTD)
- Tools aimed at document management

Background (cont’d): HTML
- A markup language
- A particular SGML document type (called an “application”)
- Tools for browsing and authoring

Background (cont’d): Limitations
- SGML
  - complex, many options and shortcuts
  - must know the DTD to parse correctly
  - cost of SGML technology is high
- HTML
  - not extensible: can’t define new tags
  - tags for presenting data, not describing it
  - doesn’t capture much document structure or content meaning

Enter XML
XML (Extensible Markup Language)
- Standardized by the W3C in 1998
- For data interchange over the Web
- A simpler SGML:
  - actually, a subset of SGML
  - DTDs are optional
  - fewer features and options
- Widely available tools for parsing, authoring, browsing, etc.

Uses for XML
Why XML?
- Capture the logical structure of documents: presentation independent
- Data interchange: XML is implementation independent
- Storage format. Maier’s Maxim: any successful interchange format becomes a storage format
- Metadata: searching, filtering, organizing
- Data packaging, movement, and processing: client-side processing, server-to-server communication, non-browser-based clients, simplified server processing, etc.


(Some of) The Many Standards of XML
- XML documents and XML DTDs
- Query: XQuery, XQL, XML-QL
- Programming: Document Object Model (DOM), an API to XML documents
- Transformation: XSLT for rearranging and restructuring XML documents
- Transport: XML-RPC, SOAP, XML-Protocol for message and object serialization and remote procedure calls
- Metadata: RDF, using XML to define resource metadata
- Schema and types: XML Schema and XML data types
- Linking: XLink for simple and complex hyperlinks between XML documents
- Addressing: XPath and XPointer for addressing XML subdocuments

The Running Example: Lego Product Catalogs
- Catalogs have a publishing date, an identifier, a title, etc.
- Catalogs are made up of products:
  - each product is either a kit or an accessory
  - each has an item #, price, name, picture, etc.
  - kits can have an age level, # of pieces, set type (Duplo, basic), a theme (Star Wars), a system (space)

An Example XML Catalog Document
[Example document; its markup was not preserved, only its text content: “2000”, “X-Wing Fighter”, “Star Wars”, “Take to the skies with Luke as he battles the forces of evil!”, …]

An Example XML Document
The example catalog document is annotated as follows:
- it consists of a prolog and a body
- elements have start and end tags
- elements can also contain content
- elements are nested “boxes within boxes”
[The markup itself was not preserved; only the text content survives: “2000”, “X-Wing Fighter”, “Star Wars”, “Take to the skies with Luke as he battles the forces of evil!”, …]

Well-Formed Documents
Well-formed XML documents have:
- a single root element
- start and end tags (unlike HTML), as around “X-Wing Fighter” in the example; elements without content may use empty-element tags instead
- properly nested elements (the slide’s counterexample around “263” is NOT properly nested!)
- more rules: element naming, a document has at least one element, etc.
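Python’s standard-library parser (expat, used by xml.etree.ElementTree) rejects documents that break these rules, so it can serve as a quick well-formedness checker. This is a sketch. It checks well-formedness only, not validity against a DTD. The element names come from the running example, but the documents themselves are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML (expat-based check)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<kit><name>X-Wing Fighter</name><pieces>263</pieces><picture/></kit>"
bad_nesting = "<kit><name>X-Wing Fighter<pieces>263</name></pieces></kit>"
missing_end = "<kit><name>X-Wing Fighter</kit>"
two_roots = "<kit/><kit/>"  # violates the single-root rule

for doc in (good, bad_nesting, missing_end, two_roots):
    print(is_well_formed(doc))  # True, then False three times
```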

XML Attributes
Elements can contain attributes. Each attribute is written as an attribute name and an attribute value inside the element’s start tag; the slide’s figure labels the element name and three name/value pairs. Attributes are always assigned in element start tags (or empty-element tags). Their values are always enclosed in matching quotes, either double or single. Attribute names must be unique within an element.
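A short sketch of these rules using Python’s xml.etree.ElementTree. The price and avail attributes follow the example DTD later in this deck; their values are hypothetical. The parser accepts either quote style and rejects duplicate or unquoted attributes as not well-formed.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """True if text is well-formed XML according to Python's expat parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Double and single quotes are both legal attribute-value delimiters.
kit = ET.fromstring("""<kit price="19.99" avail='yes'/>""")
print(kit.attrib)                            # {'price': '19.99', 'avail': 'yes'}
print(parses('<kit price="1" price="2"/>'))  # False: duplicate attribute name
print(parses('<kit price=19.99/>'))          # False: value not quoted
```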

Attributes vs. Content
- In general, it is up to the document designer.
- In SGML, content was usually used for data you see, and attributes for metadata.
- … how I do it:
  - Attribute: “atomic” content that applies to the whole element
  - Content (subelement): otherwise

Document Type Definition
Why DTDs?
- To standardize tags and structure for interchange and creation
- To make documents machine processable
What is a DTD?
- A grammar for describing XML documents (tags, attributes, nesting, etc.)
- An XML document that is well-formed and conforms to a DTD is said to be valid

An Example DTD: Elements
<!ELEMENT kit (name, ages, pieces, theme?, series?, desc)>
The slide also shows an element content model for LegoCatalog and a character-data content model for pubDate.
Content model operators:
- * zero or more
- + one or more
- ? optional
- | choice
- , strict sequence
- ( ) grouping
- EMPTY, ANY, and mixed content models
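A DTD element content model is essentially a regular expression over the sequence of child element names. Python’s standard library does not validate against DTDs; in practice a validating parser would be used, such as lxml’s DTD support. The following hand-rolled sketch only illustrates the idea for the kit declaration. KIT_MODEL and kit_children_ok are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT kit (name, ages, pieces, theme?, series?, desc)> as a regular
# expression over the child-tag sequence: "," = sequence, "?" = optional.
KIT_MODEL = re.compile(r"name ages pieces (theme )?(series )?desc ")

def kit_children_ok(kit):
    """Check only the order/occurrence of <kit>'s child elements."""
    seq = "".join(child.tag + " " for child in kit)
    return KIT_MODEL.fullmatch(seq) is not None

ok = ET.fromstring("<kit><name/><ages/><pieces/><theme/><desc/></kit>")
bad = ET.fromstring("<kit><ages/><name/><pieces/><desc/></kit>")  # wrong order
print(kit_children_ok(ok), kit_children_ok(bad))  # True False
```

The other operators map the same way: * and + become regex quantifiers on a group, and | becomes alternation.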

An Example DTD: Attributes
<!ATTLIST kit
    price       CDATA       #REQUIRED
    shipWeight  CDATA       #REQUIRED
    avail       (yes | no)  #IMPLIED
    image       CDATA       "na.jpg"
    unitId      ID          #IMPLIED >
<!ATTLIST accessory
    forKits     IDREFS      #IMPLIED
    orderStatus CDATA       #FIXED "special" >
Each attribute has the form: attr-name type default-decl
Types:
- CDATA = character data
- ID = unique identifier
- IDREF = reference to an ID
- IDREFS = list of references
- enumeration = list of possible values
Default declarations:
- #REQUIRED = must appear
- #IMPLIED = may optionally appear
- #FIXED + default = if the attribute is missing, the parser assumes the value; if it is present, it must equal that value
- default only = if the attribute is missing, the default is assumed; otherwise any value is allowed
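As an illustration of what a validating parser does with these declarations, here is a hand-written Python sketch. It assumes, as in the example DTD, that unitId is the only ID-typed attribute and forKits the only IDREFS-typed attribute. The functions check_id_refs and effective_attrs are hypothetical helpers, not a library API, and the catalog values are made up.

```python
import xml.etree.ElementTree as ET

def check_id_refs(root):
    """True if ID values are unique and every IDREFS token resolves to an ID."""
    ids = set()
    for el in root.iter():
        uid = el.get("unitId")              # assumed ID-typed attribute
        if uid is not None:
            if uid in ids:
                return False                # ID values must be unique
            ids.add(uid)
    for el in root.iter():
        refs = el.get("forKits")            # assumed IDREFS-typed attribute
        if refs is not None and any(r not in ids for r in refs.split()):
            return False                    # dangling IDREF
    return True

def effective_attrs(el):
    """Attributes of a kit/accessory after applying the DTD's default-decls."""
    attrs = dict(el.attrib)
    if el.tag == "kit":
        for name in ("price", "shipWeight"):  # #REQUIRED
            if name not in attrs:
                raise ValueError(f"kit lacks required attribute {name!r}")
        attrs.setdefault("image", "na.jpg")   # default only
    elif el.tag == "accessory":
        if attrs.get("orderStatus", "special") != "special":
            raise ValueError("orderStatus is #FIXED to 'special'")
        attrs["orderStatus"] = "special"      # #FIXED + default
    return attrs

catalog = ET.fromstring(
    '<LegoCatalog>'
    '<kit unitId="k1" price="19.99" shipWeight="1.2"/>'
    '<kit unitId="k2" price="5.00" shipWeight="0.3" image="k2.jpg"/>'
    '<accessory forKits="k1 k2"/>'
    '</LegoCatalog>')
print(check_id_refs(catalog))                # True
print(effective_attrs(catalog[0])["image"])  # na.jpg
```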

Limitations of DTDs
DTDs are not optimal:
- DTDs are not themselves XML documents
  - you can’t parse them as XML documents
  - you need different tools to create them
  - but at least you can sort of read/understand them (try XML Schema ;-)
- limited support for defining data types
- limited modeling capabilities
  - hard to express some structures
  - no support for reusing structure

Enter XML Schema
XML Schema
- W3C Recommendation (2001)
- Divided into 2 parts: structures, datatypes
- Main features:
  - schemas are themselves well-formed XML documents
  - a schema can span multiple documents
  - can define new data types and constraints
  - inheritance among content model types
- Improves data interchange: offers more precision for computer-to-computer transfer

Example XML Schema
<element name="accessory" type="Product" minOccurs="0" maxOccurs="unbounded"/> … … …
- Many ways to describe new data types (not just regular expressions)
- complexType = content model

XML Schema: User-Defined Type/Class Hierarchy
Source: B. Ludäscher, I. Altintas, A. Gupta. Time to Leave the Trees: From Syntactic to Conceptual Querying of XML. Intl. Workshop on XML Data Management (XMLDM), Prague, Czech Republic, March 2002. LNCS 2490, Springer.

XML Schema Declarations (“home-style” syntax)
Complex Type Declarations

XML Schema (“home-style”): Complex Types
Simple Type Declarations

Programming with XML
The DOM (Document Object Model):
- maintained by the W3C
- language and platform independent
- an object model for XML (actually, an API): core, views, events, style, persistence, etc.
[Figure: an XML parser generates DOM objects; the application accesses, creates, and manipulates them, and produces output.]

DOM Example
[Figure: the DOM tree for the example. Document node → document root element (ln) → NodeList (lnl) → kit element node (kn). The kit node has a NamedNodeMap (knm) holding an Attr node pieces="263" (ka), and a NodeList of children (knl) holding a Text (character data) node “Take to the skies...”.]
Navigation code from the slide:
    d.load(…)
    ln = d.documentElement
    lnl = ln.childNodes
    kn = lnl.item(0)
    knm = kn.attributes
    ka = knm.item(0)
    knl = kn.childNodes
    knl.item(0)        (the Text node)
NOTE: I left off the desc element and just placed its content under kit.
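The same navigation can be reproduced with Python’s xml.dom.minidom, which implements a W3C DOM-style interface. In this sketch the slide’s d.load(…) is replaced by minidom’s parseString on an inline document. The root name LegoCatalog and the simplified kit content (desc omitted, as in the note) are assumptions chosen to mirror the figure.

```python
from xml.dom.minidom import parseString

# Simplified catalog mirroring the figure: desc omitted, its text under kit.
d = parseString('<LegoCatalog><kit pieces="263">Take to the skies...</kit></LegoCatalog>')

ln = d.documentElement   # document root element node
lnl = ln.childNodes      # NodeList of its children
kn = lnl.item(0)         # the kit element node
knm = kn.attributes      # NamedNodeMap of kit's attributes
ka = knm.item(0)         # Attr node pieces="263"
knl = kn.childNodes      # NodeList of kit's children
text = knl.item(0)       # Text (character data) node

print(ka.name, ka.value)  # pieces 263
print(text.data)          # Take to the skies...
```

The inline document deliberately has no whitespace between tags. Otherwise, whitespace-only Text nodes would appear in childNodes, and lnl.item(0) would no longer be the kit element.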

XML Query Languages
- XPath, e.g. /order//books/book[cover_style="paperback"][price<80]
- XQuery: the W3C XML query language
- XSLT: XML transformations (XML => HTML, XML => XML)
- ...
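Python’s xml.etree.ElementTree supports a limited XPath subset, which is enough to sketch the example path above. It does not allow absolute paths on an element, so the query starts from the order root with a relative path. It also supports the [child='text'] predicate but not numeric comparisons like [price<80], so that test is done in Python. The order document and book values are hypothetical.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring("""
<order>
  <books>
    <book><title>A</title><cover_style>paperback</cover_style><price>45</price></book>
    <book><title>B</title><cover_style>hardcover</cover_style><price>60</price></book>
    <book><title>C</title><cover_style>paperback</cover_style><price>95</price></book>
  </books>
</order>
""")

# //books/book[cover_style="paperback"], evaluated relative to <order>.
paperbacks = order.findall(".//books/book[cover_style='paperback']")
# [price<80] is not supported by ElementTree, so filter numerically here.
cheap = [b.findtext("title") for b in paperbacks if float(b.findtext("price")) < 80]
print(cheap)  # ['A']
```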

XPath

Example

XSLT Processing Model
[Figure: an XSLT stylesheet drives the transformation of an XML source tree into a result tree, which is output as XML, HTML, CSV, text, …]

XSLT Elements
- <xsl:stylesheet> ...template... </xsl:stylesheet>: root element of an XSLT stylesheet “program”
- <xsl:template match="pattern">: declares a rule (pattern => template)
- <xsl:apply-templates select="...">: applies templates to the selected children (default = all)
- optional mode attribute

XSLT Processing Model
- XSL stylesheet: a collection of template rules
- template rule: (pattern => template)
- main steps:
  - match the pattern against the source tree
  - instantiate the template (replace the current node “.” by the template in the result tree)
  - select further nodes for processing
- control can be a mix of:
  - recursive processing (“push”: ...)
  - program-driven processing (“pull”: ...)

Template Rule: Example
(i) match pattern: process product elements
(ii) instantiate template: replace each product element with two HTML tables
(iii) select the grandchildren (“sales/domestic”, “sales/foreign”) for further processing
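XSLT is not part of Python’s standard library, so the following is only a Python analogy of the push-style template-rule model (match, instantiate, select). It is not an XSLT processor. Patterns are reduced to bare tag names, and XSLT’s built-in rule is approximated as “recurse into children” (the built-in text-copying rule is omitted). The functions apply_templates, product_template, and amount_template, as well as the document values, are hypothetical.

```python
import xml.etree.ElementTree as ET

def apply_templates(nodes, rules, out):
    """Push-style processing: instantiate the first rule whose pattern matches."""
    for node in nodes:
        for pattern, template in rules:
            if node.tag == pattern:      # (i) match pattern
                template(node, rules, out)
                break
        else:
            # Approximates XSLT's built-in element rule: process the children.
            apply_templates(list(node), rules, out)

def product_template(node, rules, out):
    # (ii) replace <product> with two HTML tables, and
    # (iii) select the grandchildren sales/domestic and sales/foreign.
    for path in ("sales/domestic", "sales/foreign"):
        table = ET.SubElement(out, "table")
        apply_templates(node.findall(path), rules, table)

def amount_template(node, rules, out):
    row = ET.SubElement(out, "tr")
    ET.SubElement(row, "td").text = node.tag
    ET.SubElement(row, "td").text = node.text

RULES = [("product", product_template),
         ("domestic", amount_template),
         ("foreign", amount_template)]

src = ET.fromstring("<products><product><sales>"
                    "<domestic>10</domestic><foreign>7</foreign>"
                    "</sales></product></products>")
result = ET.Element("html")
apply_templates([src], RULES, result)
print(ET.tostring(result, encoding="unicode"))
```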

XSLT Example

XSLT Example (cont’d)

XSLT Example (cont’d)

Demonstrations: XML Queries and Transformations

A Commercial Tool: XML Spy

XQuery

Example

XQuery Example

An XQuery Implementation: Galax

Example: Relational Data => XML
Relation R:
    A  | B  | C
    a1 | b1 | c1
    a2 | b2 | c2
    a3 | b3 | c3
is encoded in XML as:
    <R>
      <tuple> <A>a1</A> <B>b1</B> <C>c1</C> </tuple>
      <tuple> <A>a2</A> <B>b2</B> <C>c2</C> </tuple>
      …
    </R>
[Figure: the corresponding tree: root R with one tuple child per row; each tuple has children A, B, C holding that row’s values.]
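The encoding shown here is mechanical: one element per relation, one tuple element per row, and one child element per column. A minimal Python sketch follows, where relation_to_xml is a hypothetical helper. Values are converted with str() and are not type-annotated.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(name, columns, rows):
    """Encode a relation as <name><tuple><col>value</col>...</tuple>...</name>."""
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")
        for col, value in zip(columns, row):
            ET.SubElement(tup, col).text = str(value)
    return root

R = relation_to_xml("R", ["A", "B", "C"],
                    [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b3", "c3")])
print(ET.tostring(R, encoding="unicode"))
```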

XQuery References
- Don Chamberlin. XQuery: An XML Query Language. IBM Systems Journal, 41(4).
- Galax XQuery implementation