End of SQL; XML
April 22nd, 2002

Null Values
If x = NULL then 4*(3-x)/7 is still NULL
If x = NULL then x = 'Joe' is UNKNOWN
Three boolean values:
– FALSE = 0
– UNKNOWN = 0.5
– TRUE = 1

Null Value Logic
C1 AND C2 = min(C1, C2)
C1 OR C2 = max(C1, C2)
NOT C1 = 1 - C1
    SELECT *
    FROM Person
    WHERE (age < 25) AND (height > 6 OR weight > 190)
Semantics of SQL: include only tuples that yield TRUE
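
The min/max/1-x rules above can be sketched in a few lines of Python. This is only an illustration: the helper names (and3, or3, not3, less_than, greater_than, selected) are made up for this sketch, and Python's None stands in for SQL NULL.

```python
# Three-valued logic encoded as numbers, as on the slide.
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0

def and3(c1, c2):
    return min(c1, c2)

def or3(c1, c2):
    return max(c1, c2)

def not3(c1):
    return 1.0 - c1

def less_than(value, bound):
    """Comparison that yields UNKNOWN when the value is NULL (None)."""
    if value is None:
        return UNKNOWN
    return TRUE if value < bound else FALSE

def greater_than(value, bound):
    """Comparison that yields UNKNOWN when the value is NULL (None)."""
    if value is None:
        return UNKNOWN
    return TRUE if value > bound else FALSE

def selected(age, height, weight):
    """WHERE (age < 25) AND (height > 6 OR weight > 190); keep only TRUE."""
    cond = and3(less_than(age, 25),
                or3(greater_than(height, 6), greater_than(weight, 190)))
    return cond == TRUE
```

A tuple with a NULL height is still selected when the weight test alone makes the OR true; it is dropped when the OR evaluates to UNKNOWN.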

Null Values
Unexpected behavior:
    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25
Some Persons are not included!

Testing for Null
Can test for NULL explicitly:
– x IS NULL
– x IS NOT NULL
    SELECT *
    FROM Person
    WHERE age < 25 OR age >= 25 OR age IS NULL
Now it includes all Persons
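
A small check of this behavior with Python's built-in sqlite3 module. The three Person rows are invented for the sketch. A row whose age is NULL never satisfies an ordinary comparison, so even a predicate that looks exhaustive, such as age < 25 OR age >= 25, skips it until IS NULL is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, age INTEGER)")
# Hypothetical sample rows; Cat has an unknown (NULL) age.
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [("Ann", 20), ("Bob", 30), ("Cat", None)])

def names(where):
    """Return the names selected by the given WHERE clause, sorted."""
    rows = conn.execute(f"SELECT name FROM Person WHERE {where} ORDER BY name")
    return [r[0] for r in rows]

without_null_test = names("age < 25 OR age >= 25")
with_null_test = names("age < 25 OR age >= 25 OR age IS NULL")
print(without_null_test)
print(with_null_test)
```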

Outerjoins
Explicit joins in SQL:
    Product(name, category)
    Purchase(prodName, store)
    SELECT Product.name, Purchase.store
    FROM Product JOIN Purchase ON Product.name = Purchase.prodName
Same as:
    SELECT Product.name, Purchase.store
    FROM Product, Purchase
    WHERE Product.name = Purchase.prodName
But Products that never sold will be lost!

Left Outerjoins
Left outer joins in SQL:
    Product(name, category)
    Purchase(prodName, store)
    SELECT Product.name, Purchase.store
    FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName

Product
    Name      Category
    Gizmo     gadget
    Camera    Photo
    OneClick  Photo
Purchase
    ProdName  Store
    Gizmo     Wiz
    Camera    Ritz
    Camera    Wiz
Result
    Name      Store
    Gizmo     Wiz
    Camera    Ritz
    Camera    Wiz
    OneClick  -
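
The Product and Purchase data from the slides can be loaded into an in-memory SQLite database to compare the two joins. This is a sketch using Python's sqlite3 module; the ORDER BY clauses are added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (name TEXT, category TEXT);
CREATE TABLE Purchase (prodName TEXT, store TEXT);
INSERT INTO Product VALUES ('Gizmo', 'gadget'), ('Camera', 'Photo'), ('OneClick', 'Photo');
INSERT INTO Purchase VALUES ('Gizmo', 'Wiz'), ('Camera', 'Ritz'), ('Camera', 'Wiz');
""")

# Inner join: OneClick has no purchase, so it disappears.
inner = conn.execute("""
    SELECT Product.name, Purchase.store
    FROM Product JOIN Purchase ON Product.name = Purchase.prodName
    ORDER BY Product.name, Purchase.store
""").fetchall()

# Left outer join: OneClick is kept, with NULL (None) for the store.
left = conn.execute("""
    SELECT Product.name, Purchase.store
    FROM Product LEFT OUTER JOIN Purchase ON Product.name = Purchase.prodName
    ORDER BY Product.name, Purchase.store
""").fetchall()

print(inner)
print(left)
```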

Outer Joins
Left outer join:
– Include the left tuple even if there's no match
Right outer join:
– Include the right tuple even if there's no match
Full outer join:
– Include both the left and right tuples even if there's no match

XML

More Facts About XML
Every database vendor has an XML page.
Many applications are just fancier Web sites.
But, most importantly, XML enables data sharing on the Web, hence our interest.

What is XML?
From HTML to XML
HTML describes the presentation: easy for humans

HTML
Bibliography
    Foundations of Databases. Abiteboul, Hull, Vianu. Addison Wesley, 1995
    Data on the Web. Abiteboul, Buneman, Suciu. Morgan Kaufmann, 1999
HTML is hard for applications

XML
    Foundations… Abiteboul Hull Vianu Addison Wesley 1995 …
XML describes the content: easy for applications
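
To illustrate why content markup is easy for applications, here is a sketch that reads one bibliography entry with Python's xml.etree.ElementTree. The markup of the slide did not survive in this transcript, so the tag names (bibliography, book, title, author, publisher, year) are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the text values come from the slide.
doc = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

root = ET.fromstring(doc)
book = root.find("book")
title = book.findtext("title")                       # one title
authors = [a.text for a in book.findall("author")]   # any number of authors
year = int(book.findtext("year"))                    # a value the program can compute with
print(title, authors, year)
```

With HTML, the same program would have to guess which italic or line-broken fragment is the title and which is the year.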

XML
eXtensible Markup Language
Roots: comes from SGML
– A very nasty language
After the roots: a format for sharing data
Emerging format for data exchange on the Web and between applications

XML Applications
Sharing data between different components of an application.
Archiving data in text files.
EDI (electronic data interchange):
– Transactions between banks
– Producers and suppliers sharing product data (auctions)
– Extranets: building relationships between companies
Scientists sharing data about experiments.

Web Services
A new paradigm for creating distributed applications?
Systems communicate via messages, contracts.
Example: order processing system.
MS .NET, J2EE: some of the platforms.
XML: a part of the story; the data format.

XML Syntax
Very simple:
    <db>
      <book>
        <title>Complete Guide to DB2</title>
        <author>Chamberlin</author>
      </book>
      <book>
        <title>Transaction Processing</title>
        <author>Bernstein</author>
        <author>Newcomer</author>
      </book>
      <publisher>
        <name>Morgan Kaufman</name>
        <state>CA</state>
      </publisher>
    </db>

XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
start tags must correspond to end tags, and conversely

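XML Terminology
an element: everything between tags
– example element: <title>Complete Guide to DB2</title>
– example element: <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book>
elements may be nested
empty element: a start tag followed directly by its end tag; it can be abbreviated as a single self-closing tag
an XML document has a unique root element
well formed XML document: if it has matching tags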
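
Well-formedness can be checked by simply trying to parse the text. The sketch below uses Python's xml.etree.ElementTree, which raises ParseError on mismatched tags or on more than one root element; the helper name is_well_formed and the sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a single, properly nested XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>"
bad = "<book><title>Complete Guide to DB2</author></book>"   # end tag does not match
two_roots = "<book/><book/>"                                 # no unique root element

print(is_well_formed(good), is_well_formed(bad), is_well_formed(two_roots))
```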

The XML Tree
    db
      book
        title: "Complete Guide to DB2"
        author: "Chamberlin"
      book
        title: "Transaction Processing"
        author: "Bernstein"
        author: "Newcomer"
      publisher
        name: "Morgan Kaufman"
        state: "CA"
Tags on nodes
Data values on leaves
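
The tree view can be reproduced by walking the parsed document. This is a minimal sketch with Python's xml.etree.ElementTree; the function name outline is an invention for this example, and the document text is the db example from the XML Syntax slide.

```python
import xml.etree.ElementTree as ET

doc = """<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufman</name>
    <state>CA</state>
  </publisher>
</db>"""

def outline(elem, depth=0):
    """Return one line per node: a tag for inner nodes, tag plus value for leaves."""
    indent = "  " * depth
    if len(elem) == 0:                       # leaf: carries a data value
        return [f'{indent}{elem.tag}: "{elem.text}"']
    lines = [f"{indent}{elem.tag}"]          # inner node: carries only a tag
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```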

More XML Syntax: Attributes
    Complete Guide to DB2 Chamberlin 1998
price, currency are called attributes

Replacing Attributes with Elements
    Complete Guide to DB2 Chamberlin USD
attributes are alternative ways to represent data
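
A sketch of the two representations with Python's xml.etree.ElementTree. The markup on these slides was lost in the transcript, so the exact layout is an assumption, and the price value 55 is a made-up placeholder. The helper attributes_to_elements is likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# price and currency as attributes of book (price value is hypothetical).
with_attributes = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Complete Guide to DB2</title>"
    "<author>Chamberlin</author>"
    "<year>1998</year>"
    "</book>"
)

def attributes_to_elements(elem):
    """Return a shallow copy of elem in which every attribute becomes a child element."""
    copy = ET.Element(elem.tag)
    copy.text = elem.text
    for child in elem:                        # existing children are reused, not cloned
        copy.append(child)
    for name, value in elem.attrib.items():   # each attribute turns into <name>value</name>
        ET.SubElement(copy, name).text = value
    return copy

as_elements = attributes_to_elements(with_attributes)
print(ET.tostring(as_elements, encoding="unicode"))
```

Both forms carry the same information; the element form additionally allows repeated or nested values, which attributes do not.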

"Types" (or "Schemas") for XML
Document Type Definition (DTD)
Defines a grammar for the XML document, but we use it as a substitute for types/schemas
Will be replaced by XML-Schema (will extend DTDs)

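An Example DTD
    <!DOCTYPE db [
      <!ELEMENT db (book|publisher)*>
      <!ELEMENT book (title,author*,year?)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
      <!ELEMENT year (#PCDATA)>
      <!ELEMENT publisher (#PCDATA)>
    ]>
PCDATA means Parsed Character Data (a mouthful for string)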

More on DTDs: Attributes
    <!DOCTYPE db [
      ...
      <!ATTLIST book price CDATA #REQUIRED
                     language CDATA #IMPLIED>
    ]>
    Complete Guide to DB2 Chamberlin …
The type:
– CDATA = string
– ID = a key
– IDREF = a foreign key
– others = rarely used
Default declaration:
– #REQUIRED = required
– #IMPLIED = optional
– #FIXED = fixed (rarely used)

DTDs as Grammars
Same thing as:
    db ::= (book|publisher)*
    book ::= (title,author*,year?)
    title ::= string
    author ::= string
    year ::= string
    publisher ::= string
A DTD is an EBNF (Extended BNF) grammar
An XML tree is precisely a derivation tree
XML documents that have a DTD and conform to it are called valid
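
Because each content model is a regular expression over child tags, validity against the grammar above can be sketched with Python's re module. This is a toy checker, not a real DTD validator: it ignores attributes and text content, and the names GRAMMAR, model_to_regex and is_valid are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the grammar above; "string" is written as #PCDATA.
GRAMMAR = {
    "db": "(book|publisher)*",
    "book": "(title,author*,year?)",
    "title": "#PCDATA",
    "author": "#PCDATA",
    "year": "#PCDATA",
    "publisher": "#PCDATA",
}

def model_to_regex(model):
    """Turn a content model into a regex over a space-terminated tag sequence."""
    # Each tag name becomes a group matching "name "; commas mean concatenation.
    pattern = re.sub(r"[A-Za-z_][\w.-]*", lambda m: f"(?:{m.group(0)} )", model)
    return re.compile(pattern.replace(",", ""))

def is_valid(elem):
    """Check elem and all its descendants against GRAMMAR."""
    model = GRAMMAR.get(elem.tag)
    if model is None:
        return False                          # undeclared element
    if model == "#PCDATA":
        return len(elem) == 0                 # text only, no child elements
    children = "".join(child.tag + " " for child in elem)
    if not model_to_regex(model).fullmatch(children):
        return False
    return all(is_valid(child) for child in elem)

valid_doc = ET.fromstring(
    "<db><book><title>T</title><author>A</author><author>B</author>"
    "<year>1998</year></book><publisher>P</publisher></db>"
)
wrong_order = ET.fromstring("<db><book><author>A</author><title>T</title></book></db>")
print(is_valid(valid_doc), is_valid(wrong_order))
```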

More on DTDs as Grammars
    <!DOCTYPE paper [ … ]>
XML documents can be nested arbitrarily deep

XML for Representing Data
persons
    name  phone
    John  3634
    Sue   6343
    Dick  6363
XML: a persons root with one row per tuple; each row has a name and a phone child
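
A sketch of this encoding in Python, turning the persons tuples into a persons/row/name/phone tree with xml.etree.ElementTree. The function name table_to_xml is made up for the example.

```python
import xml.etree.ElementTree as ET

persons_table = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

def table_to_xml(rows):
    """Build <persons> with one <row> per tuple and one child per column."""
    root = ET.Element("persons")
    for name, phone in rows:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "phone").text = phone
    return root

persons_xml = ET.tostring(table_to_xml(persons_table), encoding="unicode")
print(persons_xml)
```

Note how the column names, stated once in the relational schema, are repeated in every row of the XML output.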

XML vs Data Models
XML is self-describing
Schema elements become part of the data
– Relational schema: persons(name, phone)
– In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
Consequence: XML is much more flexible
XML = semistructured data

Semi-structured Data Explained
Missing attributes:
    John 1234
    Joe (no phone!)
Repeated attributes:
    Mary (two phones!)

Semistructured Data Explained
Attributes with different types in different objects:
    John Smith 1234 (structured name!)
Nested collections (no 1NF)
Heterogeneous collections:
– <db> contains both <book>s and <publisher>s
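
Code that reads semistructured data has to tolerate missing and repeated children. The sketch below does so with findall from Python's xml.etree.ElementTree. The element names (persons, person, name, phone) and Mary's two phone numbers are assumptions, since the original markup and values were lost in this transcript.

```python
import xml.etree.ElementTree as ET

# Joe has no phone, Mary has two (her numbers are made-up placeholders).
doc = """<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>"""

def phones_by_name(root):
    """Map each person's name to a list of zero, one or many phone numbers."""
    return {p.findtext("name"): [ph.text for ph in p.findall("phone")]
            for p in root.findall("person")}

phones = phones_by_name(ET.fromstring(doc))
print(phones)
```

A single relational column cannot hold this list directly, which is the sense in which such data is not in first normal form.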

XML Data vs. E/R, ODL, Relational
Q: Is XML better or worse?
A: It serves different purposes
– E/R, ODL, Relational models: for centralized processing, when we control the data
– XML: data sharing between different systems, where we do not have control over the entire data, e.g. on the Web
Do NOT use XML to model your data! Use E/R, ODL, or relational instead.

Data Sharing with XML: Easy
[Figure labels: data source (e.g. relational database), XML, Web, application]

Exporting Relational Data to XML
    Product(pid, name, weight)
    Company(cid, name, address)
    Makes(pid, cid, price)
[Diagram labels: product, company, makes]

Export data grouped by companies
    GizmoWorks Tacoma
        gizmo …
    Bang Kirkland
        gizmo …
    …
Redundant representation of products
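
A sketch of the company-grouped export in Python. The three relations follow the schemas Product(pid, name, weight), Company(cid, name, address) and Makes(pid, cid, price); the ids, weights and prices are made-up placeholders, and the element names (db, company, product, and so on) are assumptions because the slide's markup was lost.

```python
import xml.etree.ElementTree as ET

# Sample tuples; ids, weight and prices are hypothetical.
product = [(1, "gizmo", 10)]                                       # (pid, name, weight)
company = [(1, "GizmoWorks", "Tacoma"), (2, "Bang", "Kirkland")]   # (cid, name, address)
makes = [(1, 1, 20), (1, 2, 25)]                                   # (pid, cid, price)

def export_by_company():
    """Nest each company's products under the company element."""
    products = {pid: (name, weight) for pid, name, weight in product}
    root = ET.Element("db")
    for cid, cname, address in company:
        c = ET.SubElement(root, "company")
        ET.SubElement(c, "name").text = cname
        ET.SubElement(c, "address").text = address
        for pid, made_by, price in makes:
            if made_by != cid:
                continue
            pname, weight = products[pid]
            p = ET.SubElement(c, "product")
            ET.SubElement(p, "name").text = pname
            ET.SubElement(p, "weight").text = str(weight)
            ET.SubElement(p, "price").text = str(price)
    return root

by_company = export_by_company()
print(ET.tostring(by_company, encoding="unicode"))
```

The single gizmo tuple appears once under each company that makes it, which is the redundancy the slide points out.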

The DTD

Export Data by Products
    Gizmo
        GizmoWorks Tacoma
        Bang Kirkland
        …
    OneClick …
Redundant representation of companies

Which One Do We Choose?
The structure of the XML data is determined by agreement with our partners, or dictated by committees
– Many XML dialects (called applications)
XML data is often nested, irregular, etc.
No normal forms for XML

Storing XML Data
We have lots of XML data from the Web; how do we store it?
Ideally: convert to relational data, store in an RDBMS
Much harder than exporting relations to XML (why?)
DB vendors currently work on tools for loading XML data into an RDBMS
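
For a perfectly regular document, loading into a relational table is straightforward, as this sketch with Python's xml.etree.ElementTree and sqlite3 suggests. It reuses the persons/row/name/phone layout from the earlier slide and is only an illustration, not a description of any vendor tool.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<persons>
  <row><name>John</name><phone>3634</phone></row>
  <row><name>Sue</name><phone>6343</phone></row>
  <row><name>Dick</name><phone>6363</phone></row>
</persons>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, phone TEXT)")

# One tuple per <row>; a missing child would become NULL (None).
rows = [(r.findtext("name"), r.findtext("phone"))
        for r in ET.fromstring(doc).iter("row")]
conn.executemany("INSERT INTO persons VALUES (?, ?)", rows)

stored = conn.execute("SELECT name, phone FROM persons ORDER BY name").fetchall()
print(stored)
```

The approach breaks down as soon as the data has repeated phones, structured names or arbitrarily deep nesting, because a fixed two-column table has no place for them; that is one reason the general problem is much harder than export.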