Semi-structured Data


Semi-structured Data
In many applications, data does not have a rigid, predefined schema: e.g., structured files, scientific data, XML. Managing such data requires rethinking the design of the components of a DBMS: data model, query language, optimizer, storage system. The emergence of XML data underscores the importance of semi-structured data.

Outline
Semi-formal definition and examples.
Modeling semi-structured data.
Querying semi-structured data.
The XML challenge.

Main Characteristics
Schema is not what it used to be: it is not given in advance (often implicit in the data); it is descriptive, not prescriptive; it is partial and rapidly evolving; and it may be large (compared to the size of the data).
Types are not what they used to be: objects and attributes are not strongly typed, and objects in the same collection have different representations.

Example: XML
<bib>
  <book year="1995">
    <title> Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> Addison-Wesley </publisher>
  </book>
  <book year="1998">
    <title> Foundation for Object/Relational Databases </title>
    <author> <lastname> Darwen </lastname> </author>
    <ISBN> <number> 01-23-456 </number> </ISBN>
  </book>
</bib>
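The irregularity in the example above (one book has a publisher, the other an ISBN) can be made visible with a few lines of code. The following is a minimal sketch, not part of the original slides: it uses Python's standard xml.etree.ElementTree module, and the names BIB_XML and child_tags are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The bibliography from the slide, written as well-formed XML.
BIB_XML = """
<bib>
  <book year="1995">
    <title>Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1998">
    <title>Foundation for Object/Relational Databases</title>
    <author><lastname>Darwen</lastname></author>
    <ISBN><number>01-23-456</number></ISBN>
  </book>
</bib>
"""


def child_tags(xml_text):
    """Map each book title to the set of child element names it carries."""
    root = ET.fromstring(xml_text)
    return {
        book.findtext("title").strip(): {child.tag for child in book}
        for book in root.findall("book")
    }


# Hypothetical usage: the two books in the same collection have different shapes.
shapes = child_tags(BIB_XML)
for title, tags in shapes.items():
    print(title, sorted(tags))
```

No schema is consulted here: the "schema" of each book is discovered from the data itself, which is the sense in which it is implicit and descriptive rather than prescriptive.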

Example: Data Integration
A user queries a mediator, which provides uniform access to multiple data sources: structured file, legacy system, RDBMS, OODBMS.
Each source represents data differently: different data models, different schemas.

Physical versus Logical Structure
In some cases, data can be modeled in relational or object-oriented models, but extracting the tuples is hard, e.g., extracting data from HTML: [Ashish and Knoblock, 97], [Hammer et al., 97], [Kushmerick and Weld, 97].
Semi-structured data: data that cannot be modeled naturally or usefully using a standard data model.

Managing Semi-structured Data
How do we model it? (Directed labeled graphs.)
How do we query it? (Many proposals; all include regular path expressions.)
How do we optimize queries? (Beginning to understand.)
How do we store the data? (Looking for patterns.)
Also: integrity constraints, views, updates, …

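Modeling Semi-Structured Data
Labeled directed graphs (from OEM [TSIMMIS]). Nodes are objects; labels on the arcs are attribute names.
[Figure: OEM graph. Object b01 has arcs labeled author (twice, to objects a1 and a2), year (1997), and title (“DBMS”). The author objects carry LastName, FirstName, and url arcs, with values “Widom”, “Ullman”, “Jeff”, and “http://”.]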
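A labeled directed graph of this kind can be sketched as a list of (source, label, target) edges. The snippet below is an illustration only, not the OEM or TSIMMIS implementation. Which value hangs off which author object is an assumption (the slide's figure did not survive extraction), the url arc is omitted because its value is truncated in the source, and the names EDGES, build_graph, and follow are hypothetical.

```python
from collections import defaultdict

# Each edge is (source object, arc label, target object or atomic value).
# ASSUMPTION: the pairing of values with a1 and a2 is a guess at the figure.
EDGES = [
    ("b01", "author", "a1"),
    ("b01", "author", "a2"),
    ("b01", "title", "DBMS"),
    ("b01", "year", 1997),
    ("a1", "LastName", "Widom"),
    ("a2", "LastName", "Ullman"),
    ("a2", "FirstName", "Jeff"),
]


def build_graph(edges):
    """Index the edges by source node: node -> list of (label, target)."""
    graph = defaultdict(list)
    for source, label, target in edges:
        graph[source].append((label, target))
    return graph


def follow(graph, node, label):
    """Return every target reached from node by one arc with the given label."""
    return [target for arc, target in graph.get(node, []) if arc == label]


graph = build_graph(EDGES)
authors = follow(graph, "b01", "author")
last_names = [name for a in authors for name in follow(graph, a, "LastName")]
print(authors, last_names)
```

Note that the two author objects need not carry the same arcs: asking a1 for a FirstName simply returns an empty list rather than violating a schema.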

Querying Semi-structured Data
Important features: the ability to navigate the data (regular path expressions), querying the attribute names (arc variables), creating new structures, and type coercion.
Languages: Lorel (Stanford), UnQL (U. Penn), StruQL (AT&T, INRIA, UW).

The StruQL Query Language
A StruQL query is a function from a set of input graphs to an output graph.
A StruQL expression contains two parts: a query component and a restructuring component.
Formally:
  INPUT   graph names
  WHERE   conjunction of regular path expression atoms
  CREATE  name the nodes in the output graph using Skolem functions
  LINK    specify the links in the resulting graph
  OUTPUT  resulting-graph name

Example: Reversing a graph
  WHERE  x -> * -> y, y -> l -> z
  CREATE New(x), New(y), New(z)
  LINK   New(z) -> l -> New(y)
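The meaning of the reversal query can be sketched in ordinary code. This is a rough emulation under stated assumptions, not StruQL itself: the graph is a list of (source, label, target) edges, `*` is assumed to match any path including the empty one, and the Skolem function New is modeled as a tagged tuple so that equal inputs always yield the same output node. All function names are hypothetical.

```python
def new(node):
    """Skolem function: the same input node always maps to the same new node."""
    return ("New", node)


def reachable(edges, start):
    """Nodes y with start -> * -> y, where * may also be the empty path."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for source, _label, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


def reverse_graph(edges):
    """Evaluate WHERE x -> * -> y, y -> l -> z and emit New(z) -> l -> New(y)."""
    nodes = {s for s, _, _ in edges} | {t for _, _, t in edges}
    links = set()
    for x in nodes:
        for y in reachable(edges, x):
            for source, label, z in edges:
                if source == y:
                    links.add((new(z), label, new(y)))
    return links


# Hypothetical two-edge input: r -a-> n1 -b-> n2
example = [("r", "a", "n1"), ("n1", "b", "n2")]
for link in sorted(reverse_graph(example)):
    print(link)
```

Because the links are collected in a set and New is deterministic, an edge matched through several different bindings of x still produces a single reversed link.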

Example Query: StruQL
  WHERE  Articles(art), art -> l -> value,
         l in { "Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite" },
         art -> * -> art1, Articles(art1)
  CREATE ArticlePage(art), ArticlePage(art1)
  LINK   ArticlePage(art) -> l -> value,
         ArticlePage(art) -> "related article" -> ArticlePage(art1)

StruQL Details
Regular path expressions are constructed by a grammar:
  R <- "a" | e | R1.R2 | R1|R2 | R1* | L | _
Atoms in the WHERE clause are of the form X -> R -> Y or C(X).
The LINK clause includes atoms of the form:
  LINK f(X) --> "new link" --> g(X)   or   LINK f(X) --> L --> g(X)
Queries can be nested, inheriting the WHERE clauses of their outer blocks.
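To make the grammar concrete, here is a small evaluator for regular path expressions over an edge list. It is a sketch under assumptions, not the StruQL engine: expressions are encoded as nested tuples of my own choosing, `e` is read as the empty path, `_` as any single arc, and the arc-variable case `L` is left out for brevity.

```python
def step(edges, expr, nodes):
    """Return the set of nodes reached from `nodes` by following `expr`.

    Expression encoding (an assumption for this sketch):
      ("label", "a")   the constant arc label "a"
      ("eps",)         the empty path
      ("any",)         any single arc, written _ in the grammar
      ("seq", R1, R2)  concatenation R1.R2
      ("alt", R1, R2)  alternation R1|R2
      ("star", R)      zero or more repetitions R*
    """
    kind = expr[0]
    if kind == "eps":
        return set(nodes)
    if kind == "label":
        return {t for s, l, t in edges if s in nodes and l == expr[1]}
    if kind == "any":
        return {t for s, _l, t in edges if s in nodes}
    if kind == "seq":
        return step(edges, expr[2], step(edges, expr[1], nodes))
    if kind == "alt":
        return step(edges, expr[1], nodes) | step(edges, expr[2], nodes)
    if kind == "star":
        # Fixpoint: keep applying R until no new node is discovered.
        result = set(nodes)
        frontier = set(nodes)
        while frontier:
            frontier = step(edges, expr[1], frontier) - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


# Hypothetical data and usage.
sample_edges = [
    ("b01", "author", "a1"),
    ("a1", "LastName", "Widom"),
    ("b01", "title", "DBMS"),
]
author_last = ("seq", ("label", "author"), ("label", "LastName"))
print(step(sample_edges, author_last, {"b01"}))
```

An atom X -> R -> Y in a WHERE clause then holds for exactly those pairs where Y is in step(edges, R, {X}). The fixpoint in the star case terminates on cyclic graphs because only newly discovered nodes are expanded.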

The Test of XML
XML (Extensible Markup Language) is emerging as a standard for exchanging data on the Web. It enables separation of content (XML) and presentation (XSL). DTDs (Document Type Definitions) provide partial schemas for XML documents. Applications will need to manage XML data. Can the database community and semi-structured data be of any help?

Semi-structured Data vs. XML
  attributes    ---> tags
  objects       ---> elements
  atomic values ---> CDATA (characters)
Differences: order (assumed in XML), XML attributes (fixable), references in XML.
Real problem: XML comes with no data model!

References and Attributes
<bib>
  <book year="1995" key="o12" references="o24">
    <title> Database Systems </title>
    <author> <lastname> Date </lastname> </author>
    <publisher> Addison-Wesley </publisher>
  </book>
  <book year="1998" key="o24">
    <title> Foundation for Object/Relational Databases </title>
    <author> <lastname> Darwen </lastname> </author>
    <ISBN> <number> 01-23-456 </number> </ISBN>
  </book>
</bib>
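The key and references attributes turn the XML tree into a graph, but nothing in XML itself follows those links; an application has to resolve them. A minimal sketch of that step, using Python's standard xml.etree.ElementTree (the names REF_XML and resolve_references are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# The slide's example, with attributes separated by whitespace as XML requires.
REF_XML = """
<bib>
  <book year="1995" key="o12" references="o24">
    <title>Database Systems</title>
    <author><lastname>Date</lastname></author>
    <publisher>Addison-Wesley</publisher>
  </book>
  <book year="1998" key="o24">
    <title>Foundation for Object/Relational Databases</title>
    <author><lastname>Darwen</lastname></author>
    <ISBN><number>01-23-456</number></ISBN>
  </book>
</bib>
"""


def resolve_references(xml_text):
    """Map each referencing book's title to the title of the book it points at."""
    root = ET.fromstring(xml_text)
    # First pass: index every book by its key attribute.
    by_key = {book.get("key"): book for book in root.iter("book")}
    # Second pass: follow each references attribute through the index.
    resolved = {}
    for book in by_key.values():
        ref = book.get("references")
        if ref is not None and ref in by_key:
            source_title = book.findtext("title").strip()
            resolved[source_title] = by_key[ref].findtext("title").strip()
    return resolved


links = resolve_references(REF_XML)
print(links)
```

The sketch treats references as holding a single key, which matches the example; an attribute holding several keys would need to be split first.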

Semantics of Queries with Order
  select N
  from Bib.book X, X.reference Y, Y.reference Z, Y.author.lastname N, Z.year U
  where X.publisher = "Addison-Wesley"
  ordered-by U
The semantics of the answer is unclear!

XML-QL
  where <book>
          <publisher><name>Addison-Wesley</></>
          <title> $t</>
          <author> $a</>
        </> in "www.a.b.c/bib.xml"
  construct <result> </>
IBM, Oracle and Microsoft are jointly developing a query language for XML, based on various proposals.
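The where clause of the XML-QL query is a pattern that binds $t and $a for every book whose publisher name is Addison-Wesley. The sketch below emulates only that matching step in Python; it is not an XML-QL implementation. The sample document is hypothetical (shaped to fit the pattern, with a name element under publisher), the function name is an assumption, and no construct step is shown because the body of the construct clause is missing from the transcript.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample data shaped to fit the pattern in the where clause.
SAMPLE_XML = """
<bib>
  <book>
    <publisher><name>Addison-Wesley</name></publisher>
    <title>Database Systems</title>
    <author>Date</author>
  </book>
  <book>
    <publisher><name>Example Press</name></publisher>
    <title>Some Other Book</title>
    <author>Someone</author>
  </book>
</bib>
"""


def match_books(xml_text, publisher_name):
    """Return one {"t": ..., "a": ...} binding per (title, author) pair
    of each book whose publisher/name equals publisher_name."""
    root = ET.fromstring(xml_text)
    bindings = []
    for book in root.iter("book"):
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        # A book with several titles or authors yields one binding per pair.
        for title in book.findall("title"):
            for author in book.findall("author"):
                bindings.append(
                    {"t": (title.text or "").strip(), "a": (author.text or "").strip()}
                )
    return bindings


bindings = match_books(SAMPLE_XML, "Addison-Wesley")
print(bindings)
```

Each binding corresponds to one instantiation of the variables, and a construct clause would emit one result element per binding.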