

Semi-structured Data
In many applications, data does not have a rigid, predefined schema:
– e.g., structured files, scientific data, XML.
Managing such data requires rethinking the design of the components of a DBMS:
– data model, query language, optimizer, storage system.
The emergence of XML data underscores the importance of semi-structured data.

Issues: Outline
– Semi-formal definition and examples.
– Modeling semi-structured data.
– Querying semi-structured data.
– The XML challenge.

Main Characteristics
Schema is not what it used to be:
– not given in advance (often implicit in the data),
– descriptive, not prescriptive,
– partial,
– rapidly evolving,
– may be large (compared to the size of the data).
Types are not what they used to be:
– objects and attributes are not strongly typed,
– objects in the same collection have different representations.

Example: XML
[XML markup lost; surviving element text, in document order:]
– Database Systems; Date; Addison-Wesley
– Foundation for Object/Relational Databases; Date; Darwen

Example: Data Integration
Mediator: uniform access to multiple data sources.
[Figure: a user accesses the mediator, which sits over an RDBMS, an OODBMS, a structured file, and a legacy system.]
Each source represents data differently: different data models, different schemas.

Physical versus Logical Structure
In some cases, data can be modeled in relational or object-oriented models, but extracting the tuples is hard:
– extracting data from HTML: [Ashish and Knoblock, 97], [Hammer et al., 97], [Kushmerick and Weld, 97].
Semi-structured data: when the data cannot be modeled naturally or usefully using a standard data model.

Managing Semi-structured Data
– How do we model it? (Directed labeled graphs.)
– How do we query it? (Many proposals, all of which include regular path expressions.)
– How do we optimize queries? (Beginning to understand.)
– How do we store the data? (Looking for patterns.)
– Integrity constraints, views, updates, …

Modeling Semi-Structured Data
[Figure: an example graph. Object nodes: b01, a1, a2. Atomic values: "DBMS", 1997, "Ullman", "Widom", "Jeff". Arc labels: author, title, year, LastName, FirstName, url.]
Labeled directed graphs (from OEM [TSIMMIS]): nodes are objects; labels on the arcs are attribute names.
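A minimal Python sketch of such a labeled directed graph, stored as adjacency lists of (label, target) pairs. The node names and values are taken from the slide, but the exact wiring of the arcs is an assumption (the figure itself did not survive), and the class name LabeledGraph is hypothetical.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose arcs carry attribute names as labels."""

    def __init__(self):
        # source node -> list of (label, target) pairs, in insertion order
        self.edges = defaultdict(list)

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def children(self, source, label):
        """Targets reached from `source` over arcs labeled `label`."""
        return [t for (l, t) in self.edges.get(source, []) if l == label]


def build_example():
    g = LabeledGraph()
    # Assumed wiring: b01 is a book object, a1 and a2 are its author objects.
    g.add_edge("b01", "title", "DBMS")
    g.add_edge("b01", "year", 1997)
    g.add_edge("b01", "author", "a1")
    g.add_edge("b01", "author", "a2")
    g.add_edge("a1", "LastName", "Ullman")
    g.add_edge("a1", "FirstName", "Jeff")
    g.add_edge("a2", "LastName", "Widom")
    return g


g = build_example()
# Navigate b01 -> author -> LastName.
last_names = [
    name
    for author in g.children("b01", "author")
    for name in g.children(author, "LastName")
]
print(last_names)
```

Note that a2 has no FirstName arc: objects in the same collection need not share a representation, and nothing in the structure forces them to.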

Querying Semi-structured Data
Important features:
– ability to navigate the data (regular path expressions),
– querying the attribute names (arc variables),
– creating new structures,
– type coercion.
Languages: Lorel (Stanford), UnQL (U. Penn), StruQL (AT&T, INRIA, UW).

The StruQL Query Language
A StruQL query is a function from a set of input graphs to an output graph.
A StruQL expression contains two parts: a query component and a restructuring component.
Formally:
  INPUT   graph names
  WHERE   conjunction of regular path expression atoms
  CREATE  name the nodes in the output graph using Skolem functions
  LINK    specify the links in the resulting graph
  OUTPUT  resulting-graph name

Example: Reversing a graph
  WHERE  x -> * -> y, y -> l -> z
  CREATE New(x), New(y), New(z)
  LINK   New(z) -> l -> New(y)
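One way to read the reversing query operationally is sketched below in Python. This is an illustration of the semantics, not StruQL's actual evaluation strategy. It assumes the graph is given as a list of (source, label, target) triples, that x ranges over all nodes, and that `*` matches any path of zero or more arcs; the helper names reachable and reverse_graph are hypothetical.

```python
def reachable(edges, start):
    """Nodes reachable from `start` over zero or more arcs (the `*` path)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for (src, _label, dst) in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def reverse_graph(edges):
    nodes = {n for (s, _l, d) in edges for n in (s, d)}
    skolem = {}

    def New(n):
        # Skolem function: the same input node always yields the same output node.
        return skolem.setdefault(n, f"New({n})")

    out = set()
    for x in nodes:                       # WHERE x -> * -> y
        for y in reachable(edges, x):
            for (src, l, z) in edges:     # y -> l -> z
                if src == y:
                    New(x), New(y), New(z)          # CREATE
                    out.add((New(z), l, New(y)))    # LINK
    return out


edges = [("r", "a", "n1"), ("n1", "b", "n2"), ("r", "c", "n2")]
reversed_edges = reverse_graph(edges)
print(sorted(reversed_edges))
```

Because the Skolem function is deterministic, the many bindings of x that reach the same arc all produce the same output link, so the result is a set of reversed arcs with no duplicates.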

Example Query: StruQL
  WHERE  Articles(art), art -> l -> value,
         l in { "Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite" },
         art -> * -> art1, Articles(art1)
  CREATE ArticlePage(art), ArticlePage(art1)
  LINK   ArticlePage(art) -> l -> value,
         ArticlePage(art) -> "related article" -> ArticlePage(art1)

StruQL Details
Regular path expressions are constructed by a grammar:
  R <- "a" | ε | R1.R2 | R1|R2 | R1* | L | _
Atoms in the WHERE clause are of the form X -> R -> Y or C(X).
The LINK clause includes atoms of the form:
  LINK f(X) -> "new link" -> g(X)
or
  LINK f(X) -> L -> g(X)
Queries can be nested, inheriting the WHERE clauses of their outer blocks.
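To make the path-expression grammar concrete, here is a small Python evaluator over a graph of (source, label, target) triples. It is a sketch under assumptions, not StruQL's implementation: expressions are encoded as nested tuples (a hypothetical encoding), `_` is taken to match any single label, and arc variables (L) are left out.

```python
def step(edges, nodes, matches):
    """Follow one arc from any node in `nodes` whose label satisfies `matches`."""
    return {d for (s, l, d) in edges if s in nodes and matches(l)}


def eval_path(edges, nodes, r):
    """Set of nodes reachable from `nodes` along a path matching expression `r`."""
    kind = r[0]
    if kind == "label":                      # "a"
        return step(edges, nodes, lambda l: l == r[1])
    if kind == "any":                        # _
        return step(edges, nodes, lambda l: True)
    if kind == "eps":                        # empty path
        return set(nodes)
    if kind == "seq":                        # R1.R2
        return eval_path(edges, eval_path(edges, nodes, r[1]), r[2])
    if kind == "alt":                        # R1|R2
        return eval_path(edges, nodes, r[1]) | eval_path(edges, nodes, r[2])
    if kind == "star":                       # R1*
        result, frontier = set(nodes), set(nodes)
        while frontier:                      # iterate until no new nodes appear
            frontier = eval_path(edges, frontier, r[1]) - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


edges = [
    ("root", "Article", "a1"),
    ("a1", "Related", "a2"),
    ("a2", "Related", "a3"),
    ("a1", "Title", "t1"),
]
# "Article".("Related")*
related = ("seq", ("label", "Article"), ("star", ("label", "Related")))
related_nodes = eval_path(edges, {"root"}, related)
# _*  (everything reachable from the root)
all_nodes = eval_path(edges, {"root"}, ("star", ("any",)))
print(sorted(related_nodes), sorted(all_nodes))
```

An atom X -> R -> Y then holds exactly when Y is in eval_path(edges, {X}, R). The star case stops once an iteration adds no new nodes, so cycles in the data do not cause non-termination.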

The Test of XML
– XML (Extensible Markup Language) is emerging as a standard for exchanging data on the Web.
– It enables separation of content (XML) and presentation (XSL).
– DTDs (Document Type Definitions) provide partial schemas for XML documents.
– Applications will need to manage XML data.
– Can the database community and semi-structured data be of any help?

Semi-structured Data vs. XML
– Attributes ---> tags
– Objects ---> elements
– Atomic values ---> CDATA (characters)
– Order? Assumed in XML.
– XML attributes (fixable).
– References in XML.
Real problem: XML comes with no data model!
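The correspondence above can be sketched in Python with the standard xml.etree.ElementTree module: tags become arc labels, elements with content become object nodes, and character data becomes atomic values. Treating XML attributes as ordinary arcs is one possible way to "fix" them, chosen here as an assumption. The tag names, the id attribute, the "&n" node identifiers, and the function name xml_to_edges are all hypothetical; mixed content is ignored.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_edges(xml_text):
    """Convert an XML document to a list of (source, label, target) arcs."""
    ids = count(1)
    edges = []  # a list, so document order is preserved

    def add(parent, elem):
        if len(elem) == 0 and not elem.attrib:
            # Leaf element: tag -> arc label, character data -> atomic value.
            edges.append((parent, elem.tag, (elem.text or "").strip()))
            return
        node = f"&{next(ids)}"               # element -> object node
        edges.append((parent, elem.tag, node))
        for name, value in elem.attrib.items():
            edges.append((node, name, value))  # XML attribute -> ordinary arc
        for child in elem:
            add(node, child)

    add("&root", ET.fromstring(xml_text))
    return edges


doc = (
    '<bib><book id="b1">'
    "<title>Database Systems</title>"
    "<author><lastname>Date</lastname></author>"
    "<publisher>Addison-Wesley</publisher>"
    "</book></bib>"
)
edges = xml_to_edges(doc)
for edge in edges:
    print(edge)
```

The sketch makes two of the slide's points visible: order survives only because the arcs are kept in a list, and references between elements are not handled at all, since plain XML gives no rule for interpreting them.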

References and Attributes
[XML markup lost; surviving element text, in document order:]
– Database Systems; Date; Addison-Wesley
– Foundation for Object/Relational Databases; Date; Darwen

Semantics of Queries with Order
  select N
  from Bib.book X, X.reference Y, Y.reference Z, Y.author.lastname N, Z.year U
  where X.publisher = "Addison-Wesley"
  ordered-by U
The semantics of the answer is unclear!

XML-QL
[Query markup lost; surviving tokens:]
  where … Addison-Wesley … $t … $a … in "…"
  construct … $a … $t …
IBM, Oracle and Microsoft are jointly developing a query language for XML, based on various proposals.