Semi-Structured Data Models By Chris Bennett

Presentation transcript:

Semi-Structured Data Models By Chris Bennett

Semi-Structured Data  What is it? Data whose structure is not necessarily determined in advance (often implicit in the data); descriptive, not prescriptive; self-describing and flexible in structure.  Where does it come from? Data that cannot be (or simply is not) modeled naturally or usefully using a standard data model; merging multiple data sources; sparse user annotations; rapidly evolving schemas specific to given communities. Raw data is often semi-structured, and it is frequently a product of a rapidly evolving schema.  Examples: HTML, XML, BibTeX, integrated data sources, etc.

Semi-Structured Data  This is great – infinite flexibility!! Is there a catch? Always a tradeoff…  In this case, retrieval and query performance can suffer greatly compared to more structured data models

Semi-Structured Data  So we know what it is – how do we…  Model it? Directed labeled graphs  Query it? Many proposals, all include regular path expressions… Lorel, XML Query…  Store it? Big challenge (Haystack Model)

Semi-Structured Data Models  What do they do? Provide a common framework; in effect, they add some structure  Why? Semi-structured data is often irregular or missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known; to standardize information exchange; for data verification (both internal and external)  Examples: OEM, XML DTD, XML Schema…

OEM – Object Exchange Model  Developed at Stanford (mid 90s)  Precursor to today’s accepted semi-structured data formats (XML)  Each object is described by (label, type, value, object-ID)  Main feature – self-describing  Requires a good bit of human intervention, though
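The (label, type, value, object-ID) description above can be made concrete with a small sketch. The class name `OEMObject`, the helper functions, the OID counter, and the sample bibliography data are all hypothetical and chosen only for illustration; they are not taken from the Stanford implementation.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union

_next_oid = count(1)  # hypothetical OID generator, for illustration only


@dataclass
class OEMObject:
    label: str                      # self-describing name, used in place of a schema
    type: str                       # "string", "integer", "set", ...
    value: Union[str, int, list]    # atomic value, or a list of sub-objects
    oid: int                        # object identifier


def atom(label, type_, value):
    """Create an atomic object with a fresh OID."""
    return OEMObject(label, type_, value, next(_next_oid))


def complex_obj(label, children):
    """Create a complex object whose value is a set of sub-objects."""
    return OEMObject(label, "set", list(children), next(_next_oid))


book = complex_obj("book", [
    atom("author", "string", "Aho"),
    atom("topic", "string", "compilers"),
])
article = complex_obj("article", [
    atom("topic", "string", "databases"),
    atom("internal-call-no", "integer", 7),
])
biblio = complex_obj("biblio", [book, article])


def describe(obj, depth=0):
    """Return one indented text line per object in the nested structure."""
    pad = "  " * depth
    if obj.type == "set":
        lines = [f"{pad}<&{obj.oid}, {obj.label}, set>"]
        for child in obj.value:
            lines.extend(describe(child, depth + 1))
        return lines
    return [f"{pad}<&{obj.oid}, {obj.label}, {obj.type}, {obj.value!r}>"]


print("\n".join(describe(biblio)))
```

Note that the two entries under `biblio` carry different labels and different sub-objects. No schema has to be declared first, which is the self-describing property the slide refers to.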

Object-Oriented Model versus OEM  OEM is an information exchange model (does not specify object storage issues)  OEM is much simpler (supports object nesting…omits classes, methods, inheritance)  Uses labels in place of schema

Advantages of OEM  Simple model makes transforming and merging data simpler  Advanced features can be “emulated” (implies human intervention)  More suitable for heterogeneity  Hindsight: without some structure, extreme heterogeneity mandates more than a little human intervention

Components of OEM  Query language: OEM-QL – typical SELECT-FROM-WHERE  Translator: translates OEM-QL to a specific data source and back  Mediator: collects the work of the translators, then merges and/or combines the results to make OEM structures

OEM-QL SELECT – FROM – WHERE  Adaptation of an SQL-like language for OO models: SELECT fetch-expression FROM object WHERE condition  Expressions in the SELECT and WHERE clauses use the notion of a path that describes a traversal through an object using sub-object structure and labels

OEM-QL  SELECT biblio.?.topic FROM root WHERE biblio.?.internal-call-no  ? denotes a match to any label  Returns the topic of books for which an internal call number exists  The question mark lets the user say that the intermediate “node” in the path through the object can have any label
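A rough sketch of how such a wildcard path could be evaluated follows. It is not the OEM-QL engine. Objects are modeled as hypothetical (label, value) tuples, the sample data is invented, and the sketch assumes that the ? in the SELECT and WHERE clauses refers to the same intermediate object.

```python
def follow(obj, path):
    """Yield every sub-object reached from obj by following the label path.

    A step of "?" matches a sub-object with any label.
    """
    if not path:
        yield obj
        return
    step, rest = path[0], path[1:]
    _, value = obj
    if isinstance(value, list):          # only complex objects have sub-objects
        for child in value:
            if step == "?" or child[0] == step:
                yield from follow(child, rest)


# Invented sample data: (label, value) pairs, where value is atomic or a list.
root = ("root", [
    ("biblio", [
        ("book", [("topic", "compilers"), ("internal-call-no", 7)]),
        ("article", [("topic", "databases")]),
        ("book", [("topic", "graphs"), ("internal-call-no", 12)]),
    ]),
])


def query(root):
    """SELECT biblio.?.topic FROM root WHERE biblio.?.internal-call-no"""
    results = []
    for entry in follow(root, ["biblio", "?"]):
        # WHERE: keep the entry only if an internal-call-no sub-object exists
        if any(True for _ in follow(entry, ["internal-call-no"])):
            results.extend(value for _, value in follow(entry, ["topic"]))
    return results


print(query(root))
```

The article is skipped because it has no internal call number, even though its label was never named in the query.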

XML DTD – Document Type Definition  Let there be (a little) more structure…  DTDs define the legal building blocks of an XML document.  A DTD defines the document structure with a list of legal elements and/or attributes, and it can be declared inline or external to the XML document.

XML DTD Example <!DOCTYPE note [ ]>

XML DTD Advantages  An application can use a standard DTD to verify that data it receives from the outside world is valid.  It is flexible enough that you can nest elements and state how often each may occur: + means at least one occurrence; * means zero or more occurrences; ? means zero or one occurrence. Example:
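The occurrence indicators behave much like regular-expression quantifiers. The sketch below is not a DTD validator. It translates a simple content model into a regular expression over the sequence of child tag names. The function names and the `note` example are assumptions made for illustration, and `#PCDATA`, `EMPTY`, `ANY`, mixed content, and attributes are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Element names, or one of the content-model symbols ( ) | , ? * +
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[()|,?*+]")


def content_model_to_regex(model):
    """Turn a content model such as '(to, cc*, body+)' into a compiled regex."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue                      # a sequence is plain concatenation
        elif tok in "()|?*+":
            parts.append("(?:" if tok == "(" else tok)
        else:
            # each element name matches "name " (the name plus one space)
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the order and number of an element's children against the model."""
    tags = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(tags) is not None


MODEL = "(to, from, heading?, body+)"     # hypothetical content model

good = ET.fromstring(
    "<note><to>A</to><from>B</from><body>hi</body><body>again</body></note>")
bad = ET.fromstring("<note><to>A</to><body>hi</body></note>")

print(children_match(good, MODEL))   # heading omitted, body repeated: allowed
print(children_match(bad, MODEL))    # from is required but missing
```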

DTD Drawbacks  What about constraints? DTDs do not offer much help in constraining the value of a particular attribute or element (only the use of markup)  Automated processing of XML documents requires more rigorous and comprehensive facilities in this area.  Requirements are for constraints on how the component parts of an application fit together, the document structure, attributes, data-typing, and so on.

XML Schema  Well-formed is not enough! Let there be more structure!  XML Schema is an XML-based alternative (and ultimate successor) to DTDs  XML Schemas express shared vocabularies and allow machines to carry out rules made by people.  They provide a means for defining the structure, content and semantics of XML documents

Successor to DTDs  XML Schemas are: extensible to future additions; richer and more useful than DTDs; written in XML. They support data types and namespaces.

XML Schema Advantages  Better validation, restriction, and type conversion  Extensible – reuse or modify existing data types, reference multiple schemas

XML Schema Details  Defines…  Elements that can appear in a document  Attributes that can appear in a document  Which elements are child elements  Order of child elements  Number of child elements  Whether an element is empty or can include text  Data types for elements and attributes  Default and fixed values for elements and attributes
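Because an XML Schema is itself an XML document, an ordinary XML parser can read one. The sketch below only inspects a small schema and lists what it declares. It does not validate any document against it, and Python's standard library does not include an XML Schema validator. The `note` schema and the helper `declared` are invented for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"   # the XML Schema namespace URI

# Hypothetical schema showing child order, occurrence, data types and a default.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="heading" type="xs:string" minOccurs="0"/>
        <xs:element name="priority" type="xs:positiveInteger" default="1"/>
      </xs:sequence>
      <xs:attribute name="date" type="xs:date"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def declared(schema_text, kind):
    """List (name, type) for every xs:<kind> declaration, in document order."""
    root = ET.fromstring(schema_text)
    return [(node.get("name"), node.get("type"))
            for node in root.iter(f"{{{XS}}}{kind}")]


print(declared(SCHEMA, "element"))
print(declared(SCHEMA, "attribute"))
```

The `note` element has no `type` attribute because its type is given inline as a complex type, so its entry shows `None`.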

XML Schema Components  Primary components:  Simple type definitions, complex type definitions, attribute declarations, and element declarations  The secondary components, which must have names, are as follows:  Attribute group definitions, identity-constraint definitions, model group definitions, and notation declarations  Finally, the "helper" components provide small parts of other components; they are not independent of their context:  Annotations, model groups, particles, wildcards, attribute uses

XML Namespaces (W3C Documentation)  A namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names  XML namespaces differ from the "namespaces" conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set
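A short sketch of how a namespace URI keeps two elements with the same local name apart. The URIs and the `order` document are made up for illustration. The `{uri}local` form shown in the output is how Python's `xml.etree.ElementTree` writes qualified names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two different "item" elements, told apart by namespace.
DOC = """
<order xmlns="http://example.com/shop" xmlns:pay="http://example.com/payment">
  <item>book</item>
  <pay:item>credit card</pay:item>
</order>
"""

root = ET.fromstring(DOC)

# ElementTree expands every name to {namespace-URI}local-name.
tags = [child.tag for child in root]
print(tags)

# Lookups name the namespace through a prefix-to-URI mapping chosen by the caller.
ns = {"shop": "http://example.com/shop", "pay": "http://example.com/payment"}
print(root.find("shop:item", ns).text)
print(root.find("pay:item", ns).text)
```

The prefixes used in the lookup do not have to match those in the document. Only the URI identifies the namespace.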

XML Schema Example  W3C XML Schema Primer (examples)

Querying Semi-Structured Data  Keys: semi-structured data is modeled on directed graphs; the user cannot have full knowledge of the data structure, but we should exploit what structure we do know exists  Examples:  Lorel: developed at Stanford (1997) as part of the Lore (Lightweight Object Repository) project  XPath: W3C standard; language for addressing parts of an XML document

Lore System (Stanford link)  Successor to OEM  Fully functional DBMS for XML with: declarative query language, multiple indexing techniques, a cost-based query optimizer, multi-user support, logging, and recovery  Novel features include: DataGuides, management of external data, proximity search

Lore – Novel Features  DataGuides: structural summary of all paths in the database; used by the query optimizer to exploit known structure  Management of external data  Proximity search: ranks database objects based on their proximity to other objects; proximity is measured by distances in the graph linking the objects together
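The idea of a structural summary can be sketched as the set of distinct label paths that occur in the data. This is a simplification and not Lore's DataGuide algorithm: the data here is a tree of Python dicts and lists, whereas Lore summarizes an OEM graph. The restaurant guide data and the function name are invented for illustration.

```python
def label_paths(obj, prefix=()):
    """Collect every distinct label path that occurs in nested dict/list data."""
    paths = set()
    if isinstance(obj, dict):
        for label, child in obj.items():
            path = prefix + (label,)
            paths.add(path)
            paths |= label_paths(child, path)
    elif isinstance(obj, list):
        # several objects sharing one label contribute to the same paths
        for member in obj:
            paths |= label_paths(member, prefix)
    return paths


# Invented data: the two restaurants do not share the same structure.
data = {
    "Guide": {
        "restaurant": [
            {"name": "Saigon", "address": {"zipcode": "92310"}},
            {"name": "Chef Chu", "price": "cheap"},
        ]
    }
}

guide_paths = label_paths(data)
for path in sorted(guide_paths):
    print(".".join(path))
```

Each path appears once in the summary no matter how many objects share it, so a query planner or a user can look up which paths exist without scanning the data.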

Lorel – Lore Query Language  Based on OQL  Provides powerful path traversal operators  Makes extensive use of type coercion to help yield "intuitive" results for all queries over XML data Permits flexible form of declarative navigational access Particularly suited to when details of structure are not known

Lorel – Coercion Rules  Operand kinds (rows and columns): Value | Atomic Object | Set of Objects | Complex Object  Value row: coerce | Dereference | Existential with == | False  Atomic Object row: Existential with == | False  Set of Objects row: Existential with == on both sides | False  Complex Object row: Value =

Lorel Example  Find the names and zip codes of all “cheap” restaurants: select Guide.restaurant.name, Guide.restaurant(.address)?.zipcode where Guide.restaurant.% grep “cheap”  The ? after .address means the address is optional in the path expression  The % will match any subobject of restaurant  The comparison operator grep returns true if the string “cheap” appears anywhere in the subobject value
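The query above can be imitated over plain Python data to show what the optional step and the wildcard do. This is only a sketch, not Lorel. The guide data, zip codes, and helper names are invented, and a real Lorel result would be a collection of OEM objects, not Python tuples.

```python
def as_list(x):
    """Treat a single value and a list of values uniformly."""
    return x if isinstance(x, list) else [x]


def zipcodes(restaurant):
    """(.address)?.zipcode: a zipcode directly under restaurant or under its address."""
    found = []
    for holder in [restaurant] + as_list(restaurant.get("address", [])):
        if isinstance(holder, dict) and "zipcode" in holder:
            found.extend(as_list(holder["zipcode"]))
    return found


def mentions(restaurant, word):
    """restaurant.% grep word: some direct subobject has a string value containing word."""
    return any(isinstance(v, str) and word in v
               for value in restaurant.values() for v in as_list(value))


# Invented data with irregular structure.
guide = {"restaurant": [
    {"name": "Saigon", "price": "cheap",
     "address": {"street": "Main St", "zipcode": "92310"}},
    {"name": "Chef Chu", "category": "cheap eats", "zipcode": "94301"},
    {"name": "Le Gourmet", "price": "expensive", "address": "Palo Alto"},
]}


def cheap_restaurants(guide):
    return [(r["name"], zipcodes(r))
            for r in as_list(guide["restaurant"])
            if mentions(r, "cheap")]


print(cheap_restaurants(guide))
```

The first restaurant keeps its zip code under `address` and the second keeps it directly under `restaurant`. The optional path step reaches both. The word "cheap" is found under `price` in one case and `category` in the other, which is what the wildcard allows.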

Lorel – Another example  select X.name from John.name JN, John.child X, X.name XN where JN == XN  “Retrieve the children of John bearing his name”  == expects atomic values, so they are coerced  Rewritten: select X.name from John.child X where John.name == X.name
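One way to picture coercion in a comparison is sketched below. It is a loose approximation and not Lorel's actual rule set. Sets are compared existentially, atomic values of different types are compared numerically when possible, and, as a simplifying assumption, any comparison that involves a complex object (a dict here) is treated as false. The function names and the `john` data are hypothetical.

```python
def coerce_atoms(a, b):
    """Compare two atomic values, trying a numeric comparison when types differ."""
    if type(a) is type(b):
        return a == b
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return False


def lorel_eq(left, right):
    """Approximate == with coercion over atoms, sets (lists) and complex objects (dicts)."""
    if isinstance(left, dict) or isinstance(right, dict):
        return False      # simplification: complex objects are never equal here
    if isinstance(left, (list, set)):
        return any(lorel_eq(item, right) for item in left)    # existential
    if isinstance(right, (list, set)):
        return any(lorel_eq(left, item) for item in right)    # existential
    return coerce_atoms(left, right)


# Invented data: John has two name values and two children.
john = {
    "name": ["John", "Johnny"],
    "child": [{"name": "John"}, {"name": "Mary"}],
}

namesakes = [x["name"] for x in john["child"]
             if lorel_eq(john["name"], x["name"])]
print(namesakes)
```

Here `john["name"]` is a set of two values and each child's name is a single value, so the comparison succeeds if any one of John's names matches.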

Lorel – Constructing Results  S-F-W in Lorel has the same semantics as in SQL: results are a bag (multiset), or a set if ‘distinct’ is used  The result is always a collection of OEM objects (duplicate elimination by OID)  For each assignment of the variables in the from clause that passes the condition of the where clause, a value is generated according to the expressions in the select clause  Results could refer to database objects or could refer to new objects created by coercion

Lorel – Data Updates  Create and delete database names Delete is implicit when object becomes unreachable  Create a new atomic or complex object  Modify the value of an existing atomic or complex object  Bulk load an OEM database

Lorel – Updates cont’d…  Assigning names to objects: Name myFavorite := element (select Guide.Restaurant where Guide.Restaurant.name = “Saigon”)  Creating objects: new_oem (int, 5)  new_oem (complex, struct(a:{new_oem(int,5)}, b:{X,Y}))

XPath Features  XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax  Provides basic facilities for manipulation of strings, numbers and booleans  XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values

XPath – How It Works (W3C XPath Information)  XPath models an XML document as a tree of nodes: root nodes, element nodes, text nodes, attribute nodes, namespace nodes, processing instruction nodes, comment nodes  Evaluation occurs with respect to a “context”, which consists of: a node (the context node); a pair of non-zero positive integers (the context position and the context size); a set of variable bindings; a function library; the set of namespace declarations in scope for the expression

XPath – How It Works (cont’d)  Location path – selects a set of nodes relative to the context node  An expression that is a location path results in a node set  Examples of location paths  Includes functions for node sets, strings, numbers, etc…

XPath – Generic Example  Simple: employee[@secretary and @assistant] selects all the employee children of the context node that have both a secretary attribute and an assistant attribute  (W3C School Examples)
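The selection described above can be tried with Python's `xml.etree.ElementTree`, which implements only a subset of XPath. That subset has no `and` operator, so the sketch chains two attribute predicates to get the same effect. The `department` document and the names in it are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the department element plays the role of the context node.
DOC = """
<department>
  <employee name="Ann" secretary="Bob" assistant="Cy"/>
  <employee name="Dee" secretary="Eve"/>
  <employee name="Flo"/>
  <contractor name="Gus" secretary="Hal" assistant="Ida"/>
</department>
"""

department = ET.fromstring(DOC)

# employee children that have both a secretary and an assistant attribute
matches = department.findall("employee[@secretary][@assistant]")
names = [e.get("name") for e in matches]
print(names)

# Position predicates depend on the context position and context size.
first = department.find("employee[1]").get("name")
last = department.find("employee[last()]").get("name")
print(first, last)
```

The contractor has both attributes but is not an `employee` child, so it is not selected. The second employee lacks an assistant and is excluded as well.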