The Mediation of Information using XML (MIX) project. By: Amir Atauna & Michael Brautbar.

What is a Mediator and Why is it Needed?
There is a huge quantity of information on the Web. Users want to find information on the Web that is related to their problem. Problem: the information is distributed across many sources, and each source provides a different interface and exports its data in a different format.

Mediator systems assist users by providing integrated views of the data they are interested in. Example: a Web-shopping mediator gives the Web value-shopper a view that lists the lowest price for each product. The goal of MIX is to facilitate the development of such mediators.

Is the mediator concept new?
No. The TSIMMIS mediator uses the semistructured model OEM (Object Exchange Model). Wrappers export the source data translated to OEM. The mediator exports an integrated view of the wrapper data, based on a view definition provided by the administrator.

The view definition is expressed in the Mediator Specification Language (MSL). At runtime the mediator receives queries that refer to the view objects and are also expressed in MSL. First, the incoming query is combined with the view definition into a query that refers directly to source data. Then the optimizer finds a plan to execute the latter query by sending queries to the wrappers and combining their results in the mediator.

The wrappers translate the queries they receive into queries understood by the sources. MSL specifications can be very “loose” about the amount of information they give on the structures they provide. This is a valuable feature when working with dynamic semistructured sources. There are two weak points:
- The user does not know the structure of the underlying data, which impedes efforts to formulate reasonable queries.
- The mediator may have incomplete or no information about the metadata and structure of each source, which leads to a heavy loss of performance.

MIX solves these problems with DTDs.

The Philosophy of MIX: The Web as a Distributed Database
The developers of this system strongly believe that the Web will emerge as a distributed database and that XML (or some extension/modification of XML) will be the data model of this huge database. The MIX mediator views XML as a database model and uses the mediator concept as known in the DB area.

Sources will export an XML view of their data along with semantic descriptions of the content (source DTDs) and descriptions of the interfaces (XML queries) that may be used for accessing the data. Users and applications will then be able to query these view documents using some XML query language. The MIX mediator uses the source DTDs to assist the user in query formulation and to help the query processors run queries more efficiently.

MIX evaluates queries lazily (on demand), i.e., XML queries (expressed in XMAS) are unfolded and rewritten at runtime. In the other approach, the eager one (warehousing), data integration occurs in a separate materialization step, before the actual user queries.
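The difference between the two approaches can be illustrated with a loose analogy in Python. This is only a sketch under stated assumptions: `make_source`, `eager_view`, `lazy_view` and the sample records are hypothetical names and data, and the sketch models on-demand access to sources rather than the actual XMAS query unfolding done by MIX.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, int]
calls: List[str] = []  # records which (hypothetical) sources were contacted


def make_source(name: str, rows: List[Record]) -> Callable[[], Iterable[Record]]:
    """Build a fake source that logs every access."""
    def fetch() -> Iterable[Record]:
        calls.append(name)
        return list(rows)
    return fetch


def eager_view(sources) -> List[Record]:
    """Eager (warehousing): materialize all source data before any query."""
    warehouse: List[Record] = []
    for fetch in sources:
        warehouse.extend(fetch())
    return warehouse


def lazy_view(sources) -> Iterator[Record]:
    """Lazy (on demand): contact a source only when more results are requested."""
    for fetch in sources:
        yield from fetch()


sources = [make_source("a", [{"id": 1}, {"id": 2}]), make_source("b", [{"id": 3}])]
```

Asking the lazy view for a single result touches only the first source, while the eager view contacts every source before answering anything.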

Conventional data repositories are not expected to be converted to XML. Wrapper technologies allow us to logically view an information source (which may be a relational database, a collection of HTML pages, or even a legacy information system) as a large XML source. The wrappers are able to translate XMAS queries into queries or commands that the underlying source understands. They are also able to translate the result of the source into XML.
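The result-translation half of a wrapper can be sketched as follows. This is an illustration only: `rows_to_xml`, the element names (`homes`, `row`, `addr`, `price`) and the sample row are assumptions, not part of MIX, and the translation of XMAS queries into source queries is not shown.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table_name, rows):
    """Expose relational-style rows (a list of dicts) as one XML element tree."""
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, "row")
        for column, value in row.items():
            # Each column becomes a subelement holding the value as text.
            ET.SubElement(record, column).text = str(value)
    return root


homes = [{"addr": "12 Elm St", "price": 300000}]  # made-up sample data
doc = rows_to_xml("homes", homes)
xml_text = ET.tostring(doc, encoding="unicode")
```

Here `xml_text` holds the serialized view that a mediator could consume in place of the original rows.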

Creating Mediated Views Using the MIX Mediator and Querying Them with BBQ
The XML documents have to be integrated. One goal of MIX is to develop integrated views quickly. For this, the developers use XMAS as the view definition language.

The BBQ (Blended Browsing and Querying) user interface enables users to formulate XMAS queries using a GUI that is reminiscent of query-by-example interfaces in relational databases.

The MIX Architecture

The graphical user interface BBQ allows the construction of queries. To accomplish the integration, the MIX mediator comprises several modules.
- Its main inputs are XMAS queries generated by BBQ and the mediator view definition (also in XMAS) for the integrated view.
- The resolution module resolves the user query with the mediator view definition, resulting in a set of unfolded XML queries that refer to the wrapper views.

- The simplification module is used to further simplify the XML queries based on the underlying XML DTDs.
- The DTD inference module can be used to automatically derive view DTDs from source DTDs and queries, supporting the integration task of the mediation engineer (this is done off-line).
- The translation module maps the simplified queries into the XMAS algebra.

- The optimization module can be used to further optimize the XMAS queries.
- The execution engine issues XMAS queries against the wrappers and returns the requested XML data to the user, after integrating the retrieved data according to the mediator view.
The wrappers are used to export data in a uniform format to the mediator.

The XMAS Language
The sources of the MIX mediator are modeled as valid XML documents. We need a way to formulate queries that can relate data in multiple XML documents. An XML document may be as tightly structured as a relational database or have no structure at all.

The XMAS Language Cont
So we need a query language that is as strong as relational algebra. Desirable features of the language:
- Simple formulation of queries.
- Queries logically describe what we want to say.

Solution: XMAS
XMAS stands for XML Matching And Structuring language. It is a declarative, high-level language. It builds upon ideas of languages like XML-QL and MSL.

General Structure of an XMAS Query
CONSTRUCT head
WHERE body_1 IN source_1
(AND | OR | NOT) body_2 IN source_2
(AND | OR | NOT) body_3 IN source_3
...
(AND | OR | NOT) body_n IN source_n
(AND | OR) predicate

Body (the “where” clause): specifies the data to be extracted from the XML sources.
Head (the “construct” clause): describes how the extracted data is arranged into a new answer XML document. In this part we may use the “collection” operator and the “ordering” operator (explained later on).
(The head and body roughly resemble the SELECT and WHERE clauses in SQL.)

Predicate: defines conditions on the variables occurring in the sources.
Let's look at an example.

For example, we can have the following XML doc for that DTD:
alpine rural/town 4783 …

Query Example
Suppose we want to retrieve the names of all “big” neighborhoods, say those whose population is greater than 30000. In XMAS we can write the following query:

CONSTRUCT $N {$N}
WHERE $N $P IN "
AND $P > 30000

How Does It Work
Let's look at the body of the query above. This tree pattern mimics the tree structure of the input XML document. The variables $N and $P are used to “get a hold” of the data at the corresponding locations in the tree structure representing the input XML doc. In other words, the tree pattern specifies that the root element of the XML doc is of type big_neighborhoods.

Within big_neighborhoods there must be some big_neighborhood subelement, which itself contains name and population subelements. In this way, the tree pattern specifies a list of pairs of variable bindings for $N and $P. From this list we want to select only those that satisfy the condition $P > 30000. To summarize, the body defines a list [(n1; p1); ...; (nk; pk)] of all variable bindings for ($N, $P) that match (or satisfy) the body.

The “head” consists of an XML tree pattern that contains some or all of the variables of the body. In the example above, the head defines a root element big_neighborhoods with a big_neighborhood subelement, which in turn has a name subelement. The latter is used to hold the bindings for $N that have been obtained through the body. Using {$N} expresses that we want only one big_neighborhoods element that has a number of big_neighborhood subelements (one for each name $N obtained from the body).
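The body/predicate/head steps described above can be mimicked in Python as a rough sketch. It is not how MIX evaluates XMAS: `match_body`, `construct_head`, the sample document and its values are assumptions made for illustration, and the element names simply follow the description in the text.

```python
import xml.etree.ElementTree as ET

# Made-up input document; element names follow the tree pattern described above.
SOURCE = """
<big_neighborhoods>
  <big_neighborhood><name>A</name><population>45000</population></big_neighborhood>
  <big_neighborhood><name>B</name><population>4783</population></big_neighborhood>
  <big_neighborhood><name>C</name><population>31000</population></big_neighborhood>
</big_neighborhoods>
"""


def match_body(root):
    """Body: produce the list of ($N, $P) bindings matched by the tree pattern."""
    bindings = []
    for nb in root.findall("big_neighborhood"):
        name, pop = nb.find("name"), nb.find("population")
        if name is not None and pop is not None:
            bindings.append((name.text, int(pop.text)))
    return bindings


def construct_head(bindings):
    """Head: one root element, one big_neighborhood per distinct binding of $N."""
    out = ET.Element("big_neighborhoods")
    seen = []
    for n, _p in bindings:
        if n not in seen:  # plays the role of the {$N} collection label
            seen.append(n)
            ET.SubElement(ET.SubElement(out, "big_neighborhood"), "name").text = n
    return out


root = ET.fromstring(SOURCE)
# Predicate: keep only bindings with $P > 30000.
bindings = [(n, p) for n, p in match_body(root) if p > 30000]
answer = construct_head(bindings)
```

The answer tree then contains one name element for each neighborhood that passed the predicate, in the order the bindings were found.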

The Collection Operator
It is used to collect all bindings of a subelement so that they can be placed under the parent element. It comes in two kinds: implicit and explicit. The usage for the explicit version is {$N}, where $N is a free variable at that level. For an example of the explicit usage, consider the previous example.

The Collection Operator Cont
We create exactly one big_neighborhood element for each binding n_1, ..., n_k of $N (thereby binding the value of $N within the big_neighborhood element to one n_i), and all these elements are collected as subelements of the parent element.

The Collection Operator Cont
For elements in the head that do not have an explicit collection label, an implicit collection label may be used. The implicit collection variables of an element E are those that are free in E. The usage for the explicit version is [ ... ], where ‘[’ comes before the beginning of the section and ‘]’ comes at its end.

The Collection Operator Cont
For example, consider the following code: [ $A [ $B [ $C ] ] ]. The above corresponds to a nested loop structure.
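The nested-loop reading of [ $A [ $B [ $C ] ] ] can be sketched in Python. The function `nest` and the sample bindings are hypothetical; the sketch only shows how flat ($A, $B, $C) bindings group into one level of nesting per variable.

```python
def nest(bindings):
    """Group flat ($A, $B, $C) bindings into {a: {b: [c, ...]}}."""
    result = {}
    # Outer loop: one group per distinct $A, in order of first appearance.
    for a in dict.fromkeys(t[0] for t in bindings):
        result[a] = {}
        # Middle loop: the distinct $B values seen together with this $A.
        for b in dict.fromkeys(t[1] for t in bindings if t[0] == a):
            # Inner loop: every $C seen with this ($A, $B) pair.
            result[a][b] = [t[2] for t in bindings if t[0] == a and t[1] == b]
    return result


bindings = [("x", 1, "p"), ("x", 1, "q"), ("x", 2, "r"), ("y", 1, "s")]
nested = nest(bindings)
```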

The Ordering Operator
The bindings of all subelements may be sorted in a given order. If no order is specified, a default order is used (based on the order in which the data was found). Example: consider the next DTD and the query given after it.

And the query is:
CONSTRUCT { $H} order by $H.Price
WHERE $H IN "
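The effect of the ordering operator can be approximated in Python as follows. The DTD is not shown in the transcript, so the element names (`homes`, `home`, `addr`, `Price`), the sample values and the helper `collect_homes` are all assumptions.

```python
import xml.etree.ElementTree as ET

# Made-up document; only the Price element name comes from the query above.
DOC = (
    "<homes>"
    "<home><addr>a</addr><Price>350000</Price></home>"
    "<home><addr>b</addr><Price>250000</Price></home>"
    "</homes>"
)


def collect_homes(root, order_by=None):
    """Collect all home bindings; sort them only if an order is requested."""
    homes = root.findall("home")  # default order: the order in which data is found
    if order_by is not None:
        homes = sorted(homes, key=lambda h: float(h.findtext(order_by)))
    return homes


root = ET.fromstring(DOC)
default_order = [h.findtext("addr") for h in collect_homes(root)]
by_price = [h.findtext("addr") for h in collect_homes(root, order_by="Price")]
```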

So, Mmm, Is XMAS So Powerful?
Home buyer's scenario: a user wants to buy a home and wants to use information available on the Web to guide this decision. A possible query that the user may issue is: "Find all houses with 3 bedrooms, 2 baths, and an interior area of at least 1600 sq. ft., priced between $250k and $350k, in regions where the school rating is at least 70 (out of 100) and the crime rate is no more than 15 incidents per year. Group the answers by region and order them by price. For each home also show the nearby schools."

Strong As Relational Algebra
As mentioned before, one of the features of XMAS is that it is as expressive as relational algebra. Some examples:
Selection: a selection on a variable is made in the ‘predicate’ part of the query.
Projection: write in the head just those variables that you want to project.

A natural join can be obtained by equating variables in the body. A Cartesian product may also be expressed easily.

CONSTRUCT $N $S {$N, $S}
WHERE $N: $Z IN "
AND $S: $Z1 IN "
AND $Z=$Z1
A Cartesian product is easily expressed by removing the condition $Z=$Z1.
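The relationship between the join and the Cartesian product can be made concrete with a small Python sketch over lists of variable bindings. The data values and the helper names `cartesian` and `join` are invented for illustration; they are not part of XMAS.

```python
from itertools import product

# Made-up bindings: ($N, $Z) from one source and ($S, $Z1) from another.
left = [("N1", "z1"), ("N2", "z2")]
right = [("S1", "z1"), ("S2", "z2"), ("S3", "z2")]


def cartesian(left, right):
    """All ($N, $S) combinations: the query without the condition $Z=$Z1."""
    return [(n, s) for (n, _z), (s, _z1) in product(left, right)]


def join(left, right):
    """Only the ($N, $S) pairs whose $Z and $Z1 agree: the condition $Z=$Z1."""
    return [(n, s) for (n, z), (s, z1) in product(left, right) if z == z1]


pairs = join(left, right)
all_pairs = cartesian(left, right)
```

Dropping the equality test is the only difference between the two functions, which mirrors removing $Z=$Z1 from the query.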

Merry XMAS

DTD Inference

The MIX Mediator and the Advantages of Living with DTD-Provided Structure
The MIX mediator employs DTDs to assist the user in information discovery and query formulation, and to allow the query processor to derive more efficient plans. The view DTD inference module derives the view DTD given the source DTDs and the view.

The view DTD is passed to the DTD-based query interface to enable query formulation. A DTD inference algorithm was developed for a limited class of XMAS queries/views:
- pick-element XMAS queries, i.e., queries whose SELECT clause has a single variable, called the pick-variable, that binds to elements, and whose WHERE clause consists of a single condition that is applied to only one source.

It is easy to compute a loose DTD for a view, but it is critical for the query interface and the query processor to get one that describes the view as precisely as possible.

“Precise” view DTDs may also have applications other than ours; for example, they may be used in a toolkit for generating XSL style sheets for presentation of the view. A criterion for judging the precision of a view DTD is tightness. A DTD d1 is tighter than a DTD d2 if every document described by d1 is also described by d2. The tightness criterion can be a benchmark for other powerful view definition languages and view inference algorithms.

So the view DTD inference algorithm attempts to derive the tightest DTD that contains all the possible documents that may appear as the content of the view. Unfortunately, even the tightest view DTD describes structures that can never appear as the view's content. For this reason the view DTD inference algorithm derives an extended form of DTDs, known as specialized DTDs, that typically does not have non-tightness problems.

Model and Query Language Framework
The focus is on XML documents that meet the following requirements:
- The XML is always valid, i.e., it has a DTD.
- There are no attributes other than the ID attribute, and all elements have an ID attribute.
- There are no empty elements, but elements with empty content are allowed.
- Mixed-content elements, i.e., elements whose content mixes strings with elements, are not allowed.

Definition (element): An element e is a triple consisting of a name, name(e), a unique ID, and content, content(e), which is either a sequence of elements or a PCDATA value.
Definition: A DTD is a set { <n, type(n)> | n is in N }, where N is the set of names and type(n) is either a regular expression over N or PCDATA. L(r) is the regular language described by r.

Definition: An element e satisfies a DTD D, written e |= D, if the following conditions hold:
- name(e) is in N, where N is the set of element names.
- If content(e) = e_1, e_2, ..., e_m, then the sequence name(e_1) ... name(e_m) is in L(type(name(e))) and e_i |= D for 1 <= i <= m. Otherwise, if content(e) is a string, then type(name(e)) = PCDATA.
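
The satisfaction check above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The `Element` class, the `PCDATA` marker, and the choice to write each type as a Python regular expression over space-terminated names are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union

PCDATA = "PCDATA"  # assumed marker for string-valued types


@dataclass
class Element:
    name: str
    id: str
    content: Union[str, List["Element"]]  # a PCDATA string or a sequence of elements


def satisfies(e: Element, dtd: Dict[str, str]) -> bool:
    """Return True if e |= dtd, following the two conditions of the definition."""
    if e.name not in dtd:                 # name(e) must be in N
        return False
    t = dtd[e.name]
    if isinstance(e.content, str):        # string content requires type PCDATA
        return t == PCDATA
    if t == PCDATA:                       # element content cannot have type PCDATA
        return False
    # Encode the child names as "name1 name2 ... " and match the whole word
    # against the type, written as a Python regex over space-terminated names.
    word = "".join(child.name + " " for child in e.content)
    if re.fullmatch(t, word) is None:
        return False
    return all(satisfies(child, dtd) for child in e.content)
```

With this encoding, a type such as `(professor |grad )*` stands for the DTD content model `(professor | grad)*`.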

Soundness & Tightness
Definition: A view DTD D_V is sound if, given source DTDs D_1, D_2, ..., D_n and a view definition V, for every tuple (d_1, d_2, ..., d_n) of n documents such that d_1 |= D_1, d_2 |= D_2, ..., d_n |= D_n, the view document V(d_1, d_2, ..., d_n) |= D_V.
Definition: A DTD D is tighter than a DTD D' if every document satisfying D satisfies D'. A type (n, r) is tighter than a type (n, r') if L(r) is contained in L(r').

Definition: A DTD D_V is a tightest view DTD for given source DTDs D_1, D_2, ..., D_n and a view definition V if D_V is sound and there is no other sound view DTD D_V' that is tighter than D_V.
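
For reference, the soundness and tightness definitions can be written symbolically. The symbols \preceq (for "tighter than") and \equiv (for "satisfied by exactly the same documents") are notation assumed here, not taken from the slides. Requiring the competing DTD to be sound and not equivalent to D_V is one reading of the definition.

```latex
\begin{align*}
D_V \text{ is sound} \;&\iff\; \forall (d_1,\dots,d_n):\;
   \Big(\bigwedge_{i=1}^{n} d_i \models D_i\Big) \Rightarrow V(d_1,\dots,d_n) \models D_V \\
D \preceq D' \;&\iff\; \{\, d \mid d \models D \,\} \subseteq \{\, d \mid d \models D' \,\} \\
D_V \text{ is tightest} \;&\iff\; D_V \text{ is sound} \;\wedge\;
   \nexists\, D_V' \text{ sound}:\; D_V' \preceq D_V \;\wedge\; D_V' \not\equiv D_V
\end{align*}
```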

Structural Tightness
In many practical cases even the tightest view DTDs describe view document structures that cannot be produced by the view. This information loss phenomenon is formalized by introducing the structural tightness property of view DTDs.

Definition: A structural class of documents is a set of documents such that for every two documents d_1, d_2 in the class there is a mapping that maps:
- every string of d_1 onto a string of d_2 and vice versa,
- every ID of d_1 onto an ID of d_2 and vice versa,
such that if the mappings are applied to d_1, d_1 becomes identical to d_2, and vice versa.
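
One way to test whether two documents fall in the same structural class is to walk both trees in parallel while building the string and ID mappings. The sketch below is an illustration under assumptions: the `Element` class is hypothetical, and the mappings are read as one-to-one, which is one interpretation of "and vice versa".

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Element:
    name: str
    id: str
    content: Union[str, List["Element"]]


def _bind(forward: Dict[str, str], backward: Dict[str, str], a: str, b: str) -> bool:
    """Record a -> b and b -> a; fail if either conflicts with an earlier pair."""
    if forward.setdefault(a, b) != b:
        return False
    return backward.setdefault(b, a) == a


def same_structural_class(d1: Element, d2: Element) -> bool:
    """True if renaming strings and IDs one-to-one turns d1 into d2."""
    ids: Dict[str, str] = {}
    ids_back: Dict[str, str] = {}
    strings: Dict[str, str] = {}
    strings_back: Dict[str, str] = {}

    def walk(x: Element, y: Element) -> bool:
        # Element names are not renamed, so they must agree exactly.
        if x.name != y.name or not _bind(ids, ids_back, x.id, y.id):
            return False
        x_is_text = isinstance(x.content, str)
        y_is_text = isinstance(y.content, str)
        if x_is_text or y_is_text:
            return (x_is_text and y_is_text
                    and _bind(strings, strings_back, x.content, y.content))
        return (len(x.content) == len(y.content)
                and all(walk(a, b) for a, b in zip(x.content, y.content)))

    return walk(d1, d2)
```

Two documents that differ only in their string values and IDs pass the check. Documents that differ in element names, nesting, or number of children do not.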

Definition: A structural class of documents satisfies a DTD D if the documents of the class satisfy D.
Definition: Given a set of source DTDs D_1, …, D_n and a view V, a DTD D_V is structurally tight if:
- it is the tightest DTD of the view given the source DTDs, and
- for every structural class S that satisfies D_V there is a view document I in S that satisfies D_V, and there are source documents I_1, …, I_n satisfying D_1, …, D_n such that I = V(I_1, …, I_n).

Specialized DTDs
Specialized DTDs resolve the inherent non-tightness problems of DTDs.
Query: Find all the “professor” and “grad” sub-elements of “department” with one journal publication.

How are specialized DTDs computed?
The DTD tightening algorithm recursively “tightens” each type of the initial DTD by means of the type refinement algorithm.
Definition: The type refinement refine(r, n) of a regular expression r given a name n is the regular expression r' that describes all strings of L(r) that contain at least one instance of n.
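
The refinement definition can be illustrated with a standard structural construction over regular expressions. This is a sketch of one way to obtain such an r', not necessarily the algorithm used in MIX. The small AST classes and the `to_python_regex` helper are hypothetical and exist only so the result can be checked.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Name:
    value: str


@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    inner: "Regex"


Regex = Union[Name, Seq, Alt, Star]


def refine(r: Regex, n: str) -> Optional[Regex]:
    """Expression for the strings of L(r) that contain n; None means no such string."""
    if isinstance(r, Name):
        return r if r.value == n else None
    if isinstance(r, Alt):
        left, right = refine(r.left, n), refine(r.right, n)
        if left is None or right is None:
            return left if right is None else right
        return Alt(left, right)
    if isinstance(r, Seq):
        left, right = refine(r.left, n), refine(r.right, n)
        options = []
        if left is not None:               # n occurs in the left part
            options.append(Seq(left, r.right))
        if right is not None:              # n occurs in the right part
            options.append(Seq(r.left, right))
        if not options:
            return None
        return options[0] if len(options) == 1 else Alt(options[0], options[1])
    inner = refine(r.inner, n)             # Star: some iteration must contain n
    return None if inner is None else Seq(r, Seq(inner, r))


def to_python_regex(r: Regex) -> str:
    """Render the AST as a Python regex over space-terminated names."""
    if isinstance(r, Name):
        return re.escape(r.value) + " "
    if isinstance(r, Seq):
        return to_python_regex(r.left) + to_python_regex(r.right)
    if isinstance(r, Alt):
        return "(?:" + to_python_regex(r.left) + "|" + to_python_regex(r.right) + ")"
    return "(?:" + to_python_regex(r.inner) + ")*"
```

For example, refining `(professor | grad)*` by `grad` yields an expression equivalent to `(professor | grad)*, grad, (professor | grad)*`.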

Converting s-DTDs to DTDs
First we obtain the images of all types of the s-DTDs. Then we merge all images that have the same name.
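
The two conversion steps can be sketched as follows. The notation is assumed for illustration: a specialized name is written `name^k`, types are plain strings, and images with the same name are merged by a simple union, which may be less refined than the merge used by the authors.

```python
import re
from collections import defaultdict
from typing import Dict, List


def image(text: str) -> str:
    """Drop specialization tags, e.g. 'publication^1' -> 'publication'."""
    return re.sub(r"\^\w+", "", text)


def sdtd_to_dtd(sdtd: Dict[str, str]) -> Dict[str, str]:
    """Take the image of every type, then merge images that share a name."""
    images: Dict[str, List[str]] = defaultdict(list)
    for name, type_expr in sdtd.items():
        img = image(type_expr)
        if img not in images[image(name)]:     # identical images merge trivially
            images[image(name)].append(img)
    return {
        name: exprs[0] if len(exprs) == 1 else "(" + ") | (".join(exprs) + ")"
        for name, exprs in images.items()
    }
```

Because the merge is a union, the resulting DTD accepts every document the s-DTD accepts, but it can no longer express that, say, only some `publication` elements are journal publications.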

Schema Inference Algorithm
- Refinement: tightens individual types.
- Specialization: uses the refinement algorithm and tightens the whole input document.
- Result List Type Inference: discovers the names and order of the types that appear in the result.

Future Work
- More powerful query languages: group-by, nest, navigation using recursive paths in the vertical and horizontal direction, check order, manipulate order.
- More powerful/flexible schema descriptions: XML-Data, DCDs, many academic proposals.
- Conditions for existence of tight/tightest DTDs.
- Other quality metrics for a view DTD.

The BBQ application: introduction
BBQ stands for “Blended Browsing and Querying”, a graphical user interface for browsing and querying XML data sources. There are very few visual interfaces for querying and browsing semistructured data, and fewer for XML.

Introduction cont.
BBQ supports query refinement by letting query results serve as sources in subsequent queries. Users can construct a query result document (essentially a virtual view), and that document becomes a first-class data source within BBQ, meaning it can be browsed, queried, or used to construct another query result document.

Introduction cont.
This is quite useful if the user does not know in advance exactly what he is looking for. The interface allows users to quickly create complex queries without writing XMAS syntax by hand. BBQ displays the structure of multiple data sources using a paradigm that resembles drilling down in Windows directory structures.

[Figure: architecture diagram with components labeled Data Source, XML Data Source, Computational Source, Wrapper, MIX Mediator, and Blended Browsing and Querying (BBQ) interface.]

The BBQ interface
BBQ, which is XML driven, uses a set of DTDs exported by the MIX mediator. They will be referred to from now on as base DTDs. The BBQ interface consists of one main window and zero or more floating windows. The main window contains a toolbar, a split pane, and a message console, while the floating windows contain a toolbar and split pane only.

From now on we will use the following DTDs, which will represent the base DTDs.
[DTD listing not preserved in this transcript.]

BBQ power: selecting and browsing XML source DTD and data
The DTDs are represented as trees in the obvious hierarchical manner: an element name is a parent node, and that element’s sub-elements are its children. BBQ features special tree nodes to represent the structural operators of XML DTDs, such as choice and seq(uence).
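
The tree representation described above, including the operator nodes, can be pictured with a small data structure. This is only an illustration: the `Node` class, the `render` helper, and the example element names are assumptions, not BBQ's actual classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                              # element name, or "seq" / "choice" for operator nodes
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one per node, parents before children."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines
```

A content model such as `(name, (professor | grad))` for `department` would appear as a `seq` node under `department`, holding `name` and a `choice` node with the two alternatives.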

These special tree nodes give the user a more accurate view of the DTD's structure than other semistructured-data viewing systems, and they also facilitate more complex queries. For example, a default order constraint is introduced, namely the one that corresponds to the order in which elements are listed on the screen.

XML data corresponding to a given DTD are represented as a directory tree. The XML data is materialized on demand from the source. The buttons labeled “next” and “previous” in the XML panel retrieve the next and previous n instances, respectively.
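
On-demand materialization with next/previous paging can be sketched as a cursor that asks the source for n instances at a time. The `InstanceCursor` class and its `fetch(offset, limit)` callback are hypothetical stand-ins for whatever call BBQ makes to the mediator.

```python
from typing import Callable, List


class InstanceCursor:
    """Fetches n instances at a time from a source, only when asked."""

    def __init__(self, fetch: Callable[[int, int], List[str]], n: int) -> None:
        self._fetch = fetch      # fetch(offset, limit) -> instances (assumed interface)
        self._n = n
        self._offset = -n        # nothing has been fetched yet

    def next(self) -> List[str]:
        """Return the next n instances; stay in place if the source is exhausted."""
        page = self._fetch(self._offset + self._n, self._n)
        if page:
            self._offset += self._n
        return page

    def previous(self) -> List[str]:
        """Return the previous n instances, or [] when already at the first page."""
        if self._offset < self._n:
            return []
        self._offset -= self._n
        return self._fetch(self._offset, self._n)
```

Only the page currently requested is retrieved, so a large source never has to be loaded in full.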

BBQ power cont.: Creating XMAS Queries with BBQ
A query session is the set of events that occur while BBQ is connected to the mediator. Each query session consists of one or more query cycles. A query cycle is the set of events that starts with the user constructing a query and ends with the user browsing the query result.

The basic BBQ query cycle takes place in four steps:
- First, constraints are set on the data sources.
- Second, a tree representing the query result schema is created by dragging and dropping elements.
- Third, the XMAS query is generated and submitted to the mediator.
- Fourth, a DTD is generated for the query result, and the query result schema and data are displayed.

First step: setting constraints
Constraints can be set on the leaf nodes of the DTD tree or XML tree. Constraints cannot be set on non-leaf nodes. The operators are a basic set of comparators, including ’=’ and ’substr’.

Example
The user right-clicks the degree element and selects “View/Edit Constraint...” from the popup menu. This action brings up the “View/Edit Constraint” dialog box, where “=” is selected as the operator, and “PhD” is typed in as the operand. At this point, the user clicks “OK”.

Joins can take place within a data source or across data sources. Creating a join in BBQ is as simple as selecting one leaf element and dragging and dropping it onto another leaf element. Suppose the user is interested in CSEStudents who are also interns, and whose advisor is also their supervisor.
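
The effect of the example join can be sketched outside the GUI. The sketch assumes, hypothetically, that each student and intern is a flat record with a `name` leaf, and that students carry an `advisor` leaf and interns a `supervisor` leaf. BBQ itself expresses the join by drag and drop and emits XMAS, which is not shown here.

```python
from typing import Dict, List

Record = Dict[str, str]


def students_supervised_by_advisor(students: List[Record], interns: List[Record]) -> List[Record]:
    """Keep students who appear among the interns (same name) and whose
    advisor equals the supervisor of that intern record."""
    interns_by_name: Dict[str, List[Record]] = {}
    for intern in interns:
        interns_by_name.setdefault(intern["name"], []).append(intern)
    return [
        student
        for student in students
        if any(student["advisor"] == intern["supervisor"]
               for intern in interns_by_name.get(student["name"], []))
    ]
```

Each dropped pair of leaves corresponds to one equality condition: here `name = name` and `advisor = supervisor`.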

Second step: construct the head
Construct a tree that the answer document(s) must conform to, called the head or query result tree. The right panel of BBQ’s main window is where the head is built. The head is composed of elements (and their sub-trees) dragged from source DTDs, and tags created on the spot with the “Create New Child” popup menu item. Ordering and group-by operators are also used in the creation of the head.

Third and fourth steps
BBQ converts the visual layout into the XMAS query language, contacts the MIX mediator, and submits the query. Finally, BBQ generates a DTD for the query result, which is displayed with the corresponding data.

[Figure: data flow diagram with components labeled BBQ Interface, MIX mediator, wrapper, and OODB Database, and with the labels “Query in XMAS” and “XML result, DTD”.]

Important things to remember about BBQ
- Enables the query creator to construct queries in an easy, graphical way.
- Graphically supports all the features of the XMAS query language.
- Supports blended browsing and querying.
- Provides an accurate representation of DTDs and XML data.

- Allows a graphical representation of the query result as well.
- The DTD for the result XML page of a given query is created by the DTD-inference mechanism.
- Because of that, we may treat the query result as any other XML source, so we may use it as one of the sources for building new queries.

This is usually the case when we want to get information from the Internet: we don’t know exactly what we are looking for, and the results of the first queries steer us toward the goal of our search.

Selected bibliography
- Yannis Papakonstantinou, Pavel Velikhov. Enhancing Semistructured Data Mediators with Document Type Definitions.
- Kevin D. Munroe, Yannis Papakonstantinou. BBQ: A Visual Interface for Integrated Browsing and Querying of XML.
- Chaitanya Baru, Amarnath Gupta, Bertram Ludascher. XML-Based Information Mediation with MIX.
- The XMAS sub-group of MIX. Introduction to XMAS.