Declaratively Producing Data Mash-ups Sudarshan Murthy 1, David Maier 2 1 Applied Research, Wipro Technologies 2 Department of Computer Science, Portland State University


Declaratively Producing Data Mash-ups Sudarshan Murthy 1, David Maier 2 1 Applied Research, Wipro Technologies 2 Department of Computer Science, Portland State University

31-Oct-15 Declaratively Producing Data Mash-ups
Mash-ups
Web applications that combine information from multiple sources [Wikipedia]
  – A mash-up does not need to be a web app
Data that includes or transcludes content from multiple sources
In either case, a source is likely only a fragment
This work is about data mash-ups
  – In this talk, a mash-up is an XML document

Portland State University Campus Map
45 markers, 53 landmarks
  – Marker: Balloon on map
  – Landmark: Building, department, …
Information from 188 fragments in 58 web pages
Fragments selected manually

Portland Metro Food Markets
154 markers, 154 landmarks
154 fragments harvested programmatically from 4 MS Word documents
Developed for the Oregon Department of Agriculture
Producing Data Mash-ups/oda-1.1/

An HTML Review Report

Problem Areas
Development
  – Getting data from heterogeneous fragments
  – Might use a DBMS, yet still have to code operators such as sort, join, and aggregate for external data
Execution
  – When to get external data, and how much to get?
Design: Expressing that
  – A part comes from an external fragment
  – A part is data (such as page number) which cannot be “selected” in the source

Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion

Superimposed Information (SI)
SI is new data and structure overlaid on existing base information
Mark: A reference to an external fragment
Benefits
  – Multiple, simultaneous organizations
  – Make new connections among base fragments
  – Preserve context
Heterogeneous sources: Word, Excel, PDF, HTML, …

The Mash-up Production Process
[Figure: three-stage process drawing on Docs, DBMS, and Services]
Collect and Classify: Collect marks, add new data and structure
  – Produces the condensed mash-up
Extract and Combine: Extract data from marks and combine with added data
  – Produces the reconstituted mash-up
Transform: Format reconstituted data for display and other purposes
  – Produces the formatted mash-up
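The three stages can be sketched as a small pipeline. This is only an illustration of the data flow, not the authors' implementation: the `Mark` and `Part` classes, the function names, the in-memory `FAKE_SOURCES` table, and the file name are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Mark:
    """A reference to an external fragment (hypothetical shape)."""
    source: str   # which document the fragment lives in
    address: str  # where the fragment sits inside that document


@dataclass
class Part:
    """One part of a mash-up: added data, optionally tied to a mark."""
    name: str
    value: Optional[str] = None
    mark: Optional[Mark] = None


def collect_and_classify() -> List[Part]:
    # Stage 1: the condensed mash-up holds marks plus new data and structure.
    return [
        Part("Comment", value="Contradicts prior work"),
        Part("Excerpt", mark=Mark("review-target.pdf", "page-3")),
    ]


def extract_and_combine(parts: List[Part],
                        fetch: Callable[[Mark], str]) -> List[Part]:
    # Stage 2: the reconstituted mash-up fills marked parts from their sources.
    return [
        Part(p.name, p.value if p.mark is None else fetch(p.mark), p.mark)
        for p in parts
    ]


def transform(parts: List[Part]) -> str:
    # Stage 3: the formatted mash-up is a presentation of the reconstituted data.
    return "\n".join(f"{p.name}: {p.value}" for p in parts)


# Stand-in for Docs, DBMS, and Services (made-up content).
FAKE_SOURCES: Dict[Tuple[str, str], str] = {
    ("review-target.pdf", "page-3"): "text of the marked fragment",
}


def fetch_fragment(mark: Mark) -> str:
    return FAKE_SOURCES[(mark.source, mark.address)]


condensed = collect_and_classify()
reconstituted = extract_and_combine(condensed, fetch_fragment)
formatted = transform(reconstituted)
print(formatted)
```

Note that the condensed form stores no excerpt text at all; the text appears only after the second stage runs.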

SI, Bi-level Information, Mash-ups
A condensed mash-up is SI
  – Links mash-up parts to external fragments
  – Relates to mash-up design: Sixml
A reconstituted mash-up and a formatted mash-up are both bi-level information
  – SI plus reconstituted parts
  – Relates to runtime mash-up manipulation and execution: Sixml DOM and Sixml Navigator

Sixml
A mash-up specification language
  – SI represented as XML; Sixml is XML
A condensed mash-up is encoded as a Sixml document
A mark association is encoded as an XML element of a type we define
  – Associate marks with six kinds of content
  – Validated using standard schema constructs
  – Uniform and comprehensible serialization

Sixml Mark Associations
[Figure: Sixml examples, each with the text “Contradicts prior work”; markup not preserved in the transcript]
By default, the text excerpt is assigned at run time, but it is possible to declare that the value should be something other than the excerpt
Mark association names shown here are the same as the type names, but custom names are possible (with both static and dynamic typing)

Sixml Mark Descriptors
[Figure: Sixml example with the text “Contradicts prior work” and a descriptor containing “OfficeAgents.MSWord”; markup not preserved in the transcript]
Any internal structure is OK. An implementation specific to an xsi:type interprets the structure

Sixml DOM
Extends W3C XML DOM to easily manipulate Sixml documents
  – Using DOM can be tedious and inefficient
Automatic and lazy reconstitution
  – Detects mark associations and interprets attributes such as sixml:valueSource
  – Developer uses only the DOM interface
Access to descriptors and “context” of external fragments
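Lazy reconstitution can be pictured as deferring the fetch of an external fragment until its value is first read, then caching it. The sketch below shows only that general idea; the `LazyExcerpt` class, its fields, and the descriptor string are hypothetical and say nothing about how Sixml DOM is actually built.

```python
from typing import Callable, Optional


class LazyExcerpt:
    """Holds a mark descriptor and fetches the excerpt only when first read."""

    def __init__(self, descriptor: str, fetch: Callable[[str], str]) -> None:
        self.descriptor = descriptor        # opaque address of the fragment
        self._fetch = fetch                 # callable that talks to the source
        self._value: Optional[str] = None   # cache, empty until first access
        self.fetch_count = 0                # how often the source was contacted

    @property
    def value(self) -> str:
        if self._value is None:
            self._value = self._fetch(self.descriptor)
            self.fetch_count += 1
        return self._value


def fake_fetch(descriptor: str) -> str:
    # Stand-in for opening a base document and reading the marked region.
    return "excerpt for " + descriptor


excerpt = LazyExcerpt("doc-1#region-7", fake_fetch)
print(excerpt.fetch_count)   # nothing fetched yet
print(excerpt.value)         # first read triggers the fetch
print(excerpt.value)         # second read is served from the cache
print(excerpt.fetch_count)
```

A caller that never reads `value` never pays for the fetch, which matters when a mash-up references many fragments.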

Run-time Representation
[Figure: a Sixml fragment with the text “Contradicts prior work” and its DOM tree]

Generating a Sixml DOM Tree
[Figure: Sixml DOM tree]
A mark association is “attached” to its target, but is not a child
  – The DOM interface suffices to access the reconstituted mash-up
Descriptor is not a child
Value reconstituted

Context Information
Information retrieved from the context of an external fragment
An xsi:type-specific implementation determines (statically or dynamically) what is in context
[Example context values: provide... system; Times New Roman; 11; 3]

Programming with Sixml DOM
1. procedure WriteComment(SixmlElement c)
2.   XmlElement ctxt = c.markAssociations[0].Context
3.   XmlNode page = ctxt.getElementsByTagName("Page")[0]
4.   Writeln("Page: ", page.firstChild.nodeValue)
5.   Writeln("Excerpt: ", c.getAttribute("excerpt"))
6.   Writeln("Comment: ", c.firstChild.nodeValue)
Only Lines 1 and 2 use the Sixml DOM interface
Lines 2–4 get the page number; Line 5 gets the reconstituted excerpt; and Line 6 gets the comment text
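The plain-DOM part of the procedure (Lines 3–6) can be tried with Python's standard `xml.dom.minidom`. This is a rough analogue under stated assumptions: there is no Sixml DOM in Python, so the context element is passed in by hand rather than read from `markAssociations`, and the two XML snippets (including the excerpt text and the `Placement` wrapper) are made up for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-ins for a reconstituted comment and its mark's context.
COMMENT_XML = (
    '<Comment excerpt="text of the marked region">'
    "Contradicts prior work</Comment>"
)
CONTEXT_XML = "<Context><Placement><Page>3</Page></Placement></Context>"


def write_comment(comment, context):
    """Return the three report lines for one comment element."""
    # Counterpart of Line 3: locate the Page element inside the context.
    page = context.getElementsByTagName("Page")[0]
    return [
        "Page: " + page.firstChild.nodeValue,              # Line 4
        "Excerpt: " + comment.getAttribute("excerpt"),     # Line 5
        "Comment: " + comment.firstChild.nodeValue,        # Line 6
    ]


comment = parseString(COMMENT_XML).documentElement
context = parseString(CONTEXT_XML).documentElement
for line in write_comment(comment, context):
    print(line)
```

Everything inside `write_comment` is ordinary DOM; in the slide's version only the way the context is obtained depends on Sixml DOM.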

Sixml Navigator
Alternative to the traditional path navigator
Extends XDM so that Sixml documents can be declaratively queried using existing languages and query processors
  – Also applies to XPath 1.0 and XSLT 1.0
Performs automatic and lazy reconstitution

XDM Extensions
Allow child elements for any kind of node with which a mark may be associated
Make a mark association a child of its target node
Represent a mark descriptor and context as children of a mark association
These extensions allow reuse of existing query languages and processors

An Extended-XDM Tree
[Figure: Extended-XDM tree]

Queries over Bi-level Information
With Comment as current node, get the comment text
  ./text()
Get excerpt of commented region
  [query not preserved in the transcript]
Get page number of commented region
  ./sixml:EMark/sixml:Context/Placement/Page
  3
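The page-number query can be tried against a hand-built tree in which the mark association and its context appear as children of the comment, as the extended model describes. This is a sketch only: the namespace URI is a placeholder, the document is invented, and the tree is materialized here purely for demonstration. Python's `xml.etree.ElementTree` supports only a subset of XPath, so `./text()` is approximated with the element's `.text` attribute.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the real Sixml namespace is not given in the talk.
NS = {"sixml": "urn:example:sixml"}

DOC = """
<Comment xmlns:sixml="urn:example:sixml">Contradicts prior work<sixml:EMark>
    <sixml:Context><Placement><Page>3</Page></Placement></sixml:Context>
  </sixml:EMark></Comment>
"""

comment = ET.fromstring(DOC)

# Counterpart of ./text(): the text that precedes the first child element.
comment_text = comment.text

# Same path as on the slide, evaluated with Comment as the current node.
page = comment.find("./sixml:EMark/sixml:Context/Placement/Page", NS)

print(comment_text)
print(page.text)
```

Because the context is reachable as descendants of the comment, an unmodified path expression is enough to reach the page number.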

Implementation and Usage
Element types for Sixml mark associations defined in XML Schema
Sixml DOM and Sixml Navigator in C# on the .NET Framework
  – Sixml DOM implemented by extending DOM and by revising DOM
  – Three implementations of Sixml DOM: 2 extensions (MS and Mono), 1 revision (Mono)
Sixml, Sixml DOM, and Sixml Navigator used in mash-ups for several applications

Experimental Data
8 mash-ups
  – 4 each from 2 apps; different scale factors
  – File size: 200 KB to 26.1 MB
  – #Docs referenced: 18 to 426
  – #Mark associations: 1.9K to over 311K
3 traditional XML documents
  – File size: 484 KB to [value not preserved] MB
  – Tree depth: 4, 8, [value not preserved]

Evaluation Summary
Sixml DOM
  – Saves time over DOM when accessing mark associations
  – When accessing SI, savings decrease as the amount of SI increases
  – It is better to use DOM to access large traditional XML documents
Sixml Navigator
  – Saves time over traditional navigator for both mark associations and SI

Summary
A mash-up has three forms: condensed, reconstituted, and formatted
Sixml, Sixml DOM, and Sixml Navigator support the three forms, respectively
Sixml makes it easier to specify mash-ups; Sixml DOM and Navigator provide a more efficient means of manipulating mash-ups
The XML Schema instance documents and the source code are on [URL not preserved in the transcript]

Our Mash-up Framework
[Figure: layered architecture]
Client Application
XSLT and XQuery Processors; XPath Processor
Sixml; Sixml DOM; Sixml Navigator
SPARCE: Reference and retrieve fragments of arbitrary types
Bulk Accessor: Efficiently retrieve large number of fragments
Cloaker: Hide data to improve query expression and execution

Bi-level Query Processors
Sixml Navigator uses Sixml DOM internally: Does not construct extended-XDM trees
Existing query processors use the Sixml Navigator instead of using the traditional navigator

Mark Creation
[Figure: Superimposed Application, Mark Manager, Clipboard, Superimposed Info, and Descriptors Repository; example descriptor values: AcrobatAgents.PDFAgent, AcrobatPDFTextMark, 2|395|439, …; labels D6, M4, S1]

Activation and Context Retrieval
[Figure: Superimposed Application, Mark Manager, Context Manager, Superimposed Info, Base Application, Descriptors Repository, and Base Info; example descriptor values: AcrobatAgents.PDFAgent, AcrobatPDFTextMark, 2|395|439, …; labels D6, M4, S1]

About Context
[Figure: context of a PDF mark and of a PowerPoint mark]
Context information is modeled as a hierarchical property set
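One simple way to picture a hierarchical property set is as nested name/value groups addressed by a path. The sketch below is an assumption-laden illustration: the group and property names (`Placement`, `Presentation`, `Font`, `Name`, `Size`) and the `get_property` helper are hypothetical, with values borrowed from the context example earlier in the talk.

```python
from typing import Any, Dict

# Hypothetical context for one mark, arranged as nested property groups.
context: Dict[str, Any] = {
    "Placement": {"Page": 3},
    "Presentation": {"Font": {"Name": "Times New Roman", "Size": 11}},
}


def get_property(props: Dict[str, Any], path: str) -> Any:
    """Walk a slash-separated path down the nested property groups."""
    node: Any = props
    for key in path.split("/"):
        node = node[key]   # raises KeyError if the property is absent
    return node


print(get_property(context, "Placement/Page"))
print(get_property(context, "Presentation/Font/Name"))
```

Different mark types can expose different groups without changing the lookup code, which suits heterogeneous sources such as PDF and PowerPoint.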