Published by Kory Ward. Modified over 9 years ago.
1
Declaratively Producing Data Mash-ups
Sudarshan Murthy¹, David Maier²
¹ Applied Research, Wipro Technologies
² Department of Computer Science, Portland State University
http://www.sixml.org
2
31-Oct-15, Declaratively Producing Data Mash-ups
Mash-ups
Web applications that combine information from multiple sources [Wikipedia]
  – A mash-up does not need to be a web app
Data that includes or transcludes content from multiple sources
In either case, a source is likely only a fragment
This work is about data mash-ups
  – In this talk, a mash-up is an XML document
3
Portland State University Campus Map
45 markers, 53 landmarks
  – Marker: Balloon on map
  – Landmark: Building, department, …
Information from 188 fragments in 58 web pages
Fragments selected manually
http://sparce.cs.pdx.edu/cmap/
4
Portland Metro Food Markets
154 markers, 154 landmarks
154 fragments harvested programmatically from 4 MS Word documents
Developed for the Oregon Department of Agriculture
http://sparce.cs.pdx.edu/Declaratively Producing Data Mash-ups/oda-1.1/
5
An HTML Review Report
6
Problem Areas
Development
  – Getting data from heterogeneous fragments
  – Developers might use a DBMS, yet still hand-code operators such as sort, join, and aggregate for external data
Execution
  – When to get external data, and how much to get?
Design: Expressing that
  – A part comes from an external fragment
  – A part is data (such as a page number) which cannot be “selected” in the source
7
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
8
Superimposed Information (SI)
SI is new data and structure overlaid on existing base information
Mark: A reference to an external fragment
Benefits
  – Multiple, simultaneous organizations
  – Make new connections among base fragments
  – Preserve context
Heterogeneous sources: Word, Excel, PDF, HTML, …
9
The Mash-up Production Process
Sources at each stage: Docs, DBMS, Services
1. Collect and Classify: Collect marks, add new data and structure. Produces the condensed mash-up
2. Extract and Combine: Extract data from marks and combine with added data. Produces the reconstituted mash-up
3. Transform: Format reconstituted data for display and other purposes. Produces the formatted mash-up
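The three stages can be pictured with a minimal Python sketch. This is only an illustration of the data flow, not the authors' implementation: the class names, the `sources` dictionary standing in for base documents, and the sample mark `menu.doc#p2` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CondensedItem:
    """One entry of a condensed mash-up: added data plus a mark."""
    note: str   # new data added by the mash-up author
    mark: str   # hypothetical reference to an external fragment


@dataclass
class ReconstitutedItem:
    """The same entry after the mark has been resolved."""
    note: str
    mark: str
    excerpt: str  # data extracted from the fragment the mark addresses


def extract_and_combine(items: List[CondensedItem],
                        sources: Dict[str, str]) -> List[ReconstitutedItem]:
    """Stage 2: resolve each mark and combine the result with the added data."""
    return [ReconstitutedItem(i.note, i.mark, sources[i.mark]) for i in items]


def transform(items: List[ReconstitutedItem]) -> str:
    """Stage 3: format the reconstituted data for display."""
    return "\n".join(f"{i.note}: {i.excerpt}" for i in items)


# Stage 1 (collect and classify) is done by hand here.
sources = {"menu.doc#p2": "Open Saturdays"}          # stand-in for base documents
condensed = [CondensedItem("Hours", "menu.doc#p2")]  # condensed mash-up
reconstituted = extract_and_combine(condensed, sources)
formatted = transform(reconstituted)
```

The condensed form stores only the reference, so the excerpt is fetched from the source each time the mash-up is reconstituted.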
10
SI, Bi-level Information, Mash-ups
A condensed mash-up is SI
  – Links mash-up parts to external fragments
  – Relates to mash-up design: Sixml
A reconstituted mash-up and a formatted mash-up are both bi-level information
  – SI plus reconstituted parts
  – Relates to runtime mash-up manipulation and execution: Sixml DOM and Sixml Navigator
11
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
12
Sixml
A mash-up specification language
  – SI represented as XML; Sixml is XML
A condensed mash-up is encoded as a Sixml document
A mark association is encoded as an XML element of a type we define
  – Associate marks with six kinds of content
  – Validated using standard schema constructs
  – Uniform and comprehensible serialization
13
Sixml Mark Associations
[Slide shows XML examples of mark associations; the markup was lost in extraction. Surviving element text: "Contradicts prior work", "…"]
By default, the text excerpt is assigned at run time, but it is possible to declare that the value should be something other than the excerpt
Mark association names shown here are the same as the type names, but custom names are possible (with both static and dynamic typing)
14
Sixml Mark Descriptors
[Slide shows an XML example of a mark descriptor; the markup was lost in extraction. Surviving text: "Contradicts prior work", "http://www.w3.org/#element(/1/2)", "…", "OfficeAgents.MSWord"]
Any internal structure is OK. An implementation specific to an xsi:type interprets the structure
15
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
16
Sixml DOM
Extends W3C XML DOM to easily manipulate Sixml documents
  – Using plain DOM for this can be tedious and inefficient
Automatic and lazy reconstitution
  – Detects mark associations and interprets attributes such as sixml:valueSource
  – Developer uses only the DOM interface
Access to descriptors and “context” of external fragments
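Lazy reconstitution can be illustrated with a small Python sketch. It is not the Sixml DOM code (which the slides say is written in C#). The class `LazyExcerpt`, the resolver function, and the descriptor string are hypothetical, chosen only to show that the external fragment is fetched on first access and then reused.

```python
from typing import Callable, List, Optional


class LazyExcerpt:
    """Holds a mark descriptor and resolves its excerpt only on first access."""

    def __init__(self, descriptor: str, resolve: Callable[[str], str]) -> None:
        self.descriptor = descriptor   # hypothetical address of the fragment
        self._resolve = resolve        # hypothetical fetcher for that fragment
        self._value: Optional[str] = None
        self._loaded = False

    @property
    def value(self) -> str:
        if not self._loaded:           # first access: go to the base source
            self._value = self._resolve(self.descriptor)
            self._loaded = True
        return self._value


calls: List[str] = []


def fake_resolver(descriptor: str) -> str:
    """Stand-in for a base application; records each retrieval."""
    calls.append(descriptor)
    return f"excerpt for {descriptor}"


excerpt = LazyExcerpt("doc.pdf#page=3", fake_resolver)
calls_before_access = len(calls)   # nothing has been fetched yet
first = excerpt.value              # triggers the single retrieval
second = excerpt.value             # served from the cached value
```

Deferring the fetch means a document with many mark associations pays only for the fragments a client actually reads.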
17
Run-time Representation
[Slide shows a Sixml fragment and its DOM tree; the markup was lost in extraction. Surviving element text: "Contradicts prior work", "…"]
18
Generating a Sixml DOM Tree
[Diagram: Sixml DOM tree]
A mark association is “attached” to its target, but is not a child
  – The DOM interface suffices to access the reconstituted mash-up
Descriptor is not a child
Value reconstituted
19
Context Information
Information retrieved from the context of an external fragment
An xsi:type-specific implementation determines (statically or dynamically) what is in context
[Slide shows a sample context; surviving values: "provide... system", "Times New Roman", "11", "3"]
20
Programming with Sixml DOM
1. procedure WriteComment(SixmlElement c)
2.   XmlElement ctxt = c.markAssociations[0].Context
3.   XmlNode page = ctxt.getElementsByTagName("Page")[0]
4.   Writeln("Page: ", page.firstChild.nodeValue)
5.   Writeln("Excerpt: ", c.getAttribute("excerpt"))
6.   Writeln("Comment: ", c.firstChild.nodeValue)
Only Lines 1 and 2 use the Sixml DOM interface
Lines 2–4 get the page number; Line 5 the reconstituted excerpt; and Line 6 the comment text
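A rough Python analogue of WriteComment, using the standard library's `xml.dom.minidom`, may help show which calls are ordinary DOM. The wrapper classes `SixmlElementSketch` and `MarkAssociationSketch` are hypothetical stand-ins for the Sixml DOM types, and the sample values are illustrative; the excerpt is assumed to be already reconstituted.

```python
from typing import List
from xml.dom.minidom import Element, parseString


class MarkAssociationSketch:
    """Hypothetical stand-in for a mark association with a context element."""

    def __init__(self, context: Element) -> None:
        self.context = context


class SixmlElementSketch:
    """Hypothetical stand-in: a DOM element plus its attached associations."""

    def __init__(self, element: Element,
                 mark_associations: List[MarkAssociationSketch]) -> None:
        self.element = element
        self.mark_associations = mark_associations  # attached, not children


def write_comment(c: SixmlElementSketch) -> List[str]:
    """Mirrors Lines 2-6 of the slide, returning lines instead of printing."""
    ctxt = c.mark_associations[0].context          # the only non-DOM access
    page = ctxt.getElementsByTagName("Page")[0]    # plain DOM from here on
    return [
        "Page: " + page.firstChild.nodeValue,
        "Excerpt: " + c.element.getAttribute("excerpt"),
        "Comment: " + c.element.firstChild.nodeValue,
    ]


# Illustrative data only; element names follow the slide's query paths.
comment_el = parseString(
    '<Comment excerpt="sample excerpt">Contradicts prior work</Comment>'
).documentElement
context_el = parseString(
    "<Context><Placement><Page>3</Page></Placement></Context>"
).documentElement

lines = write_comment(
    SixmlElementSketch(comment_el, [MarkAssociationSketch(context_el)])
)
```

As on the slide, only the access to the mark association's context goes beyond the standard DOM interface.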
21
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
22
Sixml Navigator
Alternative to the traditional path navigator
Extends XDM so that Sixml documents can be declaratively queried using existing languages and query processors
  – Also applies to XPath 1.0 and XSLT 1.0
Performs automatic and lazy reconstitution
23
XDM Extensions
Allow child elements for any kind of node with which a mark may be associated
Make a mark association a child of its target node
Represent a mark descriptor and context as children of a mark association
These extensions allow reuse of existing query languages and processors
24
An Extended-XDM Tree
[Diagram: Extended-XDM tree]
25
Queries over Bi-level Information
With Comment as current node, get the comment text: ./text()
Get excerpt of commented region: ./@excerpt
Get page number of commented region: ./sixml:EMark/sixml:Context/Placement/Page (value shown on the slide: 3)
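The three queries can be tried in Python on a hand-built tree in which the mark association and its context are ordinary children, as the extended XDM describes. This sketch uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so the text and attribute steps are done with `.text` and `.get()`. The namespace URI `urn:example:sixml` and the sample document are assumptions, not the real Sixml namespace or output.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real Sixml namespace is not given in the slides.
NS = {"sixml": "urn:example:sixml"}

# Hand-built stand-in for an extended-XDM view of one comment.
XML = (
    '<Comment xmlns:sixml="urn:example:sixml" excerpt="sample excerpt">'
    "Contradicts prior work"
    "<sixml:EMark><sixml:Context><Placement><Page>3</Page></Placement>"
    "</sixml:Context></sixml:EMark>"
    "</Comment>"
)

comment = ET.fromstring(XML)

# ./text() : text directly under the current node
comment_text = comment.text

# ./@excerpt : the (assumed already reconstituted) excerpt attribute
excerpt = comment.get("excerpt")

# ./sixml:EMark/sixml:Context/Placement/Page : page number from the context
page = comment.find("./sixml:EMark/sixml:Context/Placement/Page", NS).text
```

Because the association and context appear as children, the page query is an ordinary path expression with no special functions.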
26
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
27
Implementation and Usage
Element types for Sixml mark associations defined in XML Schema
Sixml DOM and Sixml Navigator in C# on the .NET Framework
  – Sixml DOM implemented by extending DOM and by revising DOM
  – Three implementations of Sixml DOM: 2 extensions (MS and Mono), 1 revision (Mono)
Sixml, Sixml DOM, and Sixml Navigator used in mash-ups for several applications
28
Experimental Data
8 mash-ups
  – 4 each from 2 apps; different scale factors
  – File size: 200 KB to 26.1 MB
  – #Docs referenced: 18 to 426
  – #Mark associations: 1.9K to over 311K
3 traditional XML documents
  – File size: 484 KB to 113.7 MB
  – Tree depth: 4, 8, 16
29
Evaluation Summary
Sixml DOM
  – Saves time over DOM when accessing mark associations
  – When accessing SI, savings decrease as the amount of SI increases
  – It is better to use DOM to access large traditional XML documents
Sixml Navigator
  – Saves time over traditional navigator for both mark associations and SI
30
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
31
Summary
A mash-up has three forms: condensed, reconstituted, and formatted
Sixml, Sixml DOM, and Sixml Navigator support the three forms, respectively
Sixml makes it easier to specify mash-ups; Sixml DOM and Navigator provide a more efficient means of manipulating mash-ups
The XML Schema instance documents and the source code are on www.sixml.org
32
Outline
Introduction
The conceptual approach
Sixml: Condensed mash-ups
Sixml DOM: Reconstituted mash-ups
Sixml Navigator: Formatted mash-ups
Evaluation
Summary
Discussion
33
Our Mash-up Framework
[Diagram layers, top to bottom]
Client Application; XSLT and XQuery Processors; XPath Processor
Sixml; Sixml DOM; Sixml Navigator
SPARCE; Bulk Accessor; Cloaker
  – SPARCE: Reference and retrieve fragments of arbitrary types
  – Bulk Accessor: Efficiently retrieve large number of fragments
  – Cloaker: Hide data to improve query expression and execution
34
Bi-level Query Processors
Sixml Navigator uses Sixml DOM internally: Does not construct extended-XDM trees
Existing query processors use the Sixml Navigator instead of using the traditional navigator
35
Mark Creation
[Diagram components: Superimposed Application, Mark Manager, Clipboard, Superimposed Info, Descriptors Repository. Sample descriptor text: AcrobatAgents.PDFAgent, AcrobatPDFTextMark, 2|395|439, …, D6, M4, S1]
36
Activation and Context Retrieval
[Diagram components: Superimposed Application, Mark Manager, Context Manager, Superimposed Info, Base Application, Descriptors Repository, Base Info. Sample descriptor text: AcrobatAgents.PDFAgent, AcrobatPDFTextMark, 2|395|439, …, D6, M4, S1]
37
About Context
[Slide shows context examples for a PDF Mark and a PowerPoint Mark]
Context information is modeled as a hierarchical property set
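One simple way to picture a hierarchical property set is as nested dictionaries addressed by a slash-separated path. The sketch below is an assumption for illustration only: the grouping names (`Content`, `Presentation`) and the helper `get_property` are hypothetical, while `Placement/Page` follows the path used in the query slide.

```python
from typing import Any, Mapping

# Hypothetical context for one mark; the values are illustrative.
context: Mapping[str, Any] = {
    "Content": {"Excerpt": "sample excerpt"},
    "Presentation": {"Font": {"Name": "Times New Roman", "Size": 11}},
    "Placement": {"Page": 3},
}


def get_property(props: Mapping[str, Any], path: str) -> Any:
    """Walk a slash-separated path through nested property groups."""
    node: Any = props
    for key in path.split("/"):
        node = node[key]   # raises KeyError if the property is absent
    return node


page = get_property(context, "Placement/Page")
font_name = get_property(context, "Presentation/Font/Name")
```

A nested layout lets different mark types expose different properties under the same top-level groups.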