The PADS-Galax Project Enabling XQuery over Ad-hoc Data Sources Yitzhak Mandelbaum.


What is PADS?
Declarative data description language
Syntax & semantics of semi-structured, legacy data sources
From description, compiler generates:
  – Data-parsing library
  – In-memory representation
You write a C program that uses the generated library

What are XQuery and Galax?
XQuery
  – Functional, strongly typed XML query language
  – Well-suited to querying semi-structured sources
Galax
  – Complete, extensible implementation of XQuery 1.0

HTTP Common Log Format
HTTP CLF Data
  [15/Oct/1997:18:46: ] "GET /tk/p.txt HTTP/1.0"
PADS Description
  Pstruct http_request_t {
    '\"';
    http_method_t meth;
    ' ';
    Pa_string(:' ':) req_uri;
    ' ';
    http_v_t version : checkVersion(version, meth);
    '\"';
  };
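To make the description above concrete, here is a rough Python sketch of the work a parser for `http_request_t` has to do: match the opening quote, read the method, a space, the URI up to the next space, the version, and the closing quote. The names `HttpRequest`, `parse_request`, and `check_version`, the regular expression, and the body of the version check are all illustrative assumptions. The slide does not show what `checkVersion` actually tests, and the real PADS compiler generates a C library rather than anything like this. Returning `None` on failure is also a simplification.

```python
import re
from typing import NamedTuple, Optional


class HttpRequest(NamedTuple):
    meth: str
    req_uri: str
    version: str


# Literal '"', method, ' ', URI terminated by ' ', version, literal '"'.
_REQUEST_RE = re.compile(
    r'"(?P<meth>[A-Z]+) (?P<req_uri>[^ ]+) HTTP/(?P<version>\d+\.\d+)"'
)


def check_version(version: str, meth: str) -> bool:
    # Hypothetical stand-in for the slide's checkVersion constraint:
    # accept only the HTTP versions this sketch knows about.
    return version in ("1.0", "1.1")


def parse_request(text: str) -> Optional[HttpRequest]:
    """Parse one quoted request field; return None if it does not conform."""
    m = _REQUEST_RE.fullmatch(text)
    if m is None:
        return None
    req = HttpRequest(m["meth"], m["req_uri"], m["version"])
    return req if check_version(req.version, req.meth) else None
```

For the sample data on the slide, `parse_request('"GET /tk/p.txt HTTP/1.0"')` yields a record with `meth="GET"` and `req_uri="/tk/p.txt"`.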

CLF as XML
… "GET /tk/p.txt HTTP/1.0" …
… GET /tk/p.txt HTTP/1.0 ...
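The XML markup on this slide did not survive in the transcript. As a hedged illustration of the idea only, the sketch below wraps the fields of one parsed request in elements named after the PADS fields (`request`, `meth`, `req_uri`, `version`). The function `request_to_xml` and the exact element layout are assumptions; the project's canonical mapping is not shown here and may differ.

```python
import xml.etree.ElementTree as ET


def request_to_xml(meth: str, req_uri: str, version: str) -> ET.Element:
    """Build a <request> element with one child per PADS field (assumed layout)."""
    request = ET.Element("request")
    for tag, text in (("meth", meth), ("req_uri", req_uri), ("version", version)):
        ET.SubElement(request, tag).text = text
    return request


xml_text = ET.tostring(
    request_to_xml("GET", "/tk/p.txt", "HTTP/1.0"), encoding="unicode"
)
```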

Querying HTTP CLF
Selection & projection using XQuery
  – Return list of URIs requested by host $x:
    $log/http_clf[host = $x][request/meth = "GET"]/request/req_uri
Vet errors in data using XQuery
  – Return locations of records with error in host field
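For readers less familiar with XQuery path expressions, the selection and projection can be approximated in Python over a small XML document. This is a sketch only: the sample log, its host addresses, and the function `uris_requested_by` are made up for illustration, and it assumes `req_uri` is nested inside `request`, as the field layout of the PADS description suggests.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up sample data; element names follow the PADS field names.
LOG = """
<log>
  <http_clf><host>10.0.0.1</host><request><meth>GET</meth><req_uri>/tk/p.txt</req_uri></request></http_clf>
  <http_clf><host>10.0.0.1</host><request><meth>POST</meth><req_uri>/form</req_uri></request></http_clf>
  <http_clf><host>10.0.0.2</host><request><meth>GET</meth><req_uri>/index.html</req_uri></request></http_clf>
</log>
"""


def uris_requested_by(log: ET.Element, host: str) -> List[str]:
    """Select records for `host` with method GET, then project the URI."""
    return [
        rec.findtext("request/req_uri")
        for rec in log.findall("http_clf")
        if rec.findtext("host") == host and rec.findtext("request/meth") == "GET"
    ]


log = ET.fromstring(LOG)
```

The two `if` conditions play the role of the two bracketed predicates, and `findtext("request/req_uri")` is the projection step.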

PADS-Galax Architecture

Technical Challenges
Define mapping from PADS description to XML Schema
Materialize PADS data as virtual XML
  – Galax has abstract data model
  – Implement Galax’s abstract data model on top of PADS

Technical Challenges
Memory management of PADS records
  – Data exceeding memory limits requires clever memory management
  – PADS program typically reads records sequentially
  – Galax may not access records sequentially
User-friendly interface
  – Describe PADS data, compile library, write & execute queries

Challenges & Solutions (1)
Define mapping from PADS description to XML Schema
  – Canonical mapping defined Summer 2003
Materialize PADS data as virtual XML
  – Started Summer 2003 but incomplete
  – Align with current Galax Data Model

Abstract Node Interface
Fragment of Galax’s abstract XML node interface
  – Full navigation of XML tree
  – Access to atomic values
  method virtual node_name : unit -> atomicQName option
  method virtual typed_value : unit -> atomicValue cursor
  method virtual parent : unit -> node option
  method virtual children : unit -> node cursor
  method virtual docorder : unit -> Nodeid.docorder
Cursor: lazy iterator access to node sequence
Node identity & document order: canonical order
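The shape of this interface can be mimicked in Python to show how a data source other than an XML document might sit behind it. The sketch below is an analogy, not Galax code: `Node` mirrors the five methods listed on the slide, generators stand in for cursors, and `FieldNode`, which exposes a nested dictionary as a tree, is a hypothetical implementation. Representing document order as a tuple of child positions is likewise an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Tuple


class Node(ABC):
    """Python analogue of the abstract node interface fragment."""

    @abstractmethod
    def node_name(self) -> Optional[str]: ...

    @abstractmethod
    def typed_value(self) -> Iterator[object]: ...

    @abstractmethod
    def parent(self) -> Optional["Node"]: ...

    @abstractmethod
    def children(self) -> Iterator["Node"]: ...

    @abstractmethod
    def docorder(self) -> Tuple[int, ...]: ...


class FieldNode(Node):
    """Hypothetical node over a nested dict of record fields."""

    def __init__(self, name, value, parent=None, order=(0,)):
        self._name, self._value = name, value
        self._parent, self._order = parent, order

    def node_name(self):
        return self._name

    def typed_value(self):
        # Only leaf fields carry an atomic value.
        if not isinstance(self._value, dict):
            yield self._value

    def parent(self):
        return self._parent

    def children(self):
        # Generator: child nodes are created only when the caller asks for them.
        if isinstance(self._value, dict):
            for i, (name, value) in enumerate(self._value.items()):
                yield FieldNode(name, value, self, self._order + (i,))

    def docorder(self):
        # Tuples compare lexicographically, so a parent sorts before its children.
        return self._order
```

Because `children` is a generator, a caller that stops after the first child never causes the remaining children to be built, which is the property the cursor abstraction is meant to provide.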

Challenges & Solutions (2)
Memory management of PADS records
  – Choose record as read granularity
  – Read records on demand
  – Maintain meta-data for fast re-retrieval
User-friendly interface
  – Integrated docorder, cursors, and memory management into compiler
  – Room for improvement
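One way to picture the memory-management approach above is an index that reads records only when asked and remembers where each one starts, so that a later out-of-order access can seek straight back to it. The sketch below is a minimal illustration under stated assumptions: a record is taken to be a newline-terminated line, and the class `RecordIndex` and its list of byte offsets are hypothetical. The slides do not describe the meta-data PADS actually keeps.

```python
import io
from typing import BinaryIO, List


class RecordIndex:
    """Reads newline-terminated records on demand and remembers their offsets."""

    def __init__(self, source: BinaryIO):
        self._source = source
        self._offsets: List[int] = [0]  # _offsets[i] is the start of record i
        self._eof = False

    def _scan_to(self, i: int) -> None:
        # Extend the meta-data sequentially until the start of record i is known.
        while len(self._offsets) <= i and not self._eof:
            self._source.seek(self._offsets[-1])
            if self._source.readline():
                self._offsets.append(self._source.tell())
            else:
                self._eof = True

    def get(self, i: int) -> bytes:
        """Return record i, re-reading it from the source instead of caching it."""
        self._scan_to(i)
        if i >= len(self._offsets):
            raise IndexError(i)
        self._source.seek(self._offsets[i])
        line = self._source.readline()
        if not line:
            raise IndexError(i)
        return line.rstrip(b"\n")


index = RecordIndex(io.BytesIO(b"r0\nr1\nr2\n"))
```

Only the offsets stay in memory, so the cost per record is one integer rather than the whole record. Asking for record 2 first scans past records 0 and 1 once; asking for record 0 afterwards is a single seek.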

A Smart Array
[Figure: smart array over a log spanning 0 to 6 GB, with meta-data; labels shown: log, meth, GET, Meta-Data]

Project Status
Integration effort successful
More thorough regression testing
Demonstrate to potential users
Research problems
  – Extending Galax’s data model to leverage stream access
  – More efficient meta-data structures in PADS

Thanks to … Kathleen Fisher Robert Gruber Mary Fernandez

Viewing & Querying HTTP CLF
Virtual XML Data
  /Oct/1997:18:46: GET /tk/p.txt HTTP/

Describing HTTP Common Log Format
HTTP CLF Data
  [15/Oct/1997:18:46: ] "GET /tk/p.txt HTTP/1.0"
PADS Description
  Pstruct http_request_t {
    '\"';
    http_method_t meth;
    ' ';
    Pa_string(:' ':) req_uri;
    ' ';
    http_v_t version : chkVn(version, meth);
    '\"';
  };
  Pstruct http_clf_t {
    Pint8 ip_t[4] : Psep('.') && Pterm(' ');
    …
    http_request_t request;
  };

Accessing Record Sequences
Access to record (node) sequence
  – Read all items in sequence
  – Produce items on demand
Each record field materialized strictly as needed
Solution:
  – Choose record as read granularity
  – Read records on demand
  – Maintain meta-data for fast re-retrieval