Interoperability for Provenance-aware Databases using PROV and JSON
Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy (Oracle Corporation)
Raghav Kapoor, Boris Glavic (Illinois Institute of Technology)
Venkatesh Radhakrishnan (Facebook)
Xing Niu (Illinois Institute of Technology)

Outline
① Introduction
② Related Work
③ Overview
④ Export and Import
⑤ Experimental Results
⑥ Conclusions and Future Work

Introduction
The PROV standards
• A standardized, extensible representation of provenance graphs
• Exchange of provenance information between systems
Provenance-aware DBMS
• Compute the provenance of database operations
• E.g., Perm [1], GProM [2], DBNotes [3], Orchestra [4], LogicBlox [5]

[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer.
[2] B. Arab, D. Gawlick, V. Radhakrishnan, H. Guo, and B. Glavic. A generic provenance middleware for database queries, updates, and transactions. In TaPP.
[3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396.
[4] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3):19.
[5] S. Huang, T. Green, and B. Loo. Datalog and emerging applications: an interactive tutorial. In SIGMOD, pages 1213–1216, 2011.

Introduction
Example: extracting demographic information from tweets

Introduction
Problem:
• No relational database system supports both tracking of database provenance and import and export of provenance in PROV
• Existing systems cannot export provenance into standardized formats
E.g., GProM:
• Essentially produces wasDerivedFrom edges between the output tuples of a query Q and its inputs
• However, these edges are not available as PROV graphs
• There is no way to track a derivation back to non-database entities

Introduction
GProM System
• Computes provenance for database operations
  – Queries, updates, transactions
• Uses SQL language extensions
  – E.g., PROVENANCE OF (SELECT ...)

Introduction
Example of GProM in action
• The result of PROVENANCE OF for query Q
• Each tuple in this result represents one wasDerivedFrom assertion
  – E.g., tuple t_o1 was derived from tuple t_1
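
The mapping from provenance tuples to wasDerivedFrom assertions can be sketched in a few lines of Python. This is only an illustration, not GProM code: the function name, the `ex:` prefix, the edge identifiers, and every tuple pair other than (t_o1, t_1) are assumptions made for the example. The field names `prov:generatedEntity` and `prov:usedEntity` follow the PROV-JSON serialization.

```python
import json

def derivations_to_prov(rows):
    """Turn (output_tuple_id, input_tuple_id) pairs into a PROV-JSON-style dict.

    Each pair becomes one wasDerivedFrom assertion; both endpoints are
    registered as entities. All identifiers are hypothetical.
    """
    doc = {"prefix": {"ex": "http://example.org/"}, "entity": {}, "wasDerivedFrom": {}}
    for i, (out_id, in_id) in enumerate(rows, start=1):
        doc["entity"].setdefault(f"ex:{out_id}", {})
        doc["entity"].setdefault(f"ex:{in_id}", {})
        doc["wasDerivedFrom"][f"_:wdf{i}"] = {
            "prov:generatedEntity": f"ex:{out_id}",
            "prov:usedEntity": f"ex:{in_id}",
        }
    return doc

# One row per provenance tuple: (output tuple, input tuple it was derived from).
rows = [("to1", "t1"), ("to1", "t2"), ("to2", "t3")]
prov_doc = derivations_to_prov(rows)
print(json.dumps(prov_doc, indent=2))
```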

Introduction
Goal: make databases interoperable with other provenance systems
Approach:
• Export and import of provenance as PROV-JSON
• Propagation of imported provenance
• Implemented in GProM using SQL

Related Work
• How to integrate provenance graphs by identifying common elements? [6]
• Addressing the interoperability problem between databases and other provenance-aware systems through
  – A common model for both types of provenance [7][8][9]
  – Monitoring database access to link database provenance with other provenance systems [10][11]

[6] A. Gehani and D. Tariq. Provenance integration. In TaPP.
[7] U. Acar, P. Buneman, J. Cheney, J. van den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow provenance. In TaPP.
[8] Y. Amsterdamer, S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. Putting Lipstick on Pig: Enabling Database-style Workflow Provenance. PVLDB, 5(4):346–357.
[9] D. Deutch, Y. Moskovitch, and V. Tannen. A provenance framework for data-dependent process analysis. PVLDB, 7(6).
[10] F. Chirigati and J. Freire. Towards integrating workflow and database provenance. In IPAW, pages 11–23.
[11] Q. Pham, T. Malik, B. Glavic, and I. Foster. LDV: Light-weight Database Virtualization. In ICDE, pages 1179–1190, 2015.

Overview
We introduce techniques for
• Exporting database provenance as PROV documents
• Importing PROV graphs alongside data
• Linking outputs of SQL operations to imported provenance for their inputs
  – The implementation in GProM offloads generation of PROV documents to the backend database, using SQL and string concatenation

Export and Import
Export
– Added a TRANSLATE AS clause, e.g., PROVENANCE OF (SELECT ...) TRANSLATE AS …
– Constructs a PROV-JSON document from database provenance
① Runs several projections over the provenance computation
  – E.g., '"_:wgb\(' || F0.STATE || '|' || F0."AVG(AGE)" || '\)'…
② Uses aggregation to concatenate all snippets of a certain type
  – E.g., entity nodes, wasGeneratedBy edges, allUsed edges
③ Uses string concatenation to create the final document
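
The three steps above can be sketched with SQLite standing in for the backend database. This is a minimal illustration under stated assumptions, not the SQL that GProM generates: the `prov` table (one row per output/input tuple pair), its columns, the `ex:` identifiers, and the restriction to entity and wasDerivedFrom snippets are all hypothetical simplifications.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical result of a provenance computation: one row per
-- (output tuple, contributing input tuple) pair.
CREATE TABLE prov (state TEXT, in_tid TEXT);
INSERT INTO prov VALUES ('IL', 't1'), ('IL', 't2'), ('NY', 't3');
""")

EXPORT_SQL = """
WITH entity_snippets AS (           -- step 1: projections build JSON snippets
    SELECT '"ex:out_' || state || '": {}' AS snippet FROM prov
    UNION
    SELECT '"ex:' || in_tid || '": {}' FROM prov
),
derivation_snippets AS (
    SELECT '"_:wdf_' || in_tid || '": {"prov:generatedEntity": "ex:out_' || state
           || '", "prov:usedEntity": "ex:' || in_tid || '"}' AS snippet
    FROM prov
)
SELECT '{"entity": {'               -- step 3: concatenate into one document
       || (SELECT group_concat(snippet, ', ') FROM entity_snippets)      -- step 2
       || '}, "wasDerivedFrom": {'
       || (SELECT group_concat(snippet, ', ') FROM derivation_snippets)  -- step 2
       || '}}'
"""

document_text = conn.execute(EXPORT_SQL).fetchone()[0]
prov_json = json.loads(document_text)  # parsing checks the text is well-formed JSON
print(json.dumps(prov_json, indent=2))
```

Building the document inside the database avoids shipping every provenance row to the client, which matches the motivation stated for offloading. A production version would also need to escape values that contain quotes.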

Export and Import
Example: part of the final PROV document
[Figure legend: red dotted lines = in DB]

Export and Import
Import
• Import PROV for an existing relation
• Provide a language construct IMPORT PROV FOR ...
• Import available PROV graphs for imported tuples and store them alongside the data
• Add three columns to each table to store imported provenance
  – prov_doc: stores a PROV-JSON snippet representing the tuple's provenance
  – prov_eid: indicates which of the entities in this snippet represents the imported tuple
  – prov_time: stores a timestamp of the time when the tuple was imported
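
The three-column layout can be tried out with SQLite. The sketch below is an assumption-laden illustration, not the IMPORT PROV FOR implementation: the `users` table, the `import_tuple` helper, and the sample PROV-JSON snippet (a tuple entity derived from a tweet entity) are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")

# Add the three provenance columns described on the slide.
for column in ("prov_doc TEXT", "prov_eid TEXT", "prov_time TEXT"):
    conn.execute(f"ALTER TABLE users ADD COLUMN {column}")

def import_tuple(conn, name, age, prov_doc, prov_eid):
    """Store a tuple together with its imported provenance (hypothetical helper)."""
    if prov_eid not in prov_doc.get("entity", {}):
        raise ValueError(f"{prov_eid} is not an entity in the PROV snippet")
    conn.execute(
        "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
        (name, age, json.dumps(prov_doc), prov_eid,
         datetime.now(timezone.utc).isoformat()),
    )

# Hypothetical snippet: the tuple entity ex:u1 was derived from a tweet.
snippet = {
    "entity": {"ex:tweet1": {}, "ex:u1": {}},
    "wasDerivedFrom": {
        "_:d1": {"prov:generatedEntity": "ex:u1", "prov:usedEntity": "ex:tweet1"}
    },
}
import_tuple(conn, "Alice", 17, snippet, "ex:u1")
stored = conn.execute("SELECT prov_doc, prov_eid, prov_time FROM users").fetchone()
print(stored)
```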

Export and Import
Import: example
• Relation user with imported provenance
• Attribute value d is the previous PROV graph without database activities and entities

Export and Import
Using Imported Provenance During Export
• Include the imported provenance as bundles in the generated PROV graph
  – Bundles [13] enable nesting of PROV graphs within PROV graphs, treating a nested graph as a new entity
• Connect the entities representing input tuples in the imported provenance to the query activity and output tuple entities

[13] P. Missier, K. Belhajjame, and J. Cheney. The W3C PROV family of specifications for modelling provenance metadata. In EDBT, pages 773–776, 2013.
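
A rough sketch of how imported provenance could be nested as bundles and linked to a query follows. It assumes the PROV-JSON convention of a top-level `bundle` map. The function name, the bundle identifiers, and the choice of `used` and `wasGeneratedBy` edges for the connection are illustrative assumptions, not the edges GProM necessarily emits.

```python
import json

def export_with_bundles(inputs, query_id, output_id):
    """Build a PROV-JSON-style dict that nests imported provenance as bundles.

    inputs: list of (prov_doc, prov_eid) pairs, one per input tuple, where
    prov_eid names the entity that represents the tuple inside prov_doc.
    """
    doc = {
        "entity": {output_id: {}},
        "activity": {query_id: {}},
        "wasGeneratedBy": {
            "_:wgb1": {"prov:entity": output_id, "prov:activity": query_id}
        },
        "used": {},
        "bundle": {},
    }
    for i, (prov_doc, prov_eid) in enumerate(inputs, start=1):
        doc["bundle"][f"ex:imported{i}"] = prov_doc   # nested PROV graph
        doc["entity"].setdefault(prov_eid, {})        # input tuple entity
        doc["used"][f"_:u{i}"] = {"prov:activity": query_id, "prov:entity": prov_eid}
    return doc

imported = {"entity": {"ex:tweet1": {}, "ex:u1": {}}}
bundled_doc = export_with_bundles([(imported, "ex:u1")], "ex:query", "ex:out_IL")
print(json.dumps(bundled_doc, indent=2))
```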

Export and Import
Example of Bundles

Export and Import
Handling Updates
• If a tuple is modified (e.g., by running an SQL UPDATE statement), this should be reflected when provenance is exported
Example
• Assume the user has run an update to correct tuple t_1's age value (setting age to 70) before running the query

Export and Import
Challenge
• How to track the provenance of updates under transactional semantics
Solution
• GProM uses the novel concept of reenactment queries
  – The user can request the provenance of a past update, transaction, or set of updates executed within a given time interval
  – The PROV document is constructed using provenance for updates computed on-the-fly
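
The general idea of simulating an update with a query can be illustrated on the age-correction example. The sketch below is a strong simplification and is not GProM's reenactment machinery: it assumes a single UPDATE, keeps the pre-update state in a hypothetical `users_before` copy instead of using the database's history, and ignores transactions. All table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (tid TEXT, name TEXT, age INTEGER);
INSERT INTO users VALUES ('t1', 'Alice', 17), ('t2', 'Bob', 35);
-- Hypothetical snapshot of the state before the update.
CREATE TABLE users_before AS SELECT * FROM users;
-- The update whose provenance is requested later.
UPDATE users SET age = 70 WHERE tid = 't1';
""")

# A query over the old state that reproduces the effect of the update.
# The extra column records which old tuple each new tuple was derived from.
reenacted = conn.execute("""
    SELECT tid, name,
           CASE WHEN tid = 't1' THEN 70 ELSE age END AS age,
           tid AS derived_from
    FROM users_before
    ORDER BY tid
""").fetchall()

actual = conn.execute("SELECT tid, name, age FROM users ORDER BY tid").fetchall()
print(reenacted)
print(actual)
```

Because the simulated result matches the updated table, provenance computed for the query can stand in for the provenance of the update.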

Experimental Results
TPC-H [14] benchmark datasets
• Scale factor from 0.01 to 10 (10 MB up to 10 GB in size)
Run on a machine with
• 2 x AMD Opteron 3.3 GHz processors
• 128 GB RAM
• 4 x 1 TB 7.2K RPM disks configured in RAID 5
Queries
• Provenance of a three-way join between relations customer, order, and nation
• With additional selection conditions to control selectivity (and, thus, the size of the exported PROV-JSON document)

[14] TPC. TPC-H Benchmark Specification, 2009.

Experimental Results
[Figure panels: 1 GB, 10 GB]

Conclusions and Future Work
Conclusions
• Integrated import and export of provenance represented as PROV-JSON into/from provenance-aware databases
• Construct PROV graphs on-the-fly using SQL
• Connect database provenance to imported PROV data
Future Work
• Full implementation for updates
• Automatic storage management (e.g., deduplication) for imported provenance
• Automatic cross-referencing

Questions
My Webpage –
Our Group's Webpage –
GProM –

Others
• Provenance querying
• Provenance for JSON