Instant Karma: Collecting Provenance for AMSR-E
Beth Plale, Director, Data to Insight Center, Indiana University
Helen Conover, Information Technology and Systems Center, University of Alabama in Huntsville
Joint AMSR-E Science Team Meeting, June 2-3, 2010, Huntsville, AL

Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream
PI: Michael Goodman, NASA MSFC

Objective
– Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
– Customize and integrate Karma, a proven provenance tool, into NASA data production
– Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice
– Engage the Sea Ice science team and user community

Approach
– Adhere to the Open Provenance Model (OPM)
– Apply Karma to Sea Ice data production workflows
– Customize Karma’s provenance dissemination user interface
– Evaluate usefulness of provenance collected
  – Measure traffic to Karma Provenance Browser
  – Collect user feedback
  – Expand use of Karma to other AMSR-E data production streams

Co-Is/Partners
Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Rahul Ramachandran and Helen Conover, UAHuntsville
TRL in = 7; TRL current = 7; 11/09

Key Milestones
– Evaluate current AMSR-E SIPS product generation: 06/10
– Extend Karma provenance collection tools for SIPS: 09/10
– Enhance Karma Provenance Browser interface: 10/10
– Instrument AMSR-E Sea Ice production in Testbed: 12/10
– Evaluate with Sea Ice science team: 03/11
– Introduce Provenance Browser to NSIDC DAAC: 06/11
– Instrument AMSR-E Sea Ice production in Ops: 09/11
– Evaluate with AMSR-E Sea Ice user community: 02/12
– Instrument other AMSR-E data streams: 02/12

Types of Provenance Information
– Lots of information is already available, but scattered across multiple locations:
  – Processing system configuration
  – Dataset- and file-level metadata
  – Processing history information
  – Quality assurance information
  – Software documentation (e.g., algorithm theoretical basis documents, release notes)
  – Data documentation (e.g., guide documents, README files)
– The Instant Karma project aims to collate and organize information from these multiple sources

Sea Ice Processing Flow and Dependencies
[Diagram labels: one day’s worth of Level-2A Tbs; Delivered Algorithm Package (Sea Ice); Daily Processing Script; Sea Ice Products at 6.25 km, 12.5 km and 25 km; Snow Depth on Sea Ice Product; Sea Ice Concentration; Snow depth over Sea Ice; Multi-year Ice Mask; Default Multi-year Ice Mask; Snow Melt Mask.]
– The Snow Melt Mask is a 5-day running average that is updated and replaced daily.
– Masks generated yesterday are used for today’s products.
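The mask dependency described above (a 5-day running average, replaced daily, with yesterday’s mask feeding today’s products) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SIPS code: the class name `SnowMeltMask`, the representation of each day’s input as a flat list of per-cell values, and the plain element-wise mean are all hypothetical.

```python
from collections import deque
from datetime import date, timedelta

WINDOW_DAYS = 5  # 5-day running average, as stated on the slide


class SnowMeltMask:
    """Hypothetical rolling mask: element-wise mean of the last WINDOW_DAYS daily fields."""

    def __init__(self):
        self.window = deque(maxlen=WINDOW_DAYS)  # the oldest day drops out automatically
        self.current = None       # only one mask is kept; each update replaces it
        self.generated_on = None  # date on which the current mask was generated

    def ingest(self, day, field):
        """Add one day's per-cell values and regenerate (replace) the mask."""
        self.window.append(field)
        n = len(self.window)
        self.current = [sum(cell) / n for cell in zip(*self.window)]
        self.generated_on = day

    def mask_for_products_on(self, day):
        """Products for `day` use the mask generated on the previous day, if there is one."""
        if self.generated_on == day - timedelta(days=1):
            return self.current
        return None  # no mask from yesterday, e.g. because input data did not arrive


mask = SnowMeltMask()
for d in range(1, 7):  # six days of toy data: values 1.0 .. 6.0 in a single cell
    mask.ingest(date(2010, 1, d), [float(d)])
print(mask.mask_for_products_on(date(2010, 1, 7)))  # [4.0], the mean of days 2-6
```

In this sketch, a day’s products must be generated before that day’s `ingest` call, because the new mask replaces yesterday’s one; a gap in the input days shows up as a missing mask.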

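Karma provenance collection and representation
[Diagram: Karma provenance collection and representation, and the Karma analysis tool suite and portal; the legend marks components that may optionally be installed in the future.]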

AMSR-E daily processing workflow
– The workflow executes once for each day of input files received
– Uses configuration files, data files and mask files
– Invokes processes, programs and algorithms
– Generates data files and images
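The once-per-day trigger can be sketched as grouping the received input files by the day they cover and planning one workflow run per day. A minimal sketch; the `(filename, day)` tuples, the file names and the function name `plan_daily_runs` are hypothetical, and how the day is derived from a file is left open.

```python
from collections import defaultdict
from datetime import date


def plan_daily_runs(received):
    """Group received input files by the day they cover: one workflow run per day.

    `received` is a list of (filename, day) pairs.
    """
    by_day = defaultdict(list)
    for filename, day in received:
        by_day[day].append(filename)
    # Runs are returned in chronological order, each with a sorted list of inputs.
    return [(day, sorted(files)) for day, files in sorted(by_day.items())]


received = [
    ("l2a_tb_b.hdf", date(2010, 7, 14)),
    ("l2a_tb_a.hdf", date(2010, 7, 14)),
    ("l2a_tb_c.hdf", date(2010, 7, 15)),
]
for day, inputs in plan_daily_runs(received):
    print(day, inputs)
```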

Karma 3.0 architecture
[Diagram labels: instrumented apps (Axis2, other); Prov Track lib; Client Toolkit; WS-Messenger (WSM) bus (future); Subscriber Interface (provenance listener); synchronous ingest web service; Notification Ingester Interface; Ingester Implementer Interface; relational store; database setup script; Query Service (RESTful service, Axis2); query client; Graphviz client; Preserv client and preservation object; OPM 1.0 XML events; OPM 1.0 RDF/XML; knowledge discovery (inferencing, quality, completeness); XRegistry (optional); XMC Cat metadata catalog (optional).]

Karma Architecture
– Service Core
  – Bridge pattern, so that the Ingester and IngesterImplementer can be implemented independently
  – Core components for ingesting notifications
  – Asynchronously shreds raw notifications to populate tables
– Axis2 Web Service Layer
  – API layer that ingests notifications pushed by clients
  – Also allows another layer to ingest notifications by pulling them from the message bus
– Axis2 Handlers
  – Gather information by intercepting SOAP messages from host services
  – Minimally intrusive, lightweight instrumentation
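The Bridge pattern named above separates the ingester abstraction from its storage implementation, so either side can change without touching the other. A minimal Python sketch: the class names mirror the slide’s Ingester and IngesterImplementer, but their methods, the notification layout, the in-memory and SQLite implementers, and the table layout are hypothetical and are not Karma’s actual API.

```python
import sqlite3
from abc import ABC, abstractmethod


class IngesterImplementer(ABC):
    """Implementor side of the bridge: how ingested records are stored."""

    @abstractmethod
    def store(self, kind, subject, obj):
        """Persist one provenance relationship."""

    @abstractmethod
    def count(self):
        """Return the number of stored relationships."""


class InMemoryImplementer(IngesterImplementer):
    def __init__(self):
        self.rows = []

    def store(self, kind, subject, obj):
        self.rows.append((kind, subject, obj))

    def count(self):
        return len(self.rows)


class SqliteImplementer(IngesterImplementer):
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS edge (kind TEXT, subject TEXT, obj TEXT)")

    def store(self, kind, subject, obj):
        with self.conn:  # commits the insert
            self.conn.execute("INSERT INTO edge VALUES (?, ?, ?)", (kind, subject, obj))

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]


class Ingester:
    """Abstraction side: accepts notifications and delegates storage to an implementer."""

    def __init__(self, implementer):
        self.implementer = implementer

    def ingest(self, notification):
        # Hypothetical notification shape: {"kind": ..., "subject": ..., "object": ...}
        self.implementer.store(notification["kind"], notification["subject"],
                               notification["object"])


note = {"kind": "used", "subject": "daily_run_20100714", "object": "snow_melt_mask"}
for impl in (InMemoryImplementer(), SqliteImplementer()):
    Ingester(impl).ingest(note)
    print(type(impl).__name__, impl.count())
```

The same `Ingester` code runs unchanged against either store; only the implementer passed to it differs.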

Scavenging for Stand-alone Provenance Collection
– Collects provenance using scavenging
  – Uses existing collection mechanisms, e.g., logging and auditing tools
  – Low burden on both users and programmers
– Comparison of collection approaches:
  – User annotation: low application burden, high human burden; error rates and omissions lead to incomplete information
  – Scavenging: low application burden, low human burden; could have incompleteness
  – Full instrumentation: high application burden, low human burden; complete information
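Scavenging reuses records that existing tools already produce. As a hedged illustration, the sketch below turns log lines from a hypothetical processing script into OPM-style `used`/`wasGeneratedBy` tuples. The log format (`<timestamp> <process> READ|WROTE <file>`), the file names and the function name are assumptions, not the actual AMSR-E SIPS logs. Lines that do not match are skipped, which is one way scavenged provenance ends up incomplete.

```python
import re

# Hypothetical log format: "<timestamp> <process> READ|WROTE <file>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<proc>\S+) (?P<op>READ|WROTE) (?P<file>\S+)$")


def scavenge(lines):
    """Extract provenance edges from existing log output."""
    edges = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # unrecognized lines are ignored, so the result may be incomplete
        if m["op"] == "READ":
            edges.append(("used", m["proc"], m["file"], m["ts"]))
        else:
            edges.append(("wasGeneratedBy", m["file"], m["proc"], m["ts"]))
    return edges


log = [
    "2010-07-14T02:00:00 seaice_daily READ tb_l2a_20100714.hdf",
    "2010-07-14T02:00:01 seaice_daily READ snow_melt_mask_20100713.dat",
    "2010-07-14T02:05:00 seaice_daily WROTE seaice_25km_20100714.hdf",
    "2010-07-14T02:05:02 seaice_daily some free-text warning",
]
for edge in scavenge(log):
    print(edge)
```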

Open Provenance Model (OPM)
– Karma is generic and stand-alone
  – Not coupled to any particular system
– Karma 3.0 utilizes OPM v1.01 to represent the provenance graph
  – OPM is a standard (v1.01.pdf)
  – Enables provenance information exchange with other OPM-compliant tools
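An OPM graph is built from three node kinds (artifact, process, agent) and a small set of causal edge types, each allowed only between particular node kinds. The sketch below encodes those endpoint constraints for `used`, `wasGeneratedBy`, `wasControlledBy`, `wasTriggeredBy` and `wasDerivedFrom`, with edges pointing from effect to cause as in OPM. The class and method names are illustrative, and the sketch omits OPM features such as roles, accounts and time annotations.

```python
from dataclasses import dataclass, field

# Allowed (effect kind, cause kind) for each OPM causal edge type.
EDGE_RULES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class OpmGraph:
    nodes: dict = field(default_factory=dict)  # node id -> kind
    edges: list = field(default_factory=list)  # (edge type, effect id, cause id)

    def add_node(self, node_id, kind):
        if kind not in ("artifact", "process", "agent"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, edge_type, effect, cause):
        expected = EDGE_RULES[edge_type]
        actual = (self.nodes[effect], self.nodes[cause])
        if actual != expected:
            raise ValueError(f"{edge_type} must link {expected}, got {actual}")
        self.edges.append((edge_type, effect, cause))


g = OpmGraph()
g.add_node("daily_run", "process")
g.add_node("tb_file", "artifact")
g.add_node("seaice_25km", "artifact")
g.add_node("operator", "agent")
g.add_edge("used", "daily_run", "tb_file")
g.add_edge("wasGeneratedBy", "seaice_25km", "daily_run")
g.add_edge("wasControlledBy", "daily_run", "operator")
print(len(g.edges))  # 3
```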

Types of Provenance Information

Types of Provenance Information (2)
[1] launches
  – Whom: user ID or name
  – What: service (e.g., service URI)
  – When: launch context, time
[2] consumes and [3] produces
  – File (e.g., file URL, owner)
  – Service: program, algorithm version
[4] invokes
  – Invoking service
  – Invoked service
  – Parameters
  – Results/faults
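The four relationship types above can be mapped onto OPM edges. The sketch below shows one possible mapping; it is an assumption for illustration, not Karma’s documented behavior: `launches` becomes `wasControlledBy` (the launched service is controlled by the launching user), `consumes` becomes `used`, `produces` becomes `wasGeneratedBy`, and `invokes` becomes `wasTriggeredBy` (the invoked service is triggered by the invoking one). The `Notification` fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    kind: str    # "launches", "consumes", "produces" or "invokes"
    source: str  # user, service, or invoking service
    target: str  # service, file, or invoked service
    attributes: dict = field(default_factory=dict)  # e.g. time, file URL, algorithm version


def to_opm_edge(n):
    """Return (edge type, effect, cause, attributes) under an assumed mapping."""
    if n.kind == "launches":  # user launches service -> service wasControlledBy user
        return ("wasControlledBy", n.target, n.source, n.attributes)
    if n.kind == "consumes":  # service consumes file -> service used file
        return ("used", n.source, n.target, n.attributes)
    if n.kind == "produces":  # service produces file -> file wasGeneratedBy service
        return ("wasGeneratedBy", n.target, n.source, n.attributes)
    if n.kind == "invokes":   # caller invokes callee -> callee wasTriggeredBy caller
        return ("wasTriggeredBy", n.target, n.source, n.attributes)
    raise ValueError(f"unmapped notification kind: {n.kind}")


print(to_opm_edge(Notification("produces", "seaice_daily", "seaice_25km.hdf",
                               {"algorithm_version": "v1"})))
```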

Additional Types of Provenance Information Captured by Karma
– Execution status
  – Terminated or failed
– Transfer of data
  – Sending of results
  – Receiving of results
– Workflow and program lifecycles
– Unknown notifications
  – Stored as raw notifications
– Forthcoming: spatial and temporal information, simple and complex data values, quality information
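Notifications that are not recognized are kept as raw records rather than discarded. A hedged sketch of such a shredding step, using an in-memory SQLite database: the table names, the JSON notification layout and the set of known type names are assumptions for illustration, not Karma’s schema.

```python
import json
import sqlite3

# Assumed names for the notification types this shredder understands.
KNOWN_TYPES = {"launches", "consumes", "produces", "invokes", "executionStatus"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (kind TEXT, source TEXT, target TEXT, status TEXT)")
conn.execute("CREATE TABLE raw_notification (body TEXT)")


def shred(raw):
    """Store a recognized notification as a typed row; keep anything else verbatim."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        msg = None  # not even valid JSON
    with conn:  # commit each insert
        if isinstance(msg, dict) and msg.get("type") in KNOWN_TYPES:
            conn.execute(
                "INSERT INTO event VALUES (?, ?, ?, ?)",
                (msg["type"], msg.get("source"), msg.get("target"), msg.get("status")),
            )
        else:
            conn.execute("INSERT INTO raw_notification VALUES (?)", (raw,))


shred('{"type": "executionStatus", "source": "seaice_daily", "status": "Failed"}')
shred('{"type": "customEvent", "detail": "not modeled"}')
shred("<legacy-xml/>")
events = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
raws = conn.execute("SELECT COUNT(*) FROM raw_notification").fetchone()[0]
print(events, raws)  # 1 2
```

Keeping the raw body means a later version of the shredder could still parse notifications that were unknown when they arrived.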

Partial provenance graph for the sea ice product run of 14 July 2010 (attribute data is incomplete)
[Diagram labels: Santa Daily; execution date, start date and end date; product name; Processing_type = sea ice; sea ice 12 mask; brightness files; L3 25 km, 12 km and 6 km sea ice products; edges used, WasControlledBy and WasGeneratedBy; placeholder attributes such as URI, generation time, file name, service URI, execution time and version number; Value = 14 July 2010.]

Provenance graph for sea ice product
[Diagram labels: input files, mask file, sea ice file.]

Provenance used to explain the difference between the images of 1/28/2010 and 2/09/2010 as a change in the sea mask due to missing data (underlined in blue in the lower graph)
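The comparison on this slide can also be sketched as a diff of what two runs used: inputs that are identical drop out, and what remains points to a candidate cause. The slide compares two provenance graphs visually; the run records, role names and file names below are hypothetical, as is the `ignore` option for inputs that are expected to change every day.

```python
def diff_inputs(run_a, run_b, ignore=()):
    """Compare the inputs two runs used, keyed by role; return only the roles that differ."""
    roles = sorted((set(run_a) | set(run_b)) - set(ignore))
    return {role: (run_a.get(role), run_b.get(role))
            for role in roles if run_a.get(role) != run_b.get(role)}


# Hypothetical inputs recorded for two daily runs (role -> artifact used)
run_0128 = {"brightness_temps": "tb_20100128.hdf", "algorithm": "seaice_v1",
            "sea_mask": "mask_20100127.dat"}
run_0209 = {"brightness_temps": "tb_20100209.hdf", "algorithm": "seaice_v1",
            "sea_mask": "mask_20100205.dat"}

# Day-specific brightness temperatures always differ, so they are ignored here.
print(diff_inputs(run_0128, run_0209, ignore={"brightness_temps"}))
```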

Example
The provenance visualization is produced from a simulated Karma provenance database. In this use case, its aim is to help the scientist identify which mask file was used and see provenance information about that mask file. The provenance graph gives the user annotated lineage for a sea ice data product: the inputs required for its creation and the files created as a result of processing. Provenance visualization in this form allows deeper examination; e.g., for a recurring error, the scientist can view all related provenance information to trace the source of the error.
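Finding the mask file behind a product amounts to walking the graph backwards from the product along `wasGeneratedBy` and `used` edges. A minimal breadth-first sketch over an OPM-style edge list; the node names are hypothetical, and matching on "mask" in the node name stands in for whatever typing a real catalog would provide.

```python
from collections import deque

# Hypothetical OPM-style edges: (edge type, effect, cause)
EDGES = [
    ("wasGeneratedBy", "seaice_12km_20100714", "daily_run_20100714"),
    ("used", "daily_run_20100714", "tb_20100714"),
    ("used", "daily_run_20100714", "snow_melt_mask_20100713"),
    ("wasGeneratedBy", "snow_melt_mask_20100713", "daily_run_20100713"),
    ("used", "daily_run_20100713", "tb_20100713"),
]


def lineage(product, edges):
    """Return every node the product transitively depends on, in breadth-first order."""
    causes = {}
    for _, effect, cause in edges:
        causes.setdefault(effect, []).append(cause)
    seen, order, queue = {product}, [], deque([product])
    while queue:
        node = queue.popleft()
        for cause in causes.get(node, []):
            if cause not in seen:
                seen.add(cause)
                order.append(cause)
                queue.append(cause)
    return order


ancestors = lineage("seaice_12km_20100714", EDGES)
print([n for n in ancestors if "mask" in n])  # ['snow_melt_mask_20100713']
```

The traversal continues past the mask into the run that generated it, which is the "all related provenance information" a scientist would inspect for a recurring error.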

Ongoing work
– Better graph layout, with detail for each data product and process used in generating a sea ice product
– Give nodes different shapes and colors depending on whether they are input nodes, generated output nodes, etc.
– Let the user add annotations to edges by simply right-clicking on them, thus capturing semantic annotations on the existing causal dependencies
– Forthcoming: spatial and temporal information, simple and complex data values, quality information
– Provenance bundle archived with the data or embedded in the HDF file, in addition to the Karma database
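One way to realize the planned node styling is to emit Graphviz DOT text, choosing shape and color by node role and attaching user annotations to edge labels. A sketch under assumptions: the role names, the style table and the function name are hypothetical; only the DOT attributes `shape`, `style`, `fillcolor` and `label` are standard Graphviz.

```python
# Hypothetical role -> (Graphviz shape, fill color) scheme
STYLES = {
    "input": ("box", "lightblue"),
    "output": ("box", "palegreen"),
    "process": ("ellipse", "khaki"),
    "agent": ("octagon", "lightgrey"),
}


def to_dot(nodes, edges):
    """Render a provenance graph as Graphviz DOT text.

    nodes: {node id: role}; edges: [(edge type, effect, cause, annotation or None)]
    """
    lines = ["digraph provenance {"]
    for node_id, role in nodes.items():
        shape, color = STYLES[role]
        lines.append(f'  "{node_id}" [shape={shape}, style=filled, fillcolor={color}];')
    for edge_type, effect, cause, note in edges:
        # An annotation goes on a second label line; the two characters \n are a DOT line break.
        label = edge_type if note is None else edge_type + r"\n" + note
        label = label.replace('"', r'\"')  # keep the quoted DOT string well-formed
        lines.append(f'  "{effect}" -> "{cause}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


nodes = {"tb_20100714": "input", "daily_run": "process", "seaice_25km": "output"}
edges = [
    ("used", "daily_run", "tb_20100714", None),
    ("wasGeneratedBy", "seaice_25km", "daily_run", "annotated: check mask date"),
]
print(to_dot(nodes, edges))
```

The output can be rendered with the Graphviz `dot` tool; node IDs are not escaped in this sketch, so they are assumed to contain no double quotes.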

AMSR-E Provenance Use Cases
– Browse provenance graphs: convey rich information about final data product details
  – Spatial location, time of observation, algorithms employed, quality propagation
– Answer the “Something isn’t right” question
  – Example illustrated earlier: data were not received for several days, so the mask can be inaccurate
– Provenance “bundle” includes relevant science papers
– New communication satellites interfere with NASA satellites on certain channels
  – Identify the channels affected by radio-frequency interference (RFI) and the channels used to generate each product
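The RFI use case reduces to intersecting the set of channels flagged as affected with the channels recorded in each product’s provenance. A hedged sketch: the channel IDs, the per-product channel lists and the function name are made-up placeholders, not actual AMSR-E channel usage or RFI findings.

```python
def products_at_risk(product_channels, affected):
    """Return {product: sorted affected channels it used} for products that used any."""
    hits = {}
    for product, channels in product_channels.items():
        overlap = set(channels) & set(affected)
        if overlap:
            hits[product] = sorted(overlap)
    return hits


# Placeholder channel IDs; real channel usage would come from each product's provenance.
product_channels = {
    "seaice_12km": ["ch3", "ch5", "ch6"],
    "seaice_25km": ["ch3", "ch5"],
    "snow_depth": ["ch4", "ch5"],
}
print(products_at_risk(product_channels, affected={"ch6"}))  # {'seaice_12km': ['ch6']}
```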