Published by Nelson Short. Modified over 9 years ago.
1
Instant Karma: Collecting Provenance for AMSR-E
Beth Plale, Director, Data to Insight Center, Indiana University
Helen Conover, Information Technology and Systems Center, University of Alabama in Huntsville
Joint AMSR-E Science Team Meeting, June 2-3, 2010, Huntsville, AL
2
Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream
PI: Michael Goodman, NASA MSFC

Objective
– Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
– Customize and integrate Karma, a proven provenance tool, into NASA data production
– Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice

Approach
– Engage the Sea Ice science team and user community
– Adhere to the Open Provenance Model (OPM)
– Apply Karma to Sea Ice data production workflows
– Customize Karma’s provenance dissemination user interface
– Evaluate usefulness of provenance collected
  - Measure traffic to Karma Provenance Browser
  - Collect user feedback
  - Expand use of Karma to other AMSR-E data production streams

Co-Is/Partners
Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Rahul Ramachandran, Helen Conover, UAHuntsville

TRL in = 7   TRL current = 7   11/09

Key Milestones
– Evaluate current AMSR-E SIPS product generation: 06/10
– Extend Karma provenance collection tools for SIPS: 09/10
– Enhance Karma Provenance Browser interface: 10/10
– Instrument AMSR-E Sea Ice production in Testbed: 12/10
– Evaluate with Sea Ice science team: 03/11
– Introduce Provenance Browser to NSIDC DAAC: 06/11
– Instrument AMSR-E Sea Ice production in Ops: 09/11
– Evaluate with AMSR-E Sea Ice user community: 02/12
– Instrument other AMSR-E data streams: 02/12
3
Types of Provenance Information
Lots of information is already available, but it is scattered across multiple locations:
– Processing system configuration
– Dataset and file level metadata
– Processing history information
– Quality assurance information
– Software documentation (e.g., algorithm theoretical basis documents, release notes)
– Data documentation (e.g., guide documents, README files)
The Instant Karma project aims to collate and organize information from multiple sources.
4
Sea Ice Processing Flow and Dependencies
Diagram labels: One day’s worth of Level-2A Tbs; Delivered Algorithm Package (Sea Ice); Daily Processing Script; Sea Ice Products 6.25 km; Sea Ice Products 12.5 km; Sea Ice Products 25 km; Snow Depth on Sea Ice Product; Sea Ice Concentration; Snow depth over Sea Ice; Multi-year Ice Mask; Snow Melt Mask; Default Multi-year Ice Mask
Note: The Snow Melt Mask is a 5-day running average that is updated and replaced daily. Masks generated yesterday are used for today’s products.
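The note on the Snow Melt Mask describes a rolling dependency: the mask is refreshed daily from a 5-day window, and each day's products use the mask generated the day before. A minimal sketch of that bookkeeping follows. It assumes a mask can be treated as a flat list of numbers, that the running average is a plain mean, and that the first day falls back to some default mask; none of this is stated on the slide, and `RunningMask` and `process_days` are hypothetical names.

```python
from collections import deque


class RunningMask:
    """Hypothetical helper: keeps the last `window` daily fields and averages them."""

    def __init__(self, window=5):
        self.days = deque(maxlen=window)  # the oldest day drops out automatically

    def update(self, daily_field):
        """Add today's field and return the refreshed running-average mask."""
        self.days.append(list(daily_field))
        n = len(self.days)
        return [sum(values) / n for values in zip(*self.days)]


def process_days(daily_fields, default_mask):
    """Yield (mask_used, new_mask) per day; day N uses the mask built on day N-1."""
    tracker = RunningMask()
    mask_for_today = default_mask  # assumption: a default mask covers the first day
    for daily_field in daily_fields:
        new_mask = tracker.update(daily_field)
        yield mask_for_today, new_mask
        mask_for_today = new_mask  # replaces yesterday's mask for tomorrow's run
```

Recording which mask was actually used on each day is the kind of dependency a provenance graph would need to capture for this flow.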
5
Diagram labels: Karma provenance collection and representation; Karma analysis tool suite and portal
Legend: [symbol] = Optionally installed in future
6
AMSR-E daily processing workflow
– Workflow executes once per day of input files received
– Uses configuration files, data files, mask files
– Invokes processes, programs, algorithms
– Generates data files, images
7
Karma 3.0 architecture
Diagram labels: Graph Viz client; Query client; Preserv client; Preservation object; Client Toolkit; Prov Track lib; other instrumented apps; Axis 2; RESTful Service; Web service; Query Service; Subscriber Interface (provenance listener); Notification Ingester Interface; Ingester Implementer Interface; Synchronous ingest; Relational store; Database Setup script; WS messenger Bus (future); WSM; OPM 1.0 XML events; OPM 1.0 RDF XML; Knowledge discovery: inferencing, quality, completeness; Xregistry (optional); XMC Cat metadata catalog (optional)
8
Karma Architecture
Service Core
– Bridge pattern for independent Ingester and IngesterImplementer implementation
– Core components for ingesting notifications
– Asynchronously shredding raw notifications to populate tables
Axis2 Web Service Layer
– API layer to ingest notifications pushed by clients
– Also allows another layer to ingest notifications by pulling from the message bus
Axis2 Handlers
– Gather information by intercepting SOAP messages from host services
– Minimal intrusiveness and lightweight instrumentation
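The bridge pattern named for the Service Core keeps the ingest interface and the storage implementation in separate class hierarchies, so either side can change without touching the other. The sketch below illustrates the pattern only. It reuses the names `Ingester` and `IngesterImplementer` from the slide, but the methods, the in-memory backend, and `BatchIngester` are hypothetical and are not Karma's actual API (Karma itself is built on Axis2 rather than Python).

```python
from abc import ABC, abstractmethod


class IngesterImplementer(ABC):
    """Implementation side of the bridge: how a notification is stored."""

    @abstractmethod
    def store(self, notification: str) -> None:
        ...


class InMemoryImplementer(IngesterImplementer):
    """Hypothetical backend standing in for a relational store."""

    def __init__(self):
        self.rows = []

    def store(self, notification: str) -> None:
        self.rows.append(notification)


class Ingester:
    """Abstraction side of the bridge: what clients call to ingest."""

    def __init__(self, implementer: IngesterImplementer):
        self.implementer = implementer

    def ingest(self, notification: str) -> None:
        # Delegate storage to whichever implementer was plugged in.
        self.implementer.store(notification.strip())


class BatchIngester(Ingester):
    """A refined abstraction that varies independently of the backend."""

    def ingest_all(self, notifications) -> int:
        count = 0
        for item in notifications:
            self.ingest(item)
            count += 1
        return count
```

Swapping `InMemoryImplementer` for a database-backed class would require no change to `Ingester` or `BatchIngester`, which is the independence the slide refers to.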
9
Scavenging: for Stand-alone Provenance Collection
Collects provenance using scavenging
– Uses existing collection mechanisms, e.g., logging tool, auditing tool
– Low burden on both users and programmers

                     | User Annotation | Scavenging | Full Instrumentation
Application Burden   | Low | Low | High
Human Burden         | High | Low | Low
Information Quality  | Error rates and omissions lead to incomplete information | Could have incompleteness | Complete
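Scavenging can be pictured as turning output that an application already writes into provenance events, with no change to the application itself. The sketch below assumes a hypothetical log format of `<timestamp> <program> <READ|WRITE> <path>` and an assumed mapping of those actions onto "consumes" and "produces" relationships; real AMSR-E processing logs and Karma's own scavenging tools may look quite different.

```python
import re

# Hypothetical log format: "<timestamp> <program> <READ|WRITE> <path>"
LOG_LINE = re.compile(
    r"^(?P<time>\S+) (?P<program>\S+) (?P<action>READ|WRITE) (?P<path>\S+)$"
)

# Assumed mapping from logged actions to provenance relationships
RELATION = {"READ": "consumes", "WRITE": "produces"}


def scavenge(lines):
    """Turn matching log lines into provenance events; skip everything else."""
    events = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # anything the log did not record in this form is lost
        events.append({
            "time": match["time"],
            "process": match["program"],
            "relation": RELATION[match["action"]],
            "file": match["path"],
        })
    return events
```

Lines that do not match are silently dropped, which illustrates why scavenged provenance "could have incompleteness": it can only be as complete as the logs it draws on.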
10
Open Provenance Model (OPM)
Karma is generic and stand-alone
– Not coupled to any particular system
Karma 3.0 utilizes OPM v1.01 to represent the provenance graph
– OPM is a standard: http://eprints.ecs.soton.ac.uk/16148/1/opm-v1.01.pdf
– Enables provenance information exchange with other OPM-compliant tools
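OPM describes provenance as a directed graph with three kinds of nodes (artifacts, processes, agents) and five kinds of causal edges (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom). The sketch below is one small way to hold such a graph in memory and reject edges that connect the wrong node kinds. The class `OpmGraph` and its methods are hypothetical and say nothing about how Karma stores or serializes OPM.

```python
from dataclasses import dataclass, field

# OPM edge kinds mapped to the (source kind, target kind) they connect.
EDGE_RULES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}
NODE_KINDS = {"artifact", "process", "agent"}


@dataclass
class OpmGraph:
    """Hypothetical in-memory container for an OPM-style graph."""

    nodes: dict = field(default_factory=dict)  # node id -> node kind
    edges: list = field(default_factory=list)  # (edge kind, source id, target id)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, kind, source, target):
        expected = EDGE_RULES[kind]
        actual = (self.nodes[source], self.nodes[target])
        if actual != expected:
            raise ValueError(f"{kind} must link {expected}, got {actual}")
        self.edges.append((kind, source, target))
```

For example, a daily run that reads a brightness temperature file and writes a sea ice product would contribute one `used` edge from the process to the input file and one `wasGeneratedBy` edge from the product to the process.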
11
Types of Provenance Information
12
Types of Provenance Info (2)
[1] launches
– Whom: user ID or name
– What: service, e.g., service URI
– When: launch context, time
[2] consumes and [3] produces
– File (e.g., file URL, owner)
– Service: program, algorithm version
[4] invokes
– Invoking service
– Invoked service
– Parameters
– Results/faults
13
Additional Types of Provenance Information Captured by Karma
– Execution Status
  - Terminated or Failed
– Transfer of Data
  - Sending of results
  - Receiving of results
– Workflow and Program Lifecycles
– Unknown Notifications
  - Stored as raw notifications
Forthcoming: spatial and temporal information, simple and complex data values, quality information
14
Partial provenance graph for sea ice product run of 14 July 2010 (attribute data is incomplete)
Node labels: Execution start date; Execution end date; Execution date; Product name; Santa Daily; Sea ice 12 mask; Brightness file; L3 25km sea ice product; 12km sea ice product; 6km sea ice product
Edge labels: used; WasControlledBy; WasGeneratedBy
Attribute examples:
– Processing_type = sea ice; …
– URI = qqqq; Generation time = xxxx; file name = yyyy
– Service URI = qqqq; Execution time = xxxx; version no. = yyyy
– Value = 14 July 2010
– URI = gggg; Filename = yyyy
15
Provenance graph for sea ice product
Figure labels: Mask file; Sea ice file; Input files
16
Provenance is used to explain the difference between the images of 1/28/2010 and 2/09/2010 as a change in the sea mask caused by missing data (underlined in blue in the lower graph).
17
Example
The provenance visualization is obtained from a simulated Karma provenance database. In this use case, its aim is to help a scientist identify the mask file being used and the provenance information about that mask file.
The provenance graph gives the user annotated lineage for a sea ice data product: the inputs required for its creation and the files created as a result of processing.
Provenance visualization in this form allows for deeper examination.
– E.g., for a recurring error, the scientist can view all related provenance information to get to the source of the error.
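Getting to the source of an error amounts to walking the provenance graph upstream from a suspect product until the inputs and masks that fed it are reached. The sketch below does this with a breadth-first traversal over a small hand-written edge list. The node names and edges are hypothetical, loosely modeled on the labels in the graphs above, and the function does not use Karma's query service.

```python
from collections import deque

# Hypothetical edges, written in OPM direction: effect -> cause
EDGES = [
    ("seaice_12km", "wasGeneratedBy", "daily_run"),
    ("daily_run", "used", "brightness_file"),
    ("daily_run", "used", "seaice_mask"),
    ("seaice_mask", "wasGeneratedBy", "previous_run"),
    ("previous_run", "used", "previous_brightness_file"),
]


def lineage(start, edges):
    """Return every node upstream of `start`, in breadth-first order."""
    upstream = {}
    for source, _label, target in edges:
        upstream.setdefault(source, []).append(target)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for cause in upstream.get(node, []):
            if cause not in seen:  # guard against revisiting shared ancestors
                seen.add(cause)
                order.append(cause)
                queue.append(cause)
    return order
```

Calling `lineage("seaice_12km", EDGES)` lists the run, its input file, the mask, and then the earlier run and input that produced the mask, which is the chain a scientist would inspect when a mask looks wrong.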
18
Ongoing work
– Better graph layout, with detail for each data product and process used in generating a sea ice product
– Give nodes a different shape and color depending on whether they are input nodes, generated output nodes, etc.
– Users will be able to add annotations to edges by right-clicking on them, capturing semantic annotations on the existing causal dependencies
– Forthcoming: spatial and temporal information, simple and complex data values, quality information
– Provenance bundle archived with the data or embedded in the HDF file, in addition to the Karma database
19
AMSR-E Provenance Use Cases
– Browse provenance graphs: convey rich information about final data product details
  - Spatial location, time of observation, algorithms employed, quality propagation
– Answer the “Something isn’t right” question
  - Example illustrated earlier: data was not received for several days, so the mask can be inaccurate
– Provenance “bundle” includes relevant science papers
– New communication satellites interfere with NASA satellites for certain channels
  - Identify channels affected by RFI and channels used to generate each product
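The RFI use case reduces to a set intersection once provenance records which channels went into each product. The sketch below uses placeholder product and channel names throughout; it does not reflect the actual AMSR-E channel assignments or any real list of RFI-affected channels.

```python
# Hypothetical data: product and channel names are placeholders
CHANNELS_USED = {
    "product_a": {"ch1", "ch2", "ch3"},
    "product_b": {"ch3", "ch4"},
    "product_c": {"ch5"},
}
RFI_AFFECTED = {"ch2", "ch4"}


def products_at_risk(channels_used, rfi_affected):
    """Map each product to the sorted list of its input channels hit by RFI."""
    at_risk = {}
    for product, channels in channels_used.items():
        hit = sorted(channels & rfi_affected)
        if hit:  # products with no affected channel are left out
            at_risk[product] = hit
    return at_risk
```

With the placeholder data, `products_at_risk(CHANNELS_USED, RFI_AFFECTED)` flags `product_a` and `product_b` and leaves `product_c` out.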