Instant Karma: Accessing Provenance Information for AMSR-E Science Data Products AMSR-E Science Team Meeting 29 June 2011 UAHuntsville The University of Alabama in Huntsville
Instant Karma Overview 229 June 2011 Data Center Operations Provenance Research Earth Science Instant Karma Project Collaboration among – AMSR-E SIPS (MSFC Earth Science Office and UAHuntsville ITSC) – Provenance researchers at Indiana University’s Data to Insight Center – AMSR-E Sea Ice science team (GSFC) AMSR-E Science Team Meeting Asheville, NC Primary goal is to improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community – Using Karma provenance tool – Initial focus on Sea Ice processing
Team Members NameOrganizationRole Michael GoodmanNASA MSFCPrincipal Investigator Thorsten Markus Don Cavalieri Alvaro Ivanoff NASA GSFCCo-Investigator – Science Team Science Algorithm Developer Science Software Developer Helen Conover Bruce Beaumont Anurag Bhaskar Ajinkya Kulkarni Michael McEniry Kathryn Regner Cara Stein UAHuntsville Information Technology & Systems Center Co-Investigator – Project Manager AMSR-E SIPS Implementation Student, Metadata Utilities Provenance Browser Systems Administration AMSR-E SIPS Systems Engineer Provenance Browser Beth Plale Mehmet Aktas Scott Jensen Harsh Joshi Robert Ping Prajakta Purohit Yuan Luo Indiana University Data to Insight Center Co-Investigator – Karma System Karma Database and Services Provenance Collection Services Project Management Provenance Collection Services Karma System Monitoring Dawn ConwayUAHuntsville Earth System Science AMSR-E Science Metadata Consultant 29 June AMSR-E Science Team Meeting Asheville, NC
Key Responsibility at Data Creation Archivists are traditionally responsible for curation (i.e., adding metadata and provenance) but not present at creation of scientific data. Moment of creation is where most knowledge about product is present. To be effective, requires tools in earliest stages of data’s life that help with preservation. 29 June Diagram from Berman et al. “Sustainable Economics for a Digital Planet” AMSR-E Science Team Meeting Asheville, NC Create Long Term Access Archive Distribute Preservation Action
Instant Karma Provenance Research How to’s of provenance capture are becoming better understood Collecting right (and only “right”) provenance is a harder challenge, and one we hope to make progress on in this project Delivering on representation and use case support simultaneously for today’s users and next generation uses is ongoing challenge – Sea ice research, climate, classroom, informatics June AMSR-E Science Team Meeting Asheville, NC
Types of Provenance Information Data lineage: product generation “recipe” (data inputs, software and hardware) Additional knowledge about science algorithms, instrument variations, etc. Lots of information already available, but scattered across multiple locations – Processing system configuration – Dataset and file level metadata – Processing history information – Quality assurance information – Software documentation (e.g., algorithm theoretical basis documents, release notes) – Data documentation (e.g., guide documents, README files) Instant Karma project aims to collate and organize information from multiple sources 629 June 2011 AMSR-E Science Team Meeting Asheville, NC
SIPS-GHRC Processing Architecture Algorithm Packages from the Science Team and Team Lead of the Science Computing Facility Processing automation controlled by SIPS scripts – Pass processing is data driven – L3 product generation is scheduled after nominal availability of input products 7 L2A Brightness Temps L2B Ocean L2B Land L2B Rain Pass Script Ocean Land Snow Sea Ice Daily Script Snow Pentad Script Ocean Weekly Script Ocean Rain Snow Monthly Script Input Data and Metadata File(s) Ancillary File(s) Science Data Metadata Processing History Quality Assurance Browse imagery Custom subsets Delivered Algorithm Package Control Script 29 June 2011 AMSR-E Science Team Meeting Asheville, NC
Sea Ice Product Generation 8 One day’s worth of Level-2A Tbs Ice Mask Snow Melt Mask Sea Ice Products (6, 12, and 25 km) Metadata Processing History Quality Assurance Browse imagery Custom subsets Delivered Algorithm Package Daily Processing Script Sea Ice Algorithms C and FORTRAN Provided by Science Team C and FORTRAN Provided by Science Team Perl with C Provided by SIPS Perl with C Provided by SIPS Bourne shell Provided by SCF Bourne shell Provided by SCF Hardware Platform Operating System, Compilers Software Libraries 29 June 2011 AMSR-E Science Team Meeting Asheville, NC
25 km grid resolution; polar stereographic grid 12.5 km grid resolution; polar stereographic grid 6.25 km grid resolution; polar stereographic grid Brightness temperatures 6, 10, 18, 23, 36, 89 GHz 18, 23, 36, 89 GHz89 GHz Sea ice concentration NASA Team 2 (NT2) Bootstrap – NT2 NASA Team 2 (NT2) Bootstrap – NT2 Snow depth on sea ice 5-day product Except for the snow depth product, all products daily averages using (a) ascending orbits only, (b) descending orbits only, (c) all orbits. AMSR-E Standard Sea Ice Products 29 June AMSR-E Science Team Meeting Asheville, NC
Perspectives on Processing Lineage 1029 June 2011 Data Processing ViewProvenance View AMSR-E Science Team Meeting Asheville, NC
AMSR-E Provenance Use Cases Browse provenance graphs : convey rich information about final data granule details [Use case 1] – Spatial location, time of observation, algorithms employed, input data and ancillary files – Provenance bundle to include pointers to relevant documentation Answer “Something isn’t right” question [Use case 1 variant] -E.g., did not receive data for several days so snow melt mask may be inaccurate. Compare two data granules [Use case 2] -Query system to get list of provenance differences (e.g., versions of software, number and versions of input files) General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3] -Current algorithms and versions, nominal number and versions of input files, pointers to relevant documentation 1129 June 2011 AMSR-E Science Team Meeting Asheville, NC
Karma Provenance Repository Provenance Collection, Storage and Access Instrumented AMSR-E SIPS processing workflow for sea ice in the testbed environment. – Provenance information is captured in experiment run log files – Log files are parsed to generate provenance notifications. – These notifications are then imported into the Karma database. – The Karma Service Query API is used to generate OPM- compatible XML graphs, each corresponding to a processing run. 29 June Ocean Weekly Script Ocean Rain Snow Monthly Script Snow Pentad Script Ocean Land Snow Sea Ice Daily Script L2A Brightness Temps Pass Script L2B Ocean L2B Land L2B Rain Provenance Collection Query API Processing Testbed AMSR-E Science Team Meeting Asheville, NC Provenance Browser
Additional Science-Relevant Provenance and Context Information Harvesting granule information from ECS metadata Also recording processing location associated with each data granule Working with AMSR-E Science Computing Facility to identify algorithm and data product information – Algorithm versions and descriptions – Parameters and data fields – Ancillary files – Flag values and explanations – Pointers to full documentation Defining how to harvest, transmit and display this information 29 June AMSR-E Science Team Meeting Asheville, NC
Provenance Browser for AMSR-E Products Initial Prototype – Interactive browsing of provenance graphs and information about workflows and final data products – Uses Karma Service Query API to extract provenance graphs from the Karma provenance repository. 29 June AMSR-E Science Team Meeting Asheville, NC
Instant Karma Major Milestones 29 June AMSR-E Science Team Meeting Asheville, NC
Project Schedule 29 June 2011 AMSR-E Science Team Meeting Asheville, NC 16
Near Term Plans Begin routine daily Sea Ice processing with provenance collection in Testbed Enhance context metadata available to Karma system Refine Provenance Browser and Karma Query API Develop joint Karma project / SIPS approach for instrumenting additional processing workflows Work with other NASA Earth science provenance projects to develop common approaches to providing provenance information with science data files Begin to engage user community – Present to AMSR-E Science Team – Outreach to Sea Ice community at GSFC – Work with NSIDC DAAC to identify friendly users for beta testing 1729 June 2011 AMSR-E Science Team Meeting Asheville, NC
Conclusions Instant Karma project is on track to incorporate provenance capture into the AMSR-E SIPS Ready to begin testing the AMSR-E Provenance Browser with the wider user community Continuing to work with NASA ESDIS, ESDSWG and ESIP Federation in defining common provenance practices across NASA data systems 29 June 2011 AMSR-E Science Team Meeting Asheville, NC 18
Instant Karma Project Contacts Instant Karma Project Web Site – – AMSR-E Provenance Browser – Contacts – Michael Goodman – Helen Conover – Beth Plale – Thorsten Markus 29 June 2011 AMSR-E Science Team Meeting Asheville, NC 19
Acronyms AMSR-EAdvanced Microwave Scanning Radiometer for EOS APIApplication Programming Interface DAACDistributed Active Archive Center ECSEOSDIS Core System EOSEarth Observing System EOSDISEarth Observing System Data and Information System ESDSWGEarth Science Data Systems Working Group ESIPEarth Science Information Partners GSFCGoddard Space Flight Center ITSCInformation Technology & Systems Center MSFCMarshall Space Flight Center MODISModerate Resolution Imaging Spectroradiometer NASANational Aeronautics and Space Administration NSIDCNational Snow and Ice Data Center OPMOpen Provenance Model SCFScience Computing Facility SIPSScience Investigator-led Processing System UAHuntsvilleUniversity of Alabama in Huntsville XMLeXtensible Markup Language 29 June AMSR-E Science Team Meeting Asheville, NC