Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE.

Slides:



Advertisements
Similar presentations
GRADD: Scientific Workflows. Scientific Workflow E. Science laboris Workflows are the new rock and roll of eScience Machinery for coordinating the execution.
Advertisements

ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
SAN DIEGO SUPERCOMPUTER CENTER Choonhan Youn Viswanath Nandigam, Nancy Wilkins-Diehr, Chaitan Baru San Diego Supercomputer Center, University of California,
As computer network experiments increase in complexity and size, it becomes increasingly difficult to fully understand the circumstances under which a.
A Java Architecture for the Internet of Things Noel Poore, Architect Pete St. Pierre, Product Manager Java Platform Group, Internet of Things September.
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
The Experience Factory May 2004 Leonardo Vaccaro.
1 genSpace: Community- Driven Knowledge Sharing for Biological Scientists Gail Kaiser’s Programming Systems Lab Columbia University Computer Science.
GenSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work Chris Murphy, Swapneel Sheth, Gail Kaiser, Lauren.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment Chapter 11: Monitoring Server Performance.
Interpret Application Specifications
Lecture Nine Database Planning, Design, and Administration
Sponsored by the National Science Foundation netKarma Spiral 2 Year-end Project Review Indiana University Beth Plale (PI) School of Informatics and Computing.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Microsoft Access Ervin Ha.
Špindlerův Mlýn, Czech Republic, SOFSEM Semantically-aided Data-aware Service Workflow Composition Ondrej Habala, Marek Paralič,
Apache Airavata GSOC Knowledge and Expertise Computational Resources Scientific Instruments Algorithms and Models Archived Data and Metadata Advanced.
Discussion and conclusion The OGC SOS describes a global standard for storing and recalling sensor data and the associated metadata. The standard covers.
© Paradigm Publishing, Inc. 5-1 Chapter 5 Application Software Chapter 5 Application Software.
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space The Capabilities of the GridSpace2 Experiment.
Computers Are Your Future Tenth Edition Chapter 12: Databases & Information Systems Copyright © 2009 Pearson Education, Inc. Publishing as Prentice Hall1.
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
, Increasing Discoverability and Accessibility of NASA Atmospheric Science Data Center (ASDC) Data Products with GIS Technology ASDC Introduction The Atmospheric.
, Implementing GIS for Expanded Data Accessibility and Discoverability ASDC Introduction The Atmospheric Science Data Center (ASDC) at NASA Langley Research.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
 DATABASE DATABASE  DATABASE ENVIRONMENT DATABASE ENVIRONMENT  WHY STUDY DATABASE WHY STUDY DATABASE  DBMS & ITS FUNCTIONS DBMS & ITS FUNCTIONS 
Software for Science Gateways: Open Grid Computing Environments Marlon Pierce, Suresh Marru Pervasive Technology Institute Indiana University
Presented by Abirami Poonkundran.  Introduction  Current Work  Current Tools  Solution  Tesseract  Tesseract Usage Scenarios  Information Flow.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 11: Monitoring Server Performance.
material assembled from the web pages at
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Through the development of advanced middleware, Grid computing has evolved to a mature technology in which scientists and researchers can leverage to gain.
11 CORE Architecture Mauro Bruno, Monica Scannapieco, Carlo Vaccari, Giulia Vaste Antonino Virgillito, Diego Zardetto (Istat)
Phase II Additions to LSG Search capability to Gene Browser –Though GUI in Gene Browser BLAST plugin that invokes remote EBI BLAST service Working set.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 11: Monitoring Server Performance.
DATABASE MANAGEMENT SYSTEMS CMAM301. Introduction to database management systems  What is Database?  What is Database Systems?  Types of Database.
MANAGING DATA RESOURCES ~ pertemuan 7 ~ Oleh: Ir. Abdul Hayat, MTI.
GO-ESSP Workshop, LLNL, Livermore, CA, Jun 19-21, 2006, Center for ATmosphere sciences and Earthquake Researches Construction of e-science Environment.
EUROGRID – An Integrated User–Friendly Grid System Hans–Christian Hoppe, Karl Solchenbach A Member of the ExperTeam Group Pallas GmbH Hermülheimer Straße.
A Practical Approach to Metadata Management Mark Jessop Prof. Jim Austin University of York.
Switch off your Mobiles Phones or Change Profile to Silent Mode.
Recording Actor Provenance in Scientific Workflows Ian Wootten, Shrija Rajbhandari, Omer Rana Cardiff University, UK.
A Demonstration of Collaborative Web Services and Peer-to-Peer Grids Minjun Wang Department of Electrical Engineering and Computer Science Syracuse University,
August 2003 At A Glance The IRC is a platform independent, extensible, and adaptive framework that provides robust, interactive, and distributed control.
Development of e-Science Application Portal on GAP WeiLong Ueng Academia Sinica Grid Computing
Earth System Curator and Model Metadata Discovery and Display for CMIP5 Sylvia Murphy and Cecelia Deluca (NOAA/CIRES) Hannah Wilcox (NCAR/CISL) Metafor.
Don’t Duck Metadata March 2005 Introducing Setting Up a Clearinghouse Node Topic: Introduction to Setting Up a Clearinghouse Node Objective: By.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Partnerships in Innovation: Serving a Networked Nation Grid Technologies: Foundations for Preservation Environments Portals for managing user interactions.
XMC Cat: An Adaptive Catalog for Scientific Metadata Scott Jensen and Beth Plale School of Informatics and Computing Indiana University-Bloomington Current.
Chapter 2 Database Environment.
OGCE Workflow and LEAD Overview Suresh Marru, Marlon Pierce September 2009.
Ocean Observatories Initiative OOI Cyberinfrastructure Life Cycle Objectives Review January 8-9, 2013 Scientific Workflows for OOI Ilkay Altintas Charles.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
Holding slide prior to starting show. Lessons Learned from the GECEM Portal David Walker Cardiff University
Climate-SDM (1) Climate analysis use case –Described by: Marcia Branstetter Use case description –Data obtained from ESG –Using a sequence steps in analysis,
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Business System Development
Database Management:.
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
University of Technology
Chapter 2 Database Environment Pearson Education © 2009.
Database Environment Transparencies
Code Analysis, Repository and Modelling for e-Neuroscience
Introduction to DBMS Purpose of Database Systems View of Data
Code Analysis, Repository and Modelling for e-Neuroscience
Data Management Components for a Research Data Archive
Presentation transcript:

Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE Ames Laboratory 1

Outline Motivation Background: Scientific workflow system Provenance capturing for scientific applications Previous work on data management system for MFDn LCCI upstream codes and MFDn Workflow system in LCCI Potential provenance integration for MFDn Conclusion 2 2

Motivation We want to increase nuclear physics scientific discovery in several ways : Sustainability : to ensure the codes, runs, results obtainable by the future users. Provenance : to be able to get the information of the existing large run in a form which the new method can be compared. Easy of entry : to lower the learning curve for getting young people involved, to avoid the inappropriate usage on super computers. A trustful provenance system is needed to record a multi-step data generation process, which allows to reuse the data product and data setting. 3

What is a Scientific Workflow? A workflow consists of a sequence of connected steps. A Scientific Workflow system is designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, in a scientific application. Domain specified workflow systems are: Linked Environments for Atmospheric Discovery (LEAD) Project ( GridChem Computational Chemistry Grid ( 4

Major components in a workflow system Compute Hosts – where the application will be executed Application Services – Web service wrappers to command line science applications Service Hosts – where application services, service factory and registry service are running Scientific Workflows – tasks graphs of a combination of application services Application Provider – who constructs the application graphs End User – who is interested on the outcome of the workflow 5

General Workflow System Architecture 6

Tools – OGCE Workflow Suite A domain independent workflow suite facilitates users to securely share their applications as web services and construct workflows with these services. Suite includes a toolkit to Wrap tasks as web service A service registry A workflow composition – xBaya Enactment and monitoring GUI These tools are supported through Open Grid Computing Environments (OGCE) project ( 7

What is the Provenance for a scientific application? The provenance of a scientific application is when a collection was created, by whom, where and more experimental related information. Provenance can identify event causality; enable broader forms of sharing, reuse, and long-term preservation of scientific data; can be used to attribute ownership; determine the quality of a particular data set ; insure the trust of the data set. 8

Provenance Tools Karma is a provenance capturing system that originally worked in cooperation with the GPEL workflow system in LEAD gateway. Karma is developed at Indiana University from the mid-2000’s, and it evolves to be a standalone system, independent of any workflow system and free of the workflow systems have. Karma supports three ways to collect the provenance: User annotation Scavenging Full Provenance Instrumentation 9

Logical architecture of a Provenance system 10

Projects using Karma Instant Karma: applying a proven provenance tool to NASA’s AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) data production stream. Collect and disseminate provenance of AMSR-E standard data products, initially focusing on Sea Ice. ( ser/view). netKarma: exploring networks of the future ( 11

A Data Management System for MFDn An efficient tool for retrieving output from Ab- Initio CI calculations, the first step towards the provenance capturing Record not only results, but also now, when and where those results are obtained. Record meta-data of every run in database Data: results from LCCI ab-initio codes, typically stored on platforms where runs are performed Meta-data: key information about each run, consolidated and formatted in the.info file 12

System Design and Components 13 Remote supercomutpers Local server /projects/mfdn.info DBM code Database on Local server Web Server Client (www user) 1. DB Manageer: parses the mfdn.info file and inserts the run record to Database 2.Web based front end: searches and lists existing run 3.DB Server: stores all the related metadata for each of MFDn run

Key Contributions It addresses provenance issue for computational nuclear physics community. It records the experimental information for a reproducible research in which the computational experiments can be run by others. The system is online since 08/2010 at A paper is published at HPC 2011, Boston, MA 14

Current Ab initio NCSM Flowcharts 15

From a Data Management System to a Workflow System Using a database to store the provenance information for each participated software components. Workflow constructed with those components can also be recorded in a data management system. Provenance information will be extended to not only including the activities of individual component but also the combinations of the provenance information of the components. 16

Demo workflow on upstream codes and MFDn M2 Gosc Mfdn Heff M2 A B C D E 17

Detailed data flow between components Heff M2 Gosc Mfdn M2v9,.dat/M ZRO.SAV.gz mfdn.dat Heff_LS.dat gosc.dat Bare_inputRC M_matrices Veff_outputRCM _matrice Heff_LS.out m2.out0 m2tout0 gosc.out Output_altfmt _vxx mfdn.out 18

Turn into a Workflow Mode 19

Provenance Capturing in MFDn Based on existing mfdn.info file, a Karma adaptor will be written to convert the two column text file into Open provenance model compliant database schema Other related tools will be used to retrieve the data later. To record the provenance for both upstream and MFDn codes including the dependency between those codes, Karma tool will be used with OGCE toolkit. 20

Conclusion A workflow approach is adapted to LCCI code to demonstrate the feasibility of the project. It is first step towards the fully provenance capturing in nuclear physics community. An introduction of the scientific workflow and provenance capturing system is made to the community. 21