The Astronomy challenge: How can workflow preservation help? Susana Sánchez, Jose Enrique Ruíz, Lourdes Verdes-Montenegro, Julian Garrido, Juan de Dios.

Slides:



Advertisements
Similar presentations
Open repositories: value added services The Socionet example Sergey Parinov, CEMI RAS and euroCRIS.
Advertisements

DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
© 2006 Open Grid Forum The Astro Community and DCIs in Europe and the role of Astro-CG C. Vuerli - INAF.
Kensington Oracle Edition: Open Discovery Workflow Meets Oracle 10g Professor Yike Guo.
CSIRO ASKAP Science Data Archive (CASDA) Project Kick-Off IM&T AND CASS Dan Miller| Project Manager 17 July 2014.
SEVENPRO – STREP KEG seminar, Prague, 8/November/2007 © SEVENPRO Consortium SEVENPRO – Semantic Virtual Engineering Environment for Product.
Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS Ravi K Madduri University of Chicago and ANL.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
Advanced Data Mining and Integration Research for Europe ADMIRE – Framework 7 ICT ADMIRE Overview European Commission 7 th.
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
A Semantic Workflow Mechanism to Realise Experimental Goals and Constraints Edoardo Pignotti, Peter Edwards, Alun Preece, Nick Gotts and Gary Polhill School.
Sean Making Metadata Work, ISKO London, 23 rd June 2014 Metadata for Research Objects 1.
Desired Quality Characteristics in Cloud Application Development Leah Riungu-Kalliosaari.
Špindlerův Mlýn, Czech Republic, SOFSEM Semantically-aided Data-aware Service Workflow Composition Ondrej Habala, Marek Paralič,
Susana Sánchez Expósito Instituto de Astrofísica de Andalucía - CSIC Pablo Martin, Jose Enrique Ruiz, Lourdes Verdes-Montenegro, Julian Garrido, Raül Sirvent,
Commissioning the NOAO Data Management System Howard H. Lanning, Rob Seaman, Chris Smith (National Optical Astronomy Observatory, Data Products Program)
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
The CAVES Project Collaborative Analysis Versioning Environment System The CODESH Project COllaborative DEvelopment SHell Dimitri Bourilkov University.
A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.
Integrated e-Infrastructure for Scientific Facilities Kerstin Kleese van Dam STFC- e-Science Centre Daresbury Laboratory
Preserving the Scientific Record: Preserving a Record of Environmental Change Matthew Mayernik National Center for Atmospheric Research Version 1.0 [Review.
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Markus Dolensky, ESO Technical Lead The AVO Project Overview & Context ASTRO-WISE ((G)A)VO Meeting, Groningen, 06-May-2004 A number of slides are based.
Systems Design Approaches The Waterfall vs. Iterative Methodologies.
Wf4Ever: Preserving workflows as digital Research Objects EGI Community Forum 2012, Workflow Systems workshop Leibniz Supercomputing Centre, Münich,
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
E-Science for the SKA WF4Ever: Supporting Reuse and Reproducibility in Experimental Science Lourdes Verdes-Montenegro* AMIGA and Wf4Ever teams Instituto.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
FEA DRM Management Strategy Presented by : Mary McCaffery, US EPA.
Scientific Data Management - From the Lab to the Web Semantic Data Management Dagstuhl Seminar April 2012 José Manuel Gómez Pérez, iSOCO
CERN-PH-SFT-SPI August Ernesto Rivera Contents Context Automation Results To Do…
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
Deepcarbon.net Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox Tetherless World Constellation Rensselaer Polytechnic.
DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research Bob Cook Environmental Sciences Division Oak Ridge National.
Infrastructures for Social Simulation Rob Procter National e-Infrastructure for Social Simulation ISGC 2010 Social Simulation Tutorial.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Interoperability from the e-Science Perspective Yannis Ioannidis Univ. Of Athens and ATHENA Research Center
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
1 Curation and Characterization of Web Services Jose Enrique Ruiz October 23 rd IVOA Fall Interop Meeting - Sao Paolo.
Electronic labnotes Mari Wigham COMMIT/. Information WUR  Organising, sharing, finding and reusing data  Expertise in: ● Modelling data.
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
1 COMPUTER SCIENCE DEPARTMENT COLORADO STATE UNIVERSITY 1/9/2008 SAXS Software.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM Virtual Geophysics Laboratory (VGL): Scientific workflows Exploiting the Cloud Josh.
Knowledge Modeling and Discovery. About Thetus Thetus develops knowledge modeling and discovery infrastructure software for customers who: Have high-value.
1 Pioneer Investments Legal and Compliance System Assessment Weekly Status Update June 23, 2005.
Infrastructure Breakout What capacities should we build now to manage data and migrate it over the future generations of technologies, standards, formats,
The 10 Best Practices for Workflow Design BioVeL M6 Workshop Göteborg, May 10-11, 2012 Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble.
E ARTHCUBE C ONCEPTUAL D ESIGN A Scalable Community Driven Architecture Overview PI:
Data Management: Data Processing Types of Data Processing at USGS There are several ways to classify Data Processing activities at USGS, and here are some.
1 This Changes Everything: Accelerating Scientific Discovery through High Performance Digital Infrastructure CANARIE’s Research Software.
1 Workflows Access and Massage VO Data José Enrique Ruiz on behalf of the Wf4Ever Team IVOA INTEROP SPRING MEETING 2013 HEIDELBERG, MAY16th 2013.
Virtual Research Environments as-a-Service Donatella Castelli CNR-ISTI EGI Conference 2016, 6-8 April.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Introduction: AstroGrid increases scientific research possibilities by enabling access to distributed astronomical data and information resources. AstroGrid.
WP1:Definition & Production of the GRDI2020 Roadmap Roadmap Report To address the Technological, Organizational and Policy problems which hinder the building.
Research Objects Preserving scientific data and methods Stian Soiland-Reyes, Khalid Belhajjame School of Computer Science, Univ of Manchester myGrid NIHBI.
Enhancements to Galaxy for delivering on NIH Commons
RDA US Science workshop Arlington VA, Aug 2014 Cees de Laat with many slides from Ed Seidel/Rob Pennington.
University of Chicago and ANL
Donatella Castelli CNR-ISTI
Joseph JaJa, Mike Smorul, and Sangchul Song
Leigh Grundhoefer Indiana University
e-Science for the Square Kilometre Array
Presentation transcript:

The Astronomy challenge: How can workflow preservation help? Susana Sánchez, Jose Enrique Ruíz, Lourdes Verdes-Montenegro, Julian Garrido, Juan de Dios Santander-Vela and Wf4Ever team. Instituto de Astrofísica de Andalucía – CSIC SHIWA Summer School. 3 July 2012

2 Summary » Introduction to AMIGA group » The astronomy challenge and context of Wf4Ever project » How Wf4ever tools can help the astronomers » Our astronomy use case

3 AMIGA Group Introduction to AMIGA group AMIGA Analysis of the interstellar Medium of Isolated GAlaxies An international collaboration coordinated from the IAA-CSIC P.I. Lourdes Verdes-Montenegro Statistical baseline of isolated galaxies to compare with the behavior of galaxies in denser environments Multi-λ study of ~1000 galaxies: Need of intensive and complex analysis of multidimensional data

4 Past (or current?) situation The astronomy challenge and context of Wf4Ever project Analysis Python Science Ready Data

5 The astronomy challenge and context of Wf4Ever project LHC – Tier 1 Antenna Correlation

6 A disruptive change in the methodology is needed The astronomy challenge and context of Wf4Ever project GRID Super Computer Cloud Analysis Tools for: Sharing Inspecting Visualizing Discovering Searching

7 The efficient use of the data and the production of reliable science. The astronomy challenge and context of Wf4Ever project » Keywords: efficient use of data and reliable science › Scientific Workflows: Enable automation and expose the flow of scientific methods Encourage best practices in packing the experiment Provide a way to share the experiment › But more is needed: Reusability, fundamental for incremental scientific development Reproducibility, key for reliable science  Preserve the data and the scientific method Workflow preservation

8 The astronomy challenge and context of Wf4Ever project Wf4Ever project Wf4Ever - Preservation of scientific workflows in data-intensive science

9 The aim of Wf4Ever The astronomy challenge and context of Wf4Ever project Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines Encapsulate the scientific methodology (the workflows and all the associated information) in an artefact called Research Object. Archival, classification and indexing of the research object in scalable semantic repositories, providing advanced access and recommendation capabilities based on monitoring and metrics to evaluate similarities, decay, quality, stability, completeness. Creation of scientific communities to collaboratively share, reuse and evolve Research Objects stimulating the development of new scientific knowledge Use Cases: Astronomy (IAA) Genome-wide Analysis and Biobanking (LUMC)

10 -Inputs (files or reference) -Scripts -Workflows -Documentation -Web services Sharing the experiment How Wf4ever tools can help the astronomers scientists Completeness Rating Downloads 36 Citations [2] Re-used [1] Comments [4] Keywords [galaxies][catalogs] [Previous version | Next version]

11 How to import the experiment How Wf4ever tools can help the scientists INTEROPERABILITY

12 How to import the experiment How Wf4ever tools can help the scientists Extract !

13 Describing the experiment How Wf4ever tools can help the scientists

14 Visualizing the experiment How Wf4ever tools can help the scientists A good understanding of the experiment is key for the reusability and repurposability

15 Checking the live experiment How Wf4ever tools can help the scientists Stability: Changes made by different kind of users on the RO, can improve it or make it worse Completeness: It contains all the resources needed to be run, published, shared or repeated Runnable Publishable Shareable Repeatable Service up, software working, etc. The output can be reproduced Enough annotation. Service up, software working and all well commented

16 Checking the published experiment How Wf4ever tools can help the scientists Decay: The health of the RO: state of the services (up or down), of the applications (updated or deprecated), permissions to access the input data Decay Information Last check was performed 2 days ago and returned one error: The service SDSS-DR7, needed by the workflow Calculate_galaxy_distances is down Check now Try to repair Tracking: Rating by other users, who used the RO, comments, etc. Rating Downloads 36 Citations [2] Re-used [1] Comments [4]

17 Discovering an experiment How Wf4ever tools can help the scientists Search by keywords Past ratings Ratings of similar users Past ratings Ratings of similar users

18 Functionalities of the RO and tools How Wf4ever tools can help the scientists REUSABILITY AND REPURPOSABILITY Annotations for the description of the whole and its components Visualization of the relationships between the components Versioning of the whole or its components SHARING Restricted access on data and processes Ensuring authorship and allow to RO to be cited REPRODUCIBILITY Completeness checking Decay monitoring and notification, reacting to decay Execution and interoperability support DISCOVERY Semantic discovery of ROs, processes, web services Recommendation capabilities.

19 Why I am here as student Our astronomy Use Case Deploy web-services-based workflows for analysis of multidimensional data on heterogeneous e-infrastructures ASKAP Cubes Prof. Kevin Vinsen Standards for publishing and accessing astronomical data (Service Oriented Architecture) Data provider  analysis service provider Cloud Super Computer GRID VIRTUAL OBSERVATORY

20 Susana Sánchez: Questions