Virtual Geophysics Laboratory (VGL): Exploiting the Cloud and HPC

Slides:



Advertisements
Similar presentations
Configuration management
Advertisements

Strategies for solving scientific problems using computers.
MINERALS DOWN UNDER Using Spatial Data Infrastructure (SDI) to: enable interoperable data exchange & usage to deliver corporate decisions Ryan Fraser 19.
Ryan Fraser (CSIRO), Lesley Wyborn (GA), Richard Chopping (GA), Terry Rankine (CSIRO), Robert Woodcock (CSIRO) MINERALS DOWN UNDER Virtual Geophysics Laboratory.
Web Accessible Virtual Research Environment for Ecosystem Science Community Presentation by Siddeswara Guru.
Integrated Scientific Workflow Management for the Emulab Network Testbed Eric Eide, Leigh Stoller, Tim Stack, Juliana Freire, and Jay Lepreau and Jay Lepreau.
Virtual Geophysics Laboratory (VGL) VGL v1.1 Launch Ryan Fraser, Terry Rankine, Joshua Vote, Lesley Wyborn, Ben Evans, Robert Woodcock February 2013 CSIRO.
Virtual Geophysics Laboratory (VGL) VGL v1.2 NeCTAR Project Close R.Fraser, T.Rankine, J.Vote, L.Wyborn, B.Evans, R.Woodcock, C.Kemp July 2013 CSIRO |
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
Virtual Geophysics Laboratory Exploiting the Cloud and Empowering Geophysicists Ryan Fraser, Terry Rankine, Lesley Wyborn, Joshua Vote, Ben Evans. Presented.
Ryan Fraser, Nicholas Carr Connecting RDSI, NeCTAR and ANDS via Provenance - VHIRL.
 To explain the importance of software configuration management (CM)  To describe key CM activities namely CM planning, change management, version management.
National Earth Science Infrastructure Program AuScope Limited Headquarters School of Earth Sciences University of Melbourne Victoria 3010 Tel
Configuration Management (CM)
Linking AuScope to the broader minerals industry value chain Jonathan Law, Robert Woodcock, Ryan Fraser, Terry Rankine, Guillaume Duclaux.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM The Spatial Information Services Stack – infrastructure for the AuScope Community Earth.
Virtual Geophysics Laboratory Scientific workflows exploiting the cloud Ryan Fraser, Terry Rankine, Lesley Wyborn, Joshua Vote, Ben Evans... Presented.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM Virtual Geophysics Laboratory (VGL): Scientific workflows Exploiting the Cloud Josh.
National Spatial Data Infrastructure The Spatial Information Services Stack Dr Robert Woodcock.
Virtual Laboratories VGL and Friends R.Fraser, T.Rankine, J.Vote, R.Woodcock AuScope Grid Roadshow 2014 CSIRO | MINERAL RESOURCES FLAGSHIP.
Digital Earth Communities GEOSS Interoperability for Weather Ocean and Water GEOSS Common Infrastructure Evolution Roberto Cossu ESA
Future Directions MINERALS DOWN UNDER SISS – Spatial Information Services Stack Ryan Fraser| Project Lead 20 th March 2012.
1 Ilkay ALTINTAS - July 24th, 2007 Ilkay ALTINTAS Director, Scientific Workflow Automation Technologies Laboratory San Diego Supercomputer Center, UCSD.
Lessons from SEEGrid/AuScope Grid Bruce Simons GeoScience Victoria.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
AuScope Spatial Data Infrastructure Supporting Earth Science Dr Robert Woodcock CSIRO.
Technical Session Virtual Geophysics Laboratory MINERAL RESOURCES Josh Vote | Software Developer September 2014.
Using Biological Cyberinfrastructure Scaling Science and People: Applications in Data Storage, HPC, Cloud Analysis, and Bioinformatics Training Scaling.
240-Current Research Easily Extensible Systems, Octave, Input Formats, SOA.
The Astronomy challenge: How can workflow preservation help? Susana Sánchez, Jose Enrique Ruíz, Lourdes Verdes-Montenegro, Julian Garrido, Juan de Dios.
Virtual Geophysics Laboratory (VGL) VGL v1.2 NeCTAR Project Close Ryan Fraser, Terry Rankine, Joshua Vote, Lesley Wyborn, Ben Evans, Robert Woodcock July.
Mantid Stakeholder Review Nick Draper 01/11/2007.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM Virtual Geophysics Laboratory (VGL): Scientific workflows Exploiting the Cloud Josh.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM The NCRIS AuScope Community Earth Model Bruce Simons.
| nectar.org.au NECTAR TRAINING Module 2 Virtual Laboratories and eResearch Tools.
Web Technologies Lecture 13 Introduction to cloud computing.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
AN ORGANISATION FOR A NATIONAL EARTH SCIENCE INFRASTRUCTURE PROGRAM “Building Clients for the AuScope Spatial Information Services Stack (SiSS)” AuScope.
EUROPEAN UNION Polish Infrastructure for Supporting Computational Science in the European Research Space The Capabilities of the GridSpace2 Experiment.
Nci.org.au © National Computational Infrastructure 2016 Virtual Laboratories in Australia Lesley Wyborn (NCI) With contributions from:
Enabling Digital Earth by focussing on ‘accessibility’ rather than ‘delivery’. Ryan Fraser CSIRO.
CyVerse Workshop Discovery Environment Overview. Welcome to the Discovery Environment A Simple Interface to Hundreds of Bioinformatics Apps, Powerful.
Accessing the VI-SEEM infrastructure
USGS EROS LCMAP System Status Briefing for CEOS
Laboratory Information Management Systems (LIMS)
Cloud EO Big Data processing
Customer Support Strategic Pillars
14 Compilers, Interpreters and Debuggers
RDA US Science workshop Arlington VA, Aug 2014 Cees de Laat with many slides from Ed Seidel/Rob Pennington.
MIRACLE Cloud-based reproducible data analysis and visualization for outputs of agent-based models Xiongbing Jin, Kirsten Robinson, Allen Lee, Gary Polhill,
Tools and Services Workshop
University of Chicago and ANL
Joslynn Lee – Data Science Educator
CyVerse Discovery Environment
Infrastructure Orchestration to Optimize Testing
Data Flows in ACTRIS: Considerations for Planning the Future
Recap: introduction to e-science
A Web-enabled Approach for generating data processors
Demonstrator Stuart Girvan – Geoscience Australia
Introduction to Cloud Computing
Project tracking system for the structure solution software pipeline
VHIRL: Virtual Hazard Impact and Risk Laboratory – “A reuse story”
Code Analysis, Repository and Modelling for e-Neuroscience
Dtk-tools Benoit Raybaud, Research Software Manager.
Visualizing the Attracting Structures Results and Conclusions
Overview of Workflows: Why Use Them?
The Interoperable Watersheds Network
Code Analysis, Repository and Modelling for e-Neuroscience
Final Review 27th March Final Review 27th March 2019.
Presentation transcript:

Virtual Geophysics Laboratory (VGL): Exploiting the Cloud and HPC Ryan Fraser (CSIRO), Lesley Wyborn (GA), Richard Chopping (GA), Terry Rankine (CSIRO), Robert Woodcock (CSIRO) Minerals down under

Scientific workflow – Virtual Geophysics Laboratory (VGL) Scientific Workflow Engine (or Virtual Laboratory) Automates and massively expands (GA) Geophysicists computational capacity via the Cloud Amazon – EC2 / S3 (and others using this interface) OpenStack Collaboration between CSIRO, GA, NCI, Monash, UQ, and ANU VGL is just a pretty face User Driven GUI Leverages data providers and cloud technologies to do all the heavy lifting Non-Restrictive Open-source Ask audience about their level of experience with the cloud (or general HPC) Explain the cloud (if required) Computation Storage

V(what)GL VEGL – Virtual Exploration Geophysics Laboratory Done VEGL – Virtual Exploration Geophysics Laboratory One primary science collaboration One primary workflow One collection of geophysical data sets VGL - Virtual Geophysics Laboratory NeCTAR funded activity Collaboration with multiple partners (CSIRO, NCI, GA, UQ, Monash, ANU) Supporting multiple workflows New data sets New data types New Use – Not just exploration. What is enabling workflow?? Its not about you, its about the next person

Geophysics (as seen by a software dev) Apply Mathematics Raw data Geophysics is taking physical measurements… Magnetism over an area Acceleration due to gravity over an area …Applying lots of mathematics… …to infer the structure of what is under the surface… It is not geology Samples are never taken, only measurements Crash course in geophysics Integration – where I tuned out But -- Where is all the ????

Our Geophysics Problem Measurements coming from the field are ‘raw’ Varying spatial reference systems Noisy Artefacts from collection process This data needs processing From raw data to a data product Data products are valuable They will be re-used and referenced repeatedly Processing is a time consuming process Made worse by a purely manual workflow Scientific models usually require strict input conditions

The Past Compile raw data using proprietary FORTRAN Also use software – Intrepid Transform to a regular grid using more software MATLAB, Intrepid, ER Mapper, ESRI ArcGIS, QGIS Crop data spatially to suit final data product eg: everything in Victoria Transform data into a file format that can be read by proprietary scientific code. This is usually done with some handwritten python or c There is no version control, code is often rewritten / redone Upload data to HPC Manually enter input parameters/start job

Let’s map it out… Hardcopy of data Intrepid MATLAB Get handed field data Visualise data Transform to a regular grid Download results Crop data to area of interest Reformat data for processing SSH Client Configure job and start processing Upload data to NCI

There seems to be a problem… Reproducibility – there is none What was the input of your model? What transformations occurred? It’s a manual process Time consuming Error prone Expensive Licensing costs If you are making big decisions – you want confidence in your evidence Data products can be interpretations - subjective – even more important to know its source Documentation

The Recent Past - Our solution Virtual Exploration Geophysics Laboratory - http://siss1.anu.edu.au/VEGL- Portal Provenance (“I hate this word”) All input data is saved and then published with the final data product VEGL automates portions of the workflow Allowing scientists to focus on science Built entirely on open source tools No licensing costs

Obligatory architecture diagram CLOUD

Data Discovery All data discovered via registry Data providers can be from anywhere in the world Easy to integrate from multiple organisations you can get more than geophysics information. You can get drillholes and sample information that could be used to constrain the geophysical inversion. There are advantages to having auscope infrastructure make finding all available data simpler.

Data Selection Data can be subset easily, reprojected and reformatted (work done by data providers)

Script Builder Processing of the data involves file format conversion science!! Building blocks for common tasks Reading/Writing to cloud storage This is a “SHARED” script library -> shared improvements AND fixes Version controlled for governance etc

Job Monitoring Multiple jobs can be run and monitored separately Results can be published (Provenance information)

Published provenance records This final step persists the results AND how we got there * ticks data quality and reproducibility boxes

From this… Hardcopy of data Intrepid MATLAB Get handed field data Visualise data Transform to a regular grid Download results Crop data to area of interest Reformat data for processing SSH Client Configure job and start processing Upload data to NCI

Virtual Geophysics Laboratory …to this Virtual Geophysics Laboratory Select spatial bounds Build “science” from existing libraries Discover raw data Run job Collect and publish results

VEGL – Point-of-View benefits User: Data all accessible in one place Same science but MUCH more efficient Bigger scale, quicker Repeatability Developer/Tech: Its concepts can be re-used for other scientific workflows It can produce actual scientific data products Is capable of integrating with any SISS (OGC) data provider The power lies with the underlying services, accessed using standardised protocols Its concepts can be re-used for other scientific workflows Is in the process of being deployed at GA It can produce actual scientific data products Is capable of integrating with any SISS data provider Or any provider that understands the OGC standards It’s built from many ‘generic’ components that can be repurposed Is just a pretty face The power lies with the underlying services These services are accessed using standardised protocols 1) Steps 1 and 2, best summed up as Data prep (includes synthesising phys prop data as we can do that from within vegl already, just would need to enable access to borehole data) 2) Lets us do the same science much more efficiently. Hard to ensure repeatability if all the data prep, selection on various parameters etc is done by individuals without tracking of what they last did. Previously keeping track of all of it came from written notes etc rather than it being recorded somewhere automatically. Also a big problem with the inversions is people correctly or incorrectly preparing data: lots of ways to get it wrong, a lot less ways to do it right.

NOW - Opportunities: Virtual Geophysics Laboratory Collaboration with multiple partners – CSIRO, GA, NCI, Monash, UQ, ANU Supporting multiple workflows Model Registry (3D) New Scientific Codes – Underworld, eScript, UBC, Airborne EM inversion codes New data sets from GA: National Airborne Geophysical DB including Gravity, Radiometric, AEM, Magnetics New Use – Broad application.

Opportunities Exploiting the generic Modularising the workflow for general scientific usage Repurposing for other use cases – nature hazards, climate prediction, etc Commercial uptake Integration with other VLs to achieve ultimate aim Supercomputing - Pawsey Centre, NCI Cloud – NeCTAR, NCI and commercial providers NeCTAR VGL funded – opportunity to use and/or repurpose the platform for other uses Essentially $20M capital expenditure for geosciences directed straight to HPC

Thank you and for more information: http://siss1.anu.edu.au/VEGL-Portal/ https://twiki.auscope.org/wiki/Grid/VEGLPortalDevelopment CSIRO Earth Science & Resource Engineering Ryan Fraser Project Lead Phone: +61 8 6436 8760 Email: ryan.fraser@csiro.au Web: www.csiro.au siss.auscope.org