1
Virtual Geophysics Laboratory (VGL): Exploiting the Cloud and HPC
Ryan Fraser (CSIRO), Lesley Wyborn (GA), Richard Chopping (GA), Terry Rankine (CSIRO), Robert Woodcock (CSIRO)
Minerals Down Under
2
Scientific workflow – Virtual Geophysics Laboratory (VGL)
Scientific Workflow Engine (or Virtual Laboratory)
Automates and massively expands GA geophysicists' computational capacity via the cloud
  Amazon EC2 / S3 (and others using this interface)
  OpenStack
Collaboration between CSIRO, GA, NCI, Monash, UQ, and ANU
VGL is just a pretty face: a user-driven GUI
Leverages data providers and cloud technologies to do all the heavy lifting
Non-restrictive, open source
Ask the audience about their level of experience with the cloud (or general HPC); explain the cloud (if required)
Computation / Storage
3
V(what)GL
Done: VEGL – Virtual Exploration Geophysics Laboratory
  One primary science collaboration
  One primary workflow
  One collection of geophysical data sets
VGL – Virtual Geophysics Laboratory
  NeCTAR-funded activity
  Collaboration with multiple partners (CSIRO, NCI, GA, UQ, Monash, ANU)
  Supporting multiple workflows
  New data sets
  New data types
  New use – not just exploration
What is the enabling workflow? It's not about you, it's about the next person.
4
Geophysics (as seen by a software dev)
Raw data → apply mathematics
Geophysics is taking physical measurements…
  Magnetism over an area
  Acceleration due to gravity over an area
…applying lots of mathematics…
…to infer the structure of what is under the surface.
It is not geology: samples are never taken, only measurements.
Crash course in geophysics. Integration – where I tuned out. But – where is all the ????
5
Our Geophysics Problem
Measurements coming from the field are 'raw':
  Varying spatial reference systems
  Noisy
  Artefacts from the collection process
This data needs processing: from raw data to a data product.
Data products are valuable – they will be re-used and referenced repeatedly.
Processing is time consuming, made worse by a purely manual workflow.
Scientific models usually require strict input conditions.
6
The Past
Compile raw data using proprietary FORTRAN; also use software such as Intrepid.
Transform to a regular grid using more software: MATLAB, Intrepid, ER Mapper, ESRI ArcGIS, QGIS.
Crop data spatially to suit the final data product (e.g. everything in Victoria).
Transform data into a file format that can be read by proprietary scientific code. This is usually done with hand-written Python or C; there is no version control, and code is often rewritten or redone (a sketch of this kind of script follows below).
Upload data to HPC.
Manually enter input parameters and start the job.
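As an illustration only – not VGL or GA code – a minimal sketch of the kind of hand-written, unversioned cropping/reformatting script described above, assuming an ASCII "x y value" grid file and an arbitrary bounding box:

```python
# Hypothetical one-off script of the kind described above: crop an ASCII
# "x y value" grid to a bounding box and rewrite it in a fixed-width format
# expected by a proprietary inversion code. File names and the bounding box
# are illustrative assumptions, not VGL specifics.
MIN_X, MAX_X = 140.0, 150.0   # rough longitude range for Victoria, degrees
MIN_Y, MAX_Y = -39.5, -34.0   # rough latitude range for Victoria, degrees

with open("gravity_raw.xyz") as src, open("gravity_cropped.dat", "w") as dst:
    for line in src:
        parts = line.split()
        if len(parts) != 3:
            continue                      # skip headers or malformed rows
        x, y, value = map(float, parts)
        if MIN_X <= x <= MAX_X and MIN_Y <= y <= MAX_Y:
            dst.write(f"{x:12.4f}{y:12.4f}{value:14.6f}\n")
```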
7
Let's map it out…
Tools: hardcopy of data, Intrepid, MATLAB, SSH client
Get handed field data → visualise data → transform to a regular grid → crop data to area of interest → reformat data for processing → upload data to NCI → configure job and start processing → download results
8
There seems to be a problem…
Reproducibility – there is none.
  What was the input of your model?
  What transformations occurred?
It's a manual process: time consuming, error prone, expensive (licensing costs).
If you are making big decisions, you want confidence in your evidence.
Data products can be interpretations – subjective – so it is even more important to know their source.
Documentation.
9
The Recent Past - Our solution
Virtual Exploration Geophysics Laboratory – Portal
Provenance ("I hate this word"): all input data is saved and then published with the final data product.
VEGL automates portions of the workflow, allowing scientists to focus on science.
Built entirely on open-source tools – no licensing costs.
10
Obligatory architecture diagram
11
Data Discovery
All data is discovered via a registry.
Data providers can be from anywhere in the world.
Easy to integrate data from multiple organisations.
You can get more than geophysics information: you can get drillhole and sample information that could be used to constrain the geophysical inversion.
There are advantages to having AuScope infrastructure – it makes finding all available data simpler.
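As a hedged illustration of registry-based discovery (the catalogue endpoint and search keyword below are assumptions, not VGL configuration), querying an OGC CSW catalogue with OWSLib might look something like this:

```python
# Illustrative sketch of discovering data sets through an OGC CSW registry,
# in the spirit of the registry-driven discovery described above.
# The endpoint URL and search keyword are hypothetical placeholders.
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

csw = CatalogueServiceWeb("http://example.org/geonetwork/srv/eng/csw")  # hypothetical registry
query = PropertyIsLike("csw:AnyText", "%gravity%")                      # free-text match
csw.getrecords2(constraints=[query], maxrecords=10)

for record_id, record in csw.records.items():
    # Each record carries the metadata needed to locate the actual data service.
    print(record_id, record.title)
```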
12
Data Selection
Data can be subset easily, reprojected and reformatted (work done by the data providers).
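To show the kind of request that pushes subsetting, reprojection and reformatting onto the data provider, here is a hedged sketch of an OGC WCS GetCoverage call; the service URL, coverage name, bounding box and resolution are assumptions, not real VGL endpoints:

```python
# Sketch of an OGC WCS 1.0.0 GetCoverage request that asks the *provider* to
# subset, reproject and reformat the data, as described above. The endpoint,
# coverage name, bounding box and resolution are illustrative only.
import requests

params = {
    "service": "WCS",
    "version": "1.0.0",
    "request": "GetCoverage",
    "coverage": "onshore_gravity",        # hypothetical coverage name
    "bbox": "140.0,-39.5,150.0,-34.0",    # minx,miny,maxx,maxy in the requested CRS
    "crs": "EPSG:4326",                   # spatial reference system of the request
    "resx": "0.01",                       # output cell size (degrees)
    "resy": "0.01",
    "format": "GeoTIFF",
}
response = requests.get("http://example.org/geoserver/wcs", params=params, timeout=60)
response.raise_for_status()
with open("gravity_subset.tif", "wb") as out:
    out.write(response.content)
```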
13
Script Builder
Processing of the data involves file format conversion… and science!
Building blocks for common tasks
Reading/writing to cloud storage
This is a "SHARED" script library → shared improvements AND fixes
Version controlled for governance, etc.
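As a hedged sketch of the kind of reusable cloud-storage building block such a shared script library might contain (bucket and key names are assumptions; this is not the VGL library itself), using boto3 against S3-compatible storage:

```python
# Illustrative building-block functions for reading/writing job files in
# S3-compatible cloud storage, of the kind a shared script library might
# provide. Bucket name, key layout and job ids are hypothetical.
import boto3

s3 = boto3.client("s3")

def upload_input(local_path, bucket, job_id, name):
    """Stage an input file under a per-job prefix before the job starts."""
    s3.upload_file(local_path, bucket, f"jobs/{job_id}/input/{name}")

def download_output(bucket, job_id, name, local_path):
    """Fetch a result file written by the job back to the portal or user."""
    s3.download_file(bucket, f"jobs/{job_id}/output/{name}", local_path)

# Example usage (hypothetical bucket and job id):
# upload_input("gravity_subset.tif", "vgl-example-bucket", "job-42", "gravity_subset.tif")
# download_output("vgl-example-bucket", "job-42", "inversion_result.nc", "result.nc")
```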
14
Job Monitoring
Multiple jobs can be run and monitored separately.
Results can be published (provenance information).
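A minimal sketch, assuming each job runs on an EC2 instance tagged with a job identifier (the tag name, region and job ids are assumptions, not how VGL necessarily tracks jobs), of how several jobs could be polled independently with boto3:

```python
# Sketch of polling the state of several cloud compute jobs independently,
# assuming each job runs on an EC2 instance carrying a "JobId" tag.
# Region, tag name and job ids are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")

def job_states(job_ids):
    """Return {job_id: instance state} for instances tagged with those ids."""
    response = ec2.describe_instances(
        Filters=[{"Name": "tag:JobId", "Values": list(job_ids)}]
    )
    states = {}
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            states[tags.get("JobId")] = instance["State"]["Name"]  # e.g. "running"
    return states

# print(job_states(["job-42", "job-43"]))
```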
15
Published provenance records
This final step persists the results AND how we got there – ticking the data quality and reproducibility boxes.
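As an illustration of what "how we got there" can amount to, a published provenance record might capture the inputs, the exact script and parameters, and the outputs; the field names and values below are assumptions, not the actual VGL record schema:

```python
# Hypothetical shape of a provenance record published alongside a data
# product: inputs, the exact processing script, its parameters and outputs.
# Field names, URIs and values are illustrative, not the VGL schema.
import json
from datetime import datetime, timezone

provenance = {
    "job_id": "job-42",
    "created": datetime.now(timezone.utc).isoformat(),
    "inputs": [
        {"uri": "s3://vgl-example-bucket/jobs/job-42/input/gravity_subset.tif",
         "source_service": "http://example.org/geoserver/wcs"},
    ],
    "processing": {
        "script": "s3://vgl-example-bucket/jobs/job-42/input/invert.py",
        "script_version": "a1b2c3d",                       # revision of the shared script
        "parameters": {"mesh_cells": 200000, "chi_target": 1.0},
    },
    "outputs": [
        {"uri": "s3://vgl-example-bucket/jobs/job-42/output/inversion_result.nc"},
    ],
}

with open("provenance.json", "w") as fh:
    json.dump(provenance, fh, indent=2)
```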
16
From this…
Tools: hardcopy of data, Intrepid, MATLAB, SSH client
Get handed field data → visualise data → transform to a regular grid → crop data to area of interest → reformat data for processing → upload data to NCI → configure job and start processing → download results
17
Virtual Geophysics Laboratory
…to this:
Discover raw data → select spatial bounds → build "science" from existing libraries → run job → collect and publish results
18
VEGL – Point-of-View benefits
User:
  Data all accessible in one place
  Same science but MUCH more efficient – bigger scale, quicker
  Repeatability
Developer/Tech:
  Its concepts can be re-used for other scientific workflows
  It can produce actual scientific data products
  It is capable of integrating with any SISS (OGC) data provider – or any provider that understands the OGC standards
  It is in the process of being deployed at GA
  It is built from many 'generic' components that can be repurposed
  It is just a pretty face: the power lies with the underlying services, accessed using standardised protocols
1) Steps 1 and 2 are best summed up as data prep (this includes synthesising physical property data, which we can already do from within VEGL; it would just need access to borehole data enabled).
2) It lets us do the same science much more efficiently. It is hard to ensure repeatability if all the data prep and selection on various parameters is done by individuals without tracking of what they last did. Previously, keeping track of all of it came from written notes rather than being recorded somewhere automatically. A big problem with the inversions is people correctly or incorrectly preparing the data: there are lots of ways to get it wrong, and a lot fewer ways to do it right.
19
NOW - Opportunities: Virtual Geophysics Laboratory
Collaboration with multiple partners – CSIRO, GA, NCI, Monash, UQ, ANU
Supporting multiple workflows
Model Registry (3D)
New scientific codes – Underworld, eScript, UBC, airborne EM inversion codes
New data sets from GA: National Airborne Geophysical DB, including gravity, radiometric, AEM, magnetics
New use – broad application
20
Opportunities: Exploiting the generic
Modularising the workflow for general scientific usage
Repurposing for other use cases – natural hazards, climate prediction, etc.
Commercial uptake
Integration with other VLs to achieve the ultimate aim
Supercomputing – Pawsey Centre, NCI
Cloud – NeCTAR, NCI and commercial providers
NeCTAR funded VGL – an opportunity to use and/or repurpose the platform for other uses
Essentially $20M of capital expenditure for geosciences directed straight to HPC
21
Thank you – and for more information:
CSIRO Earth Science & Resource Engineering
Ryan Fraser, Project Lead
Phone:
Web: siss.auscope.org