Reproducible Groundwater Science Workflows for the Future: A case for Texas Groundwater Availability Models Nalbeat “Sonny” Kwon, M.S. The University of.

Presentation transcript:

Reproducible Groundwater Science Workflows for the Future: A case for Texas Groundwater Availability Models
Nalbeat “Sonny” Kwon, M.S.
The University of Texas at Austin
GSA South-Central Meeting 2017
March 13th, 2017

Data and Models
- The best tools we have to understand our critical Earth resources
- However, information contained within data and models is often misunderstood or misinterpreted by people who need to use it to make group decisions.

Environmental Decision Support
- Decision Support Systems (DSS) use the best available science to aid users in making informed choices.
- DSS can be a major bridge between science and policy.

Texas Groundwater Availability Models (GAMs)
- Unique policy setting
- Establishes science-vetted groundwater models
- Engages stakeholders and planners to develop Desired Future Conditions (DFCs)
- Requires use of models in DFC planning
- Mandated by the Texas Legislature and approved by TWDB
- Numerical simulation code used is MODFLOW (USGS)

Challenges to Creating DSS
- Must be capable of fast and powerful computations
- Need to integrate various knowledge realms
- Need to be flexible and easy to use
- Very few off-the-shelf tools to design DSS

Toward Reproducible Science
- Reproducibility, a cornerstone of science
- Difficult to uphold in computer-assisted research
- Hindering reproducibility:
  - Lack of backward compatibility
  - Undocumented workflows
  - Data with no provenance (origin and processing history)
  - Restricted access to needed data/software
- Reproducibility of science can only be achieved after reusability of the tools has been established.
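
One way to picture "data with provenance" is a small sidecar file that travels with each dataset and records its origin and processing history. The sketch below is a minimal illustration, not part of the original talk: the record layout, the `.prov.json` suffix, and the function names are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path: Path, origin: str, steps: list[str]) -> dict:
    """Describe where a file came from and how it was processed.

    The field names are a hypothetical schema chosen for illustration.
    """
    return {
        "file": path.name,
        "sha256": sha256_of(path),          # detects silent changes to the data
        "origin": origin,                   # where the data came from
        "processing_history": list(steps),  # what was done to it, in order
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(path: Path, record: dict) -> Path:
    """Write the record next to the data file as <name>.prov.json."""
    sidecar = path.with_name(path.name + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Storing the checksum alongside the history means a later user can confirm that the file in hand is the one the record describes before trusting the documented workflow.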

(Unintentional) Abandonment of Research Software
- Multiple reasons:
  - Paper makes it to publication
  - Researcher retires
  - Graduate student finishes defense
  - Funding is cut
- Hinders widespread reusability and causes significant effort to be lost

Case Study: GWDSS
- Groundwater Decision Support System (Pierce, 2006)
- Detailed model for research purposes; simpler model for real-time negotiation settings
- Developed for participatory decision making
- Barton Springs segment of the Edwards Aquifer as alpha test case (well studied with abundant historical data)
- Architecture: MODFLOW-96 + optimization + systems dynamics + database + visualization + GUI

Resurrection History of GWDSS
- Active work paused a couple of years after development.
- When revisited in 2014, ran into the problem of outdated and unsupported dependencies
- In 2015, an attempt to replicate the old development settings within a virtual machine (VM)
  - Could freeze a working state of the software
  - Unsuccessful for a number of reasons

New Approach to Create GWDSS Descendant
- New architecture aims to replicate and improve original features
- Leverages High Performance Computing (HPC) and modern web-based technologies

The Need for High Performance Computing
- Brute force approach not only feasible but scalable to larger and more complex simulations

| Job name | Quantity generated | CPU time per file | Total CPU time | Total file size |
| --- | --- | --- | --- | --- |
| Input generation | 9,382 files | 3 minutes | 470 hours | 120 gigabytes |
| Input assembly | 37,528 files | 50 milliseconds | 30 minutes | 1.36 terabytes |
| Output execution | 150,112 files | 0.3 seconds | 13 hours | 4.12 terabytes |

*stats extrapolated using a 2015 MacBook Pro Retina laptop
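
The totals on this slide follow from multiplying the file counts by the per-file CPU times. The sketch below reproduces that arithmetic and adds an idealized wall-clock estimate for a cluster. The `Job` class and `wall_clock_hours` function are hypothetical names, and the estimate assumes every file can be processed independently and split evenly across workers, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    files: int
    seconds_per_file: float

    @property
    def total_hours(self) -> float:
        """Serial CPU time for the whole job, in hours."""
        return self.files * self.seconds_per_file / 3600.0


# File counts and per-file times taken from the slide.
JOBS = [
    Job("Input generation", 9_382, 3 * 60),
    Job("Input assembly", 37_528, 0.050),
    Job("Output execution", 150_112, 0.3),
]


def wall_clock_hours(job: Job, workers: int) -> float:
    """Idealized wall-clock time with `workers` parallel workers.

    Assumes independent files and no scheduling or I/O overhead.
    """
    batches = -(-job.files // workers)  # ceiling division
    return batches * job.seconds_per_file / 3600.0


if __name__ == "__main__":
    for job in JOBS:
        print(f"{job.name}: {job.total_hours:.1f} CPU hours serial, "
              f"{wall_clock_hours(job, 1000):.3f} hours on 1000 workers")
```

The serial figures come out at roughly 469 hours, 31 minutes, and 12.5 hours, matching the rounded totals in the table. Under the stated assumptions, the dominant input-generation step drops to about half an hour on 1,000 workers.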

GAM Version Compatibility
- Vital for adaptable research
- Currently most are outdated
- USGS conversion utilities?
  - MF96toMF2K
  - MF2KtoMF05UC
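
If the two utilities named above are chained, an older GAM could in principle be brought forward one version at a time. The sketch below only plans the order in which converters would be applied and does not invoke them. The version labels attached to each utility are an assumption inferred from the utility names, and `conversion_chain` is a hypothetical helper.

```python
from collections import deque

# Assumed mapping inferred from the utility names on the slide:
# (source version, target version) -> converter name.
CONVERTERS = {
    ("MODFLOW-96", "MODFLOW-2000"): "MF96toMF2K",
    ("MODFLOW-2000", "MODFLOW-2005"): "MF2KtoMF05UC",
}


def conversion_chain(source: str, target: str) -> list[str]:
    """Return the converters to apply, in order, to get from source to target.

    Uses breadth-first search over the known converters, so the shortest
    chain is returned. Raises ValueError when no path exists.
    """
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        version, chain = queue.popleft()
        for (src, dst), tool in CONVERTERS.items():
            if src != version or dst in seen:
                continue
            if dst == target:
                return chain + [tool]
            seen.add(dst)
            queue.append((dst, chain + [tool]))
    raise ValueError(f"no known conversion path from {source} to {target}")
```

Keeping the mapping in one table makes it easy to see which GAM versions can be reached from a given starting point, and which cannot (for example, there is no entry here for converting backward).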

Best Practices to Preserve Software Reusability: Or, Lessons Learned (the Hard Way)
- Backed up on non-local persistent storage
- Openly accessible in a public repository under version control
- Curation and documentation

Questions?