GEON: The User Perspective
Choonhan Youn, Dogan Seber, Chaitan Baru, Ashraf Memon
San Diego Supercomputer Center, University of California at San Diego
GEON (GEOscience Network)
A cyberinfrastructure project for the geosciences, funded by the NSF Information Technology Research (ITR) program
– Creating an IT infrastructure to “enable” interdisciplinary geoscience research: not just one group of researchers, but the entire community will benefit
Vision: Enable new discoveries in the geosciences by building an easy-to-use and “comprehensive” network of data, software, tools, and information, using state-of-the-art information technology resources.
Current GEON member institutions
Members
– Arizona State University
– Bryn Mawr College
– Penn State University
– Rice University
– San Diego State University
– San Diego Supercomputer Center / University of California, San Diego
– University of Arizona
– University of Idaho
– University of Missouri, Columbia
– University of Texas at El Paso
– University of Utah
– Virginia Tech
– UNAVCO, Inc.
– Digital Library for Earth System Education (DLESE)
Partners
– California Institute for Telecommunications and Information Technology (Cal-IT2)
– Chronos
– CUAHSI
– ESRI
– Geological Survey of Canada
– Georeference Online
– IBM
– Kansas Geological Survey
– Lawrence Livermore National Laboratory
– U.S. Geological Survey (USGS)
Other Affiliates
– Southern California Earthquake Center (SCEC), EarthScope, IRIS, NASA
GEOSCIENCE CHALLENGES
Exponential Increase in Data Volume
– How to manage vast amounts of data so that all scientists can use them in an easy-to-use environment
Data Storage, Access and Preservation
– How to build a framework to exchange data and help preserve collected data sets
Data Integration (semantic and syntactic)
– How to merge multiple geologic maps into a seamless (“integrated”) map
Computational Challenges
– How to build a system that lets scientists run advanced software without access to significant computing or technical resources, so they can focus on the science problem
Advanced Visualization (3D/4D)
– How to build a visualization system that helps scientists analyze large and complex data sets dynamically
Archiving and Publication of Results with Reusable Components (reusability)
– How to preserve scientific results and help others repeat the analysis as efficiently as possible
GEON Cyberinfrastructure (CI) Principles
CI: Support the “day-to-day” conduct of science (e-science), in addition to “hero” computations
An equal partnership: IT works in close conjunction with science
Create shared “science infrastructure”
– Integrated online databases with advanced search and query engines
– Online models, robust tools and applications
Leverage other intersecting projects
– Much commonality in the technologies regardless of science discipline, e.g., BIRN, SEEK, and many others
Main e-Research facilities I
A Resource Registration System for Data Providers
– Register ontologies (domain knowledge) and ontology articulations
– Register datasets with metadata, including data access information
– Optionally register datasets to ontologies, which is crucial for data integration and smart search (ontology-enabled semantic integration)
– Supported resource types: Shapefile, ASCII, Excel, GMT raster, GeoTIFF, relational database, PDF, tool, WMS service, web service, etc.
A Search Engine for Data Users
– Metadata-based search
– Spatial coverage-based search
– Temporal coverage-based search
– Concept-based search
– Ontology-based data discovery
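The search engine combines several kinds of conditions. The sketch below is a minimal, hypothetical illustration of how metadata, spatial-coverage, temporal-coverage and concept-based conditions could be ANDed over registered datasets, with concept search expanded through a toy ontology so that a query for a broad concept also finds datasets registered to its subconcepts. The `Dataset` fields, the `ONTOLOGY` hierarchy and the sample catalog are assumptions for illustration, not GEON's actual schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory record for a registered dataset; field names are illustrative.
@dataclass
class Dataset:
    name: str
    keywords: set
    bbox: tuple    # (min_lon, min_lat, max_lon, max_lat)
    years: tuple   # (start_year, end_year), inclusive
    concepts: set = field(default_factory=set)  # ontology concepts the dataset is registered to

# Hypothetical toy ontology: concept -> direct subconcepts.
ONTOLOGY = {
    "Rock": {"IgneousRock", "SedimentaryRock"},
    "IgneousRock": {"Plutonic", "Volcanic"},
}

def expand(concept):
    """Return the concept plus all of its transitive subconcepts."""
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(ONTOLOGY.get(c, ()))
    return result

def bbox_overlaps(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(datasets, keyword=None, bbox=None, years=None, concept=None):
    """Return names of datasets matching every supplied condition (AND semantics)."""
    wanted = expand(concept) if concept else None
    hits = []
    for d in datasets:
        if keyword and keyword.lower() not in {k.lower() for k in d.keywords}:
            continue
        if bbox and not bbox_overlaps(d.bbox, bbox):
            continue
        if years and not (d.years[0] <= years[1] and years[0] <= d.years[1]):
            continue
        if wanted is not None and not (d.concepts & wanted):
            continue
        hits.append(d.name)
    return hits

# Hypothetical sample catalog.
catalog = [
    Dataset("va_plutons", {"geology"}, (-83.7, 36.5, -75.2, 39.5), (1990, 2000), {"Plutonic"}),
    Dataset("az_gravity", {"gravity"}, (-114.8, 31.3, -109.0, 37.0), (2001, 2003)),
]
print(search(catalog, concept="IgneousRock"))       # ['va_plutons']
print(search(catalog, bbox=(-112, 32, -110, 34)))   # ['az_gravity']
```

Expanding the query concept, rather than requiring an exact concept match, is what lets registration to an ontology support "smarter" discovery: a dataset tagged only as `Plutonic` is still found by a search for `IgneousRock`.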
Main e-Research facilities II
The user workspace, called the myGEON area
– Users can search for and collect data sets from the GEON search engine and integrate them.
– For example, users can review and analyze “SYNSEIS” outputs generated by job runs.
Computational HPC
– SYNSEIS (Synthetic Seismogram toolkit)
Workflow
– LiDAR: an end-to-end solution for the distribution, interpolation and analysis of LiDAR / ALSM point data.
– Atype workflow: generates a map of all plutonic bodies in Virginia from the VA Igneous Rocks database based on given inputs.
Constraints for main e-Research facilities
– Dynamic workflow issues due to the web-based nature of the GEON system
– Need for large computational clusters to simulate GEON applications as needed; GEON has only three small cluster nodes at partner sites
GEON Portal Usability
Ease of use
– GEON Search, SYNSEIS, and many others
Make complex tasks easy to specify
– LiDAR
Highly interactive
– SYNSEIS
Integrated access to tools and resources
– myGEON, Mapping Integration
Computational HPC for SYNSEIS
Lessons Learnt
Main strengths
– Standards-compliant approach
– Use of open-source libraries and tools for most of the implementation
Main weaknesses
– Building highly interactive, user-friendly interfaces within the portlet framework
Would you consider alternatives to a portal solution?
– Currently, no
Future Plan
Will add and develop new functionality based on requests from GEON PIs and the geoscience community.
Will keep improving portal usability.
– For example, for SYNSEIS, add more user capabilities to the interface for complex earthquake simulations.
Will expand use within the geoscience community internationally.
– Focus on GEON PIs first
GEON: The Developer Perspective
Choonhan Youn, Dogan Seber, Chaitan Baru, Ashraf Memon
San Diego Supercomputer Center, University of California at San Diego
Methods of GEON’s Design
Several workshops were held with scientists from different disciplines, such as geochemistry and geophysics.
Principal Investigators (PIs) also visit SDSC for focused discussions of their requirements.
Prototypes are built from the gathered requirements, and a spiral model of software development is then followed to enhance them.
Service-Oriented Approach
Priority of Functional and Non-functional Requirements
Start with functional requirements from the principal investigators or local geoscience PIs
Build prototypes and test the functional requirements
Then focus on non-functional requirements such as usability
Technical Strategy
The “two-tier” approach
– Use best practices, including commercial tools and open standards where applicable; start development with the technology available now…
– …while developing advanced technology and doing CS research, pushing for open source and best practices as much as possible
GEON Middleware
[Figure: GEON middleware architecture. User access (via portal): GEONSearch, Resource Registration and myGEON portlets. Resource Registration accepts ontologies (e.g., myOntology.owl) and datasets with metadata (e.g., myDataset.foo). GEONsearch takes spatial, temporal and concept search conditions and uses external services (gazetteer, DLESE, …; geologic age, Chronos, …). myGEON is the user’s GEON workspace, supporting add, delete and manipulate actions, with a log. Back end: GEON Catalog, SRB. Client access (via web services): other distributed applications such as Kepler, DLESE, ….]
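The middleware lists a per-user myGEON workspace that supports add, delete and manipulate actions and keeps a log. A minimal sketch of that idea follows; the `Workspace` class, modeling "manipulate" as attaching an annotation, and the dataset ID format are all hypothetical, not GEON's implementation.

```python
from datetime import datetime, timezone

class Workspace:
    """Hypothetical myGEON-style workspace: collected dataset IDs plus an
    append-only log of user actions."""

    def __init__(self, user):
        self.user = user
        self.items = {}  # dataset id -> user-editable annotations
        self.log = []    # (UTC timestamp, action, dataset id)

    def _record(self, action, dataset_id):
        self.log.append((datetime.now(timezone.utc), action, dataset_id))

    def add(self, dataset_id):
        # Adding an already-collected dataset is a no-op and is not logged.
        if dataset_id not in self.items:
            self.items[dataset_id] = {}
            self._record("add", dataset_id)

    def delete(self, dataset_id):
        if self.items.pop(dataset_id, None) is not None:
            self._record("delete", dataset_id)

    def annotate(self, dataset_id, key, value):
        # "Manipulate" is modeled here (assumption) as attaching an attribute to an item.
        if dataset_id not in self.items:
            raise KeyError(dataset_id)
        self.items[dataset_id][key] = value
        self._record("manipulate", dataset_id)

ws = Workspace("alice")
ws.add("geon:ds:42")  # hypothetical dataset ID
ws.annotate("geon:ds:42", "note", "compare with gravity data")
ws.delete("geon:ds:42")
print([action for _, action, _ in ws.log])  # ['add', 'manipulate', 'delete']
```

Keeping the log append-only means the workspace contents can change freely while the history of user actions stays intact for later review.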
[Figure: SYNSEIS architecture. User access via web browser to the GEONGrid Portal, which hosts the SYNSEIS portlet (Flash application, SYNSEIS toolkit) and the myGEON portlet. Services: Data Model Service; Job Submission/Monitoring and File Service; Data Archives Service; SAC Service. Resources: HPC resources (TeraGrid clusters), data repository, job database, IRIS DMC, Cornell Map Server. Protocols and middleware: SOAP, JDBC, CORBA (IIOP), HTTP, GridFTP, Grid Services, Web Services.]
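The SYNSEIS architecture includes a Job Submission/Monitoring service backed by a job database accessed over JDBC. The sketch below illustrates the general pattern of such a job store, using Python's built-in `sqlite3` as a stand-in for that database. The job states, allowed transitions and cluster name are assumptions; the slides do not specify SYNSEIS's actual job lifecycle.

```python
import sqlite3

# Hypothetical job lifecycle: state -> states it may move to next.
TRANSITIONS = {
    "SUBMITTED": {"QUEUED", "FAILED"},
    "QUEUED": {"RUNNING", "FAILED"},
    "RUNNING": {"DONE", "FAILED"},
    "DONE": set(),
    "FAILED": set(),
}

class JobStore:
    """Minimal job database tracking each job's target cluster and current state."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE jobs (id INTEGER PRIMARY KEY, cluster TEXT, state TEXT)")

    def submit(self, cluster):
        cur = self.db.execute(
            "INSERT INTO jobs (cluster, state) VALUES (?, 'SUBMITTED')", (cluster,))
        self.db.commit()
        return cur.lastrowid

    def state(self, job_id):
        row = self.db.execute("SELECT state FROM jobs WHERE id = ?", (job_id,)).fetchone()
        if row is None:
            raise KeyError(job_id)
        return row[0]

    def update(self, job_id, new_state):
        # Reject transitions the lifecycle does not allow (e.g. DONE -> RUNNING).
        current = self.state(job_id)
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self.db.execute("UPDATE jobs SET state = ? WHERE id = ?", (new_state, job_id))
        self.db.commit()

store = JobStore()
job = store.submit("teragrid-cluster")  # hypothetical cluster name
for s in ("QUEUED", "RUNNING", "DONE"):
    store.update(job, s)
print(store.state(job))  # DONE
```

Validating transitions in one place keeps the portal's monitoring view consistent even if status updates arrive from several back-end services.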
Development Issues
Constraints
– Interoperability issues due to the use of existing tools
  – Existing tools developed in Fortran, with some machine-dependent algorithms and code
  – GRASS-based GIS processing
  – Incompatible implementations of the same standard (OGC’s WMS)
– Usability requirements
  – Portlet UIs are designed by software developers, so they are not very user-friendly
– Part of the tension in the project is that while this is an R&D project for the IT people, the science people want some of it to look like production software
– Lack of user input in some cases: some users are still getting up to speed with the IT concepts, so they have not really used the system
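One well-documented source of WMS interoperability friction, offered here as an illustration rather than as the specific problem GEON hit, is the difference between WMS 1.1.1 and 1.3.0: 1.3.0 renamed the `SRS` parameter to `CRS` and, for EPSG:4326, expects the bounding box in latitude/longitude order instead of longitude/latitude. The sketch below builds a GetMap URL for either version; the server URL and layer name are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def getmap_url(base, layer, bbox_lonlat, width, height, version="1.1.1"):
    """Build an OGC WMS GetMap URL for an EPSG:4326 bounding box.

    bbox_lonlat is (min_lon, min_lat, max_lon, max_lat).
    WMS 1.1.1 uses SRS with lon/lat axis order; WMS 1.3.0 uses CRS and,
    for EPSG:4326, lat/lon axis order in BBOX.
    """
    min_lon, min_lat, max_lon, max_lat = bbox_lonlat
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    if version == "1.3.0":
        params["CRS"] = "EPSG:4326"
        params["BBOX"] = f"{min_lat},{min_lon},{max_lat},{max_lon}"
    else:
        params["SRS"] = "EPSG:4326"
        params["BBOX"] = f"{min_lon},{min_lat},{max_lon},{max_lat}"
    return base + "?" + urlencode(params)

bbox = (-83.7, 36.5, -75.2, 39.5)  # placeholder extent (lon/lat)
u111 = getmap_url("https://example.org/wms", "geology", bbox, 800, 400)
u13 = getmap_url("https://example.org/wms", "geology", bbox, 800, 400, version="1.3.0")
print(parse_qs(urlparse(u111).query)["BBOX"])  # ['-83.7,36.5,-75.2,39.5']
print(parse_qs(urlparse(u13).query)["BBOX"])   # ['36.5,-83.7,39.5,-75.2']
```

A client that sends 1.1.1-style axis order to a 1.3.0 server gets a map of the wrong area rather than an error, which is why such mismatches are easy to miss.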
Evaluation
Success of GEON services is usually determined by user satisfaction.
A usability workshop was held recently with domain scientists, and their feedback was collected.
– We are working on changes based on the workshop report.
Another workshop will be held after the suggested changes are implemented.
Lessons Learnt
The most successful aspects
– Integration with other grids, such as TeraGrid
– Data registration and search capabilities for the geoscience community
– Community involvement
The least successful aspects
– The community is still evaluating the system.
Future Plans
Will provide secure role-based authorization control (using SAML), fully integrated into the GEON portal.
Will add WSRP services.
Conventions for managing state may be defined through standards such as WSRF, so that applications can discover, bind to, and communicate with stateful resources in standard, interoperable ways.
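Role-based authorization separates "who the user is" from "what each role may do". The sketch below shows only the decision step, under the assumption that the user's roles have already been extracted from a verified assertion (for example, SAML attribute statements); parsing and signature checking of SAML itself are out of scope. The role names and action names are hypothetical, not GEON's actual policy.

```python
from dataclasses import dataclass

# Hypothetical role -> permitted portal actions.
ROLE_PERMISSIONS = {
    "guest": {"search"},
    "member": {"search", "register_dataset", "use_workspace"},
    "pi": {"search", "register_dataset", "use_workspace", "submit_hpc_job"},
    "admin": {"search", "register_dataset", "use_workspace", "submit_hpc_job", "manage_users"},
}

@dataclass(frozen=True)
class Principal:
    user: str
    roles: frozenset  # assumed to come from an already-verified assertion (e.g. SAML attributes)

def is_authorized(principal, action):
    """Grant the action if any of the principal's roles permits it; unknown roles grant nothing."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in principal.roles)

alice = Principal("alice", frozenset({"member"}))
print(is_authorized(alice, "register_dataset"))  # True
print(is_authorized(alice, "submit_hpc_job"))    # False
```

Denying by default for unknown roles keeps a misconfigured or unexpected attribute from silently granting access.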
GEON Search Portlet
GEON Resource Registration Portlet
User Workspace
Mapping Integration Portlet
[Figure: Mapping integration architecture. A client portlet and the Map Integration Portlet (mediator) run in GridSphere and exchange GEON dataset IDs. Back-end components: GEON Metadata Catalogue, Ontology Service (ontology engine, knowledge representation), SRB, query tracking DB, query service, and mapping services (ArcIMS). Mapping web services: 1. dataset IDs to dataset names; 2. dataset IDs to ontology IDs; 3. ontology IDs to ontology names; 4. ontology IDs to ontology concepts. Processing steps: redefine query, mapping, execute query, download datasets, store query results, query result indexing, generate map (GET_EXTRACT, GET_MAP).]
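The four mapping web services resolve dataset IDs into names, ontology IDs, ontology names and ontology concepts. The sketch below shows how a mediator might compose those lookups, with in-memory dictionaries standing in for the services. All IDs, names and concepts are made-up sample values, and computing the concepts shared by all selected datasets (as candidates for an integrated map legend) is an illustrative assumption, not GEON's documented behavior.

```python
# Hypothetical stand-ins for the four mapping web services; sample data only.
DATASET_NAMES = {"ds1": "Arizona geologic map", "ds2": "Utah geologic map"}
DATASET_ONTOLOGIES = {"ds1": ["ont-az"], "ds2": ["ont-ut"]}
ONTOLOGY_NAMES = {"ont-az": "AZ rock classification", "ont-ut": "UT rock classification"}
ONTOLOGY_CONCEPTS = {"ont-az": ["Granite", "Basalt"], "ont-ut": ["Granite", "Sandstone"]}

def dataset_ids_to_names(ids):           # service 1
    return {i: DATASET_NAMES[i] for i in ids}

def dataset_ids_to_ontology_ids(ids):    # service 2
    return {i: DATASET_ONTOLOGIES.get(i, []) for i in ids}

def ontology_ids_to_names(ids):          # service 3
    return {o: ONTOLOGY_NAMES[o] for o in ids}

def ontology_ids_to_concepts(ids):       # service 4
    return {o: ONTOLOGY_CONCEPTS.get(o, []) for o in ids}

def describe_selection(dataset_ids):
    """Mediator step: resolve dataset IDs into names, ontologies, and the
    concepts shared by every selected dataset."""
    names = dataset_ids_to_names(dataset_ids)
    ds_onts = dataset_ids_to_ontology_ids(dataset_ids)
    all_onts = sorted({o for onts in ds_onts.values() for o in onts})
    concepts = ontology_ids_to_concepts(all_onts)
    # Union of concepts per dataset, then intersect across datasets.
    per_dataset = [set().union(*(concepts[o] for o in ds_onts[d])) for d in dataset_ids]
    shared = set.intersection(*per_dataset) if per_dataset else set()
    return {
        "datasets": names,
        "ontologies": ontology_ids_to_names(all_onts),
        "shared_concepts": sorted(shared),
    }

result = describe_selection(["ds1", "ds2"])
print(result["shared_concepts"])  # ['Granite']
```

Keeping each lookup as a separate function mirrors the slide's split into four small services, so the mediator can be rewired to remote calls without changing its composition logic.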
[Figure: LiDAR data processing (LiDAR portlet). Client (WWW) accesses the GEON Portal’s GEON Search portlet, LiDAR Process portlet and other portlets. Services: LiDAR Processing Service, Spatial Query Service, GEON Search Service. Data: x,y,z and attribute raw data in IBM DB2 (DB2 Spatial functions); GEON Catalog. Processing: data processing algorithms and software tools (GRASS, ARC/INFO, GMT) on a compute cluster (TeraGrid DataStar) with NFS-mounted disk, producing output maps/data.]
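The LiDAR workflow interpolates scattered x,y,z point data onto a regular grid before producing maps. The actual pipeline relied on tools such as GRASS, ARC/INFO and GMT; the sketch below substitutes a plain inverse distance weighting (IDW) interpolator, one simple gridding technique, to show the idea. The grid parameters and sample points are hypothetical, and the brute-force loop over all points is only suitable for small inputs.

```python
import math

def idw_grid(points, x0, y0, cell, nx, ny, power=2.0):
    """Interpolate scattered (x, y, z) points onto an nx-by-ny grid using
    inverse distance weighting. Returns grid[j][i] for node (x0 + i*cell, y0 + j*cell)."""
    if not points:
        raise ValueError("need at least one point")
    grid = []
    for j in range(ny):
        y = y0 + j * cell
        row = []
        for i in range(nx):
            x = x0 + i * cell
            num = den = 0.0
            exact = None
            for px, py, pz in points:
                d = math.hypot(px - x, py - y)
                if d == 0.0:
                    exact = pz  # node coincides with a sample: use its value directly
                    break
                w = 1.0 / d ** power  # closer points get larger weights
                num += w * pz
                den += w
            row.append(exact if exact is not None else num / den)
        grid.append(row)
    return grid

# Hypothetical four-point sample on the corners of a 2 x 2 square.
pts = [(0, 0, 10.0), (2, 0, 20.0), (0, 2, 30.0), (2, 2, 40.0)]
g = idw_grid(pts, 0, 0, 1.0, 3, 3)
print(g[0][0], g[2][2])  # 10.0 40.0
```

The center node is equidistant from all four samples, so its value is their plain average (about 25); production gridding tools typically add a search radius or spatial index so that each node only considers nearby points.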