
Presentation transcript:

Fox 2 January 4, 2005 Project sponsors
- Earth System Grid - DOE/SciDAC
- Coupled Energetics and Dynamics of Atmospheric Regions - NSF/GEO/ATM
- Virtual Solar-Terrestrial Observatory - NSF/CISE/SCI
- Related DODS/OPeNDAP work - NASA and NCAR/HAO

Fox 3 January 4, 2005 Overview
- Report on experience with data ‘systems’ and data ‘frameworks’
  - CEDARWEB
  - Earth System Grid
- Compare and contrast success in terms of use(rs)
- Technology integration - when and how does it work and scale?
- Outline a merged approach for Virtual Observatory concept

Fox 4 January 4, 2005 CEDARWEB

Fox 5 January 4, 2005 CEDARWEB: heritage
- CEDAR is a large scientific and technical community focusing on the Earth’s middle and upper atmosphere. The program features ground-based observing networks, models and integrative studies. Funded by NSF, in third phase (3rd decade)
- CEDAR data history
  - Started as an incoherent radar database in 1983 as a tape archive (back to 1966)
  - Grew by late 80’s, adding other instruments, models, indices
  - Went on-line in early 90’s (became a single-tiered data system)
  - Web access in 1996, three versions of the interface
- Holdings - some satellite data, geophysical indices, models (GCM, empirical, tides, etc.), ISRs, HF Radars, Digisondes, FPIs, IR Michelson Interferometers, Spectrometers, Airglow Imagers, All-Sky Cameras, LIDARs, Multi-Channel Photometers, MST Radars, MF Radars, LF Radars, Meteor Wind Radars, Campaigns, Presentations, Surveys, Jobs, Workshops, etc.
- Community, 600+, 300+ registered users, ~100 active data users per year
- NCAR tasked with community support, and especially in the early days to ‘take care’ of the data and work with data providers and users
- Significant effort in catalogs, metadata, controlled vocabulary
- System has labored in getting past the code/mnemonic schemes of the past, base data format

Fox 6 January 4, 2005 CEDAR pre-web Data query, selection and retrieval interface, without any integrated tools or ability to preview data before retrieving it.

Fox 7 January 4, 2005 CEDARWEB 2.0

Fox 8 January 4, 2005 CEDARWEB 2.0

Fox 9 January 4, 2005 CEDARWEB 3.x Data query, selection and retrieval interface, with integrated tools, e.g. ability to plot (preview) data before retrieving it.

Fox 10 January 4, 2005 CEDARWEB - OPeNDAP

Fox 11 January 4, 2005 CEDARWEB - OPeNDAP

Fox 12 January 4, 2005 CEDARWEB 3.1 Ability to quickly plot data to assess suitability, quality, and produce a quick copy with some customization for a preliminary study.

Fox 13 January 4, 2005 Experience: CEDARWEB Don’t just provide data, but also build in community information and ancillary information that is of value.

Fox 14 January 4, 2005 Inside CEDARWEB
- Rich metadata; categorized
- OPeNDAP for data access and transport
- MySQL for catalog and user records
- HTTPS and cookies for session authentication
- Script-enabled interface with plotting built in (ION) delivers HTML to browsers
- ‘Hides’ organizational data record structure (sort of)
- Low-level data product, but also high-level
- Disconnect between delivery of data and attributes
- Today: framework is inside the data system!
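To make the "OPeNDAP for data access and transport" item concrete: a DAP2 client usually asks for a subset of a variable by appending a constraint expression to the dataset URL. The following is a minimal Python sketch of that URL construction only. The server address, dataset path and variable name are hypothetical and are not taken from CEDARWEB.

```python
from urllib.parse import quote


def dap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2-style request URL for a hyperslab of one variable.

    slices holds one (start, stride, stop) index triple per dimension,
    written in the constraint expression as [start:stride:stop].
    """
    hyperslab = "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)
    # Keep brackets and colons literal; escape anything else unusual.
    constraint = quote(variable + hyperslab, safe="[]:")
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical dataset and variable, for illustration only.
url = dap_subset_url(
    "http://example.org/opendap/radar/sample.nc",
    "temperature",
    [(0, 1, 9), (0, 2, 10)],
)
print(url)
```

The point of the design is that the subset is selected on the server, so only the requested slab crosses the network.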

Fox 15 January 4, 2005 Experience: CEDARWEB CEDARWEB has been developed and improved over more than 10 years of interaction with users, data providers, and a community steering committee. Each of these groups has directly contributed to changes in what services are provided, what information and materials are made available via the web site, and what levels of authorization and authentication are required. Biggest lesson: the systems approach has worked because of the heritage of the data collection, but users (esp. new or very experienced) see a barrier to entry and don’t understand where the system starts/stops.

Fox 16 January 4, 2005 Earth System Grid Overview
- The goal of ESG is to make climate data – particularly climate model data – an easily accessible community resource. The project is funded by the SciDAC program: Scientific Discovery through Advanced Computing.
- Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical. The broad strategy is to develop a collection of server-side capabilities – minimize the amount of data movement.
- Multiple interfaces to ESG will allow researchers to focus on science rather than issues of data transfer, format, and data set manipulation.
- Foundation is Globus Grid technology

Fox 17 January 4, 2005 ESG: U.S. Collaborations & Development
- ORNL: Climate storage & computational resources
- LANL: Next generation coupled models & computing
- ANL: Computational grids & grid-based applications
- USC/ISI: Computational grids & grid-based applications
- NCAR: Climate change prediction and scenarios
- LBNL: Climate storage facility
- LLNL: Model diagnostics & inter-comparison

Fox 18 January 4, 2005 ESG leverages existing software and projects
- DODS/OPeNDAP: Distributed Oceanographic Data System (Unidata)
- Integrations of Globus GridFTP, DODS data access
- THREDDS: THematic Real-time Environmental Distributed Data Services (Unidata)
- LAS: Live Access Server (NOAA Pacific Marine Environmental Laboratory)
- Works with CDAT, Ferret, GrADS, …
- CDAT: Climate Data Analysis Tools (PCMDI), includes CDMS: Climate Data Management System, VCDAT visualization
- Community Data Portal project (NCAR)
- NCL (NCAR)
- Globus Grid technology (ANL, ISI): GridFTP, CAS Community Access Portal

Fox 19 January 4, 2005 ESG: Requirements & Priority Matrix

Fox 20 January 4, 2005 ESG areas of development
- Authentication and Authorization services: application of Globus technologies for secure data management and access (PKI certificates, proxy delegation, Community Authentication Services, web interfaces)
- Data Transport Services: based on gridFTP protocol and implementation (high speed, tunable, multi-stream, reliable), extensions for multi-file management and connection to offline storage systems (Hierarchical Storage Management), and for transparent data access and operations (grid-enabled OPeNDAP)
- Metadata services (for data management, access, search & discovery, annotation, analysis, etc.)
- Other services: Data Analysis and Visualization, Task Management, Monitoring and Control, etc.
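The "multi-stream" property of the transport service amounts to moving disjoint byte ranges of one file over several connections in parallel. Below is a minimal Python sketch of just the range-splitting step, assuming near-equal partitions. It is an illustration of the idea and not the gridFTP implementation or its API; the function name is hypothetical.

```python
def split_ranges(file_size, streams):
    """Split [0, file_size) into up to `streams` contiguous (offset, length) ranges."""
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # fewer bytes than streams; remaining streams stay idle
        ranges.append((offset, length))
        offset += length
    return ranges


parts = split_ranges(10, 4)
print(parts)
```

Each range could then be handed to its own connection, with the receiver writing each block at its offset.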

Fox 21 January 4, 2005 ESG: ESG-II Architecture

Fox 22 January 4, 2005 [Architecture diagram. Sites: LBNL, LLNL, ISI, NCAR, ORNL, ANL. Labeled components: TOMCAT servlet engine; MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client; MyProxy server and MyProxy client; GRAM gatekeeper; CAS (Community Authorization Services) and CAS client; SRM (Storage Resource Management); gridFTP servers; CAS-enabled striped gridFTP servers and striped gridFTP client; openDAPg servers; LAS (Live Access Server). Storage: disk, MSS (Mass Storage System), HPSS (High Performance Storage System). Connections labeled SOAP, RMI, gridFTP.]

Fox 23 January 4, 2005 The Earth System Grid [Services diagram. Sites: NCAR, LBNL, LLNL, ISI, ANL, ORNL. SECURITY services: GSI, CAS server, CAS client, MyProxy client, MyProxy server. FRAMEWORK services: TOMCAT, AXIS, GRAM. METADATA services: Auth metadata, RLS (MySQL), THREDDS catalogs, Xindice, MySQL, OGSA-DAI, MCS. DATA storage: NERSC HPSS, NCAR MSS, DISK, ORNL HPSS. TRANSPORT services: gridFTP server/client, HRM, openDAPg server. ANALYSIS & VIZ services: NCL openDAPg client, LAS server, CDAT openDAPg client. MONITORING services: SLAMON daemon.]

Fox 24 January 4, 2005 Earth System Grid Portal

Fox 25 January 4, 2005 Community Data Portal [Screenshot; labeled areas: Free text search, Applications, Live Access, News, Authentication, THREDDS catalog]

Fox 26 January 4, 2005 Community Data Portal

Fox 27 January 4, 2005 LAS/CDAT: Example of a Web-based Data Portal
- Technology: Web Based (end user requirements)
  - LAS, DODS, ESG (i.e., Globus), CDAT
- Portal should hide/simplify the Grid for users
  - Single sign-on
  - Community-based authorization
  - Simplified resource location
  - Remote job submission, management
- Accesses the ESG Grid Testbed

Fox 28 January 4, 2005 ESG: Example of a Web-based Data Portal (serving 40+ simulations: AMIP, CMIP, and PCM)

Fox 29 January 4, 2005 ESG: Example of a Client Application

Fox 30 January 4, 2005 Metadata-centric view of ESG services [Diagram with METADATA SERVICES at the center. Labels, in the order they appear: USER AUTHENTICATION AND AUTHORIZATION; ACCESS AND AUTHORIZATION METADATA; DATA TRANSPORT; LOCATION METADATA; SYSTEM MONITORING AND CONTROL; LOGGING METADATA; DATA SEARCH & DISCOVERY; CONTENT METADATA; ANNOTATION & HISTORY METADATA; DATA ANALYSIS & VISUALIZATION; AGGREGATION METADATA; DATA BROWSING; CATALOGUING METADATA.]

Fox 31 January 4, 2005 ESG Metadata Services Architecture
3-layer architecture:
- Metadata Holdings: physical metadata content, stored in a system of relational and/or XML native databases
- Core Metadata Services: modules and libraries that mediate all access to the Metadata Holdings (insert, update, delete, query) and expose an API that hides the specific implementation of the databases and query languages
- High Level Metadata Services: system of applications that make use of the Core Metadata Services to fulfill a specific atomic functionality; these will be invoked by external clients
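The three layers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not ESG code: all class and function names are hypothetical, the holdings are an in-memory dictionary standing in for a relational or XML database, and the dataset identifiers and model label "OTHER" are made up.

```python
from abc import ABC, abstractmethod


class MetadataHoldings(ABC):
    """Layer 1: physical storage of metadata records."""

    @abstractmethod
    def put(self, dataset_id, record): ...

    @abstractmethod
    def find(self, **criteria): ...


class InMemoryHoldings(MetadataHoldings):
    """Stand-in backend; a real one would wrap a database."""

    def __init__(self):
        self._records = {}

    def put(self, dataset_id, record):
        self._records[dataset_id] = dict(record)

    def find(self, **criteria):
        return sorted(
            ds for ds, rec in self._records.items()
            if all(rec.get(k) == v for k, v in criteria.items())
        )


class CoreMetadataService:
    """Layer 2: the only code that touches the holdings directly."""

    def __init__(self, holdings):
        self._holdings = holdings

    def insert(self, dataset_id, record):
        self._holdings.put(dataset_id, record)

    def query(self, **criteria):
        return self._holdings.find(**criteria)


def search_by_model(core, model):
    """Layer 3: one atomic high-level function built on the core API."""
    return core.query(model=model)


core = CoreMetadataService(InMemoryHoldings())
core.insert("run-001", {"model": "PCM", "variable": "tas"})
core.insert("run-002", {"model": "OTHER", "variable": "tas"})
print(search_by_model(core, "PCM"))
```

Because the high-level function only sees `CoreMetadataService`, swapping the in-memory backend for a different store would not change any client code.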

Fox 32 January 4, 2005 [Layered diagram of metadata services. ESG CLIENTS API & USER INTERFACES: Publishing, Search & Discovery, Administration, Browsing & Display, Analysis & Visualization. HIGH LEVEL METADATA SERVICES: metadata extraction; display; browsing; search, query & discovery; annotation; validation; aggregation; conversion; metadata & data registration. CORE METADATA SERVICES: metadata access (update, insert, delete, query); service translation library. METADATA HOLDINGS: Replica Location Services, Metadata Cataloguing Services, XML DB, THREDDS catalogs.]

Fox 33 January 4, 2005 ESG Metadata Services Goal Functionality
- Services responsible for the creation, management and utilization of metadata associated with geophysical data
- Functionality:
  - Metadata extraction (automatically, from files in different formats and according to various possible metadata standards)
  - Metadata conversion (from one standard to another)
  - Metadata aggregation (associated with data collections)
  - Metadata annotation (manually by humans)
  - Metadata validation (basic quality control of metadata)
  - Registration (population of metadata holdings)
  - Harvesting (combination of metadata from different repositories)
  - Metadata browsing and display (for humans)
  - Search and discovery of data through metadata
  - Metadata query (by agents or clients for data analysis and visualization)
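Two of the functions listed above, conversion and validation, can be illustrated with a short Python sketch. The field names belong to two made-up metadata conventions and the required-field rule is an assumption chosen for the example; none of this reflects an actual ESG schema.

```python
# Hypothetical field names for two made-up metadata conventions.
SOURCE_TO_TARGET = {
    "title": "dataset_title",
    "units": "unit_of_measure",
    "institution": "provider",
}
REQUIRED_TARGET_FIELDS = ("dataset_title", "unit_of_measure")


def convert(record, mapping=SOURCE_TO_TARGET):
    """Conversion: rename known fields, keep unknown fields untouched."""
    return {mapping.get(key, key): value for key, value in record.items()}


def validate(record, required=REQUIRED_TARGET_FIELDS):
    """Validation: return the required fields that are missing or empty."""
    return [name for name in required if not record.get(name)]


source = {"title": "Surface air temperature", "institution": "NCAR"}
converted = convert(source)
problems = validate(converted)
print(converted)
print(problems)
```

Here the record converts cleanly but fails basic quality control because no units were supplied, which is the kind of check a registration step might run before populating the holdings.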

Fox 34 January 4, 2005 ESG Metadata Services Current Development
Currently have in production the following technologies:
- Replica Location Services: database to manage and index multiple copies of the same data stored at different centers
- Metadata Cataloguing Services: relational database to store scientific metadata (developed for high energy physics and geophysical data)
- XML native (**) and SQL databases
- THREDDS (by Unidata): system for hierarchical cataloguing of datasets and associated metadata
- NcML (NetCDF Markup Language): XML language for encoding of metadata associated with data in netCDF format (and more…)
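As an illustration of the NcML item, the Python sketch below reads dimensions, global attributes and variable attributes from a small NcML document using only the standard library. The document is hand-written for this example (the variable name and values are made up), and the namespace shown is the NcML 2.2 namespace, which may differ from the version in use at the time of the talk.

```python
import xml.etree.ElementTree as ET

NCML_NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# Hand-written example document; the variable and values are made up.
NCML_TEXT = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="time" length="12"/>
  <attribute name="title" value="Example monthly means"/>
  <variable name="tas" shape="time" type="float">
    <attribute name="units" value="K"/>
  </variable>
</netcdf>
"""


def attributes_of(element):
    """Collect the direct <attribute> children of an element as a dict."""
    return {a.get("name"): a.get("value")
            for a in element.findall("nc:attribute", NCML_NS)}


def summarize(ncml_text):
    """Return (dimensions, global attributes, variables) from NcML text."""
    root = ET.fromstring(ncml_text)
    dims = {d.get("name"): int(d.get("length"))
            for d in root.findall("nc:dimension", NCML_NS)}
    variables = {}
    for var in root.findall("nc:variable", NCML_NS):
        variables[var.get("name")] = {
            "shape": var.get("shape", "").split(),
            "type": var.get("type"),
            "attributes": attributes_of(var),
        }
    return dims, attributes_of(root), variables


dims, global_attrs, variables = summarize(NCML_TEXT)
print(dims, global_attrs, variables)
```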

Fox 35 January 4, 2005 ESG Metadata Policy
- Premise: geophysical sciences are too broad and complex to impose a single, all-encompassing metadata standard to capture the relevant information for all datasets, projects, instruments, scientists
- ESG will not mandate use of any metadata schema or convention
- Allow data providers and scientists to use their metadata of choice; provide technologies and tools to store and access metadata through common services (MCS, XML DB, THREDDS catalogs)
- Encourage development and reuse of a limited set of domain-specific standards (climate data, radar data, airborne instrumentation, etc.), encoding in XML (according to community-developed schemas), interoperability and combination of schemas (XML namespaces and RDF-based ontologies - developed but not used)

Fox 36 January 4, 2005 OPeNDAP for ESG II
- DODS, since ~1995, has been based on HTTP and a CGI-style architecture
- Two concerns
  - Application support and performance of HTTP
  - Housekeeping abilities of CGI architecture
- Solution: evolve OPeNDAP, the discipline-neutral aspect of DODS

Fox 37 January 4, 2005 OPeNDAP ctd.
- Data transport protocol and access protocol separated
- Revised server architecture
- Address Grid-style authentication
- Memory management
- Exception handling
- Make all these changes and retain interoperation with HTTP and CGI
- Advanced requirements: URL should support more than one dataset, or object, i.e. aggregation

Fox 38 January 4, 2005 OPeNDAP 3.x vs OPeNDAP-g Architecture
OPeNDAP 3.x:
- Simple and easy to install
- One CGI process per URL request
- Limited memory management – external
- Limited scalability
- Limited status reporting to web server
- Returns data stream from one format
OPeNDAP-g:
- Standalone server or httpd module
- Can manage multiple daemon processes
- Strong memory management – internal
- Reuse processes, scales
- Coupled to OPeNDAP server for status
- Returns multiple formats in a single stream, multiple protocols

Fox 39 January 4, 2005

Fox 40 January 4, 2005 Application development

Fox 41 January 4, 2005 Status
- Refactor core classes to remove http/libwww, etc.
- Operational/production release of standalone OPeNDAP server (no dependence on web server)
- Multi-protocol support: file, http, GridFTP, ftp, etc.
- Re-architected for aggregation support and performance
- Run OPeNDAP server as a client to GridFTP server
- Portal application client in production, prototype of netCDF client operational
- Authentication is handled outside OPeNDAP server
- URL syntax is more complex
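Multi-protocol support of the kind listed above is commonly organized as a dispatch on the URL scheme. The Python sketch below shows only that routing step, with placeholder handlers that describe what they would do instead of performing any I/O. It assumes `gsiftp` as the URL scheme for GridFTP; the handler names and example URLs are hypothetical and this is not the OPeNDAP-g code.

```python
from urllib.parse import urlparse


def make_handler(protocol):
    """Create a placeholder handler that only describes the request."""
    def handler(parsed):
        host = parsed.netloc or "localhost"
        return f"{protocol}: host={host} path={parsed.path}"
    return handler


# Scheme-to-handler table; "gsiftp" is assumed here as the GridFTP scheme.
HANDLERS = {
    "file": make_handler("file"),
    "http": make_handler("http"),
    "ftp": make_handler("ftp"),
    "gsiftp": make_handler("gridftp"),
}


def dispatch(url):
    """Route a data URL to the handler registered for its scheme."""
    parsed = urlparse(url)
    handler = HANDLERS.get(parsed.scheme)
    if handler is None:
        raise ValueError(f"unsupported protocol: {parsed.scheme!r}")
    return handler(parsed)


print(dispatch("gsiftp://data.example.org/climate/run1.nc"))
print(dispatch("file:///archive/run1.nc"))
```

Keeping the table separate from the server core means a new transport can be added by registering one more entry.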

Fox 42 January 4, 2005 ESG: Framework experience
- ESG is a highly collaborative effort and will allow users to quickly access data storage facilities storing petabytes of raw or processed data in an application-independent manner.
- Payoffs of this distributed collaborative infrastructure have included:
  - Distributed data-sharing, RLS works! SRM/HRM work! OPeNDAP-g works!
  - Simplified data discovery of climate data, the work on metadata paid off! Scalability?
  - Large-scale climate data processing and analysis via highly integrated portal
  - Increased collaboration among climate research scientists, people use it!
  - Aid in climate assessments and estimates of future climate variability and trends, IPCC!
- Authentication and authorization have been a significant challenge
  ✗ GSI to CAS
  - MyProxy - session based and seems to work well, more compatible with heterogeneous framework services
  - SAML is working for multi-file batch transfer

Fox 43 January 4, 2005 ESG: Framework experience
- Privatization
  ✗ Portal interface (and much of the holdings) are cloned
  ✗ Closed communities are breeding dead-end alley developments, e.g. delivering netCDF
- Transport - GridFTP versus HTTP
  ✓ Server to server
  ✓ Very good performance
  ✗ Depends on a very specific version of the GridFTP server (striped)
  ✗ Clients are not as capable due to ‘weight’ of Globus, revert to HTTP
- Scalability and response times (data AND metadata)
  ✓ Framework architecture supports re-layering for tuning
- Service monitoring
  ✓ to support the distributed collaborative infrastructure
  ✗ need lots or all services to really make a production environment work
- Many Globus services not used (GRIS, MDS, GIIS, …)
- Feeling lucky? Try out ESG by visiting the website at:

Fox 44 January 4, 2005 Success?
- Users are generally happy
- Exploited new technology components
- Integration - when and how does it work and scale?
  ✗ XML ✓ SQL
  ✗ DODS ✓ OPeNDAP and OPeNDAP-g
- Portals
- P2P - clients are not as ready as we think
- Globus provides a suite of framework components, some are easier to integrate than others, some just don’t fit our use-cases and architecture
- Data framework - e.g. OPeNDAP has been extremely successful

Fox 45 January 4, 2005 User needs In discussions with data providers and users, the needs are clear: “Fast access to ‘portable’ data, in a way that works with the tools we have; information must be easy to access, retrieve and work with.” Too often users (and data providers) have to deal with the organizational structure of the data sets, which varies significantly: data may be stored at one site in a small number of large files while similar data may be stored at another site in a large number of relatively smaller files. There is an equally large problem with the range of metadata descriptions for the data. Users often want only subsets of the data and struggle to get them efficiently. One user expresses it as: “(Please) solve the interface problem.”

Fox 46 January 4, 2005 Vision for building science cyberinfrastructure
- Use-case, then requirements
- Then derive architecture and choose technology components
- Build a working system for users from the start
- Get your funding source and community to commit to an evolving architecture
- If you choose a major framework technology, e.g. Globus, OPeNDAP, THREDDS, partner with them
- Data framework - e.g. OPeNDAP has been extremely successful

Fox 47 January 4, 2005 One paradigm
Goal - find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer.
E.g. The Virtual Solar-Terrestrial Observatory (VSTO) is proposed to be: a distributed, scalable education and research environment for searching, integrating, and analyzing observational, experimental and model databases in the fields of solar, solar-terrestrial and space physics.
Comprises: a system-like framework which provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use.

Fox 48 January 4, 2005 Virtual Observatory? Need better glue
- Basic problem: schemas are categorized rather than developed from an object model/class hierarchy, which significantly limits non-human use. However, they all form the basis to organize catalog interfaces for all types of data, images, etc.
- This limits data systems utilizing frameworks and prevents frameworks from truly interoperating (SOAP, WSDL only a start)
- Directories, e.g. NASA GCMD, CEDAR catalog, FITS (flat) keyword/value pairs, are being turned into ontologies (SWEET, VSTO)
- Markup languages, e.g. ESML, SPDML, ESG/ncML are excellent bases
- Evolve, recast, merge (where appropriate) using formal processes, tools with intended use in mind - for interface specifications, reasoning, validation, etc. beyond the usual search and access
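The difference between flat categories and a class hierarchy can be shown with a toy Python example. With flat keywords, a program asking for "Radar" data finds nothing unless a dataset carries exactly that tag; with parent links, it can infer that more specific instrument types qualify. The hierarchy, term names and dataset identifiers below are assumptions made up for illustration and are not taken from SWEET, VSTO or the CEDAR catalog.

```python
# Hypothetical hierarchy: each term points to its parent class.
PARENT = {
    "Instrument": None,
    "Radar": "Instrument",
    "IncoherentScatterRadar": "Radar",
    "HFRadar": "Radar",
    "Lidar": "Instrument",
}

# Hypothetical datasets, each tagged with its most specific term.
DATASETS = {
    "ds1": "IncoherentScatterRadar",
    "ds2": "Lidar",
    "ds3": "HFRadar",
}


def is_a(term, ancestor):
    """Walk up the parent links to test whether term is a kind of ancestor."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


# Flat keyword matching: only exact tags count.
flat_match = sorted(ds for ds, tag in DATASETS.items() if tag == "Radar")

# Hierarchy-aware matching: subclasses of Radar count too.
hier_match = sorted(ds for ds, tag in DATASETS.items() if is_a(tag, "Radar"))

print(flat_match)
print(hier_match)
```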

Fox 49 January 4, 2005 Summary
- Basic success in both data systems and data framework approaches
- Satisfying user and sponsor needs (from ‘just’ to ‘outstanding’)
- Experience with Globus ranges from very good, to not ready for our need
- Experience with OPeNDAP is very good, especially with core services
- Scalability and performance require an adaptable architecture which is something system-level interfaces can still hide from the user
- Challenge - to bring these attributes to a framework, i.e. in which the user is more exposed
- Interoperate, interoperate, interoperate - interface, interface, interface
- User interfaces still require significant HCI efforts
- Metadata services are extremely important