Published by Lawrence Greer. Modified over 9 years ago.
1
Cyberinfrastructure Overview
Core Cyberinfrastructure Team
Matthew B. Jones
National Center for Ecological Analysis and Synthesis (NCEAS)
University of California, Santa Barbara
DataONE Kick-off Meeting, October 20-22, 2009
2
Cyberinfrastructure Objectives
- Support synthesis in earth observation sciences
- Support full lifecycle of scientific process
  - Data acquisition and management
  - Data preservation
  - Data discovery and access
  - Data integration
  - Data analysis and visualization
  - Process management and preservation
- Evolve to accommodate technology change
3
Design goals
- Distributed management at Member Nodes
- Replication and caching for preservation and performance
- Software must provide benefits for scientists today
- Evolution of software and standards
- Support and adapt existing community software efforts
- Emphasize Free and Open Source Software
4
What data are in scope?
- Biological, e.g., Gene, Organism, Population, Species, Community, Biome, Ecosystem
- Environmental, e.g., Atmospheric, Chemical, Ecological, Hydrological, Oceanographic, Physical
- Social, e.g., Land use, human population
- Economic, e.g., trade, ecosystem services, resource extraction
5
Who are the providers and consumers?
- Providers: Academic and Agency Scientists, Research networks, Environmental observatories, Citizen groups, Students
- Consumers: Academic and Agency Scientists, Research networks, Environmental observatories, Citizen groups, Students
- Same people, different roles driving needs
6
Metadata and data integration
- Every community has
  - multiple metadata schemas: Biological Data Profile, Darwin Core, Dublin Core, Ecological Metadata Language, Open GIS schemas
  - multiple data formats: ASCII, NetCDF, HDF, GeoTIFF, ...
- Some communities have general and domain-specific ontologies
- Addressing this heterogeneity is critical
- Integrated analysis of datasets requires
  - Syntax mapping
  - Semantics mapping
  - Sophisticated integration tools that do not exist
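To make "syntax mapping" concrete, here is a minimal sketch of a field-name crosswalk between two metadata schemas. The crosswalk table, the `map_record` helper, and the sample record are all hypothetical illustrations; the field names are stand-ins and not an official mapping between any of the schemas listed above.

```python
# Hypothetical crosswalk: source field name -> target field name.
# These pairs are illustrative only, not an official schema mapping.
CROSSWALK = {
    "title": "title",
    "creator": "creator",
    "date": "pubDate",
    "description": "abstract",
}


def map_record(record, crosswalk):
    """Rename the keys of a flat metadata record.

    Returns the mapped record plus a list of source fields that have
    no entry in the crosswalk, so nothing is dropped silently.
    """
    mapped = {}
    unmapped = []
    for key, value in record.items():
        target = crosswalk.get(key)
        if target is None:
            unmapped.append(key)
        else:
            mapped[target] = value
    return mapped, unmapped


# Made-up sample record for illustration.
source = {
    "title": "Stream temperature 2008",
    "creator": "J. Doe",
    "date": "2009-10-20",
    "coverage": "Santa Barbara",
}
mapped, unmapped = map_record(source, CROSSWALK)
```

A rename table like this covers only the syntax side. Semantics mapping (for example, whether two fields named "date" mean collection date or publication date) needs additional information that a key-to-key table cannot express.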
7
Integrating with existing infrastructure
- KNB, ESDIS, and Waters Networks
8
Overview of Components
- Member Nodes
  - Earth observing institutions, projects, and networks
  - Provide resources for their own data and replicated data
  - Focused on serving their constituencies
- Coordinating Nodes
  - Provide network-wide services to Member Nodes
  - Geographically replicated services
- Investigator Toolkit
  - Tools for researchers to access DataNetONE
  - General Purpose and discipline-specific tools
  - Adapt existing tools where possible
9
Node Design
- Member nodes
  - Geographically Distributed Nodes
  - Authoritative repository for many datasets
  - Diversity tolerant (less tightly coordinated)
  - Freedom to try new tools, methods, and leapfrog forward
  - Partial replication
- Coordinating nodes
  - Completely replicated
  - Complete metadata catalogue
  - Data Subset (initially a large fraction)
  - Tightly coordinated, stable service platform
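The division of labour between the two node types can be sketched as a toy in-memory model. Every class, field, and function name here (`MemberNode`, `CoordinatingNode`, `register`, `replicate`) is a hypothetical illustration under the assumptions stated in the comments, not the DataONE design itself.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Holds its own data plus whatever replicas it has been given."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> data bytes


@dataclass
class CoordinatingNode:
    """Keeps metadata for every object and tracks where copies live."""
    catalogue: dict = field(default_factory=dict)  # identifier -> metadata
    locations: dict = field(default_factory=dict)  # identifier -> node names

    def register(self, identifier, metadata, node):
        # Assumption: the node that registers an object already holds it.
        self.catalogue[identifier] = metadata
        self.locations.setdefault(identifier, []).append(node.name)


def replicate(identifier, source, target, coordinator):
    """Copy one object to another member node and record the new copy."""
    target.objects[identifier] = source.objects[identifier]
    coordinator.locations[identifier].append(target.name)


mn_a = MemberNode("A")
mn_b = MemberNode("B")
cn = CoordinatingNode()

mn_a.objects["obj-1"] = b"temperature,12.5"
mn_a.objects["obj-2"] = b"temperature,13.1"
cn.register("obj-1", {"title": "Sample one"}, mn_a)
cn.register("obj-2", {"title": "Sample two"}, mn_a)

# Only obj-1 is replicated, so node B holds a partial copy.
replicate("obj-1", mn_a, mn_b, cn)
```

In this sketch the coordinating node knows about both objects while node B stores only one of them, which mirrors the "complete metadata catalogue" versus "partial replication" contrast on the slide.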
10
DataONE Service Interface
- Federated Identity and Authorization Services
- Object Management Services
- Discovery and Usage Services
- Preservation Services
- Network Services
12
Service Interface for Interoperability
- Create common access methods for different clients
- Create a mechanism to map heterogeneous services
- Provide an interface between nodes and service requests
- Simplicity of construction
- Lightweight
- Ease of implementation
- Implementations are opaque to service consumers
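One way to picture "implementations are opaque to service consumers" is an abstract interface with interchangeable back ends. The sketch below is illustrative only: `ObjectService`, its two methods, and both node classes are hypothetical names chosen for this example, not part of any published DataONE API.

```python
from abc import ABC, abstractmethod


class ObjectService(ABC):
    """Hypothetical common interface that every node would expose."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        """Return the bytes stored under the given identifier."""

    @abstractmethod
    def list_identifiers(self) -> list:
        """Return all identifiers this node can serve, sorted."""


class InMemoryNode(ObjectService):
    """Keeps objects in a plain dictionary keyed by identifier."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def get(self, identifier):
        return self._objects[identifier]

    def list_identifiers(self):
        return sorted(self._objects)


class PrefixedNode(ObjectService):
    """Stores keys with an internal prefix that clients never see."""

    PREFIX = "store/"

    def __init__(self, objects):
        self._rows = {self.PREFIX + k: v for k, v in objects.items()}

    def get(self, identifier):
        return self._rows[self.PREFIX + identifier]

    def list_identifiers(self):
        return sorted(k[len(self.PREFIX):] for k in self._rows)


def fetch_all(service: ObjectService) -> dict:
    """Client code written only against the interface."""
    return {i: service.get(i) for i in service.list_identifiers()}


data = {"obj-1": b"alpha", "obj-2": b"beta"}
from_memory = fetch_all(InMemoryNode(data))
from_prefixed = fetch_all(PrefixedNode(data))
```

`fetch_all` never inspects which class it was handed, so a node is free to change its storage layout without breaking clients.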
13
DataNetONE Components
14
What is the Investigator Toolkit?
- Suite of software tools for researchers
  - Emphasize Free and Open Source, but support commercial
  - General analysis frameworks (e.g., R, MATLAB)
  - Domain-specific tools (e.g., GARP, Phylocom)
  - Organized using scientific workflows
- Supports the scientific lifecycle
  - Data management and preservation
  - Data query and access
  - Data analysis and visualization
  - Process management and preservation
- Communication via the Service Interface
15
Toolkit Functions
- Supports the scientific lifecycle
  - Data management and preservation
  - Data query and access
  - Data analysis and visualization
  - Process management and preservation
- Portal software
16
Who will build the Toolkit?
- Many open source efforts already exist
  - Data management: MATT, UDig, Specify
  - Analysis and modeling: R, Octave
  - Workflow systems: Kepler, Taverna, Triana, Pegasus
  - Grid systems: Condor, Globus, BOINC
  - Data and workflow portals: VegBank, myExperiment
- Commercial tools important too
  - MATLAB, SAS, ArcGIS
- DataONE: help communities build their own tools
  - Integrate, interoperate, stabilize
  - Create libraries to DataONE Service Interface
17
Data Management and Preservation
- Data management functions
  - Data creation, input, editing, versioning
  - Metadata creation, editing, annotation
  - Local data storage, indexing, searching
- Example applications
  - Morpho metadata editor
  - Mercury metadata editor
  - MATT metadata editor
  - ESRI ArcCatalog
  - Metacat Data Server -- lab group data management
18
Data Analysis and Visualization
- Need community-standard analysis frameworks
  - R, Octave, GRASS
  - S-PLUS, MATLAB, ArcGIS
- Thousands of domain-specific analytical tools exist
  - GARP: Genetic Algorithm for Rule-set Production
  - BLAST search
  - ClustalW
  - Phylocom
  - Mesquite
19
Workflow system capabilities
- Workflow systems:
  - Enable communication
  - Support preservation of scientific processes
  - Enable component re-use
  - Allow integration across many software frameworks
- Example workflow engines
  - Kepler, Taverna, Pegasus, Triana
20
Community tools have been successful
- Investigator Toolkit will build upon these successes
- Adapt tools to work together with Service Interface
- Support Free and Open Source Software
- Supported tools will build over time
21
DataONE discovery portals
- Data discovery portal at Coordinating Nodes
- Workflow discovery portal at Coordinating Nodes
- Other portals as needed
22
Outstanding issues
- Data Discovery, Access, and Availability
- Federated Identity, Authentication, and Access Control
- Metadata and data standards
- Evolution of specifications
- Data Integration and Interoperability
- Data and Metadata preservation, longevity, and migration
- Versioning and identifiers
- Scalability
23
NIH (Not Invented Here) Syndrome
- Lots of:
  - metadata catalogs and specifications
  - data standards
  - service definitions
  - architectures and protocols
- Many communities of practice
  - GEOSS, KNB, CUAHSI, NBII, GBIF, TDWG, Ameriflux, EOS, OGC, W3C, LTER, NEON, OOI and on and on and on...
- DataONE cannot just be Community n+1
  - Easy to get entrained in the details
  - Have to save people work
  - Have to engage groups early and earnestly
24
Where are you?
(Diagram labels: DataONE, "I am here", W3C, NCEAS, OGC, TDWG, LTER, Kepler, SONet, ME, KNB, GBIF, GEOSS, EOS)