Cyberinfrastructure Overview Core Cyberinfrastructure Team Matthew B. Jones National Center for Ecological Analysis and Synthesis (NCEAS) University of.

1 Cyberinfrastructure Overview Core Cyberinfrastructure Team Matthew B. Jones National Center for Ecological Analysis and Synthesis (NCEAS) University of California, Santa Barbara DataONE Kick-off Meeting October 20-22, 2009

2 Cyberinfrastructure Objectives Support synthesis in earth observation sciences Support full lifecycle of scientific process Data acquisition and management Data preservation Data discovery and access Data integration Data analysis and visualization Process management and preservation Evolve to accommodate technology change

3 Design goals Distributed management at Member Nodes Replication and caching for preservation and performance Software must provide benefits for scientists today Evolution of software and standards Support and adapt existing community software efforts Emphasize Free and Open Source Software

4 What data are in scope? Biological e.g., Gene, Organism, Population, Species, Community, Biome, Ecosystem Environmental e.g., Atmospheric, Chemical, Ecological, Hydrological, Oceanographic, Physical Social e.g., Land use, human population Economic e.g., trade, ecosystem services, resource extraction

5 Who are the providers and consumers? Providers: Academic and Agency Scientists, Research networks, Environmental observatories, Citizen groups, Students. Consumers: Academic and Agency Scientists, Research networks, Environmental observatories, Citizen groups, Students. Same people, different roles driving needs

6 Metadata and data integration Every community has multiple metadata schemas (Biological Data Profile, Darwin Core, Dublin Core, Ecological Metadata Language, Open GIS schemas) and multiple data formats (ASCII, NetCDF, HDF, GeoTiff, ...). Some communities have general and domain-specific ontologies. Addressing this heterogeneity is critical. Integrated analysis of datasets requires: syntax mapping; semantics mapping; sophisticated integration tools that do not exist
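
The syntax mapping named on slide 6 can be pictured as a field-renaming step between two metadata vocabularies. The following is a minimal sketch under stated assumptions: the field pairs in `CROSSWALK` are illustrative choices only, not an official Darwin Core to Dublin Core mapping, and `map_record` is a hypothetical helper that does not come from the presentation.

```python
# Hypothetical crosswalk: source field name -> target field name.
# The pairs are illustrative assumptions, not an official mapping.
CROSSWALK = {
    "scientificName": "title",
    "recordedBy": "creator",
    "eventDate": "date",
}


def map_record(record, crosswalk=CROSSWALK):
    """Rename fields found in the crosswalk; return unmapped fields separately."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            # No syntactic equivalent is known; leave it for manual
            # or semantic mapping instead of guessing.
            unmapped[key] = value
    return mapped, unmapped


if __name__ == "__main__":
    source = {
        "scientificName": "Quercus agrifolia",
        "eventDate": "2009-10-20",
        "habitat": "oak woodland",
    }
    mapped, unmapped = map_record(source)
    print(mapped)
    print(unmapped)
```

Fields without a known counterpart are returned separately rather than dropped, which reflects the slide's point that renaming alone does not settle what a field means.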

7 Integrating with existing infrastructure KNB, ESDIS, and Waters Networks

8 Overview of Components Member Nodes Earth observing institutions, projects, and networks Provide resources for their own data and replicated data Focused on serving their constituencies Coordinating Nodes Provide network-wide services to Member Nodes Geographically replicated services Investigator Toolkit Tools for researchers to access DataNetONE General Purpose and discipline-specific tools Adapt existing tools where possible

9 Node Design Member nodes Geographically Distributed Nodes Authoritative repository for many datasets Diversity tolerant (less tightly coordinated) Freedom to try new tools, methods, and leapfrog forward Partial replication Coordinating nodes Completely replicated Complete metadata catalogue Data Subset (initially a large fraction) Tightly coordinated, stable service platform
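
One way to picture the division of labour on slide 9 is a coordinating node that keeps the complete metadata catalogue and tracks which member nodes hold each object. The sketch below is only an illustration: the class names, method names, and in-memory storage are assumptions, not the DataONE design or API.

```python
class MemberNode:
    """Hypothetical member node: holds its own data plus replicas."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # identifier -> data held at this node


class CoordinatingNode:
    """Hypothetical coordinating node: full metadata catalogue and replica map."""

    def __init__(self):
        self.catalog = {}    # identifier -> metadata for every known object
        self.locations = {}  # identifier -> set of member node names

    def register(self, node, identifier, data, metadata):
        # The member node stays the authoritative holder of the data;
        # the coordinating node records only metadata and location.
        node.objects[identifier] = data
        self.catalog[identifier] = metadata
        self.locations.setdefault(identifier, set()).add(node.name)

    def replicate(self, identifier, source, target):
        # Partial replication: copy one object to one additional member node.
        target.objects[identifier] = source.objects[identifier]
        self.locations[identifier].add(target.name)

    def locate(self, identifier):
        return sorted(self.locations.get(identifier, ()))


if __name__ == "__main__":
    cn = CoordinatingNode()
    a, b = MemberNode("A"), MemberNode("B")
    cn.register(a, "doc.1", b"col1,col2\n1,2\n", {"title": "Example dataset"})
    cn.replicate("doc.1", a, b)
    print(cn.locate("doc.1"))
```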

10 DataONE Service Interface Federated Identity and Authorization Services Object Management Services Discovery and Usage Services Preservation Services Network Services

11

12 Service Interface for Interoperability Create common access methods for different clients Create a mechanism to map heterogeneous services Provide an interface between nodes and service requests Simplicity of construction Lightweight Ease of implementation Implementations are opaque to service consumers
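
The idea that implementations are opaque to service consumers can be sketched as one abstract interface with interchangeable backends. Everything in the sketch is a hypothetical illustration: `NodeService`, its two methods, and both implementations are assumed names, not the DataONE Service Interface itself.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class NodeService(ABC):
    """Hypothetical common access interface (not the actual DataONE API)."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        """Return the object stored under the given identifier."""

    @abstractmethod
    def list_identifiers(self) -> List[str]:
        """Return all identifiers the node can serve, sorted."""


class DictNode(NodeService):
    """Backend that keeps objects in a plain dictionary."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = dict(objects)

    def get(self, identifier: str) -> bytes:
        return self._objects[identifier]

    def list_identifiers(self) -> List[str]:
        return sorted(self._objects)


class PrefixedNode(NodeService):
    """Backend with its own internal key scheme, hidden from clients."""

    _PREFIX = "local:"

    def __init__(self, objects: Dict[str, bytes]):
        self._rows = [(self._PREFIX + k, v) for k, v in objects.items()]

    def get(self, identifier: str) -> bytes:
        for key, value in self._rows:
            if key == self._PREFIX + identifier:
                return value
        raise KeyError(identifier)

    def list_identifiers(self) -> List[str]:
        return sorted(key[len(self._PREFIX):] for key, _ in self._rows)


def fetch_all(node: NodeService) -> Dict[str, bytes]:
    """Client code that depends only on the interface, not the backend."""
    return {i: node.get(i) for i in node.list_identifiers()}
```

A client such as `fetch_all` gives the same result for either backend, which is the property the slide describes: differences between node implementations stay behind the interface.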

13 DataNetONE Components

14 What is the Investigator Toolkit? Suite of software tools for researchers Emphasize Free and Open Source, but support commercial General analysis frameworks (e.g., R, MATLAB) Domain-specific tools (e.g., GARP, Phylocom) Organized using scientific workflows Supports the scientific lifecycle Data management and preservation Data query and access Data analysis and visualization Process management and preservation Communication via the Service Interface

15 Toolkit Functions Supports the scientific lifecycle Data management and preservation Data query and access Data analysis and visualization Process management and preservation Portal software

16 Who will build the Toolkit? Many open source efforts already exist Data management: MATT, UDig, Specify Analysis and modeling: R, Octave Workflow systems: Kepler, Taverna, Triana, Pegasus Grid systems: Condor, Globus, BOINC Data and workflow portals: VegBank, myExperiment Commercial tools are important too: MATLAB, SAS, ArcGIS DataONE: help communities build their own tools Integrate, interoperate, stabilize Create libraries to DataONE Service Interface

17 Data Management and Preservation Data management functions Data creation, input, editing, versioning Metadata creation, editing, annotation Local data storage, indexing, searching Example applications Morpho metadata editor Mercury metadata editor MATT metadata editor ESRI ArcCatalog Metacat Data Server -- lab group data management

18 Data Analysis and Visualization Need community-standard analysis frameworks R, Octave, GRASS, SPlus, MATLAB, ArcGIS Thousands of domain-specific analytical tools exist GARP: Genetic Algorithm for Rule Processing BLAST search ClustalW Phylocom Mesquite

19 Workflow system capabilities Workflow systems: Enable communication Support preservation of scientific processes Enable component re-use Allow integration across many software frameworks Example workflow engines Kepler, Taverna, Pegasus, Triana

20 Community tools have been successful Investigator Toolkit will build upon these successes Adapt tools to work together with Service Interface Support Free and Open Source Software Supported tools will build over time

21 DataONE discovery portals Data discovery portal at Coordinating Nodes Workflow discovery portal at Coordinating Nodes Other portals as needed

22 Outstanding issues Data Discovery, Access, and Availability Federated Identity, Authentication, and Access Control Metadata and data standards Evolution of specifications Data Integration and Interoperability Data and Metadata preservation, longevity, and migration Versioning and identifiers Scalability

23 NIH Syndrome Lots of: metadata catalogs and specifications, data standards, service definitions, architectures and protocols Many communities of practice: GEOSS, KNB, CUAHSI, NBII, GBIF, TDWG, Ameriflux, EOS, OGC, W3C, LTER, NEON, OOI and on and on and on... DataONE cannot just be Community n+1 Easy to get entrained in the details Have to save people work Have to engage groups early and earnestly

24 DataONE I am here W3C NCEAS OGC TDWG LTER Kepler SONet ME KNB GBIF GEOSS EOS Where are you?

