EcoGrid
SEEK All Hands Meeting
February 2003
Albuquerque, NM
EcoGrid in SEEK
  ANALYSIS & MODELING LAYER
  SEMANTIC MEDIATION LAYER
  DATA/COMPUTE LAYER
Aims of EcoGrid
  Which, Where, How, Who?
  Share Data and Information
  Relate Data from multiple projects/groups
  Crosswalks across data structures
  Develop Eco-related Finding Aids for Data
  Global User: Authenticate and Authorize
  Provide an infrastructure for “Archivable Collection-building” for SEEK scientists
  Facilitate the A&M layer and the SMS layer
Challenges of EcoGrid
  Data & User Diversity – datasets & scientists
    – themes, methods, units, structures
    – Small data sizes but high complexity (metadata)
  Multiple Data Organizations
    – Biodiversity Surveys
    – Population data
    – GIS, Satellite Images, Weather Data, …
  Ontologies & Taxonomies
  Data Discovery: No single place to find
  Data Entropy – rapid decline of information on data
  Autonomy with Centralized access
  Leverage Computational Grid work
Our Charge
  Develop a framework for “global access to ecologically-related data”
  Look at current approaches, existing systems & grids
  List features/functionalities we want to see in EcoGrid
  Study how to leverage, integrate, extend existing work
  Come up with architectural framework & user interfaces
  Identify Datasets that should be in the EcoGrid
  Identify Networks that will be part of the EcoGrid
  Identify Methods that can be used through the EcoGrid
  Identify people, members, partners
  Identify timeline, goals, milestones
Existing Services
  Metacat – syntactic and semantic metadata querying/inserting/updating/deleting, user registration/authentication, data replication, data/metadata versioning; supports any XML-based metadata
  Xanthoria – common-schema mediator (currently 8 sites), metadata query/insert/update/delete for any XML schema to underlying metadatabase (SQL, native XML)
Existing Systems
  Prometheus – querying classification taxonomy, query/describe graph structures
  DiGIR – querying arbitrary XML-describable resources (underlying data sources can be any type: RDB, XMLDB)
  ClimDB – integrating diverse-format climate data (using wrapping at the data source). Access through web; common schema identified beforehand – tabular description
  HyperLTER – summary ontology put in as metadata for images; image extraction, geographic subsetting, band-level subsetting; integration with MODIS images and Hyperspectral images, TM images, airphotos, …
Existing Systems
  VegBank – 3 databases: co-occurrence records, species taxonomic database that is concept-driven, community classification. Distributed VegBank, querying by plots. Query/insert/update/annotate across three diverse databases that are described using XML
  SRB – access to distributed data; querying based on syntactic, semantic, and user-defined (arbitrary relational) metadata. Annotations for data. Operations on data. Extraction of metadata. Ingest, bulk ingest, delete, update of data/metadata
What is needed for Data/EcoGrid
  Define structures for data that will be processed in SEEK data grid
    – Vectors, graphs, tables, trees, …
  Extend EML to take account of DDI and other metadata standards
  List a set of services that might be supported by data
  Identify common languages and mappings for a structural vocabulary
    – Ex. occurrence, co-occurrence, as keywords
  Identify the services of EcoGrid Components
  Identify Registry Language – users/data/methods/resources/storage/compute
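To make the registry idea concrete, here is a minimal sketch of what a registry entry and lookup might look like. Only the six registry categories, the four structure names, and the use of structural keywords such as "co-occurrence" come from the slide. The class names, field names, and identifiers (`RegistryEntry`, `Registry`, `plots-1`) are hypothetical, and this is not a proposed EcoGrid design.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class ResourceKind(Enum):
    # The six categories the slide names for the registry language.
    USER = "user"
    DATA = "data"
    METHOD = "method"
    RESOURCE = "resource"
    STORAGE = "storage"
    COMPUTE = "compute"


class Structure(Enum):
    # Data structures the slide lists for the SEEK data grid.
    VECTOR = "vector"
    GRAPH = "graph"
    TABLE = "table"
    TREE = "tree"


@dataclass
class RegistryEntry:
    # Hypothetical record layout; field names are assumptions.
    identifier: str
    kind: ResourceKind
    structure: Optional[Structure] = None  # only meaningful for DATA entries
    keywords: Set[str] = field(default_factory=set)  # structural vocabulary terms


class Registry:
    """In-memory registry keyed by identifier (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        if entry.identifier in self._entries:
            raise ValueError(f"duplicate identifier: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def find(self, kind=None, keyword=None):
        # Return entries matching every filter that was supplied,
        # in registration order.
        results = []
        for entry in self._entries.values():
            if kind is not None and entry.kind is not kind:
                continue
            if keyword is not None and keyword not in entry.keywords:
                continue
            results.append(entry)
        return results
```

A registry like this would let a client ask, for example, for all DATA entries tagged with the keyword "co-occurrence" without knowing which underlying system holds them.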
EcoGrid Phase 1
  [Architecture diagram; labels: SRB/MCAT Client, EcoGrid Client, MetaCat Client, EcoGrid Server, SRB, LifeMapper, GARP, WhyWhere, MetaCat, Wrappers, Eco CAT]
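One possible reading of the Phase 1 labels is that clients talk to an EcoGrid server, which reaches each back end through a wrapper. The sketch below illustrates that wrapper pattern only. All class and method names (`Wrapper`, `InMemoryWrapper`, `EcoGridServer`, `search`) are hypothetical, the back ends are in-memory stand-ins, and no real SRB or MetaCat API is used or implied.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Hypothetical adapter between the server and one back end."""

    name: str

    @abstractmethod
    def search(self, keyword):
        """Return identifiers of records matching the keyword."""


class InMemoryWrapper(Wrapper):
    """Stand-in for a real back end; holds records in a dict, no network calls."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # identifier -> free-text description

    def search(self, keyword):
        needle = keyword.lower()
        return sorted(
            ident for ident, desc in self._records.items()
            if needle in desc.lower()
        )


class EcoGridServer:
    """Fans a query out to every registered wrapper (illustrative only)."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def search(self, keyword):
        # Tag each hit with the wrapper it came from so the client
        # can tell which system holds the record.
        return [
            (wrapper.name, ident)
            for wrapper in self._wrappers
            for ident in wrapper.search(keyword)
        ]
```

The point of the pattern is that a client issues one query and does not need to know how each underlying system stores or indexes its holdings; adding a new back end means writing one more wrapper.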
What do we do here
  Specifications document
  Calendaring, Meetings
  Milestones (Priorities, Duration for pubs, software, and other products)
  Deliverables for Annual report (June 1)
  Planned activities for dev mtg, wg, all-hands mtg
  Staff coordination and task allocation