SEEK: Science Environment for Ecological Knowledge
- EcoGrid
  - Ecological, biodiversity and environmental data
  - Computational access
  - Standardized, open Grid interfaces (OGSA compliant)
- Analysis and Modeling System
  - Modeling scientific workflows
- Semantic Mediation System
  - “Smart” data discovery
  - Knowledge-based data integration
  - Knowledge-based analysis integration
- Knowledge Representation
  - Ontologies for describing ecology

SEEK EcoGrid
- Integrate diverse data networks from ecology, biodiversity, and environmental sciences
- Grid-standardized interfaces
- Metadata-mediated data access (EML)
  - Query
  - Read
  - Write
  - Authentication
  - Authorization
  - Replication
- Computational access
  - Pre-defined analytical services
  - On-the-fly analytical services

EcoGrid Services
- Query
  - Search metadata and data, return result sets with ID
- Read
  - Retrieve data objects by ID
- Authentication
  - Verify user identity
- Authorization
  - Record allowable interactions
- Write
  - Write data objects by ID
- Replication
  - Mirror objects for backup and efficiency
- Computation
  - Execute models and simulations from AMS on various nodes

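The service list above can be illustrated with a small in-memory sketch. This is not the EcoGrid API: the class name `InMemoryNode`, its method signatures, and the permission model are all assumptions made for illustration, and the query searches metadata only.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryNode:
    """Hypothetical stand-in for one EcoGrid node (not the real interface)."""
    objects: dict = field(default_factory=dict)      # object ID -> (metadata, data)
    permissions: dict = field(default_factory=dict)  # user -> set of allowed actions

    def authorize(self, user: str, action: str) -> bool:
        # Authorization: consult the recorded allowable interactions.
        return action in self.permissions.get(user, set())

    def write(self, user: str, object_id: str, metadata: dict, data: bytes) -> None:
        # Write: store a data object under its ID.
        if not self.authorize(user, "write"):
            raise PermissionError(f"{user} may not write")
        self.objects[object_id] = (metadata, data)

    def query(self, user: str, keyword: str) -> list:
        # Query: search metadata values, return a result set of object IDs.
        if not self.authorize(user, "read"):
            raise PermissionError(f"{user} may not read")
        needle = keyword.lower()
        return sorted(
            object_id
            for object_id, (metadata, _) in self.objects.items()
            if any(needle in str(value).lower() for value in metadata.values())
        )

    def read(self, user: str, object_id: str) -> bytes:
        # Read: retrieve a data object by the ID a query returned.
        if not self.authorize(user, "read"):
            raise PermissionError(f"{user} may not read")
        return self.objects[object_id][1]


# Illustrative users and object IDs; none of these come from the slides.
node = InMemoryNode(permissions={"alice": {"read", "write"}, "bob": {"read"}})
node.write("alice", "obj.1", {"title": "Juniper cover"}, b"P,juniper,2200")
result_ids = node.query("bob", "juniper")
first_object = node.read("bob", result_ids[0])
```

The point of the sketch is the two-step access pattern: a query yields IDs, and a separate read resolves an ID to a data object.
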
EcoGrid client interactions
- Modes of interaction
  - Client-server
  - Fully distributed
  - Peer-to-peer
- EcoGrid Registry
  - Node discovery
  - Service discovery
- Aggregation services
  - Centralized access
  - Reliability
  - Data preservation

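Node and service discovery through a registry might look roughly like the following. The `Registry` class, its methods, and the node addresses are hypothetical; the slides do not specify the registry's actual interface.

```python
class Registry:
    """Hypothetical registry: each node registers the services it offers."""

    def __init__(self):
        self._services_by_node = {}

    def register(self, node_address, services):
        # A node announces itself together with its service names.
        self._services_by_node[node_address] = set(services)

    def nodes(self):
        # Node discovery: list every registered node.
        return sorted(self._services_by_node)

    def find(self, service):
        # Service discovery: list the nodes offering a given service.
        return sorted(
            address
            for address, services in self._services_by_node.items()
            if service in services
        )


# Illustrative addresses only.
registry = Registry()
registry.register("node-a.example.org", ["query", "read"])
registry.register("node-b.example.org", ["query", "read", "write"])
writable_nodes = registry.find("write")
```

A client in client-server mode would ask the registry once and then talk to the returned nodes directly.
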
Building the EcoGrid
[Figure: map of EcoGrid nodes. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Labeled sites: AND, LUQ, HBR, NTL, VCR.]
- LTER Network (24)
- Natural History Collections (>> 20)
- Organization of Biological Field Stations (180)
- UC Natural Reserve System (36)
- Partnership for Interdisciplinary Studies of Coastal Oceans (4)
- Multi-agency Rocky Intertidal Network (60)

Semantics for Science
- Ontologies provide domain context
- Link directly to data via EML, and to analytical workflows
- Use logic engines for discovery and integration
[Figure: records from an Access file and an Excel file (Sample 1, lat, long, presence; Sample 2, lat, long, presence; Sample 3, lat, long, absence) are combined with elevation (m), vegetation cover type, and mean annual temperature (C).]
Integrated data:
  P, juniper, 2200m, 16C
  P, pinyon, 2320m, 14C
  A, creosote, 1535m, 22C

Ecological ontologies
- What was measured (e.g., biomass)
- Type of measurement (e.g., Energy)
- Context of measurement (e.g., Psychotria limonensis)
- How it was measured (e.g., dry weight)

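One way to picture the four facets above is as a record attached to a data column. The class and field names below are assumptions chosen for illustration, not terms from a SEEK ontology; the example values are the ones given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementAnnotation:
    """Hypothetical record holding the four facets listed on the slide."""
    measured: str          # what was measured
    measurement_type: str  # type of measurement
    context: str           # context of measurement
    method: str            # how it was measured


# Example values taken from the slide.
annotation = MeasurementAnnotation(
    measured="biomass",
    measurement_type="Energy",
    context="Psychotria limonensis",
    method="dry weight",
)
```
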
Semantic Type Labeling
- Label data with semantic types
- Label inputs and outputs of analytical components with semantic types
- Use reasoning engines to generate transformation steps
  - Beware analytical constraints
- Use reasoning engine to discover relevant components
[Figure labels: Data, Ontology, Workflow Components]

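Component discovery by semantic type can be sketched as a subsumption check over a small is-a hierarchy. Everything here is a toy assumption: the type names, the component names, and the simple parent-pointer "ontology" stand in for a real ontology and reasoning engine.

```python
# Hypothetical is-a hierarchy: child type -> parent type.
IS_A = {
    "DryWeightBiomass": "Biomass",
    "Biomass": "Measurement",
    "Temperature": "Measurement",
}

# Hypothetical components labeled with (input type, output type).
COMPONENTS = {
    "biomass_summary": ("Biomass", "SummaryTable"),
    "temperature_plot": ("Temperature", "Plot"),
    "generic_histogram": ("Measurement", "Plot"),
}


def is_subtype(semantic_type, ancestor):
    """Walk up the hierarchy; True if semantic_type is ancestor or below it."""
    current = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


def discover(data_type):
    """Return the components whose input type accepts the given data type."""
    return sorted(
        name
        for name, (input_type, _) in COMPONENTS.items()
        if is_subtype(data_type, input_type)
    )


matches = discover("DryWeightBiomass")
# matches == ["biomass_summary", "generic_histogram"]
```

Data labeled with a specific type is accepted by any component whose input is that type or a more general one, which is why the generic histogram also matches.
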
Scientific workflows
- EML provides semi-automated data binding
- Scientific workflows represent knowledge about the process; AMS captures this knowledge

SEEK Analysis and Modeling System
- Ontologies provide domain context
- Link directly to data via EML, and to analytical workflows via MoML
- Use logic engines for:
  - Discovery of data and analytical components
  - Integration of those components
- Implementation
  - Design tool based on Ptolemy
  - Direct access to EcoGrid data within design tool
  - Individual workflow components execute as OGSA services

Analysis and Modeling System
[Figure (slide from D. Pennington): GARP workflow. Labeled elements:
 (a) Species presence & absence points, native range and invasion area (DiGIR)
 (b) Environmental layers, native range and invasion area (SRB)
 (c) Integrated layers, native range and invasion area
 (d) Training sample, test sample
 (e) GARP rule set
 (f) Native range prediction map, invasion area prediction map
 (g) Model quality parameter
 Other labels: EcoGrid Query, Layer Integration, Sample, Data, + A1, + A2, + A3, Calculation, Map, Validation, User]
Scientific workflows represent knowledge about the process; AMS captures this knowledge

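A workflow like the one in this figure can be thought of as a dependency graph whose steps run in topological order. The sketch below uses Python's standard `graphlib`; the step names and the exact edges are assumptions read loosely from the figure labels, not the actual AMS workflow definition.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: step -> set of steps it depends on.
DEPENDS_ON = {
    "integrate_layers": {"query_species_points", "query_environmental_layers"},
    "sample": {"integrate_layers"},
    "compute_rule_set": {"sample"},
    "predict_map": {"compute_rule_set", "integrate_layers"},
    "validate": {"predict_map", "sample"},
}


def execution_order(depends_on):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
```

Any order returned this way places both queries before layer integration and validation last, which mirrors the left-to-right flow of the figure.
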
Aims of EcoGrid
- Which, Where, How, Who?
- Share Data and Information
- Relate Data from multiple projects/groups
- Crosswalks across data structures
- Develop Eco-related Finding Aids for Data
- Global User: Authenticate and Authorize
- Provide an infrastructure for “Archivable Collection-building” for SEEK scientists
- Facilitate the A&M layer and the SMS layer

Challenges of EcoGrid
- Data & User Diversity
  - datasets & scientists
  - themes, methods, units, structures
- Small data sizes but high complexity (metadata)
- Multiple Data Organizations
  - Biodiversity Surveys
  - Population data
  - GIS, Satellite Images, Weather Data, …
  - Ontologies & Taxonomies
- Data Discovery: no single place to find data
- Data Entropy: rapid decline of information on data
- Autonomy with Centralized access
- Leverage Computational Grid work

Existing services
- Metacat: syntactic and semantic metadata querying/inserting/updating/deleting, user registration/authentication, data replication, data/metadata versioning; supports any XML-based metadata
- Xanthoria: common-schema mediator (currently 8 sites); metadata query/insert/update/delete for any XML schema to underlying metadatabase (SQL, native XML)

Existing Systems
- DiGIR: querying arbitrary XML-describable resources (underlying data sources can be any type: RDB, XMLDB)
- ClimDB: integrating climate data in diverse formats (using wrapping at the data source); access through the web; common schema identified beforehand (tabular description)
- HyperLTER: summary ontology stored as metadata for images; image extraction, geographic subsetting, band-level subsetting; integration with MODIS images, hyperspectral images, TM images, air photos, …

Existing Systems
- VegBank: three databases (co-occurrence records, a concept-driven species taxonomic database, community classification). Distributed VegBank, querying by plots. Query/insert/update/annotate across three diverse databases that are described using XML
- SRB: access to distributed data; querying based on syntactic, semantic, and user-defined (arbitrary relational) metadata. Annotations for data. Operations on data. Extraction of metadata. Ingest, bulk ingest, delete, and update of data/metadata