Science Environment for Ecological Knowledge Bertram Ludäscher San Diego Supercomputer Center University of California, San Diego

1 Science Environment for Ecological Knowledge
Bertram Ludäscher, San Diego Supercomputer Center, University of California, San Diego
http://seek.ecoinformatics.org
UC Santa Barbara, UC San Diego, U New Mexico, U Kansas, Vermont, Napier, ASU, UNC

2 Architecture Overview (SEEK Overview, 3/2004)
Analysis & Modeling System
– Design and execution of ecological models and analysis
– End user focus – application-/upperware
Semantic Mediation System
– Data integration of hard-to-relate sources and processes
– Semantic types and ontologies – upper middleware
EcoGrid
– Access to ecology data and tools – middle-/underware
Plus Working Groups:
– Knowledge Representation (SEEK-KR)
– Classification and Nomenclature (TAXON)
– Biodiversity and Ecological Analysis and Modeling (BEAM)
(cf. GEON + Cyberinfrastructure)

3 SEEK EcoGrid
Goal: standardize interfaces (using web and grid services)
– We have standardized data via EML
– Integrate diverse data networks from ecology, biodiversity, and environmental sciences
Grid-standardized interfaces
– Uniform interface to: Metacat, SRB, DiGIR, Xanthoria, etc.
– Anyone can implement these interfaces
– Hides complexity of underlying systems
Metadata-mediated data access
– Supports multiple metadata standards
– EML, Darwin Core as foci
Computational services
– Pre-defined analytical services
– On-the-fly analytical services

4 Grid versus Web Services
Grid Services are Web Services
– Add authentication, lifecycle management, notification, etc.
– Globus Toolkit 3: Implements Open Grid Services Architecture (OGSA)
Implications for use
– Write a normal web service extending GridService base class
– When deployed within GT3, you get these extra functions for ‘free’
– Supports distributed computation via proxy authentication
Problems
– Complex system to understand
– GT3 can be difficult to deploy
– Proposals to incorporate grid services within the Web services community (Web Services Resource Framework [WSRF])

5 EcoGrid client interactions
Modes of interaction
– Client-server
– Fully distributed
– Peer-to-peer
EcoGrid Registry
– Node discovery
– Service discovery
Aggregation services
– Centralized access
– Reliability
– Data preservation

6 Building the EcoGrid
[Diagram: EcoGrid nodes and the networks they connect]
Node types: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system
Site labels: AND, LUQ, HBR, NTL, VCR
Networks: LTER Network (24); Natural History Collections (>> 100); Organization of Biological Field Stations (180); UC Natural Reserve System (36); Partnership for Interdisciplinary Studies of Coastal Oceans (4); Multi-agency Rocky Intertidal Network (60)

7 Kepler: Scientific Workflows
– EML provides semi-automated data binding
– Scientific workflows represent knowledge about the process; Kepler captures this knowledge
– Query EcoGrid to find data
– Archive output to EcoGrid

8 GARP Invasive Species Model (slide from D. Pennington)
[Workflow diagram; recoverable labels listed below]
(a) Species presence & absence points, for the native range and for the invasion area (DiGIR)
(b) Environmental layers, for the native range and for the invasion area (SRB)
(c) Integrated layers, for the native range and for the invasion area
(d) Training sample; test sample
(e) GARP rule set
(f) Native range prediction map; invasion area prediction map
(g) Model quality parameter
Step labels: EcoGrid Query, Layer Integration, Sample, Data, Calculation, Map, Validation, User; annotations + A1, + A2, + A3
Scientific workflows represent knowledge about the process; AMS captures this knowledge

9 Kepler Team, Projects, Sponsors
Ilkay Altintas (SDM), Chad Berkley (SEEK), Shawn Bowers (SEEK), Jeffrey Grethe (BIRN), Christopher H. Brooks (Ptolemy II), Zhengang Cheng (SDM), Efrat Jaeger (GEON), Matt Jones (SEEK), Edward A. Lee (Ptolemy II), Kai Lin (GEON), Bertram Ludäscher (BIRN, GEON, SDM, SEEK), Steve Mock (NMI), Steve Neuendorffer (Ptolemy II), Jing Tao (SEEK), Mladen Vouk (SDM), Yang Zhao (Ptolemy II), …

10 Kepler Understands EML Data (Chad Berkley, SEEK)

11 Kepler: Ecological Modeling (Chad Berkley, SEEK)

12 Database Access (Efrat Jaeger, GEON)
Note: EML descriptions of relational sources would allow automated data ingestion

13 Mineral Classification with Kepler … (Efrat Jaeger, GEON)

14 … inside the Classifier

15 Standard BrowserUI: Client-Side SVG

16 SWF Reengineering (Ilkay, SDM; Ashraf, Efrat, Kai, GEON)

17 DataMapper Sub-Workflow

18 Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)

19 Distributed Workflows in KEPLER
Web and Grid Service plug-ins
– WSDL (now) and Grid services (stay tuned …)
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
– SSH, SCP, SDSC SRB, OGS?-???… coming
WS Harvester
– Import query-defined WS operations as Kepler actors
XSLT and XQuery Data Transformers
– to link not “designed-to-fit” web services
WS-deployment interface (planned)

20 Web Service Actor (Ilkay Altintas, SDM)
Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that operation.
Configure: select the service operation

21 Set Parameters and Commit

22 Specialized WS Actor (after instantiation)

23 Web Service Harvester (Ilkay Altintas, SDM)
Imports the web services in a repository into the actor library. Can search for web services by keyword.

24 Kepler: Grid Services Access (Steve Mock, NMI)

25 An (oversimplified) Model of the Grid
Hosts: {h1, h2, h3, …}
Data @ Hosts: d1@{h_i}, d2@{h_j}, …
Functions @ Hosts: f1@{h_i}, f2@{h_j}, …
Given: a data/workflow
– … as a functional plan: […; Y := f(X); Z := g(Y); …]
– … as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
Find a host assignment: d_i → h_i, f_j → h_j for all d_i, f_j
– … s.t. […; d3@h3 := f@h2(d1@h1), …] is a valid plan
[Diagram: X → f → Y → g → Z]
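The host-assignment problem on this slide can be illustrated with a small brute-force search. This is only a sketch under stated assumptions, not the planner used in Kepler: the host names, the replica sets, and the cost measure (number of data transfers) are hypothetical, and a replicated input is simply taken from its alphabetically first host.

```python
from itertools import product

# Hypothetical inputs: which hosts hold each initial data item / function.
data_at = {"X": {"h1"}}
funcs_at = {"f": {"h1", "h2"}, "g": {"h3"}}

# Functional plan [Y := f(X); Z := g(Y)] as (output, function, input) steps.
plan = [("Y", "f", "X"), ("Z", "g", "Y")]


def host_assignments(plan, data_at, funcs_at):
    """Yield (assignment, transfers) for every choice of execution host."""
    names = [f for _, f, _ in plan]
    for hosts in product(*(sorted(funcs_at[f]) for f in names)):
        # Simplification: read each initial data item from one fixed replica.
        location = {d: sorted(hs)[0] for d, hs in data_at.items()}
        transfers = 0
        for (out, _, inp), host in zip(plan, hosts):
            if location[inp] != host:  # the input must be shipped to the host
                transfers += 1
            location[out] = host       # the output is produced where f runs
        yield dict(zip(names, hosts)), transfers


# Pick the assignment that ships data the fewest times.
best = min(host_assignments(plan, data_at, funcs_at), key=lambda r: r[1])
print(best)
```

Enumerating every combination is exponential in the number of steps, so a real planner would need pruning or heuristics; the sketch only makes the notion of a "valid plan" concrete.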

26 Shipping & Handling Algebra (SHA)
Logical view: plan Y@C = F@A of X@B
Physical view: SHA plans
1. [ X@B to A, Y@A := F@A(X@A), Y@A to C ]
2. [ F@A => B, Y@B := F@B(X@B), Y@B to C ]
3. [ X@B to C, F@A => C, Y@C := F@C(X@C) ]
[Diagram: four panels showing f@a, x@b, y@c for the logical view and for plans (1), (2), (3)]
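The three SHA plans differ only in where the function executes, so they can be generated mechanically. The following is a minimal sketch; the function name `sha_plans` and the string encoding of steps are illustrative choices, with "to" denoting data shipping and "=>" denoting function shipping as on the slide.

```python
def sha_plans(f_site, x_site, y_site):
    """Return one physical plan per candidate execution site."""
    plans = {}
    # dict.fromkeys keeps the sites in order and drops duplicates.
    for site in dict.fromkeys([f_site, x_site, y_site]):
        steps = []
        if x_site != site:
            steps.append(f"X@{x_site} to {site}")    # ship the data
        if f_site != site:
            steps.append(f"F@{f_site} => {site}")    # ship the function
        steps.append(f"Y@{site} := F@{site}(X@{site})")
        if site != y_site:
            steps.append(f"Y@{site} to {y_site}")    # ship the result
        plans[site] = steps
    return plans


for site, steps in sha_plans("A", "B", "C").items():
    print(site, steps)
```

Choosing among the plans would then depend on the relative cost of moving X, F, and Y, which the slide leaves open.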

27 Grid-Enabling PTII: Handles
[Diagram: actors A and B in Kepler space; nodes GA and GB in Grid space; arrows numbered 1 to 7]
1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)
Example: &X = “GA.17”, *X = [not preserved]
Candidate formalisms: GridFTP; SSH, SCP; SDSC SRB; OGS?-???; … WSRF?
Logical token transfer (3) requires get_handle (1, 2), then exec_handle (4, 5, 6, 7) for completion.
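The seven-step handle exchange can be simulated in a few lines. This is a toy sketch, not Kepler or Ptolemy II code: the `GridNode` class, its method names, and the assumption that `get_handle` also registers the data are hypothetical, and the counter starts at 17 only to reproduce the "GA.17" example.

```python
class GridNode:
    """A toy grid-space node that stores data under string handles."""

    def __init__(self, name, first_id=17):
        self.name = name
        self.store = {}
        self._next_id = first_id

    def get_handle(self, value):
        # Steps 1-2: register the data and return a handle such as "GA.17".
        handle = f"{self.name}.{self._next_id}"
        self._next_id += 1
        self.store[handle] = value
        return handle

    def send(self, handle):
        # Step 6: ship the referenced data (*X) to the requesting node.
        return self.store[handle]

    def request(self, handle, nodes):
        # Steps 4-7: find the owner from the handle prefix, pull the data,
        # keep a local copy, and acknowledge completion.
        owner = nodes[handle.split(".")[0]]
        self.store[handle] = owner.send(handle)
        return f"done({handle})"


ga, gb = GridNode("GA"), GridNode("GB")
nodes = {"GA": ga, "GB": gb}

payload = b"large dataset that never enters Kepler space"
handle = ga.get_handle(payload)       # A -> GA, then GA -> A
token_for_b = handle                  # A -> B: only the handle is sent
ack = gb.request(token_for_b, nodes)  # B -> GB -> GA -> GB -> B
print(handle, ack)
```

The point of the design is that actors A and B exchange only the small handle, while the bulk data moves directly between GA and GB.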

28 Homogeneous Data Integration
Integration of homogeneous or mostly homogeneous data via EML metadata is relatively straightforward

29 Heterogeneous Data Integration
Requires advanced metadata and processing
– Attributes must be semantically typed
– Collection protocols must be known
– Units and measurement scale must be known
– Measurement relationships must be known, e.g., that ArealDensity = Count/Area
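The ArealDensity = Count/Area relationship shows why semantic types and units both matter when combining sources. Below is a minimal sketch; the `Observation` class, the type names used as plain strings, and the small unit table are assumptions made for illustration and are not part of EML or SEEK.

```python
from dataclasses import dataclass

# Hypothetical unit table: conversion factors to square metres.
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0}


@dataclass
class Observation:
    semantic_type: str  # e.g. "Count", "Area", "ArealDensity"
    value: float
    unit: str


def areal_density(count, area):
    """Derive ArealDensity = Count / Area, normalising the area to m2."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("expected a Count and an Area observation")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    return Observation("ArealDensity", count.value / area_m2, "1/m2")


# One source reports 50 individuals in a 0.5 ha plot.
density = areal_density(Observation("Count", 50, "1"),
                        Observation("Area", 0.5, "ha"))
print(density)
```

Once both sources are expressed as ArealDensity in the same unit, they can be merged; without the type and unit metadata the conversion step could not be derived.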

30 Semantic Mediation
– Label data with semantic types
– Label inputs and outputs of analytical components with semantic types
– Use reasoning engines to generate transformation steps (beware analytical constraints)
– Use reasoning engine to discover relevant components
[Diagram labels: Data, Ontology, Workflow Components]

31 Ecological ontologies
– What was measured (e.g., biomass)
– Type of measurement (e.g., Energy)
– Context of measurement (e.g., Psychotria limonensis)
– How it was measured (e.g., dry weight)
SEEK intends to enable community-created ecological ontologies using OWL
– Represents a controlled vocabulary for ecological metadata

32 Extensions: Semantic Types
Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
Application: e.g., design support:
– smart/semi-automatic wiring, generation of “massaging actors”
[Diagram: actor m1 (normalize) with ports p3 and p4]
– Takes Abundance Count Measurements for Life Stages
– Returns Mortality Rate Derived Measurements for Life Stages

35 Semantic Types
The semantic type signature
– Type expressions over the (OWL) ontology
[Diagram: actor m1 (normalize) with ports p3 and p4]
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
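A semantic type signature like the one for m1 can drive a simple wiring check: an output may be connected to an input if every constraint of the input is met by an equal or more specific concept. The sketch below is an illustration only, not the SEEK reasoner. The dictionary encoding is an assumption, and so is the tiny ontology fragment; in particular, treating DerivedObservation as a subclass of Observation is a guess based on the names.

```python
# Hypothetical ontology fragment: child concept -> parent concept.
SUBCLASS_OF = {"DerivedObservation": "Observation"}


def is_a(concept, ancestor):
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def compatible(output_type, input_type):
    """True if the output satisfies every constraint of the input."""
    return all(
        key in output_type and is_a(output_type[key], required)
        for key, required in input_type.items()
    )


# The m1 signature from the slide, encoded as property -> concept.
m1_in = {"class": "Observation",
         "itemMeasured": "AbundanceCount",
         "hasContext.appliesTo": "LifeStageProperty"}
m1_out = {"class": "DerivedObservation",
          "itemMeasured": "MortalityRate",
          "hasContext.appliesTo": "LifeStageProperty"}

print(compatible(m1_out, {"class": "Observation"}))  # generic consumer
print(compatible(m1_out, m1_in))                     # feeding m1 into itself
```

A description-logic reasoner over the OWL ontology would replace the hand-written `is_a` walk in practice; the sketch only shows the kind of question such a reasoner answers during semi-automatic wiring.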

36 Extended Type System (here: OWL Semantic Types)
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
Substructure association: XML raw-data =(X)Query=> object model =link=> OWL ontology

37 Semantic Types for Scientific Workflows

38 Deriving Data Transformations from Semantic Service Registration [Bowers-Ludaescher, DILS’04]

39 Structural and Semantic Mappings [Bowers-Ludaescher, DILS’04]

40 SEEK Impact
Fundamental improvements for researchers
– Global access to ecologically relevant data
– Rapidly locate and utilize distributed computation
– Capture, reproduce, extend analysis process

41 Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON

