Towards Semantic Typing Support for Scientific Workflows Bertram Ludäscher Knowledge-Based Information Systems Lab San Diego Supercomputer Center University.

1 Towards Semantic Typing Support for Scientific Workflows Bertram Ludäscher Knowledge-Based Information Systems Lab San Diego Supercomputer Center University of California San Diego http://seek.ecoinformatics.org http://www.geongrid.org

2 B. Ludäscher – Scientific Data Management 2 Outline
1. Motivation: Traditional vs Scientific Data Integration
2. Semantic (a.k.a. Model-Based) Mediation
3. Scientific Workflows (a.k.a. Analysis Pipelines)
4. DB Theory Appetizer: Web Service Composition Through Declarative Queries

3 Information Integration Challenges
System aspects: “Grid” Middleware
– distributed data & computing
– Web Services, WSDL/SOAP, OGSA, …
– sources = functions, files, data sets, …
Syntax & Structure: (XML-Based) Data Mediators
– wrapping, restructuring
– (XML) queries and views
– sources = (XML) databases
Semantics: Model-Based/Semantic Mediators
– conceptual models and declarative views
– Knowledge Representation: ontologies, description logics (RDF(S), OWL, ...)
– sources = knowledge bases (DB + CMs + ICs)
Goals:
– reconciling S^4 heterogeneities (Syntax, Structure, Semantics, System aspects)
– “gluing” together resources
– bridging information and knowledge gaps computationally

4 Information Integration from a DB Perspective
Information Integration Problem
– Given: data sources S1, ..., Sk (DBMS, web sites, ...) and user questions Q1, ..., Qn that can be answered using the Si
– Find: the answers to Q1, ..., Qn
The Database Perspective: source = “database”
⇒ Si has a schema (relational, XML, OO, ...)
⇒ Si can be queried
⇒ define a virtual (or materialized) integrated/global view G over S1, ..., Sk using database query languages (SQL, XQuery, ...)
⇒ questions become queries Qi against G(S1, ..., Sk)

5 Standard (XML-Based) Mediator Architecture
[Figure: a USER/Client sends (1) a query Q(G(S1, ..., Sk)) to the MEDIATOR, which holds the integrated global (XML) view G with the integrated view definition G(..) ← S1(..) … Sk(..). The mediator sends (3) subqueries Q1, Q2, Q3 to the wrappers, each of which exposes one source S1, S2, …, Sk as a wrapper (XML) view (web services as wrapper APIs). The wrappers return (4) {answers(Q1)}, {answers(Q2)}, {answers(Q3)}, and the mediator returns (6) {answers(Q)}.]
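The mediator architecture above can be illustrated with a small sketch. It is a toy model under strong assumptions, not the mediator from the talk: the global view G is taken to be a plain union of the wrapped sources, the query is a row predicate, and the names `global_view`, `mediate`, `s1`, and `s2` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Wrapper = Callable[[], Iterable[Row]]  # a wrapper exposes one source as rows


def global_view(wrappers: List[Wrapper]) -> List[Row]:
    """G(S1, ..., Sk): in this sketch simply the union of all wrapped sources."""
    rows: List[Row] = []
    for wrapper in wrappers:
        rows.extend(wrapper())
    return rows


def mediate(query: Callable[[Row], bool], wrappers: List[Wrapper]) -> List[Row]:
    """Answer Q(G(S1, ..., Sk)) by filtering the integrated view."""
    return [row for row in global_view(wrappers) if query(row)]


# Two hypothetical sources whose attribute names are already harmonized.
def s1() -> List[Row]:
    return [{"site": "CARP", "count": 0}, {"site": "NAPL", "count": 5}]


def s2() -> List[Row]:
    return [{"site": "BULL", "count": 57}]


answers = mediate(lambda row: row["count"] > 0, [s1, s2])
```

A real mediator would push subqueries down to the wrappers instead of fetching every row; the sketch only shows how a client query is phrased against G rather than against the individual sources.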

6 Query Planning for Mediators
Given:
– User query Q: answer(…) ← … G …
– … & { G ← … S … } global-as-view (GAV)
– … & { S ← … G … } local-as-view (LAV)
– … & { false ← … S … G … } integrity constraints (ICs)
Find:
– equivalent (or min. containing, max. contained) query plan Q’: answer(…) ← … S …
Results:
– A variety of results/algorithms, depending on classes of queries, views, and ICs: P, NP, …, undecidable
– many variants still open

7 From Scientific Data Integration to Process & Application Integration (and back…)
Data Integration
– Database mediation + Knowledge-based extension
→ Query rewriting w/ GAV, LAV, ICs, access patterns
“Process/Application” Integration
– Scientific models (ocean, atmosphere, ecology, …), assimilation models (e.g., real-time data feeds), …
– Data sets
– Legacy tools
→ Components = web services
→ Applications = composite components (“workflows”)
→ Need for semantic type extensions

8 Geologic Map Integration
Given:
– Geologic maps from different state geological surveys (shapefiles w/ different data schemas)
– Different ontologies:
  Geologic age ontology
  Rock type ontologies:
  – Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)
  – Single hierarchy from British Geological Survey (BGS)
Problem:
– Support uniform queries against the multiple geologic maps using different ontologies
– Support registration w/ ontology A, querying w/ ontology B

9 Ontology Mappings: Motivation
Establish correspondences between ontologies
→ Integrate data sets which are registered to different ontologies
→ Query data sets through different ontologies
[Figure labels: Data set 1; Data set 2; Ontology A; Ontology B; edges: register, ontology mappings, queries]
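A minimal sketch of "register with ontology A, query with ontology B" follows. All data set names, ontology terms, and the mapping itself are hypothetical placeholders and are not taken from the GSC or BGS classifications.

```python
from typing import Dict, List, Set

# Hypothetical data sets, each registered to a term of ontology A or B.
registered: Dict[str, str] = {
    "map_unit_17": "A:Basalt",
    "map_unit_42": "B:MaficVolcanicRock",
    "map_unit_99": "A:Granite",
}

# Hypothetical mapping: ontology-B term -> corresponding ontology-A terms.
B_TO_A: Dict[str, Set[str]] = {
    "B:MaficVolcanicRock": {"A:Basalt"},
    "B:FelsicPlutonicRock": {"A:Granite"},
}


def query_via_b(term_b: str) -> List[str]:
    """Find data sets registered to term_b itself or to a mapped A term."""
    accepted = {term_b} | B_TO_A.get(term_b, set())
    return sorted(name for name, term in registered.items() if term in accepted)


hits = query_via_b("B:MaficVolcanicRock")
```

The query is posed in ontology B only, yet it also reaches the data set that was registered to ontology A, because the mapping translates the query term.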

10 A Multi-Hierarchical Rock Classification Ontology (GSC)
[Figure: four hierarchies: Composition, Genesis, Fabric, Texture]

11 Some enabling operations on “ontology data”
Concept expansion: what else to look for when asking for ‘Mafic’
[Figure label: Composition]

12 Some enabling operations on “ontology data”
Generalization: finding data that is “like” X and Y
[Figure label: Composition]
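The two operations named on these slides, concept expansion and generalization, can be sketched over a small is-a hierarchy. The hierarchy below is a hypothetical fragment made up for illustration, not the GSC composition hierarchy, and `expand` and `generalize` are assumed names.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fragment of a rock-composition hierarchy: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "Composition": None,
    "Mafic": "Composition",
    "Felsic": "Composition",
    "Basalt": "Mafic",
    "Gabbro": "Mafic",
    "Granite": "Felsic",
}


def expand(concept: str) -> Set[str]:
    """Concept expansion: the concept plus all of its descendants."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def ancestors(concept: str) -> List[str]:
    """Path from a concept up to the root, starting with the concept itself."""
    path: List[str] = []
    current: Optional[str] = concept
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def generalize(x: str, y: str) -> str:
    """Generalization: the most specific concept that subsumes both x and y."""
    above_y = set(ancestors(y))
    for candidate in ancestors(x):
        if candidate in above_y:
            return candidate
    raise ValueError("no common ancestor")
```

Under these assumptions, asking for 'Mafic' also retrieves data labeled 'Basalt' or 'Gabbro', and data "like" Basalt and Granite is found under their common generalization 'Composition'.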

13 Implementation in OWL: Not only “for the machine” …

14 Geologic Map Integration
[Figure labels: domain knowledge; Knowledge representation; Ontologies!?; Nevada; Geoscientists + Computer Scientists; Igneous Geoinformaticists; +/- Energy; GEON Metamorphism Equation: +/- a few hundred million years]

15 Geology Workbench: Uploading Ontologies
[Screenshot callouts: Click on Ontology Submission; Choose an OWL file to upload; Click to check its details; Name space: can be used to import this ontology into others]

16 Geology Workbench: Registering Data to an Ontology
Step 1: Choose Classes
[Screenshot callouts: Click on Submission; Data set name; Select a shapefile; Choose an ontology class]

17 Geology Workbench: Data Registration
Step 2: Choose Columns for Selected Classes
[Screenshot: shapefile columns AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL; callout: “It contains information about geologic age”]

18 Geology Workbench: Data Registration
Step 3: Resolve Mismatches
[Screenshot callouts: Two terms do not match any ontology term; Manually mapping “algonkian” into the ontology]

19 Geology Workbench: Ontology-enabled Map Integrator
[Screenshot callouts: Click on the name; Choose classes of interest; All areas with the age Paleozoic]

20 Geology Workbench: Change Ontology
[Screenshot callouts: Submit a mapping; Ontology mapping between British Rock Classification and Canadian Rock Classification; Switch from Canadian Rock Classification to British Rock Classification; Run it; New query interface]

21 Ontologies and Data Management
Where do ontologies fit within data management architectures?
An ontology is similar to a schema or conceptual model (if one exists), but is
– Developed independently of a particular application
– Probably given in a different language
– Inherently more general
– Usually not a very good schema (weak structure)

22 Ontologies and Data Management
[Figure labels: Schema; Conceptual Model; Ontology; Data ⊃ Metadata; Design Artifact; edge: “use concepts from (explicitly or implicitly)”]
How to define and refine an ontology?
How to register a dataset to an ontology?

23 Biomedical Informatics Research Network http://nbirn.net
Refining an Ontology – the logic way, enables “Source Contextualization”

24 Connecting Datasets to Ontologies: “Semantic Registration”
Dataset:
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57
Ontology (snippet):
DataCollectionEvent ⊑ ∃contains.Measurement
Measurement ⊑ ∃measureOf.MeasurableItem ⊓ ∃hasContext.MeasurementContext
MeasurementContext ⊑ ∃hasTime.DateTime ⊓ ∃hasLocation.Location
MeasurableItem ⊑ ∃hasUnit.Unit ⊓ ∃hasValue.UnitValue
SpeciesCount ⊑ MeasurableItem ⊓ ∃hasSpecies.Species ⊓ ∃hasUnit.RatioUnit …
SpeciesAbundance ⊑ Measurement ⊓ ∃measureOf.SpeciesCount
AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ ∃contains.SpeciesAbundance
Location ⊑ ∃position.Coordinate
LTERSite ⊑ Location
SBLTERSite ⊑ LTERSite ⊓ ∃position.SBLTERCoordinate
{naples, …} ⊑ SBLTERSite
How can we “register” the dataset to concepts in the ontology?

25 Purpose of Semantic Registration
Expose “hidden” information:
– What do attributes represent?
– What do specific values represent?
– What conceptual “objects” are in the dataset?
Capture connections between the dataset and ontology to:
– Find existing datasets (or parts of datasets) via ontological concepts (discovery)
– Enable integration of datasets (mediation)
– Generate metadata for new data products (in a pipeline)

26 Semantic Registration Framework
Step 1: The data provider selects relevant ontological concepts (for the dataset)
Step 2: The semantic registration system creates a structural representation based on the chosen concepts (the data provider refines it if needed)
Step 3: The data provider maps the dataset information to the generated structural representation
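The three registration steps can be sketched as follows. This is only an illustration under assumptions: the `ONTOLOGY` dictionary is a hypothetical slice built from concept and property names that appear elsewhere in the slides, the slot naming scheme is invented, and `structural_slots` is not the registration system's actual algorithm.

```python
from typing import Dict, List

# Step 1: concepts chosen by the data provider.
selected: List[str] = ["AbundanceCollectionEvent", "SpeciesAbundance", "SpeciesCount"]

# Hypothetical slice of the ontology: concept -> {property: range concept}.
ONTOLOGY: Dict[str, Dict[str, str]] = {
    "AbundanceCollectionEvent": {
        "contains": "SpeciesAbundance",
        "hasTime": "DateTime",
        "hasLoc": "SBLTERSite",
    },
    "SpeciesAbundance": {"measureOf": "SpeciesCount"},
    "SpeciesCount": {"hasSpecies": "Species", "hasValue": "RatioValue"},
}


def structural_slots(concepts: List[str]) -> List[str]:
    """Step 2: list the leaf slots, i.e. properties that point outside the selection."""
    slots: List[str] = []
    for concept in concepts:
        for prop, target in ONTOLOGY[concept].items():
            if target not in concepts:
                slots.append(f"{concept}.{prop}:{target}")
    return slots


# Step 3: the provider maps dataset columns to the generated slots.
column_to_slot: Dict[str, str] = {
    "Date": "AbundanceCollectionEvent.hasTime:DateTime",
    "Site": "AbundanceCollectionEvent.hasLoc:SBLTERSite",
    "SP_Code": "SpeciesCount.hasSpecies:Species",
    "Count": "SpeciesCount.hasValue:RatioValue",
}

slots = structural_slots(selected)
unmapped = [slot for slot in slots if slot not in column_to_slot.values()]
```

An empty `unmapped` list means every generated slot is covered by some dataset column; a non-empty list would point the provider at information the dataset does not supply.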

27 Step 1: Selecting Relevant Concepts
Dataset:
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57
Concepts from an Ontology: DataCollectionEvent, AbundanceCollectionEvent, Measurement, Abundance, SpeciesAbundance, MeasurableItem, SpeciesCount, Location, LTERSite, SBLTERSite, naples, Species, …, MeasurementContext, …

29 Step 2: Generate Object Model
[Figure: generated object model with nodes AbundanceCollectionEvent, SpeciesAbundance, SpeciesCount, Species, RatioUnit, RatioValue, DateTime, SBLTERSite and edges contains, measureOf, hasSpecies, hasUnit, hasValue, hasTime, hasLoc]
Concepts from an Ontology: DataCollectionEvent, AbundanceCollectionEvent, Measurement, Abundance, SpeciesAbundance, MeasurableItem, SpeciesCount, Location, LTERSite, SBLTERSite, naples, Species, …, MeasurementContext, …

33 Scientific Workflows

34 Promoter Identification Workflow (PIW) Source: Matt Coleman (LLNL)

35 Source: NIH BIRN (Jeffrey Grethe, UCSD)

36 Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Figure: analysis pipeline. Data products: (a) species presence & absence points (native range, invasion area); (b) environmental layers (native range, invasion area); (c) integrated layers (native range, invasion area); (d) training sample, test sample; (e) GARP rule set; (f) native range prediction map, invasion area prediction map; (g) model quality parameter; (h) selected prediction maps. Steps: EcoGrid Query; Layer Integration; Sample Data; Data Calculation; Map Generation; Validation; User Validation; Generate Metadata; Archive to EcoGrid. Other labels: + A1, + A2, + A3; Registered EcoGrid Database.]
Source: NSF SEEK (Deana Pennington et al., UNM)

37 Scientific Workflows: Some Findings
More dataflow than (business) workflow
Need for “programming extension”
– Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), …)
Need for abstraction and nested workflows
Need for data transformations
Need for rich user interaction & workflow steering:
– pause / revise / resume
– select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF
Need for high-throughput transfers (“grid-enabling”, “streaming”)
Need for persistence of intermediate products
→ data provenance (“virtual data” concept)

38 Our Starting Point: Dataflow Process Networks and Ptolemy II
see! try! read!
Source: Edward Lee et al. http://ptolemy.eecs.berkeley.edu/ptolemyII/

39 Kepler Team, Projects, Sponsors
Ilkay Altintas (SDM), Chad Berkley (SEEK), Shawn Bowers (SEEK), Jeffrey Grethe (BIRN), Christopher H. Brooks (Ptolemy II), Zhengang Cheng (SDM), Efrat Jaeger (GEON), Matt Jones (SEEK), Edward A. Lee (Ptolemy II), Kai Lin (GEON), Ashraf Memon (GEON), Bertram Ludaescher (BIRN, GEON, SDM, SEEK), Steve Mock (NMI), Steve Neuendorffer (Ptolemy II), Mladen Vouk (SDM), Yang Zhao (Ptolemy II), …

40 Commercial Workflow/Dataflow Systems

41 SCIRun: Problem Solving Environments for Large-Scale Scientific Computing
SCIRun: PSE for interactive construction, debugging, and steering of large-scale scientific computations
Component model, based on generalized dataflow programming
Steve Parker (cs.utah.edu)

42 E-Science and Link-Up Buddies …
– Taverna, Scufl, Freefluo, …
– DiscoveryNet
– Triana
– ICENI
– …

43 Dataflow Process Networks: Putting Computation Models first!
Synchronous Dataflow Network (SDF)
– Statically schedulable single-threaded dataflow
  Can execute multi-threaded, but the firing sequence is known in advance
– Maximally well-behaved, but also limited expressiveness
Process Network (PN)
– Multi-threaded dynamically scheduled dataflow
– More expressive than SDF (dynamic token rate prevents static scheduling)
– Natural streaming model
Other Execution Models (“Domains”)
– Implemented through different “Directors”
[Figure labels: actor; typed i/o ports; FIFO; advanced push/pull]
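Why fixed token rates make SDF statically schedulable can be shown with the standard balance equations: for every edge, (firings of producer) × (tokens produced) must equal (firings of consumer) × (tokens consumed). The sketch below solves them for a connected graph; it is an illustration, not Ptolemy II's scheduler, and the edge encoding and the name `repetition_vector` are assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple

# Edge = (producer, tokens produced per firing, consumer, tokens consumed per firing)
Edge = Tuple[str, int, str, int]


def repetition_vector(edges: List[Edge]) -> Dict[str, int]:
    """Smallest firing counts r with r[src] * produced == r[dst] * consumed."""
    rates: Dict[str, Fraction] = {edges[0][0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, produced, dst, consumed = edge
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * produced / consumed
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * consumed / produced
            elif src in rates and dst in rates:
                if rates[src] * produced != rates[dst] * consumed:
                    raise ValueError("inconsistent rates: no static schedule")
            else:
                continue  # neither endpoint reached yet; retry in a later pass
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    # Scale the rational rates to integers using the LCM of the denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rates.values()), 1)
    return {actor: int(r * scale) for actor, r in rates.items()}


# Hypothetical chain A -> B -> C with fixed token rates.
schedule = repetition_vector([("A", 2, "B", 3), ("B", 1, "C", 2)])
```

In a process network the token rates may change at run time, so no such vector can be computed in advance, which is the trade-off the slide describes.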

44 Promoter Identification Workflow (PIW) Source: Matt Coleman (LLNL)

45 Promoter Identification Workflow in Ptolemy-II [SSDBM’03]
[Figure label: Execution Semantics]

46 [Figure callouts: hand-crafted control solution; also: forces sequential execution!; designed to fit hand-crafted Web-service actor; Complex backward control-flow; No data transformations available]

47 Simplified Process Network PIW
Back to purely functional dataflow process network (= a data streaming model!)
Re-introducing map(f) to Ptolemy-II (was there in PT Classic)
→ no control-flow spaghetti
→ data-intensive apps
→ free concurrent execution
→ free type checking
→ automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]
[Figure callouts: map(f)-style iterators; Powerful type checking; Generic, declarative “programming” constructs; Generic data transformation actors; Forward-only, abstractable sub-workflow piw(GeneId)]
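The step from piw(GeneId) to PIW := map(piw) over [GeneId] can be sketched in a few lines. This is an analogy in Python, not the Ptolemy-II actor; `lift` is an assumed helper name and the body of `piw` is a hypothetical stand-in for the real sub-workflow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def lift(f: Callable[[A], B]) -> Callable[[List[A]], List[B]]:
    """Turn a per-item workflow f into a workflow over a list: map(f)."""
    def mapped(items: List[A]) -> List[B]:
        # Items are independent, so they may be processed concurrently;
        # Executor.map still returns the results in input order.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(f, items))
    return mapped


def piw(gene_id: str) -> str:
    """Hypothetical stand-in for the per-gene sub-workflow piw(GeneId)."""
    return f"promoter-report({gene_id})"


PIW = lift(piw)  # PIW := map(piw) over [GeneId]
reports = PIW(["g1", "g2", "g3"])
```

Because `piw` has no side effects, concurrency comes "for free" here: the iteration logic lives in `lift`, and the sub-workflow itself stays forward-only.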

48 Optimization by Declarative Rewriting
PIW as a declarative, referentially transparent functional process
→ optimization via functional rewriting possible, e.g. map(f o g) = map(f) o map(g)
Details:
– Technical report & PIW specification in Haskell
http://kbi.sdsc.edu/SciDAC-SDM/scidac-tn-map-constructs.pdf
[Figure callouts: map(f o g) instead of map(f) o map(g); Combination of map and zip]
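The rewrite map(f o g) = map(f) o map(g) can be checked on a small example. The sketch assumes f and g are pure functions; `compose`, `two_pass`, and `fused` are hypothetical names, and the Haskell specification mentioned on the slide is not reproduced here.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[B], C], g: Callable[[A], B]) -> Callable[[A], C]:
    """Function composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def two_pass(f: Callable[[B], C], g: Callable[[A], B], xs: List[A]) -> List[C]:
    """map(f) o map(g): builds an intermediate list between the two maps."""
    intermediate = list(map(g, xs))
    return list(map(f, intermediate))


def fused(f: Callable[[B], C], g: Callable[[A], B], xs: List[A]) -> List[C]:
    """map(f o g): a single traversal without an intermediate list."""
    return list(map(compose(f, g), xs))


def double(x: int) -> int:
    return 2 * x


def increment(x: int) -> int:
    return x + 1


data = [1, 2, 3]
same = two_pass(increment, double, data) == fused(increment, double, data)
```

The equality only holds in general when f and g are referentially transparent, which is why the slide stresses that property before proposing the rewrite.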

49 Web Services & Scientific Workflows in Kepler
Web services = individual components (“actors”)
“Minute-Made” Application Integration:
– Plugging-in and harvesting web service components is easy and fast
Rich SWF modeling semantics (“directors” and more):
– Different and precise dataflow models of computation
– Clear and composable component interaction semantics
→ Web service composition and application integration tool
Coming soon:
– Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8)
– SWFs with structural and semantic data types (better design support)
– Grid-enabled web services (for big data, big computations, …)
– Different deployment models (SWF → WS, web site, applet, …)

50 KEPLER Core Capabilities (1/2)
Designing scientific workflows
– Composition of actors (tasks) to perform a scientific WF
Actor prototyping
Accessing heterogeneous data
– Data access wizard to search and retrieve Grid-based resources
– Relational DB access and query
– Ability to link to EML data sources

51 KEPLER Core Capabilities (2/2)
Data transformation actors to link heterogeneous data
Executing scientific workflows
– Distributed and/or local computation
– Various models for computational semantics and scheduling
– SDF and PN: most common for scientific workflows
External computing environments:
– C++, Python, C (… Perl planned …)
Deploying scientific tasks and workflows as web services themselves (… planned …)

52 The KEPLER GUI (Vergil)
Drag and drop utilities, director and actor libraries.

53 Running the workflow

54 Distributed SWFs in KEPLER
Web and Grid Service plug-ins
– WSDL, and whatever comes after GWSDL
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
WS Harvester
– Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
WS-deployment interface (… ongoing work …)
XSLT and XQuery transformers to link non-fitting services together

55 A Generic Web Service Actor
Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that method.
[Screenshot callout: Configure – select service operation]

56 Set Parameters and Commit

57 WS Actor after Instantiation

58 Web Service Harvester
Imports the web services in a repository into the actor library.
Can search for web services based on a keyword.

59 Composing 3rd-Party WSs
[Figure callouts: Output of previous web service; User interaction & Transformations; Input of next web service]

60 Classifying with Kepler

61 Classifying with Kepler

63 SWF Designed in Kepler

64 Result launched via the BrowserUI actor

65 Querying Example

66 KEPLER and YOU
Kepler …
– is a community-based, cross-project, open source collaboration
– uses web services as basic building blocks
– has a joint CVS repository, mailing lists, web site, …
– is gaining momentum thanks to contributors and contributions
BSD-style license allows commercial spin-offs
– a pre-packaged, shrink-wrapped version (“Kepler-to-GO”) coming soon to a place near you…

67 Now back to the “Semantics Stuff”

68 Semantic Types for Scientific Workflows

69 From Semantic to Structural Mappings

70 Structural and Semantic Mappings

71 Summary I: Putting it all together for the Science Environment for Ecological Knowledge
Large collaborative NSF/ITR project: UNM, UCSB, UCSD, UKansas, ...
Goals: global access to ecologically relevant data; rapidly locate and utilize distributed computation; (semi-)automate, streamline analysis process – “Knowledge Discovery Workflows”

72 Outline
1. Motivation: Traditional vs Scientific Data Integration
2. Semantic (a.k.a. Model-Based) Mediation
3. Scientific Workflows (a.k.a. Analysis Pipelines)
4. DB Theory Appetizer: Web Service Composition Through Declarative Queries

73 Planning with Limited Access Patterns (back to GAV mediation …)
User query Q: answer(ISBN, Author, Title) ← book(ISBN, Author, Title), catalog(ISBN, Author), not library(ISBN).
Limited (web service) APIs (access patterns):
– Src1.books: in: ISBN, out: Author, Title
– Src1.books: in: Author, out: ISBN, Title
– Src2.catalog: in: {}, out: ISBN, Author
– Src3.library: in: {}, out: ISBN
Note: Q is not executable, but feasible (equivalent to executable Q’: catalog ; book ; not library)
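The book/catalog/library example can be replayed with a greedy reordering sketch. It covers only the simple case where a feasible query becomes executable by reordering its literals; it is not the feasibility algorithm from the cited EDBT'04 result. The encoding of access patterns as sets of input positions, the rule that negated literals need all their variables bound, and the names `Literal`, `callable_now`, and `executable_order` are assumptions.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Literal(NamedTuple):
    relation: str
    args: Tuple[str, ...]
    negated: bool = False


# Access patterns from the slide: relation -> alternative sets of input positions.
PATTERNS: Dict[str, List[Set[int]]] = {
    "book": [{0}, {1}],   # in: ISBN, or in: Author
    "catalog": [set()],   # no inputs required
    "library": [set()],   # no inputs required
}


def callable_now(lit: Literal, bound: Set[str]) -> bool:
    """A literal is callable if some pattern has all of its inputs bound."""
    if lit.negated and not set(lit.args) <= bound:
        return False  # negation only filters, so its variables must be bound
    return any(all(lit.args[i] in bound for i in inputs)
               for inputs in PATTERNS[lit.relation])


def executable_order(body: List[Literal]) -> Optional[List[Literal]]:
    """Greedily reorder the body; None if some literal never becomes callable."""
    bound: Set[str] = set()
    remaining: List[Literal] = list(body)
    plan: List[Literal] = []
    while remaining:
        ready = next((lit for lit in remaining if callable_now(lit, bound)), None)
        if ready is None:
            return None
        plan.append(ready)
        remaining.remove(ready)
        if not ready.negated:
            bound |= set(ready.args)
    return plan


Q = [
    Literal("book", ("ISBN", "Author", "Title")),
    Literal("catalog", ("ISBN", "Author")),
    Literal("library", ("ISBN",), negated=True),
]
plan = executable_order(Q)
```

As written, Q starts with `book`, which cannot be called with nothing bound; the sketch finds the order catalog, book, not library, matching the executable Q’ on the slide.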

74 Query Feasibility is as hard as Containment
Theorem [EDBT’04]: For UCQ¬ queries Q (unions of conjunctive queries with negation): Q is feasible iff ans(Q) ⊆ Q
The answerable part ans(Q) can be computed in quadratic time.
Idea: scan Q for answerable literals, rescan, repeat until ans(Q) is reached
Checking query containment Q1 ⊆ Q2 is hard:
– Already NP-complete for CQ (conjunctive queries)
– Undecidable for FO (first-order logic queries)

75 Conjunctive Query Containment
Given: conjunctive queries Q1, Q2 (aka Select-Project-Join queries)
Problem: Is answers(D, Q1) ⊆ answers(D, Q2) for all databases D?
If yes, we say that “Q1 is contained in Q2”; short: Q1 ⊆ Q2
Examples:
Q1: answer(X) ← student(X, cs)
Q2: answer(X) ← student(X, Dept), advisor(X, Y), dept(Y, cs)
Q3: answer(X) ← student(X, Dept)
Quiz:
– Q1 ⊆ Q2? No: not every student X necessarily has an advisor Y who is in the cs department!
– Q1 ⊆ Q3? Yes: every cs student is a student in some department (crux of the “proof”: Dept = cs)
Homework: What about Q1 ⊆ Q2 if we know that every student must have an advisor from the same department?
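The quiz answers can be checked mechanically with the classical containment-mapping test: Q1 ⊆ Q2 holds iff the variables of Q2 can be mapped to terms of Q1 so that the head of Q2 maps to the head of Q1 and every body atom of Q2 maps to a body atom of Q1. The brute-force sketch below is exponential in the size of Q2. It is not the 7-line Prolog program mentioned in the talk, and the tuple encoding and the uppercase-variable convention are assumptions.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (relation, arguments)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head arguments, body atoms)


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def extend(mapping: Dict[str, str], source: Tuple[str, ...],
           target: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend a variable mapping so that source terms map onto target terms."""
    if len(source) != len(target):
        return None
    result = dict(mapping)
    for s, t in zip(source, target):
        if is_var(s):
            if result.setdefault(s, t) != t:
                return None  # variable already mapped to a different term
        elif s != t:
            return None      # constants must match exactly
    return result


def contained_in(q1: Query, q2: Query) -> bool:
    """True iff Q1 ⊆ Q2, i.e. a containment mapping from Q2 into Q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)
    if start is None:
        return False
    # Try every way of sending each atom of Q2 to some atom of Q1.
    for choice in product(body1, repeat=len(body2)):
        mapping: Optional[Dict[str, str]] = start
        for (rel2, args2), (rel1, args1) in zip(body2, choice):
            mapping = extend(mapping, args2, args1) if rel1 == rel2 else None
            if mapping is None:
                break
        if mapping is not None:
            return True
    return False


Q1: Query = (("X",), [("student", ("X", "cs"))])
Q2: Query = (("X",), [("student", ("X", "Dept")),
                      ("advisor", ("X", "Y")),
                      ("dept", ("Y", "cs"))])
Q3: Query = (("X",), [("student", ("X", "Dept"))])
```

With these definitions, `contained_in(Q1, Q2)` is False because Q1 has no advisor atom to map onto, while `contained_in(Q1, Q3)` is True via Dept = cs, as stated in the quiz.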

76 The World’s Shortest Conjunctive Query Containment Checker (an NP-complete problem): 7 lines in Prolog …
Quiz:
1. Find the bug in the 7 lines of code
2. Fix the bug (hint: add one more line of code)
Moral: Short programs can be buggy too

77 Summary II: Got milk/eggs/meat/wool? Or: “Die eierlegende Wollmilchsau …” (German: “the egg-laying wool-milk-sow”, a tool that does everything)
Data Integration
– query rewriting under GAV/LAV
– w/ binding pattern constraints
– distributed query processing
Semantic Mediation
– semantic integrity constraints, reasoning w/ plans, automated deduction
– deductive database/logic programming technology, AI “stuff” ...
– Semantic Web technology
Scientific Workflow Management
– more procedural than database mediation (the scientist is the “query planner”)
– deployment using web services

78 Science Environment for Ecological Knowledge
Large collaborative NSF/ITR project: UNM, UCSB, UCSD, UKansas, ...
Goals: global access to ecologically relevant data; rapidly locate and utilize distributed computation; (semi-)automate, streamline analysis process – “Knowledge Discovery Workflows”

79 Building the EcoGrid
[Figure: map of EcoGrid nodes. Node types: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, legacy system. Labeled sites: AND, LUQ, HBR, NTL, VCR.]
Participating networks:
– LTER Network (24)
– Natural History Collections (>> 100)
– Organization of Biological Field Stations (180)
– UC Natural Reserve System (36)
– Partnership for Interdisciplinary Studies of Coastal Oceans (4)
– Multi-agency Rocky Intertidal Network (60)
Source: Matthew Jones (UCSB)

80 Heterogeneous Data Integration
Requires advanced metadata and processing
– Attributes must be semantically typed
– Collection protocols must be known
– Units and measurement scale must be known
– Measurement relationships must be known, e.g., that ArealDensity = Count/Area

81 Ecological ontologies
– What was measured (e.g., biomass)
– Type of measurement (e.g., Energy)
– Context of measurement (e.g., Psychotria limonensis)
– How it was measured (e.g., dry weight)
SEEK intends to enable community-created ecological ontologies using OWL
– Represents a controlled vocabulary for ecological metadata
More about this in Bertram’s talk

82 Semantic Mediation
Label data with semantic types (e.g., concept expressions in OWL)
Label inputs and outputs of analytical components with semantic types
Use reasoning engines to generate transformation steps
– Observe analytical constraints
Use reasoning engine to discover relevant components
[Figure labels: Data; Ontology; Workflow Components]
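Semantic typing of component inputs and outputs can be sketched as a subsumption check: an output may feed an input when its semantic type is the same as, or more specific than, the type the input expects. The is-a hierarchy below is a hypothetical fragment that reuses concept names from the slides, and `can_connect` is an assumed name; a real system would delegate this check to an OWL reasoner.

```python
from typing import Dict, Optional

# Hypothetical is-a hierarchy (child -> parent).
IS_A: Dict[str, Optional[str]] = {
    "Measurement": None,
    "Abundance": "Measurement",
    "SpeciesAbundance": "Abundance",
    "Location": None,
    "LTERSite": "Location",
}


def subsumed_by(specific: str, general: str) -> bool:
    """True if `specific` equals `general` or is one of its descendants."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        current = IS_A[current]
    return False


def can_connect(output_type: str, input_type: str) -> bool:
    """An output port may feed an input port if its type is at least as specific."""
    return subsumed_by(output_type, input_type)


ok = can_connect("SpeciesAbundance", "Measurement")
too_general = can_connect("Measurement", "SpeciesAbundance")
```

The same check supports component discovery: a component is a candidate for a data set when `can_connect` holds between the data set's semantic type and one of the component's input types.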

