Published by Logan Adrian Smith. Modified over 9 years ago.
1
Pipelines and Scientific Workflows with Ptolemy II
Deana Pennington, University of New Mexico, LTER Network Office
Shawn Bowers, UCSD, San Diego Supercomputer Center
2
Analytical Pipelines
[Diagram: an analytical pipeline (AP 0) built from analysis steps (AS x, AS y, AS z, AS r) and transformation steps (TS 1, TS 2). Components shown: Library of Analysis Steps & Analytical Pipelines; Parameter Ontologies & Taxonomies (e.g., ECOTaxon); Semantic Mediation System (Logic Rules, Query Processing); Parameters with Semantics.]
Analysis step (AS): a step run in an execution environment such as SAS, MATLAB, etc.
Transformation step (TS)
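A minimal sketch of the pipeline idea, assuming each step is a function and each step's output becomes the next step's input. `pipeline`, `ts_parse`, and `as_max` are hypothetical names for illustration only. A transformation step reshapes data so that an analysis step can consume it.

```python
from typing import Any, Callable

Step = Callable[[Any], Any]

def pipeline(*steps: Step) -> Step:
    """Compose steps left to right: each step's output feeds the next step."""
    def run(data: Any) -> Any:
        for step in steps:
            data = step(data)
        return data
    return run

# Hypothetical steps, for illustration only.
def ts_parse(text: str) -> list:
    """Transformation step: comma-delimited text -> list of integers."""
    return [int(v) for v in text.split(",")]

def as_max(values: list) -> int:
    """Analysis step: the maximum value."""
    return max(values)

ap0 = pipeline(ts_parse, as_max)
```

Calling `ap0("16, 14, 22")` runs the transformation and then the analysis, and returns 22. Because steps are plain functions, the same analysis step can be reused in another pipeline behind a different transformation step.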
3
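Scientific Workflows
[Diagram: a scientific workflow (SW 0) combines pipelines of analysis steps (AS x, AS y, AS z, AS r) and transformation steps (TS 1, TS 2) with a search for relevant data (Query); some pipeline segments are marked as iterative.]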
4
Benefits
- Reusable analysis steps, pipelines, and workflows
- Formal documentation of methods (output in report format)
- Reproducibility of methods
- Visual creation and communication of methods
- Versioning
- Automated data typing and transformation
5
Ptolemy II demo
6
Ecological Niche Modeling
[Diagram, modified from B. Michener: biodiversity information (e.g., data from museum specimens) and geospatial and remotely sensed data (vegetation class, precipitation) are taken from geographic space into ecological space. There a model of the niche is built in ecological dimensions (vegetation class, precipitation) and then projected back onto geography as a native range prediction and an invaded range prediction.]
Model types: linear regression (GRASP); genetic algorithms (GARP)
Results are used for integration with other data realms (e.g., human populations, public health, etc.).
7
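Ecological Niche Models
[Diagram: occurrence samples (Sample 1, lat, long, presence; Sample 2, lat, long, presence; Sample 3, lat, long, absence) are overlaid on environmental layers: elevation (m), vegetation cover type, and mean annual temperature (°C). The source data come from an Access file and an Excel file.]
Integrated data (presence/absence, vegetation cover type, elevation, mean annual temperature):
P, juniper, 2200 m, 16 °C
P, pinyon, 2320 m, 14 °C
A, creosote, 1535 m, 22 °C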
8
GARP Native-Species Pipeline (informal)
[Diagram. Steps:
- EcoGrid Query for species presence and absence points.
- EcoGrid Query for environmental layers.
- Physical Transformation, then Scaling.
- Layer Integration, producing integrated layers.
- Sample Data (parameters A1, A2, A3), producing a training sample and a test sample.
- Data Calculation, producing a GARP rule set.
- Map Generation.
- Validation, using model quality parameters from the user.
- Generate Metadata.
- Archive To Ecogrid.
Inputs come from EcoGrid databases. Outputs are selected prediction maps, including the native range prediction map.]
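A workflow like this can be treated as a dependency graph of steps. The sketch below orders the steps with Python's standard `graphlib.TopologicalSorter`. The edges are an assumed reading of the informal diagram, not the authors' wiring. Ptolemy II itself schedules actors through its directors, which this sketch does not model.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
# ASSUMPTION: these edges are a guess at the informal diagram's data flow.
GARP_PIPELINE = {
    "EcoGrid Query (points)": set(),
    "EcoGrid Query (layers)": set(),
    "Physical Transformation": {"EcoGrid Query (layers)"},
    "Scaling": {"Physical Transformation"},
    "Layer Integration": {"Scaling"},
    "Sample Data": {"EcoGrid Query (points)", "Layer Integration"},
    "Data Calculation": {"Sample Data"},
    "Map Generation": {"Data Calculation", "Layer Integration"},
    "Validation": {"Map Generation", "Sample Data"},
    "Generate Metadata": {"Validation"},
    "Archive To Ecogrid": {"Generate Metadata"},
}

# One valid execution order: every step appears after all of its inputs.
order = list(TopologicalSorter(GARP_PIPELINE).static_order())
```

Several orders may be valid. `static_order()` returns one of them and raises `graphlib.CycleError` if the graph contains a cycle.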
9
GARP Native-Species Pipeline (informal)
[Same diagram as the previous slide, with the Sample Data step (parameters A1, A2, A3) highlighted and annotated: "We will look at this analytic step."]
10
Sample Data: Basic Input/Output
[Diagram: the Sample Data analytic step (parameters A1, A2, A3).
Inputs:
- Species presence points (dependent-variable coordinates).
- Environmental layers (temp., vegetation, etc.) (independent-variable coordinates).
Outputs:
- Training sample of conditioned data.
- Test sample of conditioned data.
Both outputs record presence under environmental conditions.]
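Sample Data produces a training sample and a test sample. The sketch below is a minimal version, assuming a simple random split. The slides do not say what parameters A1 to A3 control, so `test_fraction` and `seed` are hypothetical stand-ins.

```python
import random

def split_samples(rows, test_fraction=0.3, seed=0):
    """Randomly partition conditioned rows into (training, test) samples.

    ASSUMPTION: a plain random split; test_fraction and seed are
    hypothetical parameters, not the documented A1-A3.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed makes the split reproducible, which fits the "reproducibility of methods" benefit listed earlier.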
11
Analytic-Step Abstractions
- Physical level: an analytic step is a particular software implementation that takes and produces physical data (for example, files).
- Logical level: defines the structure of input and output (like a database schema).
- Semantic level: uses ontological information to conceptually define the analytic step (for discovery and integration).
13
Sample Data: Physical Level
[Diagram: an actual program that implements Sample Data (parameters A1, A2, A3). All data are comma-delimited, plain-text files.
Input: presence points, e.g. 33.454606, 106.789098; 33.454606, 106.789097; …
Input: environmental layer files (one per layer), e.g. 33.454606, 106.789098, 56.25; 33.454606, 106.789097, 56.37; …
Outputs: the two samples, e.g. 1, 56.25, 0, 20, …, 44; 0, 57.34, 0, 55, …, 14; … and 0, 77.33, 1, 50, …, 44; 1, 56.01, 0, 55, …, 14; …]
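Reading the physical level means parsing these comma-delimited files into typed records. The sketch below assumes one record per line; the slide's semicolons are taken to stand for line breaks. `read_records` is a hypothetical helper. Its `width` check rejects a file with the wrong number of fields.

```python
import csv
import io

def read_records(text, width):
    """Parse comma-delimited plain text (one record per line) into float tuples."""
    records = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        if len(row) != width:
            raise ValueError(f"expected {width} fields, got {len(row)}: {row}")
        records.append(tuple(float(v) for v in row))
    return records

# Presence points have two fields; each environmental layer row has three.
points = read_records("33.454606, 106.789098\n33.454606, 106.789097\n", width=2)
layer = read_records("33.454606, 106.789098, 56.25\n33.454606, 106.789097, 56.37\n", width=3)
```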
21
Logical descriptions
Recall that a schema sets the allowable structure for data:
Employee(name: string, age: integer, ssn: string, title: string, salary: integer)
An allowable instance (name, age, ssn, title, salary):
Clark, 50, 555-…, Mgr., 75000
Lewis, 36, 555-…, Sales, 40000
These tables are not allowable instances of the logical description (too many columns; too few columns, wrong datatypes):
Smith, 40, 555-…, 5
Jones, 36, 555-…, 4
Davis, 22, 555-…, 2
(and a table with rows for Allen and Young)
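Checking a table against a logical description comes down to comparing column counts and value types. The sketch below is minimal: it encodes the Employee schema as (column, Python type) pairs, which is an assumed representation, and reports each violation. `violations` is a hypothetical helper.

```python
# The Employee schema as (column, type) pairs: an assumed encoding.
EMPLOYEE = [("name", str), ("age", int), ("ssn", str), ("title", str), ("salary", int)]

def violations(row, schema=EMPLOYEE):
    """List the ways a row fails to be an allowable instance of the schema."""
    problems = []
    if len(row) > len(schema):
        problems.append("too many columns")
    elif len(row) < len(schema):
        problems.append("too few columns")
    for (column, expected), value in zip(schema, row):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"wrong datatype for {column}: {value!r}")
    return problems

ok = violations(("Clark", 50, "555-…", "Mgr.", 75000))
bad = violations(("Smith", 40, "555-…", 5))
```

Here `ok` is an empty list. `bad` reports both problems named on the slide: too few columns, and a wrong datatype (an integer where the string `title` is expected).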
22
Sample Data: Logical Level
[Diagram: the Sample Data step (parameters A1, A2, A3).
Inputs:
- matrix[x, y], a 2-dimensional matrix.
- list(matrix[x, y, z]), a list of 3-dimensional matrices, one matrix per environmental layer.
Outputs:
- sample1(pres, temp, veg, …, z_n) and sample2(pres, temp, veg, …, z_n), each a relation of n+1 attributes for n environmental layers.]
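The logical types can be written down directly as type aliases. This sketch builds rows of the n+1-attribute relation from presence points and n layers. It assumes every input point is a presence (pres = 1). The slides do not show how rows with pres = 0 arise, so that case is left out. `sample_relation` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]             # matrix[x, y]: one row per presence point
LayerCell = Tuple[float, float, float]  # matrix[x, y, z]: z is the layer value at (x, y)
Layer = List[LayerCell]

def sample_relation(points: List[Point], layers: List[Layer]) -> List[Tuple[float, ...]]:
    """Build rows (pres, z_1, ..., z_n): n + 1 attributes for n layers.

    ASSUMPTION: every input point is a presence, so pres is always 1.0.
    """
    lookups = [{(x, y): z for x, y, z in layer} for layer in layers]
    return [(1.0, *(lookup[(x, y)] for lookup in lookups)) for x, y in points]
```

The output arity depends on the number of layers. This is the property the next slides question.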
23
Why have the Logical Level?
- Data independence: hides the details of how information is represented (text or binary files) from what is represented (a table of integers).
- Reduced application development time: makes information more easily reusable, for example by other applications or services, given programs that handle the mapping between the physical and logical levels.
- Can help enable integration: explicit knowledge of the structure and types of data can help automate conversion, for example by using higher-level languages.
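Data independence can be shown with two physical formats that yield the same logical table of integers. This sketch assumes a comma-delimited text file and a packed binary file of little-endian 32-bit integers. Both formats and the helper names are illustrative assumptions.

```python
import struct

def from_text(text):
    """Physical format 1: comma-delimited lines of integers."""
    return [tuple(int(v) for v in line.split(",")) for line in text.strip().splitlines()]

def from_binary(blob, width):
    """Physical format 2: packed little-endian 32-bit integers, `width` per row."""
    fmt = f"<{width}i"
    size = struct.calcsize(fmt)
    if len(blob) % size:
        raise ValueError("blob length is not a multiple of the row size")
    return [struct.unpack_from(fmt, blob, offset) for offset in range(0, len(blob), size)]

text_table = from_text("1, 20, 44\n0, 55, 14")
binary_table = from_binary(struct.pack("<6i", 1, 20, 44, 0, 55, 14), width=3)
```

Downstream steps see only the logical table, a list of integer tuples, so either reader can be swapped in without changing them.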
24
Choosing a logical representation
[Diagram: the Sample Data step (parameters A1, A2, A3) with inputs matrix[x, y] (a 2-dimensional matrix) and list(matrix[x, y, z]) (one 3-dimensional matrix per environmental layer). Its outputs are sample1(pres, temp, veg, …, z_n) and sample2(pres, temp, veg, …, z_n), relations of n+1 attributes for n environmental layers.]
Can you see any potential problems with this choice of logical output?
25
Choosing a logical representation
[Diagram: Sample Data (parameters A1, A2, A3) turns matrix[x, y] and list(matrix[x, y, z]) into sample1(pres, z_1, z_2, …, z_n) and sample2(pres, z_1, z_2, …, z_n). A downstream service (parameters A1, A2, A3) expects avail(pres, temp, veg, elev), and whether the two connect is marked "?".]
The output structure is dependent on the input data…
37
Choosing a logical representation
[Diagram: Sample Data (parameters A1, A2, A3) now turns matrix[x, y] and list(matrix[x, y, z]) into sample1(obs, property, value) and sample2(obs, property, value). The downstream service expects avail(obs, property, value), so the structures match.]
Reusability is easier when the logical representation is known ahead of time…
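The (obs, property, value) representation keeps three columns whatever the number of layers. Converting to and from the layer-dependent wide form is a simple pivot. A minimal sketch follows; the attribute names in the example are illustrative, and the helper names are hypothetical.

```python
def to_long(rows, attributes):
    """Convert wide rows (one column per attribute) to (obs, property, value) triples."""
    triples = []
    for obs, row in enumerate(rows, start=1):
        if len(row) != len(attributes):
            raise ValueError("row does not match the attribute list")
        for prop, value in zip(attributes, row):
            triples.append((obs, prop, value))
    return triples

def from_long(triples, attributes):
    """Pivot triples back into wide rows with the requested attribute order."""
    by_obs = {}
    for obs, prop, value in triples:
        by_obs.setdefault(obs, {})[prop] = value
    return [tuple(props[a] for a in attributes) for _, props in sorted(by_obs.items())]

long_sample = to_long([(1, 56.25, 0)], ["pres", "temp", "veg"])
```

A service that accepts (obs, property, value) can pick out the properties it needs with `from_long`, so adding a layer never changes the interface between the steps.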
39
Sample Data: Semantic input/output
[Ontology diagram. Concepts: Ecological Model, Biodiversity Model, EcoNiche Model, Regression Based ENM, Logistic Regression Model, Statistical Model, Statistical Variable, Dependent Variable, Independent Variable, Statistical Context. Relations: usesRegressionModel, hasDepVar, hasIndVar, hasContext.]
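Semantic discovery relies on subsumption: a step typed with a specific concept should be found when a user searches for a more general one. The sketch below follows the subclass links upward. The hierarchy is an assumption, because the diagram's arrows did not survive extraction. The step names are hypothetical.

```python
# ASSUMPTION: subclass edges (child -> parent) guessed from the concept names;
# the original diagram's arrows are not recoverable.
SUBCLASS_OF = {
    "Biodiversity Model": "Ecological Model",
    "EcoNiche Model": "Biodiversity Model",
    "Regression Based ENM": "EcoNiche Model",
    "Logistic Regression Model": "Statistical Model",
    "Dependent Variable": "Statistical Variable",
    "Independent Variable": "Statistical Variable",
}

def is_a(concept, ancestor):
    """True if concept equals ancestor or reaches it by following parent links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False

def discover(steps, wanted):
    """Return the names of steps whose semantic type is subsumed by `wanted`."""
    return [name for name, concept in steps.items() if is_a(concept, wanted)]

# Hypothetical registry of analytic steps and their semantic types.
registry = {"step_a": "Regression Based ENM", "step_b": "Logistic Regression Model"}
```

With this hierarchy, `discover(registry, "Ecological Model")` finds only `step_a`. A real system would query an OWL ontology with a reasoner rather than a hand-written dictionary.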
40
Putting it all together
Physical = data; logical + semantic = metadata.
[Diagram: each input and output of the Sample Data step (parameters A1, A2, A3) is described at all three levels.
- Species presence points:
  physical: 33.454606, 106.789098; 33.454606, 106.789097; …
  logical: matrix[x, y]
  semantic: Dependent Variable, hasContext Grid Coordinate (a Statistical Context)
- Environmental layers (one file per layer):
  physical: 33.454606, 106.789098, 56.25; 33.454606, 106.789097, 56.37; …
  logical: list(matrix[x, y, z])
  semantic: Independent Variable, hasContext Grid Coordinate (a Statistical Context)
- Training and test samples:
  physical: 1, 56.25, 0, 20, …, 44; 0, 57.34, 0, 55, …, 14; …
  logical: sample1(obs, property, value) and sample2(obs, property, value)
  semantic: Statistical Dataset, with hasDepVar Dependent Variable and hasIndVar Independent Variable]
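Combining the levels lets a tool check whether one step's output can feed another step's input. The sketch below is minimal: each port carries physical, logical, and semantic descriptions, and only the metadata (logical and semantic) is compared. Physical differences are assumed to be bridgeable by a transformation step. The semantic type of the `avail` service is not given on the slides, so it is a hypothetical value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    """One input or output of an analytic step, described at three levels."""
    physical: str   # how the data are stored (physical level)
    logical: tuple  # attribute names of the relation (logical level)
    semantic: str   # ontology concept (semantic level)

def can_feed(out: Port, inp: Port) -> bool:
    """Compare metadata only: matching logical structure and semantic type.

    Physical format is not compared; a transformation step could convert it.
    """
    return out.logical == inp.logical and out.semantic == inp.semantic

sample1 = Port("comma-delimited plain text", ("obs", "property", "value"), "Statistical Dataset")
# ASSUMPTION: the service's semantic type is hypothetical.
avail_long = Port("binary file", ("obs", "property", "value"), "Statistical Dataset")
avail_wide = Port("comma-delimited plain text", ("pres", "temp", "veg", "elev"), "Statistical Dataset")
```

Here `sample1` can feed `avail_long` despite the different physical format, but it cannot feed the wide `avail(pres, temp, veg, elev)` interface.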
41
Domain Workflow
[Diagram: the GARP native-species pipeline as a domain workflow. It runs EcoGrid queries for species presence and absence points and for environmental layers, then physical transformation, scaling, and layer integration. Sample Data (parameters A1, A2, A3) produces training and test samples. Data calculation yields a GARP rule set, followed by map generation and validation with user-supplied model quality parameters. The selected prediction maps give the native range prediction map; metadata generation and archiving to EcoGrid close the workflow.]
42
Generic Workflow
[Diagram: the same pipeline with its domain-specific inputs and outputs generalized:
- Species presence and absence points become occurrence data (binary, categorical, or numeric).
- The GARP rule set becomes a GARP (or other) rule set.
- Env. layers become environmental layers.
- The native range prediction map becomes a prediction map.
The EcoGrid queries, physical transformation, scaling, layer integration, Sample Data, data calculation, map generation, validation, metadata generation, and archiving to EcoGrid are unchanged.]
43
Temperature Interpolation Workflow
[Diagram: the generic workflow instantiated for temperature interpolation:
- Occurrence data: weather station temperature data.
- Environmental layers: elevation, aspect, land cover.
- Prediction map: interpolated temperature grid.
The remaining steps, including the rule-set step, are those of the generic workflow.]
44
Extending Workflows: Climate
[Diagram: a prediction model from the native area feeds two runs of the pipeline (AS x, TS 1, AS y, AS z, AS r, TS 2). With current environmental layers it produces prediction maps under current conditions; with changed environmental layers it produces prediction maps under changed conditions. Comparing the two gives the predicted effect of environmental change on the species.]
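The comparison step can be as simple as a cell-by-cell tally of two prediction maps. This sketch assumes binary maps (1 = predicted present) on the same grid. The example maps are hypothetical, and `compare_maps` is an illustrative name.

```python
def compare_maps(current, changed):
    """Tally cell-by-cell changes between two binary prediction maps."""
    if len(current) != len(changed) or any(len(a) != len(b) for a, b in zip(current, changed)):
        raise ValueError("maps must cover the same grid")
    counts = {"gained": 0, "lost": 0, "stable_present": 0, "stable_absent": 0}
    for row_now, row_later in zip(current, changed):
        for now, later in zip(row_now, row_later):
            if now and later:
                counts["stable_present"] += 1
            elif now:
                counts["lost"] += 1      # suitable now, not under changed conditions
            elif later:
                counts["gained"] += 1    # newly suitable under changed conditions
            else:
                counts["stable_absent"] += 1
    return counts

# Hypothetical 2 x 3 prediction maps.
current_map = [[1, 1, 0], [0, 0, 1]]
changed_map = [[1, 0, 0], [0, 1, 1]]
effect = compare_maps(current_map, changed_map)
```

Here `effect` counts one gained cell, one lost cell, two stable-present cells, and two stable-absent cells.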
45
Extending Workflows: Invasion
[Diagram: one run of the pipeline (AS x, TS 1, AS y, AS z, AS r, TS 2) uses native area occurrence data and environmental layers to produce prediction maps in the native area. The prediction model from the native area is then applied in a second run to invasion area environmental layers, producing prediction maps in the invasion area.]
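The key move is fitting a model in one area and applying it to another area's layers. GARP is too involved for a short example, so this sketch swaps in a plain environmental-envelope rule: a cell is predicted suitable if every variable lies within the range seen at native presences. The native values come from the integrated-data example. The invasion-area cells are hypothetical.

```python
def fit_envelope(presences):
    """Record the (min, max) of each environmental variable over presence records."""
    variables = presences[0].keys()
    return {v: (min(p[v] for p in presences), max(p[v] for p in presences)) for v in variables}

def predict(envelope, cell):
    """Predict suitable if every variable falls inside the envelope (not GARP)."""
    return all(lo <= cell[v] <= hi for v, (lo, hi) in envelope.items())

# Native-area presences (the P rows of the integrated-data example).
native = [{"elev_m": 2200, "temp_c": 16}, {"elev_m": 2320, "temp_c": 14}]
envelope = fit_envelope(native)

# Hypothetical invasion-area grid cells.
invasion_cells = [{"elev_m": 2250, "temp_c": 15}, {"elev_m": 1535, "temp_c": 22}]
invasion_map = [predict(envelope, c) for c in invasion_cells]
```

The model is fitted once in the native area and reused unchanged in the invasion area. Only the environmental layers differ between the two runs.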
46
Process
1. Create the domain workflow at a conceptual level.
2. Define the physical and logical data types for each step.
3. Define the ontological data types for each step, for both the domain and a generic ontology.
4. Map the domain workflow to a generic workflow.
5. Map the generic workflow to other domain workflows.
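Steps 4 and 5 can be pictured as two bindings of the same generic roles. Translating a label from one domain to another goes through the generic role. This sketch uses labels from the domain, generic, and temperature-interpolation workflows. The role names and dictionary layout are assumptions.

```python
# Generic role -> domain-specific label (labels from the workflow slides).
NATIVE_SPECIES = {
    "occurrence data": "species presence and absence points",
    "environmental layers": "env. layers",
    "prediction map": "native range prediction map",
}
TEMPERATURE = {
    "occurrence data": "weather station temperature data",
    "environmental layers": "elevation, aspect, land cover",
    "prediction map": "interpolated temperature grid",
}

def translate(label, source, target):
    """Map a label from one domain workflow to another via its generic role."""
    to_generic = {specific: role for role, specific in source.items()}
    return target[to_generic[label]]

translated = translate("native range prediction map", NATIVE_SPECIES, TEMPERATURE)
```

Here `translated` is "interpolated temperature grid". Any label without a generic role raises `KeyError`, flagging a step that still needs mapping.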
47
Exercise
Divide into two groups (roughly half in each):
1. Climate change
2. Invasive species
Download the generic workflow from: ftp://ftp.lternet.edu/pub/outgoing/penningd
Work on conceptual workflows that:
1. Reuse the generic pipeline
2. Extend the generic pipeline
3. Create new pipelines
Use PowerPoint, Visio, or paper tablets…your choice!