Modeling Data Product Generation

Presentation on theme: "Modeling Data Product Generation"— Presentation transcript:

1 Modeling Data Product Generation
Bill Howe, Dave Maier

2 Data Product Management
Thesis: the value of an EOFS (environmental observation and forecasting system) is the number of products it provides. Limits on the number of products: amount of oversight for current products; time to create a new product; resources required to generate products. 11/14/2018 Modeling Data Product Generation

3 Modeling Data Products
Data Product Definitions (DPDs), or “recipes”: initially for documentation; a “blueprint” for manual construction.

4 Modeling Data Product Generation
Beyond Documentation. Quality Analysis and Translation: calculate quality metrics from DPDs (e.g., resolution); translate DPDs into an executable network of Infopipes (meeting a quality standard).

5 Modeling Data Product Generation
Product Generation and Documentation: management and scheduling of the product suite based on input availability, resources, and dissemination requirements; job shop → assembly line; eventually adaptive (priorities, feedback to sensors and models). Performance Optimization: algebraic optimization; common subresults and shared scans on groups of products.

6 Modeling Data Product Generation
Remote Computation: a “product kit”, where the final product is built at the consumer site; a remote “product factory”.

7 Exercise: Fill in the Acronym CORMORANT
COlumbia River Modeling, Observation, Retrieval?? & Archive…

8 Modeling Data Product Generation
Roadmap. Vision. Status. Past: Graphical Diagram; Process Modeling; Type System. Current: Abstract Grids; Grid Functions.

9 Graphical System Description
Studied relevant files and codes to model producers and consumers, control flow, and data flow. Benefits: understanding within the project; communication outside the project. Drawbacks: only a “snapshot”; very literal; no scheduling help...

10 Modeling Data Product Generation
Brittle Scheduling: contentious codes cause crashes. Annotate the diagram with cron job information? But it would be nice to capture real executions of all system components for careful study.

11 Modeling Data Product Generation
Instrumenting CORIE: model the executions of codes using a relational database; monitor CORIE activity using SGI’s FAM technology; try to identify bottlenecks, problem spots, and resource consumption properties. Status: we’re poised to perform further testing; some security concerns have been raised.

12 More than just processes...
The model is too close a fit. Let’s start at a higher level...

13 A Candidate Type System
Relevant types: TimeSeries (TS); ElementField (EF) / NodeField (NF); DepthField (DF). Examples: salt.63 = TS (EF (DF Salinity)); fort.21 = EF Depth; findmax63 = TS (EF (DF a)) → TS (EF a).
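To make the findmax63 signature concrete, here is a minimal sketch that encodes TS, EF, and DF as nested Python lists. The encoding, the function name `findmax`, and the sample salinities are assumptions for illustration only; the slide does not say how the types are represented.

```python
from typing import List, TypeVar

A = TypeVar("A")

# Hypothetical encodings of the slide's types as nested lists.
DepthField = List[A]    # DF a: one value per depth level
ElementField = List[A]  # EF a: one value per grid element
TimeSeries = List[A]    # TS a: one value per time step


def findmax(
    ts: TimeSeries[ElementField[DepthField[float]]],
) -> TimeSeries[ElementField[float]]:
    """Collapse each depth column to its maximum: TS (EF (DF a)) -> TS (EF a)."""
    return [[max(column) for column in field] for field in ts]


# Two time steps, two elements, three depth levels (made-up salinities).
salt = [
    [[30.1, 31.5, 32.0], [12.0, 15.5, 14.2]],
    [[29.8, 30.9, 33.1], [10.4, 11.0, 16.3]],
]
result = findmax(salt)
```

The DF layer disappears from the result while the TS and EF layers survive, which is exactly what the type signature promises.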

14 Abstract Data Product Recipes
But consider compute_plumevol. [Diagram labels: Grid, Vol, select(sal<30), subgrid(Ocean), Elev, Vol, sum(grid), +, plumevol.] This informal recipe seems appropriate regardless of the specifics of our data representation. This information should be captured somewhere! Currently it’s obfuscated by C codes and tightly coupled with the TS (EF (DF a)) structure.
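As a rough sketch of what a representation-independent recipe could look like, the operators named on the slide can be chained over a simple list of cells. The cell records, the field names `region`, `sal`, and `vol`, and the order of the steps are assumptions; the real compute_plumevol code is not shown in the deck.

```python
def compute_plumevol(cells, region="Ocean", threshold=30.0):
    """Sum the volume of cells in `region` whose salinity is below `threshold`."""
    in_region = [c for c in cells if c["region"] == region]  # subgrid(Ocean)
    plume = [c for c in in_region if c["sal"] < threshold]   # select(sal<30)
    return sum(c["vol"] for c in plume)                      # sum -> plumevol


# Made-up cells: region label, salinity (psu), and volume.
cells = [
    {"region": "Ocean", "sal": 25.0, "vol": 10.0},
    {"region": "Ocean", "sal": 33.0, "vol": 20.0},
    {"region": "Estuary", "sal": 12.0, "vol": 5.0},
    {"region": "Ocean", "sal": 29.5, "vol": 2.5},
]
plumevol = compute_plumevol(cells)
```

Nothing in the function depends on a TS (EF (DF a)) layout, which is the point of stating the recipe abstractly.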

15 Modeling Data Product Generation
Topological Grid: a more general grid G_d is a collection of k-cells of dimension k, k in {0..d}. A grid function GF is a mapping from a k-cell to a value of type T: GF : k-cell → T.
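One possible way to write these definitions down, assuming a cell is just a dimension plus a label and a grid function is a dictionary keyed by cells. The class names `Cell` and `Grid` and the triangle example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A k-cell: dim is k (0 = node, 1 = edge, 2 = face, ...)."""
    dim: int
    name: str


class Grid:
    """A grid of dimension d: a collection of k-cells with k in {0..d}."""

    def __init__(self, d, cells):
        if any(not 0 <= c.dim <= d for c in cells):
            raise ValueError("every cell must have dimension in 0..d")
        self.d = d
        self.cells = frozenset(cells)

    def k_cells(self, k):
        return {c for c in self.cells if c.dim == k}


# A single triangle: three nodes, three edges, one face.
nodes = [Cell(0, n) for n in ("a", "b", "c")]
edges = [Cell(1, e) for e in ("ab", "bc", "ca")]
face = Cell(2, "abc")
grid = Grid(2, nodes + edges + [face])

# A grid function over the 0-cells: GF : 0-cell -> float (made-up temperatures).
temperature = {nodes[0]: 15.0, nodes[1]: 14.5, nodes[2]: 16.1}
```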

16 Modeling Data Product Generation
Imagine a big 4D grid representing our current best data. [Diagram labels: hindcast; experimental ELCIRC vers; missing hindcast; forecast.] Grid Functions (GF) map grid locations to values, e.g. 15 °C, 23.4 psu.

17 Modeling Data Product Generation
Grid Functions: we can derive new grid functions from our original set. [Diagram labels: GF Salt, GF Magnitude, GF Velocity, GF Velo N’hood, GF Temp, GF Vorticity, GF Elev, GF Neighbors.]

18 Modeling Data Product Generation
Benefits: say we have recipes that involve a grid, some grid functions, and some operators. So what? Well, we can reason about data product outputs, and we can optimize recipe execution.

19 Modeling Data Product Generation
Reasoning about Types: GF Velocity → applytoall(vort) → GF Vorticity, but GF Salt → applytoall(vort) → GF ???. High-level recipes can detect this kind of error before wasting compute resources.
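A minimal sketch of such a check, assuming each operator carries a declared input and output value type. The `SIGNATURES` table and both function names are hypothetical; only `vort`, Velocity, Vorticity, and Salt come from the slide.

```python
# Hypothetical operator signatures: name -> (input value type, output value type).
SIGNATURES = {
    "vort": ("Velocity", "Vorticity"),
    "magnitude": ("Velocity", "Magnitude"),
}


def check_applytoall(op, gf_type):
    """Return the output type of applytoall(op) on a GF of gf_type, or raise."""
    expected, produced = SIGNATURES[op]
    if gf_type != expected:
        raise TypeError(f"{op} expects GF {expected}, got GF {gf_type}")
    return produced


def is_well_typed(op, gf_type):
    """True if the recipe step type-checks; no data is touched either way."""
    try:
        check_applytoall(op, gf_type)
        return True
    except TypeError:
        return False
```

The check runs on type names alone, so the GF Salt case is rejected before any grid data is read.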

20 Reasoning about Schema
GF1 → subgrid(Ocean) → GF2. type(GF1) = type(GF2), but schema(GF1) ≠ schema(GF2), since GF2 is defined over a smaller grid than GF1. By tracking schema information through complex recipes we can check for errors and estimate resource requirements (big schemas require big buffers). [Figure labels: a valid transect; an invalid transect.]
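The sketch below treats a schema as the set of cells a grid function is defined over. That reading, the 8-bytes-per-value buffer estimate, and the rule that a transect is valid only if all of its cells lie in the schema are assumptions made for illustration.

```python
def subgrid(gf, region_cells):
    """Restrict a grid function (dict: cell -> value) to the cells in a region."""
    return {cell: v for cell, v in gf.items() if cell in region_cells}


def schema(gf):
    """Assumed meaning of schema: the set of cells the function is defined over."""
    return frozenset(gf)


def buffer_bytes(gf, bytes_per_value=8):
    """Rough resource estimate: one fixed-size value per cell in the schema."""
    return len(gf) * bytes_per_value


def transect_is_valid(transect, gf):
    """Assumed rule: every cell on the transect must be in the schema."""
    return all(cell in gf for cell in transect)


gf1 = {"c1": 1.0, "c2": 2.0, "c3": 3.0, "c4": 4.0}
ocean = {"c1", "c2"}
gf2 = subgrid(gf1, ocean)
```

Here `["c1", "c2"]` is a valid transect over gf2, while `["c2", "c3"]` is not, and that can be decided from the schema without running the recipe.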

21 Reasoning about Quality
Say we have operators coarsen and refine, which lower resolution via grouping and raise resolution via interpolation, respectively. For GF1 → coarsen → refine → GF2: type(GF1) = type(GF2) and schema(GF1) = schema(GF2), but qual(GF1) ≠ qual(GF2).
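A one-dimensional sketch of the round trip, assuming coarsen averages fixed-size groups and refine uses nearest-neighbour interpolation (the simplest choice; the slide does not say which interpolation is meant). The sample values are made up.

```python
def coarsen(values, factor=2):
    """Lower resolution by averaging consecutive groups of `factor` values."""
    if len(values) % factor != 0:
        raise ValueError("length must be a multiple of factor")
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values), factor)
    ]


def refine(values, factor=2):
    """Raise resolution by nearest-neighbour interpolation (repeat each value)."""
    return [v for v in values for _ in range(factor)]


gf1 = [1.0, 3.0, 5.0, 7.0]
gf2 = refine(coarsen(gf1))
```

gf2 has the same length as gf1, so a type or schema check would accept it, yet the fine-scale variation in gf1 is gone. Only a tracked quality attribute would reveal the difference.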

22 Optimize via Algebraic Manipulations
Different sequences of operators can give equivalent results. [Diagram labels, first plan: GF Elev, computevol, subgrid(Ocean), GF Vol, GF Area, ... Second plan: GF Elev, subgrid(Ocean), GF Vol, GF Area, computevol, ...] These are equivalent, but the second avoids computing volume over the entire grid.
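The rewrite can be sketched by counting how many per-cell volume evaluations each plan performs. The formula volume = elevation × area is a stand-in, and the cell values and counter are hypothetical; only the operator names computevol and subgrid come from the slide.

```python
def subgrid(gf, region_cells):
    """Restrict a grid function (dict: cell -> value) to a region."""
    return {c: v for c, v in gf.items() if c in region_cells}


def computevol(elev, area, stats):
    """Stand-in volume formula (elev * area), counting per-cell evaluations."""
    vol = {}
    for cell in elev:
        stats["volume_evals"] += 1
        vol[cell] = elev[cell] * area[cell]
    return vol


elev = {"c1": 2.0, "c2": 1.5, "c3": 3.0, "c4": 0.5}
area = {"c1": 10.0, "c2": 12.0, "c3": 8.0, "c4": 20.0}
ocean = {"c1", "c2"}

# Plan A: compute volume everywhere, then restrict to the ocean.
stats_a = {"volume_evals": 0}
plan_a = subgrid(computevol(elev, area, stats_a), ocean)

# Plan B: restrict first, then compute volume only where it is needed.
stats_b = {"volume_evals": 0}
plan_b = computevol(subgrid(elev, ocean), subgrid(area, ocean), stats_b)
```

Both plans produce the same grid function, but plan B evaluates the volume formula on two cells instead of four.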

23 Optimize via Choice of Implementation
[Diagram: GF Salt → select(s < 30) → ?, with candidate result representations GF Bool (F, T, ...), GF (Maybe Salt) (-, 22, 24, 23), and {KCell} ({c1, c2, c3}).]
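The three candidate results of select can be written side by side. This is a sketch assuming a grid function is a dictionary from cell to value, with `None` playing the role of the missing case in Maybe; the function names and the fourth cell are made up.

```python
def select_bool(gf, pred):
    """GF Bool: a flag for every cell."""
    return {cell: pred(v) for cell, v in gf.items()}


def select_maybe(gf, pred):
    """GF (Maybe Salt): the value where the predicate holds, else None."""
    return {cell: (v if pred(v) else None) for cell, v in gf.items()}


def select_cells(gf, pred):
    """{KCell}: just the set of cells that satisfy the predicate."""
    return {cell for cell, v in gf.items() if pred(v)}


salt = {"c1": 22.0, "c2": 24.0, "c3": 23.0, "c4": 31.0}


def below_30(s):
    return s < 30


as_bool = select_bool(salt, below_30)
as_maybe = select_maybe(salt, below_30)
as_cells = select_cells(salt, below_30)
```

All three carry the same selection. Which one is cheapest depends on what the next operator in the recipe needs, so the choice can be left to an optimizer.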

24 Optimize via Shared Intermediate Results
A node’s neighbors don’t often change, so we can avoid re-computing this result. [Diagram labels: GF Velocity, GF Velo N’hood, GF Vorticity, GF Neighbors, GF Salt N’hood, GF Salinity, GF Salt Gradient.]
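A small sketch of sharing the neighbor result between two product chains. The edge list, the node values, and the shape of the neighborhood function are assumptions; the deck only names the grid functions involved.

```python
def compute_neighbors(edges, stats):
    """Build GF Neighbors (node -> set of adjacent nodes), counting each build."""
    stats["neighbor_builds"] += 1
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def neighborhood(gf, nbrs):
    """Pair each node's value with its neighbors' values (an N'hood function)."""
    return {n: (gf[n], [gf[m] for m in sorted(nbrs[n])]) for n in gf}


edges = [("n1", "n2"), ("n2", "n3")]
salt = {"n1": 10.0, "n2": 20.0, "n3": 30.0}
velocity = {"n1": 0.5, "n2": 0.7, "n3": 0.2}

stats = {"neighbor_builds": 0}
neighbors = compute_neighbors(edges, stats)    # computed once
salt_nhood = neighborhood(salt, neighbors)     # feeds the salt gradient
velo_nhood = neighborhood(velocity, neighbors) # feeds vorticity
```

Both neighborhoods reuse the same `neighbors` result, so the adjacency is built once however many products depend on it.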

25 Modeling Data Product Generation
Other niceties... We don’t have to re-implement everything to realize benefits. But eventually we’ll want to wag the dog! A collection of recipes can help: communicate the product catalog; provide provenance; derive new recipes from parts of old ones; support product lines.

26 Modeling Data Product Generation
Summary. Modeling the current CORIE: Graphical System Description; pmon. Modeling the future CORIE: Grid Functions; Recipes; Reasoning; Optimization.

27 Modeling Data Product Generation
Milestones: RPE this spring; specify existing data products using the model; perform checks on existing production plans (type; schema / resources; quality).

28 Modeling Data Product Generation

29 A Thorough Experiment Management Schema

30 Modeling Data Product Generation
A Good Start... [Diagram labels: task definition; task instance (with parameters); task execution.]
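The three entities could be laid out relationally as below. This is only a sketch using an in-memory SQLite database; the table names, columns, and sample rows are assumptions, since the deck shows the entities but not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_definition (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE task_instance (
        id            INTEGER PRIMARY KEY,
        definition_id INTEGER NOT NULL REFERENCES task_definition(id),
        parameters    TEXT NOT NULL
    );
    CREATE TABLE task_execution (
        id          INTEGER PRIMARY KEY,
        instance_id INTEGER NOT NULL REFERENCES task_instance(id),
        started_at  TEXT,
        exit_code   INTEGER
    );
    """
)

# Made-up rows: one definition, one parameterized instance, two executions.
conn.execute("INSERT INTO task_definition (id, name) VALUES (1, 'compute_plumevol')")
conn.execute(
    "INSERT INTO task_instance (id, definition_id, parameters) "
    "VALUES (1, 1, 'threshold=30')"
)
conn.executemany(
    "INSERT INTO task_execution (instance_id, started_at, exit_code) VALUES (?, ?, ?)",
    [(1, "t0", 0), (1, "t1", 1)],
)

# How many times has each task definition actually run?
rows = conn.execute(
    """
    SELECT d.name, COUNT(e.id)
    FROM task_definition d
    JOIN task_instance i ON i.definition_id = d.id
    JOIN task_execution e ON e.instance_id = i.id
    GROUP BY d.name
    """
).fetchall()
```

Keeping executions separate from instances lets repeated runs of the same parameterized task be compared, which suits the goal of studying real executions.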

31 Modeling Data Product Generation
pmon Architecture. [Diagram labels: pmon (Process Monitor); Database; Web Server; fam (File Alteration Monitor), using imon, dnotify, or polling, depending on kernel patch; Filesystem; pacct (stopped process stats); /proc (running process info); acct (process accounting); Process to Monitor; Linux Kernel.]

