Modeling Data Product Generation

Presentation transcript:

Modeling Data Product Generation, by Bill Howe and Dave Maier

Data Product Management. Thesis: the value of an EOFS (environmental observation and forecasting system) is the number of products it provides. Limits on the number of products: amount of oversight for current products; time to create a new product; resources required to generate products. 11/14/2018 Modeling Data Product Generation

Modeling Data Products. Data Product Definitions (DPDs), or “recipes”: initially for documentation; a “blueprint” for manual construction.

Beyond Documentation. Quality analysis and translation: calculate quality metrics from DPDs (e.g., resolution); translate DPDs into an executable network of Infopipes (meeting a quality standard).

Product Generation and Documentation. Management and scheduling of the product suite, based on input availability, resources, and dissemination requirements: job shop → assembly line; eventually adaptive (priorities, feedback to sensors and models). Performance optimization: algebraic optimization; common subresults and shared scans on groups of products.

Remote Computation. “Product kit”: final product built at the consumer site. Remote “product factory”.

Exercise: Fill in the Acronym. CORMORANT: COlumbia River Modeling, Observation, Retrieval?? & Archive…

Roadmap. Vision. Status: past (graphical diagram, process modeling, type system); current (abstract grids, grid functions).

Graphical System Description. Studied relevant files and codes to model producers and consumers, control flow, and data flow. Benefits: understanding within the project; communication outside the project. Drawbacks: only a ‘snapshot’; very literal; no scheduling help...

Brittle Scheduling. Contentious codes cause crashes. Annotate the diagram with cron job information? But it would be nice to capture real executions of all system components for careful study.

Instrumenting CORIE. Model the executions of codes using a relational database. Monitor CORIE activity using SGI’s FAM technology. Try to identify bottlenecks, problem spots, and resource consumption properties. Status: we’re poised to perform further testing; some security concerns have been raised.

More than just processes... The model is too close a fit. Let’s start at a higher level...

A Candidate Type System. Relevant types: TimeSeries (TS); ElementField (EF) / NodeField (NF); DepthField (DF). Examples: salt.63 = TS (EF (DF Salinity)); fort.21 = EF Depth; findmax63 = TS (EF (DF a)) → TS (EF a).
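
The candidate types can be written down as plain data, which is enough to check a signature such as findmax63 before running anything. This is a minimal Python sketch, not the CORIE implementation: the class layout, the `Base` wrapper, and the helper `findmax63_type` are assumptions made for illustration; only the type names and the three example signatures come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    name: str      # a physical quantity, e.g. "Salinity" or "Depth"


@dataclass(frozen=True)
class TS:
    inner: object  # TimeSeries of the inner type


@dataclass(frozen=True)
class EF:
    inner: object  # ElementField of the inner type


@dataclass(frozen=True)
class DF:
    inner: object  # DepthField of the inner type


# File types as listed on the slide.
salt_63 = TS(EF(DF(Base("Salinity"))))
fort_21 = EF(Base("Depth"))


def findmax63_type(arg):
    """Result type for findmax63 = TS (EF (DF a)) -> TS (EF a).

    Raises TypeError when arg does not have the shape TS (EF (DF a)).
    """
    if (isinstance(arg, TS) and isinstance(arg.inner, EF)
            and isinstance(arg.inner.inner, DF)):
        # Drop the DepthField layer, keep the element type a.
        return TS(EF(arg.inner.inner.inner))
    raise TypeError(f"findmax63 expects TS (EF (DF a)), got {arg!r}")
```

Applied to `salt_63` the helper yields `TS(EF(Base("Salinity")))`; applied to `fort_21` it raises, because fort.21 has no time or depth dimension.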

Abstract Data Product Recipes. But consider compute_plumevol. Diagram labels: Grid, Vol, select(sal<30), subgrid(Ocean), Elev, Vol, sum(grid), +, plumevol. This informal recipe seems appropriate regardless of the specifics of our data representation. This information should be captured somewhere! Currently it’s obfuscated by C codes and tightly coupled with the TS (EF (DF a)) structure.
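
One possible reading of the compute_plumevol recipe, written without reference to any file layout, is sketched below. The order of the operators in the flattened diagram is not fully recoverable, so the composition here (restrict to the ocean, keep cells with salinity below 30, sum their volumes) is an assumption, as are the dictionary representation and the sample values.

```python
def compute_plumevol(salinity, volume, ocean, threshold=30.0):
    """Sum cell volumes over ocean cells whose salinity is below threshold.

    salinity and volume map cell labels to values; ocean is the set of
    cell labels kept by subgrid(Ocean).
    """
    return sum(volume[c] for c in ocean if salinity[c] < threshold)


# Hypothetical three-cell example.
salinity = {"c1": 22.0, "c2": 33.0, "c3": 10.0}
volume = {"c1": 5.0, "c2": 7.0, "c3": 2.0}
ocean = {"c1", "c2"}

plumevol = compute_plumevol(salinity, volume, ocean)
```

Nothing in the function mentions time series, element fields, or depth fields, which is the point of the slide: the recipe is independent of the storage structure.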

Topological Grid. A more general grid G_d is a collection of k-cells of dimension k, k in {0..d}. A grid function GF is a mapping from a k-cell to a value of type T: GF : k-cell → T.
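
The two definitions translate almost directly into code. The sketch below is illustrative only: the `Cell`, `Grid`, and `GridFunction` names, the string cell labels, and the choice to bind each grid function to a single k are assumptions, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    dim: int    # k: 0 = node, 1 = edge, 2 = face, ...
    ident: str  # hypothetical cell label


class Grid:
    """A grid of dimension d: a collection of k-cells with 0 <= k <= d."""

    def __init__(self, d, cells):
        for c in cells:
            if not 0 <= c.dim <= d:
                raise ValueError(f"cell {c.ident} has dimension {c.dim}, outside 0..{d}")
        self.d = d
        self.cells = frozenset(cells)

    def k_cells(self, k):
        return {c for c in self.cells if c.dim == k}


class GridFunction:
    """A mapping from the k-cells of a grid (one fixed k) to values."""

    def __init__(self, grid, k, values):
        if set(values) != grid.k_cells(k):
            raise ValueError("values must cover exactly the k-cells of the grid")
        self.grid, self.k, self.values = grid, k, dict(values)

    def __call__(self, cell):
        return self.values[cell]


# A single triangle: three nodes, three edges, one face.
nodes = [Cell(0, n) for n in "abc"]
edges = [Cell(1, e) for e in ("ab", "bc", "ca")]
grid = Grid(2, nodes + edges + [Cell(2, "abc")])
salt = GridFunction(grid, 0, {nodes[0]: 22.0, nodes[1]: 24.0, nodes[2]: 23.0})
```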

Imagine a big 4-D grid representing our current best data. Diagram labels: hindcast; experimental ELCIRC vers; missing hindcast; forecast. Grid functions (GF) map grid locations to values, e.g. 15 °C, 23.4 psu.

Grid Functions. We can derive new grid functions from our original set. Diagram labels: GF Salt, GF Magnitude, GF Velocity, GF Velo N’hood, GF Temp, GF Vorticity, GF Elev, GF Neighbors.

Benefits. Say we have recipes that involve a grid, some grid functions, and some operators. So what? Well, we can reason about data product outputs, and we can optimize recipe execution.

Reasoning about Types. GF Velocity → applytoall(vort) → GF Vorticity. GF Salt → applytoall(vort) → GF ???. High-level recipes can detect this kind of error before wasting compute resources.
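
A recipe-level type check of this kind needs only operator signatures, not data. The sketch below assumes a hypothetical `SIGNATURES` table and a linear recipe made of applytoall steps; the `magnitude` entry is an illustrative guess, while the vort case is the one shown on the slide.

```python
# Hypothetical signatures: operator -> (input value type, output value type).
SIGNATURES = {
    "vort": ("Velocity", "Vorticity"),
    "magnitude": ("Velocity", "Magnitude"),
}


def applytoall_type(op, gf_type):
    """Value type produced by applytoall(op) on a grid function of gf_type."""
    expected, result = SIGNATURES[op]
    if gf_type != expected:
        raise TypeError(f"applytoall({op}) expects GF {expected}, got GF {gf_type}")
    return result


def check_recipe(input_type, ops):
    """Type-check a linear recipe without executing any operator."""
    current = input_type
    for op in ops:
        current = applytoall_type(op, current)
    return current
```

`check_recipe("Velocity", ["vort"])` returns `"Vorticity"`, while `check_recipe("Salt", ["vort"])` raises a `TypeError` before any grid data is read.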

Reasoning about Schema. GF1 → subgrid(Ocean) → GF2. type(GF1) = type(GF2), but schema(GF1) ≠ schema(GF2), since GF2 is defined over a smaller grid than GF1. By tracking schema information through complex recipes we can check for errors and estimate resource requirements (big schemas require big buffers). Figure labels: a valid transect; an invalid transect.
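
Schema tracking can be sketched by carrying the set of cells alongside the value type. Everything named here (`GFInfo`, `buffer_bytes`, the 8 bytes per value, the integer cell labels) is an assumption for illustration; the slide only states that subgrid preserves the type and shrinks the schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GFInfo:
    value_type: str   # e.g. "Salt"
    cells: frozenset  # schema: the cells the function is defined over


def subgrid(info, region):
    """Restrict a grid function to region; the value type is unchanged."""
    return GFInfo(info.value_type, info.cells & frozenset(region))


def buffer_bytes(info, bytes_per_value=8):
    """Rough buffer estimate, one value per cell (8 bytes is an assumption)."""
    return len(info.cells) * bytes_per_value


gf1 = GFInfo("Salt", frozenset(range(10)))  # hypothetical 10-cell grid
gf2 = subgrid(gf1, range(4))                # hypothetical ocean region
```

Here `gf1` and `gf2` agree on `value_type` but not on `cells`, and the buffer estimate drops from 80 to 32 bytes.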

Reasoning about Quality. Say we have operators coarsen and refine, which lower resolution via grouping and raise resolution via interpolation, respectively. GF1 → coarsen → refine → GF2. Then type(GF1) = type(GF2) and schema(GF1) = schema(GF2), but qual(GF1) ≠ qual(GF2).
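
A one-dimensional toy version shows why the round trip loses quality. The averaging and linear-interpolation rules below are assumptions chosen for brevity, not the operators used in CORIE, and the sample values are made up.

```python
def coarsen(values):
    """Lower resolution by averaging adjacent pairs (assumes an even length)."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]


def refine(values):
    """Raise resolution by linear interpolation between neighbors.

    Each coarse value yields two fine values; the last coarse value is
    repeated because it has no right-hand neighbor.
    """
    out = []
    for i, v in enumerate(values):
        nxt = values[i + 1] if i + 1 < len(values) else v
        out.extend([v, (v + nxt) / 2])
    return out


gf1 = [20.0, 24.0, 30.0, 34.0]
gf2 = refine(coarsen(gf1))
```

`gf2` has the same length and element type as `gf1`, but its values are `[22.0, 27.0, 32.0, 32.0]`: the detail removed by coarsen is not restored by refine.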

Optimize via Algebraic Manipulations. Different sequences of operators can give equivalent results. First: (GF Elev, GF Area) → computevol → subgrid(Ocean) → GF Vol → ... Second: (GF Elev, GF Area) → subgrid(Ocean) → computevol → GF Vol → ... These are equivalent, but the second avoids computing volume over the entire grid.
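
The rewrite can be checked on a toy grid by counting how many cells computevol touches under each plan. The per-cell volume formula (elevation times area), the dictionary representation, and the sample values are assumptions for illustration.

```python
work = {"cells": 0}  # counts the cells processed by computevol


def computevol(elev, area):
    """Per-cell volume as elevation times area (a simplifying assumption)."""
    out = {}
    for cell in elev:
        work["cells"] += 1
        out[cell] = elev[cell] * area[cell]
    return out


def subgrid(gf, region):
    """Keep only the cells of gf that lie in region."""
    return {cell: v for cell, v in gf.items() if cell in region}


elev = {"c1": 2.0, "c2": 3.0, "c3": 1.0, "c4": 4.0}
area = {"c1": 10.0, "c2": 10.0, "c3": 5.0, "c4": 5.0}
ocean = {"c1", "c2"}

# Plan A: compute volume everywhere, then restrict to the ocean.
plan_a = subgrid(computevol(elev, area), ocean)
cells_a = work["cells"]

# Plan B: restrict first, then compute volume only where needed.
work["cells"] = 0
plan_b = computevol(subgrid(elev, ocean), subgrid(area, ocean))
cells_b = work["cells"]
```

Both plans produce the same grid function, but plan B processes two cells where plan A processes four.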

Optimize via Choice of Implementation. GF Salt → select(s < 30) → ? Candidate result representations: GF Bool (F, T); GF (Maybe Salt) (-, 22, 24, 23); {KCell} ({c1, c2, c3}).
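
The three candidate representations of a selection result can be compared side by side. In this sketch the function names, the extra cell `c0` with salinity 31.0, and the pairing of the values 22, 24, 23 with c1, c2, c3 are assumptions; the slide shows only the values and the cell set.

```python
salt = {"c0": 31.0, "c1": 22.0, "c2": 24.0, "c3": 23.0}


def select_as_mask(gf, pred):
    """GF Bool: one flag per cell."""
    return {c: pred(v) for c, v in gf.items()}


def select_as_maybe(gf, pred):
    """GF (Maybe Salt): the value where pred holds, None elsewhere."""
    return {c: (v if pred(v) else None) for c, v in gf.items()}


def select_as_cells(gf, pred):
    """{KCell}: only the set of cells that satisfy pred."""
    return {c for c, v in gf.items() if pred(v)}


def below_30(s):
    return s < 30


mask = select_as_mask(salt, below_30)
maybe = select_as_maybe(salt, below_30)
cells = select_as_cells(salt, below_30)
```

The mask and the Maybe form keep the full schema, which suits later cell-by-cell operators; the cell set is the smallest when few cells qualify. Which one is best depends on what consumes the result.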

Optimize via Shared Intermediate Results. A node’s neighbors don’t often change, so we can avoid re-computing this result. Diagram labels: GF Velocity, GF Velo N’hood, GF Vorticity, GF Neighbors, GF Salt N’hood, GF Salinity, GF Salt Gradient.
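
Sharing the neighbor computation between two products can be sketched with a cache. The adjacency table, the node labels, and the scalar `speed` field standing in for velocity are hypothetical; `functools.lru_cache` is used here simply as one convenient way to memoize.

```python
from functools import lru_cache

# Hypothetical mesh connectivity: node -> adjacent nodes.
ADJACENCY = {"n1": ("n2", "n3"), "n2": ("n1",), "n3": ("n1",)}
lookups = {"count": 0}  # how many times the topology was actually consulted


@lru_cache(maxsize=None)
def neighbors(node):
    """Neighbor lookup, cached because the mesh topology rarely changes."""
    lookups["count"] += 1
    return ADJACENCY[node]


def neighborhood(gf, node):
    """Values of a grid function on the neighbors of node."""
    return [gf[n] for n in neighbors(node)]


salt = {"n1": 22.0, "n2": 24.0, "n3": 23.0}
speed = {"n1": 0.5, "n2": 0.7, "n3": 0.2}

# Two neighborhood products built from the same neighbor relation.
salt_hood = {n: neighborhood(salt, n) for n in salt}
speed_hood = {n: neighborhood(speed, n) for n in speed}
```

Six neighborhoods are built, but the topology is consulted only three times, once per node.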

Other niceties... We don’t have to re-implement everything to realize benefits. But eventually we’ll want to wag the dog! A collection of recipes can help: communicate the product catalog; provide provenance; derive new recipes from parts of old ones; support product lines.

Summary. Modeling the current CORIE: graphical system description; pmon. Modeling the future CORIE: grid functions; recipes; reasoning; optimization.

Milestones. RPE this spring. Specify existing data products using the model. Perform checks on existing production plans: type; schema / resources; quality.

A Thorough Experiment Management Schema

A Good Start... Diagram labels: task definition; task instance (with parameters); task execution.
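
The three entities on this slide suggest a small relational schema. The sketch below uses Python's built-in `sqlite3` module; every table name, column name, and sample row is hypothetical, since the slide names only the three entities. The timestamps are illustrative and not taken from any real run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_definition (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE task_instance (
    id            INTEGER PRIMARY KEY,
    definition_id INTEGER NOT NULL REFERENCES task_definition(id),
    parameters    TEXT NOT NULL
);
CREATE TABLE task_execution (
    id          INTEGER PRIMARY KEY,
    instance_id INTEGER NOT NULL REFERENCES task_instance(id),
    started_at  TEXT,
    finished_at TEXT,
    exit_code   INTEGER
);
""")

# One definition, one parameterized instance, two executions of it.
conn.execute("INSERT INTO task_definition (id, name) VALUES (1, 'findmax63')")
conn.execute(
    "INSERT INTO task_instance (id, definition_id, parameters) VALUES (1, 1, 'salt.63')"
)
conn.executemany(
    "INSERT INTO task_execution (instance_id, started_at, finished_at, exit_code) "
    "VALUES (?, ?, ?, ?)",
    [(1, "2000-01-01T00:00:00", "2000-01-01T00:05:00", 0),
     (1, "2000-01-02T00:00:00", "2000-01-02T00:07:00", 1)],
)

# Which parameterized tasks have failed, and how often?
failed = conn.execute("""
    SELECT d.name, i.parameters, COUNT(*)
    FROM task_execution e
    JOIN task_instance i ON i.id = e.instance_id
    JOIN task_definition d ON d.id = i.definition_id
    WHERE e.exit_code <> 0
    GROUP BY d.name, i.parameters
""").fetchall()
```

Separating definition, instance, and execution lets the same code be tracked across parameter sets and across repeated runs.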

pmon Architecture. Diagram labels: pmon (Process Monitor); Database; Web Server; fam (File Alteration Monitor), using imon, dnotify, or polling, depending on kernel patch; Filesystem; pacct (stopped process stats); /proc (running process info); acct (process accounting); Process to Monitor; Linux Kernel.
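
Of the three file-monitoring options listed, polling is the easiest to illustrate. The sketch below does not use fam, imon, or dnotify; it is a stand-in that compares two directory snapshots using only the Python standard library. The function names and the file name `salt.63` used in the demonstration are assumptions.

```python
import os
import tempfile


def snapshot(directory):
    """Map each file name in directory to its (size, modification time)."""
    state = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            info = entry.stat()
            state[entry.name] = (info.st_size, info.st_mtime_ns)
    return state


def diff(before, after):
    """Classify the changes between two snapshots."""
    created = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    changed = sorted(n for n in set(before) & set(after) if before[n] != after[n])
    return created, deleted, changed


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    before = snapshot(d)
    with open(os.path.join(d, "salt.63"), "w") as f:
        f.write("placeholder")
    events = diff(before, snapshot(d))
```

A real monitor would repeat the snapshot on a timer and record each event in the database; kernel-level notification avoids the repeated directory scans that polling requires.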