Published by Amberlynn Gregory. Modified over 6 years ago.
1
SDM Workshop Strawman Report: History, Progress, and Goal
2
History
Original plan
- Identify Scientific Applications Data Management needs
  - Focus on different application types: simulations, experiments/observations
- Identify Data Management technologies
- Identify other relevant Computer Science technologies
- Identify Gaps, Cost, Priorities
In Extended EOC we came up with a draft report
- Based on extensive discussions of application needs
- Identified the scientific investigation process (workflow)
- Identified technologies needed
- Assigned writing to individuals
3
Section 2: Application sciences motivation and needs
- Astrophysics
- Biology
- Climate Modeling
- Combustion
- Fusion Energy Science
- High Energy and Nuclear Physics
- Nanotechnology
4
Section 3: The scientific investigation process
- Distributed Scientific Workflows
- Scientific Data Management Phases
  - Data Generation
  - Data Analysis
  - Data Visualization
- Foundation of scientific data management technology
  - Workflow, dataflow, data transformation
  - Storage, data movement, grid, networks
  - Metadata management and cataloging
  - Efficient access and query, data integration
  - Integrated analysis environment, visualization
- Requirements of supportive technologies
  - Networking
  - Visualization
5
Scientific Workflow Cycle
[Diagram: Scientific Data Management shown with three phases, Data Generation, Data Analysis, and Data Visualization, linked by arrows labeled "workflow".]
6
Section 4: Data Management Technologies and Gap Analysis
1) Workflow, dataflow, data transformation
- Workflow specification
- Workflow execution in distributed systems
- Monitoring of long-running workflows
- Adapting components to the framework
- Workflow layers
  - Control-flow layer
  - Application and Software Tools layer
  - I/O System layer
  - Storage and Network Resource layer
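The items "workflow specification" and "workflow execution" can be illustrated with a minimal sketch. It assumes a workflow is specified as a dependency graph and executed in topological order on a single machine. The task names and the `run_workflow` helper are hypothetical and are not taken from the report; a real distributed engine would also need scheduling, monitoring, and data staging.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow specification: each task maps to the set of
# tasks that must finish before it can start.
WORKFLOW = {
    "generate": set(),
    "transform": {"generate"},
    "analyze": {"transform"},
    "visualize": {"analyze"},
}


def run_workflow(spec, actions):
    """Run every task after its dependencies; return the execution order."""
    order = []
    for task in TopologicalSorter(spec).static_order():
        actions[task]()  # invoke the callable registered for this task
        order.append(task)
    return order


if __name__ == "__main__":
    log = []
    # Stand-in actions that only record their own name.
    actions = {name: (lambda n=name: log.append(n)) for name in WORKFLOW}
    print(run_workflow(WORKFLOW, actions))
```

Keeping the specification (the dictionary) separate from the execution (the loop) is one way to let the same workflow description be handed to different execution back ends.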
7
Astrophysical Simulation Workflow Cycle
Application Layer
- Start New Simulation?
- Run Simulation batch job on capability system
- Continue Simulation?
- Simulation generates checkpoint files
- Archive checkpoint files to HPSS
- Migrate subset of checkpoint files to local cluster
- Vis & Analysis on local Beowulf cluster
Parallel I/O Layer
- Parallel HDF5
Storage Layer
- HPSS
- GPFS
- PVFS or LUSTRE
- MSS, Disks, & OS
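The application-layer steps of this cycle can be sketched as a simple control loop. This is an illustration only: the lists stand in for HPSS and the local cluster, and the checkpoint file names, the `keep_every` subset policy, and the `simulation_cycle` function are assumptions introduced here, not details from the slide.

```python
def simulation_cycle(max_runs, keep_every=2):
    """Walk through the cycle's control flow with in-memory stand-ins.

    Returns (archived, local): the checkpoints "archived to HPSS" and the
    subset "migrated to the local cluster" for visualization and analysis.
    """
    archived = []  # stand-in for the HPSS archive
    local = []     # stand-in for local Beowulf cluster storage
    for run in range(max_runs):  # each pass answers "Continue Simulation?"
        # The batch job would write a checkpoint file here (hypothetical name).
        checkpoint = f"checkpoint_{run:04d}.h5"
        archived.append(checkpoint)      # archive every checkpoint
        if run % keep_every == 0:        # migrate only a subset locally
            local.append(checkpoint)
    return archived, local


if __name__ == "__main__":
    archived, local = simulation_cycle(max_runs=4)
    print("archived:", archived)
    print("local subset:", local)
```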
8
Section 4: Data Management Technologies and Gap Analysis
2) Storage, data movement, grid, networks
- Dynamic data storage and caching
- Robust terabyte-scale data movers
- Dataflow automation between components
- Multi-resolution data movement
3) Metadata management and cataloging
- Unified data models and APIs
- Annotation, ontologies and provenance
- Metadata requirements for workflows
9
Section 4: Data Management Technologies and Gap Analysis
4) Efficient access and query, data integration
- Parallel and random I/O
- Large-scale feature-based indexing
- Query processing over files
- Data integration
5) Integrated analysis environment, visualization
- A single environment for packaged tools and user software
- A single environment for a variety of tools: statistical software, cluster analysis, …
- Coupling with visualization tools
- Work with parallel I/O
10
Section 5: Prioritization, Cost, and Management
- Prioritization process
  - Reasons based on current barriers and needs
  - Reasons based on long-term projections
- Practical budgeting considerations
  - Research and development
  - Hardening and packaging
  - Deployment and maintenance
- Recommendations and program planning
  - Prioritization
  - Cost
  - Management Structure
11
Gap & Cost Matrix
Technology area | Research and Development | Hardening and Packaging | Deployment and maintenance
Workflow, dataflow, data transformation | | |
Storage, data movement, grid, networks | | |
Metadata management and cataloging | | |
Efficient access and query, data integration | | |
Integrated analysis environment, visualization | | |
12
Discussion items
Columns: Research and Development | Hardening and Packaging | Deployment and maintenance
Control flow tier
- Granularity of tasks, sub-workflows
- Task invocation mechanisms: Web Services, CORBA, wrappers, callbacks
- Human tasks: notifications and alerts, steering
- Dataflow streaming granularity
Work tier
- Workflow engine for scientific applications
- Dataflow management
- Effect of dataflow on the control flow
- Failure detection and recovery
- Performance and bottleneck issues
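As an illustration of the "failure detection and recovery" item, the sketch below wraps a task in a bounded retry loop. It assumes failures surface as Python exceptions and that re-running the task is an acceptable recovery. The `run_with_retry` helper and the flaky example task are hypothetical. Long-running distributed workflows would likely also need checkpointing and persistent state, which this sketch does not cover.

```python
import time


def run_with_retry(task, max_attempts=3, delay_s=0.0):
    """Call task(); on an exception, retry up to max_attempts times in total.

    Returns (result, failures), where failures lists the errors seen
    before the successful attempt. Raises RuntimeError if all attempts fail.
    """
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), failures
        except Exception as exc:  # detection: any exception counts as a failure
            failures.append(f"attempt {attempt}: {exc}")
            if attempt < max_attempts:
                time.sleep(delay_s)  # recovery: wait, then try again
    raise RuntimeError(f"task failed after {max_attempts} attempts: {failures}")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_transfer():
        # Hypothetical task that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise IOError("transfer interrupted")
        return "done"

    print(run_with_retry(flaky_transfer))
```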
13
The End