1
Data Management in Cloud Workflow Systems
Dong Yuan
Faculty of Information and Communication Technology
Swinburne University of Technology
2
Outline
>Cloud Computing & Cloud Workflow Systems
  –Introduction to cloud workflow systems; a brief overview of grid workflow systems
>Data Management in Cloud Workflow Systems
  –New features and research issues
>Cloud Computing Environment and SwinDeW-C
  –Our simulation environment and cloud workflow system
3
>Cloud Computing & Cloud Workflow Systems
4
Cloud Computing
>Some new features of cloud computing
  –Large data centres with cheap hardware
  –Virtualisation
  –Internet-based and service-oriented (SOA): SaaS, PaaS, IaaS
  –Market-driven, with a cost model
>Cloud computing research has emerged in many areas
  –Data mining, databases, parallel computing and scientific applications, content delivery
5
Cloud Workflow Systems
>Grid workflow systems
  –Kepler, Pegasus, Taverna, MOTEUR, Triana, ASKALON
  –Gridbus, GridFlow
>Build-time: focus on data modelling
  –Kepler: actor-oriented data modelling; Taverna: Scufl; ASKALON: AGWL
>Runtime: adopt Data Grid systems
  –Grid DataFarm, GDMP, GridDB, SRB, RLS (P-RLS), GSB, DaltOn
6
Cloud Workflow Systems
>Architecture
  –Based on the Internet
  –Platform as a Service
  –More distributed
7
>Data Management in Cloud Workflow Systems
8
Data Management in Cloud Workflow Systems
>New features and challenges
  –Automatic and independent of users
  –Cost-driven: computation cost, storage cost, data transfer cost
  –Data dependency: task–data, data–data, derivation
>Some research issues
  –Data partition, placement, replication, synchronisation, provenance, catalogue, metadata, consistency, reduction, storage, movement, etc.
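The slide only names the three cost components of a cost-driven system. As a minimal sketch, assuming simple linear pricing, the bill for one workflow run can be modelled as their sum. The unit prices and the names `Usage` and `total_cost` are hypothetical placeholders, not any cloud provider's actual rates or API.

```python
from dataclasses import dataclass

# Hypothetical unit prices (assumptions for illustration only).
PRICE_CPU_HOUR = 0.10      # $ per CPU-hour of computation
PRICE_GB_MONTH = 0.15      # $ per GB-month of storage
PRICE_GB_TRANSFER = 0.10   # $ per GB moved between data centres


@dataclass
class Usage:
    cpu_hours: float
    stored_gb_months: float
    transferred_gb: float


def total_cost(u: Usage) -> float:
    """Total cost = computation cost + storage cost + data transfer cost."""
    computation = u.cpu_hours * PRICE_CPU_HOUR
    storage = u.stored_gb_months * PRICE_GB_MONTH
    transfer = u.transferred_gb * PRICE_GB_TRANSFER
    return computation + storage + transfer
```

For example, `total_cost(Usage(100, 50, 20))` adds 100 CPU-hours, 50 GB-months and 20 GB of transfer at the assumed prices. Keeping the three terms separate makes the trade-offs in the following slides visible: storing more data raises the storage term but can lower the computation and transfer terms.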
9
Data Placement in Cloud Workflow Systems
>Data placement: deciding where to store application data among the distributed data centres
>Aims:
  –Reduce data movement
  –Reduce task waiting time
>Strategy:
  –Based on data dependency: dataset–dataset
  –Build-time: place existing data; runtime: place generated data (including intermediate data)
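The slide does not spell out the placement algorithm. The sketch below only illustrates the general idea of dependency-driven placement under stated assumptions: the dependency between two datasets is taken to be the number of tasks that read both, and datasets are placed greedily next to the datasets they depend on most, with a hypothetical per-centre capacity limit. The functions `dependency` and `place` are illustrative names, and this greedy heuristic is not presented as the author's strategy.

```python
from itertools import combinations


def dependency(tasks):
    """dep[(a, b)] = number of tasks that read both datasets a and b (a < b)."""
    dep = {}
    for inputs in tasks.values():
        for a, b in combinations(sorted(inputs), 2):
            dep[(a, b)] = dep.get((a, b), 0) + 1
    return dep


def place(datasets, tasks, centres, capacity):
    """Greedily assign each dataset to the data centre holding the datasets
    it depends on most, with at most `capacity` datasets per centre
    (capacity model is an assumption for illustration)."""
    dep = dependency(tasks)

    def dep_between(a, b):
        return dep.get((min(a, b), max(a, b)), 0)

    # Place the most strongly connected datasets first.
    strength = {d: sum(v for pair, v in dep.items() if d in pair) for d in datasets}
    placement, load = {}, {c: 0 for c in centres}
    for d in sorted(datasets, key=lambda x: -strength[x]):
        free = [c for c in centres if load[c] < capacity]
        if not free:
            raise ValueError("not enough data centre capacity")
        # Score = total dependency with datasets already placed in centre c.
        best = max(free, key=lambda c: sum(
            dep_between(d, e) for e, ce in placement.items() if ce == c))
        placement[d] = best
        load[best] += 1
    return placement
```

With tasks `t1` and `t2` both reading `d1` and `d2`, and `t3` reading `d3` and `d4`, two centres of capacity 2 end up holding `{d1, d2}` and `{d3, d4}`, so no task needs data from a second data centre.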
10
Data Replication in Cloud Workflow Systems
>Data replication: storing several copies of one dataset in different places (data centres)
>Aims:
  –Increase data security
  –Provide fast data access
  –Reduce data movement
>Strategy:
  –Dynamic replication
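The slide names dynamic replication without giving a policy. A minimal sketch, assuming a simple access-count threshold: a data centre that keeps reading a dataset remotely receives its own replica after `threshold` remote reads. The class `ReplicaManager` and the threshold rule are hypothetical choices for illustration only.

```python
from collections import defaultdict


class ReplicaManager:
    """Toy dynamic replication: replicate a dataset to a data centre once
    that centre has read it remotely `threshold` times (assumed policy)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.replicas = defaultdict(set)        # dataset -> data centres holding a copy
        self.remote_reads = defaultdict(int)    # (dataset, centre) -> remote read count
        self.transfers = 0                      # total remote reads (data movements)

    def store(self, dataset, centre):
        self.replicas[dataset].add(centre)

    def read(self, dataset, centre):
        """Return True if served from a local replica, False if fetched remotely."""
        if dataset not in self.replicas:
            raise KeyError(dataset)
        if centre in self.replicas[dataset]:
            return True
        self.transfers += 1
        self.remote_reads[(dataset, centre)] += 1
        if self.remote_reads[(dataset, centre)] >= self.threshold:
            self.replicas[dataset].add(centre)  # create a new local replica
        return False
```

With `threshold=2`, the first two reads of a dataset from a remote centre are transfers; the second one also creates a replica, so the third read is served locally. A real policy would also weigh the storage cost of each extra copy against the transfers it saves.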
11
Intermediate Data Storage in Cloud Workflow Systems
>Intermediate data storage is especially important in scientific workflows
>Aim:
  –Reduce system cost
>Strategy:
  –Intermediate data can be regenerated using data provenance information
  –Selectively store some key intermediate datasets
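To illustrate "selectively store some key intermediate datasets", here is a minimal greedy sketch under stated assumptions: the intermediate datasets form a linear chain, each derived from the previous one, and provenance lets a deleted dataset be regenerated from its nearest stored ancestor. A dataset is kept if its daily storage cost is lower than its regeneration cost times its daily usage. The `Dataset` fields and `choose_stored` are hypothetical names, and this heuristic is not claimed to be the author's strategy.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    compute_cost: float   # $ to derive it from its direct predecessor (assumed)
    storage_cost: float   # $ per day to keep it stored (assumed)
    uses_per_day: float   # expected accesses per day (assumed)


def choose_stored(chain):
    """chain: datasets in generation order; chain[i] is derived from chain[i-1],
    and the original input before chain[0] is assumed always stored.
    Greedy rule: store a dataset if storing it costs less per day than
    regenerating it on every use."""
    stored = []
    regen_since_last_stored = 0.0
    for d in chain:
        # Regenerating d means recomputing every deleted dataset back to the
        # nearest stored ancestor (found via provenance), plus d itself.
        regen_cost = regen_since_last_stored + d.compute_cost
        if d.storage_cost < regen_cost * d.uses_per_day:
            stored.append(d.name)
            regen_since_last_stored = 0.0
        else:
            regen_since_last_stored = regen_cost
    return stored
```

For a chain `a`, `b`, `c` where `a` and `b` are cheap to recompute and rarely used but `c` is expensive and used often, only `c` is stored; regenerating `c` would otherwise require recomputing `a` and `b` as well, which is why deleted predecessors accumulate into its regeneration cost.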
12
>Cloud computing environment and SwinDeW-C
13
Simulation Cloud
14
Web Portal
15
Related key system components of SwinDeW-C
16
End
>Questions?