1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

Slides:

Advertisements

Similar presentations

Ch. 12 Routing in Switched Networks

Advertisements

Exploiting Deadline Flexibility in Grid Workflow Rescheduling Wei Chen Alan Fekete Young Choon Lee.

Scheduling in Distributed Systems Gurmeet Singh CS 599 Lecture.

LIBRA: Lightweight Data Skew Mitigation in MapReduce

Resource Management §A resource can be a logical, such as a shared file, or physical, such as a CPU (a node of the distributed system). One of the functions.

Using Parallel Genetic Algorithm in a Predictive Job Scheduling

Solutions for Scheduling Assays. Why do we use laboratory automation? Improve quality control (QC) Free resources Reduce sa fety risks Automatic data.

Esma Yildirim Department of Computer Engineering Fatih University Istanbul, Turkey DATACLOUD 2013.

UNIVERSITY OF JYVÄSKYLÄ Building NeuroSearch – Intelligent Evolutionary Search Algorithm For Peer-to-Peer Environment Master’s Thesis by Joni Töyrylä

Generated Waypoint Efficiency: The efficiency considered here is defined as follows: As can be seen from the graph, for the obstruction radius values (200,

Presenter: David Fleeman { D. Juedes, F. Drews, L. Welch and D. Fleeman Center for Intelligent, Distributed & Dependable.

Progress Report Wireless Routing By Edward Mulimba.

GridFlow: Workflow Management for Grid Computing Kavita Shinde.

Network Coding for Large Scale Content Distribution Christos Gkantsidis Georgia Institute of Technology Pablo Rodriguez Microsoft Research IEEE INFOCOM.

Parallel Simulation etc Roger Curry Presentation on Load Balancing.

1 A Tree Based Router Search Engine Architecture With Single Port Memories Author: Baboescu, F.Baboescu, F. Tullsen, D.M. Rosu, G. Singh, S. Tullsen, D.M.Rosu,

UNIVERSITY OF JYVÄSKYLÄ Topology Management in Unstructured P2P Networks Using Neural Networks Presentation for IEEE Congress on Evolutionary Computing.

Chapter 10 Introduction to Wide Area Networks Data Communications and Computer Networks: A Business User’s Approach.

Present by Chen, Ting-Wei Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids Maria Chtepen, Filip H.A. Claeys, Bart Dhoedt,

1 Chapter 10 Introduction to Metropolitan Area Networks and Wide Area Networks Data Communications and Computer Networks: A Business User’s Approach.

Cmpt-225 Simulation. Application: Simulation Simulation  A technique for modeling the behavior of both natural and human-made systems  Goal Generate.

1 Algorithms for Bandwidth Efficient Multicast Routing in Multi-channel Multi-radio Wireless Mesh Networks Hoang Lan Nguyen and Uyen Trang Nguyen Presenter:

MATE: MPLS Adaptive Traffic Engineering Anwar Elwalid, et. al. IEEE INFOCOM 2001.

1 Reasons for parallelization Can we make GA faster? One of the most promising choices is to use parallel implementations. The reasons for parallelization.

Distributed Quality-of-Service Routing of Best Constrained Shortest Paths. Abdelhamid MELLOUK, Said HOCEINI, Farid BAGUENINE, Mustapha CHEURFA Computers.

A Budget Constrained Scheduling of Workflow Applications on Utility Grids using Genetic Algorithms Jia Yu and Rajkumar Buyya Grid Computing and Distributed.

Predicting performance of applications and infrastructures Tania Lorido 27th May 2011.

Computer Science Informed Content Delivery Across Adaptive Overlay Networks Overlay networks have emerged as a powerful and highly flexible method for.

Tiziana FerrariNetwork metrics usage for optimization of the Grid1 DataGrid Project Work Package 7 Written by Tiziana Ferrari Presented by Richard Hughes-Jones.

2005/10/211 A Survey on Physical Network Topology Estimation October 21, 2005 Chikayama-Taura Lab. Tatsuya Shirai.

ROBUST RESOURCE ALLOCATION OF DAGS IN A HETEROGENEOUS MULTI-CORE SYSTEM Luis Diego Briceño, Jay Smith, H. J. Siegel, Anthony A. Maciejewski, Paul Maxwell,

DLS on Star (Single-level tree) Networks Background: A simple network model for DLS is the star network with a master-worker platform. It consists of a.

Multicast Routing Algorithms n Multicast routing n Flooding and Spanning Tree n Forward Shortest Path algorithm n Reversed Path Forwarding (RPF) algorithms.

MapReduce M/R slides adapted from those of Jeff Dean’s.

A Survey of Distributed Task Schedulers Kei Takahashi (M1)

Network and Communications Ju Wang Chapter 5 Routing Algorithm Adopted from Choi’s notes Virginia Commonwealth University.

Chi-Cheng Lin, Winona State University CS 313 Introduction to Computer Networking & Telecommunication Chapter 5 Network Layer.

Stochastic DAG Scheduling using Monte Carlo Approach Heterogeneous Computing Workshop (at IPDPS) 2012 Extended version: Elsevier JPDC (accepted July 2013,

Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.

1 A distributed Task Scheduler Optimizing Data Transfer Time Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

The Network Layer.

Load-Balancing Routing in Multichannel Hybrid Wireless Networks With Single Network Interface So, J.; Vaidya, N. H.; Vehicular Technology, IEEE Transactions.

Hadoop System simulation with Mumak Fei Dong, Tianyu Feng, Hong Zhang Dec 8, 2010.

Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.

1 An Adaptive File Distribution Algorithm for Wide Area Network Takashi Hoshino, Kenjiro Taura, Takashi Chikayama University of Tokyo.

Intermediate Presentation(05/04/15) Autonomous Failure Detection for Supporting Fault Tolerant Parallel Computation 05/04/15 Taura Lab. Master 2nd

A Utility-based Approach to Scheduling Multimedia Streams in P2P Systems Fang Chen Computer Science Dept. University of California, Riverside

A Method for Distributed Computation of Semi-Optimal Multicast Tree in MANET Eiichi Takashima, Yoshihiro Murata, Naoki Shibata*, Keiichi Yasumoto, and.

Adaptive and Robust Broadcast Algorithm Takeshi Sekiya Chikayama-Taura Lab. 2007/4/13.

4 Introduction Broadcasting Tree and Coloring System Model and Problem Definition Broadcast Scheduling Simulation 6 Conclusion and Future Work.

An Adaptive Collective Communication Suppressing Contention Taura Lab. M2 Shota Yoshitomi.

A Hyper-heuristic for scheduling independent jobs in Computational Grids Author: Juan Antonio Gonzalez Sanchez Coauthors: Maria Serna and Fatos Xhafa.

Deadline-based Resource Management for Information- Centric Networks Somaya Arianfar, Pasi Sarolahti, Jörg Ott Aalto University, Department of Communications.

By Jeff Dean & Sanjay Ghemawat Google Inc. OSDI 2004 Presented by : Mohit Deopujari.

Static Process Scheduling

Author Utility-Based Scheduling for Bulk Data Transfers between Distributed Computing Facilities Xin Wang, Wei Tang, Raj Kettimuthu,

Data Consolidation: A Task Scheduling and Data Migration Technique for Grid Networks Author: P. Kokkinos, K. Christodoulopoulos, A. Kretsis, and E. Varvarigos.

Fair and Efficient multihop Scheduling Algorithm for IEEE BWA Systems Daehyon Kim and Aura Ganz International Conference on Broadband Networks 2005.

1 Low Latency Multimedia Broadcast in Multi-Rate Wireless Meshes Chun Tung Chou, Archan Misra Proc. 1st IEEE Workshop on Wireless Mesh Networks (WIMESH),

A Stable Broadcast Algorithm Kei Takahashi Hideo Saito Takeshi Shibata Kenjiro Taura (The University of Tokyo, Japan) 1 CCGrid Lyon, France.

1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.

Towards High Performance Processing of Streaming Data May Supun Kamburugamuve, Saliya Ekanayake, Milinda Pathirage and Geoffrey C. Fox Indiana.

Interaction and Animation on Geolocalization Based Network Topology by Engin Arslan.

2018/4/23 Dynamic Load-balanced Path Optimization in SDN-based Data Center Networks Author: Yuan-Liang Lan , Kuochen Wang and Yi-Huai Hsu Presenter: Yi-Hsien.

A Study of Group-Tree Matching in Large Scale Group Communications

Wireless Sensor Network Architectures

湖南大学-信息科学与工程学院-计算机与科学系

ECE 544 Protocol Design Project 2016

CIS 488/588 Bruce R. Maxim UM-Dearborn

MapReduce: Simplified Data Processing on Large Clusters

Presentation transcript:

1 A distributed Task Scheduler Optimizing Data Transfer Time (データ転送時間を最適化する分散タスクスケジュラー) Taura lab. Kei Takahashi (56428) Taura lab. Kei Takahashi (56428)

2 Task Schedulers A system which distributes many serial tasks onto available nodes ► Task assignments ► Data transfers Users can easily execute many serial tasks in parallel on distributed environment A system which distributes many serial tasks onto available nodes ► Task assignments ► Data transfers Users can easily execute many serial tasks in parallel on distributed environment A A Scheduler File B B C C D D Task

3 Data Intensive Applications A computation using a large amount of data ► Natural language processing, data mining, etc. Data transfer time takes considerable part of the total processing time ► Example: parsing a set of data collected by crawlers located in some hosts Typical network bandwidth: 1Gbps~50Mbps Throughput of Chasen parser: 1.7MB/s = 14Mbps In the worst case, the transfer takes 20% of the total processing time A computation using a large amount of data ► Natural language processing, data mining, etc. Data transfer time takes considerable part of the total processing time ► Example: parsing a set of data collected by crawlers located in some hosts Typical network bandwidth: 1Gbps~50Mbps Throughput of Chasen parser: 1.7MB/s = 14Mbps In the worst case, the transfer takes 20% of the total processing time computation Transfer Time

4 Considering Transfer Cost GrADS [1] considers transfer time when scheduling tasks ► Measure bandwidth between any two hosts ► Estimate transfer time by using the bandwidth ► Assign task based on the time Since they only use static bandwidth values, their prediction can be far from the real network behavior when multiple data transfer share a link GrADS [1] considers transfer time when scheduling tasks ► Measure bandwidth between any two hosts ► Estimate transfer time by using the bandwidth ► Assign task based on the time Since they only use static bandwidth values, their prediction can be far from the real network behavior when multiple data transfer share a link [1] Mandal. et al. "Scheduling Strategies for Mapping Application Workflows onto the Grid“ in IEEEInternational Symposium on High Performance Distributed Computing (HPDC 2005)

5 Using Replicas Machida et. al [1] have developed a scheduler which utilizes multiple replicas ► When data are copied from the source to another node, the copied node is registered as a replica ► When another node requires the same data, the data is copied from the nearest replicas In this work, a task schedule itself is not optimized. Also, it does not care about the effect caused by of link sharing of multiple transfers Machida et. al [1] have developed a scheduler which utilizes multiple replicas ► When data are copied from the source to another node, the copied node is registered as a replica ► When another node requires the same data, the data is copied from the nearest replicas In this work, a task schedule itself is not optimized. Also, it does not care about the effect caused by of link sharing of multiple transfers [1] 町田悠哉滝澤真一朗中田秀基松岡聡 `` レプリカ管理システムを利用したデータインテンシブアプリケーション向けスケジューリングシステム '‘ (HPCS2006) File

6 Effects of Link Sharing If several transfers share a link, the sum of their throughputs cannot exceed the link bandwidth ► throughput(File1) + throughput (File2) <= (Link bandwidth) Bandwidth between two hosts varies when multiple data are transferred ► The value is reduced in 25% if 4 transfers share a link If several transfers share a link, the sum of their throughputs cannot exceed the link bandwidth ► throughput(File1) + throughput (File2) <= (Link bandwidth) Bandwidth between two hosts varies when multiple data are transferred ► The value is reduced in 25% if 4 transfers share a link File1 File 2 File1 50Mbps 100Mbps50Mbps

7 Considering Topology The throughput is limited by the narrowest link in the transfer The throughput may become larger by altering source of transfers or changing task assignment The throughput is limited by the narrowest link in the transfer The throughput may become larger by altering source of transfers or changing task assignment Throughputs of the two transfers are limited on this link File 2 File1 Task1 Task2 File1 can be transferred from the other source

8 Research Purpose Design and implement an efficient distributed task scheduler for data intensive applications ► Minimize data transfer time by using network topology and bandwidth information Create a schedule which needs less transfers Plan multicasts for data transfers Maximize throughput by using linear programming Design and implement an efficient distributed task scheduler for data intensive applications ► Minimize data transfer time by using network topology and bandwidth information Create a schedule which needs less transfers Plan multicasts for data transfers Maximize throughput by using linear programming

9 Agenda Background Purpose Our Approach ► Task Scheduling Algorithm ► Transfer Planning Algorithm Conclusion Background Purpose Our Approach ► Task Scheduling Algorithm ► Transfer Planning Algorithm Conclusion

10 Input and Output Input: ► Data locations ► Network topology and bandwidth ► Task information (required data) Output: ► Task Scheduling: Assignment of nodes to tasks ► Transfer Scheduling: From/to which host data are transferred Limit bandwidth during the transfer if needed Final goal: Minimizing the total completion time Input: ► Data locations ► Network topology and bandwidth ► Task information (required data) Output: ► Task Scheduling: Assignment of nodes to tasks ► Transfer Scheduling: From/to which host data are transferred Limit bandwidth during the transfer if needed Final goal: Minimizing the total completion time

11 Immediate Goal Final goal: Minimizing the total completion time Our idea is to minimize data transfer time Immediate goal: Maximizing the total of data arrival throughput on each node Final goal: Minimizing the total completion time Our idea is to minimize data transfer time Immediate goal: Maximizing the total of data arrival throughput on each node (file i, j : the j th file required by task i ) Maximize the total of these throughputs

12 The Whole Algorithm Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks (A) Task Scheduling Algorithm (B) Transfer Planning Algorithm

13 Task Scheduling Algorithm Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks (A) Task Scheduling Algorithm

14 Task Scheduling Algorithm When some nodes are unscheduled, ► Create candidate task schedules ► Plan efficient file transfers From which node file is transfered Bandwidth of each file transfer ► Search for the best schedule which maximizes data transfer throughput (by using heuristics like GA, SA) Decide the task schedule, and start file transfers and task executions When some nodes are unscheduled, ► Create candidate task schedules ► Plan efficient file transfers From which node file is transfered Bandwidth of each file transfer ► Search for the best schedule which maximizes data transfer throughput (by using heuristics like GA, SA) Decide the task schedule, and start file transfers and task executions

15 Transfer Planning Algorithm Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks (B) Transfer Planning Algorithm

16 Transfer Planning Algorithm When source nodes and destination nodes for each file is given: ► Decide source node for each destinations by using Kruskal's Algorithm If multiple destinations uses one source, the data are multicasted ► Calculate bandwidth value for each transfer to maximize throughput by using linear programming ► If the originally chosen source is not optimal, modify the multicast tree by using a new bandwidth topology When source nodes and destination nodes for each file is given: ► Decide source node for each destinations by using Kruskal's Algorithm If multiple destinations uses one source, the data are multicasted ► Calculate bandwidth value for each transfer to maximize throughput by using linear programming ► If the originally chosen source is not optimal, modify the multicast tree by using a new bandwidth topology

17 Planing Multicast Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks (B) Transfer Planning Algorithm

18 Pipeline Multicast (1) For a given schedule, it is known which nodes require which files When multiple nodes need a common file, pipeline multicast shortens transfer time (in the case of large files) The speed of a pipeline broadcast is limited by the narrowest link in the tree A broadcast can be sped up by efficiently using multiple sources For a given schedule, it is known which nodes require which files When multiple nodes need a common file, pipeline multicast shortens transfer time (in the case of large files) The speed of a pipeline broadcast is limited by the narrowest link in the tree A broadcast can be sped up by efficiently using multiple sources

19 Pipeline Multicast (2) The tree is constructed in depth-first manner Every related link is only used twice (upward/downward) Since disk access is as slow as network, the disk access bandwidth should be also counted The tree is constructed in depth-first manner Every related link is only used twice (upward/downward) Since disk access is as slow as network, the disk access bandwidth should be also counted SourceDestination

20 Multi-source Multicast M nodes have the same source data; N nodes need it For each link in the order of bandwidth: ► If the link connects two nodes/switches which are already connected to the source node: → Discard the link ► Otherwise: → Adopt the link (Kruskal's Algorithm: it maximizes the narrowest link in the pipelines) M nodes have the same source data; N nodes need it For each link in the order of bandwidth: ► If the link connects two nodes/switches which are already connected to the source node: → Discard the link ► Otherwise: → Adopt the link (Kruskal's Algorithm: it maximizes the narrowest link in the pipelines) Do not use this link Pipeline 1 Pipeline 2 Source Destination

21 Maximizing Throughput Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks (B) Transfer Planning Algorithm

22 Maximizing Throughput After constructing multicast trees for every file, decide the bandwidth each transfer uses ► By using linear programming Maximize: (bw0 + bw1 + bw2 * 3 + bw3 * 2 + bw4) Conditions:bw0 + bw1 ≤ (const) bw1 + bw3 ≤ (const) bw0 + bw2 + bw3 ≤ (const) For local data, use disk access cost as the bandwidth After constructing multicast trees for every file, decide the bandwidth each transfer uses ► By using linear programming Maximize: (bw0 + bw1 + bw2 * 3 + bw3 * 2 + bw4) Conditions:bw0 + bw1 ≤ (const) bw1 + bw3 ≤ (const) bw0 + bw2 + bw3 ≤ (const) For local data, use disk access cost as the bandwidth 30Mbps 60Mbps 50Mbps 100Mbps bw0 bw1 bw2 bw3 bw4

23 Re-planning Multicast Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks (B) Transfer Planning Algorithm

24 Re-planning Multicast Now every transfer schedule is decided, but it may not be optimal. Since multicast trees are planned independently, unnecessary conflictions may occur. By re-planning multicast by using current bandwidth information, the multicast trees are optimized ► When re-optimizing a transfer, first create an available bandwidth map and construct multicast trees Now every transfer schedule is decided, but it may not be optimal. Since multicast trees are planned independently, unnecessary conflictions may occur. By re-planning multicast by using current bandwidth information, the multicast trees are optimized ► When re-optimizing a transfer, first create an available bandwidth map and construct multicast trees Old route Pink links: available bandwidth New route

25 Improve Task Schedule Create Initial Task Schedule Plan Efficient Transfers Improve the Schedule Plan Multicast Optimize Throughput Re-plan Multicast Assign tasks to nodes Some nodes are unscheduled Transfer Files Execute Tasks

26 Improve Task Schedule After efficient data transfer plan has obtained, the scheduler tries to reduce the transfer size by altering the task schedule We are thinking of using GA or Simulated Annealing. Since the most crowded link has found, we can try to reduce transfers on this link in the mutation phase. After efficient data transfer plan has obtained, the scheduler tries to reduce the transfer size by altering the task schedule We are thinking of using GA or Simulated Annealing. Since the most crowded link has found, we can try to reduce transfers on this link in the mutation phase.

27 Actual Transfers After the transfer schedule is determined, the plan is performed as simulated The bandwidth of each transfer is limited to the previously calculated value When detecting a significant change in bandwidth, the schedule is reconstructed ► The bandwidth is measured by using existing methods (eg. Nettimer [1] ) After the transfer schedule is determined, the plan is performed as simulated The bandwidth of each transfer is limited to the previously calculated value When detecting a significant change in bandwidth, the schedule is reconstructed ► The bandwidth is measured by using existing methods (eg. Nettimer [1] ) [1] Kevin Lai et al. ``Measuring Link Bandwidths Using a Deterministic Model of Packet Delay'' SIGCOMM '00, Stockholm, Sweden.

28 Re-scheduling Transfers When one of the following events occurs, bandwidth assignments are recalculated ► A transfer has finished ► Bandwitdth has changed ► New tasks are scheduled When one of the following events occurs, bandwidth assignments are recalculated ► A transfer has finished ► Bandwitdth has changed ► New tasks are scheduled 30Mbps 60Mbps 50Mbps 100Mbps bw0 bw1 bw2 bw3 bw4 90Mbps 40Mbps

29 Current Situation The algorithm has determined The implementation is ongoing ► Plan data transfers when topology and a task schedule is given ► Create a schedule with heuristics ► Perform the real file transfer and task execution Evaluation will be done by comparing to existing schedulers The algorithm has determined The implementation is ongoing ► Plan data transfers when topology and a task schedule is given ► Create a schedule with heuristics ► Perform the real file transfer and task execution Evaluation will be done by comparing to existing schedulers

30 Conclusion Introduced a new scheduling algorithm ► Predict transfer time by using network topology, and search for a better task schedule ► Plan an efficient multicast ► Maximize throughput by linear programming and by limiting bandwidth ► Dynamically re-scheduling transfers Introduced a new scheduling algorithm ► Predict transfer time by using network topology, and search for a better task schedule ► Plan an efficient multicast ► Maximize throughput by linear programming and by limiting bandwidth ► Dynamically re-scheduling transfers

31 Publications 高橋慧, 田浦健次朗, 近山隆. マイグレーションを支援する分散集合オブジェクト．並列／分散／協調処理に関するサマーワークショップ ( ポスター発表 ) （ SWoPP2005 ），武雄， 2005 年 8 月．高橋慧, 田浦健次朗, 近山隆. マイグレーションを支援する分散集合オブジェクト. 先進的計算基盤シンポジウム（ SACSIS 2005 ），筑波， 2005 年 5 月．高橋慧, 田浦健次朗, 近山隆. マイグレーションを支援する分散集合オブジェクト．並列／分散／協調処理に関するサマーワークショップ ( ポスター発表 ) （ SWoPP2005 ），武雄， 2005 年 8 月．高橋慧, 田浦健次朗, 近山隆. マイグレーションを支援する分散集合オブジェクト. 先進的計算基盤シンポジウム（ SACSIS 2005 ），筑波， 2005 年 5 月．