Published by Bruno Reed. Modified over 9 years ago.
1 Effective Use of Networked Reconfigurable Resources
Alexandru V Staicu (1), Jacek R. Radzikowski (1), Kris Gaj (1), Nikitas Alexandridis (2), Tarek El-Ghazawi (2)
(1) George Mason University
(2) George Washington University
http://ece.gmu.edu/lucite
2 Problem:
Reconfigurable resources are expensive and underutilized
Many of these resources are available over the network
It is desirable to leverage networked reconfigurable resources to help other users within the same organization
3 Approach: Adapt and use a Job Management System
[Figure: a Submission Host, a Master Host, and Execution Hosts 1, 2, and 3. Tasks 1, 2, 3 are submitted together and distributed individually across the execution hosts.]
4 Approach:
Select the most suitable existing Job Management System (JMS)
- identify and define functional requirements
- rank known systems according to these requirements
- identify which JMS is the easiest to extend
Extend this JMS to recognize and utilize reconfigurable resources
- add new dynamic resources
- configure scheduling to be based on these new resources
5 Networked Reconfigurable Resource Management System
[Figure: a Submission Host, a Master Host, and Execution Hosts 1, 2, and 3, with FPGA boards. Tasks 1, 2, 3 are submitted together and distributed individually across the execution hosts.]
6 SLAAC Research Reference Platform
Heterogeneous network with FPGA-based accelerators
[Figure: Dell, HP, Gateway, Sparc 10, and Sparc 20 hosts, several of them carrying WILDFORCE, WILDSTAR, or SLAAC FPGA boards, connected through a Myrinet SAN/LAN switch and 100 Mbps Ethernet intelligent hubs.]
7 Functional units of a typical Job Management System
User Server
- receives jobs and their requirements
Job Scheduler
- matches resource requirements against available resources according to scheduling policies
Resource Manager
- Resource Monitor: reports available resources
- Job Dispatcher: performs resource allocation and job execution
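The interplay of these units can be illustrated with a minimal Python sketch. It is not taken from any of the systems surveyed here: the Host and Job classes, the resource names (cpus, fpga_boards), and the first-fit policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """An execution host as seen by a (hypothetical) resource monitor."""
    name: str
    resources: dict  # available amount per resource, e.g. {"cpus": 1}


@dataclass
class Job:
    """A submitted job together with its resource requirements."""
    name: str
    requirements: dict  # minimum amount needed per resource


def satisfies(host, job):
    # A host qualifies if it has at least the required amount of
    # every resource the job asks for (missing resources count as 0).
    return all(host.resources.get(res, 0) >= amount
               for res, amount in job.requirements.items())


def schedule(jobs, hosts):
    """First-fit policy (an assumption): place each job on the first
    host that satisfies it and reserve the resources it consumes."""
    placement = {}
    for job in jobs:
        placement[job.name] = None  # stays None if no host qualifies
        for host in hosts:
            if satisfies(host, job):
                for res, amount in job.requirements.items():
                    host.resources[res] -= amount
                placement[job.name] = host.name
                break
    return placement


hosts = [Host("host1", {"cpus": 1}),
         Host("host2", {"cpus": 1, "fpga_boards": 1})]
jobs = [Job("task1", {"cpus": 1}),
        Job("task2", {"cpus": 1, "fpga_boards": 1}),
        Job("task3", {"cpus": 1})]
print(schedule(jobs, hosts))
# {'task1': 'host1', 'task2': 'host2', 'task3': None}
```

In this toy run task3 is left unplaced because both hosts are already fully reserved; a real JMS would typically keep such a job in a queue until resources are released.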
8 Classification of Investigated Systems (1)
Centralized JMS: LSF, CODINE, PBS, Condor, RES
Distributed JMS w/o a Central Scheduler: Globus, Legion, NetSolve
Distributed Operating System: MOSIX
9 Classification of Investigated Systems (2)
Parameter Study Scheduler: AppLES
Resource Monitor and Forecaster: NWS
Distributed Computing Interface: Compaq DCE
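10 Operating system, flexibility, user interface
Systems compared: LSF, Codine, PBS, CONDOR, RES
Criteria:
- Distribution (surviving entries: com, pub, pub/com, pub, gov)
- Source code
- OS Support: Solaris, Linux, Tru64, NT
- User Interface (surviving entries: GUI & CLI, four times)
[Table: the remaining per-system cell values were not preserved.]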
11 Scheduling and Resource Management
Systems compared: LSF, Codine, PBS, CONDOR, RES
Criteria:
- Batch jobs
- Interactive jobs
- Parallel jobs
- Accounting
[Table: per-system cell values were not preserved.]
12 Efficiency and Utilization
Systems compared: LSF, Codine, PBS, CONDOR, RES
Criteria:
- Stage-in and stage-out
- Timesharing
- Process migration
- Dynamic load balancing
- Scalability
[Table: per-system cell values were not preserved.]
13 Fault Tolerance and Security
Systems compared: LSF, Codine, PBS, CONDOR, RES
Criteria:
- Checkpointing
- Daemon fault recovery
- Authentication
- Authorization
[Table: per-system cell values were not preserved.]
14 Documentation and Technical Support
Systems compared: LSF, Codine, PBS, CONDOR, RES
Criteria:
- Documentation
- Technical support
[Table: per-system cell values were not preserved.]
15 JMS features supporting extension to reconfigurable hardware
capability to define new dynamic resources
strong support for stage-in and stage-out
- configuration bitstreams
- executable code
- input/output data
support for Windows NT and Linux
16 Ranking of Centralized Job Management Systems (1)
Capability to define new dynamic resources:
- Excellent: LSF, PBS, CODINE
- More difficult: CONDOR, RES
Stage-in and stage-out:
- Excellent: LSF, PBS
- Limited: CONDOR
- No: CODINE, RES
17 Ranking of Centralized Job Management Systems (2)
Overall suitability to extend to reconfigurable hardware:
Without changing the JMS source code:
1. LSF
2. CODINE
3. PBS
Requires changes to the JMS source code:
4. CONDOR
5. RES
18 Extension of LSF to reconfigurable hardware (1)
Operation of LSF
[Figure: a Submission host (bsub, app, LIM, Batch API), a Master host (MLIM, MBD, queue), and an Execution host (SBD, Child SBD, LIM, RES, User job). Numbered arrows 1 to 13 trace the handling of a job, and load information is exchanged with other hosts.]
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server
19 Extension of LSF to reconfigurable hardware (2)
[Figure: the same Submission host (bsub, app, LIM, Batch API), Master host (MLIM, MBD, queue), and Execution host (SBD, Child SBD, LIM, RES, User job) with numbered arrows 1 to 13 and load information exchanged with other hosts. An ELIM is added on the execution host; it obtains the status of the FPGA board through the ACS API (arrow 14).]
ELIM – External Load Information Manager
ACS API – Adaptive Computing Systems API
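A minimal Python sketch of such an ELIM follows. It relies on the LSF convention, as we understand it, that an ELIM periodically writes one line to standard output containing the number of reported indices followed by name/value pairs. Everything else is an assumption: the resource name fpga_free, the reporting interval, and query_board_status(), which is a hypothetical stand-in for the real ACS API call that the slides do not show.

```python
import sys
import time


def query_board_status():
    """Hypothetical placeholder for an ACS API query.

    Returns 1 if the FPGA board is free and 0 if it is busy.
    A real implementation would ask the board through the ACS API.
    """
    return 1


def format_elim_report(indices):
    """Build one report line: index count, then name/value pairs."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.extend([name, str(value)])
    return " ".join(parts)


def run_elim(interval_s=15.0, max_reports=None, out=sys.stdout):
    """Emit board status repeatedly; max_reports=None runs forever."""
    sent = 0
    while max_reports is None or sent < max_reports:
        line = format_elim_report({"fpga_free": query_board_status()})
        out.write(line + "\n")
        out.flush()  # the reader must see each report immediately
        sent += 1
        if max_reports is None or sent < max_reports:
            time.sleep(interval_s)
```

Calling run_elim(max_reports=1) writes the single line "1 fpga_free 1". For the scheduler to use such an index, the resource would also have to be declared as a dynamic resource in the LSF configuration; the exact configuration entries are not covered by the slides.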
20 Conclusions (1)
12 systems were evaluated using 25 functional requirements plus the suitability of extension to support reconfigurable hardware
LSF, CODINE, PBS, and Condor ranked the highest in the functional requirements
LSF, CODINE, and PBSPro were found easy to extend without changes to their source code
LSF is the most suitable to support reconfigurable hardware
21 Conclusions (2)
General software architecture of the extended system has been developed
Experimental development, verification, and performance evaluation of the extended system are in progress