Alexandru V. Staicu (1), Jacek R. Radzikowski (1), Kris Gaj (1), Nikitas Alexandridis (2), Tarek El-Ghazawi (2); (1) George Mason University, (2) George Washington University.

Presentation transcript:

1 Effective Use of Networked Reconfigurable Resources
Alexandru V. Staicu (1), Jacek R. Radzikowski (1), Kris Gaj (1), Nikitas Alexandridis (2), Tarek El-Ghazawi (2)
(1) George Mason University, (2) George Washington University

2 Problem:
- Reconfigurable resources expensive and underutilized
- Many of these resources available over the network
- It is desirable to leverage networked reconfigurable resources to help other users within the same organization

3 Approach: Adapt and use a Job Management System
[Diagram labels: Submission Host (Tasks 1, 2, 3); Master Host; Execution Host 1, Execution Host 2, Execution Host 3; Task 1, Task 2, Task 3 shown individually]

4 Approach:
Select the most suitable existing Job Management System (JMS)
- identify and define functional requirements
- rank known systems according to these requirements
- identify which JMS is the easiest to extend
Extend this JMS to recognize and utilize reconfigurable resources
- add new dynamic resources
- configure scheduling to be based on these new resources

5 Networked Reconfigurable Resource Management System
[Diagram labels: Submission Host (Tasks 1, 2, 3); Master Host; Execution Host 1, Execution Host 2, Execution Host 3; Task 1, Task 2, Task 3 shown individually; FPGA boards]

6 Heterogeneous network with FPGA-based accelerators
[Diagram 1, SLAAC Research Reference Platform. Labels: Myrinet SAN/LAN Switch; Ethernet Intelligent Hub, 100 Mbps; hosts: Dell, Sparc 10; boards: WILDFORCE, WILDSTAR, SLAAC]
[Diagram 2. Labels: Ethernet Intelligent Hub, 100 Mbps; hosts: Dell, HP, Sparc 20, Gateway; boards: SLAAC, WILDSTAR, WILDFORCE]

7 Functional units of a typical Job Management System
[Diagram units: User Server; Job Scheduler; Resource Manager (Resource Monitor, Job Dispatcher)]
[Diagram flow labels: jobs & their requirements; resource requirements; available resources; scheduling policies; resource allocation and job execution]
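
The interplay of these units can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not taken from any of the systems discussed in the slides: the `Host` and `Job` classes, the `fpga_boards` resource name, and the first-fit policy are assumptions chosen only to show a scheduler matching resource requirements against the resources reported by a monitor.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    resources: dict  # availability as a resource monitor might report it


@dataclass
class Job:
    name: str
    requirements: dict  # resources the job asks for, e.g. {"fpga_boards": 1}


def fits(job, host):
    """True if the host currently offers every resource the job requires."""
    return all(host.resources.get(k, 0) >= v for k, v in job.requirements.items())


def schedule(jobs, hosts):
    """First-fit policy: place each job on the first host that fits.

    Returns (placements, pending). Allocated amounts are subtracted from
    the host so that later jobs see the updated availability.
    """
    placements, pending = {}, []
    for job in jobs:
        host = next((h for h in hosts if fits(job, h)), None)
        if host is None:
            pending.append(job.name)  # stays queued until resources free up
            continue
        for k, v in job.requirements.items():
            host.resources[k] = host.resources.get(k, 0) - v
        placements[job.name] = host.name
    return placements, pending
```

A real JMS would refresh the availability figures periodically from its resource monitor and hand the placements to a dispatcher; here both steps are collapsed into one function call.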

8 Classification of Investigated Systems (1)
- Centralized JMS: LSF, CODINE, PBS, Condor, RES
- Distributed JMS w/o a Central Scheduler: Globus, Legion, NetSolve
- Distributed Operating System: MOSIX

9 Classification of Investigated Systems (2)
- Parameter Study Scheduler: AppLES
- Resource Monitor and Forecaster: NWS
- Distributed Computing Interface: Compaq DCE

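10 Operating system, flexibility, user interface
[Table: columns LSF, Codine, PBS, CONDOR, RES; rows Distribution, Source code, OS Support (Solaris, Linux, Tru64, NT), User Interface]
- Distribution: LSF com; Codine pub; PBS pub/com; CONDOR pub; RES gov
- User Interface: GUI & CLI (four entries; per-system assignment not preserved in the transcript)
- Source code and OS Support entries not preserved in the transcript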

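11 Scheduling and Resource Management
[Table: columns LSF, Codine, PBS, CONDOR, RES; rows Batch jobs, Interactive jobs, Parallel jobs, Accounting; cell entries not preserved in the transcript]

12 Efficiency and Utilization
[Table: columns LSF, Codine, PBS, CONDOR, RES; rows Stage-in and stage-out, Timesharing, Process migration, Dynamic load balancing, Scalability; cell entries not preserved in the transcript]

13 Fault Tolerance and Security
[Table: columns LSF, Codine, PBS, CONDOR, RES; rows Checkpointing, Daemon fault recovery, Authentication, Authorization; cell entries not preserved in the transcript]

14 Documentation and Technical Support
[Table: columns LSF, Codine, PBS, CONDOR, RES; rows Documentation, Technical support; cell entries not preserved in the transcript]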

15 JMS features supporting extension to reconfigurable hardware
- capability to define new dynamic resources
- strong support for stage-in and stage-out
  - configuration bitstreams
  - executable code
  - input/output data
- support for Windows NT and Linux
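
Stage-in and stage-out can be pictured with the following minimal Python sketch. It is a single-machine simulation under stated assumptions: the function name `run_with_staging`, its parameters, and the use of a temporary scratch directory are all hypothetical, and a real JMS would move the files between the submission and execution hosts rather than between local directories.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_staging(stage_in, stage_out, dest_dir, run):
    """Copy inputs to a scratch directory, run the job, copy results back.

    stage_in  -- files needed at the execution side
                 (configuration bitstream, executable code, input data)
    stage_out -- names of the result files the job is expected to produce
    dest_dir  -- directory on the submission side that receives the results
    run       -- callable taking the scratch directory; stands in for the job
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch_name:
        scratch = Path(scratch_name)
        for src in stage_in:                       # stage-in
            shutil.copy(src, scratch / Path(src).name)
        run(scratch)                               # job execution
        for name in stage_out:                     # stage-out
            shutil.copy(scratch / name, dest / name)
    return [dest / name for name in stage_out]
```

The scratch directory is deleted once the results have been copied out, which mirrors the idea that the execution host keeps nothing after the job completes.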

16 Ranking of Centralized Job Management Systems (1)
Capability to define new dynamic resources:
- Excellent: LSF, PBS, CODINE
- More difficult: CONDOR, RES
Stage-in and stage-out:
- Excellent: LSF, PBS
- Limited: CONDOR
- No: CODINE, RES

17 Ranking of Centralized Job Management Systems (2)
Overall suitability to extend to reconfigurable hardware:
1. LSF
2. CODINE
3. PBS
(1-3: without changing the JMS source code)
4. CONDOR
5. RES
(4-5: requires changes to the JMS source code)

18 Extension of LSF to reconfigurable hardware (1): Operation of LSF
[Diagram labels: Submission host (bsub app, Batch API, LIM); Master host (MLIM, MBD, queue); Execution host (SBD, Child SBD, LIM, RES, User job); Load information; other hosts]
LIM – Load Information Manager
MLIM – Master LIM
MBD – Master Batch Daemon
SBD – Slave Batch Daemon
RES – Remote Execution Server

19 Extension of LSF to reconfigurable hardware (2)
[Diagram labels: Submission host (bsub app, Batch API, LIM); Master host (MLIM, MBD, queue); Execution host (SBD, Child SBD, LIM, RES, User job, ELIM, ACS API, FPGA board); Status of the board; Load information; other hosts]
ELIM – External Load Information Manager
ACS API – Adaptive Computing Systems API
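
As a rough illustration of the ELIM idea, the Python sketch below periodically reports a board-status index on standard output. It relies on the ELIM reporting convention as documented for LSF, namely one line of the form `<number of indices> <name1> <value1> <name2> <value2> ...` per report; that convention should be checked against the manual for the LSF version in use. The resource name `fpga_free` and the function `query_board_status` are hypothetical: the slides do not show the ACS API calls, so the query is replaced by a stub.

```python
import sys
import time


def query_board_status():
    """Hypothetical stand-in for a board query made through the ACS API.

    Returns a fixed value so the sketch runs without any FPGA hardware.
    """
    return {"fpga_free": 1}


def format_report(indices):
    """Build one report line: '<count> <name1> <value1> <name2> <value2> ...'."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts += [name, str(value)]
    return " ".join(parts)


def main(interval_s=15, cycles=None):
    """Write a report every interval_s seconds; cycles=None means run forever."""
    done = 0
    while cycles is None or done < cycles:
        sys.stdout.write(format_report(query_board_status()) + "\n")
        sys.stdout.flush()  # unflushed output would not reach the reader in time
        time.sleep(interval_s)
        done += 1
```

Deployed as an ELIM, the script would call `main()` without a cycle limit, and the reported resource name would also have to be declared in the LSF configuration so that the scheduler can match it against job requirements.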

20 Conclusions (1)
- 12 systems evaluated using 25 functional requirements + the suitability of extension to support reconfigurable hardware
- LSF, CODINE, PBS, and Condor ranked the highest in the functional requirements
- LSF, CODINE, and PBSPro found easy to extend without changes to their source code
- LSF most suitable to support reconfigurable hardware

21 Conclusions (2)
- General software architecture of the extended system developed
- Experimental developments, verification and performance evaluation of the extended system in progress