XTREEMOS APPLICATION EXECUTION MANAGEMENT: A SCALABLE APPROACH Ramon Nou, Jacobo Giralt, Julita Corbalan, Enric Tejedor, J.Oriol Fitó, Josep M. Perez,

XTREEMOS APPLICATION EXECUTION MANAGEMENT: A SCALABLE APPROACH
Ramon Nou, Jacobo Giralt, Julita Corbalan, Enric Tejedor, J. Oriol Fitó, Josep M. Perez, Toni Cortes
Barcelona Supercomputing Center (BSC – CNS)
XtreemOS is funded by the European Commission through the Information Society Technology under contract IST-FP

Outline
- XtreemOS overview
- Application Execution Manager
- Job execution flow
- Monitoring
- Performance and scalability
  - Job execution
  - Job status
- Future

XtreemOS overview
What is it? A Linux-based operating system that supports Virtual Organizations (VOs) for the Grid. It is organized in several layers.

XtreemOS overview
Some key features:
- Makes the Grid as easy to use as Linux.
- Highly scalable.
- Fault tolerant.
- Able to run interactive jobs.
- Extensible.
- Three node types, each of which can be replicated: Core, Resource and Client.

Application Execution Manager
- Job management, monitoring and resource management.
- Access point to submit and control jobs.
- Distributed and asynchronous.
- Extensible.
- Brings Linux concepts into the Grid world: the process-thread paradigm and signals.

Application Execution Manager
Several distributed services: Job Manager, Execution Manager, Reservation Manager, …
Semantics:
- JobUnit: the set of processes of a job running on one resource.
- Job: a set of JobUnits, identified by a JobID.
[Job and JobUnit follow the Linux process-thread analogy.]
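To make the Job/JobUnit semantics concrete, the following is a minimal sketch, assuming a simple in-memory model. The classes `Job` and `JobUnit`, their fields and the `signal` method are hypothetical illustrations, not the XtreemOS API. Processes on the same resource are grouped into one JobUnit, and a signal sent to the Job reaches every JobUnit, much as a Linux signal sent to a process affects all of its threads.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobUnit:
    """Hypothetical model: the processes of one job running on a single resource."""
    resource: str
    pids: List[int] = field(default_factory=list)
    received_signals: List[int] = field(default_factory=list)

    def deliver(self, signum: int) -> None:
        # Stand-in for forwarding the signal to every local process of this unit.
        self.received_signals.append(signum)


@dataclass
class Job:
    """Hypothetical model: a set of JobUnits identified by a JobID."""
    job_id: str
    units: Dict[str, JobUnit] = field(default_factory=dict)

    def add_process(self, resource: str, pid: int) -> None:
        # Processes on the same resource are grouped into a single JobUnit.
        unit = self.units.setdefault(resource, JobUnit(resource))
        unit.pids.append(pid)

    def process_count(self) -> int:
        return sum(len(u.pids) for u in self.units.values())

    def signal(self, signum: int) -> None:
        # As with a Linux process and its threads, the signal reaches every JobUnit.
        for unit in self.units.values():
            unit.deliver(signum)
```

For example, a job with two processes on `nodeA` and one on `nodeB` has two JobUnits. Calling `job.signal(15)` records the signal in both units.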

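Job execution flow
Participants: the user, JobMng (on an XOSD), ExecMng (on an XOSD), JobDirectory, RSS and the kernel of any XOSD node.
1. JID = createJob(JSDL): the job is created and its JID is returned.
2. runJob(JID).
3. getResources(JSDL).
4. The processes are scheduled and executed.
5. The job is finished when all of its processes have finished.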
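The sequence above can be sketched as a small in-process simulation. It makes several assumptions. The JSDL document is reduced to a dict with hypothetical keys `num_nodes` and `procs_per_node`. Resource discovery simply takes the first N nodes instead of querying the RSS, and JobDirectory is omitted. The method names mirror those on the slide (`createJob`, `runJob`, `getResources`), but the classes are illustrative and are not the real XtreemOS services.

```python
import itertools
from typing import Dict, List


class ExecMng:
    """Hypothetical per-node Execution Manager that runs a job's processes."""

    def __init__(self, node: str):
        self.node = node
        self.running: Dict[str, int] = {}  # JID -> processes still running here

    def execute(self, jid: str, nprocs: int) -> None:
        self.running[jid] = self.running.get(jid, 0) + nprocs

    def finish_one(self, jid: str) -> None:
        # Called when one process of the job exits on this node.
        self.running[jid] -= 1


class JobMng:
    """Hypothetical Job Manager following createJob -> runJob -> getResources."""

    def __init__(self, exec_mngs: List[ExecMng]):
        self.exec_mngs = exec_mngs
        self.jobs: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def createJob(self, jsdl: dict) -> str:
        jid = f"job-{next(self._ids)}"
        self.jobs[jid] = {"jsdl": jsdl, "placed_on": []}
        return jid

    def getResources(self, jsdl: dict) -> List[ExecMng]:
        # Stand-in for the resource query: just take the first N nodes.
        return self.exec_mngs[: jsdl["num_nodes"]]

    def runJob(self, jid: str) -> None:
        jsdl = self.jobs[jid]["jsdl"]
        for em in self.getResources(jsdl):
            em.execute(jid, jsdl["procs_per_node"])
            self.jobs[jid]["placed_on"].append(em)

    def finished(self, jid: str) -> bool:
        # The job is finished only when all of its processes have finished.
        return all(em.running[jid] == 0 for em in self.jobs[jid]["placed_on"])
```

Splitting creation (`createJob`) from execution (`runJob`) means the user holds a JID before anything runs. That JID is the handle used for all later control and monitoring calls.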

Monitoring
- System metrics.
- User-defined metrics.
- Different levels of information.
- Buffering.
- Each service maintains its own monitoring information (scope):
  - ExecMng has information about processes.
  - JobMng has information about jobs.
  - ResMng has information about resources.
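As an illustration of per-service scope and buffering, here is a sketch, assuming each service owns one monitor object for its scope ("process", "job" or "resource"). System and user-defined metrics are treated alike, as named samples. The class `ScopedMonitor`, its buffer size and its methods are hypothetical and are not taken from XtreemOS.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Sample = Tuple[str, str, float]  # (entity id, metric name, value)


class ScopedMonitor:
    """Hypothetical per-service monitor holding information for a single scope."""

    def __init__(self, scope: str, buffer_size: int = 4):
        self.scope = scope  # e.g. "process" for ExecMng, "job" for JobMng
        self.buffer_size = buffer_size
        self.buffer: Deque[Sample] = deque()
        self.store: Dict[Tuple[str, str], List[float]] = {}

    def record(self, entity: str, metric: str, value: float) -> None:
        # Samples are buffered and flushed in batches to reduce per-sample overhead.
        self.buffer.append((entity, metric, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        while self.buffer:
            entity, metric, value = self.buffer.popleft()
            self.store.setdefault((entity, metric), []).append(value)

    def query(self, entity: str, metric: str) -> List[float]:
        self.flush()  # make any buffered samples visible before answering
        return self.store.get((entity, metric), [])
```

For example, an ExecMng-side monitor created as `ScopedMonitor("process")` can record a system metric such as `"cpu"` together with a user-defined counter. A later `query` returns both, because it first flushes the buffer.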

Performance & scalability
Key points:
- Collaboration with the Linux kernel.
- No central storage (DHTs).
- Services can be replicated.
- No search for the best global schedule, only for a good-enough local one.
What is the performance without DHTs? Test scenario: a typical VO, a small (100-node) local grid.
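The "good enough local scheduling" point can be illustrated with a first-fit sketch. Here first fit stands in for whatever policy XtreemOS actually uses, so treat it as an assumption. The scheduler scans only the candidates it already knows about and stops at the first one that satisfies the request. It does not compare all resources in the grid. `Resource` and `good_enough` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int
    free_mem_mb: int


def good_enough(candidates: List[Resource], cpus: int, mem_mb: int) -> Optional[Resource]:
    """Return the first locally known resource that satisfies the request.

    First fit: no global search for the optimal placement, so the cost is
    bounded by the local candidate list, not by the size of the grid.
    """
    for r in candidates:
        if r.free_cpus >= cpus and r.free_mem_mb >= mem_mb:
            return r
    return None  # nothing suitable locally; the caller may look further
```

A best-fit or globally optimal scheduler would need to inspect every resource before deciding. First fit trades placement quality for a decision that stays cheap as the VO grows.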

Job execution: O(X^2)
- Resource management is needed for each submitted process.
- All processes belong to the same job (in other systems they would be independent jobs).
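One way to see how per-process resource management can yield quadratic growth is the cost model below. It is an assumption, not a measurement from the slides. Suppose X is the number of processes in the job, and submitting the i-th process triggers a resource-management step that touches all i processes the job has so far. The total work is then 1 + 2 + … + X = X(X+1)/2, which is O(X^2). The function name is hypothetical.

```python
def resource_mgmt_ops(num_processes: int) -> int:
    """Count operations under an assumed cost model: submitting the i-th
    process of a job triggers a resource-management step over all i
    processes the job has so far (illustrative, not measured behaviour)."""
    total = 0
    for i in range(1, num_processes + 1):
        total += i  # the step for process i inspects i processes
    return total
```

Under this model, a 100-process job costs 5050 operations, and doubling X roughly quadruples the work. By contrast, X independent single-process jobs would each pay only a constant step.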

Job status
- Retrieve information about all processes of a job with low overhead.
- Look up a job's finished status (measured in seconds; 0.014 in GT5) without contacting the ExecMngs.
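A plausible way to answer "has this job finished?" without contacting the ExecMngs is a push model. Each ExecMng reports process exits to the job-level service, which then answers status queries from local state. The sketch below assumes that design; `JobStatusCache` and its methods are hypothetical.

```python
from typing import Dict, Set


class JobStatusCache:
    """Hypothetical JobMng-side view: ExecMngs push process-exit events, so
    finished-status queries are answered locally without querying them."""

    def __init__(self) -> None:
        self.pending: Dict[str, Set[int]] = {}          # JID -> PIDs still running
        self.exit_codes: Dict[str, Dict[int, int]] = {}  # JID -> {PID: exit code}

    def register(self, jid: str, pids: Set[int]) -> None:
        self.pending[jid] = set(pids)
        self.exit_codes[jid] = {}

    def on_process_exit(self, jid: str, pid: int, code: int) -> None:
        # Pushed by an ExecMng when one of the job's processes ends.
        self.pending[jid].discard(pid)
        self.exit_codes[jid][pid] = code

    def is_finished(self, jid: str) -> bool:
        # A purely local check: no round trip to any ExecMng.
        return not self.pending[jid]

    def processes_info(self, jid: str) -> Dict[int, int]:
        # Exit codes of all processes of the job that have ended so far.
        return dict(self.exit_codes[jid])
```

The trade-off is one small message per process exit in exchange for status queries that cost a dictionary lookup.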

Future improvements
- Reduced internal communication times.
- Caching to reduce overhead.
Some conclusions:
- Collaboration between the kernel and the middleware is important.
- DHTs (not evaluated here) are a good option for distributing data, but they do not yet deliver high performance.
- Including the concept "1 job -> n processes" gives the user many benefits: it is easy to understand and easy to manage.
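To show how a DHT removes the need for central storage, here is a minimal consistent-hashing sketch. It is a generic technique, not the DHT XtreemOS uses, which the slides do not describe. Each key, such as a JobID, is owned by the first node found clockwise from the key's hash on a ring. Any participant can therefore compute where a job's data lives without asking a central server. `HashRing` is a hypothetical name.

```python
import bisect
import hashlib
from typing import List


def _h(key: str) -> int:
    # Map a string to a point on the ring using SHA-1.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hashing ring: a key is owned by the first node
    whose hash follows the key's hash (wrapping around at the end)."""

    def __init__(self, nodes: List[str]):
        self._ring = sorted((_h(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect_right(self._points, _h(key)) % len(self._ring)
        return self._ring[i][1]
```

A useful property of this design is that removing a node only moves the keys that node owned. Every other key keeps its owner, which limits data movement when nodes join or leave.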
