Grid Infrastructure.

Slides:



Advertisements
Similar presentations
Workload Management David Colling Imperial College London.
Advertisements

EU 2nd Year Review – Jan – Title – n° 1 WP1 Speaker name (Speaker function and WP ) Presentation address e.g.
Workload management Owen Maroney, Imperial College London (with a little help from David Colling)
INFSO-RI Enabling Grids for E-sciencE Workload Management System and Job Description Language.
FP7-INFRA Enabling Grids for E-sciencE EGEE Induction Grid training for users, Institute of Physics Belgrade, Serbia Sep. 19, 2008.
Job Submission The European DataGrid Project Team
WP 1 Grid Workload Management Massimo Sgaravatto INFN Padova.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Services Abderrahman El Kharrim
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
EGEE is a project funded by the European Union under contract IST Grid developments and middleware components Mike Mineter EGEE Training team.
The EDG Workload Management System – n° 1 The EDG Workload Management System.
Basic Grid Job Submission Alessandra Forti 28 March 2006.
INFSO-RI Enabling Grids for E-sciencE Grid Infrastructure & Related Projects Eddie Aronovich Tel-Aviv University, School of CS
Grid Infrastructure.
Workload Management Massimo Sgaravatto INFN Padova.
Workload Management WP Status and next steps Massimo Sgaravatto INFN Padova.
NGS in the future: emerging middleware.
Computational grids and grids projects DSS,
Grid Workload Management & Condor Massimo Sgaravatto INFN Padova.
DataGrid is a project funded by the European Union CHEP 2003 – March 2003 – M. Sgaravatto – n° 1 The EU DataGrid Workload Management System: towards.
L ABORATÓRIO DE INSTRUMENTAÇÃO EM FÍSICA EXPERIMENTAL DE PARTÍCULAS Enabling Grids for E-sciencE Grid Computing: Running your Jobs around the World.
Grid Technologies  Slide text. What is Grid?  The World Wide Web provides seamless access to information that is stored in many millions of different.
Enabling Grids for E-sciencE Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam)
M. Sgaravatto – n° 1 The EDG Workload Management System: release 2 Massimo Sgaravatto INFN Padova - DataGrid WP1
DataGrid is a project funded by the European Union VisualJob Demonstation EDG 1.4.x 2003 The EU DataGrid How the use of distributed resources can help.
DataGrid WP1 Massimo Sgaravatto INFN Padova. WP1 (Grid Workload Management) Objective of the first DataGrid workpackage is (according to the project "Technical.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
1 Esther Montes Prado CIEMAT 10th EELA Tutorial Madrid, Hands-on on WMS (Review and Summary)
Grid Workload Management Massimo Sgaravatto INFN Padova.
Training and the NGS Mike Mineter
EGEE is a project funded by the European Union under contract IST Middleware components in EGEE Mike Mineter NeSC Training team
June 24-25, 2008 Regional Grid Training, University of Belgrade, Serbia Introduction to gLite gLite Basic Services Antun Balaž SCL, Institute of Physics.
EGEE is a project funded by the European Union under contract IST Job Description Language - more control over your Job Assaf Gottlieb University.
What is SAM-Grid? Job Handling Data Handling Monitoring and Information.
EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Job Services Emidio.
M. Sgaravatto – n° 1 Overview of release 2 of the EDG WP1 Workload Management System deployed in the INFN production Grid Massimo Sgaravatto INFN Padova.
E-infrastructure shared between Europe and Latin America 1 Workload Management System-WMS Luciano Diaz Universidad Nacional Autónoma de México - UNAM Mexico.
Glite. Architecture Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware Higher-Level Grid Services are supposed.
High-Performance Computing Lab Overview: Job Submission in EDG & Globus November 2002 Wei Xing.
EGEE-0 / LCG-2 middleware Practical.
INFSO-RI Enabling Grids for E-sciencE GILDA and GENIUS Guy Warner NeSC Training Team An induction to EGEE for GOSC and the NGS NeSC,
Workload Management System Jason Shih WLCG T2 Asia Workshop Dec 2, 2006: TIFR.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid2Win : gLite for Microsoft Windows Roberto.
Induction: General components of Grid middleware and User Interfaces –April 26-28, General components of Grid middleware and User Interfaces Roberto.
Summary from WP 1 Parallel Section Massimo Sgaravatto INFN Padova.
INFSO-RI Enabling Grids for E-sciencE Grid Services for Resource Reservation and Allocation Tiziana Ferrari Istituto Nazionale di.
EGEE is a project funded by the European Union under contract IST Job Description Language – How to control your Job Nadav Grossaug IsraGrid.
EDG - WP1 (Grid Work Scheduling) Status and plans Massimo Sgaravatto INFN Padova.
Grid Workload Management (WP 1) Massimo Sgaravatto INFN Padova.
The DataGrid Project NIKHEF, Wetenschappelijke Jaarvergadering, 19 December 2002
User Interface UI TP: UI User Interface installation & configuration.
D.Spiga, L.Servoli, L.Faina INFN & University of Perugia CRAB WorkFlow : CRAB: CMS Remote Analysis Builder A CMS specific tool written in python and developed.
13th EELA Tutorial, La Antigua, 18-19, October E-infrastructure shared between Europe and Latin America FP6−2004−Infrastructures−6-SSA
EGEE-II INFSO-RI Enabling Grids for E-sciencE Overview of gLite, the EGEE middleware Mike Mineter Training Outreach Education National.
Consorzio COMETA - Progetto PI2S2 UNIONE EUROPEA Grid2Win : gLite for Microsoft Windows Elisa Ingrà - INFN.
EGEE is a project funded by the European Union under contract IST GENIUS and GILDA Guy Warner NeSC Training Team Induction to Grid Computing.
Introduction to Computing Element HsiKai Wang Academia Sinica Grid Computing Center, Taiwan.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
EU 2nd Year Review – Feb – WP1 Demo – n° 1 WP1 demo Grid “logical” checkpointing Fabrizio Pacini (Datamat SpA, WP1 )
Grid Computing: Running your Jobs around the World
Workload Management System ( WMS )
EGEE tutorial, Job Description Language - more control over your Job Assaf Gottlieb Tel-Aviv University EGEE is a project.
Grid2Win: Porting of gLite middleware to Windows XP platform
Job Submission in the DataGrid Workload Management System
Introduction to Grid Technology
Workload Management System
5. Job Submission Grid Computing.
The gLite Workload Management System
Presentation transcript:

Grid Infrastructure

Acknowledgements Presentation is based on slides from: –Roberto Barbera, University of Catania and INFN (EGEE Tutorial Roma, ) –Mike Mineter, Concepts of grid computing –Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (Naregi Symposium 2005 – Tokyo) –Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland (APAC, 27 September 2005) –Guy Warner, NeSC Training Team (An Induction to EGEE for GOSC and the NGS NeSC, 8th December 2004 ) Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 2

What is it ? Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 3 SERVERS Clients

IT all about IT Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 4

Hardware utilization Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 5

6

7

THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson) Google grid Microsoft's live.com Yahoo! Amazon.com eBay Salesforce.com Well, that's O(5) ;) Greg Matter ( Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 8

Scaling Scale-up –Add more resources within the system –Does not requires changes in the applications –Limited extension –Singe point of failure Scape-out –Add more systems –Architecture dependent (needs change of code) –Economically Howto ? –Split the operation into groups –Perform each group on a different machine Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 9

How fast can parallelization be ? Let: – α be the proportion of the process that can not be parallelized. –P – number of processors –S – System speedup Amdhals law: S = 1 / (α + (1- α ) / P ) Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 10

Cluster types High availability –Active-Active –Active-Passive –Heart beat Load Balancing Cluster –Round robin (weighted/non-weighted) –System status aware (session, cpu load, etc) Compute cluster –Queuing system (condor, hadoop, open-pbs, LSF, etc.) –Single system image (ScaleMP, SSI, Mosix, nomad,etc.) Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 11

Batch to On-Line scale gLite & Globus Dedicated resources PBS Torque Utility computing (Condor) hadoop Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 12

Key Cloud Services Attributes Off-Site, Thirds-party provider Access via Internet Minimal/no IT skills required to “implement” Provisioning - self-service requesting; near real-time deployment; dynamic & fine-grained scaling Fine-grained usage-based pricing model UI - browser and successors Web services APIs as System Interface Shared resources/common versions Source: IDC, Sep 2008

What is “Grid” Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 14

What is Grid Computing ? Definition is not widely agreed Foster & Kesselman: Computing resources are not administered centrally. Open standards are used. Non-trivial quality of service is achieved. Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 15

Other definitions "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner) "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of- service requirements“ (Buyya )autonomous "a service for sharing computer power and data storage capacity over the Internet." (CERN)Internet Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 16

SOA & Web services Decompose processing into services Each service works independently Main components: –Universal Description, Discovery and Integration –Simple Object Access Protocol –Web Services Description Language W3C standard Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 17

The Grid Metaphor Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 18

Virtual Organization What’s a VO? –People in different organisations seeking to cooperate and share resources across their organisational boundaries Why establish a Grid? –Share data –Pool computers –Collaborate The initial vision: “The Grid” The present reality: Many “grids” Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 19 Institute A VO1 Institute CInstitute BInstitute DInstitute E VO2 Institute F

Stand alone computer Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 20 Application Operating system Hardware

Stand alone computer Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 21 Application Network stack Operating system Hardware

Stand alone computer Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 22 Application Grid Middleware Network stack Operating system Hardware

Grid middleware Middleware, is an interfaces between resources and the applications –User/Program Interface –Resource management –Connectivity –Information services –Collaboration Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 23

Middleware components – The batch approach Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 24 InformationService SE & CE info Publish Input “sandbox” + Broker Info ReplicaCatalogue DataSets info Logging & Book-keeping Author. &Authen. StorageElement ComputingElement Output “sandbox” ResourceBroker Job Status Job Submit Event Job Query Job Status Input “sandbox” Output “sandbox” “User interface”

UI Network Server Job Contr. Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node Characts. & status

UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status submitted Job Status UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)

UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status edg-job-submit myjob.jdl Myjob.jdl JobType = “Normal”; Executable = "$(CMS)/exe/sum.exe"; InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"}; OutputSandbox = {“sim.err”, “test.out”, “sim.log"}; Requirements = other. GlueHostOperatingSystemName == “linux" && other. GlueHostOperatingSystemRelease == "Red Hat 7.3“ && other.GlueCEPolicyMaxCPUTime > 10000; Rank = other.GlueCEStateFreeCPUs; submitted Job Statu s Job Description Language (JDL) to specify job characteristics and requirements

UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage Input Sandbox files Job waiting submitted Job Status NS: network daemon responsible for accepting incoming requests

Job submission UI Network Server Job Contr. - CondorG Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status WM: acts to satisfy the request Job Workload manager

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker Where must this job be executed ?

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker Matchmaker: responsible to find the “best” CE for a job

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker Where are (which SEs) the needed data ? What is the status of the Grid ?

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Match- Maker/ Broker CE choice

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage waiting submitted Job Status Job Adapter Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)

Job submission UI Network Server Job Contr. Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage Job Status Job Controller: responsible for the actual job management operations (done via CondorG) Job submitted waiting ready

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node CE characts & status SE characts & status RB storage Job Status Job submitted waiting ready scheduled

“Compute element” – reminder! Homogeneous set of worker nodes Grid gate node Local resource management system: Condor / PBS / LSF master Globus gatekeeper Job request Info system Logging gridmapfile I.S. Logging

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node RB storage Job Status submitted waiting ready scheduled running “Grid enabled” data transfers/ accesses Job Input Sandbox files

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node RB storage Job Status Output Sandbox files submitted waiting ready scheduled running done

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node RB storage Job Status submitted waiting ready scheduled running done edg-job-get-output

Job submission UI Network Server Job Contr. - CondorG Workload Manager Replica Location Server Inform. Service Computing Element Storage Element RB node RB storage Job Status Output Sandbox files submitted waiting ready scheduled running done cleared

Job monitoring UI Log Monitor Logging & Bookkeeping Network Server Job Contr. - CondorG Workload Manager Computing Element RB node LM: parses CondorG log file (where CondorG logs info about jobs) and notifies LB LB: receives and stores job events; processes corresponding job status Log of job events edg-job-status edg-job-get-logging-info Job status

EGEE08 Istanbul, Turkey 43 Grids in Europe Prof. Dieter KRANZLMUELLER, EGEE 08