GridPP Tier-1A Centre

Presentation transcript:

EU funding for DataGrid under contract IST-2000-25182 is gratefully acknowledged.

CCLRC provides the GridPP collaboration (funded by PPARC) with a large computing facility at RAL based on commodity hardware and open source software. The current resources (~700 CPUs, ~80TB of usable disk and 180TB of tape) provide an integrated service as a prototype Tier-1 centre for the LHC project and a Tier-A centre for the BaBar experiment based at SLAC. The service is expected to grow substantially over the next few years.

EDG Job Submission
Job submission into the Tier1A can be carried out from any networked workstation with access to the EDG User Interface software. The EDG software provides the following tools (a minimal submission sketch is given at the end of this section):
1. Authentication: grid-proxy-init
2. Job submission: edg-job-submit
3. Monitoring and control: edg-job-status, edg-job-cancel, edg-job-get-output
4. Data publication and replication: globus-url-copy, GDMP
5. Resource scheduling and use of Mass Storage Systems: JDL, sandboxes, storage elements

The concept of a Tier1 centre predates the Grid. The MONARC project described a model of distributed computing for the LHC with a series of layers, or 'tiers': the Tier0 at CERN holds all the raw data, each Tier1 holds all the reconstructed data, and the Tier2s and individual institutes and physicists hold smaller and smaller abstractions of the data. The model also works partly in reverse, with the Tier1 and Tier2 centres producing large amounts of simulated data which migrate from Tier2 to Tier1 to CERN.

The principal objective of the Tier1A is to assist GridPP in the deployment of experimental Grid technologies and to meet its international commitments to various projects. It runs five independent international Grid testbeds, four for the European DataGrid collaboration and one for the LCG project. It provides Grid gateways into the main Tier1A "production" service for a number of projects, including the EDG and LCG projects, BaBarGrid for the SLAC-based BaBar experiment, and SAM-Grid for the Fermilab CDF and D0 collaborations.

Cluster Infrastructure
The Tier1A uses Redhat Linux deployed on Pentium III and Pentium 4 rack-mounted PC hardware. Disk storage is provided by external SCSI/IDE RAID controllers and commodity IDE disk. An STK Powderhorn silo provides near-line tape access (see Datastore poster).

COMPUTING FARM: 7 racks holding 236 dual Pentium III/P4 CPUs.
DISK STORE: 80TByte disk-based Mass Storage Unit after RAID 5 overhead. PCs are clustered on network switches with up to 8x1000Mb Ethernet.
TAPE ROBOT: Upgraded last year; uses 60GB STK 9940 tapes. Current capacity is 45TB; the silo could hold 330TB.
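As an illustration of the submission workflow above, a minimal EDG job description (JDL) might look like the sketch below; the file name and executable are hypothetical, and the exact attributes depend on the EDG release in use:

    // hello.jdl - hypothetical minimal job description (illustrative only)
    Executable    = "/bin/hostname";
    StdOutput     = "hello.out";
    StdError      = "hello.err";
    OutputSandbox = {"hello.out", "hello.err"};

After authenticating with grid-proxy-init, such a job would be submitted with edg-job-submit hello.jdl; the job identifier returned at submission is then passed to edg-job-status to follow progress and to edg-job-get-output to retrieve the output sandbox.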

The service needs to be flexible and able to rapidly deploy staff and hardware to meet the changing needs of GridPP. Access can be either by traditional login with manual job submission and editing, or via one of several deployed Grid gateways. Five independent Grid testbeds are available, ranging from a few standalone nodes for EDG work-package developers to ten or more systems deployed into the international EDG and LCG production testbeds, which take part in wide-area work. A wide range of work runs on the EDG testbed, including work from the Biomedical and Earth Observation work packages. A number of Grid gateways have been installed into the production farm, where the bulk of the hardware is available. Some, such as the EDG/LCG gateways, are maintained by Tier1A staff; others, such as SAM-Grid, are managed by the user community in close collaboration with the Tier1A. Once through the gateway, jobs are scheduled within the main production facility.

Behind the gateways, the hardware is deployed into a number of logical hardware pools running three separate releases of Redhat Linux, all controlled by the OpenPBS batch system and the MAUI scheduler. With so many systems it is necessary to automate as much of the installation process as possible: we use LCFG on the Grid testbeds and a complex Kickstart infrastructure on the main production service. Behind the scenes are a large number of service systems and business processes, such as:
· File servers
· Security scanners and password crackers
· RPM package monitoring and update services
· Automation and system monitoring
· Performance monitoring and accounting
· System consoles
· Change control
· Helpdesk
Gradually, over the next year, it is likely that some of the above will be replaced by Grid-based fabric management tools being developed by EDG.

Cluster Management Tools

Real-time Performance Monitoring
On a large, complex cluster, real-time monitoring of the system statistics of all components is needed to understand workflow and assist in problem resolution. We use Ganglia to provide detailed system metrics such as CPU load, memory utilisation and disk I/O rates. Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters and relies on a multicast-based listen/announce protocol to monitor state within clusters.

Near Real-time CPU Accounting
CPU accounting data is vital to allow resource allocations to be met and to monitor activity on the cluster. The Tier1A service takes accounting information from the OpenPBS system and loads it into a MySQL database for post-processing through locally written Perl scripts.

Helpdesk
Requestracker is used to provide a web-based helpdesk. Queries can be submitted either by email or via the web before being queued into subject queues in the system and assigned ticket numbers. Specialists can then take ownership of a problem, and an escalation and notification system ensures that tickets do not get overlooked.

Automate
The commercial Automate package, coupled to CERN's SURE monitoring system, is used to maintain a watch on the service. Alarms are raised on SURE if systems break or critical system parameters are found to be out of bounds. In the event of a mission-critical fault, SURE notifies Automate, which in turn makes a decision based on the time of day and the agreed service level before escalating the problem, for example out of office hours by paging a Computer Operator.
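To make the accounting step above concrete, here is a minimal sketch of loading PBS job-end records into a database. It assumes the standard PBS accounting log layout of semicolon-separated fields followed by key=value attributes; the Tier1A's actual loader was a set of locally written Perl scripts feeding MySQL, so the table name, schema and use of SQLite here are illustrative assumptions only.

    import sqlite3

    def parse_pbs_record(line):
        """Split one PBS accounting line: 'timestamp;type;jobid;key=value ...'."""
        timestamp, rec_type, job_id, message = line.rstrip("\n").split(";", 3)
        attrs = dict(field.split("=", 1) for field in message.split() if "=" in field)
        return timestamp, rec_type, job_id, attrs

    def load_end_records(log_path, db_path="accounting.db"):
        """Load job-end ('E') records into a small accounting table (sketch only)."""
        db = sqlite3.connect(db_path)
        db.execute("""CREATE TABLE IF NOT EXISTS jobs
                      (job_id TEXT, finished TEXT, user TEXT, queue TEXT, cput TEXT)""")
        with open(log_path) as log:
            for line in log:
                timestamp, rec_type, job_id, attrs = parse_pbs_record(line)
                if rec_type != "E":   # only completed jobs carry final usage totals
                    continue
                db.execute("INSERT INTO jobs VALUES (?,?,?,?,?)",
                           (job_id, timestamp, attrs.get("user"), attrs.get("queue"),
                            attrs.get("resources_used.cput")))
        db.commit()
        db.close()

Once loaded, resource usage per user or queue can be summarised with ordinary SQL queries, which is essentially the post-processing role the Perl scripts played.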
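The Automate escalation described above is essentially a calendar-driven decision. A toy sketch of that logic follows; the hour boundaries and actions are illustrative assumptions, not the Tier1A's actual configuration.

    from datetime import datetime

    WORKING_HOURS = range(8, 18)   # assumed office hours, Monday-Friday

    def escalate(alarm, now=None):
        """Decide how to escalate a mission-critical SURE alarm (illustrative only)."""
        now = now or datetime.now()
        in_office_hours = now.weekday() < 5 and now.hour in WORKING_HOURS
        if in_office_hours:
            return f"notify on-duty admin about {alarm}"
        return f"page Computer Operator about {alarm}"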