
Slide 1: The Gelato Federation: What is it exactly?
Sverre Jarp, March 2003

Slide 2: Gelato is a collaboration
- Goal: promote Linux on Itanium-based systems
- Sponsor: Hewlett-Packard (others coming)
- Members: 13 (right now), mainly from the high-performance / high-throughput community; expected to grow rapidly

Slide 3: Current members
- North America: NCAR (National Center for Atmospheric Research), NCSA (National Center for Supercomputing Applications), PNNL (Pacific Northwest National Lab), PSC (Pittsburgh Supercomputing Center), University of Illinois at Urbana-Champaign, University of Waterloo
- Europe: CERN, DIKU (Datalogisk Institut, University of Copenhagen), ESIEE (École Supérieure d'Ingénieurs near Paris), INRIA (Institut National de Recherche en Informatique et en Automatique)
- Far East / Australia: Bioinformatics Institute (Singapore), Tsinghua University (Beijing), University of New South Wales (Sydney)

Slide 4: Center of gravity
- Web portal (http://www.gelato.org) with rich content
- Pointers to open-source IA-64 applications, for example ROOT (from CERN), OSCAR (cluster management software from NCSA) and the OpenImpact compiler (UIUC)
- News, information, advice and hints related to IPF, the Linux kernel, etc.
- Member overview: who is who, etc.

Slide 5: Current development focus
Six "performance" areas:
- Single-system scalability, from 2-way to 16-way (HP, Fort Collins)
- Cluster scalability and performance management, up to 128 nodes (NCSA)
- Parallel file system (BII)
- Compilers (UIUC)
- Performance tools and management (HP Labs)

Slide 6: CERN requirement #1
- Better C++ performance, through better compilers, faster systems, or both!
- Gelato focus: Madison @ 1.5 GHz

Slide 7: Further Gelato research and development
- Linux memory management: superpages (see the sketch below), TLB sharing between processes, IA-64 pre-emption support
- Compilers/debuggers: OpenImpact C compiler (UIUC); Open Research Compiler enhancements for Fortran, C and C++ (Tsinghua); parallel debugger (Tsinghua)
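As an aside, here is a minimal C sketch of what a "superpage" buys an application: one TLB entry covering a large region instead of hundreds of small-page entries. It uses the MAP_HUGETLB flag of later Linux kernels purely for illustration; it is not the Gelato code, which targeted transparent kernel-level support.

    /* Illustrative only: map one 2 MB huge page explicitly.
     * Requires huge pages to have been reserved by the administrator
     * (e.g. via /proc/sys/vm/nr_hugepages). */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define LEN (2UL * 1024 * 1024)   /* one 2 MB huge page */

    int main(void)
    {
        void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        memset(p, 0, LEN);   /* the whole region is covered by a single TLB entry */
        munmap(p, LEN);
        return 0;
    }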

Slide 8: The "opencluster" and the "openlab"
Sverre Jarp, IT Division, CERN

Slide 9: Definitions
- The "CERN openlab for DataGrid applications" is a framework for evaluating and integrating cutting-edge technologies or services in partnership with industry, focusing on potential solutions for the LCG. The openlab invites members of industry to join and contribute systems, resources or services, and to carry out with CERN large-scale, high-performance evaluation of their solutions in an advanced integrated environment.
- The "opencluster" project: the openlab is constructing a pilot "compute and storage farm" called the opencluster, based on HP's dual-processor servers, Intel's Itanium Processor Family (IPF) processors, Enterasys's 10-Gbps switches and, at a later stage, a high-capacity storage system.

Slide 10: Technology onslaught
Large amounts of new technology will become available between now and LHC start-up. A few hardware examples:
- Processors: SMT (simultaneous multi-threading), CMP (chip multiprocessor), ubiquitous 64-bit computing (even in laptops)
- Memory: DDR II-400 (fast), servers with 1 TB (large)
- Interconnect: PCI-X → PCI-X2 → PCI-Express (serial); InfiniBand
- Computer architecture: chipsets on steroids, modular computers (see the ISC2003 keynote "Building Efficient HPC Systems from Catalog Components", Justin Rattner, Intel Corp., Santa Clara, USA)
- Disks: Serial ATA
- Ethernet: 10 GbE (NICs and switches), 1 Terabit backplanes
Not all, but some of this will definitely be used by LHC.

Slide 11: Vision: a fully functional Grid cluster node
[Diagram: CPU servers and a storage system on a multi-gigabit LAN, connected to a remote fabric across the WAN via a gigabit long-haul link]

Slide 12: opencluster strategy
- Demonstrate promising technologies for LCG and LHC on-line
- Deploy the technologies well beyond the opencluster itself: 10 GbE interconnect in the LHC testbed; act as a 64-bit porting centre (CMS and ALICE already active, ATLAS is interested); CASTOR 64-bit reference platform; storage subsystem as a CERN-wide pilot
- Focal point for vendor collaborations; for instance, in the "10 GbE Challenge" everybody must collaborate in order to be successful
- Channel for providing information to vendors
- Thematic workshops

Slide 13: The opencluster today
- Three industrial partners: Enterasys, HP and Intel; a fourth partner (a data storage subsystem, which would "fulfill the vision") joining soon
- Technology aimed at the LHC era: network switches at 10 Gigabit, rack-mounted HP servers, 64-bit Itanium processors
- Cluster evolution: 2002: cluster of 32 systems (64 processors); 2003: 64 systems ("Madison" processors); 2004/05: possibly 128 systems ("Montecito" processors)

Slide 14: Activity overview
- Over the last few months: cluster installation and middleware; application porting, compiler installations, benchmarking; initialization of the "challenges"; planning of the first thematic workshop
- Future: porting of Grid middleware; Grid integration and benchmarking; storage partnership; cluster upgrades/expansion; new-generation network switches

Slide 15: opencluster in detail
Integration of the cluster:
- Fully automated network installations; 32 nodes plus development nodes
- Red Hat Advanced Workstation 2.1; OpenAFS, LSF
- GNU, Intel and ORC compilers (64-bit); ORC is the Open Research Compiler, which used to belong to SGI
- CERN middleware: CASTOR data management
- CERN applications: porting, benchmarking and performance improvements of CLHEP, GEANT4, ROOT, Sixtrack, CERNLIB, etc.; database software (MySQL, Oracle?)
Many thanks to my colleagues in ADC, FIO and CS.

Slide 16: The compute nodes: HP rx2600
- Rack-mounted (2U) systems
- Two Itanium 2 processors at 900 or 1000 MHz, field-upgradeable to the next generation
- 2 or 4 GB memory (max 12 GB)
- 3 hot-pluggable SCSI discs (36 or 73 GB)
- On-board 100 Mbit and 1 Gbit Ethernet
- 4 PCI-X slots: full-size 133 MHz/64-bit slot(s)
- Built-in management processor, accessible via a serial port or an Ethernet interface

Slide 17: rx2600 block diagram
[Block diagram; recoverable labels: two Intel Itanium 2 processors and 12 DIMMs around the zx1 memory and I/O controller; zx1 I/O adapters feeding PCI-X 133/64 slots, Ultra160 SCSI (3 internal drives, channels a and b), Gbit LAN, 10/100 LAN, USB 2.0 and an IDE CD/DVD; a management processor card with service processor, 3 serial ports, 10/100 LAN and VGA; quoted bandwidths of 6.4 GB/s, 4.3 GB/s and 1 GB/s]

Slide 18: Benchmarks
Comment: note that 64-bit benchmarks pay a performance penalty for LP64, i.e. 64-bit pointers. We need to wait for AMD systems, which can natively run either a 32-bit or a 64-bit OS, to understand the exact cost for our benchmarks.
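To make the LP64 remark concrete, here is a small C sketch (not from the presentation) of why 64-bit pointers cost something for pointer-heavy code: the same structure grows from 16 bytes under ILP32 to 24 bytes under LP64, increasing cache and memory-bandwidth pressure.

    /* Illustrative only: pointer-heavy structures grow under LP64. */
    #include <stdio.h>

    struct node {              /* a typical pointer-chasing structure */
        struct node *next;
        struct node *child;
        double       value;
    };

    int main(void)
    {
        printf("sizeof(void *)      = %zu\n", sizeof(void *));
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        /* ILP32: 4 + 4 + 8 = 16 bytes; LP64: 8 + 8 + 8 = 24 bytes */
        return 0;
    }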

Slide 19: Benchmark 1: Sixtrack (SPEC)
What we would have liked to see for all CERN benchmarks (small is best; data from www.spec.org):

    System                           Sixtrack
    Itanium 2 @ 1000 MHz (efc7)      122 s
    Pentium 4 @ 3.06 GHz (ifl7)      195 s
    IBM Power4 @ 1300 MHz            202 s

Projection: Madison @ 1.5 GHz: ~81 s
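The Madison projection is consistent with simple clock scaling of the Itanium 2 result: 122 s × (1000 MHz / 1500 MHz) ≈ 81 s, i.e. the same work per cycle at the higher clock.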

Slide 20: Benchmark 2: CRN jobs / Fortran
Big is best:

    System                                                    Geom. mean (CU)   CU/MHz
    Itanium @ 800 MHz (efc O3, prof_use)                      183               0.23
    Itanium 2 @ 1000 MHz (efc O3, ipo, prof_use)              387               0.39
    Pentium 4 Xeon @ 2 GHz, 512 KB (ifc O3, ipo, prof_use)    415               0.21

Projections: Madison @ 1.5 GHz: ~585 CU; P4 Xeon @ 3.0 GHz: ~620 CU
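The projections are consistent with scaling the measured CU/MHz by the target clock: 0.39 CU/MHz × 1500 MHz ≈ 585 CU for Madison, and (415 CU / 2000 MHz) × 3000 MHz ≈ 620 CU for the 3.0 GHz Xeon.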

Slide 21: Benchmark 3: Rootmarks / C++
All jobs run in "batch" mode, ROOT 3.05.02:

    Test                       Itanium 2 @ 1000 MHz (gcc 3.2, O3)   Itanium 2 @ 1000 MHz (ecc7 prod, O2)
    stress -b -q               423                                  476
    bench -b -q                449                                  534
    root -b benchmarks.C -q    344                                  325
    Geometric mean             402                                  436

Projections: Madison @ 1.5 GHz: ~660 RM; Pentium 4 @ 3.0 GHz/512 KB: ~750 RM
René's own 2.4 GHz P4 is normalized to 600 RM.
Stop press: we have just agreed on a compiler improvement project with Intel.
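The Pentium 4 projection is consistent with clock scaling of the 600 RM reference machine: 600 RM × 3.0/2.4 = 750 RM; similarly 436 RM × 1.5 ≈ 654 RM, in line with the quoted ~660 RM for Madison.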

Slide 22: opencluster phase 1
Perform cluster benchmarks:
- Parallel ROOT queries (via PROOF): observed excellent scaling from 2 → 4 → 8 → 16 → 32 → 64 CPUs; to be reported at CHEP2003
- "1 GB/s to tape" challenge: network interconnect via 10 GbE switches; the opencluster may act as CPU servers; 50 StorageTek tape drives in parallel (see the note below)
- "10 Gbit/s network challenge": groups together all openlab partners: Enterasys switch, HP servers, Intel processors and network cards, CERN Linux and network expertise
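For scale, the "1 GB/s to tape" target spread over 50 StorageTek drives corresponds to an average of roughly 20 MB/s sustained per drive.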

Slide 23: 10 GbE Challenge

Slide 24: Network topology in 2002
[Diagram; recoverable labels: Fast Ethernet node groups (ports 1-96) and disk servers connected through E1 OAS switches, with Gigabit copper and Gigabit fibre uplinks in trunks of 2 and 4, plus 10 Gigabit links]

Slide 25: Enterasys extension, 1Q2003
[Diagram; the same topology extended with the 32-node Itanium cluster and a 200+ node Pentium cluster attached via the E1 OAS switches, again over Gigabit copper, Gigabit fibre and 10 Gigabit links]

Slide 26: Why a 10 GbE Challenge?
- Demonstrate LHC-era technology: all the necessary components are available inside the opencluster
- Identify bottlenecks, and see if we can improve
- We know that Ethernet is here to stay: four years from now 10 Gbit/s should be commonly available, as backbone technology, as cluster interconnect, and possibly also for iSCSI and RDMA traffic
- We want to advance the state of the art!

Slide 27: Demonstration of the openlab partnership
Everybody contributes:
- Enterasys: 10 Gbit switches
- Hewlett-Packard: the server with its PCI-X slots and memory bus
- Intel: 10 Gbit NICs plus driver; processors (i.e. code optimization)
- CERN: Linux kernel expertise, network expertise, project management, IA-32 expertise, CPU clusters and disk servers on a multi-Gbit infrastructure

Slide 28: "Can we reach 400-600 MB/s throughput?"
Bottlenecks could be:
- Linux CPU consumption: kernel and driver optimization; number of interrupts, TCP checksumming, IP packet handling, etc.; we definitely need TCP offload capabilities
- Server hardware: memory banks and speeds; PCI-X slot and overall speed
- Switch: single-transfer throughput
Aim: identify the bottleneck(s); measure peak throughput and the corresponding cost (processor, memory, switch, etc.). A sketch of the kind of measurement involved follows below.
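For illustration, a minimal C sketch of the kind of memory-to-memory TCP throughput test such a challenge starts from: blast a fixed buffer in a tight loop and report MB/s. It is not the actual openlab test harness, and the port number, buffer size and transfer volume are arbitrary assumptions; pair it with any TCP sink that reads the socket in a loop.

    /* Minimal TCP sender for throughput measurement (illustrative only). */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *host = argc > 1 ? argv[1] : "127.0.0.1";
        const size_t bufsz = 1 << 20;           /* 1 MB send buffer */
        const long long total = 4LL << 30;      /* move 4 GB in total */
        char *buf = calloc(1, bufsz);

        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(5001) };
        inet_pton(AF_INET, host, &addr.sin_addr);
        if (connect(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect");
            return 1;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long long sent = 0; sent < total; ) {
            ssize_t n = send(s, buf, bufsz, 0);   /* CPU cost dominated by copies,
                                                     checksums and interrupts */
            if (n <= 0) { perror("send"); return 1; }
            sent += n;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f MB/s\n", (double)total / (1 << 20) / secs);
        close(s);
        return 0;
    }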

Slide 29: Gridification

Slide 30: opencluster future
- Port and validation of EDG 2.0 software: a joint project with CMS to integrate the opencluster alongside the EDG testbed
- Porting and verification of the relevant software packages (hundreds of RPMs); understand the chain of prerequisites; exploit the possibility of leaving the control node as IA-32
- Interoperability with the EDG testbeds and later with LCG-1; integration into the existing authentication scheme
- Grid benchmarks: to be defined later
Fact sheet: HP joined the openlab mainly because of their interest in Grids.

Slide 31: opencluster time line
[Timeline from Jan 03 to Jan 06 covering: installation of the first 32 nodes; systems expertise in place; phase 1 start and completion; opencluster integration; EDG and LCG interoperability; order/installation of G-2 upgrades and 32 more nodes; phase 2 start and completion; order/installation of G-3 upgrades and additional nodes; phase 3 start]

Slide 32: Recap: opencluster strategy
- Demonstrate promising IT technologies (file-system technology to come)
- Deploy the technologies well beyond the opencluster itself
- Focal point for vendor collaborations
- Channel for providing information to vendors

Slide 33: Storage workshop
Data and Storage Management Workshop (draft agenda), March 17th - 18th 2003 (Sverre Jarp)
Organized by the CERN openlab for DataGrid applications and the LCG.
Aim: understand how to create synergy between our industrial partners and LHC Computing in the area of storage management and data access.

Day 1 (IT Amphitheatre)
Introductory talks:
- 09:00 - 09:15  Welcome (von Rueden)
- 09:15 - 09:35  Openlab technical overview (Jarp)
- 09:35 - 10:15  Gridifying the LHC data: challenges and current shortcomings (Kunszt)
- 10:15 - 10:45  Coffee break
The current situation:
- 10:45 - 11:15  Physics data structures and access patterns (Wildish)
- 11:15 - 11:35  The Andrew File System usage in CERN and HEP (Többicke)
- 11:35 - 12:05  CASTOR: CERN's data management system (Durand)
- 12:05 - 12:25  IDE disk servers: a cost-effective cache for physics data (NN)
- 12:25 - 14:00  Lunch
Preparing for the future:
- 14:00 - 14:30  ALICE data challenges: on the way to recording at 1 GB/s (Divià)
- 14:30 - 15:00  Lessons learnt from managing data in the European Data Grid (Kunszt)
- 15:00 - 15:30  Could Oracle become a player in physics data management? (Shiers)
- 15:30 - 16:00  CASTOR: possible evolution into the LHC era (Barring)
- 16:00 - 16:30  POOL: LHC data persistency (Duellmann)
- 16:30 - 17:00  Coffee break
- 17:00 -        Discussions and conclusion of day 1 (All)
Day 2 (IT Amphitheatre)
Vendor interventions; one-on-one discussions with CERN

Slide 34: Thank you

