Building Large Scale Fabrics – A Summary. Marcel Kunze, FZK.

1 Building Large Scale Fabrics – A Summary
Marcel Kunze, FZK
ACAT 2002, Moscow

2 Observation
- Everybody seems to need unprecedented amounts of CPU, disk, and network bandwidth.
- Trend toward PC-based computing fabrics and commodity hardware:
  - LCG (CERN), L. Robertson
  - CDF (Fermilab), M. Neubauer
  - D0 (Fermilab), I. Terekhov
  - Belle (KEK), P. Krokovny
  - HERA-B (DESY), J. Hernandez
  - LIGO, P. Shawhan
  - Virgo, D. Busculic
  - AMS, A. Klimentov
- Considerable cost savings w.r.t. a RISC-based farm: not enough "bang for the buck" (M. Neubauer).

3 AMS02 Benchmarks
Execution time of the AMS "standard" job, compared to CPU clock 1). Values are normalized to the Intel PII 450 MHz reference; lower is faster.

Brand, CPU, Memory                              | OS / Compiler           | "Sim" 1) | "Rec" 1)
------------------------------------------------|-------------------------|----------|---------
Intel PII dual-CPU 450 MHz, 512 MB RAM          | RH Linux 6.2 / gcc 2.95 | 1        | 1
Intel PIII dual-CPU 933 MHz, 512 MB RAM         | RH Linux 6.2 / gcc 2.95 | 0.54     | –
Compaq quad α-ev67 600 MHz, 2 GB RAM            | RH Linux 6.2 / gcc 2.95 | 0.58     | 0.59
AMD Athlon 1.2 GHz, 256 MB RAM                  | RH Linux 6.2 / gcc 2.95 | 0.39     | 0.34
Intel Pentium IV 1.5 GHz, 256 MB RAM            | RH Linux 6.2 / gcc 2.95 | 0.44     | 0.58
Compaq dual-CPU PIV Xeon 1.7 GHz, 2 GB RAM      | RH Linux 6.2 / gcc 2.95 | 0.32     | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM            | Tru64 Unix / cxx 6.2    | 0.23     | 0.25
Elonex Intel dual-CPU PIV Xeon 2 GHz, 1 GB RAM  | RH Linux 7.2 / gcc 2.95 | 0.29     | 0.35
AMD Athlon 1800MP dual-CPU 1.53 GHz, 1 GB RAM   | RH Linux 7.2 / gcc 2.95 | 0.24     | 0.23
8-CPU Sun Fire 880, 750 MHz, 8 GB RAM           | Solaris 5.8 / C++ 5.2   | 0.52     | 0.45
24-CPU Sun UltraSPARC-III+, 900 MHz, 96 GB RAM  | RH Linux 6.2 / gcc 2.95 | 0.43     | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM            | RH Linux 7.1 / gcc 2.95 | 0.22     | 0.23

1) V. Choutko, A. Klimentov, AMS Note 2001-11-01.
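To make explicit how such a table is read: each machine's job time is divided by the reference machine's time, so a value of 0.25 means the job ran about four times faster than on the PII 450 MHz. A minimal sketch of that normalization, with made-up timings that are not from the benchmark:

```python
# Minimal sketch: normalizing benchmark times to a reference machine,
# as in the AMS table (values < 1 mean faster than the reference).
# The timings below are illustrative placeholders, not measured data.

reference = {"sim": 1250.0, "rec": 840.0}   # hypothetical seconds on the PII 450 MHz
candidate = {"sim": 300.0,  "rec": 193.0}   # hypothetical seconds on a newer box

for job in ("sim", "rec"):
    ratio = candidate[job] / reference[job]
    print(f"{job}: {ratio:.2f}")            # e.g. 0.24 -> roughly 4x faster than reference
```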

4 Fabrics and Networks: Commodity Equipment
Needed for the LHC at CERN in 2006:
- Storage: raw recording rate of 0.1–1 GB/s, accumulating at 5–8 PetaBytes/year; 10 PetaBytes of disk.
- Processing: 200,000 of today's (2001) fastest PCs.
- Networks: 5–10 Gbps between the main Grid nodes.
Distributed computing effort to avoid congestion: 1/3 at CERN, 2/3 elsewhere.
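A back-of-the-envelope check shows how the recording rate and the yearly volume fit together. The effective data-taking time of roughly 10^7 seconds per year is an assumption typical for accelerator operation, not a number from the slide:

```python
# Back-of-the-envelope check: recording rate vs. yearly accumulated volume.
# Assumption (not from the slide): roughly 1e7 seconds of effective
# data-taking per year, a common rule of thumb for accelerator experiments.

rate_gb_per_s = 1.0            # upper end of the quoted 0.1-1 GB/s
seconds_per_year = 1e7         # assumed effective data-taking time

volume_pb = rate_gb_per_s * seconds_per_year / 1e6   # 1 PB = 1e6 GB
print(f"~{volume_pb:.0f} PB/year")                   # ~10 PB/year, same order as the quoted 5-8 PB/year
```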

5 PC Cluster 5 (Belle)
- 1U servers, Pentium III 1.2 GHz
- 256 CPUs (128 nodes)

6 PC Cluster 6
- Blade server: LP Pentium III 700 MHz
- 40 CPUs (40 nodes) in 3U

7 Disk Storage

8 IDE Performance

9 Basic Questions
- Compute farms contain several thousand computing elements; storage farms contain thousands of disk drives.
- How do we build scalable systems?
- How do we build reliable systems?
- How do we operate and maintain large fabrics?
- How do we recover from errors?
- The EDG deals with the issue (P. Kunszt); IBM deals with the issue (N. Zheleznykh) with Project Eliza: self-healing clusters.
- Several ideas and tools are already on the market.

10 Storage Scalability
- It is difficult to scale up to systems of thousands of components while keeping a single system image: NFS automounter, symbolic links, etc. (M. Neubauer, CAF: ROOTD does not need this and allows direct worldwide access to distributed files without mounts; see the sketch below.)
- Scalability in size and throughput by means of storage virtualisation.
- Allows setting up non-TCP/IP based systems to handle multiple GB/s.
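The ROOTD-style access mentioned above looks roughly like this from the client side. A minimal PyROOT sketch, where the server name and file path are hypothetical examples:

```python
# Minimal sketch of mount-free remote file access via a ROOT daemon,
# as mentioned for CAF. Host and path are hypothetical examples.
import ROOT

# ROOT resolves the root:// URL and talks to the remote daemon directly;
# no NFS mount, automounter map, or symbolic link is needed on the client.
f = ROOT.TFile.Open("root://dataserver.example.org//data/run42/events.root")
if f and not f.IsZombie():
    f.ls()        # browse the remote file's contents over the network
    f.Close()
```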

11 Virtualisation of Storage
[Diagram: data servers mount virtual storage as a SCSI device over a Storage Area Network (FCAL, InfiniBand, …); input from Internet/intranet passes through a load-balancing switch; shared data access via Oracle and PROOF; scalability: 200 MB/s sustained.]

12 Storage Elements (M. Gasthuber)
- PNFS = Perfectly Normal FileSystem: stores metadata with the data; 8 hierarchies of file tags.
- Migration of data (hierarchical storage systems): dCache, a development of DESY and Fermilab.
- ACLs, Kerberos, ROOT-aware, web monitoring.
- Cached as well as direct tape access; fail-safe.
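The file tags mentioned above are exposed by PNFS as special ".(tag)(name)" pseudo-files inside each directory. A minimal sketch of reading them, where the mount point and tag name are hypothetical examples:

```python
# Minimal sketch: reading PNFS directory tags, which PNFS exposes as
# ".(tag)(<name>)" pseudo-files inside each directory. The mount point
# and tag name below are hypothetical examples.
from pathlib import Path

pnfs_dir = Path("/pnfs/example.org/data/experiment")

# ".(tags)()" lists the tag pseudo-files defined for this directory.
for line in (pnfs_dir / ".(tags)()").read_text().splitlines():
    print(line)

# Reading a single tag, e.g. a storage-group tag:
print((pnfs_dir / ".(tag)(sGroup)").read_text().strip())
```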

13 Necessary Administration Tools (A. Manabe)
- System (software) installation/update: Dolly++ (image cloning)
- Configuration: Arusha (http://ark.sourceforge.net), LCFGng (http://www.lcfg.org)
- Status monitoring / system health checks:
  - CPU/memory/disk/network utilization: Ganglia (http://ganglia.sourceforge.net), Palantir (http://www.netsonde.com)
  - (Sub-)system service sanity checks: PIKT (http://pikt.org), Pica (http://pica.sourceforge.net/wtf.html), cfengine
- Command execution: WANI, a web-based remote command executor (see the sketch below)
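What a tool like WANI does can be sketched conceptually: fan one command out to many nodes and collect the output per host. WANI itself is a web GUI on Webmin; this is only a rough stand-in, and the hostnames are hypothetical:

```python
# Conceptual sketch of what a remote command executor like WANI does:
# fan one command out to many nodes over ssh and collect stdout/stderr
# per host. Hostnames are hypothetical placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = [f"node{i:03d}.example.org" for i in range(1, 201)]
COMMAND = "uptime"

def run(host):
    try:
        r = subprocess.run(["ssh", "-o", "BatchMode=yes", host, COMMAND],
                           capture_output=True, text=True, timeout=30)
        return host, r.returncode, (r.stdout or r.stderr).strip()
    except subprocess.TimeoutExpired:
        return host, -1, "timeout"

with ThreadPoolExecutor(max_workers=32) as pool:
    for host, rc, output in pool.map(run, NODES):
        status = "OK " if rc == 0 else "ERR"
        print(f"{status} {host}: {output}")
```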

14 WANI is implemented on the Webmin GUI.
[Screenshot: command input field, node selection, start button.]

15 Command Execution Result
[Screenshot: per-host results; output from 200 nodes on one page.]

16 [Screenshot: clicking a host entry expands its stdout and stderr output.]

17 CPU Scalability
- The current tools scale up to ~1000 CPUs (in the previous example, 10,000 CPUs would require checking 50 pages).
- Autonomous operation is required: intelligent, self-healing clusters (a conceptual sketch follows).
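A toy illustration of the autonomous-operation idea: instead of a person paging through results, a watchdog checks every node and acts only on failures. Everything here (hostnames, the health check, the repair action) is a hypothetical placeholder, not a description of Project Eliza:

```python
# Toy sketch of autonomous, self-healing operation: check every node,
# surface only failures, and trigger a repair action instead of asking a
# human to page through thousands of results. All names are placeholders.
import subprocess

NODES = [f"node{i:03d}.example.org" for i in range(1, 101)]

def healthy(host):
    # Placeholder health check: a single ping (Linux ping flags); a real
    # system would also test daemons, disks, load, etc.
    return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          capture_output=True).returncode == 0

def repair(host):
    # Placeholder repair action, e.g. drain the node from the batch
    # queue and schedule a reboot.
    print(f"scheduling repair of {host}")

for host in NODES:
    if not healthy(host):
        repair(host)      # only failures surface; healthy nodes stay silent
```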

18 Resource Scheduling
- Problem: how do we access local resources from the Grid? Local batch queues vs. global batch queues (a toy sketch of the trade-off follows).
- Extension of Dynamite (University of Amsterdam) to work with Globus: Dynamite-G (I. Shoshmina).
- Open question: how do we deal with interactive applications on the Grid?
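One way to picture the local-vs-global question is a dispatcher that prefers the local batch queue and falls back to a Grid gatekeeper. The commands mirror common tools of the period (PBS `qsub`, Globus Toolkit `globus-job-submit`), but the gatekeeper host, queue policy, and script name are hypothetical placeholders, not part of Dynamite-G:

```python
# Toy sketch of the local-vs-global batch queue question: try the local
# batch system first, otherwise forward the job to a Grid gatekeeper.
# Hostnames, policy, and script name are hypothetical placeholders.
import subprocess

def local_slots_free():
    # Placeholder policy check; a real scheduler would query the batch system.
    return False

def submit(job_script):
    if local_slots_free():
        subprocess.run(["qsub", job_script], check=True)
    else:
        subprocess.run(["globus-job-submit",
                        "gatekeeper.example.org/jobmanager-pbs",
                        "/bin/sh", job_script], check=True)

submit("analysis.sh")
```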

19 Conclusions
- A lot of tools exist.
- A lot of work remains to be done in the fabric area in order to get reliable, scalable systems.

