
Slide 1: High Throughput Linux Clustering at Fermilab
Steven C. Timm, Fermilab

Slide 2: Outline
– Computing problems in High Energy Physics
– Clusters at Fermilab
– Hardware configuration
– Software
– Management tools
– Future plans

Slide 3: Fermilab
Located in Batavia, IL. Home of the world's highest-energy accelerator since 1972.

Slide 4: Accelerator
Collides protons and antiprotons at 2 TeV.

Slide 5: Coarse Parallelism
– Basic idea: each "event" is independent.
– The code doesn't vectorize well or need SMP.
– Thousands of instructions per byte of I/O.
– Need lots of small, cheap computers.
– Have used VAX, MC68020, IBM, and SGI workstations; now Linux PCs.
A minimal code sketch of this event-level parallelism follows.
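
Not from the talk, but to make the coarse-parallelism point concrete: a minimal Python sketch, assuming a hypothetical process_event() as a stand-in for a real reconstruction program. Because events are independent, a pool of workers can process them in any order.

```python
# Sketch of event-level ("coarse") parallelism: every event is independent,
# so a farm of workers can process them in any order. Illustrative only;
# process_event() is a hypothetical stand-in for a reconstruction program.
from multiprocessing import Pool

def process_event(event):
    # CPU-heavy work: thousands of instructions per byte of I/O.
    return sum(hash((event, i)) % 97 for i in range(10_000))

if __name__ == "__main__":
    events = range(1000)           # stand-in for events read off tape
    with Pool() as pool:           # one worker per CPU, like a farm node
        results = pool.map(process_event, events)
    print(len(results), "events processed")
```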

Slide 6: Types of Computing at Fermilab
– Simulation of detector response
– Data acquisition
– Event reconstruction
– Data analysis
– Theory calculations (Beowulf-like)
Linux clusters are used in all of the above!

Slide 7: Physics Motivation
Three examples:
– Fixed-target experiment (~1999)
– Collider experiments (running now)
– CMS experiment (scheduled to run about 5 years in the future)

Slide 8: Fermilab E871
– Called HyperCP; ran in 1997 and 1999.
– 3 particles per event.
– 10 billion events written to tape.
– 22,000 tapes at 5 GB apiece: more than 100 TB of data.
– Analysis recently completed; it took about a year.

Slide 9: Run II Collider Experiments
– CDF and D0, just starting to run now.
– Expected data rate: 1 TB/day.
– 50-100 tracks per event.
– Goal: reconstruct events as fast as they come in (a quick rate check follows).
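
A back-of-the-envelope check (my arithmetic, not a figure from the talk): keeping up with 1 TB/day means a sustained rate of roughly 12 MB/s.

```python
# What sustained throughput does "reconstruct 1 TB/day as it arrives" imply?
# Arithmetic sketch only; the 1 TB/day figure is from the slide above.
TB = 1e12                        # bytes, decimal convention
seconds_per_day = 24 * 3600
rate = TB / seconds_per_day      # bytes per second
print(f"{rate / 1e6:.1f} MB/s sustained")   # ~11.6 MB/s
```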

Slide 10: CDF Detector [image]

Slide 11: Mass Storage System
– 1 PB-capacity tape robot (ADIC).
– Mammoth tape drives, 11 MB/s each.
– Two tape drives per Linux PC.
– Unix-like filespace to keep track of files.
– Network-attached storage; can deliver up to 100 MB/s throughput (see the consistency check below).
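
A rough consistency check of the quoted figures (my arithmetic, not from the talk): with two 11 MB/s Mammoth drives per mover PC, about five mover PCs suffice to saturate the ~100 MB/s aggregate.

```python
# Consistency check of the quoted storage figures; arithmetic sketch only.
import math

drive_rate_mb_s = 11                          # per Mammoth drive (from the slide)
drives_per_pc = 2
pc_rate = drive_rate_mb_s * drives_per_pc     # 22 MB/s per mover PC
print(math.ceil(100 / pc_rate), "mover PCs to reach ~100 MB/s")   # 5
```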

Slide 12: Mass Storage System [image]

Slide 13: Reconstruction Farm
– Five farms currently installed.
– 340 dual-CPU nodes in all, 500-1000 MHz.
– 50 GB disk and 512 MB RAM each.
– One I/O node: SGI Origin 2000, 1 TB disk, 4 CPUs, 2 x Gigabit Ethernet.
– Farms Batch System software coordinates the batch jobs.

Slide 14: Farms I/O Node
– SGI O2200
– 4 x 400 MHz CPUs
– 2 x Gigabit Ethernet
– 1 TB disk

Slide 15: Farm Workers
– 50 nodes, each a dual 500 MHz PIII
– 50 GB disk
– 512 MB RAM

Slide 16: Farm Workers
– 2U dual 750 MHz PIII
– 50 GB disk
– 1 GB RAM

Slide 17: Data Mining and Analysis Facility
– SGI Origin 2000, 176 processors.
– 5 terabytes of disk and growing.
– Used for repetitive analysis of small subsets of data.
– Wouldn't need the SMP, but it is the easiest way to get a lot of processors near a lot of disk.

Slide 18: CMS Project [image]

Slide 19: CMS Project
– Scheduled to run in 2005 at CERN's LHC (Geneva, Switzerland).
– Fermilab is managing the US contribution.
– Every 25 ns (a 40 MHz crossing rate), expect ~25 collisions.
– Each collision makes 50-100 particles.
– 1-10 petabytes of data have to be distributed around the world.
– Will need at least 10,000 of today's fastest PCs.
A scale check of these numbers follows.
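
To put the CMS numbers on one scale (my arithmetic, not from the talk): a 40 MHz crossing rate with ~25 collisions per crossing is about 10^9 collisions per second, i.e. on the order of 10^11 particles per second in the detector.

```python
# Scale check of the CMS figures quoted above; arithmetic sketch only.
crossing_period = 25e-9              # seconds between bunch crossings
collisions_per_crossing = 25
particles_low, particles_high = 50, 100

crossings_per_s = 1 / crossing_period                    # 40 million/s
collisions_per_s = crossings_per_s * collisions_per_crossing
print(f"{collisions_per_s:.0e} collisions/s")            # ~1e+09
print(f"{collisions_per_s * particles_low:.0e} to "
      f"{collisions_per_s * particles_high:.0e} particles/s")
```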

Slide 20: Qualified Vendors
– We evaluate vendors on hardware reliability, competency in Linux, service quality, and price/performance.
– Vendors are chosen separately for desktops and farm workers.
– 13 companies submitted evaluation units; five were chosen in each category.

Slide 21: Fermi Linux
– Based on Red Hat Linux 6.1 (7.1 coming soon).
– Adds a number of security fixes.
– Follows all kernel and installer updates.
– Updates are sent out to ~1000 nodes by AutoRPM (a sketch of the update pull model follows).
– Qualified vendors ship machines with it preloaded.
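
AutoRPM itself is driven by its own configuration files, which the talk does not show. As a heavily hedged illustration of the pull model it implements, here is a Python sketch; the update URL and package-list format are hypothetical stand-ins, not AutoRPM's real interface.

```python
# Illustration of the pull model a tool like AutoRPM implements: each node
# periodically fetches new packages from a central server and installs them.
# NOT AutoRPM's real code or config; URL and list format are hypothetical.
import subprocess
import urllib.request

UPDATE_URL = "http://linux.example.gov/updates/"   # hypothetical server

def update_node():
    # Fetch a plain-text list of RPM filenames to apply (hypothetical format).
    with urllib.request.urlopen(UPDATE_URL + "PACKAGES.txt") as f:
        packages = f.read().decode().split()
    for pkg in packages:
        urllib.request.urlretrieve(UPDATE_URL + pkg, pkg)
        # "rpm -U" upgrades a package, installing it if not yet present.
        subprocess.run(["rpm", "-U", pkg], check=True)

if __name__ == "__main__":
    update_node()
```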

Slide 22: ICABOD
Vendor ships the system with the Linux OS loaded. Expect scripts then:
– Reinstall the system if necessary
– Change the root password, partition disks
– Configure a static IP address
– Install Kerberos and SSH keys
A sketch of one such step follows this list.
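
The real ICABOD tooling is Expect scripts, which the talk does not show. As an analogue, here is a sketch using Python's pexpect library to drive one of the listed steps, changing the vendor's factory root password; the host name and passwords are hypothetical stand-ins.

```python
# Python/pexpect analogue of one ICABOD step: log in to a freshly
# delivered node and change the vendor's default root password.
# Host name and passwords are hypothetical stand-ins.
import pexpect

def set_root_password(host, old_pw, new_pw):
    child = pexpect.spawn(f"ssh root@{host}")
    child.expect("[Pp]assword:")
    child.sendline(old_pw)                 # vendor's factory default
    child.expect("[#$] ")                  # root shell prompt
    child.sendline("passwd")
    child.expect("[Nn]ew.*password:")
    child.sendline(new_pw)
    child.expect("[Rr]etype.*password:")
    child.sendline(new_pw)
    child.expect("[#$] ")
    child.sendline("exit")
    child.close()

if __name__ == "__main__":
    set_root_password("farmnode01.example.gov", "changeme", "n3w-s3cret")
```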

Slide 23: Burn-in
– All nodes go through a one-month burn-in test.
– Load the CPUs (2 x seti@home), the disk (Bonnie), and the network.
– Monitor temperatures and current draw.
– Reject a node if it has more than 2% downtime (the accounting is sketched below).
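
A sketch of the pass/fail accounting: the 2% threshold comes from the slide, while the probe log format is a hypothetical stand-in.

```python
# Burn-in pass/fail accounting: reject a node whose downtime over the
# month exceeds 2%. Only the 2% threshold comes from the talk; the
# probe format is a hypothetical stand-in.
def downtime_fraction(probes):
    """probes: list of booleans, True = node answered the probe."""
    return sum(1 for up in probes if not up) / len(probes)

def burn_in_verdict(probes, threshold=0.02):
    frac = downtime_fraction(probes)
    return ("REJECT" if frac > threshold else "ACCEPT", f"{frac:.1%} down")

if __name__ == "__main__":
    # e.g. one probe every 5 minutes for 30 days = 8640 probes
    probes = [True] * 8500 + [False] * 140
    print(burn_in_verdict(probes))         # ('ACCEPT', '1.6% down')
```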

Slide 24: Management Tools

Slide 25: NGOP Monitor (Display) [screenshot]

Slide 26: NGOP Monitor (Display) [screenshot]

Slide 27: FBSNG
Farms Batch System, Next Generation:
– Allows parallel batch jobs which may depend on each other.
– Abstract and flexible resource definition and management.
– Dynamic configuration through an API.
– Web-based interface.
A generic sketch of the dependency idea follows this list.
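
FBSNG's actual job-description syntax and API are not shown in the talk; the dependency idea itself is generic, and the sketch below illustrates it with Python's standard-library topological sort. The job names are hypothetical.

```python
# Generic illustration of dependent batch jobs (NOT FBSNG's real API):
# run each job only after the jobs it depends on have finished.
from graphlib import TopologicalSorter   # standard library, Python 3.9+

# Hypothetical job graph for one reconstruction pass.
jobs = {
    "stage_tapes": set(),
    "reconstruct": {"stage_tapes"},
    "merge":       {"reconstruct"},
    "write_tape":  {"merge"},
}

def run(job):
    print("running", job)    # stand-in for dispatching to a farm node

for job in TopologicalSorter(jobs).static_order():
    run(job)
```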

Slide 28: Future Plans
– Next level of integration: one "pod" of six racks plus switch, console server, and display.
– Linux on disk servers, for NFS/NIS.
– "Chaotic" analysis servers and compute farms to replace big SMP boxes.
– Find an NFS replacement (SAN?).
– Abandon tape altogether?

