High Throughput Linux Clustering at Fermilab -- Steven C. Timm, Fermilab (6/26/01)

Presentation transcript:

Slide 1: High Throughput Linux Clustering at Fermilab. Steven C. Timm, Fermilab.

Slide 2: Outline. Computing problems in high energy physics; clusters at Fermilab; hardware configuration; software management tools; future plans.

Slide 3: Fermilab. Located in Batavia, IL; since 1972, home of the highest-energy accelerator in the world.

Slide 4: Accelerator. Collides protons and antiprotons at 2 TeV.

Slide 5: Coarse Parallelism. Basic idea: each "event" is independent; the code doesn't vectorize well or need SMP; 1000's of instructions per byte of I/O; so we need lots of small, cheap computers. Have used VAX, MC68020, IBM, and SGI workstations, and now Linux PCs.
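The parallelism here is at the event level: because events are independent, a farm simply hands different events (or files of events) to different worker processes and nodes, with no communication between them. As a minimal sketch of that idea, not Fermilab's production code, the hypothetical reconstruct_event function below is fanned out across one node's CPUs with a Python process pool:

```python
# Hypothetical sketch of event-level ("coarse") parallelism on one worker node.
# reconstruct_event() stands in for the real, CPU-heavy reconstruction code.
from multiprocessing import Pool

def reconstruct_event(raw_event):
    """Placeholder for reconstruction: thousands of instructions per byte of I/O."""
    checksum = 0
    for byte in raw_event:
        checksum = (checksum * 31 + byte) % 1_000_003  # stand-in for real work
    return checksum

if __name__ == "__main__":
    # In production the events would come from tape or the I/O node;
    # here we just fabricate some dummy raw events.
    events = [bytes([i % 256] * 1024) for i in range(1000)]

    # Each event is independent, so a simple process pool is enough:
    # no vectorization, no shared memory, no inter-event communication.
    with Pool() as pool:
        results = pool.map(reconstruct_event, events)

    print(f"Reconstructed {len(results)} events")
```

Scaling beyond one node is the same picture, with a batch system (FBS, described later) handing whole files of events to different machines.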

Slide 6: Types of Computing at Fermilab. Simulation of detector response; data acquisition; event reconstruction; data analysis; theory calculations (Beowulf-like). Linux clusters are used in all of the above!

Slide 7: Physics Motivation. Three examples: a fixed-target experiment (~1999), a collider experiment (running now), and the CMS experiment (running 5 years in the future).

Slide 8: Fermilab E871. Called HyperCP; ran in 1997 and 1999. Few particles per event; 10 billion events written to tape, 5 GB per tape; more than 100 TB of data! Analysis recently completed, taking about 1 yr.
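For scale, the stated numbers imply roughly 20,000 tapes (100 TB at 5 GB per tape) and about 10 kB per event (100 TB spread over 10 billion events).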

Slide 9: Run II Collider Experiments. CDF and D0, just starting to run now. Expected data rate of 1 TB/day, with many tracks per event. Goal: to reconstruct events as fast as they come in.
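In concrete terms, keeping up with 1 TB/day means sustaining roughly 10^12 bytes / 86,400 s, or about 12 MB/s of reconstruction throughput around the clock.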

Slide 10: CDF Detector.

Slide 11: Mass Storage System. 1 PB-capacity tape robot (ADIC); Mammoth tape drives at 11 MB/s; two tape drives per Linux PC; Unix-like filespace to keep track of files. Network-attached storage can deliver up to 100 MB/s throughput.

Slide 12: Mass Storage System.

Slide 13: Reconstruction Farm. Five farms currently installed, 340 dual-CPU nodes in all, each with a 50 GB disk and 512 MB RAM. One I/O node: an SGI Origin 2000 with 1 TB of disk, 4 CPUs, and 2 x Gigabit Ethernet. Farms Batch System software coordinates the batch jobs.

Slide 14: Farms I/O Node. SGI Origin 2000, 4 x 400 MHz CPUs, 2 x Gigabit Ethernet, 1 TB disk.

Slide 15: Farm Workers. Dual PIII nodes with 50 GB disk and 512 MB RAM.

Slide 16: Farm Workers. 2U dual PIII 750 MHz, 50 GB disk, 1 GB RAM.

Slide 17: Data Mining and Analysis Facility. SGI Origin 2000 with 176 processors; 5 terabytes of disk and growing. Used for repetitive analysis of small subsets of data. We wouldn't need the SMP, but it is the easiest way to get a lot of processors near a lot of disk.

Slide 18: CMS Project.

Slide 19: CMS Project. Scheduled to run in 2005 at CERN's LHC (Geneva, Switzerland); Fermilab is managing the US contribution. Expect about 25 collisions per bunch crossing (crossings every 25 ns), each collision producing many particles. 1-10 petabytes of data have to be distributed around the world. Will need a very large number of today's fastest PCs.

Slide 20: Qualified Vendors. We evaluate vendors on hardware reliability, competency in Linux, service quality, and price/performance. Vendors are chosen separately for desktops and for farm workers: 13 companies submitted evaluation units, and five were chosen in each category.

Slide 21: Fermi Linux. Based on Red Hat Linux 6.1 (7.1 coming soon). Adds a number of security fixes and follows all kernel and installer updates. Updates are sent out to ~1000 nodes by autorpm. Qualified vendors ship machines with it preloaded.

Slide 22: ICABOD. The vendor ships the system with the Linux OS loaded. Expect scripts then: reinstall the system if necessary; change the root password and partition disks; configure a static IP address; install Kerberos and ssh keys.
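The ICABOD scripts themselves are not shown in the slides; the sketch below is only a hypothetical illustration of this style of Expect-driven setup, written with Python's pexpect module. The node name, prompt patterns, passwords, and key are assumptions for illustration, and only the root-password and ssh-key steps are shown.

```python
# Hypothetical post-delivery configuration of one farm node, in the spirit of
# the ICABOD Expect scripts. Host name, prompt strings, and commands are
# assumptions for illustration only.
import pexpect

NODE = "fnpc101.example.gov"      # hypothetical worker-node name
OLD_PASSWORD = "factory-default"  # password as shipped by the vendor
NEW_PASSWORD = "new-secret"       # site-chosen root password

child = pexpect.spawn(f"ssh -o StrictHostKeyChecking=no root@{NODE}", timeout=60)
child.expect("assword:")          # matches "Password:" or "password:"
child.sendline(OLD_PASSWORD)
child.expect("# ")                # wait for a root shell prompt

# Change the root password.
child.sendline("passwd root")
child.expect("assword:")
child.sendline(NEW_PASSWORD)
child.expect("assword:")
child.sendline(NEW_PASSWORD)
child.expect("# ")

# Push a site ssh key so later management steps no longer need the password.
child.sendline("mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys << 'EOF'")
child.sendline("ssh-rsa AAAA...site-management-key")
child.sendline("EOF")
child.expect("# ")

child.sendline("exit")
child.close()
print(f"{NODE} configured")
```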

Slide 23: Burn-in. All nodes go through a 1-month burn-in test: load both CPUs (2 x CPU load processes), the disk (Bonnie), and the network. Monitor temperatures and current draw. Reject if more than 2% down time.
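The actual burn-in harness is likewise not shown; the following is a hedged sketch of how a month-long test of one node might be driven, assuming placeholder cpu_burn and net_blast commands for the CPU and network load (those names are invented), Bonnie for the disk, and lm_sensors' sensors command for temperature readout.

```python
# Hypothetical burn-in driver for one node: keep both CPUs, the disk, and the
# network busy, poll health once a minute, and track down time against the 2% limit.
# The load-generator command names below are placeholders, not the real tools.
import subprocess
import time

BURN_IN_SECONDS = 30 * 24 * 3600      # roughly one month
MAX_DOWNTIME_FRACTION = 0.02          # reject the node above 2% down time

def start_load():
    """Launch the load generators; two CPU burners to cover both CPUs."""
    cmds = [
        ["cpu_burn"],                                 # placeholder CPU load, CPU 0
        ["cpu_burn"],                                 # placeholder CPU load, CPU 1
        ["bonnie", "-d", "/scratch", "-s", "2047"],   # disk test (illustrative args)
        ["net_blast", "io-node.example.gov"],         # placeholder network test
    ]
    return [subprocess.Popen(c) for c in cmds]

def node_healthy():
    """Very rough health check: the 'sensors' command runs cleanly."""
    sensors = subprocess.run(["sensors"], capture_output=True, text=True)
    return sensors.returncode == 0

def burn_in():
    procs = start_load()
    down_seconds = 0
    start = time.time()
    try:
        while time.time() - start < BURN_IN_SECONDS:
            # Count a minute of down time if the node looks sick or a load died.
            if not node_healthy() or any(p.poll() is not None for p in procs):
                down_seconds += 60
            time.sleep(60)
    finally:
        for p in procs:
            p.terminate()
    fraction = down_seconds / BURN_IN_SECONDS
    return fraction <= MAX_DOWNTIME_FRACTION, fraction

if __name__ == "__main__":
    accepted, fraction = burn_in()
    print(f"down time {fraction:.1%}:", "accept" if accepted else "reject")
```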

Slide 24: Management Tools.

Slide 25: NGOP Monitor (Display).

Slide 26: NGOP Monitor (Display).

Slide 27: FBSNG. Farms Batch System, Next Generation: allows parallel batch jobs which may be dependent on each other; abstract and flexible resource definition and management; dynamic configuration through an API; web-based interface.
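FBSNG's actual job-description syntax and API are not reproduced here. Purely to illustrate the idea of parallel batch sections with a dependency between them, here is a small stand-alone Python sketch in which a merge step waits for a fan-out of reconstruction jobs; it is not FBSNG code.

```python
# Generic illustration of dependent parallel batch sections (not FBSNG itself):
# a "reconstruct" section fans out over many input files in parallel, and a
# "merge" section runs only after every reconstruct job has finished.
from concurrent.futures import ProcessPoolExecutor

def reconstruct(input_file):
    """Placeholder for one reconstruction batch job."""
    return f"{input_file}.reco"

def merge(reco_files):
    """Placeholder for the dependent merge/bookkeeping job."""
    return f"merged {len(reco_files)} output files"

if __name__ == "__main__":
    input_files = [f"raw_{i:04d}.dat" for i in range(8)]  # hypothetical inputs

    # Parallel section: independent jobs, one per input file.
    with ProcessPoolExecutor() as pool:
        reco_files = list(pool.map(reconstruct, input_files))

    # Dependent section: starts only once the parallel section is complete.
    print(merge(reco_files))
```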

Slide 28: Future Plans. Next level of integration: one "pod" of six racks plus switch, console server, and display. Linux on disk servers for NFS/NIS. "Chaotic" analysis servers and compute farms to replace big SMP boxes. Find an NFS replacement (SAN?). Abandon tape altogether?