JLAB Computing Facilities Development
Ian Bird, Jefferson Lab
2 November 2001

2 Jefferson Lab Mass Storage & Farms, August 2001 (architecture diagram)
- Reconstruction & Analysis Farm: 350 Linux CPUs, ~10 K SPECint95; batch system: LSF + local Java layer + web interface
- Lattice QCD cluster(s): 40 Alpha Linux now, 256 P4 Linux (~Mar 02) - 0.5 TFlops; batch system: PBS + web portal
- Tape storage system: STK silos with Redwood and 9940 drives; 10 data movers (DM1-DM10, Solaris and Linux) with ~300 GB buffer each (rough buffer arithmetic below), connected by Gigabit Ethernet or Fibre Channel; software - JASMine
- Disk: unmanaged disk pools, 15 TB experiment cache pools, 2 TB farm cache, 0.5 TB LQCD cache pool; the cache pools are JASMine-managed mass storage sub-systems
- Clients access the farms and the JASMine-managed storage over the lab network
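To put the data-mover buffers in perspective, here is a rough, purely illustrative calculation of how long one ~300 GB staging buffer lasts when drained at roughly Gigabit Ethernet speed. Only the 300 GB buffer size and the link type come from the slide; the ~100 MB/s sustained network rate and the ~10 MB/s per-drive tape rate are assumptions.

```python
# Rough staging-buffer arithmetic for one JASMine data mover.
# The 300 GB buffer size and Gigabit Ethernet link are from the slide;
# the sustained network and tape-drive rates below are assumptions.

BUFFER_GB = 300          # per data mover (from the slide)
NET_MB_S = 100           # assumed sustained rate over Gigabit Ethernet
TAPE_MB_S = 10           # assumed sustained rate of one tape drive

buffer_mb = BUFFER_GB * 1000

drain_minutes = buffer_mb / NET_MB_S / 60
fill_hours_one_drive = buffer_mb / TAPE_MB_S / 3600

print(f"Time to drain a 300 GB buffer at {NET_MB_S} MB/s: {drain_minutes:.0f} min")
print(f"Time to fill it from one tape drive at {TAPE_MB_S} MB/s: "
      f"{fill_hours_one_drive:.1f} h")
```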

3 Tape storage
Current
- 2 STK silos (12,000 tape slots)
- 28 drives, including 8 Redwood; the Redwoods are to be replaced by 10 more 9940 drives
Outlook
- (Conservative?) tape roadmap has > 500 GB tapes by FY06 at speeds of >= 60 MB/s (back-of-the-envelope numbers below)
- The FNAL model (expensive ADIC robots + lots of commodity drives) does not work - they are moving to STK
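The roadmap figures quoted above (> 500 GB cartridges at >= 60 MB/s, silos with 12,000 slots) translate into the following back-of-the-envelope numbers; this is simply arithmetic on the slide's own figures.

```python
# Back-of-the-envelope numbers for the tape roadmap quoted on this slide:
# >500 GB cartridges at >=60 MB/s by FY06, in silos totalling 12,000 slots.

TAPE_GB = 500            # roadmap cartridge capacity (from the slide)
DRIVE_MB_S = 60          # roadmap drive speed (from the slide)
SLOTS = 12_000           # total slots in the two STK silos (from the slide)

hours_per_tape = TAPE_GB * 1000 / DRIVE_MB_S / 3600
silo_capacity_pb = SLOTS * TAPE_GB / 1e6

print(f"Streaming one {TAPE_GB} GB tape at {DRIVE_MB_S} MB/s: {hours_per_tape:.1f} h")
print(f"Two fully loaded silos at {TAPE_GB} GB/slot: ~{silo_capacity_pb:.0f} PB near-line")
```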

4 Disk storage
Current
- ~30 TB of disk: a mix of SCSI and IDE disk on Linux servers
- ~1 TB per dual-CPU server with a Gigabit interface - matches load, I/O, and network throughput
- IDE costs ~$10K/TB, with performance as good as SCSI
Outlook
- This model scales by a small factor (10? but not 100?)
- Need a reliable global filesystem (not NFS)
- Tape costs will remain a factor of ~5 cheaper than disk for some time: a fully populated silo with 10 drives today is ~$2K/TB, disk ~$10K/TB
- Investigations in hand to consider large disk farms to replace tape; issues are power, heat, manageability, error rates
Consider
- Compute more, store less: store metadata and re-compute data as needed rather than storing and moving it; computing is (and will increasingly be) cheaper than storage (toy cost comparison below)
- Good for e.g. Monte Carlo - generate as needed on modest-sized (but very powerful) farms
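A toy cost comparison based on the $/TB figures on this slide. The dataset size, CPU-days, and cost per CPU-day used for the "compute more, store less" line are invented for illustration only and are not from the presentation.

```python
# Storage-cost comparison using the figures on this slide ($/TB for a
# fully populated silo vs. IDE disk), plus a toy "compute more, store less"
# break-even. The regeneration numbers are purely illustrative assumptions.

TAPE_COST_PER_TB = 2_000     # fully populated silo with 10 drives (slide)
DISK_COST_PER_TB = 10_000    # IDE disk on Linux servers (slide)

dataset_tb = 20              # hypothetical Monte Carlo dataset size
disk_cost = dataset_tb * DISK_COST_PER_TB
tape_cost = dataset_tb * TAPE_COST_PER_TB

# Toy regeneration cost: assume the dataset can be re-made on a farm that
# is already paid for, so the marginal cost is roughly a small cost per
# CPU-day (power, operations).
cpu_days_to_regenerate = 2_000        # assumption
cost_per_cpu_day = 1.0                # assumption (dollars)
regen_cost = cpu_days_to_regenerate * cost_per_cpu_day

print(f"Store {dataset_tb} TB on disk : ${disk_cost:,.0f}")
print(f"Store {dataset_tb} TB on tape : ${tape_cost:,.0f} "
      f"(~{DISK_COST_PER_TB / TAPE_COST_PER_TB:.0f}x cheaper than disk)")
print(f"Regenerate on demand          : ~${regen_cost:,.0f} (toy assumptions)")
```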

5 Clusters
Current
- Farm: 350 Linux CPUs; latest purchase: 2 dual 1 GHz systems in a 1U box (i.e. 4 CPUs); expect modest expansion over the next few years (up to 500 CPUs?)
- LQCD: ~40 Alphas now, 256 P4s in FY02, growth to 500-1000 CPUs in 5 years (goal is 10 TFlops) - see the Moore's-law arithmetic below
- We know how to manage systems of this complexity with relatively few people
Outlook
- Moore's law (still) works - expect raw CPU to remain cheap
- The issues will become power and cooling
- Several "server blade" systems are being developed using Transmeta (low-power) chips - a 3U rack backplane with 10 dual systems slotted in - the prospect of even denser compute farms
- MC farm on your desk? - generate on demand
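A quick sanity check of the LQCD growth plan: starting from ~0.5 TFlops on 256 P4s in FY02, how much of the 10 TFlops goal does per-CPU improvement alone deliver in five years? The 18-month doubling time is the usual Moore's-law rule of thumb, not a number from the slide.

```python
# How far does Moore's law alone get the LQCD cluster from ~0.5 TFlops
# (256 P4 nodes, FY02) toward the stated 10 TFlops goal in ~5 years?

START_TFLOPS = 0.5
GOAL_TFLOPS = 10.0
YEARS = 5
DOUBLING_YEARS = 1.5     # assumed Moore's-law doubling time

per_cpu_gain = 2 ** (YEARS / DOUBLING_YEARS)          # ~10x per CPU
needed_gain = GOAL_TFLOPS / START_TFLOPS              # 20x overall
extra_cpus_factor = needed_gain / per_cpu_gain        # ~2x more CPUs

print(f"Per-CPU speedup over {YEARS} years : ~{per_cpu_gain:.1f}x")
print(f"Total speedup needed               : {needed_gain:.0f}x")
print(f"Remaining factor from more CPUs    : ~{extra_cpus_factor:.1f}x "
      f"(consistent with growing 256 -> 500-1000 CPUs)")
```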

6 Intel Linux Farm (photos)
- First purchases: 9 duals per 24" rack
- Last summer: 16 duals (2U) plus cache disk (8U) per 19" rack
- Recently: 5 TB IDE cache disk (5 x 8U) per 19" rack

7 LQCD Clusters (photos)
- 16 single Alpha 21264 nodes
- Dual Alpha nodes (Linux Networks), 2000

8 Networks
Current
- Machine room & campus backbone is all Gigabit Ethernet; 100 Mbit to desktops
- Expect affordable 10 Gb in 1-2 years
- WAN (ESnet) is OC-3 (155 Mb/s)
Outlook
- Less clear - expect at least 10 Gb and probably another generation (100 Gb?) by Hall D
- Expect ESnet to be >= OC-12 (622 Mb/s)
- Would like WAN speeds to be comparable to LAN speeds for successful distributed (grid) computing models (transfer-time arithmetic below)
- We are involved in an ESnet/Internet2 task force to ensure bandwidth is sufficient on LHC (= Hall D) timescales
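To see why WAN bandwidth matters for a distributed computing model, the sketch below estimates the time to move 1 TB over the links mentioned on this slide. The link rates come from the slide; the 70% usable-throughput factor is an assumption to account for protocol overhead.

```python
# Time to move 1 TB over the WAN/LAN links mentioned on this slide.

LINKS_MBPS = {
    "OC-3 (ESnet today)": 155,
    "OC-12": 622,
    "Gigabit Ethernet": 1000,
    "10 Gigabit": 10_000,
}
EFFICIENCY = 0.7          # assumed fraction of line rate actually achieved
DATA_TB = 1.0

data_bits = DATA_TB * 8e12
for name, mbps in LINKS_MBPS.items():
    hours = data_bits / (mbps * 1e6 * EFFICIENCY) / 3600
    print(f"{name:22s}: {hours:6.1f} h per TB")
```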

9 Facilities
Current
- The Computer Center is close to full - especially with the LQCD cluster
New Building
- Approved (CD-0) to start design in FY03
- Expect construction in FY04, occupation in FY05?
- An extension to CEBAF Center; it will include a 10,000 ft² machine room (the current room is < 3,000 ft² and full)
- Will leave the 2 silos in place, but move other equipment
- Designed to be extensible if needed
- Need this space to allow growth and sufficient cooling - there is now a factor 2-5 gap between computing power densities and cooling abilities (illustrated below)
- The building will also provide office space for staff
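A purely illustrative version of the power-versus-cooling gap mentioned above: every input (watts per node, nodes per rack, floor area per rack, cooling capacity per square foot) is an assumed round number, chosen only to show how a factor of a few arises; none of these figures come from the presentation.

```python
# Illustration of the "factor 2-5 gap" between rack power density and
# typical machine-room cooling quoted on this slide. All inputs here are
# illustrative assumptions, not figures from the presentation.

WATTS_PER_NODE = 200      # assumed dual-CPU 1U server
NODES_PER_RACK = 40       # assumed
FT2_PER_RACK = 20         # assumed rack footprint plus aisle share
COOLING_W_PER_FT2 = 100   # assumed conventional machine-room cooling

rack_kw = WATTS_PER_NODE * NODES_PER_RACK / 1000
density = WATTS_PER_NODE * NODES_PER_RACK / FT2_PER_RACK

print(f"Heat load per densely packed rack : {rack_kw:.1f} kW")
print(f"Power density                     : {density:.0f} W/ft^2")
print(f"Gap vs. assumed cooling capacity  : ~{density / COOLING_W_PER_FT2:.1f}x")
```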

10 Software
Mass storage software
- JASMine - written at JLab, designed with Hall D data rates in mind
- Fully distributed & scalable - 100 MB/s today, limited only by the number and speed of drives
- Will be part of the JLab Grid software - the cache manager component works remotely
- JLAB-FSU demo system under construction
Batch software
- Farm: LSF with a Java layer
- LQCD: PBS with a web portal
- Merge these technologies and provide grid portal access to compute and storage resources: built on Condor-G, Globus, SRB, and JLab web services as part of the PPDG collaboration (a minimal portal-layer sketch follows)
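As a rough illustration of the portal idea, here is a minimal sketch of a thin layer that submits the same job either to LSF (the farm) or to PBS (the LQCD cluster). The function, queue name, and example job are hypothetical; only the underlying bsub and qsub commands correspond to the real batch systems, and this is not the actual JLab portal or JASMine code.

```python
# Minimal sketch of a portal-style submission layer: one interface that
# hands a job either to LSF (farm) or PBS (LQCD cluster). Hypothetical
# wrapper; requires bsub/qsub on the PATH to actually run.

import subprocess
import tempfile

def submit(command: str, backend: str = "lsf", queue: str = "production") -> str:
    """Submit a shell command to the chosen batch system and return its output."""
    if backend == "lsf":
        # LSF: pass the command directly to bsub.
        result = subprocess.run(
            ["bsub", "-q", queue, command],
            capture_output=True, text=True, check=True,
        )
    elif backend == "pbs":
        # PBS: qsub expects a job script, so write one to a temporary file.
        with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
            script.write("#!/bin/sh\n")
            script.write(f"#PBS -q {queue}\n")
            script.write(command + "\n")
            path = script.name
        result = subprocess.run(
            ["qsub", path], capture_output=True, text=True, check=True,
        )
    else:
        raise ValueError(f"unknown backend: {backend}")
    return result.stdout.strip()

if __name__ == "__main__":
    # Hypothetical example job; needs an LSF installation to succeed.
    print(submit("echo reconstruct run 12345", backend="lsf"))
```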

11 Summary
- The technology and facilities outlook is good
- The Hall D computing goals will be readily achievable
- Actual facilities design and ramp-up must be driven by a well-founded Hall D computing model
  - The computing model should be based on a distributed system
  - Make use of appropriate technologies
- The design of the computing model needs to be started now!