1  JLAB Computing Facilities Development
Ian Bird, Jefferson Lab, 2 November 2001

2  Jefferson Lab Mass Storage & Farms, August 2001
Reconstruction & Analysis Farm: 350 Linux CPU, ~10 K SPECint95; batch system: LSF + local Java layer + web interface
Lattice QCD cluster(s): 40 Alpha Linux, 256 P4 Linux (~Mar 02) – 0.5 Tflop; batch system: PBS + web portal
Tape storage system: 12,000-slot STK silos; 8 Redwood, 10 9940, 10 9840 drives
10 data movers (Solaris, Linux) with ~300 GB buffer each, connected by Gigabit Ethernet or Fibre Channel; software – JASMine
JASMine-managed mass storage sub-systems: 15 TB experiment cache pools, 2 TB farm cache, 0.5 TB LQCD cache pool; plus 10 TB unmanaged disk pools
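
The slide's storage architecture puts managed disk cache pools in front of the tape system, with data movers doing the staging. The Python sketch below only illustrates that cache-in-front-of-tape pattern; the paths, the stage_from_tape helper, and open_dataset are hypothetical and are not JASMine's actual interface.

```python
# Minimal sketch (not JASMine's actual API) of the cache-in-front-of-tape
# pattern in the diagram: managed disk pools serve reads, and a miss triggers
# a staging request through a data mover.  Paths and helpers are hypothetical.
import os
import shutil

CACHE_ROOT = "/cache/experiment"   # stand-in for the 15 TB experiment cache pools
TAPE_ROOT = "/tape"                # stand-in for the STK silo namespace

def stage_from_tape(tape_path: str, cache_path: str) -> None:
    """Simulate staging a file from tape into the disk cache."""
    # A real system would queue this on a data mover with a ~300 GB buffer;
    # here a plain copy stands in for the tape read.
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    shutil.copy(tape_path, cache_path)

def open_dataset(logical_name: str):
    """Return a file handle, staging from tape only on a cache miss."""
    cache_path = os.path.join(CACHE_ROOT, logical_name)
    if not os.path.exists(cache_path):                        # cache miss
        stage_from_tape(os.path.join(TAPE_ROOT, logical_name), cache_path)
    return open(cache_path, "rb")                             # hit: local read
```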

3  Tape storage
Current
–2 STK silos (12,000 tape slots)
–28 drives: 8 Redwood, 10 9840, 10 9940; the Redwoods to be replaced by 10 more 9940s in FY02
–9940 tapes are 60 GB @ 10 MB/s
Outlook
–(Conservative?) tape roadmap has > 500 GB tapes by FY06 at speeds of >= 60 MB/s
–FNAL model (expensive ADIC robots + lots of commodity drives) does not work – they are moving to STK + 9940s
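
A quick back-of-the-envelope check of what these numbers imply, assuming every slot holds a 9940 cartridge and all 9940 drives stream at full rate:

```python
# Rough capacity/throughput check using the figures on this slide.
slots = 12_000                 # two STK silos
tape_capacity_gb = 60          # 9940 cartridge, 60 GB @ 10 MB/s
drives_9940 = 10 + 10          # current 10 plus the 10 replacing the Redwoods
drive_rate_mb_s = 10

total_capacity_tb = slots * tape_capacity_gb / 1000
aggregate_rate_mb_s = drives_9940 * drive_rate_mb_s

print(f"silo capacity today : ~{total_capacity_tb:.0f} TB")        # ~720 TB
print(f"aggregate 9940 rate : ~{aggregate_rate_mb_s} MB/s")        # ~200 MB/s

# FY06 roadmap point from the slide: 500 GB tapes in the same slots
print(f"FY06 silo capacity  : ~{slots * 500 / 1000:.0f} TB")       # ~6 PB
```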

4  Disk storage
Current
–~30 TB of disk, a mix of SCSI and IDE on Linux servers
–~1 TB per dual-CPU server with a Gigabit interface – matches load, I/O, and network throughput
–IDE costs ~$10K/TB, with performance as good as SCSI
Outlook
–This model scales by a small factor (10? but not 100?)
–Need a reliable global filesystem (not NFS)
–Tape will remain ~ a factor 5 cheaper than disk for some time: a fully populated silo with 10 drives is ~$2K/TB today, disk ~$10K/TB
–Investigations in hand to consider large disk farms replacing tape; issues are power, heat, manageability, error rates
Consider
–Compute more, store less: store metadata and re-compute data as needed rather than storing and moving it; computing is (and will increasingly be) cheaper than storage
–Good for e.g. Monte Carlo – generate as needed on modest-sized (but very powerful) farms
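
The cost figures above reduce to a simple comparison; this sketch just restates the slide's ~$2K/TB vs ~$10K/TB numbers for a hypothetical dataset size (the 100 TB is an illustrative assumption, not a figure from the talk):

```python
# Rough cost comparison using the slide's 2001 figures.
tape_cost_per_tb = 2_000      # fully populated silo with 10 drives, ~$2K/TB
disk_cost_per_tb = 10_000     # IDE on Linux servers, ~$10K/TB

dataset_tb = 100              # hypothetical experiment dataset
print(f"on tape : ${dataset_tb * tape_cost_per_tb:,}")
print(f"on disk : ${dataset_tb * disk_cost_per_tb:,}")
print(f"ratio   : {disk_cost_per_tb / tape_cost_per_tb:.0f}x")   # the 'factor 5' on the slide
```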

5  Clusters
Current
–Farm: 350 Linux CPU; latest purchase is 2 dual 1 GHz systems in a 1u box (i.e. 4 CPU); expect modest expansion over the next few years (up to 500 CPU?)
–LQCD: ~40 Alpha now, 256 P4 in FY02, growth to 500–1000 CPU in 5 years (goal is 10 TFlop)
–We know how to manage systems of this complexity with relatively few people
Outlook
–Moore's law (still works) – expect raw CPU to remain cheap
–The issues will become power and cooling
–Several "server blade" systems being developed using Transmeta (low-power) chips – a 3u rack backplane with 10 dual systems slotted in – prospect of even denser compute farms
–MC farm on your desk? – generate on demand
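
To make the Moore's-law point concrete, here is a hedged projection sketch; the 18-month doubling time and the idea of a fixed yearly spend are assumptions for illustration, not figures from the talk:

```python
# Illustrative projection: if CPU price/performance doubles every ~18 months,
# how much capacity does the same spend buy in later years?
doubling_months = 18
base_year = 2001
base_specint95 = 10_000          # ~10 K SPECint95 for today's 350-CPU farm

for year in range(base_year, base_year + 6):
    growth = 2 ** ((year - base_year) * 12 / doubling_months)
    print(f"{year}: same spend buys ~{base_specint95 * growth:,.0f} SPECint95")
```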

6  Intel Linux Farm
–First purchases: 9 duals per 24" rack
–Last summer: 16 duals (2u) + 500 GB cache (8u) per 19" rack
–Recently: 5 TB IDE cache disk (5 x 8u) per 19" rack

7  LQCD Clusters
–16 single Alpha 21264, 1999
–12 dual Alpha (Linux Networks), 2000

8  Networks
Current
–Machine room & campus backbone is all Gigabit Ethernet; 100 Mbit to desktops
–Expect affordable 10 Gb in 1-2 years
–WAN (ESnet) is OC-3 (155 Mb/s)
Outlook
–Less clear – expect at least 10 Gb and probably another generation (100 Gb?) by Hall D
–Expect ESnet to be >= OC-12 (622 Mb/s)
–Would like WAN speeds comparable to LAN speeds for successful distributed (grid) computing models
–We are involved in the ESnet/Internet2 task force to ensure bandwidth is sufficient on LHC (= Hall D) timescales
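
For scale, the sketch below works out the idealized time to move 1 TB across the link speeds quoted on the slide, treating the line rate as sustained throughput with no protocol overhead:

```python
# Idealized transfer time for 1 TB over each link on the slide.
links_mbit_s = {
    "desktop 100 Mbit": 100,
    "OC-3 WAN (155 Mb/s)": 155,
    "OC-12 WAN (622 Mb/s)": 622,
    "Gigabit Ethernet LAN": 1_000,
    "10 Gigabit Ethernet": 10_000,
}

tb_bits = 1e12 * 8
for name, rate in links_mbit_s.items():
    hours = tb_bits / (rate * 1e6) / 3600
    print(f"{name:22s}: {hours:6.1f} h per TB")
```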

9  Facilities
Current
–Computer Center is close to full – especially with the LQCD cluster
New building
–Approved (CD-0) to start design in FY03
–Expect construction in FY04, occupation in FY05?
–Extension to CEBAF Center; will include a 10,000 ft² machine room (current room is < 3,000 ft² and full)
–Will leave the 2 silos in place, but move other equipment
–Designed to be extensible if needed
–Need this space to allow growth and sufficient cooling (there is now a factor 2-5 gap between computing power densities and cooling abilities)
–Building will also provide space for ~150-200 people
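
To illustrate why the gap between rack power density and room cooling drives the new building, here is a purely illustrative calculation; every number in it (rack count, kW per rack, cooling capacity per ft²) is an assumption chosen to fall in the slide's quoted 2-5x range, not a measured figure:

```python
# Illustrative only: heat load of a densely packed room vs. its cooling capacity.
racks = 40                          # hypothetical machine-room layout
kw_per_rack = 8                     # assumed for dense 1u dual-CPU racks of the era
cooling_kw_per_1000_ft2 = 30        # assumed cooling capacity of the old room
room_ft2 = 3_000                    # "current room is < 3,000 ft2"

heat_load_kw = racks * kw_per_rack
cooling_kw = cooling_kw_per_1000_ft2 * room_ft2 / 1000
print(f"heat load : {heat_load_kw} kW")
print(f"cooling   : {cooling_kw:.0f} kW")
print(f"gap       : {heat_load_kw / cooling_kw:.1f}x")   # lands in the quoted 2-5x range
```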

10  Software
Mass storage software
–JASMine – written at JLAB, designed with Hall D data rates in mind
–Fully distributed & scalable – 100 MB/s today, limited only by the number and speed of drives
–Will be part of the JLAB Grid software – the cache manager component works remotely
–Demo system JLAB-FSU under construction
Batch software
–Farm: LSF with a Java layer
–LQCD: PBS with a web portal
–Merge these technologies and provide grid portal access to compute and storage resources: built on Condor-G, Globus, SRB, and JLAB web services as part of the PPDG collaboration
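
As a sketch of the "merge these technologies" direction, the wrapper below fronts both batch systems with one call. The bsub (LSF) and qsub (PBS) commands are real, but the wrapper and its options are hypothetical; this is not the lab's Java layer, web portal, or PPDG grid software.

```python
# Hypothetical single submission interface over the two batch systems on the slide.
import subprocess

def submit(system: str, script: str, queue: str = "production") -> str:
    """Submit a job script to LSF or PBS and return the scheduler's output."""
    if system == "lsf":
        cmd = ["bsub", "-q", queue, script]        # farm: LSF behind a Java layer
    elif system == "pbs":
        cmd = ["qsub", "-q", queue, script]        # LQCD: PBS behind a web portal
    else:
        raise ValueError(f"unknown batch system: {system}")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()                   # e.g. the job-id line

# A grid portal (Condor-G / Globus on top) would call something like this
# on the user's behalf after authenticating and locating the data.
```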

11  Summary
Technology and facilities outlook is good
The Hall D computing goals will be readily achievable
Actual facilities design and ramp-up must be driven by a well-founded Hall D computing model
–The computing model should be based on a distributed system
–Make use of appropriate technologies
The design of the computing model needs to be started now!
