1 Computing & Networking User Group Meeting
Roy Whitney, Andy Kowalski, Sandy Philpott, Chip Watson
17 June 2008
2 Users and JLab IT
Ed Brash is the User Group Board of Directors' representative on the IT Steering Committee.
Physics Computing Committee (Sandy Philpott)
Helpdesk and CCPR requests and activities
Challenges
– Constrained budget (staffing, aging infrastructure)
– Cyber Security
3 Computing and Networking Infrastructure
Andy Kowalski
4 CNI Outline
Helpdesk
Computing
Wide Area Network
Cyber Security
Networking and Asset Management
5 Helpdesk
Hours: 8am-12pm, M-F
– Submit a CCPR via http://cc.jlab.org/
– Dial x7155
– Send email to helpdesk@jlab.org
Windows XP, Vista and RHEL5 Supported Desktops
– Migrating older desktops
Mac Support?
6 Computing
Email Servers Upgraded
– Dovecot IMAP Server (Indexing)
– New File Server and IMAP Servers (Farm Nodes)
Servers Migrating to Virtual Machines
Printing
– Centralized Access via jlabprt.jlab.org
– Accounting Coming Soon
Video Conferencing (working on EVO)
7 Wide Area Network
Bandwidth
– 10 Gbps WAN and LAN backbone
– Offsite Data Transfer Servers (usage sketch below)
  scigw.jlab.org (bbftp)
  qcdgw.jlab.org (bbcp)
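The slide names the two offsite transfer gateways but not how they are invoked. Below is a minimal sketch of driving bbftp and bbcp from Python via subprocess. The host names come from the slide; the account name, file paths, and tuning flags (parallel streams, TCP window) are illustrative assumptions, not JLab-prescribed values.

```python
# Hedged sketch: push one file offsite through each gateway named on the slide.
# scigw.jlab.org / qcdgw.jlab.org are from the slide; user, paths, and tuning
# options are placeholders chosen only for illustration.
import subprocess

USER = "jlabuser"                 # placeholder account
LOCAL = "/scratch/run42.evio"     # placeholder local file
REMOTE = "/dest/run42.evio"       # placeholder remote path

# bbftp: transfer commands are passed with -e; -p sets the number of parallel streams.
subprocess.run(
    ["bbftp", "-u", USER, "-p", "4", "-e", f"put {LOCAL} {REMOTE}", "scigw.jlab.org"],
    check=True,
)

# bbcp: source then user@host:target; -s sets streams, -w the TCP window size.
subprocess.run(
    ["bbcp", "-s", "8", "-w", "2M", LOCAL, f"{USER}@qcdgw.jlab.org:{REMOTE}"],
    check=True,
)
```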
8 Cyber Security Challenge
The threat: the sophistication and volume of attacks continue to increase.
– Phishing attacks: spear phishing and whaling are now being observed at JLab.
Federal requirements, including DOE's, for meeting the cyber security challenge call for additional measures.
JLab uses a risk-based approach that balances accomplishing the mission with addressing the threat.
9 Cyber Security
Managed Desktops
– Skype allowed from managed desktops on certain enclaves
Network Scanning
Intrusion Detection
PII/SUI (CUI) Management
10 Networking and IT Asset Management
Network Segmentation/Enclaves
– Firewalls
Computer Registration
– https://reggie.jlab.org/user/index.php
Managing IP Addresses
– DHCP assigns all IP addresses (most static)
– Integrated with registration (see the sketch after this slide)
Automatic Port Configuration
– Rolling out now
– Uses registration database
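The slide says DHCP assignments are driven by the computer-registration database but does not show the mechanics. Here is a hypothetical sketch of generating static ISC dhcpd host entries from a registration table. The database file, table name, and column names are invented for illustration; the slides do not specify how JLab actually implements this integration.

```python
# Hypothetical sketch: emit ISC dhcpd "host" blocks from a registration database,
# illustrating how registration data could drive (mostly static) DHCP assignments.
# The database file, table, and columns are invented for this example.
import sqlite3

conn = sqlite3.connect("registration.db")  # placeholder registration database
rows = conn.execute(
    "SELECT hostname, mac_address, ip_address FROM registered_hosts"
)

with open("dhcpd.hosts.conf", "w") as out:
    for hostname, mac, ip in rows:
        out.write(
            f"host {hostname} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}\n"
        )
```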
11 Scientific Computing
Chip Watson & Sandy Philpott
12 SciComp Outline
Upgrading the farm
Expanding disk cache and /work
Migrating to a new tape library (silo)
Planning for 12 GeV
LQCD
13 Farm Evolution Motivation
Capacity upgrades
– Re-use of HPC clusters
Movement to Open Source
– O/S upgrade
– Change from LSF to PBS
14 Farm Evolution Timetable
Nov 07: Auger/PBS available – RHEL3, 35 nodes
Jan 08: Fedora 8 (F8) available – 50 nodes
May 08: Friendly-user mode; IFARML4,5
Jun 08: Production – F8 only; IFARML3 + 60 nodes from LSF; IFARML alias
Jul 08: IFARML2 + 60 nodes from LSF
Aug 08: IFARML1 + 60 nodes from LSF
Sep 08: RHEL3/LSF -> F8/PBS migration complete
– No renewal of LSF or RHEL for cluster nodes
15 Farm F8/PBS Differences
Code must be recompiled
– 2.6 kernel
– gcc 4
Software installed locally via yum
– cernlib
– MySQL
Time limits: 1 day default, 3 days max
stdout/stderr to ~/farm_out
Email notification
(A generic submission sketch follows below.)
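To make the limits above concrete, here is a minimal sketch of composing and submitting a PBS job from Python. It uses generic PBS/qsub options for walltime and email notification only; it is not JLab's Auger front end, and the job name, email address, and executable are placeholders.

```python
# Hedged sketch: build and submit a generic PBS job that stays within the limits
# on the slide (1-day default, 3-day maximum walltime) and asks for email on exit.
# Plain qsub usage for illustration only; names and paths are placeholders.
import subprocess

job_script = """#!/bin/bash
#PBS -N recon_demo
#PBS -l walltime=24:00:00
#PBS -m e
#PBS -M user@jlab.org
cd $PBS_O_WORKDIR
./my_analysis   # placeholder executable, recompiled with gcc 4 for the 2.6 kernel
"""

with open("recon_demo.pbs", "w") as f:
    f.write(job_script)

subprocess.run(["qsub", "recon_demo.pbs"], check=True)
```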
16 Farm Future Plans
Additional nodes
– From HPC clusters: CY08: ~120 4g nodes; CY09-10: ~60 6n nodes
– Purchase as budgets allow
Support for 64-bit systems when feasible & needed
17 Storage Evolution
Deployment of Sun x4500 "thumpers"
Decommissioning of Panasas (old /work server)
Planned replacement of old cache nodes
18 Tape Library
Current STK "Powderhorn" silo is nearing end-of-life
– Reaching capacity & running out of blank tapes
– Doesn't support upgrade to higher density cartridges
– Is officially end-of-life December 2010
Market trends
– LTO (Linear Tape Open) standard has proliferated since 2000
– LTO-4 is 4x the density, capacity/$, and bandwidth of 9940B: 800 GB/tape, $100/TB, 120 MB/s
– LTO-5, out next year, will double capacity and deliver 1.5x bandwidth: 1600 GB/tape, 180 MB/s
– LTO-6 will be out prior to the 12 GeV era: 3200 GB/tape, 270 MB/s
19 Tape Library Replacement
Competitive procurement now in progress
– Replace old system, support 10x growth over 5 years
Phase 1 in August
– System integration, software evolution
– Begin data transfers, re-use 9940B tapes
Tape swap through January
2 PB capacity by November (rough tape-count arithmetic below)
DAQ to LTO-4 in January 2009
Old silo gone in March 2009
End result: break even on cost by the end of 2009!
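For scale, the November capacity target can be expressed in media terms using the LTO-4 figures from the previous slide; the tape count and media cost below are back-of-envelope estimates derived from those figures, not numbers quoted in the presentation.

\[
\frac{2\ \text{PB}}{800\ \text{GB/tape}} = \frac{2{,}000{,}000\ \text{GB}}{800\ \text{GB/tape}} = 2500\ \text{LTO-4 tapes},
\qquad
2000\ \text{TB} \times \$100/\text{TB} \approx \$200\text{k of media}.
\]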
20 Long Term Planning
Continue to increase compute & storage capacity in the most cost-effective manner
Improve processes & planning
– PAC submission process
– 12 GeV planning…
21 E.g.: Hall B Requirements (columns span the years 2012-2016)

Event Simulation
– SPECint_rate2006 sec/event: 1.8
– Number of events: 1.00E+12
– Event size (KB): 20
– % stored long term: 10% rising to 25%
– Total CPU (SPECint_rate2006): 5.7E+04
– Petabytes / year: 2, 5, 5, 5, 5

Data Acquisition
– Average event size (KB): 20
– Max sustained event rate (kHz): 0, 0, 10, 20
– Average event rate (kHz): 0, 0, 10
– Average 24-hour duty factor (%): 0%, 50%, 60%, 65%
– Weeks of operation / year: 0, 0, 0, 30
– Network (n x 10GigE): 1, 1, 1, 1, 1
– Petabytes / year: 0.0, 2.2, 2.4

1st Pass Analysis
– SPECint_rate2006 sec/event: 1.5
– Number of analysis passes: 0, 0, 1.5
– Event size out / event size in: 2, 2, 2, 2, 2
– Total CPU (SPECint_rate2006): 0.0E+00, 7.8E-03, 8.4E-03
– Silo bandwidth (MB/s): 0, 0, 900, 1800
– Petabytes / year: 0.0, 4.4, 4.7

Totals
– Total SPECint_rate2006: 5.7E+04
– SPECint_rate2006 / node: 600, 900, 1350, 2025, 3038
– Nodes needed (current year): 95, 63, 42, 28, 19
– Petabytes / year: 2, 5, 5, 12
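The headline numbers in this table follow from straightforward arithmetic; the sketch below reproduces the simulation CPU, long-term storage, and node-count rows as a sanity check. The seconds-per-year constant and the rounding are my assumptions; all other inputs are taken from the rows above.

```python
# Worked check of the Hall B event-simulation requirements quoted in the table.
# Inputs are from the table; the seconds-per-year constant is an assumption.
SECONDS_PER_YEAR = 3.15e7

sec_per_event = 1.8          # SPECint_rate2006 seconds per simulated event
events_per_year = 1.0e12
event_size_kb = 20
stored_fraction = 0.25       # long-term storage fraction (10% early, 25% later)

# Sustained CPU requirement: total SPECint-seconds spread over a year.
total_specint_rate = sec_per_event * events_per_year / SECONDS_PER_YEAR
print(f"Total CPU ~ {total_specint_rate:.1e} SPECint_rate2006")   # ~5.7e+04

# Long-term storage: 1e12 events * 20 KB = 20 PB raw, of which 10-25% is kept.
raw_pb = events_per_year * event_size_kb * 1e3 / 1e15
print(f"Stored ~ {raw_pb * stored_fraction:.0f} PB/year")          # ~5 PB at 25%

# Node counts: total requirement divided by the per-node rating for each year.
for rating in (600, 900, 1350, 2025, 3038):
    print(rating, round(total_specint_rate / rating))               # 95, 63, 42, 28, 19
```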
22 LQCD Computing
JLab operates 3 clusters with nearly 1100 nodes, primarily for LQCD plus some accelerator modeling
National LQCD Computing Project (2006-2009: BNL, FNAL, JLab; USQCD Collaboration)
LQCD II proposal (2010-2014) would double the hardware budget to enable key calculations
JLab Experimental Physics & LQCD computing share staff (operations & software development) and the tape silo, providing efficiencies for both