Computing: Jefferson Lab Users Group Meeting, 8 June 2010. Roy Whitney, CIO & CTO


Scientific Computing at JLab
Significant computing capacity for experiments and theory
– 1400 compute servers
– Leading-edge GPU cluster (the single most powerful dedicated LQCD resource)
– Growing repository of 6 GeV data
– Rapidly growing disk capacity and bandwidth: ~600 TB of disk, half on a 20 Gb/s InfiniBand fabric
– Exploiting and developing leading-edge open source tools: Lustre file system, Auger / PBS / Maui batch system, JASMine storage (a generic batch submission sketch follows this slide)
Software developments
– Multi-threading and multi-GPU libraries for high performance computing
– LQCD algorithm development and performance optimizations
– Highly scalable analysis database for LQCD
– System management tools
Planning for 12 GeV
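The farm's batch work flows through Auger on top of PBS with the Maui scheduler. As a rough illustration of that layer only, here is a minimal sketch of a generic PBS submission driven from Python; the job name, queue, resource limits, and analysis command are hypothetical placeholders and do not reproduce Auger's actual interface.

# Minimal sketch of a generic PBS/Maui-style batch submission.
# Queue name, job name, and the analysis command are hypothetical;
# Auger is the lab's real front end and is not reproduced here.
import subprocess

job_script = """#!/bin/bash
#PBS -N farm_analysis
#PBS -q production
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
cd $PBS_O_WORKDIR
./analyze_run --input run12345.evio
"""

with open("analysis.pbs", "w") as f:
    f.write(job_script)

# qsub prints the new job identifier on success
result = subprocess.run(["qsub", "analysis.pbs"],
                        capture_output=True, text=True, check=True)
print("submitted job:", result.stdout.strip())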

Migrate to 64 bit
It is past time! Just do it! Over 50% of the JLab 64-bit farm is idle!
[Charts: utilization of the 32-bit farm and the 64-bit farm]

Scientific Computing – Data Analysis
Batch compute node farm
– Auger software system, Maui/PBS
– 480 (60 x 8) 64-bit cores; 3 GB RAM/core; GigE
  Mostly Intel Nehalems, hyperthreaded; a few AMD systems
– 240 (120 x 2) 32-bit cores; 0.5 GB RAM/core; 100 Mbit
  Half Intel Xeons, hyperthreaded; half Pentium Ds
Migrate completely to 64-bit computing!
– Decommission the oldest 32-bit nodes by fiscal year end: 30 September 2010
– Decommission ALL 32-bit nodes by the end of the calendar year
– Run 64-bit native or in 32-bit emulation mode (a node-check sketch follows this slide)
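A minimal sketch of the kind of check a user might run to confirm that a node and an executable are ready for the 64-bit farm, assuming only the Python standard library and the standard Linux file utility; the binary path is a hypothetical example.

import platform
import subprocess

# Report the node architecture and the Python build.
print("machine:", platform.machine())                # e.g. "x86_64" on the 64-bit farm
print("python build:", platform.architecture()[0])   # "64bit" or "32bit"

# "file" reports whether an executable is a 64-bit or 32-bit ELF binary;
# 32-bit binaries can still run on 64-bit nodes in emulation mode.
binary = "/path/to/analysis_executable"              # hypothetical path
try:
    out = subprocess.run(["file", binary],
                         capture_output=True, text=True, check=True)
    print(out.stdout.strip())
except (OSError, subprocess.CalledProcessError) as err:
    print("could not inspect binary:", err)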

Scientific Computing – Mass Storage
IBM TS3500 Tape Library
– 8 LTO-4 tape drives; 800 GB uncompressed per cartridge
– 4 LTO-5 tape drives; 1500 GB uncompressed per cartridge
– 5500 media slots (a capacity arithmetic sketch follows this slide)
– Over 4 PB stored currently
Still plan to eject the oldest unused media
All MSS data remains available on tape, through copies to current media at each major technology leap
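For scale, a back-of-the-envelope calculation of the library's raw capacity using the uncompressed cartridge sizes from this slide, assuming every slot is filled with a single media generation:

# Raw TS3500 capacity if all 5500 slots held one media generation
# (uncompressed capacities from the slide).
SLOTS = 5500
LTO4_TB = 0.8   # 800 GB per LTO-4 cartridge
LTO5_TB = 1.5   # 1500 GB per LTO-5 cartridge

print(f"all LTO-4: {SLOTS * LTO4_TB / 1000:.2f} PB")   # ~4.4 PB, consistent with >4 PB stored
print(f"all LTO-5: {SLOTS * LTO5_TB / 1000:.2f} PB")   # ~8.25 PB after migrating to LTO-5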

NP analysis planning and coordination
Physics Division offline analysis is providing input to the scientific computing requirements (contact: Graham Heyes):
– Gathering requirements estimates from workgroups, including comparisons with historical trends and analysis benchmarks
– Enhancing communication
  Regular meetings of offline coordinators at JLab, plus workshops (e.g. CLAS12)
  Monitoring of PAC submissions and soon-to-run experiments
  New web presence at data.jlab.org for cross-hall efforts
– Plans are under development to assure the long-term viability of the data
– The current computing plan will lead the Lab into the 12 GeV era with computing resources capable of supporting its simulation, calibration, reconstruction, and analysis needs

Science per Dollar for (some) LQCD Capacity Applications
[Chart: Mflops / $ over time for QCDSP, QCDOC, vector supercomputers (including the Japanese Earth Simulator), USQCD clusters, the 2006 BlueGene/L, the 2007 BlueGene/P, and a GPU cluster]
A 300-node cluster, optimized for a limited number of science problems, yields a very cost-effective platform. GPUs beat that by a factor of ten!
GPUs are a highly cost-effective technology (a price/performance sketch follows this slide)
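To make the Mflops/$ metric concrete, here is an illustrative comparison; the throughput and cost numbers are hypothetical placeholders rather than figures from the chart, and only the factor-of-ten ratio comes from the slide.

# Illustrative price/performance in the slide's Mflops/$ metric.
# All numbers below are hypothetical placeholders, not figures from the chart.
def mflops_per_dollar(sustained_mflops: float, cost_dollars: float) -> float:
    """Sustained LQCD throughput divided by system cost."""
    return sustained_mflops / cost_dollars

cluster = mflops_per_dollar(sustained_mflops=3.0e6, cost_dollars=300_000)  # hypothetical cluster
gpu = mflops_per_dollar(sustained_mflops=3.0e7, cost_dollars=300_000)      # hypothetical GPU system

print(f"cluster: {cluster:.0f} Mflops/$, GPU: {gpu:.0f} Mflops/$")
print(f"GPU advantage: {gpu / cluster:.0f}x")  # the factor of ten quoted on the slide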

Computing Support Activities
Helpdesk
– Open M-F, 8 am – 4:30 pm
Desktops – upgrades in process
– Windows 7 is the default and suggested version
– Red Hat EL 5.x is the default and suggested version
Wireless networks
– jlab: for anyone with a CUE account
  WPA2 protection
  Supports both JLab-managed and personal computers
– jlab_guest: for anyone without a CUE account
  WEP (key available at the front desk or help desk)
  Personal computers
– jlab_secure: to be retired this year
– All require computer registration
– Coverage has been expanded in all major buildings

Computing Support Activities
Telecommunications
– Old technology, capacity issues, and 12 GeV construction
– Moving to VoIP
  Pilot in FY10
  Use the existing data network
  Deploy with new construction
Cyber Security
– Certification & Accreditation in FY10 for a new Authority To Operate
– Recently performed penetration testing and an evaluation of mitigations and controls
  No vulnerabilities found on centrally managed machines
  All vulnerabilities found and exploited were on user-managed machines (guest and level-2)
– Patches not up to date, poor configurations (a patch-check sketch follows this slide)
Working on CUE support for Macs
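A minimal sketch of the kind of patch-status check that would flag the out-of-date user-managed machines above, assuming a yum-based Red Hat host like the lab's RHEL 5.x desktops; this is an illustration, not the tooling the lab actually uses.

import subprocess

# "yum check-update" exits 100 when updates are pending, 0 when the host
# is fully patched, and other codes on error.
result = subprocess.run(["yum", "-q", "check-update"],
                        capture_output=True, text=True)

if result.returncode == 100:
    pending = [line.split()[0] for line in result.stdout.splitlines() if line.strip()]
    print(f"{len(pending)} packages have pending updates, e.g. {pending[:5]}")
elif result.returncode == 0:
    print("host is fully patched")
else:
    print("yum check-update failed:", result.stderr.strip())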

Computing Support Activities
Going Green
– Virtual Machines: 140 virtual machines on 5 physical machines
– Desktop Power Management: 750 desktops being managed
– Combined, these two activities reduce greenhouse gas emissions by 753,000 lbs per year and save enough energy to power 60 average U.S. homes each year (a rough estimate sketch follows this slide)
Evaluating Google Cloud
– Calendar and
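A rough sanity check on the 60-home figure; the per-home, per-desktop, and per-server energy numbers below are assumptions chosen for illustration, not figures from the slide.

# Rough estimate of the energy saved by virtualization plus desktop power
# management. All per-unit figures are assumptions, not from the slide:
# ~11,000 kWh/year for an average U.S. home, ~200 kWh/year saved per managed
# desktop, and ~3,500 kWh/year per physical server avoided by consolidating
# 140 VMs onto 5 hosts.
HOME_KWH_PER_YEAR = 11_000

desktop_savings_kwh = 750 * 200          # 750 managed desktops
server_savings_kwh = (140 - 5) * 3_500   # servers avoided by virtualization

total_kwh = desktop_savings_kwh + server_savings_kwh
print(f"estimated savings: {total_kwh:,} kWh/year")
print(f"equivalent homes: {total_kwh / HOME_KWH_PER_YEAR:.0f}")  # lands near the ~60 quoted above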