JLab Scientific Computing: Theory HPC & Experimental Physics
Thomas Jefferson National Accelerator Facility, Newport News, VA
Sandy Philpott
HEPiX Nebraska – October 13, 2014

Updates Since our Annecy meeting…
– 12 GeV Accelerator status
– Computing
  – Haswells ordered
  – Continue core swaps for best-match load balancing
– Disk Storage
  – Newest storage servers
  – openZFS
  – New Lustre MDS system
  – Path to Lustre 2.5
– Facilities update
– Looking ahead

12 GeV Accelerator Status

Computing
Latest procurement: 104 Experimental Physics nodes, each with dual Intel E5-2670v3 Haswell CPUs (12 cores, 2.3 GHz) and 32 GB DDR memory.
– GlueX (Hall D) testing indicates these cores are 50% faster than the Sandy Bridge / Ivy Bridge cores already in the farm, measured by events/sec/core on an 18-core part also at 2.3 GHz; the system scaled linearly through all cores, so there is no memory bandwidth bottleneck at a mere 12 cores per CPU.
New trick helps USQCD and is neutral for Experimental Physics:
– The USQCD queue for 16-core nodes is always full, while the queue for 8-core nodes often sags, so we give two 8-core nodes to Experimental Physics and take one 16-core node in return (a sketch of this policy follows below).
– Similar to the core exchange approach described in the spring talk, but it now takes the type of load into account.
– Currently manual; soon (we hope) automatic.
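As a rough illustration of how this exchange could eventually be automated, here is a minimal Python sketch of the swap policy; the function name, queue metrics, thresholds, and node counts are hypothetical, not JLab's actual scheduler tooling.

# Hypothetical sketch of the 8-core / 16-core node exchange described above.
# All names and numbers are illustrative; this is not JLab's actual tooling.

def plan_core_swaps(usqcd_16core_backlog, usqcd_8core_backlog,
                    usqcd_idle_8core_nodes, farm_16core_nodes_available):
    """Return (n_8core_to_physics, n_16core_to_usqcd), keeping total cores equal."""
    swaps = 0
    # Trade only while the 16-core queue is congested, the 8-core queue is slack,
    # and both sides still have nodes to exchange (2 x 8 cores == 1 x 16 cores).
    while (usqcd_16core_backlog > 0 and usqcd_8core_backlog == 0
           and usqcd_idle_8core_nodes >= 2 and farm_16core_nodes_available >= 1):
        usqcd_idle_8core_nodes -= 2        # two 8-core nodes go to Experimental Physics
        farm_16core_nodes_available -= 1   # one 16-core node comes back to USQCD
        usqcd_16core_backlog -= 1
        swaps += 1
    return 2 * swaps, swaps


if __name__ == "__main__":
    to_physics, to_usqcd = plan_core_swaps(usqcd_16core_backlog=5,
                                           usqcd_8core_backlog=0,
                                           usqcd_idle_8core_nodes=6,
                                           farm_16core_nodes_available=4)
    print(f"Move {to_physics} 8-core nodes to the farm; take {to_usqcd} 16-core nodes for USQCD.")

Each swap leaves the total core count on both sides unchanged, which is what keeps the exchange neutral for Experimental Physics.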

Disk Storage
Current:
– Lustre: 1 PB on 30 OSSs, each with 30 * 1/2/3 TB disks in RAID6; 8.1 GB/s aggregate bandwidth, 100 MB/s – 1 GB/s single stream.
– ZFS servers: 250 TB. Move to ZFS on Linux: retire the 5-year-old SunFire Thors, continue using our 2-year-old Oracle 320 appliance.
New disk hardware: 4 servers, each with dual Xeon E5-2630v2 CPUs, 30 * 4 TB plus 4 * 500 GB SATA Enterprise disk drives, an LSI RAID controller with backup, and 2 * QDR ConnectX3 ports.
– With RAID-Z we don't need hardware RAID … JBOD … (see the sketch below).
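To illustrate the RAID-Z-on-JBOD point, here is a minimal ZFS-on-Linux sketch (wrapped in Python for consistency with the other examples) that builds a double-parity raidz2 pool directly on whole JBOD drives; the pool name and device paths are hypothetical, not JLab's configuration.

# Illustrative only: build a ZFS raidz2 pool directly on JBOD drives,
# so no hardware RAID layer is needed. Pool name and device names are hypothetical.
import subprocess

POOL = "ost0pool"                                                 # hypothetical pool name
DISKS = [f"/dev/disk/by-id/ata-drive{i}" for i in range(10)]      # hypothetical JBOD members

def create_raidz2_pool(pool, disks):
    """Create a double-parity RAID-Z pool spanning the given whole disks."""
    cmd = ["zpool", "create", "-o", "ashift=12", pool, "raidz2", *disks]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    create_raidz2_pool(POOL, DISKS)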

Storage Evolution: Lustre Upgrade and Partitioning
New Dell MDS procured:
– 2 R720s: Xeon E5 v2, 2.1 GHz, 6C; 64 GB RDIMM; 2 * 500 GB 7.2K SATA
– PowerVault MD3200 6G SAS, dual 2G cache controllers, 6 * 600 GB 10K disks
Upgrade from Lustre 1.8 to 2.5, and partition by performance:
– Plan 2 pools: fastest/newest, and older/slower.
– Begin using striping, and all stripes will be fast (or all slow); a sketch of the pool and striping setup follows below.
– By the end of 2014 this will be in production, with "inactive" projects moved from the main partition into the older, slower partition, freeing the highest-performance disk space for active projects.
– Use openZFS rather than ext4 / ldiskfs?
Other sites' Lustre migration plans and experience?
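As a sketch of the planned performance partitioning, the following Python wrapper shows the standard Lustre commands for grouping OSTs into fast/slow pools and striping directories over one pool or the other; the filesystem name, OST indices, stripe counts, and directories are assumptions, not JLab's actual layout.

# Illustrative sketch of performance-based OST pools and striping in Lustre 2.x.
# Filesystem name, pool names, OST indices, and directories are all hypothetical.
# Pool commands run on the MGS; lfs setstripe runs on a client.
import subprocess

FSNAME = "lustre1"   # hypothetical filesystem name

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Group the newest/fastest OSTs and the older/slower OSTs into separate pools.
run(["lctl", "pool_new", f"{FSNAME}.fast"])
run(["lctl", "pool_add", f"{FSNAME}.fast", f"{FSNAME}-OST[0-9]"])      # hypothetical indices
run(["lctl", "pool_new", f"{FSNAME}.slow"])
run(["lctl", "pool_add", f"{FSNAME}.slow", f"{FSNAME}-OST[10-29]"])    # hypothetical indices

# Stripe active project directories across the fast pool and inactive ones across
# the slow pool, so every stripe of a given file lands on OSTs of the same speed.
run(["lfs", "setstripe", "-p", "fast", "-c", "4", "/lustre1/active"])
run(["lfs", "setstripe", "-p", "slow", "-c", "2", "/lustre1/inactive"])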

Facilities Update: Computer Center Efficiency Upgrade and Consolidation
– Computer Center HVAC and power improvements in 2015 will allow consolidation of the Lab computer and data centers, helping meet DOE computer center power efficiency goals.
– Staged approach, to minimize downtime.

Looking ahead
Rest of 2014 …
– Roughly double the farm as the Halls come online.
– Upgrade Lustre; deploy 4 new OSSs (30 * 4 TB, RAID6); move to ZFS on Linux.
– Begin using Puppet?
– Deploy a workflow tool for the farm.
– Continue to automate core sharing / load balancing.
– Hire a 3rd SysAdmin.
2015 – 2016
– Computer Center Efficiency Upgrade and Consolidation.
– Operate current HPC resources (minus the oldest gaming cards): run the late Fall 2009 clusters through June 2015, and the mid 2010 clusters through June 2016, longer than usual due to the absence of new hardware.
– Experimental Physics grows to match the size of LQCD, enabling efficient load balancing (with careful balance tracking).
2016 – 2017
– JLab will be the deployment site for the first cluster of LQCD-ext II. This resource will be installed in the current location of the 9q / 10q clusters (same power and cooling, thus lower installation costs).
– Continue to grow the physics farm to meet 12 GeV computing requirements; the final configuration will use ~20,000 cores.