State of HCC 2012
Dr. David R. Swanson, Director, Holland Computing Center
Nature Communications, July 17, 2012
Nebraska Supercomputing Symposium 2012
HCC CPU Hour Usage 2012
[Chart: CPU hours by research group]
–Zeng (Quant Chem): 4.5M
–Starace (AMO Phys): 2.7M
–Rowe (Climate): 2.0M
–NanoScience: 6.4M
–Comp Bio: 3.0M
–Comp Sci: 1.7M
–Physics: 0.7M
–Mech E: 0.4M
High Performance Computing
Xiao Zeng, Chemistry, UNL (prior slide)
–DFT and Car-Parrinello MD
HPC: tightly coupled codes (see the sketch below)
–Requires an expensive low-latency local network (InfiniBand)
–Requires high-performance storage (Panasas, Lustre)
–Requires highly reliable hardware
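To illustrate why tightly coupled codes need a low-latency interconnect, here is a minimal sketch (not an HCC production code) of an iterative computation in which every rank synchronizes with every other rank each step via an MPI allreduce, so per-message latency rather than bandwidth dominates. It assumes mpi4py and NumPy are available; the array sizes and iteration count are arbitrary placeholders.

```python
# Minimal illustration of a tightly coupled (HPC) pattern.
# Every rank joins a global reduction each iteration, so the job makes many
# small, latency-bound exchanges -- the reason InfiniBand matters here.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a slice of a larger problem (sizes here are arbitrary).
local = np.random.rand(1_000_000)

for step in range(100):
    local *= 0.99                          # stand-in for real local work (e.g., an MD kernel)
    local_sum = np.array([local.sum()])
    global_sum = np.zeros(1)
    # Blocking collective: all ranks wait here every iteration.
    comm.Allreduce(local_sum, global_sum, op=MPI.SUM)
    if rank == 0 and step % 10 == 0:
        print(f"step {step}: global sum = {global_sum[0]:.3e}")
```

Run with something like `mpirun -np 64 python allreduce_sketch.py` on an InfiniBand-connected set of nodes; the same code over a high-latency network spends most of its time waiting in the collective.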
Eureka! A Higgs! (or at least something currently indistinguishable)
"I think we have it. We have discovered a particle that is consistent with a Higgs boson."
–CERN Director-General Rolf Heuer
US CMS Tier2 Computing
[Figure: the Compact Muon Solenoid (CMS) detector on the Large Hadron Collider ring, roughly 5.5 mi across]
CMS Grid Computing Model
Eureka! A Higgs! (or at least something currently indistinguishable)
Ca. 50 PB of CMS data in its entirety
Over 1 PB currently at HCC's "Tier2", 3,500 cores
Collaboration at many scales
–HCC and the Physics Department
–Over 2,700 scientists worldwide
–International grid computing infrastructure
–A data grid as well
–UNL closely linked to KU, KSU physicists via a jointly hosted "Tier3"
Data Intensive HTC
Huge database
Requires an expensive high-bandwidth wide-area network (DWDM fiber)
Requires high-capacity storage (HDFS, dCache)
HTC: loosely coupled codes (illustrated below)
Requires hardware
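By way of contrast with the MPI sketch above, a minimal illustration of the loosely coupled HTC pattern: many independent tasks with no communication between them, so each one can run on whatever core is free. The "analysis" function and block count below are hypothetical stand-ins for a real workload; on a grid, each block would be submitted as a separate job rather than run in a local process pool.

```python
# Minimal illustration of a loosely coupled (HTC) workload: independent tasks,
# no inter-task communication. A local process pool stands in for a scheduler.
from concurrent.futures import ProcessPoolExecutor

def analyze(block_id: int) -> float:
    """Stand-in for analyzing one independent block of data."""
    return sum((block_id * i) % 7 for i in range(10_000)) / 10_000.0

if __name__ == "__main__":
    blocks = range(1000)                      # 1,000 independent work units
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(analyze, blocks))
    print(f"processed {len(results)} blocks; mean score = {sum(results)/len(results):.3f}")
```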
Outline
HCC Overview
New User Report
HCC-Go
Moving Forward (after break)
–Next purchase
–It's the Data, stupid…
–Other Issues
Outline
New User Report
HCC-Go
Moving Forward (next section)
–Next purchase (motivation)
–New Communities
–PIVOT
–It's the Data, stupid…
HOLLAND COMPUTING CENTER OVERVIEW
NU Holland Computing Center has a University-wide mission to
–Facilitate and perform computational and data-intensive research
–Engage and train NU researchers, students, and other state communities
–This includes you!
–HCC would be delighted to collaborate
Computational Science: the 3rd Pillar
Experiment, Theory, Computation/Data
Lincoln Resources
10 staff
Red, Sandhills
5,000 compute cores
3 PetaBytes of storage in HDFS
Sandhills "Condominium Cluster"
44 nodes × 32 cores, 128 GB RAM, IB
Lustre (175 TB)
Priority Access
–Buy-in: hardware cost + $50/month
–4 groups currently
SLURM (submission sketch below)
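A hedged sketch of what submitting work to a SLURM-scheduled cluster like Sandhills can look like. The partition name, resource numbers, and the wrapped command are placeholders, not HCC policy; in practice users typically write an sbatch batch script directly rather than driving sbatch from Python.

```python
# Hypothetical example of submitting a batch job to a SLURM cluster from Python.
# Partition name, resources, and the command are placeholders.
import subprocess

cmd = [
    "sbatch",
    "--partition=sandhills",               # hypothetical partition name
    "--ntasks=32",                         # one full 32-core node
    "--mem=120G",                          # leave headroom under 128 GB per node
    "--time=24:00:00",
    "--job-name=demo",
    "--wrap=srun ./my_solver input.dat",   # placeholder executable and input
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout.strip())               # e.g. "Submitted batch job 123456"
```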
Omaha Resources
3 staff
Firefly, Tusker
10,000 compute cores
500 TB of storage
New offices soon: 158J PKI
Tusker
106 nodes × 64 cores = 6,784 cores
256 GB/node
–2 nodes w/ 512 GB
360 TB Lustre
–100 TB more en route
QDR IB
43 TFlops
Tusker
¼ the footprint of Firefly
¼ the power
2× the TFLOPS
2× the storage
Fully utilized
Maui/Torque
In between …
HCC (UNL) to Internet2: 10 Gbps
HCC (Schorr) to HCC (PKI): 20 Gbps
Allows us to do some interesting things
–"Overflow" jobs to/from Red
–DYNES project
–Xrootd mechanism
HCC Staff
HPC Applications Specialists
–Dr. Adam Caprez
–Dr. Ashu Guru
–Dr. Jun Wang
–Dr. Nicholas Palermo
System Administrators
–Dr. Carl Lundstedt
–Garhan Attebury
–Tom Harvill
–John Thiltges
–Josh Samuelson
–Dr. Brad Hurst
HCC Staff
Other Staff
–Dr. Brian Bockelman
–Joyce Young
GRAs
–Derek Weitzel
–Chen He
–Kartik Vedalaveni
–Zhe Zhang
Undergraduates
–Carson Crawford
–Kirk Miller
–Avi Knecht
–Phil Brown
–Slav Ketsman
–Nicholas Nachtigal
–Charles Cihacek
HCC Campus Grid
Holland Computing Center resources are combined into an HTC campus grid
–10,000 cores, 500 TB in Omaha
–5,000 cores, 3 PB in Lincoln
–All tied together via a single submission protocol using the OSG software stack (example below)
–Straightforward to expand to OSG sites across the country, as well as to EC2 (cloud)
–HPC jobs get priority; HTC ensures high utilization
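The OSG software stack is built around Condor/HTCondor, so as an illustration only (file names, executable, and job count are hypothetical, and this is not presented as HCC's actual submit workflow), a loosely coupled job description for such a submit host might look like the following, generated and submitted from Python to keep the examples in one language:

```python
# Hypothetical illustration of describing an HTC job for a Condor/HTCondor-based
# campus grid submit host. File names and the executable are placeholders.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = analyze.sh
    arguments  = block_$(Process).dat
    output     = analyze_$(Process).out
    error      = analyze_$(Process).err
    log        = analyze.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue 1000
""")

with open("analyze.submit", "w") as f:
    f.write(submit_description)

# condor_submit is part of the Condor/HTCondor stack on the submit host.
subprocess.run(["condor_submit", "analyze.submit"], check=True)
```

The key point is the `queue 1000` line: one description fans out into a thousand independent jobs that the campus grid can place wherever cycles are free, locally, at other OSG sites, or on EC2.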
HCC Model for a Campus Grid
"Me, my friends, and everyone else"
[Diagram: Local, Campus, and Grid tiers]
HCC & Open Science Grid
National, distributed computing partnership for data-intensive research
–Opportunistic computing
–Over 100,000 cores
–Supports the LHC experiments, other science
–Funded for 5 more years
–Over 100 sites in the Americas
–Ongoing support for 2.5 (+3) FTE at HCC
It Works!
HCC Network Monitoring
OSG Resources
Working philosophy
Use what we buy
–These pieces of infrastructure are linked, but improve asynchronously
–Depreciation is immediate
–Leasing is still more expensive (for now)
–Buying at fixed intervals mitigates risk, increases ROI
–Space, power, and cooling have a longer life span
Share what we aren't using
–Share opportunistically; retain local ownership
–Consume opportunistically; there is more to gain!
–Collaborators, not just consumers
–Greater good vs. squandered opportunity
Working philosophy
A data deluge is upon us
Support is essential
–If you only build it, they still may not come
–Build incrementally and buy time for user training
–Support can grow more gradually than hardware
Links to national and regional infrastructure are critical
–Open Source Community
–GPN access to Internet2
–Access to OSG, XSEDE resources
–Collaborations with fellow OSG experts
–LHC
HCC New Users
[Table: new users by fiscal year and campus: UNL-City, UNL-East, UNO, UNMC, outside the NU system]
(74) 33 (10) 75 (19) 30 (17) 112 (26)
(95) 50 (17) 105 (30) 35 (5) 130 (18)
New User Communities
Theatre, Fine Arts/Digital Media, Architecture
Psychology, Finance
UNMC
Puerto Rico
PIVOT collaborators
HCC NEW USER REPORT: HEATH ROEHR
HCC-GO: DR. ASHU GURU
MOVING FORWARD
NEW PURCHASE
$2M for …
More computing
–Need ca. 100 TF to hit the Top500 for June 2013
–Likely use all of the funds to hit that amount
More storage
–Near-line archive (9 PB)
–HDFS
Specialty hardware
–GPGPU/Viz
–MIC hardware
More computing
How much RAM per core?
Currently almost always oversubscribed
Large-scale jobs almost impossible (> 2,000 cores)
Safest investment: will be used right away
Firefly due to be retired soon (EOL)
More computing
More Computing
More storage
Most rapidly growing demand
Growing contention; can't just queue it up
Largest unmet need (?)
Storage for $2M
$2M HDFS cluster
–250 nodes
–4,000 cores (Intel)
–9.0 PB (raw)
–128 GB/node
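A quick sanity check of the per-node numbers implied by that configuration, plus the usable capacity under the assumption (not stated on the slide) of HDFS's default 3× replication:

```python
# Back-of-the-envelope check of the proposed $2M HDFS cluster.
# The 3x replication factor is the HDFS default, assumed here, not quoted on the slide.
nodes = 250
cores = 4000
raw_pb = 9.0
cost = 2_000_000

print(f"cores per node:           {cores / nodes:.0f}")                 # 16
print(f"raw storage per node:     {raw_pb * 1000 / nodes:.0f} TB")      # 36 TB
print(f"usable at 3x replication: {raw_pb / 3:.1f} PB")                 # 3.0 PB
print(f"cost per raw TB:          ${cost / (raw_pb * 1000):,.0f}")      # ~$222
```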
Other options
GPGPUs: most green option for computing
Highest upside for raw power (Top500)
MIC: even compatible with x86 codes
SMP: uniquely meets some needs, easiest to use/program
Blue Gene, tape silo, …
HCC personnel timeline
HCC networking timeline
HCC CPU timeline: 900× increase
HCC storage timeline: 30,000× increase
Composite Timeline
Data increase / CPU core increase = 33
Data increase / WAN bandwidth increase = 150
It takes a month to move 3 PB at 10 Gb/sec (worked out below)
Power: < 100× increase, largely constant over the last 3 years
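A worked check of the "month to move 3 PB" figure (assuming a fully dedicated, sustained 10 Gb/s link and decimal units; protocol overhead only lengthens this), together with the ratio of the growth factors from the two preceding timeline slides:

```python
# Sanity check: how long does 3 PB take over a fully utilized 10 Gb/s link?
data_bits = 3e15 * 8          # 3 PB in bits
rate_bps = 10e9               # 10 Gb/s
seconds = data_bits / rate_bps
print(f"{seconds:.2e} s = {seconds / 86400:.1f} days")    # ~2.4e6 s, about 28 days

# The composite ratio follows from the timeline slides: 30,000x data vs 900x CPU.
print(f"data growth / CPU growth = {30000 / 900:.0f}")    # ~33, matching the slide
```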
Storage at HCC
Affordable, reliable, high performance, high capacity
–Pick 2
–So, multiple options
/home
/work
/shared
Currently, no /archive
/home
Reliable
Low performance
–No writes from worker nodes
ZFS
Rsync'ed pair, one in Omaha, one in Lincoln
Backed up incrementally; requires severe quotas
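As an illustration of the mirrored-pair idea only (the hostname, paths, and retention scheme are hypothetical, and this is not HCC's actual backup tooling), the remote copy could be refreshed with an rsync call along these lines, keeping hard-linked incremental snapshots so each day costs only the space of changed files:

```python
# Hypothetical sketch of mirroring /home to a second site with incremental,
# hard-linked snapshots. Hostname, paths, and schedule are placeholders.
import subprocess
from datetime import date

src = "/home/"                                   # local copy (e.g., Lincoln)
dest_host = "hcc-backup.example.edu"             # hypothetical mirror host (e.g., Omaha)
today = date.today().isoformat()

subprocess.run(
    [
        "rsync", "-a", "--delete",
        # Unchanged files are hard-linked against the previous snapshot,
        # so each daily snapshot stores only what changed.
        "--link-dest=../latest",
        src,
        f"{dest_host}:/backups/home/{today}/",
    ],
    check=True,
)
# A separate step would repoint the 'latest' symlink at today's snapshot.
```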
/work
High performance
High(er) capacity
Not permanent storage
Lenient quotas
More robust, more reliable "scratch space"
Subject to purge as needed
/share
Purchased by a given group
Exported to both Lincoln and Omaha machines
Usually for capacity; striped for some reliability
Storage Strategy
Maintain /home for precious files
–Could be global
Maintain /work for runtime needs
–Remains local to each cluster
Create /share for near-line archive
–3-5 year time frame (or less)
–Use for accumulating intermediate data, then purge
–Global access
Storage strategy
Permanent archival has 3 options
–1) Library
–2) Amazon Glacier (currently $120/TB/year)
–3) Tape system
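To put the Glacier price in context, a small cost sketch; the archive size and retention period are made-up planning numbers, and the figure ignores retrieval and transfer charges, which Glacier also bills for:

```python
# Rough cost of parking an archive in Amazon Glacier at the quoted $120/TB/year.
# The archive size and retention below are hypothetical; retrieval/egress fees ignored.
rate_per_tb_year = 120
archive_tb = 1000          # hypothetical 1 PB archive
years = 5

total = rate_per_tb_year * archive_tb * years
print(f"{archive_tb} TB for {years} years: ${total:,}")   # $600,000
```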
HCC Data Visualizations
Fish!
HadoopViz
OSG Google Earth
Web-based monitoring
Other discussion topics
Maui vs. SLURM
Queue length policy
Education approaches
–This (!)
–Tutorials (next!)
–Afternoon workshops
–Semester courses
–Individual presentations/meetings
–Online materials
©2007 The Board of Regents of the University of Nebraska
NU Administration (UNL, NRI)
NSF, DOE, EPSCoR, OSG
Holland Foundation
CMS: Ken Bloom, Aaron Dominguez
HCC: Drs. Brian Bockelman, Adam Caprez, Ashu Guru, Brad Hurst, Carl Lundstedt, Nicholas Palermo, Jun Wang; Garhan Attebury, Tom Harvill, Josh Samuelson, John Thiltges; Chen He, Derek Weitzel